Discovering Global Patterns in Linguistic Networks through Spectral Analysis: A Case Study of the Consonant Inventories

Animesh Mukherjee
Indian Institute of Technology, Kharagpur
[email protected]

Monojit Choudhury and Ravi Kannan
Microsoft Research India
{monojitc,kannan}@microsoft.com

∗ This research has been conducted during the author's internship at Microsoft Research India.

arXiv:0901.2216v1 [cs.CL] 15 Jan 2009

Abstract

Recent research has shown that language and the socio-cognitive phenomena associated with it can be aptly modeled and visualized through networks of linguistic entities. However, most of the existing works on linguistic networks focus only on the local properties of the networks. This study is an attempt to analyze the structure of languages via a purely structural technique, namely spectral analysis, which is ideally suited for discovering the global correlations in a network. Application of this technique to PhoNet, the co-occurrence network of consonants, not only reveals several natural linguistic principles governing the structure of the consonant inventories, but is also able to quantify their relative importance. We believe that this powerful technique can be successfully applied, in general, to study the structure of natural languages.

1 Introduction

Language and the associated socio-cognitive phenomena can be modeled as networks, where the nodes correspond to linguistic entities and the edges denote the pairwise interaction or relationship between these entities. The study of linguistic networks has been quite popular in recent times and has provided us with several interesting insights into the nature of language (see Choudhury and Mukherjee (to appear) for an extensive survey). Examples include the study of WordNet (Sigman and Cecchi, 2002), the syntactic dependency network of words (Ferrer-i-Cancho, 2005) and the network of co-occurrence of consonants in sound inventories (Mukherjee et al., 2008; Mukherjee et al., 2007).

Most of the existing studies on linguistic networks, however, focus only on the local structural properties such as the degree and clustering coefficient of the nodes, and shortest paths between pairs of nodes. On the other hand, although it is a well known fact that the spectrum of a network can provide important information about its global structure, the use of this powerful mathematical machinery to infer global patterns in linguistic networks is rarely found in the literature. Note that spectral analysis, however, has been successfully employed in the domains of biological and social networks (Farkas et al., 2001; Gkantsidis et al., 2003; Banerjee and Jost, 2007). In the context of linguistic networks, (Belkin and Goldsmith, 2002) is the only work we are aware of that analyzes the eigenvectors to obtain a two-dimensional visualization of the network. Nevertheless, the work does not study the spectrum of the graph.

The aim of the present work is to demonstrate the use of spectral analysis for discovering the global patterns in linguistic networks. These patterns, in turn, are then interpreted in the light of existing linguistic theories to gather deeper insights into the nature of the underlying linguistic phenomena. We apply this rather generic technique to find the principles that are responsible for shaping the consonant inventories, which has been a well researched problem in phonology since 1931 (Trubetzkoy, 1931; Lindblom and Maddieson, 1988; Boersma, 1998; Clements, 2008). The analysis is carried out on a network defined in (Mukherjee et al., 2007), where the consonants are the nodes and there is an edge between two nodes $u$ and $v$ if the consonants corresponding to them co-occur in a language. The number of times they co-occur across languages defines the weight of the edge. We explain the results obtained from the spectral analysis of the network post-facto using three linguistic principles. The method also automatically reveals the quantitative importance of each of these principles.

It is worth mentioning here that earlier researchers have also noted the importance of the aforementioned principles. However, what was not known was how much importance one should associate with each of these principles. We also note that the technique of spectral analysis neither explicitly nor implicitly assumes that these principles exist or are important, but deduces them automatically. Thus, we believe that spectral analysis is a promising approach that is well suited to the discovery of linguistic principles underlying a set of observations represented as a network of entities. The fact that the principles "discovered" in this study are already well established results adds to the credibility of the method. Spectral analysis of large linguistic networks in the future can possibly reveal hitherto unknown universal principles.

The rest of the paper is organized as follows. Sec. 2 introduces the technique of spectral analysis of networks and illustrates some of its applications. The problem of consonant inventories and how it can be modeled and studied within the framework of linguistic networks are described in Sec. 3. Sec. 4 presents the spectral analysis of the consonant co-occurrence network, the observations and interpretations. Sec. 5 concludes by summarizing the work and the contributions and listing out future research directions.

2 A Primer to Spectral Analysis

Spectral analysis (see footnote 1) is a powerful tool capable of revealing the global structural patterns underlying an enormous and complicated environment of interacting entities. Essentially, it refers to the systematic study of the eigenvalues and the eigenvectors of the adjacency matrix of the network of these interacting entities. Here we shall briefly review the basic concepts involved in spectral analysis and describe some of its applications (see (Chung, 1994; Kannan and Vempala, 2008) for details).

Footnote 1: The term spectral analysis is also used in the context of signal processing, where it refers to the study of the frequency spectrum of a signal.

A network or a graph consisting of $n$ nodes (labeled as 1 through $n$) can be represented by an $n \times n$ square matrix $A$, where the entry $a_{ij}$ represents the weight of the edge from node $i$ to node $j$. $A$, which is known as the adjacency matrix, is symmetric for an undirected graph and has binary entries for an unweighted graph. $\lambda$ is an eigenvalue of $A$ if there is a nonzero $n$-dimensional vector $x$ such that

$$Ax = \lambda x$$

Any real symmetric matrix $A$ has $n$ (possibly non-distinct) eigenvalues $\lambda_0 \le \lambda_1 \le \ldots \le \lambda_{n-1}$, and corresponding $n$ eigenvectors that are mutually orthogonal. The spectrum of a graph is the set of the distinct eigenvalues of the graph and their corresponding multiplicities. It is usually represented as a plot with the eigenvalues on the x-axis and their multiplicities on the y-axis.

The spectra of real and random graphs display several interesting properties. Banerjee and Jost (2007) report the spectra of several biological networks that are significantly different from the spectra of artificially generated graphs (see footnote 2). Spectral analysis is also closely related to Principal Component Analysis and Multidimensional Scaling. If the first few (say $d$) eigenvalues of a matrix are much higher than the rest of the eigenvalues, then it can be concluded that the rows of the matrix can be approximately represented as linear combinations of $d$ orthogonal vectors. This further implies that the corresponding graph has a few motifs (subgraphs) that are repeated a large number of times to obtain the global structure of the graph (Banerjee and Jost, to appear).

Footnote 2: Banerjee and Jost (2007) report the spectrum of the graph's Laplacian matrix rather than the adjacency matrix. It is increasingly popular these days to analyze the spectral properties of the graph's Laplacian matrix. However, for reasons explained later, here we will conduct spectral analysis of the adjacency matrix rather than its Laplacian.

Spectral properties are representative of an $n$-dimensional average behavior of the underlying system, thereby providing considerable insight into its global organization. For example, the principal eigenvector (i.e., the eigenvector corresponding to the largest eigenvalue) is the direction in which the sum of the squares of the projections of the row vectors of the matrix is maximum. In fact, the principal eigenvector of a graph is used to compute the centrality of the nodes, which is also known as PageRank in the context of the WWW. Similarly, the second eigenvector component is used for graph clustering.

In the next two sections we describe how spectral analysis can be applied to discover the organizing principles underneath the structure of consonant inventories.

Figure 1: Illustration of the nodes and edges of PlaNet and PhoNet along with their respective adjacency matrix representations.

3 Consonant Co-occurrence Network

The most basic units of human languages are the speech sounds. The repertoire of sounds that make up the sound inventory of a language is not chosen arbitrarily even though the speakers are capable of producing and perceiving a plethora of them. In contrast, these inventories show exceptionally regular patterns across the languages of the world, which is, in fact, a common point of consensus in phonology. Right from the beginning of the 20th century, there have been a large number of linguistically motivated attempts (Trubetzkoy, 1969; Lindblom and Maddieson, 1988; Boersma, 1998; Clements, 2008) to explain the formation of these patterns across the consonant inventories. More recently, Mukherjee and his colleagues (Choudhury et al., 2006; Mukherjee et al., 2007; Mukherjee et al., 2008) studied this problem in the framework of complex networks. Since here we shall conduct a spectral analysis of the network defined in Mukherjee et al. (2007), we briefly survey the models and the important results of their work.

Choudhury et al. (2006) introduced a bipartite network model for the consonant inventories. Formally, a set of consonant inventories is represented as a graph $G = \langle V_L, V_C, E_{lc} \rangle$, where the nodes in one partition correspond to the languages ($V_L$) and those in the other partition correspond to the consonants ($V_C$). There is an edge $(v_l, v_c)$ between a language node $v_l \in V_L$ (representing the language $l$) and a consonant node $v_c \in V_C$ (representing the consonant $c$) iff the consonant $c$ is present in the inventory of the language $l$. This network is called the Phoneme-Language Network or PlaNet and represents the connections between the language and the consonant nodes through a 0-1 matrix $A$, as shown by a hypothetical example in Fig. 1. Further, in (Mukherjee et al., 2007), the authors define the Phoneme-Phoneme Network or PhoNet as the one-mode projection of PlaNet onto the consonant nodes, i.e., a network $G = \langle V_C, E_{cc'} \rangle$, where the nodes are the consonants and two nodes $v_c$ and $v_{c'}$ are linked by an edge with weight equal to the number of languages in which both $c$ and $c'$ occur together. In other words, PhoNet can be expressed as a matrix $B$ (see Fig. 1) such that $B = AA^T - D$, where $D$ is a diagonal matrix with its entries corresponding to the frequency of occurrence of the consonants. Similarly, we can also construct the one-mode projection of PlaNet onto the language nodes (which we shall refer to as the Language-Language Graph or LangGraph), which can be expressed as $B' = A^T A - D'$, where $D'$ is a diagonal matrix with its entries corresponding to the size of the consonant inventories for each language.

The matrix $A$ and hence, $B$ and $B'$ have been constructed from the UCLA Phonological Segment Inventory Database (UPSID) (Maddieson, 1984) that hosts the consonant inventories of 317 languages with a total of 541 consonants found across them. Note that UPSID uses articulatory features to describe the consonants and assumes these features to be binary-valued, which in turn implies that every consonant can be represented by a binary vector. Later on, we shall use this representation for our experiments.

By construction, we have $|V_L| = 317$, $|V_C| = 541$, $|E_{lc}| = 7022$, and $|E_{cc'}| = 30412$. Consequently, the order of the matrix $A$ is $541 \times 317$ and that of the matrix $B$ is $541 \times 541$. It has been found that the degree distributions of both PlaNet and PhoNet roughly indicate a power-law behavior with exponential cut-offs towards the tail (Choudhury et al., 2006; Mukherjee et al., 2007). Furthermore, PhoNet is also characterized by a very high clustering coefficient. The topological properties of the two networks and the generative model explaining the emergence of these properties are summarized in (Mukherjee et al., 2008). However, all the above properties are useful in characterizing the local patterns of the network and provide very little insight about its global structure.

4 Spectral Analysis of PhoNet

In this section we describe the procedure and results of the spectral analysis of PhoNet. We begin with the computation of the spectrum of PhoNet. After the analysis of the spectrum, we systematically investigate the top few eigenvectors of PhoNet and attempt to characterize their linguistic significance. In the process, we also analyze the corresponding eigenvectors of LangGraph, which helps us in characterizing the properties of languages.

4.1 Spectrum of PhoNet

Using a simple Matlab script we compute the spectrum (i.e., the list of eigenvalues along with their multiplicities) of the matrix $B$ corresponding to PhoNet. Fig. 2(a) shows the spectral plot, which has been obtained through binning (see footnote 3) with a fixed bin size of 20. In order to have a better visualization of the spectrum, in Figs. 2(b) and (c) we further plot the top 50 (absolute) eigenvalues from the two ends of the spectrum versus the index representing their sorted order in doubly-logarithmic scale. Some of the important observations that one can make from these results are as follows.

Footnote 3: Binning is the process of dividing the entire range of a variable into smaller intervals and counting the number of observations within each bin or interval. In fixed binning, all the intervals are of the same size.

First, the major bulk of the eigenvalues are concentrated at around 0. This indicates that though the order of $B$ is $541 \times 541$, its numerical rank is quite low. Second, there are at least a few very large eigenvalues that dominate the entire spectrum. In fact, 89% of the spectrum, or the square of the Frobenius norm, is occupied by the principal (i.e., the topmost) eigenvalue, 92% is occupied by the first and the second eigenvalues taken together, while 93% is occupied by the first three taken together. The individual contribution of the other eigenvalues to the spectrum is significantly lower than that of the top three. Third, the eigenvalues on either end of the spectrum tend to decay gradually, mostly indicating a power-law behavior. The power-law exponents at the positive and the negative ends are -1.33 (the $R^2$ value of the fit is 0.98) and -0.88 ($R^2 \sim 0.92$) respectively.

The numerically low rank of PhoNet suggests that there are certain prototypical structures that frequently repeat themselves across the consonant inventories, thereby increasing the number of 0 eigenvalues to a large extent. In other words, all the rows of the matrix $B$ (i.e., the inventories) can be expressed as linear combinations of a few independent row vectors, also known as factors.

Furthermore, the fact that the principal eigenvalue constitutes 89% of the Frobenius norm of the spectrum implies that there exists one very strong organizing principle which should be able to explain the basic structure of the inventories to a very good extent. Since the second and third eigenvalues are also significantly larger than the rest of the eigenvalues, one should expect two other organizing principles, which along with the basic principle, should be able to explain, (almost) completely, the structure of the inventories. In order to "discover" these principles, we now focus our attention on the first three eigenvectors of PhoNet.

Figure 2: Eigenvalues and eigenvectors of $B$. (a) Binned distribution of the eigenvalues (bin size = 20) versus their multiplicities. (b) The top 50 (absolute) eigenvalues from the positive end of the spectrum and their ranks. (c) Same as (b) for the negative end of the spectrum. (d), (e) and (f) respectively represent the first, second and the third eigenvector components versus the occurrence frequency of the consonants.

4.2 The First Eigenvector of PhoNet

Fig. 2(d) shows the first eigenvector component for each consonant node versus its frequency of occurrence across the language inventories (i.e., its degree in PlaNet). The figure clearly indicates that the two are highly correlated ($r = 0.99$), which in turn means that 89% of the spectrum and hence, the organization of the consonant inventories, can be explained to a large extent by the occurrence frequency of the consonants. The question arises: Does this tell us something special about the structure of PhoNet or is it always the case for any symmetric matrix that the principal eigenvector will be highly correlated with the frequency? We assert that the former is true, and indeed, the high correlation between the principal eigenvector and the frequency indicates high "proportionate co-occurrence", a term which we will explain.

To see this, consider the following $2n \times 2n$ matrix $X$

$$X = \begin{pmatrix} 0 & M_1 & 0 & 0 & 0 & \cdots \\ M_1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 0 & M_2 & 0 & \cdots \\ 0 & 0 & M_2 & 0 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

where $X_{i,i+1} = X_{i+1,i} = M_{(i+1)/2}$ for all odd $i$ and 0 elsewhere. Also, $M_1 > M_2 > \ldots > M_n \ge 1$. Essentially, this matrix represents a graph which is a collection of $n$ disconnected edges, each having weights $M_1$, $M_2$, and so on. It is easy to see that the principal eigenvector of this matrix is $(1/\sqrt{2}, 1/\sqrt{2}, 0, 0, \ldots, 0)^\top$, which of course is very different from the frequency vector: $(M_1, M_1, M_2, M_2, \ldots, M_n, M_n)^\top$.

At the other extreme, consider an $n \times n$ matrix $X$ with $X_{i,j} = C f_i f_j$ for some vector $f = (f_1, f_2, \ldots, f_n)^\top$ that represents the frequency of the nodes and a normalization constant $C$. This is what we refer to as "proportionate co-occurrence" because the extent of co-occurrence between the nodes $i$ and $j$ (which is $X_{i,j}$ or the weight of the edge between $i$ and $j$) is exactly proportionate to the frequencies of the two nodes. The principal eigenvector in this case is $f$ itself, and thus, correlates perfectly with the frequencies. Unlike this hypothetical matrix $X$, PhoNet has all 0 entries in the diagonal. Nevertheless, this perturbation, which is equivalent to subtracting $f_i^2$ from the $i$th diagonal entry, seems to be sufficiently small to preserve the "proportionate co-occurrence" behavior of the adjacency matrix, thereby resulting in a high correlation between the principal eigenvector component and the frequencies.

On the other hand, to construct the Laplacian matrix, we would have subtracted $f_i \sum_{j=1}^{n} f_j$ from the $i$th diagonal entry, which is a much larger quantity than $f_i^2$. In fact, this operation would have completely destroyed the correlation between the frequency and the principal eigenvector component because the eigenvector corresponding to the smallest (see footnote 4) eigenvalue of the Laplacian matrix is $[1, 1, \ldots, 1]^\top$.

Footnote 4: The role played by the top eigenvalues and eigenvectors in the spectral analysis of the adjacency matrix is comparable to that of the smallest eigenvalues and the corresponding eigenvectors of the Laplacian matrix (Chung, 1994).

Since the first eigenvector of $B$ is almost perfectly correlated with the frequency of occurrence of the consonants across languages, it is reasonable to argue that there is a universally observed innate preference towards certain consonants. This preference is often described through the linguistic concept of markedness, which in the context of phonology tells us that the substantive conditions that underlie the human capacity of speech production and perception render certain consonants more favorable to be included in the inventory than some other consonants (Clements, 2008). We observe that markedness plays a very important role in shaping the global structure of the consonant inventories. In fact, if we arrange the consonants in a non-increasing order of the first eigenvector components (which is equivalent to increasing order of statistical markedness), and compare the set of consonants present in an inventory of size $s$ with that of the first $s$ entries from this hierarchy, we find that the two are, on average, more than 50% similar. This figure is surprisingly high because, in spite of the fact that for all $s$, $s \ll 541/2$, on average $s/2$ consonants in an inventory are drawn from the first $s$ entries of the markedness hierarchy (a small set), whereas the remaining $s/2$ are drawn from the remaining $(541 - s)$ entries (a much larger set).

The high degree of proportionate co-occurrence in PhoNet implied by this high correlation between the principal eigenvector and frequency further indicates that the innate preference towards certain phonemes is independent of the presence of other phonemes in the inventory of a language.

4.3 The Second Eigenvector of PhoNet

Fig. 2(e) shows the second eigenvector component for each node versus its occurrence frequency. It is evident from the figure that the consonants have been clustered into three groups. Those that have a very low or a very high frequency cluster around 0, whereas the medium frequency zone has clearly split into two parts. In order to investigate the basis for this split we carry out the following experiment.

Experiment I
(i) Remove all consonants whose frequency of occurrence across the inventories is very low (< 5).
(ii) Denote the absolute maximum value of the positive component of the second eigenvector as $\mathrm{MAX}_+$ and the absolute maximum value of the negative component as $\mathrm{MAX}_-$. If the absolute value of a positive component is less than 15% of $\mathrm{MAX}_+$ then assign a neutral class to the corresponding consonant; else assign it a positive class. Denote the set of consonants in the positive class by $C_+$. Similarly, if the absolute value of a negative component is less than 15% of $\mathrm{MAX}_-$ then assign a neutral class to the corresponding consonant; else assign it a negative class. Denote the set of consonants in the negative class by $C_-$.
(iii) Using the above training set of the classified consonants (represented as boolean feature vectors) learn a decision tree (C4.5 algorithm (Quinlan, 1993)) to determine the features that are responsible for the split of the medium frequency zone into the negative and the positive classes.

Fig. 3(a) shows the decision rules learnt from the above training set. It is clear from these rules that the split into $C_-$ and $C_+$ has taken place mainly based on whether the consonants have the combined "dental alveolar" feature (negative class) or the "dental" and the "alveolar" features separately (positive class). Such a combined feature is often termed ambiguous and its presence in a particular consonant $c$ of a language $l$ indicates that the speakers of $l$ are unable to make a distinction as to whether $c$ is articulated with the tongue against the upper teeth or the alveolar ridge. In contrast, if the features are present separately then the speakers are capable of making this distinction. In fact, through the following experiment, we find that the consonant inventories of almost all the languages in UPSID get classified based on whether they preserve this distinction or not.

Experiment II
(i) Construct $B' = A^T A - D'$ (i.e., the adjacency matrix of LangGraph).
(ii) Compute the second eigenvector of $B'$. Once again, the positive and the negative components split the languages into two distinct groups $L_+$ and $L_-$ respectively.
(iii) For each language $l \in L_+$ count the number of consonants in $C_+$ that occur in $l$. Sum up the counts for all the languages in $L_+$ and normalize this sum by $|L_+||C_+|$. Similarly, perform the same step for the pairs $(L_+, C_-)$, $(L_-, C_+)$ and $(L_-, C_-)$.

From the above experiment, the values obtained for the pairs (i) $(L_+, C_+)$, $(L_+, C_-)$ are 0.35, 0.08 respectively, and (ii) $(L_-, C_+)$, $(L_-, C_-)$ are 0.07, 0.32 respectively. This immediately implies that almost all the languages in $L_+$ preserve the dental/alveolar distinction while those in $L_-$ do not.

Figure 3: Decision rules obtained from the study of (a) the second, and (b) the third eigenvectors. The classification errors for both (a) and (b) are less than 15%.

4.4 The Third Eigenvector of PhoNet

We next investigate the relationship between the third eigenvector components of $B$ and the occurrence frequency of the consonants (Fig. 2(f)). The consonants are once again found to get clustered into three groups, though not as clearly as in the previous case. Therefore, in order to determine the basis of the split, we repeat experiments I and II. Fig. 3(b) clearly indicates that in this case the consonants in $C_+$ lack the complex features that are considered difficult for articulation. On the other hand, the consonants in $C_-$ are mostly composed of such complex features. The values obtained for the pairs (i) $(L_+, C_+)$, $(L_+, C_-)$ are 0.34, 0.06 respectively, and (ii) $(L_-, C_+)$, $(L_-, C_-)$ are 0.19, 0.18 respectively. This implies that while there is a prevalence of the consonants from $C_+$ in the languages of $L_+$, the consonants from $C_-$ are almost absent. However, there is an equal prevalence of the consonants from $C_+$ and $C_-$ in the languages of $L_-$. Therefore, it can be argued that the presence of the consonants from $C_-$ in a language can (phonologically) imply the presence of the consonants from $C_+$, but not vice versa. We do not find any such pattern for the fourth and the higher eigenvector components.

4.5 Control Experiment

As a control experiment we generated a set of random inventories and carried out experiments I and II on the adjacency matrix, $B_R$, of the random version of PhoNet. We construct these inventories as follows. Let the frequency of occurrence for each consonant $c$ in UPSID be denoted by $f_c$. Let there be 317 bins, each corresponding to a language in UPSID. $f_c$ bins are then chosen uniformly at random and the consonant $c$ is packed into these bins. Thus the consonant inventories of the 317 languages corresponding to the bins are generated. Note that this method of inventory construction leads to proportionate co-occurrence. Consequently, the first eigenvector components of $B_R$ are highly correlated to the occurrence frequency of the consonants. However, the plots of the second and the third eigenvector components versus the occurrence frequency of the consonants indicate absolutely no pattern, thereby resulting in a large number of decision rules and very high classification errors (up to 50%).

5 Discussion and Conclusion

Are there any linguistic inferences that can be drawn from the results obtained through the study of the spectral plot and the eigenvectors of PhoNet? In fact, one can correlate several phonological theories to the aforementioned observations, which have been construed by past researchers through very specific studies.

One of the most important problems in defining a feature-based classificatory system is to decide when a sound in one language is different from a similar sound in another language. According to Ladefoged (2005) "two sounds in different languages should be considered as distinct if we can point to a third language in which the same two sounds distinguish words". The dental versus alveolar distinction that we find to be highly instrumental in splitting the world's languages into two different groups (i.e., $L_+$ and $L_-$ obtained from the analysis of the second eigenvectors of $B$ and $B'$) also has a strong classificatory basis. It may well be the case that certain categories of sounds like the dental and the alveolar sibilants are not sufficiently distinct to constitute a reliable linguistic contrast (see (Ladefoged, 2005) for reference). Nevertheless, by allowing the possibility for the dental versus alveolar distinction, one does not increase the complexity or introduce any redundancy in the classificatory system. This is because such a distinction is prevalent in many other sounds, some of which are (a) nasals in Tamil (Shanmugam, 1972) and Malayalam (Shanmugam, 1972; Ladefoged and Maddieson, 1996), (b) laterals in Albanian (Ladefoged and Maddieson, 1996), and (c) stops in certain dialectal variations of Swahili (Hayward et al., 1989). Therefore, it is sensible to conclude that the two distinct groups $L_+$ and $L_-$ induced by our algorithm are true representatives of two important linguistic typologies.

The results obtained from the analysis of the third eigenvectors of $B$ and $B'$ indicate that implicational universals also play a crucial role in determining linguistic typologies. The two typologies that are predominant in this case consist of (a) languages using only those sounds that have simple features (e.g., plosives), and (b) languages using sounds with complex features (e.g., lateral, ejectives, and fricatives) that automatically imply the presence of the sounds having simple features. The distinction between the simple and complex phonological features is a very common hypothesis underlying the implicational hierarchy and the corresponding typological classification (Clements, 2008). In this context, Locke and Pearson (1992) remark that "Infants heavily favor stop consonants over fricatives, and there are languages that have stops and no fricatives but no languages that exemplify the reverse pattern. [Such] 'phonologically universal' patterns, which cut across languages and speakers are, in fact, the phonetic properties of Homo sapiens." (as quoted in (Vallee et al., 2002)).

Therefore, it turns out that the methodology presented here essentially facilitates the induction of linguistic typologies. Indeed, spectral analysis derives, in a unified way, the importance of these principles and at the same time quantifies their applicability in explaining the structural patterns observed across the inventories. In this context, there are at least two other novelties of this work. The first novelty is in the systematic study of the spectral plots (i.e., the distribution of the eigenvalues), which is in general rare for linguistic networks, although there have been quite a number of such studies in the domain of biological and social networks (Farkas et al., 2001; Gkantsidis et al., 2003; Banerjee and Jost, 2007). The second novelty is in the fact that there is not much work in the complex network literature that investigates the nature of the eigenvectors and their interactions to infer the organizing principles of the system represented through the network.

To summarize, spectral analysis of the complex network of speech sounds is able to provide a holistic as well as quantitative explanation of the organizing principles of the sound inventories. Although this natural mathematical technique has been heavily used in various other domains, we do not know of any work that uses spectral analysis for induction and understanding of linguistic typologies. This scheme for typology induction is not dependent on the specific data set used as long as it is representative of the real world. Thus, we believe that the scheme introduced here can be applied as a generic technique for typological classifications of phonological, syntactic and semantic networks; each of these is equally interesting from the perspective of understanding the structure and evolution of human language, and all are topics of future research.

Acknowledgement

We would like to thank Kalika Bali for her valuable inputs towards the linguistic analysis.

References

[Banerjee and Jost 2007] A. Banerjee and J. Jost. 2007. Spectral plots and the representation and interpretation of biological data. Theory in Biosciences, 126(1):15–21.

[Banerjee and Jost to appear] A. Banerjee and J. Jost. to appear. Graph spectra as a systematic tool in computational biology. Discrete Applied Mathematics.

[Belkin and Goldsmith 2002] M. Belkin and J. Goldsmith. 2002. Using eigenvectors of the bigram graph to infer morpheme identity. In Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning, pages 41–47. Association for Computational Linguistics.

[Boersma 1998] P. Boersma. 1998. Functional Phonology. The Hague: Holland Academic Graphics.

[Choudhury and Mukherjee to appear] M. Choudhury and A. Mukherjee. to appear. The structure and dynamics of linguistic networks. In N. Ganguly, A. Deutsch, and A. Mukherjee, editors, Dynamics on and of Complex Networks: Applications to Biology, Computer Science, Economics, and the Social Sciences. Birkhauser.

[Choudhury et al. 2006] M. Choudhury, A. Mukherjee, A. Basu, and N. Ganguly. 2006. Analysis and synthesis of the distribution of consonants over languages: A complex network approach. In COLING-ACL'06, pages 128–135.

[Chung 1994] F. R. K. Chung. 1994. Spectral Graph Theory. Number 2 in CBMS Regional Conference Series in Mathematics. American Mathematical Society.

[Clements 2008] G. N. Clements. 2008. The role of features in speech sound inventories. In E. Raimy and C. Cairns, editors, Contemporary Views on Architecture and Representations in Phonological Theory. Cambridge, MA: MIT Press.

[Farkas et al. 2001] E. J. Farkas, I. Derenyi, A.-L. Barabási, and T. Vicseck. 2001. Real-world graphs: Beyond the semi-circle law. Phy. Rev. E, 64:026704.

[Ferrer-i-Cancho 2005] R. Ferrer-i-Cancho. 2005. The structure of syntactic dependency networks: Insights from recent advances in network theory. In Levickij V. and Altmman G., editors, Problems of quantitative linguistics, pages 60–75.

[Gkantsidis et al. 2003] C. Gkantsidis, M. Mihail, and E. Zegura. 2003. Spectral analysis of internet topologies. In INFOCOM'03, pages 364–374.

[Hayward et al. 1989] K. M. Hayward, Y. A. Omar, and M. Goesche. 1989. Dental and alveolar stops in Kimvita Swahili: An electropalatographic study. African Languages and Cultures, 2(1):51–72.

[Kannan and Vempala 2008] R. Kannan and S. Vempala. 2008. Spectral Algorithms. Course Lecture Notes: http://www.cc.gatech.edu/~vempala/spectral/spectral.pdf.

[Ladefoged and Maddieson 1996] P. Ladefoged and I. Maddieson. 1996. Sounds of the Worlds Languages. Oxford: Blackwell.

[Ladefoged 2005] P. Ladefoged. 2005. Features and parameters for different purposes. In Working Papers in Phonetics, volume 104, pages 1–13. Dept. of Linguistics, UCLA.

[Lindblom and Maddieson 1988] B. Lindblom and I. Maddieson. 1988. Phonetic universals in consonant systems. In M. Hyman and C. N. Li, editors, Language, Speech, and Mind, pages 62–78.

[Locke and Pearson 1992] J. L. Locke and D. M. Pearson. 1992. Vocal learning and the emergence of phonological capacity. A neurobiological approach. In Phonological development. Models, Research, Implications, pages 91–129. York Press.

[Maddieson 1984] I. Maddieson. 1984. Patterns of Sounds. Cambridge University Press.

[Mukherjee et al. 2007] A. Mukherjee, M. Choudhury, A. Basu, and N. Ganguly. 2007. Modeling the co-occurrence principles of the consonant inventories: A complex network approach. Int. Jour. of Mod. Phys. C, 18(2):281–295.

[Mukherjee et al. 2008] A. Mukherjee, M. Choudhury, A. Basu, and N. Ganguly. 2008. Modeling the structure and dynamics of the consonant inventories: A complex network approach. In COLING-08, pages 601–608.

[Quinlan 1993] J. R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.

[Shanmugam 1972] S. V. Shanmugam. 1972. Dental and alveolar nasals in Dravidian. In Bulletin of the School of Oriental and African Studies, volume 35, pages 74–84. University of London.

[Sigman and Cecchi 2002] M. Sigman and G. A. Cecchi. 2002. Global organization of the wordnet lexicon. Proceedings of the National Academy of Science, 99(3):1742–1747.

[Trubetzkoy 1931] N. Trubetzkoy. 1931. Die phonologischen systeme. TCLP, 4:96–116.

[Trubetzkoy 1969] N. Trubetzkoy. 1969. Principles of Phonology. University of California Press, Berkeley.

[Vallee et al. 2002] N. Vallee, L. J. Boe, J. L. Schwartz, P. Badin, and C. Abry. 2002. The weight of phonetic substance in the structure of sound inventories. ZASPiL, 28:145–168.
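The construction of PhoNet from PlaNet (Sec. 3) and the first-eigenvector check (Secs. 4.1 and 4.2) can be sketched in a few lines. This is only an illustrative sketch, not the authors' Matlab script: the toy inventories are randomly generated rather than taken from UPSID, and the helper names (`phonet_from_planet`, `principal_eigenpair`, `pearson`) are hypothetical. It assumes a 0-1 matrix with consonants as rows and languages as columns, uses shifted power iteration in place of a full eigendecomposition, and relies on the identity that the squared Frobenius norm of a symmetric matrix equals the sum of its squared eigenvalues.

```python
import math
import random


def phonet_from_planet(A):
    """B = A A^T - D for a 0-1 matrix A (rows: consonants, columns: languages)."""
    n = len(A)
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # for 0-1 entries, subtracting D zeroes the diagonal of A A^T
                B[i][j] = sum(a * b for a, b in zip(A[i], A[j]))
    return B


def principal_eigenpair(B, iters=500):
    """Power iteration on B + s*I; returns (largest eigenvalue of B, unit eigenvector).

    Assumes B is symmetric and not identically zero.
    """
    n = len(B)
    s = max(sum(abs(x) for x in row) for row in B)  # shift >= spectral radius
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) + s * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, Bv)), v  # Rayleigh quotient, eigenvector


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy stand-in for PlaNet: 20 consonants, 60 languages (not UPSID data).
random.seed(0)
n_cons, n_lang = 20, 60
probs = [0.05 + 0.9 * c / (n_cons - 1) for c in range(n_cons)]
A = [[1 if random.random() < p else 0 for _ in range(n_lang)] for p in probs]

B = phonet_from_planet(A)                    # PhoNet
B_prime = phonet_from_planet(list(zip(*A)))  # LangGraph: same construction on A^T

freq = [sum(row) for row in A]               # occurrence frequency of each consonant
lam, v = principal_eigenpair(B)
frobenius_sq = sum(x * x for row in B for x in row)

print("share of squared Frobenius norm in top eigenvalue:", lam ** 2 / frobenius_sq)
print("correlation of first eigenvector with frequency:", pearson(v, freq))
```

Because each toy consonant is placed in languages independently of the others, the data resembles the random inventories of the control experiment (Sec. 4.5), so a high correlation here says nothing about real inventories; the sketch only shows how the two quantities reported in Sec. 4 could be computed.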

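The claim in Sec. 4.2 that the principal eigenvector of a "proportionate co-occurrence" matrix is the frequency vector itself can be checked with a short worked derivation. The sketch below assumes $C > 0$ and $f \neq 0$; the final ratio is an illustrative way of gauging the size of the zero-diagonal perturbation and is not taken from the paper.

```latex
% Proportionate co-occurrence: X_{ij} = C f_i f_j, i.e. X = C f f^\top.
\begin{align*}
  X f &= C\, f\, (f^\top f) = \bigl(C \lVert f \rVert^2\bigr)\, f
      && \text{so } f \text{ is an eigenvector with } \lambda = C \lVert f \rVert^2 , \\
  X y &= C\, f\, (f^\top y) = 0 \quad \text{whenever } f^\top y = 0
      && \text{so the remaining } n-1 \text{ eigenvalues are } 0 , \\
  \lVert X \rVert_F^2 &= C^2 \sum_{i,j} f_i^2 f_j^2 = C^2 \lVert f \rVert^4 = \lambda^2
      && \text{so the top eigenvalue carries the whole spectrum.} \\[4pt]
  \frac{\lVert \operatorname{diag}(C f_i^2) \rVert_F}{\lVert X \rVert_F}
      &= \frac{\sqrt{\sum_i f_i^4}}{\sum_i f_i^2} \le 1
      && \text{relative size of the zero-diagonal perturbation.}
\end{align*}
```

The last ratio is small when no single frequency dominates the others, which is consistent with the paper's observation that zeroing the diagonal leaves the correlation between the principal eigenvector and the frequencies largely intact.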