Evolving networks and the development of neural systems

Samuel Johnson, J. Marro and Joaquín J. Torres

Departamento de Electromagnetismo y Física de la Materia and Instituto Carlos I de Física Teórica y Computacional, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain.

E-mail: [email protected], [email protected] and [email protected]

Abstract. It is now generally assumed that the heterogeneity of most networks in nature probably arises via preferential attachment of some sort. However, the origin of various other topological features, such as degree-degree correlations and related characteristics, is often not clear, and is attributed to specific functional requirements. We show how it is possible to analyse a very general scenario in which nodes gain or lose edges according to any (e.g., nonlinear) functions of local and/or global degree information. Applying our method to two rather different examples of brain development – synaptic pruning in humans and the neural network of the worm C. elegans – we find that simple biologically motivated assumptions lead to very good agreement with experimental data. In particular, many nontrivial topological features of the worm's brain arise naturally at a critical point.

PACS numbers: 64.60.aq, 89.75.Fb, 87.85.dm, 05.40.-a

Submitted to: J. Stat. Mech. (accepted, 2010)

1. Introduction

The conceptual simplicity of a network – a set of nodes, some pairs of which are connected by edges – often suffices to capture the essence of cooperation in complex systems. Ever since Barabási and Albert presented their evolving network model [1], in which linear preferential attachment leads asymptotically to a scale-free degree distribution (the degree, k, of a node being its number of neighbouring nodes), there have been many variations or refinements of the original scenario [2, 3, 4, 5, 6, 7] (for a review, see [8]). In [9], we showed how topological phase transitions and scale-free solutions could emerge in the case of nonlinear rewiring in fixed-size networks. Here we extend our scope to more general and realistic situations, considering the evolution of networks while making only minimal assumptions about the attachment/detachment rules. In fact, all we assume is that these probabilities factorize into two parts: a local term, which depends on node degree, and a global term, which is a function of the mean degree of the network.

Our motivation can be found in the mechanisms behind many real-world networks, but we focus, for the sake of illustration, on the development of biological neural networks, where nodes represent neurons and edges play the part of synaptic interactions [10, 11, 12]. Experimental neuroscience has shown that enhanced electric activity induces synaptic growth and dendritic arborization [13, 14, 15, 16]. Since the activity of a neuron depends on the net current received from its neighbours, which is higher the more neighbours it has, we can take node degree as a proxy for this activity – accounting for the local term alluded to above. On the other hand, synaptic growth and death also depend on the concentrations of various molecules, which can diffuse through large areas of tissue and therefore cannot in general be considered local.

A feature of brain development in many animals is synaptic pruning – the large reduction in synaptic density undergone throughout infancy. Chechik et al.
[17, 18] have shown that, via the elimination of less needed synapses, this can reduce the energy consumed by the brain (which in a human at rest can account for a quarter of total energy use) while maintaining near-optimal memory performance. On this basis, we will take the mean degree of the network – or mean synaptic density – to reflect total energy consumption, hence the global terms in our attachment/detachment rules.

An alternative approach would be to consider some kind of model neurons explicitly and couple the probabilities of synaptic growth and death to neuronal dynamic variables, such as local and global fields. In a Hopfield network, for example, the expected value of the field (total incoming current) at node i is proportional to i's degree [19], the total current (energy consumption) in the network therefore being proportional to the mean degree; qualitatively, these observations are likely to hold also in more realistic situations [20], although the relations need not be linear. Co-evolving networks of this sort are currently attracting attention, with dynamics such as the Prisoner's Dilemma [21], the Voter Model [22] or Random Walkers [23]. Although we consider this line of work particularly interesting, for generality and analytical tractability we opt here to use only topological information for the attachment/detachment rules; our results can nonetheless be applied to any situation in which the dynamical states of the elements at the nodes can be functionally related to degrees‡.

Following a brief general analysis, we show how appropriate choices of functions induce the system to evolve towards heterogeneous (sometimes scale-free) networks while undergoing synaptic pruning in quantitative agreement with experiments. At the same time, degree-degree correlations emerge naturally, thus making the resulting networks disassortative – as tends to be the case for most biological networks – and leading to realistic small-world parameters.

2. Basic considerations

Consider a simple undirected network with N nodes defined by the adjacency matrix $\hat{a}$, the element $\hat{a}_{ij}$ representing the existence or otherwise of an edge between nodes i and j. Each node can be characterized by its degree, $k_i = \sum_j \hat{a}_{ij}$. Initially, the degrees follow some distribution $p(k, t=0)$ with mean $\kappa(t)$. We wish to study the evolution of networks in which nodes can gain or lose edges according to stochastic rules which only take into account local and global information on degrees. So as to implement this in the most general way, we will assume that at every time step each node i has a probability of gaining a new edge, $P_i^{gain}$, to a random node, and a probability of losing a randomly chosen edge, $P_i^{lose}$. We assume these factorize as $P_i^{gain} = u(\kappa)\pi(k_i)$ and $P_i^{lose} = d(\kappa)\sigma(k_i)$, where u, d, π and σ can be arbitrary functions, but impose nothing else other than normalization.

For each edge that is withdrawn from the network, two nodes decrease in degree: i, chosen according to $\sigma(k_i)$, and j, a random neighbour of i's; so there is an added effective probability of loss $k_j/(\kappa N)$. Similarly, for each edge placed in the network, not only does l, chosen according to $\pi(k_l)$, increase its degree; a random node m will also gain, with the consequent effective probability $N^{-1}$ (though see §). Let us introduce the notation $\tilde\pi(k) \equiv \pi(k) + N^{-1}$ and $\tilde\sigma(k) \equiv \sigma(k) + k/(\kappa N)$. Network evolution can now be seen as a one-step process [24] with transition rates $u(\kappa)\tilde\pi(k)$ and $d(\kappa)\tilde\sigma(k)$.
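For readers who wish to experiment, the following is a minimal Monte Carlo sketch of one sweep of this attachment/detachment process. It is not the authors' code: the function names are ours, and we have made the illustrative choices π(k) ∼ k^α, σ(k) ∼ k, with the global terms u and d anticipating equation (6) below.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_step(A, n=10, k_max=20.0, alpha=1.0):
    """One sweep of the attachment/detachment process (a sketch).
    Gains are drawn with local weight pi(k) ~ k^alpha, losses with
    sigma(k) ~ k (i.e., random edge deletion); the expected numbers of
    additions/deletions per sweep follow the global terms of eq. (6).
    Assumes A is a symmetric 0/1 matrix with at least a few edges."""
    N = A.shape[0]
    k = A.sum(axis=1).astype(float)
    kappa = k.mean()
    n_gain = rng.poisson(max(0.0, n * (1.0 - kappa / k_max)))
    n_lose = rng.poisson(max(0.0, n * kappa / k_max))
    for _ in range(n_gain):
        i = rng.choice(N, p=k**alpha / (k**alpha).sum())  # local term pi(k_i)
        j = rng.integers(N)                               # random partner
        if i != j:
            A[i, j] = A[j, i] = 1
    for _ in range(n_lose):
        i = rng.choice(N, p=k / k.sum())                  # local term sigma(k_i)
        neigh = np.flatnonzero(A[i])
        if neigh.size:
            j = rng.choice(neigh)                         # random neighbour of i
            A[i, j] = A[j, i] = 0
    return A
```

Degrees are recomputed once per sweep rather than per event; for the sparse, slowly evolving networks considered here this should not matter qualitatively.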
The expected value of the increment in a given p(k,t) at each time step (which we equate with a temporal derivative) defines a master equation for the degree distribution [9]:

$$\frac{dp(k,t)}{dt} = u(\kappa)\tilde\pi(k-1)\,p(k-1,t) + d(\kappa)\tilde\sigma(k+1)\,p(k+1,t) - \left[u(\kappa)\tilde\pi(k) + d(\kappa)\tilde\sigma(k)\right]p(k,t). \qquad (1)$$

Assuming now that p(k,t) evolves towards a stationary distribution, $p_{st}(k)$, then this must necessarily satisfy detailed balance, since it is a one-step process [24]; i.e., the flux of probability from k to k+1 must equal the flux from k+1 to k, for all k [25]. This condition (sufficient for (1) to be zero) can be written as

$$\frac{\partial p_{st}(k)}{\partial k} = \left[\frac{u(\kappa_{st})\,\tilde\pi(k)}{d(\kappa_{st})\,\tilde\sigma(k+1)} - 1\right] p_{st}(k), \qquad (2)$$

where we have approximated the difference by a partial derivative, and $\kappa_{st} \equiv \sum_k k\, p_{st}(k)$. Setting π and σ so as to be normalized to one (i.e., $\sum_k p(k)\pi(k) = \sum_k p(k)\sigma(k) = 1$, ∀t), which is equivalent to saying that at each time step exactly u(κ) nodes are chosen to gain edges and d(κ) to lose them, then in the stationary state we will have $u(\kappa_{st}) = d(\kappa_{st})$, since the total number of edges will be conserved.

From (2) we can see that $p_{st}(k)$ will have an extremum at some value $k_e$ if it satisfies $\tilde\pi(k_e) = \tilde\sigma(k_e + 1)$. $k_e$ will be a maximum (minimum) if the numerator in (2) is smaller (greater) than the denominator for $k > k_e$, and vice versa for $k < k_e$. Assuming, for example, that there is one and only one such $k_e$, then a maximum implies a relatively homogeneous distribution, while a minimum means $p_{st}(k)$ will be split in two, and therefore highly heterogeneous. More intuitively, if for nodes with large enough k there is a higher probability of gaining edges than of losing them, the degrees of these nodes will grow indefinitely, leading to heterogeneity. If, on the other hand, highly connected nodes always lose more edges than they gain, we will obtain quite homogeneous networks. From this reasoning we can see that a particularly interesting case (which turns out to be critical) is that in which π(k) and σ(k) are such that

$$\tilde\pi(k) = \tilde\sigma(k) \equiv v(k), \quad \forall k. \qquad (3)$$

According to (2), condition (3) means that for large k, $\partial p_{st}(k)/\partial k \to 0$, and $p_{st}(k)$ flattens out – as, for example, a power law does.

The standard Fokker-Planck approximation for the one-step process defined by (1) is [24]:

$$\frac{\partial p(k,t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial k^2}\left\{\left[d(\kappa)\tilde\sigma(k) + u(\kappa)\tilde\pi(k)\right]p(k,t)\right\} + \frac{\partial}{\partial k}\left\{\left[d(\kappa)\tilde\sigma(k) - u(\kappa)\tilde\pi(k)\right]p(k,t)\right\}. \qquad (4)$$

For transition rates which meet condition (3), (4) can be written as:

$$\frac{\partial p(k,t)}{\partial t} = \frac{1}{2}\left[d(\kappa) + u(\kappa)\right]\frac{\partial^2}{\partial k^2}\left[v(k)\,p(k,t)\right] + \left[d(\kappa) - u(\kappa)\right]\frac{\partial}{\partial k}\left[v(k)\,p(k,t)\right]. \qquad (5)$$

Ignoring boundary conditions, the stationary solution must satisfy, on the one hand, $v(k)\,p_{st}(k) = Ak + B$, so that the diffusion is stationary, and, on the other, $u(\kappa_{st}) = d(\kappa_{st})$, to cancel out the drift. For this situation to be reachable from any initial condition, u(κ) and d(κ) must be monotonic functions, decreasing and increasing respectively.

‡ For instance, the stationary distribution of walkers used for edge dynamics in [23] is actually obtained purely from topological information, although it can only be written in terms of local degrees for undirected networks.
§ We are ignoring the small corrections that arise because j ≠ i and l ≠ m, which in any case would disappear if self-connections were allowed.
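The master equation (1) is straightforward to integrate numerically. Below is a sketch (our own, not from the paper) using forward Euler on a truncated degree axis, again with the illustrative choices π(k) ∼ k^α, σ(k) ∼ k and the global terms of equation (6); probability leaking past the truncation cutoff K is simply discarded.

```python
import numpy as np

def integrate_master_eq(p0, n=10, N=1000, k_max_mean=20.0, alpha=1.0,
                        steps=100_000, dt=1.0):
    """Forward-Euler integration of eq. (1); p0[k] = p(k, t=0), k = 0..K.
    Assumes p0 sums to 1 and has nonzero mean degree."""
    p = p0.copy()
    k = np.arange(p.size, dtype=float)
    for _ in range(steps):
        kappa = (k * p).sum()
        u = (n / N) * (1.0 - kappa / k_max_mean)   # global terms, eq. (6)
        d = (n / N) * (kappa / k_max_mean)
        pi = k**alpha / (p * k**alpha).sum()       # sum_k p(k) pi(k) = 1
        sig = k / (p * k).sum()                    # sum_k p(k) sigma(k) = 1
        pi_t = pi + 1.0 / N                        # effective rate pi~(k)
        sig_t = sig + k / (kappa * N)              # effective rate sigma~(k)
        gain, loss = u * pi_t * p, d * sig_t * p
        dp = -(gain + loss)
        dp[1:] += gain[:-1]                        # inflow from k-1
        dp[:-1] += loss[1:]                        # inflow from k+1
        p += dt * dp
    return p
```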
3. Synaptic pruning

As a simple example, we will first consider global probabilities which have the linear forms

$$u[\kappa(t)] = \frac{n}{N}\left(1 - \frac{\kappa(t)}{\kappa_{max}}\right) \quad \text{and} \quad d[\kappa(t)] = \frac{n}{N}\,\frac{\kappa(t)}{\kappa_{max}}, \qquad (6)$$

where n is the expected value of the number of additions and deletions of edges per time step, and $\kappa_{max}$ is the maximum value the mean degree can have. This choice describes a situation in which the higher the density of synapses, the less likely new ones are to sprout and the more likely existing ones are to atrophy – a situation that might arise, for instance, in the presence of a finite quantity of nutrients. Again taking into account that π and σ are normalized to one, summing over $P_i^{gain} - P_i^{lose}$ we find that the increment in κ(t) is $d\kappa(t)/dt = 2\{u[\kappa(t)] - d[\kappa(t)]\} = 2(n/N)[1 - 2\kappa(t)/\kappa_{max}]$ (independently of the local probabilities). Therefore, the mean degree will increase or decrease exponentially with time, from κ(0) towards $\frac{1}{2}\kappa_{max}$. Assuming that the initial condition is, say, $\kappa(0) = \kappa_{max}$, and expressing the solution in terms of the mean synaptic density – i.e., $\rho(t) \equiv \kappa(t)N/(2V)$, with V the total volume considered – we have

$$\rho(t) = \rho_f\left(1 + e^{-t/\tau_p}\right), \qquad (7)$$

where we have defined $\rho_f \equiv \rho(t \to \infty)$ and the time constant for pruning is $\tau_p = \rho_f N/n$. This equation was fitted in figure 1 to experimental data on layers 1 and 2 of the human auditory cortex∥, obtained during autopsies by Huttenlocher and Dabholkar [26].

It seems reasonable to assume that the initial overgrowth of synapses is due to the transient existence of some kind of growth factors. If we account for these by including a nonlinear, time-dependent term $g(t) \equiv a\exp(-t/\tau_g)$ in the probability of growth, i.e., $u[\kappa(t),t] = (n/N)[1 - \kappa(t)/\kappa_{max} + g(t)]$, leaving d[κ(t)] as before, we find that ρ(t) becomes

$$\rho(t) = \rho_f\left[1 + e^{-t/\tau_p} - \left(1 + e^{-t_0/\tau_p}\right)e^{-(t-t_0)/\tau_g}\right], \qquad (8)$$

where $t_0$ is the time at which synapses begin to form ($t = 0$ corresponds to the moment of conception) and $\tau_g$ is the time constant related to growth. The inset in figure 1 shows the best fit to the auditory cortex data. Since the boundary conditions $\rho_f$ and (for (8)) $t_0$ are simply taken as the value of the last data point and the time of the first one, respectively, the time constants $\tau_p$ and $\tau_g$ are the only parameters needed for the fit.

∥ Data points for three particular days (smaller symbols) are omitted from the fit, since we believe these must be from subjects with inherently lower synaptic density.
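A fit of this kind is easy to reproduce; the sketch below uses scipy's curve_fit with ρ_f and t_0 pinned to the last and first data points, as described above, so that only τ_p and τ_g are fitted. The data arrays here are placeholders of our own invention, not the densities of [26], which should be substituted in.

```python
import numpy as np
from scipy.optimize import curve_fit

def rho_model(t, tau_p, tau_g, rho_f, t0):
    """Eq. (8): exponential pruning plus a transient growth-factor term."""
    return rho_f * (1 + np.exp(-t / tau_p)
                    - (1 + np.exp(-t0 / tau_p)) * np.exp(-(t - t0) / tau_g))

# Placeholder data (conceptual age in days, synapses per cubic micron);
# replace with the layer 1/2 measurements of [26].
t_data = np.array([200., 400., 1000., 3000., 8000.])
rho_data = np.array([20., 45., 70., 55., 31.])

rho_f = rho_data[-1]                 # boundary condition: last data point
t0 = t_data[0]                       # boundary condition: first data point
fit = lambda t, tau_p, tau_g: rho_model(t, tau_p, tau_g, rho_f, t0)
(tau_p, tau_g), _ = curve_fit(fit, t_data, rho_data, p0=(5000., 200.))
print(f"tau_p = {tau_p:.0f} days, tau_g = {tau_g:.0f} days")
```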
Figure 1. Synaptic densities in layers 1 (red squares) and 2 (black circles) of the human auditory cortex against time from conception. Data from [26], obtained by directly counting synapses in tissue from autopsies. Lines follow best fits to (7), where the parameters were: for layer 1, τp = 5041 days; and for layer 2, τp = 3898 days (for ρf we have used the last data points: 30.7 and 40.8 synapses/µm³ for layers 1 and 2, respectively). Data pertaining to the first year and to days 4700, 5000 and 7300, shown with smaller symbols, were omitted from the fit. Assuming the existence of transient growth factors, we can include the data points for the first year in the fit by using (8). This is done in the inset (where time is displayed logarithmically). The best fits were: for layer 1, τg = 151.0 and τp = 5221; and for layer 2, τg = 191.1 and τp = 4184, all in days (we have approximated t0 by the time of the first data points, 192 days).

4. Phase transitions

The drift-like evolution of the mean degree we have just illustrated with the example of synaptic pruning is independent of the local probabilities π(k) and σ(k). The effect of these is rather on the diffusive behaviour, which can lead, as mentioned, either to homogeneous or to heterogeneous final states. A useful bounded order parameter to characterize these phases is therefore $m \equiv \exp(-\sigma^2/\kappa^2)$, where $\sigma^2 = \langle k^2\rangle - \kappa^2$ is the variance of the degree distribution ($\langle\cdot\rangle \equiv N^{-1}\sum_i(\cdot)$ represents an average over nodes). We will use $m_{st} \equiv \lim_{t\to\infty} m(t)$ to distinguish between the different phases, since $m_{st} = 1$ for a regular network and $m_{st} \to 0$ for one following a highly heterogeneous distribution. Although there are particular choices of probabilities which lead to (5), these are not the only critical cases, since the transition from homogeneous to heterogeneous stationary states can also come about with functions which never meet condition (3). Rather, this is a classic topological phase transition, the nature of which depends on the choice of functions [27, 28, 29]. The evolution of the degree distribution is shown in figure 2 for critical and supercritical choices of the probabilities, as given by MC simulations (starting from regular random graphs) and contrasted with theory. (The subcritical regime is not shown, since its stationary state has a distribution similar to the ones at t = 10³ MCS in the other regimes.)
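Measuring this order parameter from a simulated network is a one-liner; for concreteness, here is the small helper (our own) we would use alongside the Monte Carlo sketch of section 2.

```python
import numpy as np

def order_parameter(A):
    """m = exp(-sigma^2 / kappa^2) for the degree sequence of adjacency
    matrix A: m = 1 for a regular graph, m -> 0 for highly heterogeneous
    (e.g. scale-free) degree distributions."""
    k = A.sum(axis=1).astype(float)
    kappa = k.mean()
    var = (k**2).mean() - kappa**2        # sigma^2 = <k^2> - kappa^2
    return np.exp(-var / kappa**2)
```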
The disparity between theory and simulations for the final distributions is due to the build-up of certain correlations not taken into account in our analysis.

Figure 2. Evolution of the degree distributions of networks beginning as regular random graphs with κ(0) = 20, in the critical (top) and supercritical (bottom) regimes. Local probabilities are σ(k) = k/(⟨k⟩N) in both cases, and π(k) = 2σ(k) − N⁻¹ and π(k) = k^{3/2}/(⟨k^{3/2}⟩N) for the critical and supercritical cases, respectively. Global probabilities as in (6), with n = 10 and κmax = 20. Symbols in the main panels correspond to p(k,t) at different times as obtained from MC simulations. Lines result from numerical integration of (1). Insets show typical time series of κ and m. Light blue lines are from MC simulations and red lines are theoretical, given by (7) and (1), respectively. N = 1000.

This is because the existence of some very highly connected nodes reduces the probability of there being very low degree nodes. In particular, if there are, say, x nodes connected to the rest of the network, then a natural cutoff, $k_{min} = x$, emerges. Note that this occurs only when we restrict ourselves to simple networks, i.e., with only one edge allowed for each pair of nodes. This topological phase transition is shown in figure 3, where $m_{st}$ is plotted against the parameter α for global probabilities as in (6) and local ones $\pi(k) \sim k^\alpha$ and $\sigma(k) \sim k$. This situation corresponds to one in which edges are eliminated randomly while nodes have a power-law probability of sprouting new ones (note that power laws are good descriptions of a variety of monotonic response functions, yet require only one parameter). Although, to our knowledge, there are not yet enough empirical data to ascertain which degree distribution the structural topology of the human brain follows, it is worth noting that its functional topology, at the level of brain areas, has been found to be scale-free with an exponent very close to 2 [30].

In general, most other measures can be expected to undergo a transition along with the variance of the degree distribution. For instance, highly heterogeneous networks (such as scale-free ones) exhibit the small-world property, characterized by a high clustering coefficient, $C \gg \langle k\rangle/N$, and a low mean minimum path, $l \sim \ln(N)$ [31].
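These two small-world measures are readily checked numerically; a quick sketch using networkx, with a Barabási-Albert graph as a stand-in heterogeneous network (any connected network produced by the dynamics above could be substituted):

```python
import networkx as nx

G = nx.barabasi_albert_graph(1000, 5)          # stand-in heterogeneous network
C = nx.average_clustering(G)                   # clustering coefficient C
l = nx.average_shortest_path_length(G)         # mean minimum path l
k_mean = 2 * G.number_of_edges() / G.number_of_nodes()
print(f"C = {C:.3f} (vs <k>/N = {k_mean/len(G):.4f}), l = {l:.2f}")
```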
A particularly interesting topological feature of a network is its synchronizability – i.e., given a set of oscillators placed at the nodes and coupled via the edges, how wide a range of coupling strengths will result in them all becoming synchronized. Barahona and Pecora showed analytically that, for linear oscillators, a network is more synchronizable the lower the ratio $Q = \lambda_N/\lambda_2$ – where $\lambda_N$ and $\lambda_2$ are the highest and lowest nonzero eigenvalues of the Laplacian matrix ($\hat\Lambda_{ij} \equiv \delta_{ij}k_i - \hat{a}_{ij}$), respectively [32].

Figure 3. Phase transitions in $m_{st}$ for π(k) ∼ k^α and σ(k) ∼ k, and u(κ) and d(κ) as in (6). N = 1000 (blue squares), 1500 (red triangles) and 2000 (black circles); κ(0) = κmax = 2n = N/50. Corresponding lines are from numerical integration of (1). The bottom left inset shows values of the highest eigenvalue of the Laplacian matrix, λN (red squares), and of Q = λN/λ2 (black circles), a measure of unsynchronizability; N = 1000. The top right inset shows transitions, for the same parameters, in the final values of Pearson's correlation coefficient r (see section 5), both for only one edge allowed per pair of nodes (red squares) and without this restriction (black circles).

The bottom left inset in figure 3 displays the values of Q and λN obtained for the different stationary states. There is a peak in Q at the critical point. It has been argued that this tendency of heterogeneous topologies to be particularly unsynchronizable poses a paradox, given the wide prevalence of scale-free networks in nature – a problem that has been deftly got around by considering appropriate weighting schemes for the edges [33, 34] (see also ¶, and [36] for a review). However, there is no generic reason why high synchronizability should always be desirable. In fact, it has recently been shown that heterogeneity can improve the dynamical performance of model neural networks precisely because the fixed points are easily destabilised [37] (as well as conferring robustness to thermal fluctuations and improving storage capacity [19]). This makes intuitive sense, since, presumably, one would not usually want all the neurons in one's brain to be doing exactly the same thing. Therefore, this point of maximum unsynchronizability at the phase transition may be a particularly advantageous one.

On the whole, we find that three classes of network – homogeneous, scale-free (at the critical point) and ones composed of starlike structures, with a great many small-degree nodes connected to a few hubs – can emerge for any kind of attachment/detachment rules. It follows that a network subject to some sort of optimising mechanism, such as Natural Selection in the case of living systems, could thus evolve towards whichever topology best suits its requirements by tuning these microscopic actions.

¶ Using pacemaker nodes, scale-free networks have also been shown to emerge via rules which maximize synchrony [35].
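The eigenratio Q is simple to compute for any of the stationary networks; a sketch (ours, using dense eigensolvers, so suitable only for modest N):

```python
import numpy as np
import networkx as nx

def eigenratio(G):
    """Q = lambda_N / lambda_2 of the graph Laplacian; large Q means the
    network is hard to synchronize (Barahona and Pecora's criterion)."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    eig = np.sort(np.linalg.eigvalsh(L))   # eigenvalues in ascending order
    return eig[-1] / eig[1]                # highest over lowest nonzero

G = nx.erdos_renyi_graph(500, 0.02)
if nx.is_connected(G):                     # lambda_2 > 0 only if connected
    print(f"Q = {eigenratio(G):.1f}")
```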
5. Correlations

Most real networks have been found to exhibit degree-degree correlations, also known as mixing by degree [38, 39]. Networks can thus be classified as assortative, when the degree of a typical node is positively correlated with that of its neighbours, or disassortative, when the correlation is negative. This property has important implications for network characteristics such as connectedness and robustness [40, 41].

A useful measure of this phenomenon is Pearson's correlation coefficient applied to the edges [39, 41, 8]: $r = ([k_l k_l'] - [k_l]^2)/([k_l^2] - [k_l]^2)$, where $k_l$ and $k_l'$ are the degrees of each of the two nodes pertaining to edge l, and $[\cdot] \equiv (\langle k\rangle N)^{-1}\sum_l(\cdot)$ represents an average over edges; $|r| \le 1$. Writing $\sum_l(\cdot) = \sum_{ij}\hat{a}_{ij}(\cdot)$, r can be expressed in terms of averages over nodes:

$$r = \frac{\langle k\rangle\langle k^2 k_{nn}(k)\rangle - \langle k^2\rangle^2}{\langle k\rangle\langle k^3\rangle - \langle k^2\rangle^2}, \qquad (9)$$

where $k_{nn}(k)$ is the mean nearest-neighbour-degree function; i.e., if $k_{nn,i} \equiv k_i^{-1}\sum_j \hat{a}_{ij}k_j$ is the mean degree of the neighbours of node i, then $k_{nn}(k)$ is its average over all nodes such that $k_i = k$. Whereas most social networks are assortative (r > 0) – due, probably, to mechanisms such as homophily [39] – almost all other networks, whether biological, technological or information-related, seem to be generically disassortative.

The top right inset in figure 3 displays the stationary value of r obtained in the same networks as in the main panel and lower inset. It turns out that the heterogeneous regime is disassortative, the more so the larger α. (Note that a completely homogeneous network cannot have degree-degree correlations, since all degrees are the same.) It is known that the restriction of having at most one edge per pair of nodes induces disassortativity [42, 43]. However, in our case this is not the sole origin of the correlations, as can also be seen in the same inset of figure 3, where we have plotted r for networks in which we have lifted this restriction and allowed any number of edges per pair of nodes. In fact, when multiple edges are allowed, the correlations are slightly stronger.

To understand how these correlations come about, consider a pair of nodes (i,j) which, at a given moment, can be either occupied by an edge or unoccupied. We will call the expected times of permanence in the occupied and unoccupied states $\tau_{ij}^o$ and $\tau_{ij}^u$, respectively. After sufficient evolution time (so that occupancy becomes independent of the initial state⁺), the expected value of the corresponding element of the adjacency matrix, $E(\hat{a}_{ij}) \equiv \hat\epsilon_{ij}$, will be

$$\hat\epsilon_{ij} = \frac{\tau_{ij}^o}{\tau_{ij}^o + \tau_{ij}^u}.$$

If $p_{ij}^+$ ($p_{ij}^-$) is the probability that (i,j) will become occupied (unoccupied) given that it is unoccupied (occupied), then $\tau_{ij}^o \sim 1/p_{ij}^-$ and $\tau_{ij}^u \sim 1/p_{ij}^+$, yielding

$$\hat\epsilon_{ij} = \left(1 + \frac{p_{ij}^-}{p_{ij}^+}\right)^{-1}.$$

Taking into account the probability that each node has of gaining or losing an edge, we obtain∗: $p_{ij}^+ = u(\langle k\rangle)N^{-1}[\pi(k_i) + \pi(k_j)]$ and $p_{ij}^- = d(\langle k\rangle)[\sigma(k_i)/k_i + \sigma(k_j)/k_j]$. Then, assuming that the network is sparse enough that $p_{ij}^- \gg p_{ij}^+$ (since the number of edges is much smaller than the number of pairs), and particularising to power-law local probabilities $\pi(k) \sim k^\alpha$ and $\sigma(k) \sim k^\beta$, the expected occupancy of the pair is

$$\hat\epsilon_{ij} \simeq \frac{p_{ij}^+}{p_{ij}^-} = \frac{u(\langle k\rangle)}{d(\langle k\rangle)}\,\frac{\langle k^\beta\rangle}{\langle k^\alpha\rangle N}\left(\frac{k_i^\alpha + k_j^\alpha}{k_i^{\beta-1} + k_j^{\beta-1}}\right).$$

Considering the stationary state, when $u(\langle k\rangle) = d(\langle k\rangle)$, and the case of random deletion of edges, β = 1 (so that the only nonlinearity is due to α), the previous expression reduces to

$$\hat\epsilon_{ij} \simeq \frac{\langle k\rangle}{2\langle k^\alpha\rangle N}\left(k_i^\alpha + k_j^\alpha\right). \qquad (10)$$

(Note that this matrix is not consistent term by term, since $\sum_j \hat\epsilon_{ij} \neq k_i$, although it is globally consistent: $\sum_{ij}\hat\epsilon_{ij} = \langle k\rangle N$.)

⁺ Note that this will always happen eventually, since the process is ergodic.
∗ Again, we are ignoring corrections due to the fact that i is necessarily different from j.
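As a numerical aside, the expected-occupancy matrix of equation (10) is easily built from a degree sequence; the helper below is our own sketch, and its global sum can be checked against the ⟨k⟩N consistency property just noted.

```python
import numpy as np

def occupancy_matrix(k, alpha):
    """Eq. (10): eps[i,j] = <k> (k_i^a + k_j^a) / (2 <k^a> N), the expected
    stationary occupancy of pair (i,j) for pi(k) ~ k^alpha, beta = 1."""
    N = k.size
    ka = k**alpha
    eps = k.mean() * (ka[:, None] + ka[None, :]) / (2 * ka.mean() * N)
    np.fill_diagonal(eps, 0.0)             # no self-connections
    return eps

k = np.random.default_rng(2).poisson(10, 1000).astype(float)
eps = occupancy_matrix(k, alpha=1.5)
print(eps.sum(), k.mean() * k.size)        # global consistency: ~ <k> N
```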
The nearest-neighbour-degree function is now

$$k_{nn}(k_i) = \frac{1}{k_i}\sum_j \hat\epsilon_{ij} k_j = \frac{\langle k\rangle}{2\langle k^\alpha\rangle}\left(\langle k\rangle k_i^{\alpha-1} + \langle k^{\alpha+1}\rangle k_i^{-1}\right)$$

(a decreasing function for any α), with the result that Pearson's coefficient becomes

$$r = \frac{1}{\langle k^\alpha\rangle}\left(\frac{\langle k\rangle^3\langle k^{\alpha+1}\rangle - \langle k^2\rangle^2\langle k^\alpha\rangle}{\langle k\rangle\langle k^3\rangle - \langle k^2\rangle^2}\right). \qquad (11)$$

More generally, one can understand the emergence of these correlations in the following way. For the network to become heterogeneous, we must have $\pi(k) + N^{-1} \ge \sigma(k) + k/(\langle k\rangle N)$ for large enough k, so that highly connected nodes do not lose more edges than they can acquire (see section 2). This implies that π(k) must be increasing and approximately linear or superlinear. The expected value of the degree of a node i, chosen according to $\pi(k_i)$, is then $E(k_i) = N^{-1}\sum_k \pi(k)k \gtrsim \langle k^2\rangle/\langle k\rangle$, while that of its new, randomly chosen neighbour, j, is only $E(k_j) = \langle k\rangle$. This induces disassortative correlations, which can never be compensated by the breaking of edges between nodes whose expected degree values are $N^{-1}\sum_k \sigma(k)k$ and $\langle k^2\rangle/\langle k\rangle$, if σ(k) is an increasing function. It thus follows that a scenario such as the one analysed in this paper will never lead to assortative networks, except for some cases in which σ(k) is a decreasing function – meaning that less connected nodes would have to be more likely to lose edges.
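Equation (11) is just a ratio of moments of the degree distribution, so the analytical prediction for r can be evaluated directly from any degree sequence; a short sketch of our own:

```python
import numpy as np

def r_predicted(k, alpha):
    """Eq. (11): predicted Pearson coefficient from the moments <k^q>
    of degree sequence k, for pi(k) ~ k^alpha and random edge deletion."""
    m = lambda q: np.mean(k**q)
    num = m(1)**3 * m(alpha + 1) - m(2)**2 * m(alpha)
    den = m(1) * m(3) - m(2)**2
    return num / (m(alpha) * den)

k = np.random.default_rng(1).poisson(10, size=1000).astype(float)
print(r_predicted(k, alpha=1.5))           # r < 0 indicates disassortativity
```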