Working Paper Series
Department of Economics
University of Verona

Measuring the interaction dimension of segregation: the Gini-Exposure index∗

Francesco Andreoli, Claudio Zoli†
Luxembourg Institute of Socio-Economic Research and University of Verona

WP Number: 30
December 2015
ISSN: 2036-2919 (paper), 2036-4679 (online)

∗This paper benefitted from the comments and suggestions by Eugenio Peluso, Alain Trannoy, Dirk Van de gaer and participants at the IT10 Winter School (Canazei 2015) and PET Conference (Luxembourg 2015). The financial support from the Fonds National de la Recherche Luxembourg (AFR grant 5932132) is kindly acknowledged by Francesco Andreoli.
†Contact details: F. Andreoli is at Luxembourg Institute of Socio-Economic Research (LISER), 11 Porte des Sciences, L4366 Esch-sur-Alzette, Luxembourg. E-mail: [email protected]. C. Zoli is at University of Verona, Department of Economics, Via dell’Artigliere 19, 37129 Verona, Italy. E-mail: [email protected].

Abstract

We study the heterogeneity of social interaction profiles among individuals and define the extent of the interaction dimension of segregation. An interaction profile quantifies the probabilities with which an individual interacts with different social groups. It can be inferred, for instance, from the observation of social ties in network data. Heterogeneity is minimal if everybody exhibits the same profile, and is maximal if everybody interacts with only one group. All the in-between configurations can be ordered on the basis of an intuitive principle based on operations that generate mixtures of interaction profiles. We propose a characterization of the Gini-exposure index to assess heterogeneity in interaction patterns in a society.
One key advantage of this index is that overall heterogeneity can be decomposed into the segregation experienced by every individual with respect to other people in his own group (isolation) or in other groups (exposure). A preliminary empirical investigation of interaction patterns of natives and immigrants across Italian municipalities reveals connections and differences with other exposure measures.

Keywords: Interaction, segregation, dissimilarity, Gini index.
JEL Codes: J71, D31, D63, C16

1 Introduction

In their seminal analysis of segregation measures, Massey and Denton (1988) define five dimensions of analysis for residential segregation: evenness, exposure, clustering, centralization and concentration. The measurement of these phenomena requires partitioning the population into groups and knowing the distribution of these groups across organizational units, such as neighborhoods (Reardon and O’Sullivan 2004, Cutler and Glaeser 1997), school assignments (Frankel and Volij 2011, Echenique, Fryer and Kaufman 2006) or job types (Flückiger and Silber 1999, Hutchens 1991, Hutchens 2004). We focus on multi-group measures of segregation in the exposure dimension that can be used to assess segregation in networks.

This paper is interested in the distributional information that can be retained from a network, and not in the network’s structure itself. We consider interaction profiles, which correspond to vectors of probabilities that every unit in a network has to interact with each of the groups that compose the society. We contribute to the literature on segregation measurement by proposing a multi-group index of segregation, the Gini Exposure index, which measures segregation in a network as a form of inequality in the distribution of interaction profiles.
Following Massey and Denton (1988), exposure measures should capture the differences across groups in the likelihood that any randomly selected individual from one of these groups interacts with a person/unit from his own group or from another group. Segregation is zero when the chances that any two randomly selected individuals interact are independent of their respective groups of origin. On the contrary, segregation is maximized whenever every individual's interactions are limited to members of the same group.

Segregation measurement (Massey and Denton 1988, Reardon and Firebaugh 2002, Reardon and O’Sullivan 2004, Frankel and Volij 2011) has mainly focused on the rankings produced by segregation indices for populations partitioned into two or many groups. None of these indices, however, has been designed to deal with problems of segregation that use individual-level data, and the axiomatization of these indices, where it exists (see for instance Hutchens 1991, Flückiger and Silber 1999, Reardon and O’Sullivan 2004, Frankel and Volij 2011), cannot be meaningfully adapted to capture segregation patterns across individuals' interaction profiles.

We fill this gap by proposing a framework to study segregation at the individual level, conceptualized as a form of inequality in interaction profiles. We provide an axiomatic characterization of the Gini Exposure index, which generalizes the traditional Gini index of segregation to the multi-group case. The index can be interpreted as the Gini volume index discussed in the multivariate inequality measurement literature (Koshevoy and Mosler 1996, Koshevoy and Mosler 1997, Arnold 2005). We also provide a decomposition result illustrating how the Gini Exposure index can be used to keep track of changes in group-specific or individual-specific patterns of segregation within the network.
The axiomatic characterization of the Gini Exposure index is mainly based on operations defined on interaction profiles that preserve or decrease segregation. Our analysis is grounded on a simple principle: when the number of units equals the number of groups, if a portion of a unit's interaction profile is merged with another unit, this mixture operation should not increase segregation, and indeed should reduce it in proportion to the quota of the initial unit that is merged. The following example clarifies this point.

Consider, for instance, a large population that can be partitioned into two groups of equal size, the “Reds” and the “Greens”, and suppose that interaction profiles can be inferred from network data. If every individual interacts with half of the remaining individuals, the degree of segregation depends exclusively on how different types of individuals interact with each other. Two possible configurations are of particular interest. In the first configuration, each individual interacts with half of the Reds and half of the Greens. In this case the population is made of homogeneous units that all exhibit the same interaction profile, so there is no segregation. In the second configuration, every individual of the Reds interacts with all the Reds and exclusively with them, and analogously for the Greens. In this case we can consider the population as composed of two units, each collecting all the individuals that interact with a specific group. Admittedly, this is a highly segregated distribution. If a proportion $1 - \alpha$ of the unit of Reds joins the unit of Greens and shares its interaction links proportionally, then segregation is reduced in the proportion $1 - \alpha$. In fact, as $\alpha$ tends to 1 the overall segregation should be eliminated, because all individuals will share the same average interaction profile.
Our main result will show that this property plays a crucial role in the characterization of the Gini Exposure index for a large class of distribution matrices representing interaction profiles.

Alternative indicators have been proposed and adopted in the literature to measure the exposure dimension of segregation (see Hutchens 1991, Silber 1989, Flückiger and Silber 1999, Reardon and Firebaugh 2002). An extensive qualitative comparison of these indicators with the Gini Exposure index on the basis of the properties that they satisfy is not possible. We construct a quantitative analysis to recover empirical correlations in the rankings produced by different indices of exposure proposed in the literature and by the Gini Exposure index. The closer this correlation is to zero, the more likely it is that the two indices capture very different segregation patterns underlying the data.

We make use of Italian data by ISTAT to study the degree of spatial segregation of immigrant groups across municipalities in Italy. We use a spatial model to identify interaction probabilities across Italian municipalities (nearly 8400), for each of the Italian provinces separately (101 provinces are considered in this study), over an interval of eight years (from 2003 to 2010). Our main assumption is that the chances for two individuals to interact decrease with the spatial distance between the areas where the two individuals reside. We consider segregation among three groups: the groups of immigrants coming from low HDI and high HDI countries, and the natives group.¹

The empirical analysis reveals two broad categories of indicators: the indicators measuring the overall dissimilarity in interaction profiles, and the entropy indicators, measuring how far profiles are from their average. The Gini Exposure index is mostly rank correlated with the dissimilarity-type indicators, and this correlation is fairly robust to the demographic variability of the data.
2 Notation

In this paper, we consider the problem of ranking configurations $A, B \in \mathcal{C}(\mathcal{G})$ according to the level of segregation in the exposure dimension that they exhibit.

Definition 1 (Configuration) A configuration $A \in \mathcal{C}(\mathcal{G})$ is a triplet

$$\left\langle \mathcal{N}(A),\ \mathcal{G},\ \left( (\pi_{gi}(A))_{g \in \mathcal{G}},\ \xi_i(A) \right)_{i \in \mathcal{N}(A)} \right\rangle$$

where $\mathcal{N}(A)$ is a finite, nonempty set of units of cardinality $N(A)$, and $\mathcal{G}$ is a finite, nonempty set of $G$ population groups, with variable demographic size denoted by $N_g(A)$. For each unit $i \in \mathcal{N}(A)$ and group $g \in \mathcal{G}$, the variable $\pi_{gi} \in [0,1]$ denotes the probability that $i$ interacts with a randomly selected individual from group $g$. Unit $i$'s demographic weight is denoted by $\xi_i(A)$, with $\sum_{i \in \mathcal{N}(A)} \xi_i(A) = 1$.²

¹ In this setting, we treat municipalities as the basic units of our analysis.

To avoid cumbersome notation, references to the configuration $A$ are dropped in what follows, unless disambiguation is needed. Thus, we write $\pi_{gi}$, $\mathcal{N}$, $N_g$, $N$ and $\xi_i$ for configuration $A \in \mathcal{C}(\mathcal{G})$.

A configuration can be constructed, for instance, from empirical observation of the social connections between individuals, or from aggregate statistics of expected interaction patterns. For a configuration $A \in \mathcal{C}(\mathcal{G})$, the interaction profile of $i \in \mathcal{N}$ is a column vector:

$$\pi_{\cdot i} := (\pi_{1i}, \ldots, \pi_{Gi})^t \in [0,1]^G,$$

such that $\sum_{g \in \mathcal{G}} \pi_{gi} = 1$ for any $i \in \mathcal{N}$. Hence, $\pi_{\cdot i}$ represents the social ties of unit $i$ in terms of the probabilities that the individuals associated with this unit have to interact with members of each of the groups in $\mathcal{G}$. The $G \times N$ interaction matrix $\pi$ represents a collection of the $N$ interaction profiles (by column). The rows of the interaction matrix are called group profiles and are indicated with row vectors

$$\pi_{g \cdot} := (\pi_{g1}, \ldots, \pi_{gN}) \in [0,1]^N.$$

The expected interaction profile associated with group $g$ is the expected probability that a randomly drawn individual interacts with group $g$:

$$\pi^e_g(A) = \sum_{i \in \mathcal{N}(A)} \xi_i(A)\, \pi_{gi}(A).$$

Again, we write $\pi^e_g$ in shorthand notation.
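
The notation above can be made concrete with a small numerical sketch. The toy configuration below (two groups, three units) and the helper names `interaction_profile`, `expected_profile` and `is_valid_configuration` are illustrative assumptions, not objects taken from the paper; the code only mirrors the definitions of the interaction matrix, the demographic weights and the expected interaction profile.

```python
# Hypothetical toy configuration (not from the paper): G = 2 groups, N = 3 units.
# pi[g][i] is the probability that unit i interacts with group g,
# so each column of pi is the interaction profile of one unit.
pi = [
    [0.8, 0.5, 0.2],  # group profile of group 1
    [0.2, 0.5, 0.8],  # group profile of group 2
]
xi = [0.5, 0.25, 0.25]  # demographic weights of the units, summing to one


def interaction_profile(pi, i):
    """Return the interaction profile (column i) of the interaction matrix."""
    return [row[i] for row in pi]


def expected_profile(pi, xi):
    """Return pi^e_g = sum_i xi_i * pi_gi for every group g."""
    return [sum(w * p for w, p in zip(xi, row)) for row in pi]


def is_valid_configuration(pi, xi, tol=1e-9):
    """Check the constraints stated in Definition 1 and below it."""
    n_units = len(xi)
    entries_ok = all(0.0 <= p <= 1.0 for row in pi for p in row)
    weights_ok = abs(sum(xi) - 1.0) < tol
    columns_ok = all(
        abs(sum(interaction_profile(pi, i)) - 1.0) < tol for i in range(n_units)
    )
    return entries_ok and weights_ok and columns_ok


pi_e = expected_profile(pi, xi)
print(is_valid_configuration(pi, xi), pi_e)
```

Because every interaction profile sums to one and the weights sum to one, the expected interaction profile sums to one as well, which is a quick sanity check on any configuration built from data.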
For configuration $A$, we make use of expected interaction profiles to normalize the entries of the interaction matrix $\pi$. This leads us to define a $G \times N$ interaction matrix $\mathbf{A}$ (always denoted with boldface letters) such that $\mathbf{A} := (a_1, \ldots, a_i, \ldots, a_N)$, where

$$a_{gi} := \frac{\pi_{gi}(A)}{\pi^e_g(A)}.$$

² One particular case is the uniform weighting scheme, where $\xi_i(A) = 1/N(A)$ for all $i \in \mathcal{N}(A)$.

3 The Gini Exposure index

3.1 The index

The Gini inequality index of a univariate income distribution, represented by the $N$-dimensional vector $x$, is defined as the average income gap between any pair of realizations in the income distribution $x$, scaled by the overall average income:

$$G(x) := \frac{1}{2N^2 \left( \sum_i x_i / N \right)} \sum_{i=1}^{N} \sum_{j=1}^{N} |x_i - x_j|.$$

Alternatively, the Gini index can be related to the Lorenz curve: it is equal to twice the area between the Lorenz curve and the diagonal, which represents the equal distribution.³ As illustrated by Shephard (1974), the overall area delimited by the Lorenz curve can be represented as the sum of the areas spanned by every pair of vectors $(x_i, 1)$ and $(x_j, 1)$, corresponding to the determinant of a $2 \times 2$ matrix formed by these vectors. The gap $|x_i - x_j|$ corresponds, in fact, to the absolute value of the determinant of these matrices. It follows that the Gini inequality index can be rewritten as:⁴

$$G(x) := \frac{1}{2} \sum_{\forall \{i,j\} \subseteq \{1,\ldots,N\}} \frac{1}{N} \frac{1}{N} \left| \det \begin{pmatrix} x_i / \left( \sum_i x_i / N \right) & x_j / \left( \sum_i x_i / N \right) \\ 1 & 1 \end{pmatrix} \right|.$$

In practice, the Gini inequality index can be conceptualized as a weighted average of the dissimilarity between the incomes of pairs of units. The function measuring the intensity of this dissimilarity is the determinant, while the weights correspond to the probability of drawing the pair of units $i$ and $j$ from the sample. Since every pair of incomes is compared twice, the index must be divided by two, so that its maximum is equal to one.
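
The equivalence between the two expressions of the Gini index can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the function names and the income vector are hypothetical, and the sum in the determinant form is read as running over all ordered pairs $(i, j)$, which is the reading consistent with dividing by two.

```python
def gini_mean_difference(x):
    """Gini index as the average absolute income gap scaled by the mean."""
    n = len(x)
    mean = sum(x) / n
    total_gap = sum(abs(xi - xj) for xi in x for xj in x)
    return total_gap / (2 * n**2 * mean)


def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c


def gini_determinant(x):
    """Gini index as a weighted sum of 2 x 2 determinants of relative incomes.

    Assumption: the sum runs over all ordered pairs (i, j), each weighted
    by (1/N) * (1/N), and the result is divided by two.
    """
    n = len(x)
    mean = sum(x) / n
    relative = [xi / mean for xi in x]
    total = 0.0
    for ri in relative:
        for rj in relative:
            total += (1 / n) * (1 / n) * abs(det2(ri, rj, 1.0, 1.0))
    return total / 2


incomes = [1.0, 2.0, 3.0, 6.0]  # hypothetical income vector
print(gini_mean_difference(incomes), gini_determinant(incomes))
```

For this income vector both functions return one third, up to floating-point rounding. The determinant form is the one that extends to more than two rows, which is the route taken by the multi-group index.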
A similar logic can be adapted to the measurement of the degree of dissimilarity in interaction profiles, where income realizations have to be replaced by probabilities of interaction. Segregation assessments boil down to checking how far each normalized interaction probability, that is, the ratio between a unit's probability of interacting with a group and its mean, is from the value 1. A configuration exhibiting no segregation corresponds to the case in which $i$'s interaction profile with group $g$ is such that $\pi_{gi} = \pi^e_g$, for every $g \in \mathcal{G}$.

³ The Lorenz Zonotope of a distribution is defined as the area between the Lorenz curve and its dual. It can be written as a Minkowski sum of line segments, hence its area equals the sum of the areas spanned by each pair of bi-dimensional vectors $\left( \frac{x_i}{\sum_i x_i}, \frac{1}{N} \right)$ and $\left( \frac{x_j}{\sum_i x_i}, \frac{1}{N} \right)$, for all $i, j$. This area coincides with a parallelogram and it corresponds to a measure of inequality between the income shares received by two individuals $i, j$, equally weighted by $\frac{1}{N}$ in the population.

⁴ The term $\frac{1}{N^2 \sum_i x_i / N}$ disappears as it is incorporated in the determinant calculation. The comparison is now expressed in relative, rather than absolute, incomes. Moreover, the determinant is a measure of linear dependence, and therefore similarity, between oriented vectors.

An obvious extension of the Gini inequality index is the expected Gini (EG) segregation index analyzed in Flückiger and Silber (1999) and Alonso-Villar and del Rio (2010). The EG index is an average of local Gini indices $G_g$, weighted by group size:

$$EG(A) := \sum_{g \in \mathcal{G}} s_g\, G_g(A),$$

where $s_g = \frac{N_g}{N}$ is the share of individuals in the network associated with group $g$. Each local Gini index is meant to capture the inequality in the distribution of the interaction probability with a group, say $g$, across the population:

$$G_g(A) := \frac{1}{2} \sum_{\forall \{i_1, i_2\} \subseteq \mathcal{N}(A)} \xi_{i_1}(A)\, \xi_{i_2}(A) \left| \det \begin{pmatrix} \frac{\pi_{g i_1}(A)}{\pi^e_g(A)} & \frac{\pi_{g i_2}(A)}{\pi^e_g(A)} \\ 1 & 1 \end{pmatrix} \right|.$$

The expected Gini index assumes that evaluations of segregation can be separated across dimensions. This strong assumption leaves aside concerns about the composition of the interaction profiles. To overcome these limitations, we propose a multi-group extension of the local Gini index presented above, denoted the Gini Exposure index of segregation.

The Gini Exposure index of segregation, $G_E : \mathcal{C}(\mathcal{G}) \to [0,1]$, captures the dispersion in the normalized interaction profiles across units in the same configuration. The index is a weighted mean of a measure of dissimilarity between $G$-tuples of interaction profiles, as captured by the determinant of a square $G \times G$ matrix. The weight attached to each $G$-tuple corresponds to its probability of being observed. The index is standardized by $G!$, the number of ways in which each $G$-tuple can be ordered, so that the index maximum is equal to one.

Definition 2 (The Gini Exposure segregation index)

$$G_E(A) := \frac{1}{G!} \sum_{\forall \{i_1, \ldots, i_G\} \subseteq \mathcal{N}(A)} \xi_{i_1}(A) \cdot \ldots \cdot \xi_{i_G}(A) \left| \det \left( a_{i_1} \ \ldots \ a_{i_G} \right) \right|.$$

3.2 A geometric illustration of the index

An equivalent way of assessing heterogeneity in interaction profiles consists in looking at the likelihood that any randomly chosen individual from group $g$ interacts with individual $i$, given the original information about the distribution of interaction profiles across the population. For configuration $A$, define the interaction likelihood $L^A_{gi} \in [0,1]$ as this probability. The sequence of probabilities $L^A_{g1}, \ldots, L^A_{gN}$ defines a distribution of interaction likelihoods of group $g$ with all the units in the distribution, hence satisfying $\sum_{i \in \mathcal{N}} L^A_{gi} = 1$ for every $g \in \mathcal{G}$. The interaction likelihood is tied to interaction profiles and individual weights through Bayes' rule:

$$L^A_{gi} := a_{gi}\, \xi_i = \frac{\pi_{gi}\, \xi_i}{\pi^e_g}.$$

Heterogeneity in interaction profiles always implies that some form of dissimilarity between interaction likelihoods prevails. When all interaction profiles coincide, then $a_{gi} = 1$ and $L^A_{gi} = \xi_i$ for any $i$ and $g$, meaning that the interaction likelihoods coincide across groups. This does not necessarily imply, however, that the interaction likelihoods are constant across individuals. Conversely, when each individual interacts with exactly one group, say $g$, then knowledge of the group allows one to infer with certainty the individuals that will interact with it, because $L^A_{gi} > L^A_{g'i} = 0$ for all $g' \neq g$. All in-between situations display some form of dissimilarity between the rows of the interaction likelihood matrix $\mathbf{L}^A$ associated with configuration $A$ and defined as:

$$\mathbf{L}^A := (\ell_1, \ldots, \ell_{N(A)}) = \begin{pmatrix} L^A_{11} & \cdots & L^A_{1N(A)} \\ \vdots & & \vdots \\ L^A_{G1} & \cdots & L^A_{GN(A)} \end{pmatrix},$$

where $\mathbf{L}^A$ is a row stochastic matrix (i.e. the entries add up to one by row, but not necessarily by column) of the type analyzed in Andreoli and Zoli (2014).

Andreoli and Zoli show that the dissimilarity between the rows of a $G \times N$ stochastic matrix (depicting sets of $G$ discrete probability distributions defined over $N$ classes of realizations) can be visually represented through the Zonotope set $Z$ of the interaction likelihood matrix $\mathbf{L}^A$, denoted $Z(\mathbf{L}^A)$. The Zonotope is a centrally symmetric polytope in the $G$-dimensional space