
Socially Relevant Identity: Addressing Selection Bias Issues and Introducing the AMAR PDF

52 Pages·2016·1.07 MB·English


Socially Relevant Identity: Addressing Selection Bias Issues and Introducing the AMAR (All Minorities at Risk) Data [1]

Jóhanna K Birnir, University of Maryland
David D Laitin, Stanford University
Jonathan Wilkenfeld, University of Maryland
Agatha Hultquist, University of Maryland
David M. Waguespack, University of Maryland
Ted R. Gurr, University of Maryland

© Draft: please cite but do not quote.

Abstract: The paper introduces the AMAR (All Minorities at Risk) data, a coded sample of socially recognized and salient ethnic groups. We describe the data and review under-explored selection issues arising with truncated ethnic group data, especially when moving between levels of analysis. Next we suggest some directions for the future study of ethnicity and conflict using our bias-corrected data, including a better estimate of overall group propensity for ethnic violence. We also correlate group violence and some prominent group and country level variables proposed as causes of ethnic violence. Our correlations suggest that some group level relationships likely are missed and/or incorrectly specified in the literature. Furthermore, country level measures such as ethnic heterogeneity and economic development, while related to absolute levels of violence in a given country, in and of themselves may not be as significant correlates of ethnic group propensity for rebellion as has been previously reported.

[1] We thank the National Science Foundation for supporting this work (grant #SES0718957). Parts of the work have been presented at Juan March Institute, Folke Bernadotte Academy, Penn State University, Uppsala University, International Studies Association, University of California Los Angeles, Stanford University, University of Maryland, Midwest Political Science Association, the Pentagon and Yale University.
We thank the many discussants and others who have generously read and commented, including James Fearon, Andreas Wimmer, Stephen Saideman, Lars-Erik Cederman, and Kathleen Cunningham.

Introduction

This paper addresses the well-known selection bias issue plaguing the Minorities at Risk (MAR) dataset, which has nonetheless been widely used to examine the association between ethnic diversity and violent ethnic political mobilization. Precise measurement of this association has been challenging [2], due in large part to the absence of a group-level sample, free from known selection issues, that would allow us to estimate the probability that any ethnic group engages in violent confrontation with the state. The paper introduces the AMAR (All Minorities at Risk) sample of socially recognized and salient ethnic groups, which we call the AMAR Phase I data. Guided by theories of ethnic politics that help drive the selection of the appropriate sampling frame, the AMAR sample frame [3] (Birnir et al. 2015) enumerates 1202 ethnic groups, including over 900 groups that were not included in the MAR groups data project. [4] From this set of new groups, we code in this paper a random sample of 74 groups, stratified by region and size, for the suite of extant MAR variables. With statistical weighting, we combine this random set with the current MAR data, allowing us to address selection bias concerns [5] that have been a nemesis for existing studies of the relationship between ethnicity and violence.

[2] For an overview of the literature on the detrimental effects of ethnicity see Chandra (2012), chapter 1.
[3] For an extended discussion of the challenges to constructing a sample frame for ethnic groups see Birnir et al. (2015).
[4] The total number of groups in the current AMAR sample frame differs from the total number listed in Birnir et al. 2015, which was 1196, because six new groups were added since the paper’s publication. These new groups were added based on updated information, and include the Afromexicans in Mexico, Bantenese in Indonesia, Bemba/Shila in the Democratic Republic of Congo, French in Belgium, Italians in Germany, and Irish in the United Kingdom.

In this paper, we first review the selection concerns in the study of ethnic conflict, with an emphasis on the underexplored selection issue that arises with truncated data, especially when moving between levels of analysis. Next we describe our sampling solution and the resulting AMAR data (taking into account the impact of error in the sampling frame) with an eye to assessing the prevalence of group participation in violence. Finally, to illustrate issues of selection bias and suggest some directions for the future study of ethnicity using our bias-corrected data, we use the coded sample to better estimate overall group propensity for ethnic violence in the world. With that goal, we correlate group violence and some prominent group level and country level variables that have been proposed as causes of ethnic violence, including political, economic, and cultural grievances, group concentration, wealth, and ethnolinguistic fractionalization.

Substantively, along the lines of Fearon and Laitin (1996), but with results for the entire world, our descriptive findings suggest that only a minority of widely recognized ethnic groups ever engage in conflict against the state. The preliminary group level correlations support the concerns in the literature that selection bias decreases the likelihood that relationships are detected.
Moreover, our suggestive correlations using data collected at different levels indicate that ethnic heterogeneity and economic development, while related to absolute levels of violence in a given country, in and of themselves may not be as significant correlates of ethnic group propensity for rebellion as has been previously reported (Reagan and Norton 2005; Olzak 2006; Walter 2006; Cetinyan 2002). In sum, the AMAR sample data permits estimations in future research of conflict potential at the group level with a greater degree of confidence in our results.

[5] For a discussion of the selection bias in MAR data see Fearon & Laitin, 1996; Fearon & Laitin, 2002, 2003; Fearon, 2003; Öberg, 2002a; Hug, 2003, 2013; Birnir, 2007; Brancati, 2006, 2009.

In order to make valid inferences about ethnic conflict, there is an urgency in addressing the group level data used in the study of ethnic conflict. As social scientists in many fields have sought to understand the mechanisms underlying their causal claims, they have found cross-country regressions to be unhelpful, and have sought greater levels of disaggregation that would permit better controls and easier identification. While progress was fruitful in earlier studies of ethnic rebellion relying on the country/year as the unit of analysis (Collier and Hoeffler 2004; Fearon and Laitin 2003), critics have demanded more attention to disaggregated studies at the group level (e.g. Cederman et al. 2013). However, group level studies face a fundamental problem not encountered in the country/year set-up. While there is little disagreement on the number of countries in the world in any given year, there is no such agreement on the number of ethnic groups. Any sampling of the near infinite number of groups (if you permit dialects and sub-dialects to differentiate groups) will be arbitrary and subject to claims of bias.
One goal of this paper is to present an approach that samples from one possible frame of ethnic groups, which will permit cumulative research on ethnic conflict based on disaggregated group-level data.

Empirical obstacles to examining the route to ethnic war

Established Selection Issues

In the study of ethnic conflict, selection issues are a recurring concern because the principal data used for empirical analysis, thus far, is based on the selection of groups that have already engaged with the state, as in the MAR data [6], and more recently groups that are “politically relevant”, as in the Ethnic Power Relations (EPR) data (Wimmer, Cederman and Min 2009). Both data sets have been used to reveal patterns of conflict. But researchers need to be concerned about their sampling criterion and the conditions under which those patterns hold. Selection issues become especially problematic when we ask questions about what makes an ethnic group prone to violent conflict, since both samples are selected on criteria that are likely to be correlated with a propensity for conflict. [7]

[6] For a more extensive account of the selection bias problem as it pertains to the MAR data see Fearon and Laitin 2002, 2003; Fearon 2003; Birnir et al. 2015.

Selection bias is a fundamental problem for drawing either descriptive or causal inferences from data (Geddes, 2003; Shively, 2006; Hug 2010; Weidman, 2016). Selection biases are of many different types and cause distinct problems. One problem is that, independent of concerns about estimating relationships between variables, much interest often centers on simple descriptive statistics about base rates in a population, which obviously cannot be estimated from a biased sample. It is likely, therefore, that we know less than we think about the prevalence of outcomes such as ethnic conflict. A second selection concern focuses on detecting relationships between variables.
When unrelated to the explanatory variable(s), selection on the dependent variable obscures a statistical effect where there really is one. For example, in the case of over-selection of groups engaged in ethnic conflict, the reduction in variation on the dependent variable implies that we cannot, even if we had reasonable instruments, confidently determine the causes of rebellion using only the original MAR data. Without a representative sample of “ethnic groups” for each country, we cannot get confident estimates on the relationship between ethnic diversity and the frequency and type of ethnic conflict. Hug (2013) notes that it is likely that many “true” relationships go entirely undetected because of the aforementioned bias in the data. In particular he makes the case that, contrary to the null finding of Gurr and Moore (1997), grievances likely do affect group propensity for rebellion.

[7] See Vogt et al. 2015 for a discussion specifically pertaining to EPR.

A third selection concern is reporting bias. [8] Reporting bias happens in different ways. Weidman describes reporting bias in event data where the outcome is missed at random and some cases, therefore, get incorrectly coded. Random reporting errors likely impact estimates of relationships between variables as would noise. Alternatively, some outcomes are systematically more likely to be reported, sometimes as a function of explanatory variables (Weidman 2016). The principal concern with reporting bias centers on the latter types of cases, where the reporting of the outcome is systematically related to purported explanatory variables. This type of bias may affect both the magnitude and direction of a correlation between an independent and a dependent variable (Hug 2010; Weidman, 2016).
Truncation of Group Data

A special class of selection concerns is the problem of truncated data. [9] In cases of truncated data, and unlike instances of selection on the dependent variable, values are included both where the outcome of interest occurs and where it does not occur. Furthermore, the dependent variable is not erroneously coded for some cases, as it is in cases of reporting bias. In the case of truncation, what is especially worrisome with ethnic data lacking a coherent sample frame (Birnir et al. 2015) is that data collection projects could be attuned to more obscure ethnic groups in one country but less so in another. Indeed, researchers have limited their selection of group level data by circumscribing the types of groups included (e.g., groups that are politically mobilized, discriminated against, or politically relevant) without estimating the implications of these limitations for their statistical estimations.

[8] Importantly, reporting bias presumes there exists a sample frame for coding of non-occurrences.
[9] The resultant problems in many cases likely resemble those occurring with reporting bias, but the types of data suffering from each possibly differ.

The effects of data truncation are of special interest in this paper because in data on ethnic groups this is likely a bigger problem than is reporting bias. Indeed, exploring group violence, Fearon (2003) found little evidence of reporting bias in the MAR data, at least with respect to violent outcomes. Specifically, among the 539 groups not in MAR added by Fearon (2003), there were only 11 instances of rebellion between 1945 and 1998. [10] However, as shown in Figure 1, when comparing MAR with the AMAR sample frame, where groups were selected irrespective of any political criteria (Birnir et al. 2015), a high percentage of socially relevant AMAR groups is missing from the original MAR data, especially in some of the most heterogeneous countries in the world.
Figure 1: By country: Proportion of Socially Relevant Groups in the AMAR Sample Frame but Missing From MAR.

A significant selection concern associated with truncated data, as with data suffering from reporting bias, is that systematic truncation may render estimates of the relationship between the independent and dependent variable unreliable. To better understand how truncation affects these relationships, we constructed a generic simulation that systematically truncates data to drop more observations where the outcome did not occur, in ways that are also related to the explanatory variable. In short, we found that coefficients estimating relationships between an independent and a dependent variable in the systematically truncated data varied substantially in size, sometimes even changing signs, when compared to the “true” correlation in the un-truncated data. Furthermore, we found standard errors to be invariably larger than in the “true” data, though this sometimes rendered correlations more significant and sometimes less, depending on the corresponding size of the biased coefficient. Judging by the simulation, truncation of data is, therefore, a significant threat to the accuracy of inference using uncorrected data on ethnic groups. (For details on the simulation see data Appendix.)

[10] This low reporting bias was independently confirmed by Brancati (personal conversation).

Group Level Truncation in Country Level Analysis

One of the potential problems resulting from truncation of group data (likely also a problem in data suffering from reporting bias) that has not been widely explored in the literature is error in inference when moving between levels of analysis.
Despite receiving little attention, this type of error is possibly a serious problem in the literature because many studies use biased (by way of truncation) group level statistics to show correlations with a number of aggregate causal variables that do not vary within a country. Specifically, the problem is that because of truncation, MAR and other datasets that select groups on some limited criteria provide an incorrect estimate of average group propensity to engage in outcomes such as violence, at levels more aggregated than the group, such as the country. If this limited ethnic group information is then regressed on measures that do not vary within the country but only between countries, the resulting association is not necessarily an accurate indicator of ethnic group level propensity of engaging in the outcome of interest in a given country when compared to other countries. Instead, in many cases (at least where the total number of groups engaging in violence is high but the group proportion engaging in this activity is low) we will see positive correlations at the country level that have hitherto often been mistaken as indicators of group level propensity to engage in violence in any country.

This problem is best demonstrated with an example, as illustrated in Table 1. Suppose that in two countries X and Y there live 10 and 100 groups respectively. Hypothetical biased group-level data, including information on all violent groups and some peaceful groups, contains information on 8 groups from country X, 2 of which are violent, and information about 20 groups from country Y, 10 of which are violent. The aggregate country level measure of group violence in countries X and Y would then show that 25% and 50% of groups engage in violence respectively. Suppose now that we were to collect information on the remaining two groups in country X and the remaining eighty groups in country Y, and find that the remaining groups in both countries are peaceful.
Calculating the proportion of violent groups in each country, we now find that 20% of all groups in country X engage in violence while only 10% of all groups in country Y ever engage in any violence. Consequently, while it is still true that country Y experiences greater levels of violence than country X, it is also true that any one group in country Y is less likely to engage in violence than is any one group in country X.

Table 1: Country vs. Group Propensity for Violence

Country   Aggregate measure of violence in a       Group measure of violence in a
          biased or incomplete sample of groups    representative or complete sample of groups
X         25% (2/8)                                20% (2/10)
Y         50% (10/20)                              10% (10/100)

This problem is not commonly discussed in the literature on ethnic conflict, which often uses biased group data to make inferences about group propensities. [11] Consider examples of country level measures that in the literature have been associated with group propensity to engage in violence. These include ethnic fractionalization measures (Reagan and Norton 2005; Olzak 2006), measures of political institutions (Saideman et al. 2002; Alonso and Ruiz-Rufino 2007), and measures of country level development (Cetinyan 2002; Walter 2006). While inferences have been made from these measures for group level measures of violence, in all likelihood these studies are really measuring country level probabilities.

All MAR (AMAR) Sample Frame

Weidman (2016) laments that current probes of selection bias issues alternately assume away the problem; only focus on sensitivity analyses assuming the direction of bias without any evidence; or rely on estimators that require strong statistical assumptions. Instead, he suggests that whenever possible, using real data to establish and solve the problem is a preferable solution. The sensitivity analyses in the simulation described above confirmed our suspicion that truncation of data likely presents a problem for analytical inference.
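The truncation simulation referred to above is documented only in the paper's data Appendix, which is not reproduced here. As a rough illustration of the mechanism, the following is a minimal sketch under assumptions of our own: a single standard-normal explanatory variable, a logistic outcome model, and a retention rule that keeps every group where the outcome occurred but keeps non-occurrences with a probability that rises with the explanatory variable. All names, parameter values and the retention rule are hypothetical and are not the authors' specification.

```python
import math
import random


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ols_slope(xs, ys):
    """Slope from a bivariate least-squares fit of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, seed=1):
    """Return (full, truncated) samples of (x, y) pairs.

    Assumed data-generating process (hypothetical): x ~ N(0, 1) and
    P(y = 1 | x) = logistic(-2 + x), so the outcome is fairly rare.
    Assumed truncation rule (hypothetical): every y = 1 case is kept,
    while a y = 0 case is kept with probability logistic(2 * x).
    """
    rng = random.Random(seed)
    full = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < logistic(-2.0 + x) else 0
        full.append((x, y))
    truncated = [
        (x, y) for x, y in full
        if y == 1 or rng.random() < logistic(2.0 * x)
    ]
    return full, truncated


def summarize(sample):
    """Sample size, share of cases with y = 1, and the OLS slope of y on x."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    return {
        "n": len(sample),
        "base_rate": sum(ys) / len(ys),
        "slope": ols_slope(xs, ys),
    }


full, truncated = simulate()
full_stats = summarize(full)
trunc_stats = summarize(truncated)
print("full sample:     ", full_stats)
print("truncated sample:", trunc_stats)
```

Because the rule drops only non-occurrences, the truncated sample necessarily overstates the base rate of the outcome. Comparing the two printed slopes shows how the estimated relationship shifts under this particular rule; other retention rules would shift it differently.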
Heeding Weidman’s call, we

[11] The problem of truncation is also separate from the problem of reporting bias because even in the instances of substantial reporting bias the average values between more aggregate units may still maintain their relative order.
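The reversal in relative order shown in Table 1 can be checked with a few lines of arithmetic. The sketch below uses only the hypothetical counts for countries X and Y given in the text; the variable and function names are ours.

```python
from fractions import Fraction

# Hypothetical counts from Table 1:
# (violent groups, groups observed in the biased sample, all groups in the country)
countries = {"X": (2, 8, 10), "Y": (10, 20, 100)}


def shares(violent, observed, total):
    """Share of violent groups in the biased sample and in the complete set."""
    return Fraction(violent, observed), Fraction(violent, total)


table = {name: shares(*counts) for name, counts in countries.items()}

for name, (biased, complete) in table.items():
    print(f"{name}: biased sample {float(biased):.0%}, "
          f"complete sample {float(complete):.0%}")
```

In the biased sample country Y appears more violence-prone than X (50% against 25%), whereas in the complete set of groups the ordering flips (10% against 20%).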
