Eurasian Journal of Educational Research 68 (2017) 1-17
www.ejer.com.tr

Applying Bootstrap Resampling to Compute Confidence Intervals for Various Statistics with R

C. Deha DOGAN1

ARTICLE INFO
Article History:
Received: 13 October 2016
Received in revised form: 02 December 2016
Accepted: 22 January 2017
DOI: http://dx.doi.org/10.14689/ejer.2017.68.1
Keywords: confidence interval, p value, bootstrapped resampling, methods of bootstrapping, R software

ABSTRACT
Background: Most studies in academic journals use p values to represent statistical significance. However, the p value is not a good indicator of practical significance. Although confidence intervals provide information about the precision of a point estimate, they are, unfortunately, rarely used. The infrequent use of confidence intervals might be due to estimation difficulties for some statistics. The bootstrap method enables researchers to calculate confidence intervals for any statistic. Bootstrap resampling is an effective method of computing confidence intervals for nearly any estimate, but it is not very commonly used. This may be because the method is not well known, or because people think it is complex to calculate. Researchers may also be unfamiliar with R and unable to write the necessary code.
Purpose: The purpose of this study is to present the steps of the bootstrap resampling method for calculating confidence intervals using R. It is aimed at guiding graduate students and researchers who wish to implement this method. Computation of bootstrapped confidence intervals for the mean, median and Cronbach's alpha coefficient is explained step-by-step with R syntax. Moreover, traditional and bootstrapped confidence intervals, as well as the different bootstrap methods, are compared in order to guide researchers.
Main Argument and Conclusions: With the help of statistical software, it is now easy to compute confidence intervals for almost any statistic of interest. In this study, R syntax is used as an example so that beginners can use R to compute confidence intervals. Results showed that traditional and bootstrapped confidence intervals give very similar results for normally distributed data sets, whereas different bootstrap methods produce different results with skewed data sets. For this reason, the bias corrected and accelerated interval method is suggested for use with skewed data sets.
Implications for Research and Practice: The R code presented in this study guides researchers and graduate students in computing bootstrap confidence intervals. Furthermore, the findings on the comparison of bootstrap methods help researchers choose the most appropriate method. The results and main argument of this study may encourage researchers to compute bootstrap confidence intervals in their studies.

© 2017 Ani Publishing Ltd. All rights reserved

1 Ankara University, Faculty of Educational Science, Department of Measurement and Evaluation, [email protected]

Introduction

The p value is the probability, under the assumption that there is no true effect or no true difference, of collecting data that show a difference equal to or more extreme than the one actually observed (Reinhart, 2015). Hypothesis testing uses the p value to determine statistical significance, and it is the most widely reported statistic in academic journals.
M. Marshall et al. (2000) highlighted that significance testing is reported in 97% of the research papers in experimental psychology journals. However, the p value has some limitations, and it is not an indicator of practical significance. It is well known that confidence intervals provide more information than p values (Haukoos & Lewis, 2005). Effect sizes, confidence intervals, and confidence intervals of effect sizes are indicators of practical significance (Banjanovic & Osborne, 2015). Editors of many scientific journals require the use of confidence intervals (Cooper, Wears & Schriger, 2003). Moreover, the APA Publication Manual (2001) highlighted the importance of calculating and reporting confidence intervals and effect sizes in academic research. Unfortunately, confidence intervals are rarely reported in academic papers. Reinhart (2015) stated the reasons for this as follows: it is best to do statistics the same way everyone else does, or else the reviewers might reject your paper; or maybe the widespread confusion about p values obscures the benefits of confidence intervals; or the overemphasis on hypothesis testing in statistics courses means most scientists do not know how to calculate and use confidence intervals. According to Banjanovic and Osborne (2015), the infrequent use of confidence intervals is due to estimation difficulties for some statistics. Some statistics may require multi-step formulas, with assumptions that might not always be viable, for calculating confidence intervals. The bootstrap method enables researchers to calculate confidence intervals for any statistic regardless of the data's underlying distribution. The empirical bootstrap was introduced in 1979 (Efron, 1988), but it was not feasible to implement without modern computing power. Computers and statistical software have since improved greatly, and today it is possible to calculate confidence intervals using the bootstrap method. Moreover, the free and open-source R software enables researchers to write their own syntax to calculate confidence intervals for various statistics.

Bootstrapping

Briefly, bootstrap methods are resampling techniques for assessing uncertainty. In a broad sense, the bootstrap is a widely applicable and extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method (James et al., 2014). Bootstrap resampling is a method of computing confidence intervals for nearly any estimate. In most studies, researchers begin with a population, take a sample from that population, and run an analysis on the sample. In bootstrap resampling, additional sub-sampling and replication are carried out on the original sample. In other words, at the beginning of the process thousands of "bootstrapped resamples" are generated from the original sample using random sampling with replacement. Then the designated statistic (mean, median, regression coefficient, Cronbach's alpha coefficient, etc.) is computed in each of these resamples. Researchers thus obtain thousands of estimates of the designated statistic. The distribution of those estimates is called the "bootstrap distribution", and it may be used to estimate more robust empirical confidence intervals. In bootstrap sampling, the number of replications is very important. DiCiccio and Efron (1996) highlight the importance of using at least 2,000 replications when conducting bootstrap resampling.
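To make the resampling process concrete, the following short sketch in base R draws 2,000 resamples with replacement from a single sample and computes the mean of each. It is an illustration added here, not one of the article's own listings, and the object names (original.sample, boot.means) are only illustrative.

```r
# Illustration of bootstrap resampling: resample with replacement, recompute the statistic
set.seed(1)
original.sample <- rnorm(50, mean = 60, sd = 7)        # one observed sample of n = 50

R <- 2000                                              # at least 2,000 replications (DiCiccio & Efron, 1996)
boot.means <- replicate(R, {
  resample <- sample(original.sample, replace = TRUE)  # same size as the original sample
  mean(resample)                                       # the designated statistic
})

hist(boot.means, main = "Bootstrap distribution of the mean")
sd(boot.means)                                         # bootstrap estimate of the standard error
```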
A schematic description of the steps for estimating confidence intervals with the bootstrap, developed by Haukoos and Lewis (2005), is shown in Figure 1.

Figure 1: Description of the steps in bootstrapping.

Methods of Bootstrapping

There are different methods for estimating confidence intervals from a bootstrapped distribution. The most frequently used methods are:

• The normal interval method
• The percentile interval method
• The basic interval method
• The bias corrected and accelerated interval method

The normal interval method computes an approximate standard error from the bootstrap distribution (the sampling distribution resulting from the bootstrap resamples); the Z distribution is then used to obtain the confidence interval. The percentile interval method uses the frequency distribution of the statistic computed from the bootstrap samples: the 2.5th and 97.5th percentiles constitute the limits of the 95% confidence interval (Haukoos & Lewis, 2005). The percentile interval method makes no adjustment, whereas the studentized interval method corrects each statistic by its associated standard error and converts the distribution to a studentized distribution; the confidence limits are then found at the 0.025 and 0.975 quantiles, as in the percentile interval method. The bias corrected and accelerated interval method corrects the distribution for bias and acceleration. It adjusts the distribution using two coefficients called "bias correction" and "acceleration". The bias correction adjusts for skewness in the bootstrap distribution and is zero when the bootstrap distribution is perfectly symmetric. The acceleration coefficient, in turn, corrects for non-constant variance within the resampled data set (Efron, 1988). The confidence limits are then found at the 0.025 and 0.975 quantiles of the corrected distribution. The basic interval method corrects the distribution for bias and detects the lower and upper bounds that cover the desired confidence interval (Banjanovic & Osborne, 2015).
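As a rough illustration of how the first three interval types can be read off a bootstrap distribution (the BCa interval additionally requires estimating the bias-correction and acceleration coefficients, and is most easily obtained with the boot package used in the examples that follow), consider the sketch below. It is not part of the article's own listings, and all object names are illustrative.

```r
# Illustrative computation of three bootstrap interval types for the mean
set.seed(2)
x <- rnorm(50, mean = 60, sd = 7)                      # observed sample
theta.hat <- mean(x)                                   # statistic of interest
boot.est <- replicate(2000, mean(sample(x, replace = TRUE)))

# Normal interval: original estimate +/- 1.96 bootstrap standard errors
norm.ci <- theta.hat + c(-1.96, 1.96) * sd(boot.est)

# Percentile interval: 2.5th and 97.5th percentiles of the bootstrap distribution
perc.ci <- quantile(boot.est, c(0.025, 0.975))

# Basic interval: bootstrap quantiles reflected around the original estimate
basic.ci <- 2 * theta.hat - quantile(boot.est, c(0.975, 0.025))

round(norm.ci, 2)
round(perc.ci, 2)
round(basic.ci, 2)
```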
Each bootstrapping method has advantages and disadvantages, and it is important to use the most appropriate method when computing confidence intervals for the statistic of interest. Unfortunately, it is not very common to use bootstrap methods to calculate confidence intervals. This may be because the methods are not well known, or because people think they are complex to calculate. Statistical software exists that enables users to compute confidence intervals using bootstrap methods, and R is one such program. R is a language and environment for statistical computing and graphics, and because it is free and open source it has recently become popular for statistical data analysis. However, R syntax may seem complicated, and this may dissuade researchers from using bootstrapping methods to calculate confidence intervals. Furthermore, comparisons of bootstrapping methods may guide researchers in deciding which method to use.

The purpose of this study is to present the steps of the bootstrap resampling method for calculating confidence intervals using R syntax. It aims to guide graduate students and researchers who wish to implement bootstrap resampling using the R programming language. Computation of bootstrapped confidence intervals for the mean, median and Cronbach's alpha coefficient is explained step-by-step with R syntax. Moreover, some comparisons are made: traditional and bootstrapped confidence intervals are compared when computing the mean for normally distributed data and the median for normally distributed and skewed data sets.

In this study, the R code was written in the R console and copied directly into the manuscript. This is why the R code appears in a different typeface and size from the other text in the paper; because so much R code is presented, the code listings are not defined as figures.

Traditional Confidence Interval for Mean

Traditionally, confidence intervals are computed with the formula $\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}$, where $\bar{x}$ is the mean, $s$ is the standard deviation, and $n$ is the sample size. In this part, a pedagogical example is first presented to compare traditional confidence intervals with bootstrap confidence intervals. We first generate random data with 100,000 observations; this data set is viewed as the population. The population has a random normal distribution (mu = 60, sigma = 7). Next, we take six random samples of 50 observations from the population in order to compute traditional confidence intervals. The next step is to write a simple function to calculate confidence intervals for the six samples randomly taken from the population. It is then easy to compute the 95% confidence intervals for these samples, and the results can be rounded to two digits using the round() function. We can also write another function to take the six random samples from the population. It is better to define set.seed() for each sample, because we will use the same samples to generate six different bootstrap distributions and compare the confidence intervals. We defined the seeds as 10, 20, 30, 40, 50 and 60 for the six samples, respectively, and then computed the 95% confidence intervals of the six samples.

Bootstrapped Confidence Interval for Mean

To obtain the bootstrap distributions, the "boot" package is used (Canty & Ripley, 2016). The boot() function generates a bootstrap distribution for a specific sample, but it requires writing simple helper functions for the statistic of interest. The first function is used for the first argument of boot(); it takes the specific sample from the population. The second computes the statistic of interest, which is the mean in this example. In this way we generated six bootstrap distributions with 2,000 resamples each, using the "select" and "mean.func" functions within the boot() calls. The bootstrap output for the first sample reports three values: "original" is the mean of the whole sample, "bias" is the difference between the original mean and the mean of the bootstrapped samples, and the standard error is the standard deviation of the simulated values. The next step is to calculate confidence intervals for each original and bootstrapped sample, using the boot.ci() function for each bootstrapped sample. The "Basic" heading in the R output refers to the basic interval method, in which confidence intervals are estimated by correcting the bootstrap distribution for bias or skew. The "BCa" heading refers to the bias corrected and accelerated interval method, in which the bootstrap distribution is corrected for bias and acceleration and the confidence limits are found at the .025 and .975 quantiles of the corrected distribution (Carpenter & Bithell, 2000; Banjanovic & Osborne, 2015).
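The original code listings appear in the article in a different typeface and are not reproduced here verbatim. The sketch below approximates the sequence of steps just described (population, seeded samples, traditional intervals, and a bootstrap of the first sample); the helper names select and mean.func follow the text, while ci.func, the population seed, and other details are assumptions.

```r
# Sketch of the steps described above (a reconstruction, not the article's original listing)
library(boot)                                  # Canty & Ripley's boot package

set.seed(123)                                  # population seed is an assumption
population <- rnorm(100000, mean = 60, sd = 7) # "population": normal, mu = 60, sigma = 7

# Traditional 95% confidence interval: x_bar +/- 1.96 * s / sqrt(n)
ci.func <- function(x) {
  round(mean(x) + c(-1.96, 1.96) * sd(x) / sqrt(length(x)), 2)
}

# Draw a sample of 50 observations for a given seed, as described in the text
select <- function(seed, n = 50) {
  set.seed(seed)
  sample(population, n)
}

seeds <- c(10, 20, 30, 40, 50, 60)
samples <- lapply(seeds, select)               # the six random samples
lapply(samples, ci.func)                       # traditional 95% CIs for the six samples

# Statistic function required by boot(): mean of the resampled observations
mean.func <- function(data, indices) mean(data[indices])

# Bootstrap distribution (2,000 resamples) and confidence intervals for the first sample
boot.out <- boot(data = samples[[1]], statistic = mean.func, R = 2000)
boot.out                                       # reports original, bias and std. error
boot.ci(boot.out, type = c("norm", "basic", "perc", "bca"))
```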
Comparison of Traditional and Bootstrapped Confidence Intervals for Mean

The boot.ci() function produces confidence intervals for the four main bootstrap methods discussed earlier. Table 1 presents a comparison of the confidence intervals of each original random sample and its associated bootstrap distribution.

Table 1
Confidence Intervals for Original and Bootstrapped Samples

Sample   Mean    Traditional 95% CI           Bootstrapped 95% CI
                 Lower Bound   Upper Bound    Lower Bound   Upper Bound
1        61.33   59.76         62.90          59.79         62.92
2        60.46   58.27         62.65          58.34         62.59
3        59.73   57.55         61.91          57.54         61.94
4        59.21   57.33         61.09          57.32         61.08
5        61.71   59.63         63.89          59.53         63.88
6        61.24   59.03         63.45          58.97         63.46

As seen in Table 1, the two approaches produce very similar results; the differences are at the first or second decimal place. However, it is important to note that they yield such similar results only as long as the data do not violate parametric assumptions such as normality (Banjanovic & Osborne, 2015).

Bootstrapped Confidence Intervals for Median

The median is the observation at the 50th percentile in a data set ordered from the lowest value to the highest. It is commonly reported and is considered a more valid measure of center when the frequency distribution of the variable is skewed. No simple formula exists for computing confidence intervals for the median. According to the central limit theorem, as the number of resampled data sets increases, the distribution of the resulting statistic becomes approximately normal (Zar, 1999). Using the bootstrap resampling method, however, it is possible to calculate a confidence interval for the median. In this example we calculate the confidence interval for the median using two different data sets, one normally distributed and the other skewed, and then compare the bootstrap methods with each other. The normally distributed data set, called "population", has 1,000,000 observations, and we now generate another data set, called "population 2", with 1,000,000 observations from a skewed chi-square distribution. In the previous example the population had 100,000 observations; there is no technical reason for increasing the number of observations from 100,000 to 1,000,000, since both are large enough. The aim is simply to demonstrate how easily R can generate large data sets. First we compute a confidence interval for the median using the normally distributed population. We compute the bootstrap distributions and bootstrapped confidence intervals using the functions described in detail earlier, adding a new simple function called "median.func" to calculate the median within the boot() call. Then we compute the bootstrapped confidence intervals for the median using the skewed population.
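The sketch below approximates these steps with the boot package; it is a reconstruction rather than the article's own listing. The helper median.func follows the text, while the seeds, the sample size of 50 (carried over from the earlier example), and the chi-square degrees of freedom are assumptions.

```r
# Sketch of the median example (a reconstruction; seeds, n and df are assumptions)
library(boot)

set.seed(123)
population  <- rnorm(1000000, mean = 60, sd = 7)  # normally distributed population
population2 <- rchisq(1000000, df = 5)            # skewed chi-square population

# Statistic function for boot(): median of the resampled observations
median.func <- function(data, indices) median(data[indices])

# Median CI for a sample from the normally distributed population
set.seed(10)
norm.sample <- sample(population, 50)
boot.norm <- boot(data = norm.sample, statistic = median.func, R = 2000)
boot.ci(boot.norm, type = c("norm", "perc", "basic", "bca"))

# Median CI for a sample from the skewed population
set.seed(20)
skew.sample <- sample(population2, 50)
boot.skew <- boot(data = skew.sample, statistic = median.func, R = 2000)
boot.ci(boot.skew, type = c("norm", "perc", "basic", "bca"))
```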
Comparison of Bootstrapping Methods Regarding Distributions of Data Sets

Table 2 summarizes the confidence intervals for the normally distributed and skewed data sets for each bootstrapping method.

Table 2
Comparison of Confidence Intervals Regarding Bootstrapping Methods

                                                     Normally Distributed Data     Skewed Data
                                                     95% Confidence Interval       95% Confidence Interval
Method of Bootstrapping                              Lower Bound   Upper Bound     Lower Bound   Upper Bound
The normal interval method                           56.80         59.04           4.133         6.071
The percentile interval method                       56.87         59.35           4.081         5.865
The basic interval method                            56.66         59.14           4.295         6.078
The bias corrected and accelerated interval method   56.86         59.32           3.740         5.848

Table 2 shows that, for the normally distributed data set, all bootstrapping methods produce very similar confidence intervals; the only differences are at the first or second decimal place. For the skewed data set, on the other hand, the bootstrapping methods produce different confidence intervals. Therefore, with skewed data sets, different bootstrapping methods may produce different results. Before deciding which bootstrapping method to use, the assumptions of each method should be taken into consideration. The bias corrected and accelerated interval method requires no assumptions about the distribution of the data set, while the others do, so it is better to consider this method when computing confidence intervals for skewed data sets.

Bootstrapped Confidence Intervals for Cronbach's Alpha Coefficient

Cronbach's alpha coefficient is a commonly used indicator of reliability, especially in psychological tests; more precisely, it is an indicator of internal consistency. Many researchers use Cronbach's alpha coefficient for sets of items intended to construct a scale. Coefficient alpha (commonly called Cronbach's alpha) was developed by Lee Cronbach in 1951 to provide a measure of the internal consistency of a test or scale; it is expressed as a number between 0 and 1. Calculating alpha has become common practice because it is easier to obtain than other estimates (e.g. test-retest reliability estimates), as it requires only one test administration (Tavakol & Dennick, 2011). The item-total correlation is the correlation between an item and all other items, where the total of the other items is obtained by summing or averaging them (Banjanovic & Osborne, 2015). Moreover, computing confidence intervals for Cronbach's alpha coefficient and the item-total correlations provides a very good indication of the generalizability of the results.
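The Cronbach's alpha example follows the same boot() workflow as the earlier statistics. The sketch below is an added illustration, not the article's own listing: the simulated item responses, the helper names cronbach.alpha and alpha.func, and all parameter values are assumptions, and alpha is computed directly from its definition rather than with a dedicated psychometrics package.

```r
# Illustrative bootstrap of Cronbach's alpha (simulated data; helper names are assumptions)
library(boot)
library(MASS)                                   # mvrnorm() is used to simulate correlated items

set.seed(123)
k <- 10                                         # number of items
Sigma <- matrix(0.4, k, k); diag(Sigma) <- 1    # moderately correlated items
items <- as.data.frame(mvrnorm(200, mu = rep(0, k), Sigma = Sigma))

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of the total score)
cronbach.alpha <- function(df) {
  k <- ncol(df)
  (k / (k - 1)) * (1 - sum(apply(df, 2, var)) / var(rowSums(df)))
}

# Statistic function for boot(): resample respondents (rows) and recompute alpha
alpha.func <- function(data, indices) cronbach.alpha(data[indices, ])

boot.alpha <- boot(data = items, statistic = alpha.func, R = 2000)
boot.alpha
boot.ci(boot.alpha, type = c("norm", "perc", "basic", "bca"))
```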