Journal of Modern Applied Statistical Methods, Volume 12, Issue 2, Article 5 (11-1-2013)
November 2013, Vol. 12, No. 2, 82-104. ISSN 1538-9472. Copyright © 2013 JMASM, Inc.

Constructing Confidence Intervals for Effect Sizes in ANOVA Designs

Li-Ting Chen, Indiana University, Bloomington, IN
Chao-Ying Joanne Peng, Indiana University, Bloomington, IN

Recommended Citation: Chen, Li-Ting and Peng, Chao-Ying Joanne (2013) "Constructing Confidence Intervals for Effect Sizes in ANOVA Designs," Journal of Modern Applied Statistical Methods: Vol. 12 : Iss. 2 , Article 5. DOI: 10.22237/jmasm/1383278640. Available at: http://digitalcommons.wayne.edu/jmasm/vol12/iss2/5

Cover Page Footnote: This research was supported in part by the Maris M. Proffitt and Mary Higgins Proffitt Endowment Grant of Indiana University, awarded to the second author while the first author worked on the project as a research assistant.

A confidence interval for effect sizes provides a range of plausible population effect sizes (ES) that are consistent with data.
This article defines an ES as a standardized linear contrast of means. The noncentral method, Bonett’s method, and the bias-corrected and accelerated bootstrap method are illustrated for constructing the confidence interval for such an effect size. Results obtained from the three methods are discussed and interpretations of results are offered.

Keywords: Confidence interval, linear contrast, effect size, bootstrap, noncentral

Author note: Dr. Chen is a recent PhD graduate. Dr. Peng is a professor of inquiry methodology and adjunct professor of statistics in the Department of Counseling and Educational Psychology.

Introduction

The importance of reporting effect sizes (ESs) and confidence intervals (CIs) has been strongly emphasized in the debate over null hypothesis significance testing as a methodology in social science research (Cohen, 1994; McCartney & Rosenthal, 2000; Nix & Barnette, 1998; Schmidt, 1996, although see Sawilowsky & Yoon, 2002, in this journal for a contrary view). Cumming (2012) characterized the shift from reliance on null hypothesis significance testing to the use of ESs, CIs, and meta-analyses as new statistics. Thompson (2002) stated, “An improved quantitative science would emphasize the use of confidence intervals (CIs), and especially CIs for effect sizes” (p. 25), and argued that constructing CIs for ESs facilitates meta-analytic thinking and interpretation. Thompson explained that reporting CIs allows future researchers to incorporate prior knowledge into the estimation of the same population ES. Furthermore, a CI is directly related to the precision of ES estimates obtained from different studies. (See Knapp & Sawilowsky, 2001a, 2001b for a contrary view.)

Professional organizations such as the American Psychological Association (APA) and the American Educational Research Association (AERA) have both stressed the importance of reporting CIs for ESs, particularly since 1999.
According to the APA Task Force Report, “Interval estimates should be given for any effect sizes involving principal outcomes” (Wilkinson and the Task Force on Statistical Inference, 1999, p. 599). The fifth and sixth editions of the APA Publication Manual stress that “The inclusion of confidence intervals (for estimates of parameters, for functions of parameters such as differences in means, and for effect sizes) can be an extremely effective way of reporting results” (APA, 6th edition, 2010, p. 34). In addition, the sixth edition of the APA Publication Manual emphasizes, “Whenever possible, provide confidence interval for each effect size reported to indicate the precision of estimation of the effect size” (APA, 6th edition, 2010, p. 34). Likewise, the AERA’s Standards for Reporting on Empirical Social Science Research suggest that, “For each of the statistical results that is crucial to the logic of the design and analysis, there should be included: … An indication of the uncertainty of that index of effect size …” (AERA, 2006, p. 37). The sixth edition of the APA Publication Manual also explains why reporting confidence intervals is crucial: because “confidence intervals combine information on location and precision and can often be directly used to infer significance levels, they are, in general, the best reporting strategy” (p. 34). For ways to report CIs, the same APA manual states, “As a rule, it is best to use a single confidence level, specified on an a priori basis (e.g., a 95% or 99% confidence interval), throughout the manuscript. Wherever possible, base discussion and interpretation of results on point and interval estimates” (p. 34).

Despite these efforts, the reporting rate of CIs for ESs in empirical studies is still low (Odgaard & Fowler, 2010; Peng, Chen, Chiang, & Chiang, 2013).
This phenomenon may be due to a lack of understanding of the statistical properties of CIs for ESs, or to a lack of suitable algorithms for constructing CIs in commercial statistical software (e.g., SPSS, SAS). Thus, this article aims to present three methods and algorithms for constructing the CI for a standardized linear contrast of means in a one-way fixed-effects univariate ANOVA design. This article defines a standardized linear contrast of means as a measure of ES for fixed-effects ANOVA designs. The three methods are the noncentral method, Bonett’s method, and the bias-corrected and accelerated bootstrap method. To facilitate the understanding of standardized linear contrasts of means and to illustrate the three methods, a sleep deprivation example from Kirk (1995) is used. This example serves as a template for discussing the construction of the CI for a standardized linear contrast of means using the three methods.

A sleep deprivation example

This example examines the effects of sleep deprivation on hand-steadiness. According to Kirk (1995):

Assume an interest in the effects of sleep deprivation, treatment A, on hand-steadiness. The four levels of sleep deprivation of interest are 12, 18, 24, and 30 hours, which are denoted by $a_1$, $a_2$, $a_3$, and $a_4$, respectively. An experiment is conducted in which 32 subjects are randomly assigned to the four levels of sleep deprivation, with the restriction that eight subjects are assigned to each level. The dependent variable is the number of times during a 2-minute interval that a stylus makes contact with the side of a ½-inch hole (p. 166).

The independent variable is hours of sleep deprivation and the dependent variable is the number of times that a stylus held by a participant makes contact with the side of a ½-inch hole. The higher the number, the worse the performance, presumably affected by the deprivation of sleep.
Data gathered from this study are shown in Table 1.

Table 1. The number of times that a stylus held by a participant makes contact with a ½-inch hole during a 2-minute interval from the sleep deprivation sample.

Hours of Sleep Deprivation        12 hours   18 hours   24 hours   30 hours
Treatment Level                   a1         a2         a3         a4
                                  4          4          5          3
                                  6          5          6          5
                                  3          4          5          6
                                  3          3          4          5
                                  1          2          3          6
                                  3          3          4          7
                                  2          4          3          8
                                  2          3          4          10
Group Sizes ($n_j$)               8          8          8          8
Group Means ($\bar{Y}_j$)         3          3.5        4.25       6.25
Standard deviation ($\hat{\sigma}_j$)   1.51       0.93       1.04       2.12

Consider the hypothesis that a human’s fine motor skill decreases dramatically after being deprived of sleep for 24 hours or longer. Thus, interest lies in the contrast between the average performance of participants after 24 and 30 hours of sleep deprivation versus the average performance of participants after 12 and 18 hours. The linear contrast of means ($\psi$) is written as

$$\psi = 0.5 \times (\mu_{24\,\text{hours}} + \mu_{30\,\text{hours}}) - 0.5 \times (\mu_{12\,\text{hours}} + \mu_{18\,\text{hours}}) = \sum_{j=1}^{k} c_j \mu_j, \tag{1}$$

where $\mu_j$ is the population mean for the jth group, $k$ is the number of independent groups (= 4 for the sleep deprivation example), and $c_j$ is the coefficient or weight assigned to the jth group (= 0.5, 0.5, −0.5, and −0.5 for 24 hours, 30 hours, 12 hours, and 18 hours of sleep deprivation, respectively). Equation 1 and all subsequent equations are written specifically to suit the sleep deprivation example first, followed by a general formulation.

The value obtained from Equation 1 based on sampled data is an estimate of the corresponding population ES in original units. If a researcher wishes to standardize this ES, he/she needs to divide $\psi$ by a standardizer. Such a standardizer is usually the population standard deviation, assumed to be equal across groups and expressed as $\sigma$. For the specific $\psi$ defined in Equation 1, its standardized form ($\delta$) is written as follows:

$$\delta = \frac{\psi}{\sigma} = \frac{0.5 \times (\mu_{24\,\text{hours}} + \mu_{30\,\text{hours}}) - 0.5 \times (\mu_{12\,\text{hours}} + \mu_{18\,\text{hours}})}{\sigma} = \frac{\sum_{j=1}^{k} c_j \mu_j}{\sigma}. \tag{2}$$

Reporting a standardized linear contrast of means is more informative than reporting a linear contrast of means in original units, when (1) the original unit of the dependent variable is not familiar to readers, or (2) a researcher intends to compare ESs obtained from studies that employ different dependent variables.

The following three sections introduce three methods for constructing CIs for standardized linear contrasts of means as ESs. The three methods are the noncentral method, Bonett’s method, and the BCa (or the bias-corrected and accelerated bootstrap) method. After the CIs are obtained, the results are compared and proper interpretations of CIs in this context are discussed.

Methods

Noncentral Method

Within the null hypothesis significance testing framework, a linear contrast $\psi$ defined in Equation 1 is tested with a t-statistic defined as:

$$t = \frac{0.5 \times (\bar{Y}_{24\,\text{hours}} + \bar{Y}_{30\,\text{hours}}) - 0.5 \times (\bar{Y}_{12\,\text{hours}} + \bar{Y}_{18\,\text{hours}})}{\hat{\sigma} \times \sqrt{\dfrac{(0.5)^2 + (0.5)^2 + (-0.5)^2 + (-0.5)^2}{8}}} = \frac{\sum_{j=1}^{k} c_j \bar{Y}_j}{\hat{\sigma} \times \sqrt{\sum_{j=1}^{k} \dfrac{c_j^2}{n_j}}}, \tag{3}$$

where $\bar{Y}_j$ is the sample mean for the jth group (= 4.25, 6.25, 3.00, and 3.50 for 24, 30, 12, and 18 hours of sleep deprivation, respectively), $\hat{\sigma}$ is the pooled standard deviation that is used to estimate the equal population standard deviation ($\sigma$) (= $\sqrt{(1.51^2 + 0.93^2 + 1.04^2 + 2.12^2)/4} = 1.48$), and $n_j$ is the sample size for the jth group (= 8 for each of the four groups in the sleep deprivation example).

Under the null hypothesis that the linear contrast of means equals 0, the t statistic is distributed as a symmetric central t distribution with a mean of 0. When the null hypothesis is false (meaning the population linear contrast of means does not equal 0), the t statistic follows a noncentral t distribution that is centered approximately at the noncentrality parameter $\lambda$ when the degrees of freedom are large (see Cumming & Finch, 2001).
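The quantities in Equation 3 can be reproduced directly from the raw scores in Table 1. The following is a minimal Python sketch, not the authors' SAS macro; the names `scores`, `coef`, and `contrast_t` are illustrative assumptions. It pools the within-group sums of squares, which for equal group sizes matches the simple average of variances shown in the text.

```python
from math import sqrt
from statistics import mean, variance

# Table 1 scores, keyed by hours of sleep deprivation.
scores = {
    12: [4, 6, 3, 3, 1, 3, 2, 2],
    18: [4, 5, 4, 3, 2, 3, 4, 3],
    24: [5, 6, 5, 4, 3, 4, 3, 4],
    30: [3, 5, 6, 5, 6, 7, 8, 10],
}
# Contrast coefficients c_j from Equation 1.
coef = {12: -0.5, 18: -0.5, 24: 0.5, 30: 0.5}


def contrast_t(scores, coef):
    """Return (psi_hat, pooled_sd, t, df) for a linear contrast of means."""
    n = {g: len(y) for g, y in scores.items()}
    df = sum(n.values()) - len(scores)  # participants minus groups
    psi_hat = sum(coef[g] * mean(y) for g, y in scores.items())
    # Pooled variance: within-group sums of squares divided by df.
    pooled_var = sum((n[g] - 1) * variance(y) for g, y in scores.items()) / df
    pooled_sd = sqrt(pooled_var)
    # sqrt of sum(c_j^2 / n_j), the scaling term in Equations 3 and 4.
    scale = sqrt(sum(coef[g] ** 2 / n[g] for g in scores))
    return psi_hat, pooled_sd, psi_hat / (pooled_sd * scale), df


psi_hat, pooled_sd, t_obs, df = contrast_t(scores, coef)
print(f"psi_hat={psi_hat:.2f} pooled_sd={pooled_sd:.2f} t={t_obs:.2f} df={df}")
```

Dividing `psi_hat` by `pooled_sd` gives the sample estimate of δ in Equation 2.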
The noncentral t distribution has two parameters: the degrees of freedom (df = the number of participants minus the number of independent groups) and $\lambda$. When $\lambda$ is zero, the noncentral t distribution is the central t distribution, or simply the t distribution. One way to construct the CI for $\delta$ defined in Equation 2 is to use the noncentral t distribution. The noncentrality parameter $\lambda$ of the noncentral t distribution is related to $\delta$ as follows,

$$\delta = \lambda \times \sqrt{\frac{0.5^2}{8} + \frac{0.5^2}{8} + \frac{(-0.5)^2}{8} + \frac{(-0.5)^2}{8}} = \lambda \times \sqrt{\sum_{j=1}^{k} \frac{c_j^2}{n_j}}. \tag{4}$$

And $\lambda$ is defined as follows,

$$\lambda = \frac{0.5 \times (\mu_{24\,\text{hours}} + \mu_{30\,\text{hours}}) - 0.5 \times (\mu_{12\,\text{hours}} + \mu_{18\,\text{hours}})}{\sigma \times \sqrt{\dfrac{0.5^2}{8} + \dfrac{0.5^2}{8} + \dfrac{(-0.5)^2}{8} + \dfrac{(-0.5)^2}{8}}} = \frac{\delta}{\sqrt{\dfrac{0.5^2}{8} + \dfrac{0.5^2}{8} + \dfrac{(-0.5)^2}{8} + \dfrac{(-0.5)^2}{8}}} = \frac{\sum_{j=1}^{k} c_j \mu_j}{\sigma \times \sqrt{\sum_{j=1}^{k} \dfrac{c_j^2}{n_j}}}. \tag{5}$$

Steiger and Fouladi (1997) illustrated how to derive $\lambda$ from the observed t statistic obtained from a sample. From $\lambda$, using Equation 4, $\delta$ can be derived. To construct a 95% confidence interval for $\delta$, first compute the lower and the upper limits of $\lambda$ from the observed t statistic. The lower limit for $\lambda$ is the noncentrality parameter of the noncentral t distribution in which the observed t statistic is at the 97.5th percentile. The upper limit for $\lambda$ is the noncentrality parameter of the noncentral t distribution in which the observed t statistic is at the 2.5th percentile. From the two limits of $\lambda$, the limits for $\delta$ can be derived.

The use of noncentral distributions in constructing the CI for ESs involves a sequence of iterations. In recent years, the computational difficulty for the noncentral t distribution has been overcome by algorithmic improvement. For example, the lower and upper limits of $\lambda$ can be obtained in SAS® with the following syntax:

    lamda_lower=TNONCT (t_observed, df , .975);
    lamda_upper=TNONCT (t_observed, df , .025);

The df for the current example is 32 − 4 = 28.
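For readers without access to SAS®, the same search for the limits of λ can be sketched in Python using only the standard library. This is an illustrative stand-in for `TNONCT`, not its actual algorithm: it evaluates the noncentral t cumulative probability by Simpson's rule over the chi-square density (assuming df > 2) and finds λ by bisection. The function names `nct_cdf` and `solve_lambda`, the integration range, and the bisection bracket are all assumptions of this sketch; the observed t is recomputed from the rounded summaries in Table 1 rather than quoted from the article.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def nct_cdf(t, df, lam, steps=2000):
    """P(T <= t) for a noncentral t variate, by Simpson's rule (assumes df > 2)."""
    upper = df + 12.0 * sqrt(2.0 * df)  # chi-square mass beyond this is negligible
    h = upper / steps
    log_norm = (df / 2.0) * log(2.0) + lgamma(df / 2.0)

    def integrand(v):
        if v == 0.0:
            return 0.0  # chi-square density vanishes at 0 when df > 2
        log_pdf = (df / 2.0 - 1.0) * log(v) - v / 2.0 - log_norm
        # T <= t  <=>  Z <= t * sqrt(V / df) - lam, with V ~ chi-square(df)
        return PHI(t * sqrt(v / df) - lam) * exp(log_pdf)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def solve_lambda(t, df, prob, half_width=10.0, tol=1e-8):
    """Bisection for the lam at which t sits at quantile `prob`.

    The CDF decreases in lam; the bracket t +/- half_width is an assumption
    that suits moderate t values such as the one in this example.
    """
    lo, hi = t - half_width, t + half_width
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if nct_cdf(t, df, mid) > prob:
            lo = mid  # probability still too large: lam must increase
        else:
            hi = mid
    return (lo + hi) / 2.0


# Summary statistics from Table 1 (rounded as printed there).
means = {12: 3.0, 18: 3.5, 24: 4.25, 30: 6.25}
sds = {12: 1.51, 18: 0.93, 24: 1.04, 30: 2.12}
coef = {12: -0.5, 18: -0.5, 24: 0.5, 30: 0.5}
n, df = 8, 28

scale = sqrt(sum(c ** 2 / n for c in coef.values()))      # Equation 4 scaling term
pooled_sd = sqrt(sum(s ** 2 for s in sds.values()) / 4)   # equal-n pooled SD
t_obs = sum(coef[g] * means[g] for g in means) / (pooled_sd * scale)

lam_lower = solve_lambda(t_obs, df, 0.975)
lam_upper = solve_lambda(t_obs, df, 0.025)
print(f"lambda: [{lam_lower:.3f}, {lam_upper:.3f}]")
print(f"delta:  [{lam_lower * scale:.3f}, {lam_upper * scale:.3f}]")
```

Multiplying each limit of λ by `scale` applies Equation 4 to obtain the corresponding limit for δ. A dedicated routine such as `TNONCT` should be preferred in practice; the numerical integration here only makes the logic of the search visible.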
Once the lower limit and the upper limit of $\lambda$ are obtained from SAS®, the lower limit and the upper limit of $\delta$ can be computed from the following according to Equation 4:

$$\delta_{lower} = \lambda_{lower} \times \sqrt{\frac{0.5^2}{8} + \frac{0.5^2}{8} + \frac{(-0.5)^2}{8} + \frac{(-0.5)^2}{8}} = \lambda_{lower} \times \sqrt{\sum_{j=1}^{k} \frac{c_j^2}{n_j}}, \text{ and} \tag{6}$$

$$\delta_{upper} = \lambda_{upper} \times \sqrt{\frac{0.5^2}{8} + \frac{0.5^2}{8} + \frac{(-0.5)^2}{8} + \frac{(-0.5)^2}{8}} = \lambda_{upper} \times \sqrt{\sum_{j=1}^{k} \frac{c_j^2}{n_j}}. \tag{7}$$

Applying the noncentral method for constructing the CI for a standardized linear contrast of means is discussed in the literature (Cumming & Finch, 2001; Kline, 2004; Steiger, 2004). Liu (2010) illustrated the geometric meaning of the noncentrality parameter for a linear contrast in a Euclidean space. Kelley and Rausch (2006) and Lai and Kelley (2012) considered the sample size required to achieve the desired accuracy in CI estimations. Readers should note that there are two statistical assumptions associated with the noncentral confidence intervals for $\delta$. These two assumptions are (1) normality for each population distribution and (2) equal variances for all population distributions.

The SAS® macro “cinoncentral” (See Appendix A) yields the noncentral CI for a standardized linear contrast of means ($\delta$). To execute this SAS® macro, readers first create a SAS data set in the DATA step of SAS®, or import the data into SAS®. This step is followed by the specification of a level of confidence, such as .95, and a coefficient for each group.

Bonett’s Method

Bonett (2008) proposed a more general definition of the standardized linear contrast of means in order to deal with unequal variances across populations:

$$\delta_{Bonett} = \frac{0.5 \times (\mu_{24\,\text{hours}} + \mu_{30\,\text{hours}}) - 0.5 \times (\mu_{12\,\text{hours}} + \mu_{18\,\text{hours}})}{\sqrt{\dfrac{\sigma^2_{24\,\text{hours}} + \sigma^2_{30\,\text{hours}} + \sigma^2_{12\,\text{hours}} + \sigma^2_{18\,\text{hours}}}{4}}} = \frac{\sum_{j=1}^{k} c_j \mu_j}{\sqrt{\dfrac{\sum_{j=1}^{k} \sigma_j^2}{k}}} = \frac{\psi}{\sqrt{\dfrac{\sum_{j=1}^{k} \sigma_j^2}{k}}}. \tag{8}$$

It is worth noting that, when population variances are equal (i.e., $\sigma^2_{24\,\text{hours}} = \sigma^2_{30\,\text{hours}} = \sigma^2_{12\,\text{hours}} = \sigma^2_{18\,\text{hours}} = \sigma^2$),

$$\delta_{Bonett} = \frac{0.5 \times (\mu_{24\,\text{hours}} + \mu_{30\,\text{hours}}) - 0.5 \times (\mu_{12\,\text{hours}} + \mu_{18\,\text{hours}})}{\sqrt{\dfrac{4\sigma^2}{4}}} = \frac{0.5 \times (\mu_{24\,\text{hours}} + \mu_{30\,\text{hours}}) - 0.5 \times (\mu_{12\,\text{hours}} + \mu_{18\,\text{hours}})}{\sigma} = \frac{\sum_{j=1}^{k} c_j \mu_j}{\sigma} = \delta. \tag{9}$$

In other words, $\delta$ is a special case of $\delta_{Bonett}$ when population variances are all equal. Based on a large sample approximation, Bonett derived the CI for $\delta_{Bonett}$ as follows:

$$\hat{\delta}_{Bonett} \pm z_{critical} \left[\operatorname{var}(\hat{\delta}_{Bonett})\right]^{1/2}, \tag{10}$$

where $z_{critical}$ is the critical value from the standard normal distribution, $\hat{\delta}_{Bonett}$ is the sample estimate for the corresponding population $\delta_{Bonett}$, and $\operatorname{var}(\hat{\delta}_{Bonett})$ is the sample variance of $\hat{\delta}_{Bonett}$. The sample estimate for Bonett’s standardized linear contrast of means, i.e., $\hat{\delta}_{Bonett}$, is obtained from the following equation:

$$\hat{\delta}_{Bonett} = \frac{0.5 \times (\bar{Y}_{24\,\text{hours}} + \bar{Y}_{30\,\text{hours}}) - 0.5 \times (\bar{Y}_{12\,\text{hours}} + \bar{Y}_{18\,\text{hours}})}{\sqrt{\dfrac{\hat{\sigma}^2_{24\,\text{hours}} + \hat{\sigma}^2_{30\,\text{hours}} + \hat{\sigma}^2_{12\,\text{hours}} + \hat{\sigma}^2_{18\,\text{hours}}}{4}}} = \frac{\sum_{j=1}^{k} c_j \bar{Y}_j}{\sqrt{\dfrac{\sum_{j=1}^{k} \hat{\sigma}_j^2}{k}}} = \frac{\sum_{j=1}^{k} c_j \bar{Y}_j}{\hat{\sigma}_{Bonett}}. \tag{11}$$

It is worth noting that when sample sizes are all equal, $\hat{\sigma}_{Bonett} = \hat{\sigma}$. And $\operatorname{var}(\hat{\delta}_{Bonett})$ is obtained from the following equation:

$$\operatorname{var}(\hat{\delta}_{Bonett}) = \frac{\hat{\delta}^2_{Bonett}}{4^2\,\hat{\sigma}^4_{Bonett}} \times \left[\frac{\hat{\sigma}^4_{24\,\text{hours}}}{2 \times (8-1)} + \frac{\hat{\sigma}^4_{30\,\text{hours}}}{2 \times (8-1)} + \frac{\hat{\sigma}^4_{12\,\text{hours}}}{2 \times (8-1)} + \frac{\hat{\sigma}^4_{18\,\text{hours}}}{2 \times (8-1)}\right] + \frac{\dfrac{(0.5)^2 \times \hat{\sigma}^2_{24\,\text{hours}}}{(8-1)} + \dfrac{(0.5)^2 \times \hat{\sigma}^2_{30\,\text{hours}}}{(8-1)} + \dfrac{(-0.5)^2 \times \hat{\sigma}^2_{12\,\text{hours}}}{(8-1)} + \dfrac{(-0.5)^2 \times \hat{\sigma}^2_{18\,\text{hours}}}{(8-1)}}{\hat{\sigma}^2_{Bonett}}$$

$$= \frac{\hat{\delta}^2_{Bonett}}{k^2\,\hat{\sigma}^4_{Bonett}} \sum_{j=1}^{k} \frac{\hat{\sigma}_j^4}{2 \times df_j} + \frac{\sum_{j=1}^{k} \dfrac{c_j^2\,\hat{\sigma}_j^2}{df_j}}{\hat{\sigma}^2_{Bonett}}, \tag{12}$$

where $df_j$ = the number of participants in the jth group minus 1.

Bonett’s method assumes normality, but not equal variances for the population distributions. When population variances are equal, $\delta$ becomes a special case of $\delta_{Bonett}$. The SAS® macro “cibonett” (See Appendix B) yields Bonett’s CI for a standardized linear contrast of means ($\delta_{Bonett}$). To execute this SAS® macro, readers first create a SAS data set in the DATA step of SAS®, or import the data into SAS®. This step is followed by the specification of a level of confidence, such as .95, and a coefficient for each group.
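Because Bonett's interval is a closed-form, large-sample expression, Equations 10 through 12 translate almost line for line into code. The following Python sketch is an illustration under the formulas as stated above, not a port of the “cibonett” macro; the function name `bonett_ci` and its argument layout are assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def bonett_ci(groups, coefs, level=0.95):
    """Large-sample CI for Bonett's standardized contrast (Equations 10-12)."""
    k = len(groups)
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]   # unbiased sample variances
    dfs = [len(g) - 1 for g in groups]          # df_j = n_j - 1
    sd_bonett = sqrt(sum(variances) / k)        # unweighted average variance
    # Equation 11: contrast of sample means over Bonett's standardizer.
    delta_hat = sum(c * m for c, m in zip(coefs, means)) / sd_bonett
    # Equation 12: estimated variance of delta_hat.
    var_delta = (
        delta_hat ** 2 / (k ** 2 * sd_bonett ** 4)
        * sum(v ** 2 / (2 * d) for v, d in zip(variances, dfs))
        + sum(c ** 2 * v / d for c, v, d in zip(coefs, variances, dfs))
        / sd_bonett ** 2
    )
    # Equation 10: normal-theory limits at the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * sqrt(var_delta)
    return delta_hat, delta_hat - half, delta_hat + half


# Table 1 scores in the order 12, 18, 24, 30 hours, with matching coefficients.
groups = [
    [4, 6, 3, 3, 1, 3, 2, 2],
    [4, 5, 4, 3, 2, 3, 4, 3],
    [5, 6, 5, 4, 3, 4, 3, 4],
    [3, 5, 6, 5, 6, 7, 8, 10],
]
coefs = [-0.5, -0.5, 0.5, 0.5]

delta_hat, lower, upper = bonett_ci(groups, coefs, level=0.95)
print(f"delta_hat={delta_hat:.3f}  95% CI=[{lower:.3f}, {upper:.3f}]")
```

With equal group sizes, as in this example, `sd_bonett` coincides with the pooled standard deviation used in the noncentral method, so the two methods share the same point estimate and differ only in how the limits are formed.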