Investigation and Treatment of Missing Item Scores in Test and Questionnaire Data

Klaas Sijtsma and L. Andries van der Ark
Tilburg University

Published in: Multivariate Behavioral Research, 38(4), 505-528. Publication date: 2003. Publisher: Psychology Press. Document version: publisher's PDF, also known as version of record (Tilburg University Research Portal).

Citation for published version (APA): Sijtsma, K., & van der Ark, L. A. (2003). Investigation and treatment of missing scores in test and questionnaire data. Multivariate Behavioral Research, 38(4), 505-528.

Publication details: http://www.informaworld.com/smpp/title~content=t775653673
Online publication date: 01 January 2003
DOI: 10.1207/s15327906mbr3804_4
URL: http://dx.doi.org/10.1207/s15327906mbr3804_4
Copyright © 2003, Lawrence Erlbaum Associates, Inc.

This article first discusses a statistical test for investigating whether or not the pattern of missing scores in a respondent-by-item data matrix is random. Since this is an asymptotic test, we investigate whether it is useful in small but realistic sample sizes. Then, we discuss two known simple imputation methods, person mean (PM) and two-way (TW) imputation, and we propose two new imputation methods, response-function (RF) and mean response-function (MRF) imputation.
These methods are based on few assumptions about the data structure. An empirical data example with simulated missing item scores shows that the new method RF was superior to the methods PM, TW, and MRF in recovering from incomplete data several statistical properties of the original complete data. Methods TW and RF are useful both when item score missingness is ignorable and nonignorable.

Correspondence concerning this article should be addressed to Klaas Sijtsma, Department of Methodology and Statistics, FSW, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands; e-mail: [email protected]

Introduction

A well-known problem in data collection using tests and questionnaires is that several item scores may be missing from the n respondents by J items data matrix, X. This may occur for several reasons, often unknown to the researcher. For example, the respondent may have missed a particular item, missed a whole page of items, saved the item for later and then forgot about it, did not know the answer and then left it open, became bored while making the test or questionnaire and skipped a few items, felt the item was embarrassing (e.g., questions about one's sexual habits), threatening (questions about the relationship with one's children), or intrusive to privacy (questions about one's income and consumer habits), or felt otherwise uneasy and reluctant to answer.

The literature is abundant with methods for handling missing data. For example, Little and Schenker (1995) and Smits, Mellenbergh, and Vorst (2002) discuss and compare a large number of simple and more advanced methods. Several methods are rather involved and, as a result, sometimes perhaps beyond the reach of individual psychological and educational researchers who are not trained statisticians or psychometricians. One example is the EM method (Dempster, Laird, & Rubin, 1977; Rubin, 1991) that alternately estimates the missing data, then updates the parameter estimates of interest, uses these to re-estimate the missing data, and so on, until the algorithm converges to, for example, maximum likelihood estimates. Another example is multiple imputation (e.g., Little & Rubin, 1987). Here, w complete data matrices are estimated by imputing, for a respondent having missing data, for example, scores of sets of other respondents with complete data that are similar to the respondent's available data. Then, statistics based on the w (usually a surprisingly small number; see Rubin, 1991) complete data matrices are averaged to obtain parameter estimates and standard errors. Data augmentation (Schafer, 1997; Tanner & Wong, 1987) is an iterative Bayesian procedure that resembles the EM method and also incorporates features of multiple imputation (Little & Schenker, 1995).

Our starting point was that many researchers do not have a statistician or a psychometrician in their vicinity who is available to help them implement these superior but complex and involved missing data handling methods. Those researchers may be better off using simpler methods that are easy to implement and lead to results approaching the quality of EM and multiple imputation. A circumstance favorable for these simpler methods to succeed is that the items in a test measure the same underlying ability or trait and, thus, the observed item scores contain much information about the missing item scores. This helps to obtain reasonable estimates of missing item scores, even with simple methods.
However, first we investigated whether an asymptotic statistical test (Huisman, 1999) for the hypothesis that the pattern of missing item scores in a data matrix X is random (to be explained later on) is useful in small but realistic sample sizes. This test may be seen as a useful precursor for item score imputation: When its conclusion is that item score missingness is random, the researcher can safely use a sensible item score imputation method to produce a complete data matrix. When item score missingness is not random, imputation methods must be robust so as to produce a data matrix that is not heavily biased. We investigated this robustness issue in a real data example for four imputation methods. Two simple methods were known (e.g., Bernaards & Sijtsma, 2000), and two others were new proposals based on concepts from item response theory (IRT), but without using strong assumptions about the data structure.

Before we continue, it may be noted that a purely statistical approach to the missing data problem may be too simple in some cases. For example, when one item produces most of the missing scores then, depending on the research context, the item may simply be deleted from further research (e.g., it was printed on the back of the page and therefore missed by many), it may be reformulated (e.g., positively worded instead of negatively, which caused confusion) in future research, or it may be replaced (e.g., respondents did not understand what was asked of them). Thus, the statistical treatment of missing item scores should be considered in combination with other courses of action.
Types of Missing Item Scores

The next example item was taken from a questionnaire that measures people's tendency to cry (Vingerhoets & Cornelius, 2001):

I cry when I experience opposition from someone else
Never □ □ □ □ □ □ □ Always

In general, for a particular respondent or group of respondents nonresponse may depend on:

1. The missing value on that item. For example, belonging to the right-most "Always" group may imply a stronger nonresponse tendency than belonging to the left-most "Never" group. Consequently, any missing data method based on available item scores would underestimate the missing value.
2. Values of the other observed items or covariates. For example, for men it may be more difficult to give a rating in the three boxes to the right (showing endorsement or partial endorsement) than for women. Thus, gender has a relation with item score missingness and this can be used for estimating the missing item scores.
3. Values of variables that were not part of the investigation. For example, nonresponse may depend on the unobserved verbal comprehension level of the respondents or on their general intelligence. This kind of missingness is relevant only if the unobserved variables are related to the observed variables, and have an impact on the answers to the items in the test.

Item scores are missing completely at random (MCAR; see Little & Rubin, 1987, pp. 14-17) if the cause of missingness is unrelated to the missing values themselves, the scores on the other observed items and the observed covariates, and the scores on unobserved variables. Thus, item score missingness is ignorable because the observed data are a random sample from the complete data.
After listwise deletion, statistical analysis of the resulting smaller data set results in less statistical accuracy and less power when testing hypotheses, but unbiased parameter estimates.

When nonresponse depends on another variable from the data set, but not on values of the item itself or on unobserved variables, item scores are missing at random (MAR; see Little & Rubin, 1987, pp. 14-17). For example, men may find it more difficult to answer "always" to the example item than women, resulting in more missing item scores for men. The distributions of item scores are different between men and women, but the distributions are the same for respondents and nonrespondents in both groups. Note that within the groups of men and women we have MCAR (given that no other variables relate to item score missingness). This means that if, for example, a regression analysis contains gender as a dummy variable the estimates of the regression coefficients for both groups are unbiased. Thus, when missingness is of the MAR type it is also ignorable.

When missingness is not MCAR or MAR, the observed data are not a random sample from the original sample or from subsamples. Thus, the missingness is nonignorable. In practice, a researcher can only observe that item scores are missing. To decide whether item score missingness is ignorable or nonignorable, he/she has to rely on the pattern of item score missingness in the data matrix, X. When he/she finds no relationships to other observed variables, he/she may decide that the missingness is of the MCAR type. When a relationship to other observed variables is found, he/she may use these variables as covariates in multivariate analyses or to impute scores. When a more complex pattern of relationships is found, item score missingness may be considered nonignorable.
A reasonable solution is to impute scores when the imputation method is backed up by robustness studies (e.g., Bernaards & Sijtsma, 2000, for factor analysis of rating scale data; and Huisman & Molenaar, 2001, in the context of test construction).

Missing Item Score Analysis

Theory for Analysis of the Whole Data Matrix

The scores on the J items are collected in J random variables $X_j$, j = 1, ..., J. For respondent i (i = 1, ..., n), the J item scores, $X_{ij}$, have realizations $x_{ij}$. Let $M_{ij}$ be an indicator of a missing score with realization $m_{ij}$; $m_{ij} = 0$ if $X_{ij}$ is observed and $m_{ij} = 1$ if $X_{ij}$ is missing. These missingness indicators are collected in an n × J matrix M.

Huisman (1999; Kim & Curry, 1978) investigated whether or not the pattern of missingness in the data matrix X is unrelated among items. This is called random missingness and is defined as follows. Frequency counts of observed missing scores and expected missing scores are compared, given statistical independence of the missingness between the items. Thus, whether a respondent misses the score on item j is unrelated to whether he (or she) misses the score on item k. Items j and k may have different proportions of missing scores. A more restricted assumption, to be used later on, is that the proportions for all J items are equal, as is typical of MCAR. It may be noted that MCAR implies random missingness.

Huisman (1999) classifies each respondent in the sample into one of J + 2 classes: (a) NM (No Missing): none of the item scores in a pattern are missing; (b) $M_j$ (Missing on item j): a score is missing only on item j; and (c) MM (Multiple Missings): scores are missing on at least two items.

Let $q_j = \sum_i M_{ij} / n$ be the proportion of missing values on item j in the sample and let $p_j = 1 - q_j$ be the proportion of observed values on item j.
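As a minimal sketch of the bookkeeping defined above, the following Python code builds the missingness indicator matrix M, the proportions $q_j$ and $p_j$, and the observed counts for Huisman's J + 2 classes. The data are the artificial 8 × 5 matrix of the article's Table 1 as read from the extracted text, with None standing in for a blank. The function names are illustrative assumptions and do not come from the article.

```python
# Artificial data matrix (n = 8 respondents, J = 5 items) as read from Table 1.
# None marks a missing item score; this encoding is an assumption of the sketch.
X = [
    [2, 1, 1, None, None],
    [3, 5, 4, 5, 5],
    [4, 3, None, 3, 4],
    [1, 1, 1, 3, 2],
    [None, 3, 3, None, 4],
    [5, 5, 3, None, 5],
    [1, 3, 2, 2, 2],
    [3, 3, 1, 2, None],
]


def missingness_indicators(data):
    """Return M with m_ij = 1 if the score is missing and 0 if it is observed."""
    return [[1 if score is None else 0 for score in row] for row in data]


def missing_proportions(indicators):
    """Return q_j, the proportion of missing scores on each item j."""
    n = len(indicators)
    n_items = len(indicators[0])
    return [sum(row[j] for row in indicators) / n for j in range(n_items)]


def classify_respondents(indicators):
    """Count respondents in the classes NM, M_j (one count per item), and MM."""
    n_items = len(indicators[0])
    o_nm, o_mm = 0, 0
    o_m = [0] * n_items
    for row in indicators:
        n_missing = sum(row)
        if n_missing == 0:
            o_nm += 1                  # NM: no score missing
        elif n_missing == 1:
            o_m[row.index(1)] += 1     # M_j: missing only on item j
        else:
            o_mm += 1                  # MM: missing on at least two items
    return o_nm, o_m, o_mm


M = missingness_indicators(X)
q = missing_proportions(M)
p = [1 - q_j for q_j in q]
O_NM, O_M, O_MM = classify_respondents(M)

print("q_j:", q)
print("p_j:", p)
print("O(NM) =", O_NM, " O(M_j) =", O_M, " O(MM) =", O_MM)
```

A respondent with a single missing score is counted under that item only, so an item can have $q_j > 0$ and still have an observed $M_j$ count of zero when all of its missing scores belong to respondents in class MM.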
Then, under the assumption of random missingness (as defined above), the expected values for NM, $M_j$, and MM are

$$E(NM) = n \prod_{j=1}^{J} p_j;$$

$$E(M_j) = \frac{q_j}{p_j} E(NM); \text{ and}$$

$$E(MM) = n - E(NM) - \sum_{j=1}^{J} E(M_j).$$

The observed frequencies in these J + 2 classes are denoted by O(NM), O($M_j$), and O(MM). Under the assumption of random missingness Pearson's chi-squared statistic,

$$X^2 = \frac{[O(NM) - E(NM)]^2}{E(NM)} + \sum_{j=1}^{J} \frac{[O(M_j) - E(M_j)]^2}{E(M_j)} + \frac{[O(MM) - E(MM)]^2}{E(MM)}, \tag{1}$$

has a $\chi^2$ distribution with J + 1 degrees of freedom as $n \to \infty$ (see, e.g., Agresti, 1990, pp. 44-45).

For n = 8, Table 1 shows an incomplete data matrix X and the corresponding missingness indicator matrix, M. This example is used to calculate the $X^2$ statistic (Equation 1). Because $p_2 = 1$, we have that $E(M_2) = 0$; this is a structural zero, which is ignored in the computation of $X^2$ at the cost of one degree of freedom. Table 2 shows the observed and the expected frequencies that result in $X^2 = 1.65$ (df = 5). Given the small sample size, it makes no sense to draw any inferences on the basis of the outcome.

Table 1
Artificial Data Matrix X Containing Missing Scores (Shown as -), and Corresponding Missingness Indicator Matrix M

        Variables                       Missingness Indicators
Case    X1    X2    X3    X4    X5      M1    M2    M3    M4    M5
1        2     1     1     -     -       0     0     0     1     1
2        3     5     4     5     5       0     0     0     0     0
3        4     3     -     3     4       0     0     1     0     0
4        1     1     1     3     2       0     0     0     0     0
5        -     3     3     -     4       1     0     0     1     0
6        5     5     3     -     5       0     0     0     1     0
7        1     3     2     2     2       0     0     0     0     0
8        3     3     1     2     -       0     0     0     0     1
q_j                                    .125    .0  .125  .375   .25
p_j                                    .875   1.0  .875  .625   .75

Table 2
Expected and Observed Frequencies for the Data in Table 1

Frequency    Expected    Observed
NM           2.87        3
M_1          0.41        0
M_3          0.41        1
M_4          1.72        1
M_5          0.96        1
MM           1.63        2

Robustness of X2 Statistic for Small Samples

Problem Definition. The robustness of Huisman's (1999) asymptotic test for small (realistic) samples is important. For similar expected frequencies in each of the J + 1 classes, Koehler and Larntz (1980) found that statistic $X^2$ approximates a chi-squared distribution when $n > \sqrt{10 \times (J + 1)}$, given that n > 10 and J > 2. This rule does not apply when expected frequencies are dissimilar, as in Huisman's derivation of the expected frequencies assuming random missingness. Now, if we assume the stronger null-hypothesis of MCAR, under Huisman's classification the expected frequencies depend on the mean proportion of missing values, $q = \sum_j q_j / J$, and test length, J, resulting in

$$\begin{aligned}
E(NM) &= n(1 - q)^J, \\
E(M_j) &= nq(1 - q)^{J-1}, \text{ and} \\
E(MM) &= n\left[1 - (1 - q)^J - Jq(1 - q)^{J-1}\right].
\end{aligned} \tag{2}$$

Note that as with Koehler and Larntz's study the $E(M_j)$s are all equal, but that the other two expected frequencies are different from this value. Because of this dissimilarity, we investigated whether the conditions given by Koehler and Larntz for $X^2$ to approximate a chi-squared statistic also hold here.

Simulation Study on Robustness. For different combinations of n, q, and J (i.e., n = 10, 20, 50, 100, 200, 500, 1000, 2000; q = 0.01, 0.05, 0.10; and J = 10, 20), missingness indicator matrices, M, were simulated. The elements of M were drawn from the multinomial distribution with probabilities based on Equation 2.
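The calculations in this section can be sketched in a few lines of Python. The code below assumes the proportions $q_j$ and the observed class counts of Tables 1 and 2, computes the expected frequencies under random missingness and the statistic of Equation 1, and also evaluates the class probabilities of Equation 2 under MCAR. Function names are illustrative assumptions, and the sketch assumes every item has at least one observed score ($p_j > 0$).

```python
import math


def expected_frequencies(n, q):
    """Expected counts for NM, each M_j, and MM under random missingness."""
    p = [1 - q_j for q_j in q]
    e_nm = n * math.prod(p)
    # An item with q_j = 0 yields E(M_j) = 0, a structural zero.
    e_m = [q_j / p_j * e_nm for q_j, p_j in zip(q, p)]
    e_mm = n - e_nm - sum(e_m)
    return e_nm, e_m, e_mm


def huisman_x2(o_nm, o_m, o_mm, e_nm, e_m, e_mm):
    """Pearson X^2 over the J + 2 classes, skipping structural zeros."""
    cells = [(o_nm, e_nm), (o_mm, e_mm)]
    cells += [(o, e) for o, e in zip(o_m, e_m) if e > 0]
    x2 = sum((o - e) ** 2 / e for o, e in cells)
    df = len(cells) - 1  # J + 1, minus one for each structural zero
    return x2, df


def mcar_class_probabilities(q, n_items):
    """Class probabilities E(NM)/n, E(M_j)/n, E(MM)/n from Equation 2."""
    pr_nm = (1 - q) ** n_items
    pr_mj = q * (1 - q) ** (n_items - 1)
    pr_mm = 1 - pr_nm - n_items * pr_mj
    return pr_nm, pr_mj, pr_mm


# Worked example: q_j and observed counts taken from Tables 1 and 2.
n = 8
q = [0.125, 0.0, 0.125, 0.375, 0.25]
o_nm, o_m, o_mm = 3, [0, 0, 1, 1, 1], 2

e_nm, e_m, e_mm = expected_frequencies(n, q)
x2, df = huisman_x2(o_nm, o_m, o_mm, e_nm, e_m, e_mm)
print(f"E(NM) = {e_nm:.2f}, E(MM) = {e_mm:.2f}")
print(f"X2 = {x2:.2f}, df = {df}")  # the article reports X2 = 1.65 (df = 5)

pr_nm, pr_mj, pr_mm = mcar_class_probabilities(0.01, 10)
print(f"q = .01, J = 10: E(NM)/n = {pr_nm:.4f}, E(M_j)/n = {pr_mj:.4f}")
```

Defining E(MM) as the remainder guarantees that the expected counts sum to n, which is why the structural zero for item 2 can be dropped from the sum without rescaling the other cells.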
Table 3 shows the multinomial distributions of the expected scores for q = 0.01, 0.05, 0.10; and J = 10, 20 (these distributions are the same for different n). The last two rows give evenly distributed classes, corresponding to Koehler and Larntz's (1980) study. The last two columns give the sample sizes needed such that the Type I error rate approximates well the nominal significance level, $\alpha = 0.05$, under a chi-squared distribution. Column $n_{accurate}$ gives the sample sizes that resulted in a relatively close approximation (Type I error rates between 0.050 and 0.055), and Column $n_{inaccurate}$ gives the sample sizes that resulted in less accurate Type I error rates (between 0.050 and 0.080). If the sample size was smaller than indicated in the last two columns, the Type I error rate was less accurate and always exceeded 0.05. This means that for smaller sample sizes MCAR was rejected too often. Table 3 shows that the required sample size for $X^2$ is smallest when the expected proportions are evenly distributed, as in Koehler and Larntz's study. Moreover, if the $E(M_j)$s are small (e.g., when q = 0.01) the required sample size increases rapidly.

Table 3
Distribution of the Multinomial Resulting from Huisman's Classification, and Sample Sizes Needed to Approximate the Correct Nominal Type I Error Rate

q       J     E(NM)/n    E(M_j)/n    E(MM)/n    n_accurate    n_inaccurate
.01     10    .9044      .0091       .0046      1000          100
        20    .8179      .0083       .0161      1000          100
.05     10    .5987      .0315       .0863      100           20
        20    .3585      .0187       .2675      500           50
.10     10    .3487      .0387       .2543      100           20
        20    .1216      .0135       .6084      500           100
even    10    .0833      .0833       .0833      50            10
        20    .0455      .0455       .0455      100           20

Discussion.
For a test of reasonable length (J = 20) and for little nonresponse (q = 0.01, as in a rather well-controlled data collection procedure), n = 1000 is needed for the Type I error rate to match the nominal error rate. For higher percentages of nonresponse, smaller samples (n = 500) will yield this result. Given the limitations of this simulation, as a rule of thumb for deciding whether to trust the p-values of the chi-squared statistic, one can compute various power divergence statistics (Cressie & Read, 1984) and compare the differences. Power divergence statistics for Huisman's classification are given by

$$S = \frac{2}{\lambda(\lambda + 1)} \left\{ \sum_{j=1}^{J} O(M_j)\left[\left(\frac{O(M_j)}{E(M_j)}\right)^{\lambda} - 1\right] + O(NM)\left[\left(\frac{O(NM)}{E(NM)}\right)^{\lambda} - 1\right] + O(MM)\left[\left(\frac{O(MM)}{E(MM)}\right)^{\lambda} - 1\right] \right\}.$$

The power divergence statistic S equals $X^2$ for $\lambda = 1$, the likelihood ratio statistic $G^2$ for $\lambda \to 0$, Neyman's modified $X^2$ for $\lambda = -2$, the Cressie-Read statistic (CR) for $\lambda = 2/3$, and the Freeman-Tukey statistic for $\lambda = -1/2$ (see, e.g., Agresti, 1990, p. 249). Asymptotically, all power divergence statistics converge to a chi-squared distribution. Differences between the various power divergence statistics may occur when the sample size is too small, and then the resulting p-values should be mistrusted. Koehler and Larntz (1980;
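The comparison of power divergence statistics suggested in the Discussion can be sketched as follows. The code assumes the standard Cressie-Read form, in which each cell contributes $O[(O/E)^{\lambda} - 1]$, and reuses the $q_j$ and observed counts from Tables 1 and 2; the function name and the handling of zero counts are assumptions of this sketch, not part of the article.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic for a single value of lambda."""
    if lam == -1:
        raise ValueError("lambda = -1 is a separate limiting case, not handled here")
    if lam == 0:
        # Limiting case lambda -> 0: likelihood ratio statistic G^2,
        # using the convention that a zero count contributes nothing.
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)
    total = 0.0
    for o, e in zip(observed, expected):
        if o == 0:
            if lam < -1:
                return math.inf  # a zero count makes the statistic unbounded
            continue             # for lambda > -1 a zero count contributes 0
        total += o * ((o / e) ** lam - 1.0)
    return 2.0 * total / (lam * (lam + 1.0))


# Expected frequencies under random missingness for the Table 1 example.
n = 8
q = [0.125, 0.0, 0.125, 0.375, 0.25]
o_nm, o_m, o_mm = 3, [0, 0, 1, 1, 1], 2
p = [1 - q_j for q_j in q]
e_nm = n * math.prod(p)
e_m = [q_j / p_j * e_nm for q_j, p_j in zip(q, p)]
e_mm = n - e_nm - sum(e_m)

# Keep NM, MM, and every M_j class that is not a structural zero.
cells = [(o_nm, e_nm), (o_mm, e_mm)] + [(o, e) for o, e in zip(o_m, e_m) if e > 0]
observed = [o for o, _ in cells]
expected = [e for _, e in cells]

pearson = sum((o - e) ** 2 / e for o, e in cells)
labels = {1.0: "Pearson X2", 2 / 3: "Cressie-Read", 0.0: "G2",
          -0.5: "Freeman-Tukey", -2.0: "Neyman modified X2"}
for lam, name in labels.items():
    print(f"{name:20s} lambda = {lam:5.2f}  S = {power_divergence(observed, expected, lam):.3f}")
```

With only eight respondents the members of the family differ noticeably, and the observed zero count for $M_1$ makes Neyman's modified statistic infinite, which matches the article's warning that p-values from such a small sample should not be trusted.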
