Optimal Resource Allocation in Adaptive Survey Designs

Calinescu Melania, 1983 - Optimal Resource Allocation in Adaptive Survey Designs
ISBN 978-90-820349-1-2

© I. M. Calinescu, Amsterdam 2013

All rights reserved. No part of this publication may be reproduced in any form or by any electronic or mechanical means (including photocopying, recording or information storage and retrieval systems) without permission in writing from the author.

Cover design by Jakub Pečànka and Melania Calinescu. The front cover visualizes world internet connectivity using data from 2009 (sources: www.nationmaster.com and CIA’s World Factbook). Each country is depicted as a circle with radius given by a logarithmic transformation of the country’s population. In each pie chart the green colour indicates the percentage of the country’s population that has access to the internet.

Printed by GVO Drukkers & Vormgevers B.V. | Ponsen & Looijen, Ede

VRIJE UNIVERSITEIT

Optimal Resource Allocation in Adaptive Survey Designs

ACADEMIC DISSERTATION

submitted to obtain the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus prof.dr. F.A. van der Duyn Schouten, to be defended in public before the doctoral committee of the Faculteit der Exacte Wetenschappen (Faculty of Sciences) on Wednesday 13 November 2013 at 9:45 in the auditorium of the university, De Boelelaan 1105

by Ionela Melania Calinescu
born in Cimpulung, Romania

promotor: prof.dr. G.M. Koole
copromotors: dr. S. Bhulai, dr.ir. J.G. Schouten

Acknowledgements

Obtaining a PhD degree is a long journey full of tough and sweet moments that requires curiosity and enthusiasm for discovery, but frustration and desperation are rarely far away. On this journey, many people have helped me get through the tougher moments, but also shared with me the satisfaction of a job well done. The time has come for me to thank them for all the support I received.

My deepest gratitude goes to my supervisors, Sandjai Bhulai and Barry Schouten.
Sandjai, if it was not for your encouragement to take on this project I would not be in this position today. There are many things I value about working with you. I admire your constant enthusiasm and talent for devising elegant solutions to complex problems. It was essential to my success that I could always count on you to answer my questions, whether they regarded research, administration or personal development. I very much appreciated our weekly meetings, which made me work hard to raise interesting questions and even harder to find suitable answers. I hope there will be many occasions in the future to continue our discussions.

Barry, I have always appreciated and felt inspired by your patience and dedication to your work. Your care for detail taught me how to be more precise while your diverse perspectives on the project helped our research grow at a very fast pace. I have learned a lot about survey methodology during our regular meetings “op de gang” as well as the “do’s” and “don’ts” in survey practice during the surprisingly adventurous trips to Heerlen. Your constant interest in promoting our results enabled me to become a contributor to a Wiley Series book and meet some of the most influential researchers in the field.

To my promotor, Ger Koole, thank you for giving me the opportunity to join the OBP group and for the inspiring brainstorming sessions in the Alps.

I also want to thank the reading committee members, James Wagner, Annemieke Luiten, Mathisca de Gunst, Bert Zwart and Rommert Dekker for their careful reading of my thesis, interesting feedback and for dealing with the flood of emails about my defence date.

Additional thanks go to my research group at VU for the great time we had together. Thank you Alex for keeping a critical eye on my Dutch emails and assembling a fluent samenvatting from the chaotic pieces I sent you. To the one-day-a-week researchers, former and current, thank you for making Thursday the liveliest day of the week. Masha, Demeter, thank you for fun pancake evenings.
Alwin, thank you for the pleasant three years in R-550 and lovely time together with Sylvia.

To the statistics group at VU, thank you for the joyful lunches and coffee breaks, alas far too rarely enriched with cakes. I believe I owe at least one piece of cake to many of you, and I promise to pay my dues at my defence party. Beata, thank you for lovely mid-afternoon chats and for helping me find my passion by taking me to the best dance class ever.

To my colleagues at Centraal Bureau voor de Statistiek, thank you for making my time as a PhD student sometimes seem like a serious office job and for trying to teach me a little Dutch as well. Nino, you brightened up my cloudiest days, I am truly grateful for having had you there. Henk, Fatima, thank you both for the many humorous moments in the office. Jan, Sander, thank you for solving my CBS-beginner questions. Marriette, Martijn, thank you for clarifying some of the many data collection mysteries.

To Targol, you have known my goods and bads for the longest, thank you for being my best friend all this time. To my Romanian friends, thank you for making home seem just a step away, especially on December 1st. To my dear friends from home, thank you for having me over and finding time on short notice to dine and chat. To my Dutch friends, Laurens, Jeanine, Max, Ivar, thank you for great parties, patient Dutch lessons and tasty dinners. Thank you Jan, Jarda, Tomáš, Nikke, Fulvio, Raja for fun trips and dinners together.

To Jakub, with whom I have shared my past six years, thank you for your constant and creative help, for stirring my curiosity to learn many new things and for a spicy combination of happy moments and tough life lessons.

Last but surely not least, to my family, Sister, Mom and Dad, thank you for your unconditional love and support and for the goodies packages that so often brought a little bit of home all the way to Amsterdam!

Melania Calinescu
September 2013

Contents

1 Introduction
  1.1 Surveys in practice
  1.2 Contributions of the thesis
  1.3 Overview of the thesis
2 Adaptive survey designs and Markov decision theory: an introduction
  2.1 Resource allocation problems
  2.2 Markov decision theory
  2.3 A brief introduction to sample surveys
  2.4 Adaptive survey designs
3 The survey resource allocation problem
  3.1 Problem formulation
  3.2 Adaptive survey design policies
  3.3 Budget and capacity constraints
  3.4 Numerical examples
  3.5 Concluding remarks
4 The survey resource allocation problem for multiple quality indicators
  4.1 Problem formulation
  4.2 The two-step algorithm
  4.3 Numerical examples
  4.4 Conclusions
5 The survey resource allocation problem and measurement errors
  5.1 Measurement errors in surveys: an introduction
  5.2 Problem formulation
  5.3 Problem solving technique
  5.4 Case study: the Dutch Labor Force Survey
  5.5 Concluding remarks
  5.6 Appendix: additional optimization results
6 Adaptive survey designs to minimize survey mode effects
  6.1 Survey mode effects: an introduction
  6.2 Problem formulation
  6.3 Algorithm
  6.4 Case study: the Dutch Labor Force Survey
  6.5 Discussion
7 Dynamic learning in adaptive survey designs
  7.1 Literature review multi-armed bandit problems
  7.2 Problem formulation
  7.3 Solving the budgeted MAB via dynamic programming
  7.4 Simulation results
  7.5 Discussion
8 Future research directions
Bibliography
Summary
Samenvatting

1 Introduction

How did the recent global financial crisis change the world’s economic landscape? What are the repercussions of current economic and political policies on the future societal development, as quantified through indicators such as the unemployment rate, average household income, consumer confidence index? To answer such questions, policy makers have to collect information from the population and summarize it in a meaningful way. This is where survey organizations and statistical bureaus play a crucial role. Collecting information from the entire population requires significant amounts of time and money. Alternatively, a sample survey may be conducted, where only a sample from the specified population is requested to provide information. Using the results from the survey sample, knowledge can be obtained about the population of interest.
1.1 Surveys in practice

Surveys are used all around the world to measure socio-economic status and well-being of people, to test theories, and to make policy decisions. However, the different statistics computed from the survey data are of interest only if they accurately describe the corresponding population attribute. Multiple factors play a role throughout the course of a survey from its planning to the final systematization of the results. Some factors may disrupt the framework of statistical inference theory and sampling theory that provides methods for describing a population’s characteristics accurately (enough) given the survey sample results. Such factors include people’s lack of understanding (or interest) as to why they have been selected to participate in a survey, their attitude towards areas such as privacy and confidentiality of personal information, and the influence exerted by the attributes of the survey design on their decision to participate in the survey. Addressing these factors and related social science questions is an integral part of survey research. Tremendous effort has been invested into understanding how human behavior and thought may impact the precision and accuracy of survey statistics and how the effects may be reduced or adjusted for (see overviews in Groves et al. 2002, Lepkowski et al. 2007 and Bethlehem et al. 2011).

In a perfect world, all sampled population units would be willing to participate in the survey and provide all the requested data. In practical situations, however, information from some sample units is missing due to factors such as those listed above. This is called nonresponse and it is one of the most studied errors in the survey literature.

[Figure 1.1: Level of nonresponse bias for various nonresponse rates and nonresponse means; respondent mean fixed at 0.50. Curves for nonresponse rates 0.05, 0.3 and 0.5; horizontal axis: nonrespondent mean, from 0 to 1; vertical axis: nonresponse bias, from -0.3 to 0.3.]
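The relationship plotted in Figure 1.1 can be reproduced numerically. The following is a minimal sketch, assuming that the full-sample mean is the weighted average of the respondent and nonrespondent means, with the nonresponse rate as the weight on the nonrespondents, and that nonresponse bias is measured as the respondent mean minus the full-sample mean. The function and constant names are illustrative and do not come from the thesis; the nonresponse rates 0.05, 0.3 and 0.5 and the respondent mean of 0.50 are the values given for the figure, while the grid of nonrespondent means is chosen here for illustration.

```python
# Illustrative sketch (names are hypothetical, not taken from the thesis).

def nonresponse_bias(nonresponse_rate, respondent_mean, nonrespondent_mean):
    """Return the respondent mean minus the full-sample mean.

    Assumption: the full-sample mean is the weighted average of the
    respondent and nonrespondent means, weighted by their sample shares.
    """
    full_sample_mean = ((1.0 - nonresponse_rate) * respondent_mean
                        + nonresponse_rate * nonrespondent_mean)
    return respondent_mean - full_sample_mean


RESPONDENT_MEAN = 0.50                             # fixed, as in Figure 1.1
NONRESPONSE_RATES = (0.05, 0.3, 0.5)               # the three curves in Figure 1.1
NONRESPONDENT_MEANS = (0.0, 0.25, 0.5, 0.75, 1.0)  # illustrative grid


def bias_table():
    """Map each nonresponse rate to its biases over the grid of means."""
    return {
        rate: [round(nonresponse_bias(rate, RESPONDENT_MEAN, mean), 4)
               for mean in NONRESPONDENT_MEANS]
        for rate in NONRESPONSE_RATES
    }


if __name__ == "__main__":
    for rate, biases in bias_table().items():
        print(f"nonresponse rate {rate}: {biases}")
```

Under these assumptions the bias simplifies to the nonresponse rate times the difference between the respondent and nonrespondent means, so it vanishes when the two means coincide and otherwise grows linearly with the nonresponse rate.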
Classic inferential properties of sample estimates require these statistics be computed from the entire sample. One example statistic is the sample mean as an estimator of the population mean. In the presence of nonresponse the sample mean is reduced to the respondent mean, i.e., the sample mean is obtained based only on information coming from the pool of respondents. The deviation of the respondent mean from the full sample mean is called nonresponse bias and it is a function of the nonresponse rate (i.e., the proportion of nonrespondents in the entire sample) and the difference between the respondent and nonrespondent means. Figure 1.1 (from Groves and Couper 1998) illustrates the consequences of nonresponse rates on the precision of the survey estimates. Given the respondent mean of 0.50,