Modern Mathematical Statistics with Applications

Jay L. Devore
California Polytechnic State University

Kenneth N. Berk
Illinois State University

Australia • Canada • Mexico • Singapore • Spain • United Kingdom • United States

Acquisitions Editor: Carolyn Crockett
Editorial Assistant: Daniel Geller
Technology Project Manager: Fiona Chong
Senior Assistant Editor: Ann Day
Marketing Manager: Joseph Rogove
Marketing Assistant: Brian Smith
Marketing Communications Manager: Darlene Amidon-Brent
Manager, Editorial Production: Kelsey McGee
Creative Director: Rob Hugel
Art Director: Lee Friedman
Print Buyer: Rebecca Cross
Permissions Editor: Joohee Lee
Production Service and Composition: G&S Book Services
Text Designer: Carolyn Deacy
Copy Editor: Anita Wagner
Cover Designer: Eric Adigard
Cover Image: Carl Russo
Cover Printer: Phoenix Color Corp
Printer: RR Donnelley-Crawfordsville

© 2007 Duxbury, an imprint of Thomson Brooks/Cole, a part of The Thomson Corporation. Thomson, the Star logo, and Brooks/Cole are trademarks used herein under license.

ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, web distribution, information storage and retrieval systems, or in any other manner) without the written permission of the publisher.

Printed in the United States of America
1 2 3 4 5 6 7 09 08 07 06 05

For more information about our products, contact us at:
Thomson Learning Academic Resource Center
1-800-423-0563

For permission to use material from this text or product, submit a request online at http://www.thomsonrights.com. Any additional questions about permissions can be submitted by e-mail to [email protected].

Thomson Higher Education
10 Davis Drive
Belmont, CA 94002-3098
USA
Library of Congress Control Number: 2005929405
ISBN 0-534-40473-1

To my wife Carol whose continuing support of my writing efforts over the years has made all the difference.

To my wife Laura who, as a successful author, is my mentor and role model.

About the Authors

Jay L. Devore

Jay Devore received a B.S. in Engineering Science from the University of California, Berkeley, and a Ph.D. in Statistics from Stanford University. He previously taught at the University of Florida and Oberlin College, and has had visiting positions at Stanford, Harvard, the University of Washington, and New York University. He has been at California Polytechnic State University, San Luis Obispo, since 1977, where he is currently a professor and chair of the Department of Statistics. Jay has previously authored five other books, including Probability and Statistics for Engineering and the Sciences, currently in its 6th edition. He is a Fellow of the American Statistical Association, an associate editor for the Journal of the American Statistical Association, and received the Distinguished Teaching Award from Cal Poly in 1991. His recreational interests include reading, playing tennis, traveling, and cooking and eating good food.

Kenneth N. Berk

Ken Berk has a B.S. in Physics from Carnegie Tech (now Carnegie Mellon) and a Ph.D. in Mathematics from the University of Minnesota. He is Professor Emeritus of Mathematics at Illinois State University and a Fellow of the American Statistical Association. He founded the Software Reviews section of The American Statistician and edited it for six years. He served as secretary/treasurer, program chair, and chair of the Statistical Computing Section of the American Statistical Association, and he twice co-chaired the Interface Symposium, the main annual meeting in statistical computing.
His published work includes papers on time series, statistical computing, regression analysis, and statistical graphics and the book Data Analysis with Microsoft Excel (with Patrick Carey).

Brief Contents

1 Overview and Descriptive Statistics 1
2 Probability 49
3 Discrete Random Variables and Probability Distributions 94
4 Continuous Random Variables and Probability Distributions 154
5 Joint Probability Distributions 229
6 Statistics and Sampling Distributions 278
7 Point Estimation 325
8 Statistical Intervals Based on a Single Sample 375
9 Tests of Hypotheses Based on a Single Sample 417
10 Inferences Based on Two Samples 472
11 The Analysis of Variance 539
12 Regression and Correlation 599
13 Goodness-of-Fit Tests and Categorical Data Analysis 707
14 Alternative Approaches to Inference 743
Appendix Tables 781
Answers to Odd-Numbered Exercises 809
Index 829

Contents

Preface viii

1 Overview and Descriptive Statistics 1
    Introduction 1
    1.1 Populations and Samples 2
    1.2 Pictorial and Tabular Methods in Descriptive Statistics 9
    1.3 Measures of Location 25
    1.4 Measures of Variability 33

2 Probability 49
    Introduction 49
    2.1 Sample Spaces and Events 50
    2.2 Axioms, Interpretations, and Properties of Probability 56
    2.3 Counting Techniques 65
    2.4 Conditional Probability 73
    2.5 Independence 83

3 Discrete Random Variables and Probability Distributions 94
    Introduction 94
    3.1 Random Variables 95
    3.2 Probability Distributions for Discrete Random Variables 99
    3.3 Expected Values of Discrete Random Variables 109
    3.4 Moments and Moment Generating Functions 118
    3.5 The Binomial Probability Distribution 125
    3.6 *Hypergeometric and Negative Binomial Distributions 134
    3.7 *The Poisson Probability Distribution 142

4 Continuous Random Variables and Probability Distributions 154
    Introduction 154
    4.1 Probability Density Functions and Cumulative Distribution Functions 155
    4.2 Expected Values and Moment Generating Functions 167
    4.3 The Normal Distribution 175
    4.4 *The Gamma Distribution and Its Relatives 190
    4.5 *Other Continuous Distributions 198
    4.6 *Probability Plots 206
    4.7 *Transformations of a Random Variable 216

5 Joint Probability Distributions 229
    Introduction 229
    5.1 Jointly Distributed Random Variables 230
    5.2 Expected Values, Covariance, and Correlation 242
    5.3 *Conditional Distributions 249
    5.4 *Transformations of Random Variables 262
    5.5 *Order Statistics 267

6 Statistics and Sampling Distributions 278
    Introduction 278
    6.1 Statistics and Their Distributions 279
    6.2 The Distribution of the Sample Mean 291
    6.3 The Distribution of a Linear Combination 300
    6.4 Distributions Based on a Normal Random Sample 309
    Appendix: Proof of the Central Limit Theorem 323

7 Point Estimation 325
    Introduction 325
    7.1 General Concepts and Criteria 326
    7.2 *Methods of Point Estimation 344
    7.3 *Sufficiency 355
    7.4 *Information and Efficiency 364

8 Statistical Intervals Based on a Single Sample 375
    Introduction 375
    8.1 Basic Properties of Confidence Intervals 376
    8.2 Large-Sample Confidence Intervals for a Population Mean and Proportion 385
    8.3 Intervals Based on a Normal Population Distribution 393
    8.4 *Confidence Intervals for the Variance and Standard Deviation of a Normal Population 401
    8.5 *Bootstrap Confidence Intervals 404

9 Tests of Hypotheses Based on a Single Sample 417
    Introduction 417
    9.1 Hypotheses and Test Procedures 418
    9.2 Tests About a Population Mean 428
    9.3 Tests Concerning a Population Proportion 442
    9.4 P-Values 448
    9.5 *Some Comments on Selecting a Test Procedure 456

10 Inferences Based on Two Samples 472
    Introduction 472
    10.1 z Tests and Confidence Intervals for a Difference Between Two Population Means 473
    10.2 The Two-Sample t Test and Confidence Interval 487
    10.3 Analysis of Paired Data 497
    10.4 Inferences About Two Population Proportions 507
    10.5 *Inferences About Two Population Variances 515
    10.6 *Comparisons Using the Bootstrap and Permutation Methods 520

11 The Analysis of Variance 539
    Introduction 539
    11.1 Single-Factor ANOVA 540
    11.2 *Multiple Comparisons in ANOVA 552
    11.3 *More on Single-Factor ANOVA 560
    11.4 *Two-Factor ANOVA with $K_{ij} = 1$ 570
    11.5 *Two-Factor ANOVA with $K_{ij} > 1$ 584

12 Regression and Correlation 599
    Introduction 599
    12.1 The Simple Linear and Logistic Regression Models 600
    12.2 Estimating Model Parameters 611
    12.3 Inferences About the Regression Coefficient $\beta_1$ 626
    12.4 Inferences Concerning $\mu_{Y \cdot x^*}$ and the Prediction of Future Y Values 640
    12.5 Correlation 648
    12.6 *Aptness of the Model and Model Checking 660
    12.7 *Multiple Regression Analysis 668
    12.8 *Regression with Matrices 689

13 Goodness-of-Fit Tests and Categorical Data Analysis 707
    Introduction 707
    13.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified 708
    13.2 *Goodness-of-Fit Tests for Composite Hypotheses 716
    13.3 Two-Way Contingency Tables 729

14 Alternative Approaches to Inference 743
    Introduction 743
    14.1 *The Wilcoxon Signed-Rank Test 744
    14.2 *The Wilcoxon Rank-Sum Test 752
    14.3 *Distribution-Free Confidence Intervals 757
    14.4 *Bayesian Methods 762
    14.5 *Sequential Methods 770

Appendix Tables 781
    A.1 Cumulative Binomial Probabilities 782
    A.2 Cumulative Poisson Probabilities 784
    A.3 Standard Normal Curve Areas 786
    A.4 The Incomplete Gamma Function 788
    A.5 Critical Values for t Distributions 789
    A.6 Tolerance Critical Values for Normal Population Distributions 790
    A.7 Critical Values for Chi-Squared Distributions 791
    A.8 t Curve Tail Areas 792
    A.9 Critical Values for F Distributions 794
    A.10 Critical Values for Studentized Range Distributions 800
    A.11 Chi-Squared Curve Tail Areas 801
    A.12 Critical Values for the Ryan–Joiner Test of Normality 803
    A.13 Critical Values for the Wilcoxon Signed-Rank Test 804
    A.14 Critical Values for the Wilcoxon Rank-Sum Test 805
    A.15 Critical Values for the Wilcoxon Signed-Rank Interval 806
    A.16 Critical Values for the Wilcoxon Rank-Sum Interval 807
    A.17 $\beta$ Curves for t Tests 808

Answers to Odd-Numbered Exercises 809
Index 829

Preface

Purpose

Our objective is to provide a postcalculus introduction to the discipline of statistics that

• Has mathematical integrity and contains some underlying theory.
• Shows students a broad range of applications involving real data.
• Is very current in its selection of topics.
• Illustrates the importance of statistical software.
• Is accessible to a wide audience, including mathematics and statistics majors (yes, there are a few of the latter), prospective engineers and scientists, and those business and social science majors interested in the quantitative aspects of their disciplines.

A number of currently available mathematical statistics texts are heavily oriented toward a rigorous mathematical development of probability and statistics, with much emphasis on theorems, proofs, and derivations. The emphasis is more on mathematics than on statistical practice. Even when applied material is included, the scenarios are often contrived (many examples and exercises involving dice, coins, cards, widgets, or a comparison of treatment A to treatment B). So in our exposition we have tried to achieve a balance between mathematical foundations and statistical practice.

Some may feel discomfort on grounds that because a mathematical statistics course has traditionally been a feeder into graduate programs in statistics, students coming out of such a course must be well prepared for that path. But that view presumes that the mathematics will provide the hook to get students interested in our discipline. That may happen for a few mathematics majors. However, our experience is that the application of statistics to real-world problems is far more persuasive in getting quantitatively oriented students to pursue a career or take further coursework in statistics. Let's first draw them in with intriguing problem scenarios and applications. Opportunities for exposing them to mathematical foundations will follow in due course.
In our view it is more important for students coming out of this course to be able to carry out and interpret the results of a two-sample t test or simple regression analysis than to manipulate joint moment generating functions or discourse on various modes of convergence.

Content

The book certainly does include core material in probability (Chapter 2), random variables and their distributions (Chapters 3–5), and sampling theory (Chapter 6). But our desire to balance theory with application/data analysis is reflected in the way the book starts out, with a chapter on descriptive and exploratory statistical techniques rather than an immediate foray into the axioms of probability and their consequences. After