
Principles of Biostatistics PDF

584 Pages·283.88 MB·english

Preview Principles of Biostatistics

Principles of Biostatistics
Second Edition

Marcello Pagano
Harvard School of Public Health

Kimberlee Gauvreau
Harvard Medical School

This edition is a reprint of the second edition published in 2000 by Brooks/Cole and then Cengage Learning.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-138-59314-5 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com

This book is dedicated with love to Phyllis, John-Paul, Marisa, Loris, Alice and Lilian.
Neil and Eliza.

Preface

This book was written for students of the health sciences and serves as an introduction to the study of biostatistics, or the use of numerical techniques to extract information from data and facts. Because numbers are more precise than words, they are particularly well suited for communicating scientific results. However, just as one can lie with words, one can also lie with numbers. Indeed, numbers and lies have been linked for quite some time; there is even a book entitled How to Lie with Statistics. This association may owe its origin, or its affirmation at the very least, to the British prime minister Benjamin Disraeli. Disraeli is credited by Mark Twain as having said "There are three kinds of lies: lies, damned lies, and statistics." One has only to observe any modern political campaign to be convinced of the abuse of statistics. But enough about lies; this book adopts the position of Professor Frederick Mosteller, who said "It is easy to lie with statistics, but it is easier to lie without them."

Background

Principles of Biostatistics is aimed at students in the biological and health sciences who wish to learn modern research methods. It is based on a required course offered at the Harvard School of Public Health. In addition to these graduate students, a large number of health professionals from the Harvard medical area attend as well. The course is as old as the School itself, which attests to its importance.
It spans 16 weeks of lectures and laboratory sessions. Each week includes two 50-minute lectures and one 2-hour lab. The entire class is together for the lectures, but is divided into smaller groups headed by teaching assistants for the lab sessions. These labs reinforce the material covered in the lectures, review the homework assignments, and introduce the computer into the course. We have included the lab materials (except those dealing with the homework assignments and specific computer commands) in the sections labeled Further Applications. These sections present either additional examples or a different perspective on the material covered in a chapter. They are designed to provoke discussion, although they are sufficiently complete for an individual who is not using the book as a course text to benefit from reading them.

This book has evolved to include topics that we believe can be covered at some depth in one American semester. Clearly, some choices had to be made; we hope that we have chosen well. In our course, we have sufficient time to cover most of the topics in the first 20 chapters. However, there is enough material presented to allow the instructor some flexibility. For example, some instructors may choose to omit the sections covering grouped data (Section 3.3), Chebychev's inequality (Section 3.4), and the Poisson distribution (Section 7.3), or the chapter on analysis of variance (Chapter 12), if they consider these concepts to be less important than others.

Structure

Some say that statistics is the study of variability and uncertainty. We believe there is truth to this adage, and have used it as a guide in dividing the book into three parts. The first five chapters deal with collections of numbers and ways in which to summarize, explore, and explain them. The next two chapters focus on probability and serve as an introduction to the tools needed for the subsequent investigation of uncertainty.
It is only in the eighth chapter and thereafter that we distinguish between populations and samples and begin to investigate the inherent variability introduced by sampling, thus progressing to inference. We think that this modular introduction to the quantification of uncertainty is justified by the success achieved by our students. Postponing the slightly more difficult concepts until a solid foundation has been established makes it easier for the reader to comprehend them.

Data Sets and Examples

Throughout the text we have used data drawn from published studies to exemplify biostatistical concepts. Not only is real data more meaningful, it is usually more interesting as well. Of course, we do not wish to use examples in which the subject matter is too esoteric or too complex. To this end, we have been guided by the backgrounds and interests of our students (primarily topics in public health and clinical research) to choose examples that best illustrate the concepts at hand.

There is some risk involved in using published data. We cannot guarantee that all of the examples are honest and that the data were properly collected; for this we must rely on the reputations of our sources. We do not belittle the importance of this consideration. The value of our inference depends critically on the worth of the data, and we strongly recommend that a good deal of effort be expended on evaluating its quality. We assume that this is understood by the reader.

More than once we have used examples in which the population of the United States is broken down along racial lines. In reporting these official statistics we follow the lead of the government agencies that release them. We do not wish to reify this racial categorization, since in fact the observed differences may well be due to socioeconomic factors rather than the implied racial ones.
One option would be to ignore these statistics; however, this would hide inequities which exist in our health system, inequities that need to be eliminated. We focus attention on the problem in the hope of stimulating interest in promoting solutions.

We have minimized the use of mathematical notation because of its well-deserved reputation of being the ultimate jargon. If used excessively, it can intimidate even the most ardent scholar. We do not wish to eliminate it entirely, however; it has been developed over the ages to be helpful in communicating results. We hope that in this respect we have written a succinct and understandable text.

Over and above their precision, there is something more to numbers (maybe a little magic) that makes them fun to study. The fun is in the conceptualization more than the calculations, and we are fortunate that we have the computer to do the drudge work. This allows students to concentrate on the ideas. In other words, the computer allows the instructor to teach the poetry of statistics and not the plumbing.

Computing

To take advantage of the computer, one needs a good statistical package. We use Stata, which is available from the Stata Corporation in College Station, Texas. We find this statistical package to be one of the best on the market today; it is user-friendly, accurate, powerful, reasonably priced, and works on a number of different platforms, including Windows, Unix, and Macintosh. Furthermore, the output from this package is acceptable to the Federal Drug Administration in New Drug Approval submissions. Other packages are available, and this book can be supplemented by any one of them. In this second edition, we also present output from SAS and Minitab in the Further Applications section of each chapter. We strongly recommend that some statistical package be used.

Some of the review exercises in the text require the use of the computer.
To help the reader, we have included the data sets used in these exercises both in Appendix B and on a CD at the back of the book. The CD contains each data set in two different formats: an ASCII file (the "raw" suffix) and a Stata file (the "dta" suffix). There are also many exercises that do not require the computer. As always, active learning yields better results than passive observation. To this end, we cannot stress enough the importance of the review exercises, and urge the reader to attempt as many as time permits.

New to the Second Edition

This second edition includes revised and expanded discussions on many topics throughout the book, and additional figures to help clarify concepts. Previously used data sets, especially official statistics reported by government agencies, have been updated whenever possible. Many new data sets and examples have been included; data sets described in the text are now contained on the CD enclosed with the book. Tables containing exact probabilities for the binomial and Poisson distributions (generated by Stata) have been added to Appendix A. As previously mentioned, we now incorporate computer output from SAS and Minitab as well as Stata in the Further Applications sections. We have also added numerous new exercises, including questions reviewing the basic concepts covered in each chapter.

Acknowledgements

A debt of gratitude is owed a number of people: Harvard University President Derek Bok for providing the support which got this book off the ground, Dr. Michael K. Martin for calculating Tables A.3 through A.8 in Appendix A, and John-Paul Pagano for assisting in the editing of the first edition. We thank the individuals who reviewed the manuscript: Rick Chappell, University of Wisconsin; Dr. Todd G. Nick, University of Mississippi Medical Center; Al Bartolucci, University of Alabama at Birmingham; Bruce E.
Trumbo, California State University, Hayward; James Godbold, The Mount Sinai School of Medicine of New York University; and Maureen Lahiff, University of California, Berkeley. Our thanks to the teaching assistants who have helped us teach the course and who have made many valuable suggestions. Probably the most deserving of thanks are the students who have taken the course over the years and who have tolerated us as we learned how to teach it. We are still learning.

Marcello Pagano
Kimberlee Gauvreau
Boston, Massachusetts

Contents

1 Introduction
    1.1 Overview of the Text
    1.2 Review Exercises
    Bibliography

2 Data Presentation
    2.1 Types of Numerical Data
        2.1.1 Nominal Data
        2.1.2 Ordinal Data
        2.1.3 Ranked Data
        2.1.4 Discrete Data
        2.1.5 Continuous Data
    2.2 Tables
        2.2.1 Frequency Distributions
        2.2.2 Relative Frequency
    2.3 Graphs
        2.3.1 Bar Charts
        2.3.2 Histograms
        2.3.3 Frequency Polygons
        2.3.4 One-Way Scatter Plots
        2.3.5 Box Plots
        2.3.6 Two-Way Scatter Plots
        2.3.7 Line Graphs
    2.4 Further Applications
    2.5 Review Exercises
    Bibliography
