
Biostatistical Analysis: Pearson New International Edition PDF

761 Pages·2014·4.491 MB·English

Preview Biostatistical Analysis: Pearson New International Edition

Biostatistical Analysis
Jerrold H. Zar
Fifth Edition

ISBN 10: 1-292-02404-6
ISBN 13: 978-1-292-02404-2

Pearson Education Limited, Edinburgh Gate, Harlow, Essex CM20 2JE, England, and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Printed in the United States of America

PEARSON CUSTOM LIBRARY

Table of Contents

1. Data: Types and Presentation
2. Populations and Samples
3. Measures of Central Tendency
4. Measures of Variability and Dispersion
5. Probabilities
6. The Normal Distribution
7. One-Sample Hypotheses
8. Two-Sample Hypotheses
9. Paired-Sample Hypotheses
10. Multisample Hypotheses and the Analysis of Variance
11. Multiple Comparisons
12. Two-Factor Analysis of Variance
13. Data Transformations
14. Multiway Factorial Analysis of Variance
15. Nested (Hierarchical) Analysis of Variance
16. Multivariate Analysis of Variance
17. Simple Linear Regression
18. Comparing Simple Linear Regression Equations
19. Simple Linear Correlation
20. Multiple Regression and Correlation
21. Polynomial Regression
22. Testing for Goodness of Fit
23. Contingency Tables
24. Dichotomous Variables
25. Testing for Randomness
26. Circular Distributions: Descriptive Statistics
27. Circular Distributions: Hypothesis Testing
Literature Cited
Index

Data: Types and Presentation

1 TYPES OF BIOLOGICAL DATA
2 ACCURACY AND SIGNIFICANT FIGURES
3 FREQUENCY DISTRIBUTIONS
4 CUMULATIVE FREQUENCY DISTRIBUTIONS

Scientific study involves the systematic collection, organization, analysis, and presentation of knowledge. Many investigations in the biological sciences are quantitative, where knowledge is in the form of numerical observations called data. (One numerical observation is a datum.*) In order for the presentation and analysis of data to be valid and useful, we must use methods appropriate to the type of data obtained, to the design of the data collection, and to the questions asked of the data; and the limitations of the data, of the data collection, and of the data analysis should be appreciated when formulating conclusions.
*The term data is sometimes seen as a singular noun meaning “numerical information.” This book refrains from that use.

The word statistics is derived from the Latin for “state,” indicating the historical importance of governmental data gathering, which related principally to demographic information (including census data and “vital statistics”) and often to their use in military recruitment and tax collecting.†

†Peters (1987: 79) and Walker (1929: 32) attribute the first use of the term statistics to a German professor, Gottfried Achenwall (1719–1772), who used the German word Statistik in 1749, and the first published use of the English word to John Sinclair (1754–1835) in 1791.

The term statistics is often encountered as a synonym for data: One hears of college enrollment statistics (such as the numbers of newly admitted students, numbers of senior students, numbers of students from various geographic locations), statistics of a basketball game (such as how many points were scored by each player, how many fouls were committed), labor statistics (such as numbers of workers unemployed, numbers employed in various occupations), and so on. Hereafter, this use of the word statistics will not appear in this text. Instead, it will be used in its other common manner: to refer to the orderly collection, analysis, and interpretation of data with a view to objective evaluation of conclusions based on the data.

Statistics applied to biological problems is simply called biostatistics or, sometimes, biometry‡ (the latter term literally meaning “biological measurement”). Although the field of statistics has roots extending back hundreds of years, its development began in earnest in the late nineteenth century, and a major impetus from early in this development has been the need to examine biological data.

‡The word biometry, which literally means “biological measurement,” had, since the nineteenth century, been found in several contexts (such as demographics and, later, quantitative genetics; Armitage, 1985; Stigler, 2000), but using it to mean the application of statistical methods to biological information apparently was conceived between 1892 and 1901 by Karl Pearson, along with the name Biometrika for the still-important English journal he helped found; and it was first published in the inaugural issue of this journal in 1901 (Snedecor, 1954). The Biometrics Section of the American Statistical Association was established in 1938, successor to the Committee on Biometrics of that organization, and began publishing the Biometrics Bulletin in 1945, which transformed in 1947 into the journal Biometrics, a journal retaining major importance today. More recently, the term biometrics has become widely used to refer to the study of human physical characteristics (including facial and hand characteristics, fingerprints, DNA profiles, and retinal patterns) for identification purposes.

From Chapter 1 of Biostatistical Analysis, Fifth Edition, Jerrold H. Zar. Copyright © 2010 by Pearson Education, Inc. Publishing as Pearson Prentice Hall. All rights reserved.

Statistical considerations can aid in the design of experiments intended to collect data and in the setting up of hypotheses to be tested. Many biologists attempt the analysis of their research data only to find that too few data were collected to enable reliable conclusions to be drawn, or that much extra effort was expended in collecting data that cannot be of ready use in the analysis of the experiment. Thus, a knowledge of basic statistical principles and procedures is important as research questions are formulated before an experiment and data collection are begun.

Once data have been obtained, we may organize and summarize them in such a way as to arrive at their orderly and informative presentation. Such procedures are often termed descriptive statistics. For example, measurements might be made of the heights of all 13-year-old children in a school district, perhaps determining an average height for each sex. However, perhaps it is desired to make some generalizations from these data. We might, for example, wish to make a reasonable estimate of the heights of all 13-year-olds in the state. Or we might wish to conclude whether the 13-year-old boys in the state are on the average taller than the girls of that age. The ability to make such generalized conclusions, inferring characteristics of the whole from characteristics of its parts, lies within the realm of inferential statistics.

1 TYPES OF BIOLOGICAL DATA

A characteristic (for example, size, color, number, chemical composition) that may differ from one biological entity to another is termed a variable (or, sometimes, a variate*), and several different kinds of variables may be encountered by biologists. Because the appropriateness of descriptive or inferential statistical procedures depends upon the properties of the data obtained, it is desirable to distinguish among the principal kinds of data. The classification used here is that which is commonly employed (Senders, 1958; Siegel, 1956; Stevens, 1946, 1968). However, not all data fit neatly into these categories and some data may be treated differently depending upon the questions asked of them.

*“Variate” was first used by R. A. Fisher (1925: 5; David, 1995).

(a) Data on a Ratio Scale. Imagine that we are studying a group of plants, that the heights of the plants constitute a variable of interest, and that the number of leaves per plant is another variable under study. It is possible to assign a numerical value to the height of each plant, and counting the leaves allows a numerical value to be recorded for the number of leaves on each plant. Regardless of whether the height measurements are recorded in centimeters, inches, or other units, and regardless of whether the leaves are counted in a number system using base 10 or any other base, there are two fundamentally important characteristics of these data.

First, there is a constant size interval between adjacent units on the measurement scale. That is, the difference in height between a 36-cm and a 37-cm plant is the same as the difference between a 39-cm and a 40-cm plant, and the difference between eight and ten leaves is equal to the difference between nine and eleven leaves.

Second, it is important that there exists a zero point on the measurement scale and that there is a physical significance to this zero. This enables us to say something meaningful about the ratio of measurements. We can say that a 30-cm (11.8-in.) tall plant is half as tall as a 60-cm (23.6-in.) plant, and that a plant with forty-five leaves has three times as many leaves as a plant with fifteen.

Measurement scales having a constant interval size and a true zero point are said to be ratio scales of measurement. Besides lengths and numbers of items, ratio scales include weights (mg, lb, etc.), volumes (cc, cu ft, etc.), capacities (ml, qt, etc.), rates (cm/sec, mph, mg/min, etc.), and lengths of time (hr, yr, etc.).

(b) Data on an Interval Scale. Some measurement scales possess a constant interval size but not a true zero; they are called interval scales. A common example is that of the two common temperature scales: Celsius (C) and Fahrenheit (F). We can see that the same difference exists between 20°C (68°F) and 25°C (77°F) as between 5°C (41°F) and 10°C (50°F); that is, the measurement scale is composed of equal-sized intervals.
But it cannot be said that a temperature of 40°C (104°F) is twice as hot as a temperature of 20°C (68°F); that is, the zero point is arbitrary.* (Temperature measurements on the absolute, or Kelvin [K], scale can be referred to a physically meaningful zero and thus constitute a ratio scale.)

*The German-Dutch physicist Gabriel Daniel Fahrenheit (1686–1736) invented the thermometer in 1714 and in 1724 employed a scale on which salt water froze at zero degrees, pure water froze at 32 degrees, and pure water boiled at 212 degrees. In 1742 the Swedish astronomer Anders Celsius (1701–1744) devised a temperature scale with 100 degrees between the freezing and boiling points of water (the so-called “centigrade” scale), first by referring to zero degrees as boiling and 100 degrees as freezing, and later (perhaps at the suggestion of Swedish botanist and taxonomist Carolus Linnaeus [1707–1778]) reversing these two reference points (Asimov, 1982: 177).

Some interval scales encountered in biological data collection are circular scales. Time of day and time of the year are examples of such scales. The interval between 2:00 p.m. (i.e., 1400 hr) and 3:30 p.m. (1530 hr) is the same as the interval between 8:00 a.m. (0800 hr) and 9:30 a.m. (0930 hr). But one cannot speak of ratios of times of day because the zero point (midnight) on the scale is arbitrary, in that one could just as well set up a scale for time of day which would have noon, or 3:00 p.m., or any other time as the zero point. Circular biological data are occasionally compass points, as if one records the compass direction in which an animal or plant is oriented. As the designation of north as 0° is arbitrary, this circular scale is a form of interval scale of measurement.

(c) Data on an Ordinal Scale. The preceding paragraphs on ratio and interval scales of measurement discussed data between which we know numerical differences. For example, if man A weighs 90 kg and man B weighs 80 kg, then man A is known to weigh 10 kg more than B. But our data may, instead, be a record only of the fact that man A weighs more than man B (with no indication of how much more). Thus, we may be dealing with relative differences rather than quantitative differences. Such data consist of an ordering or ranking of measurements and are said to be on an ordinal scale of measurement (ordinal being from the Latin word for “order”). We may speak of one biological entity being shorter, darker, faster, or more active than another; the sizes of five cell types might be labeled 1, 2, 3, 4, and 5, to denote their magnitudes relative to each other; or success in learning to run a maze may be recorded as A, B, or C.

It is often true that biological data expressed on the ordinal scale could have been expressed on the interval or ratio scale had exact measurements been obtained (or obtainable). Sometimes data that were originally on interval or ratio scales will be changed to ranks; for example, examination grades of 99, 85, 73, and 66% (ratio scale) might be recorded as A, B, C, and D (ordinal scale), respectively.

Ordinal-scale data contain and convey less information than ratio or interval data, for only relative magnitudes are known. Consequently, quantitative comparisons are impossible (e.g., we cannot speak of a grade of C being half as good as a grade of A, or of the difference between cell sizes 1 and 2 being the same as the difference between sizes 3 and 4). However, we will see that many useful statistical procedures are, in fact, applicable to ordinal data.

(d) Data in Nominal Categories. Sometimes the variable being studied is classified by some qualitative measure it possesses rather than by a numerical measurement. In such cases the variable may be called an attribute, and we are said to be dealing with nominal, or categorical, data.
Genetic phenotypes are commonly encountered biological attributes: The possible manifestations of an animal’s eye color might be brown or blue; and if human hair color were the attribute of interest, we might record black, brown, blond, or red. As other examples of nominal data (nominal is from the Latin word for “name”), people might be classified as male or female, or right-handed or left-handed. Or, plants might be classified as dead or alive, or as with or without fertilizer application. Taxonomic categories also form a nominal classification scheme (for example, plants in a study might be classified as pine, spruce, or fir).

Sometimes, data that might have been expressed on an ordinal, interval, or ratio scale of measurement may be recorded in nominal categories. For example, heights might be recorded as tall or short, or performance on an examination as pass or fail, where there is an arbitrary cut-off point on the measurement scale to separate tall from short and pass from fail.

As will be seen, statistical methods useful with ratio, interval, or ordinal data generally are not applicable to nominal data, and we must, therefore, be able to identify such situations when they occur.

(e) Continuous and Discrete Data. When we spoke previously of plant heights, we were dealing with a variable that could be any conceivable value within any observed range; this is referred to as a continuous variable. That is, if we measure a height of 35 cm and a height of 36 cm, an infinite number of heights is possible in the range from 35 to 36 cm: a plant might be 35.07 cm tall or 35.988 cm tall, or 35.3263 cm tall, and so on, although, of course, we do not have devices sensitive enough to detect this infinity of heights. A continuous variable is one for which there is a possible value between any other two values.

However, when speaking of the number of leaves on a plant, we are dealing with a variable that can take on only certain values. It might be possible to observe 27 leaves, or 28 leaves, but 27.43 leaves and 27.9 leaves are values of the variable that are impossible to obtain.
Such a variable is termed a discrete or discontinuous variable (also known as a meristic variable). The number of white blood cells in 1 mm³ of blood, the number of giraffes visiting a water hole, and the number of eggs laid by a grasshopper are all discrete variables. The possible values of a discrete variable generally are consecutive integers, but this is not necessarily so. If the leaves on our plants are always formed in pairs, then only even integers are possible values of the variable. And the ratio of number of wings to number of legs of insects is a discrete variable that may only have the value of 0, 0.3333..., or 0.6666... (i.e., 0/6, 2/6, or 4/6, respectively).*

Ratio-, interval-, and ordinal-scale data may be either continuous or discrete. Nominal-scale data by their nature are discrete.

2 ACCURACY AND SIGNIFICANT FIGURES

Accuracy is the nearness of a measurement to the true value of the variable being measured. Precision is not a synonymous term but refers to the closeness to each other of repeated measurements of the same quantity. Figure 1 illustrates the difference between accuracy and precision of measurements.

[Figure 1 shows four panels, (a) to (d), each plotting the 10 measurements on a scale from 0 to 6 kg.]

FIGURE 1: Accuracy and precision of measurements. A 3-kilogram animal is weighed 10 times. The 10 measurements shown in sample (a) are relatively accurate and precise; those in sample (b) are relatively accurate but not precise; those of sample (c) are relatively precise but not accurate; and those of sample (d) are relatively inaccurate and imprecise.

Human error may exist in the recording of data. For example, a person may miscount the number of birds in a tract of land or misread the numbers on a heart-rate monitor. Or, a person might obtain correct data but record them in such a way (perhaps with poor handwriting) that a subsequent data analyst makes an error in reading them. We shall assume that such errors have not occurred, but there are other aspects of accuracy that should be considered.
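The distinction between accuracy and precision drawn above can be made concrete with a small numerical sketch. The weighings below are invented for illustration and are not read from Figure 1. The choice of summaries is likewise an assumption of this sketch: the mean deviation from the true value stands in for accuracy, and the sample standard deviation stands in for precision.

```python
from statistics import mean, stdev


def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean measurement minus the true value (near 0 means accurate)
    spread = sample standard deviation (small means precise)
    Both summaries are illustrative choices, not definitions from the text.
    """
    bias = mean(measurements) - true_value
    spread = stdev(measurements)
    return bias, spread


TRUE_WEIGHT_KG = 3.0  # the 3-kilogram animal described in the Figure 1 caption

# Hypothetical weighings in kg, 10 per sample, made up for this sketch.
samples = {
    "a": [2.9, 3.0, 3.1, 3.0, 2.9, 3.1, 3.0, 3.0, 2.9, 3.1],  # accurate, precise
    "b": [1.5, 4.5, 2.0, 4.0, 2.5, 3.5, 3.0, 3.0, 1.0, 5.0],  # accurate, not precise
    "c": [4.9, 5.0, 5.1, 5.0, 4.9, 5.1, 5.0, 5.0, 4.9, 5.1],  # precise, not accurate
    "d": [3.0, 6.0, 4.0, 5.5, 3.5, 6.0, 4.5, 5.0, 2.5, 5.0],  # neither
}

for label, weights in samples.items():
    bias, spread = accuracy_and_precision(weights, TRUE_WEIGHT_KG)
    print(f"sample ({label}): bias = {bias:+.2f} kg, spread = {spread:.2f} kg")
```

In this sketch a sample can have a bias near zero and still be imprecise, as in (b), or be tightly clustered around the wrong value, as in (c), which is the point the figure caption makes.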
Accuracy of measurement can be expressed in numerical reporting. If we report that the hind leg of a frog is 8 cm long, we are stating the number 8 (a value of a continuous variable) as an estimate of the frog’s true leg length. This estimate was made using some sort of a measuring device. Had the device been capable of more accuracy, we might have declared that the leg was 8.3 cm long, or perhaps 8.32 cm long. When recording values of continuous variables, it is important to designate the accuracy with which the measurements have been made. By convention, the value 8 denotes a measurement in the range of 7.50000... to 8.49999..., the value 8.3 designates a range of 8.25000... to 8.34999..., and the value 8.32 implies that the true value lies within the range of 8.31500... to 8.32499.... That is, the reported value is the midpoint of the implied range, and the size of this range is designated by the last decimal place in the measurement. The value of 8 cm implies an ability to

*The ellipsis marks (...) may be read as “and so on.” Here, they indicate that 2/6 and 4/6 are repeating decimal fractions, which could just as well have been written as 0.3333333333333... and 0.6666666666666..., respectively.
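The reporting convention described above, in which a reported value is the midpoint of the range it implies, can be sketched in a few lines. The function name implied_range is hypothetical, and passing the measurement as a string is a design assumption of this sketch: it preserves the number of decimal places that were actually reported, which a floating-point number would not.

```python
from decimal import Decimal


def implied_range(reported: str):
    """Return (lower, upper) bounds implied by a reported measurement.

    The lower bound is included and the upper bound is excluded, matching
    ranges such as 8.25000... to 8.34999... for a reported value of 8.3.
    """
    value = Decimal(reported)
    # Exponent of the last reported digit: 0 for "8", -1 for "8.3", -2 for "8.32".
    exponent = value.as_tuple().exponent
    # Half of one unit in the last reported decimal place.
    half_unit = Decimal(1).scaleb(exponent) / 2
    return value - half_unit, value + half_unit


for reported in ("8", "8.3", "8.32"):
    lower, upper = implied_range(reported)
    print(f"{reported} cm implies {lower} <= true length < {upper}")
```

Each additional reported decimal place narrows the implied range by a factor of ten, which is why the last digit written down is a statement about the measuring device as well as about the frog.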
