Environmental and Ecological Statistics with R

433 Pages·2010·5.098 MB·English

Preview Environmental and ecological statistics with R

Environmental and Ecological Statistics with R

Chapman & Hall/CRC Applied Environmental Statistics
Series Editor: Richard Smith, University of North Carolina, U.S.A.

Published Titles
Michael E. Ginevan and Douglas E. Splitstone, Statistical Tools for Environmental Quality
Timothy G. Gregoire and Harry T. Valentine, Sampling Strategies for Natural Resources and the Environment
Daniel Mandallaz, Sampling Techniques for Forest Inventory
Bryan F. J. Manly, Statistics for Environmental Science and Management, Second Edition
Steven P. Millard and Nagaraj K. Neerchal, Environmental Statistics with S-PLUS
Song S. Qian, Environmental and Ecological Statistics with R

Song S. Qian
Nicholas School of the Environment
Duke University
Durham, North Carolina, U.S.A.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20110725
International Standard Book Number-13: 978-1-4200-6208-3 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained.
If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Preface

Statistics is part of the curriculum of almost all environmental and ecological studies departments and programs in higher education institutions worldwide. Yet statistics is also often cited as the subject that is least liked and ineffectively taught, especially for students outside the mathematics/statistics area. A common problem in learning statistics is that statistics is often perceived as a subfield of mathematics. Consequently, we expect to learn a set of rules and be able to use statistics in our work. But applied statistics is not mathematics.
This book represents an effort in bridging the gap between a typical applied statistics text and the needs of scientists in environmental and ecological fields, with an emphasis on the inductive nature of statistical thinking. Much of the mathematical/theoretical background is avoided. Examples are used to introduce concepts and to illustrate methods. Statistics is introduced as a tool to facilitate scientific thinking, as was intended when R.A. Fisher introduced statistics to applied scientists.

The approach adopted by this book follows Fisher’s general steps of a statistical modeling problem, namely, model specification, parameter estimation, and model evaluation. These steps are similar to the steps a scientist takes in a scientific project. However, as discussed by many, statistics is often the subject that students in science and engineering do not like [Berthouex and Brown, 1994] and upon which ecologists often make mistakes [Peters, 1991]. The difficulty lies in the disconnect between a typical applied statistics course/book and a typical scientific problem.

In solving a scientific problem, we start with a hypothesis about the underlying mechanism as the basis for data collection. The proposed hypothesis provides the basis for formulating a model, often with unknown parameters. Experiments and other data collection efforts are to provide data for estimating these unknown parameters. Once these parameters are estimated, scientists can evaluate the model by comparing a model’s prediction to new observations. In this simplified summary of a scientific problem-solving process, the first step (forming a hypothesis) is often the most difficult part and requires the scientist to be both experienced and creative. Model/hypothesis formulation is also the most important step of the process because a wrong model will never lead us to success.

In applied statistics, the typical steps we take, as described by R.A. Fisher, are similar to the steps of a scientific problem-solving process.
With a specific problem, we must first examine the data and propose a statistical model to describe the distribution of the variable of interest. The statistical model is parameterized with unknown parameters to be estimated with data. When the parameters are estimated, we must assess the uncertainty of such a model by examining the sampling distributions of the estimated parameters.

This similarity in the processes of scientific problem solving and statistical model development, however, does not translate into easy learning of statistics for scientists. The difficulty is the transition from a scientific hypothesis to a statistical model. There are, unfortunately, no easy-to-follow steps to make this transition. A typical applied statistics course/book presents the subject as a collection of methods for different types of statistical models, and more or less ignores the problem of model formulation. This treatment is inevitable, because model formulation is necessarily a scientific problem. Applied statistics books or courses are focused on the statistical problems of parameter estimation and model evaluation. Different types of models often require different mathematical solutions. Frequently, this treatment of statistics leads to a misperception of what statistics is and why we learn statistics.

This book is motivated by this underlying link between statistical thinking and scientific methods. The book is still organized based on statistical models. However, throughout the book, examples are used to discuss each type of statistical model, and some of these examples are used to cover several types of models. The emphasis of these examples is on model formulation; the underlying mathematical/statistical theories are mostly omitted and replaced by presentations of R implementations of these models.

The book is based on teaching materials I accumulated at the Nicholas School of the Environment of Duke University.
The book can be divided into three units.

• Chapters 1 to 5 have been used in a graduate level applied data analysis course. They can be read as a unit to serve as a prerequisite for advanced statistical modeling. These chapters are intended for building a foundation so that readers will be able to conduct a simple data analysis task such as exploratory data analysis and fitting linear regression models.

• Chapters 6 to 8 have been used in a followup course in statistical modeling. The three chapters in this unit are somewhat independent of each other, and they can be read separately. The same is true for the three topics in Chapter 8 (Sections 8.1-8.4, 8.5, and 8.6).

• Chapters 9 and 10 have been used for a PhD-level independent study course. Chapter 9 discusses the use of simulation for model checking, providing tools for a critical assessment of the developed model. Simulation is commonly used for parameter estimation and for uncertainty assessment. The use of simulation for model checking, although less frequently discussed in the literature, is an important aspect of model development and assessment. Chapter 10 discusses the use of multilevel regression models, a class of models that can have a broad impact in environmental and ecological data analysis.

Data sets and R scripts used in the book are available online at http://www.duke.edu/~song/eeswithr.htm.

Many people helped in the process of writing this book. Kenneth H. Reckhow, Curtis J. Richardson, and Michael Lavine are my mentors and longtime collaborators. This book reflects their influence on my approach to environmental and ecological statistics. Collaboration with Yandong Pan improved my understanding of ecological problems and the problem-solving process in ecology. Craig A. Stow constantly feeds me with interesting ideas and papers. His work in analyzing the PCB in the fish data is greatly appreciated. Olli Malve, George B. Arhonditsis, and Andrew D.
Gronewold spent numerous hours helping me sort through ideas and concepts. Thomas F. Cuffney and Gerard McMahon presented the EUSE example to me and spent many hours discussing the example used in Chapter 10. Zehao Shen hosted me at Peking University in the summer of 2007 and provided many interesting examples. Richard L. Smith read the manuscript of the book and provided a critical review which helped greatly in the presentation of the book and in improving the clarity of the discussions of some key concepts. Many errors were found and improvements suggested by Meg Mobley, Ibrahim Alameddine, Itai Shelem, Kristen Marine, Emily Sharp, Erin Gray, and Wyatt Hartman.

Song S. Qian
Durham, North Carolina
March, 2009

Contents

Preface
Table of Contents
List of Tables
List of Figures

I Basic Concepts

1 Introduction
  1.1 The Everglades Example
  1.2 Statistical Issues
  1.3 Bibliography Notes

2 R
  2.1 What is R?
  2.2 Getting Started with R
    2.2.1 R Prompt and Assignment
    2.2.2 Data Types
    2.2.3 R Functions
  2.3 The R Commander

3 Statistical Assumptions
  3.1 The Normality Assumption
  3.2 The Independence Assumption
  3.3 The Constant Variance Assumption
  3.4 Exploratory Data Analysis
    3.4.1 Graphs for Displaying Distributions
    3.4.2 Graphs for Comparing Distributions
    3.4.3 Graphs for Exploring Dependency Among Variables
  3.5 From Graphs to Statistical Thinking
  3.6 Bibliography Notes

4 Statistical Inference
  4.1 Estimation of Population Mean and Confidence Interval
    4.1.1 Bootstrap Method for Estimating Standard Error
  4.2 Hypothesis Testing
    4.2.1 T-Test
    4.2.2 Two-Sided Alternatives
    4.2.3 Hypothesis Testing Using the Confidence Interval
  4.3 A General Procedure
  4.4 Nonparametric Methods for Hypothesis Testing
    4.4.1 Rank Transformation
    4.4.2 Wilcoxon Signed Rank Test
    4.4.3 Wilcoxon Rank Sum Test
    4.4.4 A Comment on Distribution-Free Methods
  4.5 Significance Level α, Power 1 − β, and p-Value
  4.6 One-Way Analysis of Variance
    4.6.1 Analysis of Variance
    4.6.2 Statistical Inference
    4.6.3 Multiple Comparisons
  4.7 Examples
    4.7.1 The Everglades Example
    4.7.2 Kemp’s Ridley Turtles
    4.7.3 Assessing Water Quality Standard Compliance
    4.7.4 Interaction between Red Mangrove and Sponges
  4.8 Bibliography Notes

II Statistical Modeling

5 Linear Models
  5.1 ANOVA as a Linear Model
  5.2 Simple and Multiple Linear Regression Models
    5.2.1 The Least Squares
    5.2.2 PCBs in the Fish Example
    5.2.3 Regression with One Predictor
    5.2.4 Multiple Regression
    5.2.5 Interaction
    5.2.6 Residuals and Model Assessment
    5.2.7 Categorical Predictors
    5.2.8 The Finnish Lakes Example and Collinearity
  5.3 General Considerations in Building a Predictive Model
  5.4 Uncertainty in Model Predictions
  5.5 Two-Way ANOVA
    5.5.1 Interaction
  5.6 Bibliography Notes

6 Nonlinear Models
  6.1 Nonlinear Regression
    6.1.1 Piecewise Linear Models
    6.1.2 Example: U.S. Lilac First Bloom Dates
  6.2 Smoothing
