Ecography 36: 1017–1031, 2013
doi: 10.1111/j.1600-0587.2013.00107.x
© 2013 The Authors. Ecography © 2013 Nordic Society Oikos
Subject Editor: David Nogues-Bravo. Accepted 25 January 2013

Evaluation of species distribution models by resampling of sites surveyed a century ago by Joseph Grinnell

Adam B. Smith, Maria J. Santos, Michelle S. Koo, Karen M. C. Rowe, Kevin C. Rowe, James L. Patton, John D. Perrine, Steven R. Beissinger and Craig Moritz

A. B. Smith ([email protected]), M. S. Koo, J. L. Patton, S. R. Beissinger and C. Moritz, Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building, Univ. of California, Berkeley, CA 94720-3060, USA. Present address of ABS: Center for Conservation and Sustainable Development, Missouri Botanical Garden, PO Box 299, Saint Louis, MO 63166, USA. SRB also at: Dept of Environmental Science, Policy and Management, 130 Mulford Hall, Univ. of California, Berkeley, CA 94720-3114, USA. – M. J. Santos, Spatial History Project and Bill Lane Center for the American West, History Dept, Stanford Univ., Stanford, CA 94305-2055, USA. – K. M. C. Rowe and K. C. Rowe, Sciences Dept, Museum Victoria, GPO Box 666, Melbourne 3001, VIC, Australia. – J. D. Perrine, Biological Sciences Dept, California Polytechnic State Univ., San Luis Obispo, CA 93407-0401, USA.

Species distribution models (SDMs) are commonly applied to predict species’ responses to anticipated global change, but lack of data from future time periods precludes assessment of their reliability. Instead, performance against test data in the same era is assumed to correlate with accuracy in the future. Moreover, high-confidence absence data is required for testing model accuracy but is often unavailable since a species may be present when undetected. Here we evaluate the performance of eight SDMs trained with historic (1900–1939) or modern (1970–2009) climate data and occurrence records for 18 mammalian species.
Models were projected to the same or the opposing time period and evaluated with data obtained from surveys conducted by Joseph Grinnell and his colleagues in the Sierra Nevada of California from 1900 to 1939 and modern resurveys from 2003 to 2011. Occupancy modeling was used to confi dently assign absences at test sites where species were undetected. SDMs were evaluated using species ’ presences combined with this high-confi dence absence (HCA) set, a low-confi dence set in which non-detections were assumed to indicate absence (LCA), and ran- domly located ‘ pseudoabsences ’ (PSA). Model performance increased signifi cantly with the quality of absences (mean AUC (cid:2) SE: 0.76 (cid:2) 0.01 for PSA, 0.79 (cid:2) 0.01 for LCA, and 0.81 (cid:2) 0.01 for HCA), and apparent diff erences between SDMs declined as the quality of test absences increased. Models projecting across time performed as well as when pro- jecting within the same time period when assessed with threshold-independent metrics. However, accuracy of presence and absence predictions sometimes declined in cross-era projections. Although most variation in performance occurred among species, autecological traits were only weakly correlated with model accuracy. Our study indicates that a) the quality of evaluation data aff ects assessments of model performance; b) within-era performance correlates positively but unreliably with cross-era performance; and c) SDMs can be reliably but cautiously projected across time. Anthropogenic climate change promises to rewrite the bio- algorithms using within-era evaluation, testing models geography of Earth ’ s species, with some expected to gain, against records from the same region and time period used to some to lose, and some to shift their current distributions. As train the models (Elith et al. 2006, Hijmans and Graham a result, conservation planners require reliable methods to 2006, Syphard and Franklin 2009). 
However, within-era project future distributions of species of concern and to assessments of SDMs may give overly optimistic estimates of prioritize conservation eff ort (Th omas et al. 2004, Carroll cross-era performance (Ara ú jo et al. 2005a, Hijmans 2012). et al. 2010, Ogawa-Onishi et al. 2010, Saupe et al. 2011). While cross-era evaluation increases the independence Species distribution models (SDMs), which correlate species between training and test data, it requires data from both occurrence data with climate variables and other factors time periods of interest, which are rarely available for indicative of habitat quality to produce maps of environ- time spans relevant to conservation planning (i.e. several mental suitability, are frequently used for such projections. decades or more). Unfortunately, the reliability of projecting SDMs across SDMs should be less reliable when projecting across time periods relevant to conservation remains largely time than within the same era for reasons related to both unknown (Ara ú jo et al. 2005a, b, Dormann 2007, Elith biology and modeling (Ara ú jo et al. 2005a, b, Dobrowski and Leathwick 2009, Kharouba et al. 2009). Scores of et al. 2011). From a biological perspective model perfor- studies have assessed the performance of diff erent SDM mance will be diminished if species distributions are not in 1017 equilibrium with the environment in the era from or to design of the original and contemporary surveys allow us to which their ranges are projected (Nogu é s-Bravo 2009, use occupancy modeling to confi dently assign absences Wiens et al. 2009). Disequilibrium can arise if species ’ and compare the accuracy of cross-era and within-era SDM ranges are shaped by biotic interactions that are inde- projections. pendent of climate (Pellissier et al. 2010, Rubidge et al. 
Our primary questions are: 1) how well do SDMs 2011), held in check by dispersal limitation from otherwise project across time periods relevant to conservation; 2) do favorable regions (Early and Sax 2011), or are infl uenced by SDM algorithms diff er in their performance; 3) how does adaptive evolution (Lavergne et al. 2010). Certain traits the quality of the test data set infl uence assessment of model related to dispersal, longevity, and reproductive capacity may accuracy; 4) can performance be predicted by species ’ favor or disfavor equilibrium and thereby correlate with autecological traits or rates of colonization and extirpation; model performance (McPherson and Jetz 2007). As a and 5) can performance of a SDM projected across time be result, there has been a recent shift from fi nding the best predicted by its performance against test data drawn from modeling technique to explaining variation in model perfor- the same region and time period as the data used to train mance between species (Guisan et al. 2007, Dobrowski it (does within-era performance predict cross-era perfor- et al. 2011). Th e accuracy of predictions may also decline mance)? SDMs are commonly assessed using so-called when projecting across time if models incorrectly fi t or ‘ threshold-independent ’ measures of performance, which overfi t training data (Elith and Graham 2009, Elith et al. calculate model skill across all possible values that could 2010), if the covariance between interacting predictors be used to convert model output to a binary ‘ presence/ changes across time (Jim é nez-Valverde et al. 2009), or absence ’ state (Fielding and Bell 1997). In contrast, pre- if models extrapolate beyond the range of training data dictions from SDMs are commonly used after thresholds (Ara ú jo et al. 2005b, Peterson et al. 2007, Nenz é n and have been applied to convert output to a binary presence/ Ara ú jo 2011). 
absence state because they are easily interpretable (Nenz é n False absences compound the problem of assessing the and Ara ú jo 2011). Th us we examine threshold-dependent reliability of SDMs. Although false presences can yield and -independent measures of model performance. A misleading results, they are generally uncommon since diagram of the study design is shown in Fig. 2. occurrences can be confi rmed with voucher specimens or similar robust evidence. However, confi rmation of absences Methods requires ‘ n egative ’ evidence, which is rarely reported in specimen databases (K é ry 2011). Even when presence – Training data: species ’ records absence data are available, absences are confounded by the possibility that a species was present but undetected Museum records from MaNIS ((cid:3) www.manisnet.org (cid:4) ) (MacKenzie et al. 2006). While attention has been devoted and Arctos ((cid:3) http://arctos.database.museum/ (cid:4) ) from the to the eff ects of false absences on the calibration of SDMs eastern border of the Rocky Mountains (103.77W) to that use presence – absence data (Gu and Swihart 2004, the Pacifi c Ocean (Fig. 1a) and between the northern and Lobo et al. 2010, Rota et al. 2011) or presence – only data southern borders of the US were used to train the SDMs. (K é ry 2011), the consequences of false absences in data used By using sites from the conterminous western US as training for model evaluation are less well understood (Foody data, we included the full range or a substantial portion 2011). One way to address this problem is to employ occu- of each species ’ range in the model training set (Nenz é n and pancy modeling, which uses the detection probability Ara ú jo 2011). Supplementary material Appendix 2 contains estimated from repeated surveys to infer the probability details on data cleaning procedures. 
Models were trained of true absence at sites where a species was not detected using records either from 1900 to 1939 (the ‘ historic ’ era) (MacKenzie et al. 2006, Tingley et al. 2009, K é ry 2011). or 1970 to 2009 (the ‘ modern ’ era) and projected to the Here, we evaluate the performance of eight SDMs trained same era or the opposing era. To allow a fair comparison with historic (1900 –1 939) or modern (1970 – 2009) museum between SDMs, we equalized training presences in each era records which were projected to the same or the opposing by subsampling records in the era with more sites. We only time period for 18 mammalian species. Historic evaluation included species with (cid:5) 30 presences in each era (Wisz et al. data was obtained from the work of Joseph Grinnell and 2008) and (cid:5) 5 test presences and absences (described below) his colleagues who conducted systematic inventories of in each of the eras. Th e fi nal data set had 18 species (mini- vertebrates of the western United States in the early 20th mum, median, and maximum training sites per species in an century (Grinnell and Storer 1924). Th eir meticulous fi eld notes (∼ 50 000 pages) and specimens (∼ 80 000) are era were 50, 130, and 1003, respectively; Supplementary material Appendix 2, Table A2). preserved at the Univ. of California, Berkeley ’ s Museum of Vertebrate Zoology, and have allowed us to resurvey matching and similar sites between 2003 and 2011 to serve Environmental data as modern evaluation data (Moritz et al. 2008, Tingley et al. 2009, 2012, Morelli et al. 2012). Our test regions con- We used 30-arcsec (∼ 800-m) resolution climate layers sist of three elevational gradients along the Sierra Nevada of monthly minimum, maximum and mean temperature and southern Cascade Range (Fig. 1a – d). 
Combined with and precipitation derived from the parameter-elevation the appreciable climatic change that has occurred across the regression on independent slopes model (PRISM), averaged region over the past century (Fig. 1e and f; Supplementary across 1900 – 1939 and across 1970 – 2009 (Daly et al. 2000). material Appendix 1, Table A1), the thoroughness and PRISM is an expert-tuned meteorological interpolation 1018 Figure 1. Th e study region (a), the three regions used for testing SDMs (b – d), and climate change at the Grinnell sites (e – f). (a – d) Th e test regions and sites (circles). Model training was conducted on species ’ records from across the western US, but model evaluation was per- formed using presence/absence records from the Grinnell sites (b – d). Sites for pseudoabsences (PSA) were drawn from an 80-km buff er around the Sierra Nevada ecoregion (the shaded area). National Park boundaries are shown in the insets (Lassen Volcanic National Park, Yosemite National Park, and in the Southern Sierras Kings Canyon and Sequoia National Parks). (e) Climate change vectors for Grinnell sites. Each arrow represents climate change at a Grinnell site, with the beginning located at the mean minimum temperature of the coldest month and mean precipitation of the driest month in historic times, and the end located at the corresponding values in the modern era. On average minimum temperature and precipitation increased. (f) Th e same as panel e but for mean maximum temperature of the warmest month and precipitation of the wettest month. system with predictions based on observed weather measure- resource-based limits on survival (e.g. minimum tempera- ments, and it has higher accuracy in topographically complex ture of the coldest month, precipitation of the driest month; areas like the Sierra Nevada compared to other interpolation Austin 2002). Th is resulted in nine predictors averaged across methods (Parra and Monahan 2008). 
A description of the years in each era (Supplementary material Appendix 1, PRISM interpolation algorithm and weather station data Table A1): mean diurnal temperature range, the ratio of are presented in Supplementary material Appendix 1. From diurnal to yearly temperature range, minimum temperature these layers we derived 19 ‘ BIOCLIM ’ variables (Nix 1986) of the coldest month, maximum temperature of the warmest and kept those with pairwise correlations between – 0.7 and month, temperature annual range, precipitation of the wet- 0.7. When deciding between highly correlated variables, test month, precipitation of the driest month, and precipita- we retained those that we expected to represent environmen- tion of the warmest quarter, and precipitation seasonality tal ‘b ottlenecks ’ which would impose physiological or (the coeffi cient of variation of monthly precipitation). 1019 Historic species’ records Modern species’ records and climate layers and climate layers Model trained with Model trained with historic data modern data Predictions at Predictions at historic test sites modern test sites Model Model evaluation evaluation Historic test Historic Modern Modern test pseudo- test test pseudo- absences presences presences absences (PSA) (PSA) Historic low- Modern low- Historic Modern confidence confidence capture capture test absences test absences history history (LCA) (LCA) Historic high- Modern high- confidence OccupancyOccupancy confidence test absences modeling modeling test absences (HCA) (HCA) Figure 2. An outline of the study design. For each era occurrence records for each species and contemporaneous climate layers were used to train historic and modern models using one of six algorithms. Each model was then projected to the same era and opposing era using the respective climate surfaces. 
Capture histories at each Grinnell site in each era were used to generate test presences and three sets of test absences: randomly located ‘ pseudoabsences ’ (PSA) across the Sierra Nevada and southern Cascades, low-confi dence absences (LCA) inferred from non-detections at a site, and high-confi dence absences (HCA) inferred from occupancy modeling. Predictions from the SDMs were then compared to presences and each set of absences at Grinnell sites to evaluate the SDMs. Species distribution models background sites (save for SVMs for which we used a num- ber of target background sites equal to the number of We compared performance of six SDMs: BIOCLIM (Busby training presences for each species to increase model stabil- 1991), boosted regression trees (BRTs; Elith et al. 2008), ity). Background sites for BRTs, GAMs, and GLMs generalized additive models (GAMs; Wood 2006), general- were weighted to have the same infl uence as the number of ized linear models (GLMs), MAXENT (Phillips et al. presences (Maggini et al. 2006). 2006), and support vector machines (SVMs; Guo et al. 2005). Th ese models were chosen because they are among the most popular SDMs in use or, in the case of Evaluation data: Grinnell surveys and resurveys BIOCLIM, represent niches in a simplistic manner so may transfer through time better than more complex Between 1900 and 1939 Joseph Grinnell and his colleagues formulations. Supplementary material Appendix 2 contains conducted an extensive inventory of terrestrial vertebrate detailed descriptions and information on model imple- species in California (Grinnell and Storer 1924, Grinnell mentation. We also calculated two ensemble models using et al. 1930, Sumner and Dixon 1953). Our resurveys the arithmetic mean (EMEAN) and median (EMED) of focused on three elevational gradients in the Sierra Nevada output from all of the individual models save BIOCLIM. 
and southern Cascades that have experienced relatively We excluded BIOCLIM from the ensembles because it little human development over the past century (Fig. 1; uses only presence data, whereas all of the other techniques Moritz et al. 2008, Tingley et al. 2012): Lassen (surveyed at utilize the same presence and background data, with SVMs elevations spanning 80 to 2510 m and centered on what being the exception (described below). Predictions for each is now Lassen Volcanic National Park and National model were rescaled to the range [0, 1] before ensembling Forest), Yosemite (from 50 to 3280 m; focused on Yosemite (Mateo et al. 2012). National Park), and the southern Sierras (from 120 to 3640 m; We used records from all non-domesticated, non- including Sequoia and Kings Canyon National Parks and managed mammals in the study region as target back- Sequoia, Sierra, and Inyo National Forests). We perused ground sites to minimize sampling bias in geographical Grinnell and colleagues ’ historical fi eld notes and specimen (i.e. environmental) space (Phillips et al. 2009) for all records to ascertain locations of survey sites, species caught, SDMs except BIOCLIM, which does not require back- the number of traps set per night (trapping eff ort), and ground data. Each model was trained using the focal the pattern of captures across nights at each site to use for species ’ presences and 10 000 randomly selected target occupancy modeling to validate absences. Between 2003 and 1020 2011 we resurveyed these and similar sites across the same Th e third set consisted of ‘ high-confi dence ’ absences regions, yielding 61 sites surveyed in both the historic and (HCA) inferred from occupancy modeling (MacKenzie modern era, plus an additional 29 sites surveyed in just et al. 
2006), which uses the pattern of detections (detected/ the historical era and 75 in the modern era, for a total of not detected) across successive nights at each site within 90 historical and 136 modern sites for occupancy modeling an era to estimate the probability that a species was present and SDM evaluation (Supplementary material Appendix 2, but not detected (Supplementary material Appendix 2, Table A2). Following Moritz et al. (2008) and Tingley Table A2). Detailed procedures for occupancy models are et al. (2012), we defi ned a site as a 2-km radius circle and described in Moritz et al. (2008) and in Supplementary within a 100-m elevational band around a point (usually material Appendix 2, so are briefl y presented here. We used a campsite), since trapping eff ort encompassed a range of the single-season occupancy framework to estimate the habitats within this area. Hereafter we refer to these locations probability of a false absence at each site in each era for as ‘ Grinnell ’ sites. Supplementary material Appendix 2 each species, derived from averaging across a suite of detect- provides detailed descriptions of the historic and modern ability and occupancy models that incorporated trapping survey methods, and Tingley et al. (2012) describes the eff ort, elevation, and era as covariates. Sites where the target three test regions. Data from these sites were used for testing species was not detected were assumed to be true absences the SDMs but were not part of the training data. if the probability of false absence was (cid:6) 0.10 (Rubidge Th e climate of the Grinnell sites and the western US et al. 2011). Sites where a species was not detected and as a whole changed noticeably over the past century with a probability of false absence (cid:4) 0.10 were excluded (Supplementary material Appendix 1, Table A1). Between from the HCA, meaning they were a subset of the LCA. 
the historical and modern survey periods, mean annual Hereafter, when we refer to the PSA, LCA, and HCA evalu- temperature increased by 0.4° C in the western US and by ation sets we implicitly include species ’ test presences as 0.3 ° C at Grinnell sites, while mean annual precipitation well as the relevant type of absences. increased by 34 mm in the western US and by 10 mm For each species we evaluated SDM performance for two at Grinnell sites. Relative to the western US, Grinnell sites within-era and two cross-era projections. Th e historic-to- were on average cooler and wetter, and had greater fl uctua- historic projection (HH) used historic training and test data, tions in annual precipitation and temperature. Generally, and the modern-to-modern (MM) comparison used modern environmental minima (minimum temperature of the training and test data. Th e two cross-era projections (historic- coldest month and minimum precipitation of the driest to-modern, HM; and modern-to-historic, MH) used training month) at the Grinnell sites increased between eras, while data in one era and test data in the other. SDM predictions maxima (maximum temperature of the warmest month were extracted from then averaged across pixels within a 2-km and precipitation of the wettest month) remained roughly radius at each test site to match the scale of a Grinnell site. constant relative to their range (Fig. 1e, f). Threshold-independent analysis of model Assessing the effects of false absences on model performance performance SDM performance was evaluated using the area under We assessed model performance using the observed pres- the receiver-operator curve (AUC) and the correlation ences at the Grinnell sites and three sets of absences of between predicted values and the probability of presence and varying quality. Th e fi rst set consisted of ‘ pseudoabsences ’ absence (COR; Elith et al. 2006). 
For the PSA set, AUC (PSA), or randomly-located sites from across the test equals the probability that a randomly chosen presence site region, an 80-km buff er around the U.S. Environmental will have a higher predicted value than a randomly located Protection Agency ’ s Sierra Nevada ecoregion (which site (Phillips et al. 2006). For the LCA and HCA sets, includes the southern Cascade Range; Omernik 1987; AUC equals the probability that a randomly chosen presence Fig. 1a). PSA are commonly used for evaluation when site has a higher predicted value than a randomly chosen absence data are unavailable (Hernandez et al. 2006, Phillips absence site, where ‘ absence ’ is a low- or high-confi dence et al. 2006, Stralberg et al. 2009). We set the number of absence. COR represents the model ’ s ability to predict PSA sites equal to the number of Grinnell presence sites the probability of presence (or ‘ pseudopresence ’ , if PSA is for each species to avoid bias in test metrics caused by used). Prevalence was kept at 0.5 for the PSA tests by using unequal prevalence (ratio of presences to presences plus the same number of pseudoabsences as there were test absences; McPherson et al. 2004, Foody 2011). Th is process presences for each species but varied by species and era was repeated 1000 times for each test set (test presences for the LCA and HCA tests (Supplementary material kept the same, PSA changing each time) to stabilize the stan- Appendix 2, Table A2). dard error of performance metrics to (cid:3) 0.01 across replicated We used a two-tiered approach to determine the eff ects PSA using the same presences. of model algorithm, projecting across time, and autecologi- Th e second absence set consisted of ‘ low-confi dence ’ cal traits on model performance. 
Both tiers involved calcu- absences (LCA) inferred from non-detections at Grinnell lating linear regressions with AUC or COR from evaluation sites in each era (Supplementary material Appendix 2, of the PSA, LCA, or HCA sets (or all sets combined) as Table A2). Th is type of absence is similar to presence – the response variable with SDM, projection (historic- absence data sets in which non-detection is assumed to to-historic, modern-to-modern, modern-to-historic, and indicate absence of the species. historic-to-modern), and their interaction as factors. 1021 Th e fi rst tier of models also included ‘ species ’ as a fi xed Predicting threshold-independent performance eff ect. We reasoned that if autecological traits infl uenced across eras species ’ propensity to be in equilibrium with their environ- ment – and thus increase model performance (Nogu é s- Th e performance of SDMs against test data from the Bravo 2009, Wiens et al. 2009) – then they would together same era and region as the training data is often used as an explain as much variation in model performance as a indicator of performance of models projected across time simple ‘ species ’ term. Hence, in the second tier of models periods (Broennimann et al. 2006, Loarie et al. 2008, we replaced the ‘ species ’ term with 10 autecological traits: Ogawa-Onishi et al. 2010, Saupe et al. 2011). To test this activity cycle (nocturnal/diurnal/both), annual rhythm assumption we calculated Pearson ’ s correlation coeffi cients (hibernator/non-hibernator), diet (omnivore/granivore/ for within-era performance versus cross-era performance insectivore/herbivore), adult mass, litter size, litters per (e.g. HH AUC across species vs HM AUC or MM AUC year, young per year, range area, and climatic niche breadth across species vs MH AUC). We performed separate and marginality (data from Moritz et al. 2008, Jones correlations for each absence type and across absence et al. 2009, and the IUCN Red List at (cid:3) www.iucnredlist. 
types: PSA within-era performance vs PSA cross-era perfor- org (cid:4) ). Niche breadth (the range of climatic conditions in mance, PSA within-era vs LCA cross-era, PSA within-era vs which the species is found relative to the available climatic HCA cross-era, LCA within-era vs HCA cross-era, and space) and marginality (the diff erence between the species ’ HCA within-era vs HCA cross-era. Others have used climatic niche and the center of the distribution of avail- the transferability index from Randin et al. (2006) for this able climate), were calculated using ecological niche factor purpose. However, accuracy varied by absence types, making analysis (ENFA; Hirzel et al. 2002) with mean annual use of this index problematic because it is penalized temperature and precipitation at all training presence sites when accuracy of one set diff ers from another, even if one in each era. We also included the mean detectability of set predicts the other well. However, for comparative each species given that it was present estimated from purposes we also calculated a modifi ed transferability occupancy modeling as a covariate. We initially desired index between like absence sets (i.e. PSA within-era perfor- to include number of training presence sites, but it was mance vs PSA cross-era performance, LCA vs LCA, and strongly correlated with range size (r (cid:7) 0.63, p (cid:7) 0.005, HCA vs HCA) using Eq. A1 (Supplementary material n (cid:7) 16), so retained the latter. Appendix 3). We also included other factors in the regressions, depend- ing on the test set. For the regressions with all absence Site-level turnover and threshold-independent sets combined we added absence type (PSA, LCA, HCA) as performance a factor to determine the eff ect of absence quality on apparent model performance. 
Test prevalence and its qua- We also examined the relationship between model perfor- dratic term was included as a ‘ nuisance ’ variable in analyses mance and turnover (colonization and extinction) at the of LCA and HCA AUC and COR since an unequal 61 matching Grinnell sites that were surveyed in both number of test presences and absences can aff ect perfor- the historic and modern eras. Turnover was defi ned as the mance metrics (McPherson et al. 2004, Foody 2011). number of sites changing status across time (present-to- Number of test sites (presences (cid:8) absences) was also used absent or absent-to-present) divided by the total number of as a covariate in analyses of performance against PSA and sites in which species changed status or stayed the same HCA since it can also infl uence apparent performance (present – present or absent – absent). A species was consid- (Bean et al. 2012). Th e number of test sites for the LCA ered ‘ present ’ if it was detected at a site or ‘ absent ’ if it analysis was equal to the number of Grinnell sites in each met our criteria for inclusion in the HCA data set. Pearson era so did not diff er between species, and therefore was not correlation coeffi cients were calculated across species used in analyzing the LCA set. between turnover rates and the average of HM and MH AUC and COR were transformed using a modifi ed HCA AUC for each SDM to determine how turnover logit function prior to analysis following Warton and correlated with model performance. Hui (2011; COR was fi rst transformed to the range [0, 1] using ( x (cid:8) 1)/2). 
All continuous predictors were log transformed, centered by subtracting their log means, and standardized by their transformed standard deviations prior to analysis except for detectability, which was logit-transformed then centered and standardized since it took the range [0, 1] (Warton and Hui 2011).

Contrasts between levels of SDM, projection, and absence type in the regressions were explored using Tukey HSD tests when these factors were significant. We then employed stepwise forwards-backwards model selection with p < 0.05 for inclusion of a term. To discern the contribution of each factor to variation in AUC or COR we applied variance partitioning to the final models (Grömping 2007).

Threshold-dependent analysis of model performance

Finally, we examined the ability of SDMs to correctly predict presences and absences after thresholding model output to a binary presence/absence state. Two commonly-used thresholds based on sensitivity (proportion of presences correctly predicted) and specificity (proportion of absences correctly predicted) were applied (Liu et al. 2005): one that maximized the sum of sensitivity and specificity (MSSS) and another that minimized the difference between sensitivity and specificity (MDSS). Thresholds were calculated for each absence set separately using the test presences and the absences of each set. We applied the within-era threshold to the projection of the opposing era to mimic the situation in which modelers find themselves when projecting to a time period from which they have no test data (i.e. the HH threshold was applied to HM projections and MM threshold to MH projections). Omission rates (the proportion of presences incorrectly predicted to be absences) and commission rates (the proportion of absences incorrectly predicted to be presences) were calculated for each combination of absence type, threshold, species, SDM, and projection. Omission or commission rates for each threshold were analyzed in separate analyses of variance using absence type, SDM, projection, all possible two-way interaction terms between these factors, and species as covariates. Error rates were logit-transformed before analyses (Warton and Hui 2011).

Results

Threshold-independent analysis of model performance

Absence type was a significant predictor in regression models of threshold-independent performance for all comparisons (Table 1 and Supplementary material Appendix 3, Table A3). Mean AUC (± standard error) increased significantly with the quality of absences from 0.76 ± 0.01 for PSA to 0.79 ± 0.01 for LCA to 0.81 ± 0.01 for HCA (Fig. 3c). Hereafter we focus on tests using the HCA data set, since it best reflects patterns of true presence and absence; results for PSA, LCA, and all test sets combined are presented in Supplementary material Appendix 3. Results for COR were qualitatively very similar to analysis of AUC and are also presented in Supplementary material Appendix 3.

HCA AUC did not significantly differ between projections (Table 1, Fig. 3a), meaning models performed as well when projecting within eras as across eras. Projection contributed little to total R² in regressions with species as a fixed effect, or in regressions replacing ‘species’ with autecological traits (Table 2 and Supplementary material Appendix 3, Table A4).

SDM algorithm was marginally significant (p = 0.051) in regressions of HCA AUC with ‘species’ as a term but was significant in regressions with ‘species’ replaced by autecological traits (Table 1). Mean AUC across species and projections varied by SDM from 0.76 (GLM) to 0.85 (EMED). The two ensemble models performed equally well and better than BIOCLIM and GLM, with the other models having intermediate performance (though these differences are tentative given the marginal significance of SDM in the regression model; Fig. 3b).

Species identity had the largest effect on model performance, and was always significant in the first-tier models (Table 1). Alone it explained 0.36 of the variance in HCA AUC (Table 2 and Supplementary material Appendix 3, Table A4). However, when the ‘species’ term was replaced with autecological traits in the second tier models, the traits that remained after stepwise model selection together contributed only 0.28 to total R², suggesting that additional traits not included in our analysis may explain differences in performance among species. Of the traits retained in the final model, niche marginality contributed more than twice as much as any other autecological factor (0.12 of total R²; Table 2 and Supplementary material Appendix 3, Table A4) and was positively correlated with performance.

In some cases AUC was < 0.5, indicating predictions worse than random. Among SDMs and test sets poor performance was most common for BIOCLIM and GLMs tested against PSA or LCA. Among species Peromyscus maniculatus performed consistently poorly (mean HCA AUC = 0.58 ± 0.02) while other species performed consistently well, especially Tamias amoenus (mean HCA AUC = 0.91 ± 0.01) and Reithrodontomys megalotis (0.92 ± 0.02).

Predicting threshold-independent performance across eras from within-era performance

Modelers are often in the position of having to assume that model performance against test data drawn from the same region and time period correlates with performance in another time period from which data is unavailable. We found that HCA AUC from within-era projections (HH or MM) significantly and positively correlated with cross-era HCA AUC for BIOCLIM, BRTs, and SVMs, regardless of the temporal direction in which the cross-era projection was conducted (Table 3 and Supplementary material Appendix 3, Table A6). Surprisingly, within-era LCA AUC was nearly always a good predictor of cross-era HCA AUC. Predicting cross-era HCA AUC using within-era assessments against PSA was reliable only for BIOCLIM, but this model also had below-average performance (Fig. 3b). The ability to predict performance in one direction (e.g. MM vs MH) did not necessarily imply equivalent ability in the opposing direction (e.g. HH vs HM). For example, when using within-era PSA AUC to predict cross-era HCA AUC for GAMs, the correlation between MM AUC and MH AUC was 0.47 (p = 0.049), but fell to 0.22 (p = 0.380) for the HH vs HM comparison (Table 3). We found fairly high average model transferability within absence types with no significant differences between SDMs within the same absence type (Supplementary material Appendix 3, Fig. A1).

Site-level turnover and threshold-independent performance

Mean turnover (colonization + extinction rate) at the Grinnell sites surveyed in both historic and modern eras was 17 ± 3% (Supplementary material Appendix 3, Table A7). Some species experienced substantial rates of turnover (e.g. Zapus princeps at 42% of sites), whereas other species experienced none (e.g. 0% for Tamias senex). Average cross-era AUC was not correlated with turnover (p > 0.05 for each SDM) except for SVMs, for which the relationship was negative (r = −0.62, p = 0.005, n = 18).

Threshold-dependent analysis of model performance

To simplify presentation we focus on omission and commission errors from application of the MSSS threshold, leaving

Figure 3. AUC as a function of (a) projection, (b) SDM, (c) absence type, and (d) species. In each panel dark bars are tests against pseudoabsences (PSA), light bars against low-quality absences (LCA), and white bars against high-quality absences (HCA). In (a), (b), and (c) different letters denote groups that are significantly different (p < 0.05) using Tukey HSD tests within each absence type.
Contrasts between groups were generally only calculated if the relevant term was significant in analyses of variance. A significant interaction between SDM and projection precludes displaying significance groups for the PSA set in the first two panels. Significance groups are coded by letter for each absence type in panels (a) and (b) and between absence types in panel (c). SDM was only marginally significant in tests of HCA AUC so significance groupings for HCA in panel (b) are only suggestive of differences, not indicative. Significance groups are not shown in (d), but species has a significant effect within each absence type. Mean AUC decreases with the order of the significance group (e.g. group ‘a’ has the highest AUC, ‘b’ the second highest, etc.). In general, tests are worst against PSA and best against HCA, but AUC varies most by species. Tops of boxes, horizontal lines within boxes, and bottoms of boxes represent the upper 75%, median, and lower 25% quartiles, respectively. Dashed vertical lines extend to the lesser/greater of the maximum/minimum value and 2 standard deviations from the mean. Abbreviations: Call late: Callospermophilus lateralis, Chae cali: Chaetodipus californicus, Micr cali: Microtus californicus, Micr long: M. longicaudus, Micr mont: M. monticolus, Neot fusc: Neotoma fuscipes, Neot macr: N. macrotis, Pero boyl: Peromyscus boylii, Pero mani: P. maniculatus, Pero true: P. truei, Reit mega: Reithrodontomys megalotis, Sore mont: Sorex monticolus, Sore vagr: S. vagrans, Tami amoe: Tamias amoenus, Tami sene: T. senex, Tami spec: T. speciosus, Uroc beld: Urocitellus beldingi, Zapu prin: Zapus princeps. ns = not significant.

analysis of the MDSS threshold for Supplementary material Appendix 3. Across all species, SDMs, and projections mean omission and commission rates against HCA were 0.19 ± 0.01 and 0.25 ± 0.01, respectively, indicating that SDMs tended to predict false presences more than false absences using the MSSS threshold (t-test paired by SDM, projection, and species: p = 10^-5, t = 4.366, DF = 575). In contrast to the threshold-independent analyses, regressions of omission errors indicated that overall rates varied by projection and its interaction with SDM (Table 4), notably for BIOCLIM, MAXENT, and SVMs (Fig. 4b). Most SDMs had equal commission error rates (Fig. 4f). Commission errors against the LCA and HCA sets were equal to one another and lower than against the PSA set (Fig. 4g). Species was always a significant factor in analyses of omission and commission rates.

Table 1. Regressions on AUC for each absence type and all three absence types together using ‘species’ as a fixed factor (see Supplementary material Appendix 3, Table A3 for analysis of COR). Sums of squares are calculated for each term when it is entered last into the model. Number of test sites was not included in the LCA analysis, nor was prevalence for the PSA analysis. Species is significant in every analysis. SDM is significant in each analysis except for AUC calculated for high-quality absences (HCA), in which it is only marginally significant. Absence type is significant in the analysis combining all three absence sets together. Bold values highlight significant factors.

Source                     DF     Sum of Squares   F        p
Performance against pseudoabsences (PSA)
  Projection               3      0.966            5.241    0.872
  SDM                      7      1.097            2.550    10^-14
  Projection × SDM         21     2.780            2.154    0.002
  Species                  17     41.079           39.320   10^-16
  Number of test sites     1      0.624            10.149   0.002
  Error                    526    32.326
Performance against low-quality absences (LCA)
  Projection               3      0.044            0.328    0.805
  SDM                      7      1.151            3.694    0.001
  Projection × SDM         21     0.645            0.691    0.844
  Species                  17     18.442           24.376   10^-16
  Test prevalence          1      0.111            2.496    0.115
  (Test prevalence)²       1      0.035            0.786    0.380
  Error                    525    23.364
Performance against high-quality absences (HCA)
  Projection               3      0.219            0.851    0.466
  SDM                      7      1.212            2.022    0.051
  Projection × SDM         21     0.773            0.430    0.988
  Species                  17     30.133           20.704   10^-16
  Test prevalence          1      1.640            19.162   10^-5
  (Test prevalence)²       1      1.157            13.517   10^-4
  Number of test sites     1      0.547            6.393    0.012
  Error                    524    44.861
Performance against all absence types together
  Absence type             2      5.779            37.128   10^-16
  Projection               3      0.980            4.200    0.006
  SDM                      7      3.220            5.910    10^-7
  Projection × SDM         21     1.938            1.186    0.253
  Species                  17     65.554           49.550   10^-16
  Test prevalence          1      0.276            3.548    0.060
  (Test prevalence)²       1      0.066            0.846    0.358
  Number of test sites     1      0.103            1.330    0.249
  Error                    1674   130.275

Table 2. Partitioning of variance in AUC for the high-quality absences (HCA) set in regression models with ‘species’ as a fixed term or replacing ‘species’ with autecological traits (see Supplementary material Appendix 3, Table A4 for other absence types and Table A5 for analysis of COR). Values represent each term’s contribution to R². For each absence type a simple regression with projection, SDM, projection × SDM, and species was analyzed (prevalence and its square and number of test sites were included as ‘nuisance’ terms). The species term was then replaced with autecological traits that were expected to influence SDM performance; if traits influence model performance substantially, then they should be expected to explain as much variance as the ‘species’ term they replace. Terms were only included in the final partitioning if they were significant (p < 0.05) in a forwards/backwards model selection procedure. Pluses and minuses in parentheses indicate the direction of the relationship for non-categorical variables in the final model. ns: not significant; ∗ autecological trait.

Term                                  R²
Regression with ‘species’ as a term
  Projection                          0.01
  SDM                                 0.05
  Species                             0.36
  Prevalence + (prevalence)²          0.04
  Number of test sites                0.00 (−)
  Total                               0.47
Regression replacing ‘species’ with traits
  Projection                          ns
  SDM                                 0.05
  Prevalence + (prevalence)²          0.05
  Number of test sites                ns
  Detectability in test era           0.01 (−)
  Activity cycle∗                     0.04
  Annual rhythm∗                      0.03
  Diet∗                               0.03
  Adult mass∗                         ns
  Litter size∗                        0.02 (+)
  Litters per year∗                   0.02 (−)
  Young per year∗ and Range area∗     ns and 0.01 (+); the two rows are overprinted in the source, so which value belongs to which trait cannot be recovered
  Niche (ENFA) breadth∗               0.01 (−)
  Niche (ENFA) marginality∗           0.12 (+)
  Total                               0.39
  Total of autecological traits       0.28

Discussion

The temporal transferability of SDMs is of keen interest for conservation practitioners. Numerous studies have used SDMs to forecast severe range loss and even extinction of species due to anticipated global change (Thomas et al. 2004, Hijmans and Graham 2006, Loarie et al. 2008, Ogawa-Onishi et al. 2010), optimize resiliency of conservation reserves against climate change (Carroll et al. 2010), and predict the future connectivity of migration corridors (Early and Sax 2011). Overall, our results suggest that 1) assessment of true accuracy (within or across eras) depends on having high quality test data; 2) within-era accuracy unreliably predicts cross-era accuracy; and 3) accuracy differs as a function of the SDM algorithm and type of projection, but mostly by species. We discuss each finding below.

Absences and accuracy of SDMs

Our results emphasize the importance of having high confidence in absences when assessing the accuracy of SDMs using either threshold-independent or -dependent metrics. While attention has been directed to the confounding effect of false absences on model calibration (Lobo et al. 2010, Kéry 2011), we found that SDMs can produce accurate projections for some species even when high-quality absence data was unavailable for model calibration (e.g. R. megalotis, T. amoenus; Fig. 3d). However, knowing which models were accurate and which species were modeled well depended on having high-quality absences for testing (Fig. 3d and Fig. 4d, h).

High detectability of a species does not necessarily obviate the need to apply occupancy modeling to differentiate false from true absences. In our study, the conditional probability of detection for a species at a site, given that it was present, averaged 0.80 ± 0.02 across species, sites, and eras. Despite this fairly high level of detectability, threshold-independent and -dependent measures of performance varied with the quality of absences. For example, mean AUC for R. megalotis increased from 0.59 ± 0.02 against the PSA set to 0.88 ± 0.01 against the LCA set to 0.92 ± 0.01 for the HCA set (Fig. 3d). At first glance this suggests that when HCA is unavailable, models tested with PSA or LCA can be assumed to be more accurate than the available data indicate. This would seem to imply that possession of HCA evaluation data, while advantageous, is not necessary, since assessments of performance against PSA or LCA are conservative. However, there is not a consistent positive relationship between model performance and quality of absences. For example, the highest inferred accuracies for some species were against PSA data (e.g. Neotoma macrotis; Fig. 3d), perhaps because PSA AUC can have a negative relationship with accuracy evaluated using HCA (Smith in press).

Projecting across time

Projections to different time periods should be less accurate than projections within the same era if species are not in equilibrium with their environment (Wiens et al. 2009). In our study projection mattered little to threshold-independent measures of model performance when tested against HCA data (Table 1 and Fig. 3a), but it did influence omission and commission error rates for thresholded predictions for some SDMs and absence types (Fig. 4 and Supplementary material Appendix 3, Fig. A2). Thus our results suggest that the transferability of SDMs across time may be a function of the type of output (thresholded or not thresholded) used in the analysis and the quality of the data used to assess accuracy.

In this context, our finding that projection matters little to threshold-independent model accuracy is heartening since one of the primary applications of SDMs in conservation is to project the future potential ranges of species given anthropogenic global change (Wiens et al. 2009). However, we found the ability to predict cross-era performance using within-era performance varied by SDM and the particular combination of within- and cross-era projections and absence types (Table 3 and Supplementary material Appendix 3, Table A6). This is unfortunate since within-era accuracy is often used as a surrogate for cross-era accuracy when test data is unavailable in the target era (Broennimann et al. 2006, Loarie et al. 2008, Ogawa-Onishi et al. 2010, Saupe et al. 2011). Similar results were found in studies of Canadian butterflies (Kharouba et al. 2009) and Californian plants (Dobrowski et al. 2011). To further compound the problem, we were unable to identify autecological traits that strongly explain the substantial among-species variation in performance, the exception being niche marginality. Thus, we advise against assuming that the performance of a SDM tested against data from the same region and era indicates its ability to project accurately across time.

In general, the few studies that have evaluated the performance of SDMs when projected across timescales similar to ours find cross-era performance is diminished relative to within-era performance (Araújo et al. 2005a, b, Kharouba et al. 2009, Dobrowski et al. 2011, Rubidge et al. 2011, Rapacciuolo et al. 2012). In contrast, we found no decline in cross-era performance for threshold-independent analysis (Fig. 3a) and declines for a limited number of models in the threshold-dependent analysis (Fig. 4b, f and Supplementary material Appendix 3, Fig. A2b, f). There are several reasons why our results may differ from these studies. First, given the effect of absence quality on apparent performance, it might seem that results from other studies were influenced by low-confidence absences. Of similar studies, only Rubidge et al. (2011) applied occupancy modeling to differentiate false from true absences, but they also found diminished performance when models were projected across time. If the quality of absences influenced assessments of cross-era

Table 3. Pearson correlation coefficients for AUC of within-era projections vs AUC of cross-era projections. Strong correlations indicate performance of a cross-era projection can be predicted from performance of a within-era projection. Bolded values are significant at p < 0.05 (n = 18 in each case).

           PSA AUC (within-era) vs    LCA AUC (within-era) vs    HCA AUC (within-era) vs
           HCA AUC (opposing era)     HCA AUC (opposing era)     HCA AUC (opposing era)
SDM        HH vs HM    MM vs MH       HH vs HM    MM vs MH       HH vs HM    MM vs MH
BIOCLIM    0.53        0.82           0.70        0.82           0.67        0.80
BRT        0.33        0.40           0.65        0.44           0.52        0.47
GAM        0.22        0.47           0.63        0.57           0.46        0.36
GLM        −0.09       0.46           0.64        0.83           0.42        0.67
MAXENT     0.42        0.25           0.68        0.69           0.47        0.42
SVM        0.68        0.24           0.73        0.75           0.63        0.58
EMEAN      0.28        0.30           0.62        0.64           0.44        0.40
EMED       0.30        0.26           0.66        0.56           0.42        0.28

Table 4. Analyses of variance of omission and commission error rates vs the high-quality absence (HCA) test set for the threshold that maximizes the sum of sensitivity and specificity (MSSS; see Supplementary material Appendix 3, Table A7 for the threshold that minimizes the difference between sensitivity and specificity). Bold values highlight significant factors. See also Fig. 4 and Supplementary material Appendix 3, Fig. A2.

Source                        DF     Sum of Squares   F        p
Omission error rate
  Absence type                2      2.985            5.022    0.007
  Projection                  3      11.281           12.655   10^-8
  SDM                         7      3.774            1.814    0.080
  Species                     17     167.272          33.112   10^-16
  SDM × projection            21     19.733           3.162    10^-6
  SDM × absence type          14     1.730            0.416    0.970
  Projection × absence type   6      14.072           7.892    10^-8
  Error                       1657   492.4
Commission error rate
  Absence type                2      4.356            9.178    10^-4
  Projection                  3      9.057            12.722   10^-8
  SDM                         7      1.679            1.011    0.421
  Species                     17     85.485           21.191   10^-16
  SDM × projection            21     13.318           2.672    10^-5
  SDM × absence type          14     1.758            0.529    0.917
  Projection × absence type   6      2.614            1.836    0.088
  Error                       1657   393.2
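The error rates analyzed in Table 4 depend on how the MSSS threshold (and its MDSS alternative) is chosen and then carried over to the opposing era. The sketch below is one plausible way to compute them in Python and is not the authors' implementation. The function names, the rule that a score at or above the threshold counts as a predicted presence, the restriction of candidate thresholds to the observed scores, and all numbers are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(presence_scores: Sequence[float],
                            absence_scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold means 'present'."""
    sens = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    spec = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sens, spec


def choose_threshold(presence_scores: Sequence[float],
                     absence_scores: Sequence[float],
                     rule: str = "MSSS") -> float:
    """Pick a threshold among the observed scores (an assumption of this sketch).

    MSSS maximizes sensitivity + specificity.
    MDSS minimizes |sensitivity - specificity|.
    Ties go to the lowest candidate.
    """
    if rule not in ("MSSS", "MDSS"):
        raise ValueError("rule must be 'MSSS' or 'MDSS'")
    candidates = sorted(set(presence_scores) | set(absence_scores))

    def objective(t: float) -> float:
        sens, spec = sensitivity_specificity(presence_scores, absence_scores, t)
        return sens + spec if rule == "MSSS" else -abs(sens - spec)

    return max(candidates, key=objective)


def error_rates(presence_scores: Sequence[float],
                absence_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (omission rate, commission rate) at the given threshold."""
    sens, spec = sensitivity_specificity(presence_scores, absence_scores, threshold)
    return 1.0 - sens, 1.0 - spec


# Hypothetical within-era (HH) test scores for one species and SDM.
hh_presences = [0.9, 0.8, 0.4]
hh_absences = [0.7, 0.3, 0.2, 0.1]
msss = choose_threshold(hh_presences, hh_absences, "MSSS")
mdss = choose_threshold(hh_presences, hh_absences, "MDSS")

# The within-era threshold is then applied to the cross-era (HM) projection.
hm_presences = [0.6, 0.3]
hm_absences = [0.5, 0.2]
hm_omission, hm_commission = error_rates(hm_presences, hm_absences, msss)
```

Choosing the threshold on within-era data and scoring the cross-era projection with it mirrors the situation described in the methods, where no test data from the target era are available to tune the threshold.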
