Post provided by Nick J. B. Isaac
To celebrate the 10th Anniversary of the launch of Methods in Ecology and Evolution, we are highlighting an article from each volume to feature in the Methods.blog.
For Volume 5, we have selected ‘Statistics for citizen science: extracting signals of change from noisy ecological data’ by Isaac et al. (2014). In this post, the authors discuss the background and key concepts of the article, and how its findings have been applied to occurrence datasets to assess biodiversity change.
The ecological literature is full of studies reporting that biodiversity has declined over time, but most of this evidence is from a narrow set of species for which population time series exist (mostly vertebrates and butterflies). Data on the rest of biodiversity is mostly in the form of millions of occurrence records (e.g. on GBIF), which detail the time and location where particular species were observed. It’s long been recognised that occurrence records have enormous potential for understanding biodiversity change more broadly, but until recently we lacked appropriate methods for doing so.
Using occurrence records in this way is not straightforward, for several reasons. First, they tell you only about the species that were present: they do not record absences. Second, we lack information about the sampling effort, or about the protocols (if any) that were used to generate the observations. Third, many occurrence records are created by volunteer citizen scientists, so the data reflect the places that volunteers like to visit rather than a random sample of the landscape. As a result, occurrence data contain several biases that need to be controlled for.
Trends from Occurrence Data
A range of methods has been proposed for estimating trends from occurrence data, from resampling to account for changes in recording effort over time, to filtering the data so that only the highest-quality portions are retained. In the past decade, more sophisticated models have been developed that attempt to capture elements of how the data were generated. The best known of these approaches is occupancy modelling, originally developed to account for imperfect detection during repeated population surveys. In 2010, Marc Kéry and others proposed that occupancy models could be applied to occurrence records. They overcame the “presence-only” problem by inferring non-detections from records of closely related species: if other species in the group were recorded on a visit but the focal species was not, that visit can be treated as a non-detection. Occupancy models are hierarchical, with separate submodels for occupancy (the state of interest) and detection (the data generation process). Covariates in the detection submodel provide a convenient and flexible way to address some of the biases present in occurrence data. Given the proliferation of methods, there was an urgent need to understand which, if any, were robust to biases in the data.
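The idea of inferring non-detections from other species’ records can be illustrated in a few lines of Python. This is a minimal sketch under our own assumptions, not Kéry and colleagues’ exact procedure: a “visit” is taken to be a unique combination of site and date, the records and site codes are hypothetical, and the number of species recorded on each visit (its “list length”) is kept as an example detection covariate.

```python
from collections import defaultdict

# Hypothetical occurrence records: (site, date, species). Each record says only
# that a species was seen; absences are never recorded explicitly.
records = [
    ("SK01", "2012-06-01", "Bombus terrestris"),
    ("SK01", "2012-06-01", "Bombus pascuorum"),
    ("SK01", "2012-07-15", "Bombus pascuorum"),
    ("SK02", "2012-06-03", "Bombus terrestris"),
    ("SK02", "2012-06-03", "Bombus lapidarius"),
    ("SK02", "2012-06-03", "Bombus pascuorum"),
]


def detection_histories(records, focal):
    """Turn presence-only records into visit-level detections and non-detections.

    Assumption: a visit is a unique (site, date) pair, and any visit on which
    at least one species from the group was recorded counts as a survey of the
    whole group. The focal species' absence from that visit's list is then read
    as a non-detection. The list length is kept as a candidate detection covariate.
    """
    visits = defaultdict(set)
    for site, date, species in records:
        visits[(site, date)].add(species)

    rows = []
    for (site, date), species_seen in sorted(visits.items()):
        rows.append({
            "site": site,
            "date": date,
            "detected": int(focal in species_seen),  # 1 = recorded, 0 = inferred non-detection
            "list_length": len(species_seen),
        })
    return rows


for row in detection_histories(records, "Bombus terrestris"):
    print(row)
```

The visit on 15 July at SK01 becomes a non-detection of *B. terrestris* only because another bumblebee was recorded there that day; a site and date with no records at all contributes nothing.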
In our Methods in Ecology and Evolution paper, ‘Statistics for citizen science: extracting signals of change from noisy ecological data’, we compared the statistical properties of a range of methods, including occupancy models, using computer simulations. We first analysed occurrence records from the UK and the Netherlands to parameterise six different recording scenarios, designed to capture a range of biases in the data generation process.
We tested the validity of each method by measuring the proportion of simulations in which a significant trend was detected for a species whose occupancy did not change (i.e. the Type I error rate) under each recording scenario. We compared the power of each method by measuring the proportion of simulations in which a genuine change in distribution was detected under each recording scenario.
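Both checks amount to counting how often a test rejects the null hypothesis across many simulated datasets. The sketch below is our own simplified illustration, not the simulation framework from the paper: it uses a deliberately naive method (comparing the proportion of sites with at least one record in two periods with a two-proportion z-test), and the values for occupancy, detection probability and effort are made up. With constant effort, the Type I error rate should sit near the nominal 5%; when visits per site double while occupancy stays constant, the naive test flags a spurious increase in most simulations.

```python
import math

import numpy as np


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x2 / n2 - x1 / n1) / se
    # Two-sided standard normal tail probability: erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_period(rng, n_sites, psi, p_detect, visits_per_site):
    """Number of sites with at least one record in one period."""
    occupied = rng.random(n_sites) < psi
    # Detection on each visit; an unoccupied site can never be recorded.
    detections = rng.random((n_sites, visits_per_site)) < p_detect
    recorded = occupied & detections.any(axis=1)
    return int(recorded.sum())


def rejection_rate(psi1, psi2, visits1, visits2, n_sites=500, p_detect=0.3,
                   n_sims=500, alpha=0.05, seed=1):
    """Proportion of simulated datasets in which the naive test flags a trend."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        x1 = simulate_period(rng, n_sites, psi1, p_detect, visits1)
        x2 = simulate_period(rng, n_sites, psi2, p_detect, visits2)
        if two_proportion_p_value(x1, n_sites, x2, n_sites) < alpha:
            hits += 1
    return hits / n_sims


# Type I error: occupancy and recording effort both constant.
print(rejection_rate(0.4, 0.4, visits1=2, visits2=2))
# Type I error when effort doubles but occupancy is unchanged.
print(rejection_rate(0.4, 0.4, visits1=2, visits2=4))
# Power: a genuine decline in occupancy with constant effort.
print(rejection_rate(0.4, 0.25, visits1=2, visits2=2))
```

In the second scenario the extra recorded sites come entirely from extra visits, which is the kind of effort bias that defeats methods that ignore detection.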
The conclusions were straightforward: simple methods failed because the data are biased in multiple ways, whereas sophisticated methods were robust under multiple bias scenarios. Occupancy models were the most powerful method, and were robust to biased recording when appropriate covariates were included in the detection submodel.
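To make the hierarchical structure concrete, here is a minimal sketch of the likelihood for one site under a basic occupancy model with a detection covariate. The log of list length is used as the example covariate; that choice, the function names and the parameter names (`psi`, `alpha`, `beta`) are our own assumptions for illustration. Fitting, i.e. maximising `log_likelihood` over the parameters (typically with separate occupancy parameters per year) or sampling them in a Bayesian framework, is omitted.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def site_likelihood(y, list_lengths, psi, alpha, beta):
    """Likelihood of one site's detection history under a basic occupancy model.

    State submodel: the site is occupied with probability psi.
    Detection submodel (assumed form): logit(p_j) = alpha + beta * log(list_length_j),
    so with beta > 0, visits with longer species lists are more likely to
    detect the focal species if it is present.
    """
    p = [inv_logit(alpha + beta * math.log(L)) for L in list_lengths]
    # Probability of the observed history given that the site is occupied.
    prob_if_occupied = 1.0
    for y_j, p_j in zip(y, p):
        prob_if_occupied *= p_j if y_j == 1 else (1.0 - p_j)
    # An unoccupied site can only produce an all-zero history.
    prob_if_empty = 1.0 if sum(y) == 0 else 0.0
    return psi * prob_if_occupied + (1.0 - psi) * prob_if_empty


def log_likelihood(histories, psi, alpha, beta):
    """Sum of site log-likelihoods; each history is (detections, list_lengths)."""
    return sum(math.log(site_likelihood(y, ll, psi, alpha, beta))
               for y, ll in histories)


# Hypothetical histories: one site with a detection, one with none.
histories = [([1, 0, 0], [4, 1, 2]), ([0, 0], [1, 1])]
print(log_likelihood(histories, psi=0.5, alpha=-1.0, beta=0.8))
```

An all-zero history is ambiguous: it can come from an occupied site where the species was missed or from an unoccupied site. The detection submodel lets the model weigh these explanations, and a covariate such as list length lets that weighting vary with recording intensity.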
Applying the Winning Formula
After publication, my research group started applying the winning formulation to occurrence datasets from UK national recording schemes and societies, for organisms as diverse as moths, spiders and lichens. However, we soon noticed that the results for the vast majority of species were not usable. We worked with statisticians to develop an improved formulation that allows information to be shared across years. This variant formed the basis of high-profile papers on trends in UK pollinators and other biodiversity. These models contribute to biodiversity indicators in both the UK and the Netherlands, as well as to periodic State of Nature reports.
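The post does not spell out the improved formulation. One common way to let information flow between years is a random-walk prior on annual (logit) occupancy, so that years with little or no data borrow strength from their neighbours. The sketch below is a simplified Gaussian stand-in of our own, not the authors’ model: it treats per-year estimates as normally distributed with known standard errors, and `rw1_smooth` and `tau` are hypothetical names.

```python
import numpy as np


def rw1_smooth(y, se, tau):
    """Posterior mean of annual values under a first-order random-walk prior.

    y   : per-year estimates (e.g. logit occupancy); np.nan marks years with no estimate
    se  : standard errors of those estimates (ignored where y is nan)
    tau : standard deviation of year-to-year change, x[t] - x[t-1] ~ N(0, tau^2)

    With Gaussian observations y[t] ~ N(x[t], se[t]^2), the posterior mean x
    solves (W + D'D / tau^2) x = W y, where W holds the precisions 1/se^2
    (zero for missing years) and D is the first-difference matrix.
    At least one year must be observed for the system to be solvable.
    """
    y = np.asarray(y, dtype=float)
    se = np.asarray(se, dtype=float)
    n = len(y)
    observed = ~np.isnan(y)
    w = np.zeros(n)
    w[observed] = 1.0 / se[observed] ** 2
    # First-difference matrix: (D @ x)[t] = x[t + 1] - x[t].
    D = np.diff(np.eye(n), axis=0)
    precision = np.diag(w) + D.T @ D / tau ** 2
    rhs = w * np.where(observed, y, 0.0)
    return np.linalg.solve(precision, rhs)


# Hypothetical logit-occupancy estimates; the third year has no usable estimate.
y = [0.4, 0.1, np.nan, 0.6, 0.2]
se = [0.3, 0.5, np.nan, 0.4, 0.3]
print(rw1_smooth(y, se, tau=0.3))
```

A smaller `tau` shares more information between years (smoother trends); a very large `tau` returns the original estimates for observed years, although a year with no data still receives an interpolated value.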
We’ve also extended this approach to hypothesis testing, revealing the effect of neonicotinoid pesticides on wild bees. However, the model does not appear to capture the data generation process for all groups of organisms, notably vascular plants. We’ve come to realise that it’s not possible to capture all the biases without much better information about how the data were generated.
It seems that we may be approaching the limit of what can be learned about biodiversity change from occurrence records alone. In response, we’ve started building occupancy models in which the occurrence records are augmented by data from systematic surveys. These “integrated distribution models” have enormous potential, both for extending the uses of existing data and for designing new monitoring schemes. Integrated models also raise a whole new set of methodological questions, some of which are now beginning to be answered.
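The core idea of an integrated distribution model can be sketched as two data sources observing one shared occupancy state, each through its own detection process. This is a minimal, single-site illustration of our own with hypothetical names (`p_opp`, `p_survey`); real integrated models vary widely, for example in how they handle different spatial scales or data types, and nothing here is taken from the authors’ implementation.

```python
def site_likelihood_integrated(y_opportunistic, y_survey, psi, p_opp, p_survey):
    """Joint likelihood for one site combining two data sources.

    Both datasets are assumed to observe the same latent occupancy state
    (occupied with probability psi), but each has its own detection probability:
    p_opp for opportunistic occurrence-record visits and p_survey for visits
    from a systematic survey.
    """
    prob_if_occupied = 1.0
    for y in y_opportunistic:
        prob_if_occupied *= p_opp if y == 1 else 1.0 - p_opp
    for y in y_survey:
        prob_if_occupied *= p_survey if y == 1 else 1.0 - p_survey
    # An unoccupied site yields zeros in both datasets.
    any_detection = any(y_opportunistic) or any(y_survey)
    prob_if_empty = 0.0 if any_detection else 1.0
    return psi * prob_if_occupied + (1.0 - psi) * prob_if_empty


# Hypothetical site: never in the opportunistic records, detected once in two survey visits.
print(site_likelihood_integrated([0, 0, 0], [1, 0], psi=0.5, p_opp=0.2, p_survey=0.6))
```

Because both datasets share the latent state, a single survey detection tells the model that the zeros in that site’s opportunistic records are missed detections rather than absences, which is information neither dataset provides on its own.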