Post provided by Anthony Ives
It was a true privilege to be asked to write the inaugural E. C. Pielou Review for Methods in Ecology and Evolution. The first ecology book I bought as an undergraduate was her Ecological Diversity (1975), which still sits on my bookshelf full of marginalia. Both ecology and evolution have long and rich histories of theoretical and empirical work, yet sometimes theory and observation have been only loosely connected. Pielou’s work made it possible to link theory and observation more tightly by providing quantitative, statistical metrics to describe patterns in the world that can be related back to theory.
Huge statistical advances have occurred over the last five decades. You have statistical tools at your fingertips now that were only dreamed of when Chris Pielou started her career. My review was spurred, however, by concern that researchers do not take full advantage of these tools to analyze their data. My concern isn’t that researchers fail to use fancy new methods: old, tried-and-true methods are often better. Instead, my concern is that, after analyzing data, researchers don’t take full advantage of the fruits of their analysis. Often, information produced in a statistical analysis is ignored when it could be productively used to answer interesting questions.
Statistical metrics and models give the tools to test theory with data
Statistical methods in ecology and evolution generally fall into two categories: metrics and models. Statistical metrics, such as Pielou’s diversity index, take complex data such as the species composition of ecological communities and summarize them with a single number. That number can then be used to compare communities. The strength of this approach is that it boils down complex data to the essential quantity of interest. This strength is also a weakness, however, because any information not captured by that single number is necessarily ignored. Results may therefore be less rich than the patterns truly in the data and, perhaps worse, ignoring information could lead to the wrong conclusions being drawn from a statistical metric.
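As a small illustration of how a metric discards information, the sketch below computes a single-number summary for two communities. It assumes the metric in question is Pielou's evenness, J' = H' / ln(S), where H' is Shannon diversity and S is the number of species present; the site names and abundance counts are hypothetical and chosen only to make the point.

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S), with S the number of species present."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    return shannon(counts) / math.log(s)


# Hypothetical abundance counts for the same three species at two sites.
site_1 = {"a": 50, "b": 30, "c": 20}
site_2 = {"a": 20, "b": 50, "c": 30}

j1 = pielou_evenness(list(site_1.values()))
j2 = pielou_evenness(list(site_2.values()))

# The two sites get the same evenness although a different species dominates each.
print(round(j1, 3), round(j2, 3))
```

The metric depends only on the set of relative abundances, not on which species holds which abundance, so a comparison based on it alone cannot detect the change in dominant species between the two sites.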
In contrast to statistical metrics, statistical models attempt to give a more complete description of data. Statistical models describe data in terms of statistical distributions characterized by means, variances, covariances, and higher statistical moments. In ecology and evolution, key advances over the last four decades in statistical modeling involve more realistic and accurate characterization of random errors.
The unfortunately misnamed “random errors” contain patterns in the data (variances, covariances, etc.) other than those involving means. You are likely familiar with random-effects models that account for non-independence of random errors: random effects generate correlation among random errors. Correlated random errors appear ubiquitously, although they often take different names in different types of models. Random effects, spatial correlation, phylogenetic signal, and temporal autocorrelation are all types of correlated random errors, and although they might appear in very different contexts, they share many of the same statistical challenges and solutions.
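The way a random effect generates correlation can be sketched with a short simulation. It assumes the simplest possible case: each group i has a shared effect b_i, each observation adds its own independent noise, and both are Gaussian. Under those assumptions the correlation between two observations from the same group is var(b) / (var(b) + var(noise)). The function names and parameter values are illustrative only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_pairs(n_groups, sd_group, sd_resid, rng):
    """Draw two 'random errors' per group; both share the group effect b."""
    first, second = [], []
    for _ in range(n_groups):
        b = rng.gauss(0.0, sd_group)  # random effect shared within the group
        first.append(b + rng.gauss(0.0, sd_resid))
        second.append(b + rng.gauss(0.0, sd_resid))
    return first, second


rng = random.Random(1)  # fixed seed so the sketch is reproducible
sd_group, sd_resid = 1.0, 1.0

first, second = simulate_pairs(20000, sd_group, sd_resid, rng)
expected = sd_group**2 / (sd_group**2 + sd_resid**2)
observed = pearson(first, second)

# With no group effect, the same two "errors" are uncorrelated.
first0, second0 = simulate_pairs(20000, 0.0, sd_resid, rng)
observed_no_effect = pearson(first0, second0)

print(f"expected {expected:.2f}, simulated {observed:.2f}, "
      f"without random effect {observed_no_effect:.2f}")
```

Replacing the shared group effect with a shared location, ancestor, or time step gives the analogous spatial, phylogenetic, or temporal correlation, which is why these cases share so much statistical machinery.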
Make the most of your data
Although statistical models potentially give more information about data than statistical metrics, often this information is ignored. Huge statistical effort has been made to develop models that can fit complex data with correlated random errors, yet these models are often used in the same way as metrics: only simple values like regression coefficients are retained, and information residing in the random errors is discarded.
If you go through the challenges of fitting a statistical model to data, you owe it to yourself to use the model to the fullest extent possible. The random errors are generally neither random nor errors: the correlations among random errors revealed by statistical models are the products of unmeasured variables. Although unmeasured, some information about these variables can be extracted from the correlations among random errors. This should be routine statistical practice.
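A toy simulation can illustrate the claim that correlated errors carry information about unmeasured variables. The sketch below is not the method developed in the review. It assumes a hypothetical site-level variable, hidden, that affects the response but is never given to the model, fits an ordinary least-squares line to the measured predictor alone, and then averages the residuals within each site. All names, sample sizes, and coefficients are made up for the illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


rng = random.Random(7)  # fixed seed so the sketch is reproducible
n_sites, n_per_site = 200, 10

# Hypothetical unmeasured site-level variable (never seen by the model).
hidden = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]

site, x, y = [], [], []
for i in range(n_sites):
    for _ in range(n_per_site):
        xi = rng.gauss(0.0, 1.0)  # measured predictor
        site.append(i)
        x.append(xi)
        y.append(1.0 + 0.5 * xi + hidden[i] + rng.gauss(0.0, 0.5))

# Fit the model using only the measured predictor.
intercept, slope = ols(x, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Average the "random errors" within each site.
site_mean_resid = [0.0] * n_sites
for s, r in zip(site, resid):
    site_mean_resid[s] += r / n_per_site

recovery = pearson(hidden, site_mean_resid)
print(f"slope {slope:.2f}, correlation with hidden variable {recovery:.2f}")
```

In this setup the site-averaged residuals track the hidden variable closely, even though it was never measured. In practice one would more likely fit a mixed model and examine its estimated random effects rather than average raw residuals; the averaging here is only the simplest stand-in for that step.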
My goal in the review is to champion random errors by showing some of the types of information they contain and how to extract it.