Post provided by ÁDÁM T. KOCSIS

The source of occurrence data: fossil collections (Early Jurassic ammonites in the collection of the University of Erlangen-Nuremberg, photo by Konstantin Frisch)

To find out about changes in ancient ecosystems we need to analyse fossil databases that register the taxonomy and stratigraphic (temporal) positions of fossils. These data can be used to detect changes of taxonomic diversity and to draft time series of originations and extinctions.

The story would be simple if it weren’t for the effects of heterogeneous and incomplete sampling: the blank spots in our understanding of exactly where and when species lived. Incomplete sampling makes face-value patterns extracted from the fossil record less reliable. It must be considered if we want to get a glimpse into the biology or the distribution of life in space and time. Naturally, several metrics have been proposed to overcome this problem, each claiming to accurately depict the patterns of ancient life.
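To make the sampling problem concrete, here is a minimal sketch of one common way to standardise sampling: classical rarefaction, where the same number of occurrences is drawn at random from every time bin before taxa are counted. It is an illustration in plain Python, not code from any package; the function name `rarefied_richness`, the `(taxon, bin)` input format, and the `quota` and `trials` parameters are all assumptions made for this example.

```python
import random
from collections import defaultdict


def rarefied_richness(occurrences, quota, trials=100, seed=0):
    """Mean taxon richness per time bin after drawing `quota` occurrences.

    occurrences: iterable of (taxon, bin) pairs (hypothetical input format).
    Bins with fewer than `quota` occurrences get None instead of a value.
    """
    rng = random.Random(seed)

    # Group the taxon names of all occurrences by time bin.
    by_bin = defaultdict(list)
    for taxon, time_bin in occurrences:
        by_bin[time_bin].append(taxon)

    result = {}
    for time_bin in sorted(by_bin):
        taxa = by_bin[time_bin]
        if len(taxa) < quota:
            # Too poorly sampled to be compared at this quota.
            result[time_bin] = None
            continue
        total = 0
        for _ in range(trials):
            # Draw without replacement, then count the distinct taxa.
            total += len(set(rng.sample(taxa, quota)))
        result[time_bin] = total / trials
    return result
```

With a quota of 4, a bin holding six occurrences of a single taxon scores 1.0, while a bin holding only two occurrences is reported as `None` rather than as a misleadingly low diversity value.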

Scripting Analyses Should Not be a Pain

By the time the Paleobiology Database (PaleoDB) was created around the turn of the century, people were already relying on computer scripts to analyse the global fossil record. In the last two decades, the database has developed into a centralised source for macrofaunal palaeontological analyses. It also has an associated default toolkit to calculate the time series that form the basis of most research using it.

Unlike for community ecological problems though (where software to analyse data was already available), researchers had to implement the methods in this toolkit themselves. The lack of a readily deployable, user-friendly toolset to apply rigorous methods of diversity dynamics to fossil data has been a continuous source of headaches for uninitiated researchers: both seasoned specialists with little programming experience and students who have just set out to discover the thrill of data analysis.

Time series of extinctions with different methodological pathways.

Even for seasoned researchers, adapting a piece of code to generate a time series of extinction rates posed its own challenges. A lot of time was wasted on problems like the ‘proper alignment of the output vector with numeric ages’ and the re-implementation of subsampling methods. It’s also difficult to assess the robustness of past results and the comparability of new ones when supposedly identical implementations of analytical methods might lead to different results. Our aim with the divDyn R package is to make the most widely used methodology available as a fast, easy-to-use toolkit to analyse fossil occurrence data.
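As an illustration of the kind of calculation involved, the sketch below computes per capita extinction and origination rates (in the sense of Foote 2000) from the first and last bins in which each taxon occurs. It is a plain Python sketch under stated assumptions, not divDyn’s implementation: the function names, the `ranges` input format and the convention that bins are numbered from oldest to youngest are all hypothetical. Each output row carries its bin number, so joining the rates to numeric ages is a lookup by bin rather than a matter of vector positions.

```python
import math


def _neg_log_ratio(through, total):
    """Return -ln(through / total), or None when the rate is undefined."""
    if through == 0:
        return None
    return -math.log(through / total)


def per_capita_rates(ranges, n_bins):
    """Per capita extinction and origination rates for every time bin.

    ranges: {taxon: (first_bin, last_bin)}; bins are numbered 1..n_bins
    from oldest to youngest (an assumption of this sketch).
    Returns one dict per bin, in bin order.
    """
    rows = []
    for b in range(1, n_bins + 1):
        through = ext_only = ori_only = 0
        for first, last in ranges.values():
            if first < b and last > b:
                through += 1    # present before and after the bin
            elif first < b and last == b:
                ext_only += 1   # enters the bin, last seen in it
            elif first == b and last > b:
                ori_only += 1   # first seen in the bin, leaves it
            # Taxa confined to one bin (singletons) are ignored.
        rows.append({
            "bin": b,
            "ext": _neg_log_ratio(through, through + ext_only),
            "ori": _neg_log_ratio(through, through + ori_only),
        })
    return rows
```

Rates are returned as `None` for bins that no taxon ranges through, which includes the first and last bins of any series, so edge effects stay visible instead of silently shifting the series.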

Report on Phanerozoic-Scale Patterns

To demonstrate the relevance of our package, we reproduced a published study about the classic patterns of Phanerozoic life. Using divDyn, we recalculated time series of extinction and origination rates from PaleoDB data with several frequently used methodological steps. We then reassessed support for mass extinctions, the decline of background turnover and equilibrial diversity dynamics. You can find out more about this in our Applications paper ‘The R package divDyn for quantifying diversity dynamics using fossil sampling data’, recently published in Methods in Ecology and Evolution.

To promote the transparency of scripting, we documented every step of our analysis from data download to statistical tests. We’re also planning to generate a report with every major release of the package using up-to-date occurrence data. Additions and suggestions to this analytical pipeline are most welcome. We hope that it will reflect a collaborative effort of the palaeontological community and can serve as a template for future analyses. This applies to the software as well: our intention is to develop divDyn into a comprehensive library that reflects the analytical needs of the community.

Although in this example we applied divDyn to PaleoDB data, it is applicable to any occurrence dataset that registers geological time. The package is already available on CRAN. We welcome all issue reports via email or at the package development GitHub page.

To find out more about divDyn, read our Methods in Ecology and Evolution article ‘The R package divDyn for quantifying diversity dynamics using fossil sampling data’ (No Subscription Required).