Anacapa Toolkit: Automating the Cataloguing of Biodiversity

Post provided by Emily Curd

Imagine that you want to catalogue all of the biodiversity (all of the living organisms) from a particular location. How many trained experts would that require? How many person-hours would it take to collect and identify all of the rare, well-disguised, and microscopic organisms? How many of these organisms would have to be removed from the environment and taken back to a lab for taxonomic analysis?

With eDNA, you can survey the presence of this gorgeous opalescent nudibranch without capturing or even touching it.
©Natural History Museum of Los Angeles County — Amanda Bemis & Brittany Cumming

Although there is no substitute for human expertise, we have begun using the traces of DNA that organisms leave behind in the environment (e.g. excretions, skin and hair cells) to catalogue biodiversity. These traces, referred to as environmental DNA (eDNA), can persist for minutes or for centuries, depending on where they end up. eDNA is rapidly becoming an effective tool to complement surveys of biodiversity, both past and present.

Applications of Environmental DNA

You don’t even need to stop diving to filter eDNA samples! Just hang the filtration apparatus from the side of the boat and let gravity do the rest. ©Zack Gold.

The power of eDNA research is that collecting large numbers of samples from a variety of habitats requires very little training and few people. A bonus of eDNA tools is that a lot of data can be generated from each sample. Researchers and community scientists all over the world are already collecting freezers full of environmental samples (e.g. 1 gram of soil, 1 liter of seawater, filtered air). These samples contain bits of DNA that can be amplified using metabarcodes (short DNA sequences) to produce millions of sequences that are diagnostic for a group of organisms. These metabarcodes can be specific to a small group of organisms (e.g. a family of fish) or span entire domains of life (e.g. the bacteria and archaea).

Metabarcoding can be extremely powerful for detecting biodiversity, managing species, and understanding how ecosystems function. For example, my coauthor Zack Gold detected more kelp forest fish species in one liter of seawater (n=90) than he observed in over 300 hours of diving (n=50) off the Southern California coast. Land managers use eDNA to detect invasive species such as wild boar, which makes their eradication efforts more time- and cost-efficient. Additionally, microbial ecologists and ecosystem scientists have been using eDNA for more than a decade to understand how microbial species drive ecosystem function.

Computational Considerations When Working With eDNA

Computational tools for eDNA research are still developing. To figure out which taxa are in a sample, a researcher needs to compare the metabarcode sequences against a reference database. There are few automated tools for building comprehensive, metabarcode-specific databases, and even fewer that consistently generate the same database given the same set of parameters. Typically, a researcher needs several markers to survey an entire community, yet few tools are designed specifically to process multiple metabarcodes per sample. Many existing tools also discard data that does not fit a specific format; depending on the chosen sequencing platform and metabarcode(s), many eDNA software tools might find most of a sample’s sequence data unusable. Finally, there are many useful tools for visualizing and analyzing eDNA taxonomic data, but most are designed for research scientists and are difficult for non-scientists to use.
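One way to make database construction repeatable is to make its output depend only on its inputs and parameters. The Python sketch below is a simplified, hypothetical stand-in for in-silico PCR: it keeps the part of each reference sequence that lies between a forward primer and the reverse complement of a reverse primer. It is not CRUX or ecoPCR. It assumes exact primer matches, uppercase A/C/G/T sequences, and illustrative length limits (`min_len`, `max_len`), and every name in it is invented for this example.

```python
"""Minimal in-silico PCR sketch (hypothetical; not CRUX or ecoPCR).

Extracts the region between a forward and reverse primer from each
reference sequence, allowing no mismatches, so the same inputs and
parameters always yield the same database.
"""

# Maps each base to its complement (A/C/G/T only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(references, forward, reverse, min_len=20, max_len=500):
    """Build a metabarcode database from full-length reference sequences.

    references: dict mapping accession -> (taxon, sequence)
    forward, reverse: primer sequences written 5'->3' (exact match only)
    Returns a dict mapping accession -> (taxon, amplicon without primers).
    """
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement on the forward strand
    reverse_rc = reverse_complement(reverse)
    database = {}
    # Iterate in sorted accession order so the output is deterministic
    for accession in sorted(references):
        taxon, seq = references[accession]
        seq = seq.upper()
        start = seq.find(forward)
        if start == -1:
            continue  # forward primer site absent
        insert_start = start + len(forward)
        end = seq.find(reverse_rc, insert_start)
        if end == -1:
            continue  # reverse primer site absent downstream
        amplicon = seq[insert_start:end]
        # Keep only amplicons within the assumed length window
        if min_len <= len(amplicon) <= max_len:
            database[accession] = (taxon, amplicon)
    return database
```

Processing references in sorted accession order means identical inputs always produce an identical database. Real tools also tolerate primer mismatches and search both strands, which this sketch deliberately does not.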

The Anacapa Toolkit

We developed the Anacapa Toolkit to meet the software needs described above. The Anacapa modules, executed primarily through wrapper scripts, accomplish these tasks by combining existing software and, in some cases, modifying it. CRUX, the database-building module, relies on ecoPCR and iterative BLAST searches to build comprehensive metabarcode databases from GenBank, a much larger database. The Anacapa QC and dada2 module uses a variety of tools to demultiplex samples and generate DNA sequence variants for all possible data types. The Anacapa classifier module makes taxonomic calls for all data types using Bowtie2 and a modified version of a Bayesian Lowest Common Ancestor algorithm. The fourth module, ranacapa, is an R package and Shiny web app for easily uploading data and running preliminary analyses. This app is currently used in educational settings.
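To illustrate the idea behind a lowest-common-ancestor (LCA) call, here is a plain, non-Bayesian LCA sketch in Python. It is not Anacapa’s modified Bayesian LCA and does not run Bowtie2. It assumes alignment hits have already been reduced to (lineage, percent identity) pairs, and the thresholds `min_identity` and `delta` are illustrative values, not Anacapa defaults.

```python
"""Simplified lowest-common-ancestor (LCA) classification sketch.

NOT Anacapa's modified Bayesian LCA: a plain LCA over alignment hits,
assuming each hit is (lineage, percent_identity), where the lineage runs
from the highest rank down to species.
"""
from typing import List, Sequence, Tuple

Lineage = Tuple[str, ...]


def lca(lineages: Sequence[Lineage]) -> Lineage:
    """Return the longest rank-by-rank prefix shared by all lineages."""
    shared = []
    # zip stops at the shortest lineage; compare one rank at a time
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared.append(ranks[0])
        else:
            break  # first disagreement: stop descending
    return tuple(shared)


def classify(hits: List[Tuple[Lineage, float]],
             min_identity: float = 95.0,
             delta: float = 1.0) -> Lineage:
    """Assign taxonomy from alignment hits.

    Keeps hits at or above min_identity and within `delta` percentage
    points of the best hit, then returns their lowest common ancestor.
    An empty tuple means the read is unclassified.
    """
    good = [(lin, pid) for lin, pid in hits if pid >= min_identity]
    if not good:
        return ()
    best = max(pid for _, pid in good)
    top = [lin for lin, pid in good if pid >= best - delta]
    return lca(top)
```

For example, if the two best hits are GenA sp1 at 99.5% and GenA sp2 at 99.0%, with GenB sp1 further back at 96.0%, only the first two fall within `delta` of the best score. Because they differ at the species level, the read is assigned to GenA. The lineage names are placeholders.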

The Anacapa Toolkit Logo. (Reference: https://github.com/limey-bean/Anacapa).

Moving Forward: The Future of eDNA

Anacapa is not a perfect solution for the computational needs of all eDNA researchers, but it fulfills many current software needs. The eDNA field is moving forward rapidly, and researchers continue to improve methods, use existing software more efficiently, and develop novel tools.

Any software for generating metabarcode databases and assigning taxonomy to metabarcode sequences is only as good as the currently available data, and public databases have giant taxonomic holes. I recently worked on a desert springs eDNA project where fewer than half of the expected plant species had been sequenced across the entire length of the metabarcode used in the study. In most cases, the missing species had never been sequenced at all. There are also metabarcode-specific problems: for a given metabarcode, closely related organisms may have identical sequences and therefore be indistinguishable. Taken together, these problems lead to poor classification at the species, genus, and even family level, regardless of the software used to assign taxonomy.
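Both problems can be checked before trusting a classification. The Python sketch below assumes a reference database represented as hypothetical (species, metabarcode sequence) pairs. It lists expected species that have no reference sequence for a marker, and it groups species whose metabarcode sequences are identical. The function names are made up for this example.

```python
"""Sketch: audit a metabarcode reference database for gaps and ties.

Assumes the database is a list of (species, metabarcode_sequence) pairs.
Species sharing an identical metabarcode cannot be told apart by that
marker, so reads matching it can only be classified above species level.
"""
from collections import defaultdict


def indistinguishable_groups(reference):
    """Map each sequence shared by 2+ species to the sorted species list."""
    species_by_seq = defaultdict(set)
    for species, seq in reference:
        # Normalize case so 'acgt' and 'ACGT' count as the same sequence
        species_by_seq[seq.upper()].add(species)
    return {
        seq: sorted(species)
        for seq, species in species_by_seq.items()
        if len(species) > 1
    }


def missing_species(expected, reference):
    """Return expected species with no reference sequence for this marker."""
    present = {species for species, _ in reference}
    return sorted(set(expected) - present)
```

Any read matching a sequence in one of the reported groups can, at best, be assigned to the lowest rank those species share, no matter which classifier is used.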

Summary

There are many initiatives to generate reference sequences for more species across metabarcode markers. As this data becomes available, our ability to catalogue biodiversity will improve greatly. In the meantime, eDNA sequencing results need to be evaluated critically. Taxonomic classifications should be verified, and experts should be consulted to determine whether a taxon could reasonably be found in a sampled habitat. Certainly, eDNA will continue to revolutionize how biodiversity is studied and it is a great tool to complement existing methods.

To find out more about Anacapa Toolkit, check out our Methods in Ecology and Evolution article, ‘Anacapa Toolkit: An environmental DNA toolkit for processing multilocus metabarcode datasets’.

This article was shortlisted for the Robert May Prize 2019.
