Integrating multi-gene barcodes with deep learning to classify snails

Post provided by Bin Ye

Gastropods, such as land, freshwater, and sea snails, have diverse forms and unique life histories, making them an excellent window into biodiversity. As artificial intelligence and ecology become increasingly integrated, we developed SnailBaLLsp, an intelligent classification model that applies new algorithms to long-standing classification problems and puts DNA data to better use in species identification and evolutionary research.

The land snail Traumatophora triscalpta, found on Tianzhu Mountain, Anhui, China. Photo by Bin Ye.

Why did we develop SnailBaLLsp?

Traditional species identification relies on expert experience or a single DNA fragment (such as the COI gene), and struggles when species are morphologically similar or show complex genetic variation. Most existing algorithms were developed on massive insect datasets and perform poorly on taxa such as Gastropoda. They also fail to exploit the natural taxonomic hierarchy (from order to species), which results in low accuracy at higher taxonomic ranks. We wanted to use artificial intelligence to build a tool that can integrate multiple DNA barcodes, follow the logic of classification, and work for non-model taxa, making species identification more efficient and reliable.

How does SnailBaLLsp work?

We propose a deep learning framework called SnailBaLLsp. First, we train the model on COI gene data, which has the highest coverage, to establish basic identification ability. We then use a progressive strategy to gradually integrate the other five barcodes (16S, 18S, H3, ITS1, and ITS2), which are sparser and less complete, so that the model is not disrupted by incomplete information. Most importantly, we designed a hierarchical attention mechanism that explicitly follows the taxonomic hierarchy when making predictions: for example, it determines “order” first, then infers “family”, and then identifies “genus”. High-level information guides low-level predictions, forming a logically consistent classification chain. We also introduced dynamic data augmentation to balance differences in sample size, and used domain adaptation to improve the model’s generalization to new species.
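To make the idea of high-level predictions guiding low-level ones more concrete, here is a minimal Python sketch. It is not the SnailBaLLsp attention mechanism: it is a much simpler top-down filter that only keeps candidates belonging to the taxon chosen one rank above. The toy taxonomy, the scores, and the function name `predict_top_down` are all hypothetical and chosen purely for illustration.

```python
# Toy taxonomy (child -> parent). All names are illustrative placeholders,
# not real taxa and not part of SnailBaLLsp.
PARENT = {
    "FamilyA": "OrderX",
    "FamilyB": "OrderX",
    "FamilyC": "OrderY",
    "GenusA1": "FamilyA",
    "GenusB1": "FamilyB",
    "GenusC1": "FamilyC",
}


def predict_top_down(scores_by_rank, parent=PARENT):
    """Pick the best taxon at each rank, keeping only children of the previous pick.

    scores_by_rank is a list of {taxon: score} dicts ordered from the
    highest rank (e.g. order) down to the lowest (e.g. genus).
    """
    chain = []
    chosen = None
    for scores in scores_by_rank:
        if chosen is None:
            candidates = scores  # top rank: every taxon is allowed
        else:
            # Lower ranks: discard taxa that do not sit under the chosen parent.
            candidates = {t: s for t, s in scores.items() if parent.get(t) == chosen}
        if not candidates:
            break  # no consistent candidate at this rank; stop the chain here
        chosen = max(candidates, key=candidates.get)
        chain.append(chosen)
    return chain


# Hypothetical per-rank scores. Taken independently, the best family would be
# FamilyC, which contradicts the best order (OrderX).
scores = [
    {"OrderX": 0.7, "OrderY": 0.3},
    {"FamilyA": 0.35, "FamilyB": 0.25, "FamilyC": 0.40},
    {"GenusA1": 0.30, "GenusB1": 0.20, "GenusC1": 0.50},
]

print(predict_top_down(scores))  # ['OrderX', 'FamilyA', 'GenusA1']
```

In this toy case, choosing the top score at each rank separately would give an order, family, and genus that do not belong together, while the top-down version returns a consistent chain.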

How can SnailBaLLsp help you?

Researchers and conservation workers can use the open-source SnailBaLLsp to quickly analyze DNA barcode sequences from environmental samples (such as water or soil) or from specimens. Even if users have only the basic COI barcode, the model can output multi-level predictions from family to species, and it is especially good at providing reliable higher-rank classifications when species-level information is unclear. The model has been successfully transferred to bivalve data, indicating its potential for use across taxonomic groups in invertebrate diversity monitoring, invasive species screening, and ecological assessment. Users can follow the tutorial to get started with SnailBaLLsp. All code, models, and multi-barcode DNA datasets for Gastropoda have been made public to support further development and application. We hope this work will contribute to a more universal biological identification platform, so that technology can better serve biodiversity awareness and conservation practice.
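One simple way to picture how a model can accept COI alone or COI plus other barcodes is a fusion step that ignores missing markers. The Python sketch below is an assumption for illustration only, not SnailBaLLsp's actual interface or fusion method: the function `fuse_available` and the two-number "embeddings" are hypothetical, and only the six marker names come from the description above.

```python
# The six barcodes named in this post. Everything else here is hypothetical.
MARKERS = ["COI", "16S", "18S", "H3", "ITS1", "ITS2"]


def fuse_available(embeddings):
    """Average the per-marker vectors, skipping markers that are absent or None."""
    present = [embeddings[m] for m in MARKERS if embeddings.get(m) is not None]
    if not present:
        raise ValueError("at least one barcode is required")
    dim = len(present[0])
    # Element-wise mean over the markers that were actually provided.
    return [sum(vec[i] for vec in present) / len(present) for i in range(dim)]


# A specimen with only COI, and one with COI plus 16S (made-up vectors).
coi_only = {"COI": [0.25, 0.75]}
coi_and_16s = {"COI": [0.25, 0.75], "16S": [0.75, 0.25]}

print(fuse_available(coi_only))     # [0.25, 0.75]
print(fuse_available(coi_and_16s))  # [0.5, 0.5]
```

With this kind of design, a COI-only sample still yields a valid fused representation, and extra barcodes refine it when they are available.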

What can we do in the future?

The core innovation of this study is the combination of progressive multi-gene fusion and hierarchical attention, which offers new ideas for handling imbalanced and multimodal biological data. In the future, the framework could be extended to integrate multi-source data such as morphological images and geographic information, moving intelligent taxonomy towards multi-dimensional integration. In addition, the model’s own difficulty in recognizing highly variable sequences within a species may raise new questions for evolutionary biology.

Read the full article here.
