Robert May Prize Shortlisted Article
Post provided by Arthur Porto
Each year Methods in Ecology and Evolution awards the Robert May Prize to the best paper in the journal by an author at the start of their career. Arthur Porto has been shortlisted for his article ‘ML‐morph: A fast, accurate and general approach for automated detection and landmarking of biological structures in images’. In this blog, Arthur discusses how his paper came to be and describes the development of the ML-morph pipeline.
Quantifying and comparing complex shapes is a key component of a variety of fields within biology, ranging from paleontology to molecular cell biology. Biologists are interested in quantifying shape differences between individuals or species because they reflect important genetic, developmental and/or functional differences between them.
Geometric morphometrics (GM)
While a variety of approaches have emerged for the characterization of complex shapes, one approach has seen rapid development and has gained considerable popularity in the last decade: the approach known as geometric morphometrics (GM).
GM refers to a suite of analytic techniques aimed at describing shape variation through the annotation of key points (landmarks) corresponding to morphological structures of interest. GM approaches can be applied widely and have been used in: (1) genetic contexts, to characterize morphological differences between genetic mouse models; (2) ecological contexts, to characterize phenotypic plasticity in leaf shape; and (3) evolutionary contexts, to characterize morphological differences between species.
Three main factors have contributed to the increase in popularity of GM. First, the quality of our imaging techniques has greatly increased in the last decade. For example, scanning electron microscopes have become portable, and they allow us to image specimens in great detail. Second, the open-source analytical software available to process such datasets, such as a variety of morphometric R packages, has seen great improvements. Finally, data accessibility has greatly increased with the emergence of several public image databases, such as MorphoSource and Phenome10K, which have made it much easier for individual researchers to access robust image datasets.
The bottleneck – Collecting landmarks
GM research is not without its challenges, however. One of the main bottlenecks of GM research is data collection itself. The gold standard for data collection is still manual annotation by a trained expert. There are two main ways in which the reliance on manual annotation has become a significant hindrance for research in the field. One is the fact that GM characterizations of complex morphological structures increasingly require a high density of landmarks. The main by-product of this increase in density has been the increase in the amount of time required to collect data for one specimen. The other reason why reliance on manual annotation represents a significant issue for the field is that it is inherently subjective. Every single thorough study of observer error done within GM shows substantial estimates of error, which are often of similar magnitudes to the levels of intraspecific variation. That presents a problem, because it reduces the reproducibility of science and prevents sensible merging of data across laboratories, or even over time.
Morphometrics enters the Age of Big Data
Recently, several approaches have been developed to automate and standardize landmark data collection in the context of image data. These approaches attempt to improve a researcher’s ability to sample phenotypes in three main ways:
- They allow researchers to sample a structure more densely, increasing the spatial resolution of these characterizations, since the exact number of landmarks being collected usually has minor impact on processing speeds.
- They greatly increase the sample sizes of GM studies, allowing a researcher to sample a larger number of specimens. This is especially helpful in biomedical research using model organisms, where a high number of individuals might be bred in a short span of time.
- They are reproducible, reducing, to some extent, the subjectivity inherent in individual annotation.
The drawbacks of automation
This is not to say that automation does not have its drawbacks. For example, most laboratories currently developing or applying automated approaches will have an imaging scientist on staff, which can be costly. Similarly, automated landmarking approaches can be quite resource-hungry. Some approaches (e.g., those designed for 3D datasets) require as much as 10 CPU hours per specimen, which is far longer than it would take an experienced researcher to collect the same data manually.
Finally, automated approaches are system-specific, with most algorithms being hard to generalize to other study systems. The problem gets even larger when considering potential software and hardware incompatibilities across different systems.
ML-morph – A simple method
ML-morph is a lightweight, accurate and general method for automated landmarking of 2D images, developed to address several of the drawbacks of other automated approaches. We had three main goals in mind. First, to make something intuitive enough that someone does not need an image analysis background to use it. Second, to create something lightweight enough that it does not require special hardware. And finally, to create something general enough that it can be applied to virtually any structure of interest.
The procedure itself is quite simple. ML-morph decomposes the task of automating landmark data collection into two separate tasks: object detection and shape prediction. Object detection refers to the initial step of finding an instance (or more) of the structure of interest within an image. Shape prediction, on the other hand, refers to the step that follows object detection. That is, once an area of the image is found containing a structure of interest, we need to locate all landmark positions within that region.
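To make the two-step decomposition concrete, here is a minimal Python sketch of how the two stages might be chained. It is an illustration only, not the ML-morph implementation: the names `detect_objects`, `predict_shape` and `landmark_image` are hypothetical, the "detector" is a simple brightness threshold, and the "shape predictor" just scales a fixed landmark template into the detected box. In a real pipeline, both stand-ins would be replaced by trained models.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]      # grayscale image as rows of pixel intensities
Point = Tuple[float, float]  # (x, y) in image coordinates


@dataclass
class Box:
    """Bounding box with inclusive pixel bounds (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def detect_objects(image: Image, threshold: int = 128) -> List[Box]:
    """Stage 1 (stand-in): find the region containing the structure.

    Assumption: the structure is the set of pixels at or above `threshold`.
    A trained object detector would take the place of this function.
    """
    coords = [(x, y)
              for y, row in enumerate(image)
              for x, value in enumerate(row)
              if value >= threshold]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [Box(min(xs), min(ys), max(xs), max(ys))]


def predict_shape(box: Box, template: List[Point]) -> List[Point]:
    """Stage 2 (stand-in): locate landmarks within a detected region.

    Assumption: `template` holds landmark positions in unit-square
    coordinates, which are scaled to the box. A trained shape predictor
    would instead use the pixel data inside the box.
    """
    width = box.right - box.left
    height = box.bottom - box.top
    return [(box.left + u * width, box.top + v * height) for u, v in template]


def landmark_image(image: Image, template: List[Point]) -> List[List[Point]]:
    """Run detection, then shape prediction on every detected region."""
    return [predict_shape(box, template) for box in detect_objects(image)]


# Toy usage: a 6x6 image with one bright 3x3 structure.
toy_image = [[0] * 6 for _ in range(6)]
for y in range(1, 4):
    for x in range(2, 5):
        toy_image[y][x] = 255

toy_template = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]  # three made-up landmarks
print(landmark_image(toy_image, toy_template))
```

Keeping the stages separate means that each one can be trained, tested and swapped independently: the shape predictor only ever sees the region handed to it by the detector.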
Due to the generality of its principles, ML-morph can be applied to virtually any structure of interest, given some limitations in the degree of shape variation. Once trained, ML-morph models can be run in real time (>15 images per second), greatly increasing the speed of GM data collection (Figure 4).
ML-morph can greatly increase the reproducibility and speed of GM data collection within biological research. Our pipeline expands the possibilities for automation within the morphometric community, increasing the scale of the questions that can be asked and opening up research avenues that previously faced sample size barriers.
Find out more about the articles that were shortlisted for this year’s Robert May Prize here.