Post provided by Daniel Thomas
Many biologists dedicate their careers to finding out why life has taken the shape it has. Darwinian natural selection gives us the how, but researchers are deeply interested in why we find particular morphologies amongst certain organisms, when these morphologies arose, and what these morphologies mean for the organisms and the communities in which they reside. In this post, Daniel Thomas (Massey University, New Zealand) describes the philosophy behind the new morphoBlocks package for R. The package is presented in a new paper within the ‘Realising the promise of large data and complex models’ Special Feature for Methods in Ecology and Evolution. Researchers interested in exploring the morphoBlocks package are encouraged to try these three vignettes.
Studying the shape of structure
The morphology of a biological structure includes both the shape of the structure and its size. The reasons why a biological structure evolved into a new shape might be different from the reasons why that structure evolved into a new size, and hence researchers might be specifically interested in the distribution of shapes amongst the clade or community they are studying. Shapes can be compared mathematically after the ‘non-shape’ aspects of a structure are removed (size, location and rotation), and researchers are typically interested in the aspects of shape that vary the most within their dataset.
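As a rough illustration of removing two of these ‘non-shape’ aspects, the sketch below centres a set of 2D points on their centroid (removing location) and divides by centroid size (removing size). It is a minimal sketch in plain Python and not code from morphoBlocks; the function name `centre_and_scale` and the triangle coordinates are made up for the example, and rotation is left out here.

```python
import math

def centre_and_scale(points):
    """Remove location and size from a list of (x, y) points (hypothetical helper)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Removing location: move the centroid to the origin.
    centred = [(x - cx, y - cy) for x, y in points]
    # Centroid size: square root of the summed squared distances to the centroid.
    size = math.sqrt(sum(x * x + y * y for x, y in centred))
    # Removing size: rescale so that centroid size equals 1.
    return [(x / size, y / size) for x, y in centred]

# Made-up example: a triangle, and the same triangle doubled in size and shifted.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
moved_and_enlarged = [(10.0 + 2 * x, -5.0 + 2 * y) for x, y in triangle]

a = centre_and_scale(triangle)
b = centre_and_scale(moved_and_enlarged)

# The largest coordinate difference is effectively zero: same shape once
# location and size are removed.
print(max(abs(p - q) for pa, pb in zip(a, b) for p, q in zip(pa, pb)))
```

After this step the two triangles have matching coordinates, even though they started at different places and different sizes.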
The shape of a biological structure is commonly represented as a point in a ‘morphospace’ plot, where closer points in the same plot represent more similar shapes, and the axes of the plot describe how the shapes vary. While current analysis methods are excellent for comparing the shapes of single parts (e.g. isolated bones), ecological or evolutionary research questions regarding the shapes of objects which contain multiple parts (e.g. skeletons) have inspired the development of the morphoBlocks package for R.
We recognised that analysing variation in the shapes of objects made up of multiple parts, versus analysing variation between the shapes of isolated parts, is similar to how multiblock analyses are structured in other big data fields like chemometrics. The key is the block structure.
Morphology with mannequins
Picture a mannequin standing upright with arms reaching into the sky, forming the shape of a Y, and with all of the regions of the body covered in dots (Figure 1). If we could remove the mannequin but leave the dots floating in space then we should still be able to picture the complete shape of the mannequin. We repeat this process for a second mannequin that is the same size and in the same Y pose, taking care to place the dots in similar places to the first mannequin. We again remove the mannequin and keep the constellation of dots floating in space. This gives us the data we need to compare the shapes of these two mannequins. Bringing the two constellations of floating dots close to one another (i.e. lining up the left shoulder dot of mannequin one next to the left shoulder dot of mannequin two), so that the summed distance between corresponding dots in each constellation is the lowest it can be, allows us to understand where the shape differences occur. We find these differences by simply checking the distances between corresponding dots.
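The comparison of two constellations can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the morphoBlocks implementation: the dots are 2D, the mannequins are already the same size, and ‘as close as possible’ is taken to mean the smallest summed squared distance between corresponding dots. The coordinates and the helper names `centre`, `dot_distances` and `move` are invented for the example.

```python
import math

def centre(points):
    """Shift (x, y) dots so that their centroid sits at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def dot_distances(reference, target):
    """Translate and rotate target onto reference, then return dot-to-dot distances."""
    ref, tgt = centre(reference), centre(target)
    # Closed-form 2D rotation that minimises the summed squared distance.
    a = sum(rx * tx + ry * ty for (rx, ry), (tx, ty) in zip(ref, tgt))
    b = sum(ry * tx - rx * ty for (rx, ry), (tx, ty) in zip(ref, tgt))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * tx - s * ty, s * tx + c * ty) for tx, ty in tgt]
    return [math.hypot(rx - tx, ry - ty) for (rx, ry), (tx, ty) in zip(ref, rotated)]

def move(points, degrees, dx, dy):
    """Rotate dots about the origin and then translate them."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

# Made-up dots: head, left hand, right hand, hip, left foot, right foot.
mannequin_one = [(0.0, 4.0), (-2.0, 5.0), (2.0, 5.0), (0.0, 0.0), (-1.0, -3.0), (1.0, -3.0)]

# Same shape, standing somewhere else and turned by 30 degrees.
same_shape = move(mannequin_one, 30, 7.0, -2.0)

# Second mannequin: right-hand dot shifted, then moved and turned as well.
mannequin_two = list(mannequin_one)
mannequin_two[2] = (2.5, 5.0)
mannequin_two = move(mannequin_two, 30, 7.0, -2.0)

same_distances = dot_distances(mannequin_one, same_shape)
changed_distances = dot_distances(mannequin_one, mannequin_two)
largest = changed_distances.index(max(changed_distances))
print(largest)  # index of the dot with the largest distance (the right hand here)
```

When the shapes are identical, every dot-to-dot distance is effectively zero after alignment. When one dot differs, that dot shows the largest distance, although the alignment also spreads a little of the difference across the other dots.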
This process is fine if we want to compare the shapes of mannequins in the same pose, but what if we made a comparison with another mannequin that is the same size, has the same pattern of dots, but is doing its best to make an M shape with its arms? Our dot-to-dot analysis would tell us that this additional mannequin is a different shape, but only because the pose is now the greatest source of variation, and not the actual shape of the body regions of the mannequin (Figure 1).
This is where introducing a block structure can help. Return to the start with our mannequins before we added the dots: two with arms in the shape of a Y, and one doing its best to make the shape of an M. This time when we place dots on the mannequins we are going to use different colours for different regions (Figure 1). We will add red dots to the torso of mannequin one, and then place red dots in corresponding places on the torsos of the other mannequins. We use green dots for the right upper arm of each mannequin, blue dots for the right lower arm of each mannequin, and so on, using different coloured dots for each region.
Now we again make the mannequins disappear while the dots remain floating in space. This time when we bring our constellations of dots together we are going to do so colour by colour (i.e. data block by data block). We will first establish the shape differences in the torso, followed by the upper arm, and so on. We will need to rotate and translate the constellations of matching coloured dots in order to bring the constellations as close to one another as possible, but this is fine, because we are just focussing on one region at a time. We then determine which dots contribute the most to the differences in shape for their region, and then summarise all of these regional sources of variation into an overall description about how and where the mannequin shapes differ irrespective of how they are posed. This is the essence of the multiblock approach to shape analysis performed with morphoBlocks.
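The effect of aligning colour by colour can be illustrated with a small plain-Python sketch. It rests on several assumptions: the dots are 2D, only two regions are used, the coordinates and helper names are invented, and alignment means translation plus the rotation that minimises summed squared distance. It does not call morphoBlocks, and it does not reproduce the final step in which the package summarises the regional variation into one description.

```python
import math

def centre(points):
    """Shift (x, y) dots so that their centroid sits at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def aligned_distance(reference, target):
    """Summed dot-to-dot distance after translating and rotating target onto reference."""
    ref, tgt = centre(reference), centre(target)
    a = sum(rx * tx + ry * ty for (rx, ry), (tx, ty) in zip(ref, tgt))
    b = sum(ry * tx - rx * ty for (rx, ry), (tx, ty) in zip(ref, tgt))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * tx - s * ty, s * tx + c * ty) for tx, ty in tgt]
    return sum(math.hypot(rx - tx, ry - ty) for (rx, ry), (tx, ty) in zip(ref, rotated))

def rotate_about(points, pivot, degrees):
    """Swing dots around a pivot, e.g. an arm around the shoulder."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    px, py = pivot
    return [(px + c * (x - px) - s * (y - py), py + s * (x - px) + c * (y - py))
            for x, y in points]

# Made-up mannequin in a Y-like pose; each key is one colour (block) of dots.
mannequin_y = {
    "torso": [(0.0, 0.0), (0.0, 2.0), (-1.0, 3.0), (1.0, 3.0)],
    "right_arm": [(1.2, 3.3), (1.6, 4.0), (2.0, 5.0)],
}
# Same body regions, but the right arm is swung down around the shoulder.
mannequin_m = {
    "torso": list(mannequin_y["torso"]),
    "right_arm": rotate_about(mannequin_y["right_arm"], (1.0, 3.0), -120),
}

# All dots at once: the change in pose registers as a difference in shape.
whole = aligned_distance(mannequin_y["torso"] + mannequin_y["right_arm"],
                         mannequin_m["torso"] + mannequin_m["right_arm"])

# Block by block: each region is aligned on its own, so pose drops out.
by_block = {region: aligned_distance(mannequin_y[region], mannequin_m[region])
            for region in mannequin_y}

print(whole)
print(by_block)
```

Aligning every dot at once reports a clear difference between the two poses, while the block-by-block distances are effectively zero because neither the torso nor the arm has changed shape.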
Learning from skeletons
Describing the shape of objects made up of multiple parts by first understanding the variation in each of the parts can be useful if we want to study how a skeleton has changed through evolutionary time. Have you ever considered that your skeleton is made up of a mixture of older and newer shapes? That some regions of your skeleton have changed far more recently than other regions along your evolutionary history?
Outside of our skull, most of the bones in our skeleton are the same shape as those in the ancestor we shared most recently with other species of Homo. The shapes of our forearm bones and coccyx are largely the same as those in the ancestor we last shared with Australopithecus, and the shapes of some bones in our feet are very like those in the ancestor we last shared with Pan. The timing and magnitude of changes to our skeletons can be informative about the selection pressures that have shaped our evolution. The morphoBlocks package presented in our recent publication produces a morphospace for a multiple-part object such as a skeleton, and can be used to understand how shape variation is distributed across each part. This enables us to discover when this variation arose and provides a means for investigating what this variation means for the organisms and for the communities they reside in.
morphoBlocks includes functions to conduct shape analyses on multiple-part objects. Chief amongst these functions is morphoBlocks::analyseBlocks, which provides access to a multiblock analysis method from the Regularized Generalized Canonical Correlation Analysis (RGCCA) package for R developed by Arthur Tenenhaus and colleagues. Collaborating with Arthur to realise the potential for Regularized Consensus Principal Components analysis from RGCCA for shape analysis has been deeply rewarding. We plan to continue developing morphoBlocks to add new functionality, and are excited to hear from ecologists and evolutionary biologists to learn if morphoBlocks has been beneficial for their research.
To find out more about the morphoBlocks package, read our Methods in Ecology and Evolution article ‘Constructing a multiple-part morphospace using a multiblock method’.