Post provided by Pasquale Raia
A couple of years ago, I was having a discussion on the R-sig-phylo mailing list and dared to define Brownian Motion (BM) as a kind of null hypothesis against which more realistic scenarios should be compared. Maybe I crossed a line or made too simplistic a statement (see Adams and Collyer’s article in Systematic Biology for an explanation of why this matter is far trickier and more complicated than my reply suggested). The point is, my comment was hotly contested and a colleague ‘put the onus on me’ to do something better than the almighty (emphasis mine) BM.
The RRphylo method was my attempt to do just that. It may not be better than BM, but it is different. Often, that can be exactly what you need.
What is the RRphylo Method?
RRphylo uses Ridge Regression to estimate phenotypic values at nodes, and to compute evolutionary rates along individual branches. The method was first published in an unduly overlooked paper by Kratsch and McHardy in a 2014 issue of Bioinformatics.
Starting from the original paper, we changed the regularization factor used in RidgeRace (Kratsch and McHardy used L2 penalization) and devised our own, biologically oriented solution. By “biologically oriented” I mean that the RRphylo penalty factor is well suited to fitting BM-like phenotypic evolution. At the same time, it is designed to keep closely related species phenotypically very similar to each other, as long as the tree isn’t particularly strange.
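To make the idea concrete, here is a minimal sketch in plain Python of ridge regression on a tree. It is not the RRphylo code: the three-tip tree, the tip values, the function name `ridge_rates`, the use of the tip mean as the root value, and the hand-picked lambda are all assumptions made for illustration, and the penalty is plain ridge rather than RRphylo's own solution. Each tip phenotype is modelled as the root value plus the sum of rate × branch length along the path from root to tip.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def ridge_rates(L, y, lam):
    """Ridge estimate of branch rates: solve (L'L + lam*I) beta = L'y."""
    Lt = transpose(L)
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in Lt] for r1 in Lt]
    for i in range(len(gram)):
        gram[i][i] += lam
    rhs = [sum(l * v for l, v in zip(row, y)) for row in Lt]
    return solve(gram, rhs)

# Toy tree ((A:1,B:1)n1:1,C:2); columns are the branches leading to n1, A, B, C.
# Each row holds the branch lengths on the path from the root to one tip.
L = [
    [1.0, 1.0, 0.0, 0.0],  # tip A
    [1.0, 0.0, 1.0, 0.0],  # tip B
    [0.0, 0.0, 0.0, 2.0],  # tip C
]
tips = [3.0, 5.0, 1.0]                  # hypothetical phenotypes for A, B, C
root_value = sum(tips) / len(tips)      # assumption: root set to the tip mean
centered = [v - root_value for v in tips]

rates = ridge_rates(L, centered, lam=1.0)   # one signed rate per branch
node_value = root_value + 1.0 * rates[0]    # ancestral estimate at node n1

print("branch rates (n1, A, B, C):", rates)
print("estimate at node n1:", node_value)
```

There are four branch rates but only three tips, so the unpenalized problem has no unique solution; the penalty is what makes it solvable. Raising `lam` shrinks all rates towards zero, while letting it approach zero makes the fitted tip values reproduce the observed ones.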
The figure to the right, taken from the RRphylo vignette, helps to explain this point. The y-axis shows the phylogenetic signal (K) computed for phenotypes simulated by RRphylo under different penalty (lambda) values. Note that this is not the maximum likelihood solution developed in RRphylo; these are the mathematical results you get by fixing the lambda values. When the penalty gets close to zero, the simulated phenotypes behave like BM (the red dots along the K-curve), except for a narrow range of lambdas where there is no phylogenetic signal in the simulated data (green dots). Outside that narrow range, K may take on very large values, which is expected when evolution is very conservative and there’s little change along the branches.
Advantages of RRphylo
One of the key benefits of RRphylo is that you immediately get rate estimates for all branches in the tree, along with the ancestral estimates. These are really fast to compute, but they’re obviously overparametrized, and they depend entirely on the tip phenotypes. Whether this is useful depends heavily on your reason for using RRphylo.
RRphylo rates are regression coefficients. As such, they have a sign and a size, which means the direction and magnitude per unit time of phenotypic change are known straight away. This is great, because more common metrics for rate variation come in the form of the rate of accumulation of phenotypic variance or other disparity metrics. These are squared, so they tell whether the phenotype is speeding up or slowing down, but not if it’s getting larger or smaller.
Also, RRphylo rates can be easily compared among different branches of the tree, which is itself equivalent to the exercise of finding rate shifts. We did this in our Methods in Ecology and Evolution article ‘A new method for testing evolutionary rate variation and shifts in phenotypic evolution’, where the RRphylo method is accompanied by another R function designed specifically to find such shifts by means of randomization (of rates across branches). When we did this, we found that flying dinosaurs (strictly speaking, flying ornithodirans, since pterosaurs are included) had higher rates of evolution than either bipedal or quadrupedal dinosaurs.
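The logic of a randomization test on branch rates can be sketched as follows. This is only an illustration of the general idea, not the function from the paper: the name `clade_shift_test`, the rate values, the choice of test statistic (difference in mean absolute rate between the clade and the rest of the tree) and the one-sided p-value are all assumptions.

```python
import random

def clade_shift_test(rates, in_clade, n_perm=9999, seed=1):
    """Compare mean |rate| inside a clade with the rest of the tree,
    and judge the difference by shuffling clade membership across branches."""
    rng = random.Random(seed)
    abs_rates = [abs(r) for r in rates]

    def mean_difference(flags):
        inside = [r for r, f in zip(abs_rates, flags) if f]
        outside = [r for r, f in zip(abs_rates, flags) if not f]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = mean_difference(in_clade)
    flags = list(in_clade)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # random reassignment of rates to the clade
        if mean_difference(flags) >= observed:
            as_extreme += 1
    # One-sided p-value for "the clade evolves faster than the rest".
    p_value = (as_extreme + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical signed rates for 12 branches; the first 4 belong to the focal clade.
rates = [0.9, -1.1, 1.3, -0.8, 0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, 0.12]
in_clade = [True] * 4 + [False] * 8

observed, p_value = clade_shift_test(rates, in_clade)
print("difference in mean absolute rate:", observed)
print("one-sided p-value:", p_value)
```

Absolute values are used here because a rate shift concerns how fast the phenotype changes, whatever the direction; the signs remain available in `rates` for anyone interested in whether the trait is increasing or decreasing.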
The reason I bring up this result is that it gives another twist to the significance of RRphylo in relation to other phylogenetic comparative methods. When you’re working with phenotypes, you have to consider all the weird and unusual things that come with the fossil record. With BM, or with most phylogenetic comparative methods (PCMs), the evolutionary model is applied almost tree-wide, and it’s difficult to get idiosyncratic results from extremely ‘deviant’ phenotypes. With RRphylo, even isolated tips, presumed to evolve under a certain regime, can be tested to find out whether they have different evolutionary rates. The classic example, presented in the paper, is the idea that body size evolution accelerates in insular vertebrates (by the way, it doesn’t! See the ‘mammal’ figure below, right side).
One final word on the tree structure. Many popular PCMs designed to look for rate shifts don’t work with non-ultrametric trees, that is, trees in which the tips are not all at the same distance from the root, as happens whenever extinct species are included. This is fine, but it’s not very helpful for paleobiologists. With RRphylo, it doesn’t matter if the tree is ultrametric or not, so it’s especially well-suited to paleontological investigations.
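For readers unsure what the ultrametric condition amounts to, here is a small sketch that checks it. The tree representation (a dictionary mapping each node to its children and branch lengths), the function names and the example trees are assumptions for illustration only.

```python
def root_to_tip_distances(children, root):
    """Return the summed branch length from the root to every tip."""
    distances = {}
    stack = [(root, 0.0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:  # a node without children is a tip
            distances[node] = depth
        for child, length in kids:
            stack.append((child, depth + length))
    return distances

def is_ultrametric(children, root, tol=1e-8):
    """A tree is ultrametric when all tips are equally distant from the root."""
    depths = root_to_tip_distances(children, root).values()
    return max(depths) - min(depths) <= tol

# Three living species: every tip sits 2 time units from the root.
extant_tree = {"root": [("n1", 1.0), ("C", 2.0)],
               "n1": [("A", 1.0), ("B", 1.0)]}

# Same topology, but B is a fossil that went extinct before the present.
fossil_tree = {"root": [("n1", 1.0), ("C", 2.0)],
               "n1": [("A", 1.0), ("B", 0.4)]}

print(is_ultrametric(extant_tree, "root"))
print(is_ultrametric(fossil_tree, "root"))
```

The first tree passes the check and the second does not, which is the usual situation as soon as fossil tips are added to a phylogeny.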
Limitations of RRphylo
There are downsides, though. First, RRphylo is overparametrized (although, as Kratsch and McHardy noted, the regularization cures the problem). Second, when you move away from the model-oriented approach, it’s hard to make sense of what evolution across the whole tree looks like. In other words, with so much detail it is not always easy to summarize phenotypic evolution at the tree level in a few words (see the left panel of the ‘mammal’ figure below). Still, what you get really depends on what’s in the tree. Although this is true of most PCMs, it can be more annoying with an approach that gives as many parameters as RRphylo.
Why You Should Use RRphylo
RRphylo is especially well-suited to work with non-ultrametric trees, and for fast recognition of rate shifts across the tree. It even provides the possibility of ‘automatic’ recognition of rate differences (increase/decrease) between clades. Its key advantages over existing methods are:
- The ability to test for rate shifts not just for entire clades, but even for isolated species evolving under a presumably different evolutionary regime.
- The direction of rate change (i.e. whether the phenotype is getting larger or smaller), which you get straight away together with the rate magnitude.