Early career research: Increasing access, reproducibility and transparency in phylogenetic analyses with Cristian Román-Palacios

Photo of Cristian Román-Palacios

I was born in the Colombian Andes (Armenia, Quindío) back in the 90s. I received my bachelor’s degree in Biology from Universidad del Valle, in Cali, Colombia, in 2015. I moved to the US in 2016 to pursue a Ph.D. in Ecology and Evolutionary Biology at the University of Arizona, a degree that I completed in Fall 2020. Although my research interests have shifted over time, my main focus has been on addressing long-standing questions, primarily in biology, and on using methods to approach those questions from novel perspectives.

As an undergraduate, my research focused on describing insect species in the Neotropics and analyzing the trophic and reproductive ecology of particular freshwater fish species in Colombia. That was when I became aware of the beauty of programming and statistical methods: their intrinsic potential in biology and their flexibility. I was particularly focused on exploring multi-model comparison and fundamental machine learning algorithms from an applied perspective. I was also aware that evolutionary biology, and particularly phylogenetics, was a unifying field in biology and related disciplines.

As a Ph.D. student, I intended to balance my work between my direct research interests (e.g., phylogenetic comparative methods), more applied research (e.g., climate change and biodiversity), and work that was expected to support upcoming generations of scientists (e.g., blog posts in Nature). It was this combination of perspectives that interested me the most during that stage of my career.

I recently released an R package aimed at increasing reproducibility and access to tools that are often used to infer molecular phylogenies. The phruta R package was first reviewed and released by rOpenSci in 2022. The associated paper in Methods in Ecology and Evolution was published early in 2023. However, I started working on this package towards the end of the first year of my Ph.D., in 2017. Early work on phruta relates to PhyLoTA, as I was aiming to bring back some of the original functionality of that database through two packages: muPHY and rPHYLOTA. At that time, I was also planning to do a rotation in Mike Sanderson’s lab (author of r8s and PhyLoTA), also at UArizona. For my rotation, we ended up collaborating on a smaller project focused on comparing rates of molecular evolution in relation to particular life history traits in plants.

I moved forward with implementing this package mostly towards the end of my Ph.D. The main functions in the phruta R package allow for quick mining and curation of GenBank sequences. The package is designed for students and researchers interested in assembling species-level genetic datasets for particular sets of taxa. It also supports basic phylogenetic inference procedures. However, I often highlight that the functionality of phruta is still limited given the breadth of tools that could be used in each step of the phylogenetic workflow. There are long-term plans to enhance the curation and data retrieval routines as well as to support additional tools for sequence alignment and tree inference.

I started my own lab in Fall 2023, also at the University of Arizona. The phruta package represents one of the branches of research in the lab (i.e., software development). Because I am faculty in a cross-disciplinary academic unit on campus (the School of Information), research and work in the lab also overlap with biology, machine learning, data science, and DEI. I welcome collaboration on synergistic efforts with faculty and students alike.

You can read the full article on the phruta R package here.
