Post provided by Tiago Olivoto
This post is also available in Portuguese
In our recent paper in Methods in Ecology and Evolution, Alessandro Lúcio and I describe a new R package, metan, for multi-environment trial analysis. Multi-environment trials are a kind of trial in plant breeding programs where several genotypes are evaluated in a set of environments. Analyzing such data requires a combination of several approaches, including data manipulation, visualization and modelling. The latest stable version of metan (v1.5.1) is now on CRAN. So, I want to share the story of my first foray into using R, creating an R package, and submitting a paper to a journal I had never submitted to before.
My first lines of code in R
One of the aims of my doctorate studies at the Department of Crops Science of the Federal University of Santa Maria was to propose new methods to perform stability analyses in plant breeding trials. At that time, I was working with at least four distinct software packages for analyzing multi-environment trial data. With the exception of SAS®, all of these programs were based on “point and click” Graphical User Interfaces, which made it very hard to implement the novel methods I was proposing. It was time to start using open-source Command Line Interface software such as R.
In early 2017, I was introduced to the R programming language by Dr. Bruno Giacomini Sari, to whom I will be forever grateful. It wasn’t difficult to translate my code to the R language because I already had prior experience with SAS®. At the beginning of the following year I wrote a draft of a study proposing new stability measures for plant breeding trials (WAAS and WAASB), which is now published. Many thanks to my co-authors Alessandro Lúcio, José G. da Silva, Volmir S. Marchioro, Velci Q. de Souza, and Evandro Jost! While writing the draft paper, I also wrote two of the most complex functions I had ever written in R: waas() and waasb().
The paper was done and I also had the functions used to implement the methods. The next step was to find the best way to make my script accessible to anyone interested in our work. The question was: how could I do this?
My first foray into creating an R package
The 62nd Rbras (Brazilian Region International Biometric Society) scientific meeting, on the challenges of applied statistics in the era of big data, was held in July 2017 in Lavras-MG and played an important part in this story. I was taking my first steps in R programming, and during that meeting I attended some short courses about the R language and had the opportunity to connect with many people with extensive experience working in the language. This meeting helped me to understand what an R package is and that an R package would be the best way to share my code. After reading the instructions on how to create R packages (a good source is R packages by Hadley Wickham) and going through many warnings and errors, I had finished my first R package! Conveniently, the package was called METAAB (multi-environment trial analysis using AMMI and BLUP). Yes, metan was once called METAAB!
metan was conceived to produce a readable file of stability indexes
During the peer-review process of our previous paper, one of the reviewers suggested comparing the genotype ranking produced by the WAAS and WAASB indexes with the rankings produced by other stability indexes used worldwide. Many R packages are available for computing stability indexes in a variety of ways. We chose to use four R packages for this comparison: stability, ammistability, agricolae, and plantbreeding. However, each package has its own syntax, which made the code difficult to write and to follow, especially when analyzing several variables. Putting all these stability indexes into the same “ready-to-read” file was a pretty tedious task, because extracting the variables from each package's output required its own chunk of package-specific code.
I was certain that I would face these same difficulties over and over again. I also thought (today I’m sure) that other researchers could face these same issues. Thus, we decided to do something to make computing stability indexes in R easier. The project metan (multi-environment trial analysis) was born!
When we started this project the package name was METAAB (multi-environment trial analysis using AMMI and BLUP), because the only functions available for stability analysis were those we were proposing. The first change we made to METAAB was to rename it to metan. Changing the package name was necessary because the package would no longer be used only to compute the WAAS and WAASB indexes. This blog helped me a lot with this process. During the development of metan v0.2.0 several new parametric and non-parametric stability indexes were implemented. The last function I included in that version of the package was ge_stats(), which finally accomplished the task I had been trying to solve for a long time! Today, all that users need to do to obtain 30 (parametric and non-parametric) stability indexes for all numeric variables of a data set fits into two lines of code! Below, you can see a short tutorial (without audio) about using ge_stats() to compute stability indexes in R.
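As a rough sketch of what those two lines can look like in practice: the example assumes metan is installed and uses data_ge, the example data set bundled with the package (columns ENV, GEN, REP, and the traits GY and HM). The object names model and stats_table are only illustrative, and the exact set of indexes returned may differ between package versions.

```r
# Minimal sketch, assuming the metan package is installed from CRAN.
library(metan)

# Line 1: compute the stability indexes for two traits of the bundled
# example data (environment, genotype, and replicate columns come first).
model <- ge_stats(data_ge, ENV, GEN, REP, resp = c(GY, HM))

# Line 2: collect the indexes for every genotype and trait in one table.
stats_table <- get_model_data(model, "stats")

print(stats_table)
```

Everything ends up in a single data frame, so it can be written to a file or passed on to other functions without any package-specific extraction code.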
Getting metan on CRAN
In July 2019 metan was already known in 18 countries, even though it was only hosted on GitHub. If I wanted it to gain significant traction in the R community, I knew I should also submit it to CRAN. Prior to the CRAN submission I fixed some bugs and added new functions, such as gge() for computing Genotype plus Genotype-Environment interaction models, desc_stat() for computing descriptive statistics, and some other useful functions for data manipulation.
After reading many tutorials (R packages by Hadley Wickham helped me a lot again!), and fixing all errors, warnings, and messages during the checking process, I had my first experience of submitting a package to CRAN in early September 2019. The submission process of v0.2.0 was much easier than I thought it would be, and metan 1.2.1 was first released on CRAN on 2020/01/14.
metan beyond plant breeding
metan offers a set of functions to manipulate, summarize, and plot data that can be useful for R users beyond plant breeders. This convinced me that it was time to write a paper to describe and promote the package and to serve as a citable resource. We decided to write an Application paper for Methods in Ecology and Evolution.
A short period after submitting our paper, I received the decision that it had been accepted for publication. It was one of the most motivating emails I have ever received.
Our article ‘metan: an R package for multi‐environment trial analysis’ is freely available to everyone. To find out more about metan, visit its website, which is updated regularly. You can also see a summary of the functions in the cheat sheet below. The PDF version can be downloaded from our GitHub page.
To read more about package metan and its uses for plant breeding trials, check out the Methods in Ecology and Evolution article, ‘metan: An R package for multi-environment trial analysis’