Post provided by Eliza M. Grames
To celebrate the 10th Anniversary of the launch of Methods in Ecology and Evolution, we are highlighting an article from each volume to feature on the Methods.blog. For Volume 10, we have selected ‘An automated approach to identifying search terms for systematic reviews using keyword co‐occurrence networks’ by Grames et al. (2019).
In this post, Eliza Grames shares the motivation behind the litsearchr search approach, and developments since the article’s publication.
Whether you are making policy decisions or planning a new research program, it is critical to consider all the available evidence and previous research in order to make the most informed decisions possible. Like many new grad students, when I started my PhD, I began by synthesizing previous studies to generate new ideas. I set out to do a systematic review to capture all relevant studies to include in my synthesis. However, every time I came up with a new set of search terms, one of my collaborators would point out terms I was missing, which had led me to omit key papers. Choosing search terms for a synthesis is incredibly important because they define which studies you will retrieve and ultimately include in a review, but it is also really challenging to come up with synonyms for terms in fields where there is not a standard ontology, new terms get created frequently, or sub-fields use terms differently. Even when working with an information specialist and experts on the topic, I still kept missing important terms and phrases needed to retrieve relevant articles.
When I got frustrated with the process of trying to develop a comprehensive search so that my synthesis was not biased, one of my PhD committee members, Robi Bagchi, suggested that there must be methods for identifying missing search terms in other fields where systematic review is more commonplace, like psychology and public health. So, I looked. As it turns out, identifying search terms was a problem across disciplines and no one had a good solution yet. I decided that if a method did not exist, I would develop one and automate it in a way that was reproducible, tested, and easy to use.
I spent several months tinkering with approaches to identifying search terms, combining different algorithms and natural language processing techniques for extracting keywords, bouncing ideas off my collaborators, drawing convoluted workflow diagrams on the whiteboard that my labmates made funny faces at, and pursuing many dead ends along the way. After many, many failed attempts, we finally came up with an automated method that works for identifying search terms by using automatic keyword extraction and keyword co-occurrence networks to select important terms, which is what we describe in the paper.
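As a rough illustration of the idea of selecting terms from a keyword co-occurrence network, here is a minimal sketch in Python. It is not litsearchr's code (the package is written in R), and the function names, the toy keyword lists, and the choice to rank terms by node strength (the sum of a keyword's co-occurrence counts) are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_strength(docs_keywords):
    """Return {keyword: strength} for a keyword co-occurrence network.

    Two keywords share an edge if they appear in the same document; the
    edge weight is the number of documents in which they co-occur. A
    keyword's strength is the sum of the weights of its edges.
    """
    edge_weights = defaultdict(int)
    for keywords in docs_keywords:
        # Each unordered pair of distinct keywords adds 1 to that edge.
        for a, b in combinations(sorted(set(keywords)), 2):
            edge_weights[(a, b)] += 1

    strength = defaultdict(int)
    for (a, b), weight in edge_weights.items():
        strength[a] += weight
        strength[b] += weight
    return dict(strength)


def top_terms(docs_keywords, n):
    """Return the n keywords with the highest strength (ties broken A-Z)."""
    strength = cooccurrence_strength(docs_keywords)
    return sorted(strength, key=lambda k: (-strength[k], k))[:n]


# Hypothetical keywords already extracted from four article abstracts.
docs = [
    ["forest fragmentation", "bird abundance", "patch size"],
    ["forest fragmentation", "patch size", "edge effects"],
    ["bird abundance", "forest fragmentation"],
    ["coral reef", "mesophotic"],
]

print(top_terms(docs, 3))
```

In this toy network, keywords that co-occur with many others across documents rise to the top, while terms that appear in only one unrelated document rank low. A real workflow would also need the keyword extraction step and a principled cutoff for which terms to keep.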
Once we had a method that we were fairly confident in, I set out to turn the code I had written into a full-blown R package so that it was easy for people to use and easy to reproduce. I determined which functions needed to exist, identified parts of my scripts that could become functions, and mapped out relationships between functions and how they fit into the workflow. One thing that surprises many people who use the package is that I actually learned to code while writing it, although this is less surprising if you look at the early versions of the code underlying the functions. My motivation to become proficient in writing R code came from wanting to make the method work.
For the paper describing the method and code, we used the default settings in the beta version of litsearchr (named by my co-author Andrew Stillman) to develop search strategies for six published systematic reviews. We then compared the performance of the automated search strategies to the published search strategies, checking to see whether the automated and manual searches retrieved all the relevant articles included in the published review and also how many irrelevant articles they returned. Basically, we wanted to know if our automated method was performing as well as manual methods or if it was performing worse and was not yet suitable to share widely.
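The comparison described above can be sketched with simple set arithmetic. This is only an illustration under assumptions: the function name, the article identifiers, and the counts are hypothetical, and calling the two measures "recall" and "precision" is a labelling choice for this example rather than something stated in this post.

```python
def search_performance(retrieved, relevant):
    """Summarize how well a search recovers a known set of relevant articles.

    retrieved: identifiers of articles returned by the search
    relevant:  identifiers of articles included in the published review
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    return {
        # Share of the relevant articles that the search found.
        "recall": len(hits) / len(relevant) if relevant else 0.0,
        # Share of the retrieved articles that were relevant.
        "precision": len(hits) / len(retrieved) if retrieved else 0.0,
        # Number of retrieved articles that were not relevant.
        "irrelevant": len(retrieved - relevant),
    }


# Hypothetical identifiers: "a*" are included in the review, "x*" are not.
included = {"a1", "a2", "a3", "a4"}
manual_search = {"a1", "a2", "a3", "x1", "x2", "x3"}
automated_search = {"a1", "a2", "a3", "a4", "x1"}

print("manual:", search_performance(manual_search, included))
print("automated:", search_performance(automated_search, included))
```

With these made-up sets, the automated search would look better on both counts, but the numbers carry no information about the actual results in the paper.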
Based on our first set of tests, the automated approach was doing just as well as, and in some cases better than, manual methods. I was rather pleased with this result, though I was also surprised: I had done the testing myself, and some of the published reviews I had used for the test dataset were on topics I know basically nothing about, so I had expected the searches to perform poorly due to user error. For example, my dissertation research is on birds in forest patches, and yet I had to develop test searches for papers on mesophotic coral reef ecosystems.
Even though it has not even been a year since the paper was published, a lot has changed since the first version of litsearchr was released and tested. While the paper was still in review at Methods in Ecology and Evolution, I was invited to the Evidence Synthesis Hackathon in Canberra, Australia to work with an incredibly talented group of researchers from around the world to develop open source software to support systematic reviews and meta-analyses. At the hackathon, we developed blueprints for a completely reproducible, self-contained evidence synthesis workflow in R, planning connections between packages (including litsearchr) and new functions that would need to be developed to automate the most time-intensive steps of conducting a systematic review and meta-analysis. The team I worked with at the hackathon is now developing the metaverse, a project funded by the R Consortium and led by Martin Westgate, to make this workflow a reality.
I have also been working with two librarians who specialize in evidence synthesis — Amelia Kallaher from Cornell University and Sarah Young from Carnegie Mellon University — to develop a Library Carpentry lesson on using R and litsearchr for systematic reviews. We taught the lesson as a workshop for information specialists, nearly all of whom were brand new to coding, in August 2020. We are now revising the lesson content to release to the community based on feedback from workshop participants. It has been incredible to learn about the community of people using litsearchr for synthesis since it first came out; it has been used by researchers in conservation, ecology, evolution, psychology, special education, linguistics, nutrition, public health, and even business. As a testament to the truly interdisciplinary nature of evidence synthesis methods, litsearchr even recently received a commendation from the Society for the Improvement of Psychological Science and users have created discipline-specific tutorials for psychology, even though I built the package to study bird conservation.
I am still actively working on the litsearchr package to make the code more efficient and to make it more flexible. Based on user feedback, we have added lots of new features and capabilities, like recognizing generic bibliographic data and supporting more languages, while still maintaining the core functionality. I keep aiming to get the litsearchr package included in CRAN, but users keep suggesting new features that I want to add before formally archiving the package! In the meantime, all the versioned releases are still available on GitHub and there are vignettes and other helpful information on the litsearchr website.
To find out more about literature search approaches, read the Methods in Ecology and Evolution article, ‘An automated approach to identifying search terms for systematic reviews using keyword co‐occurrence networks’.
Find out about the Methods in Ecology and Evolution articles selected to celebrate the other volumes and our editors’ favourite papers in our anniversary blog series.