Post provided by Midori Yajima
Midori Yajima graduated from an MSc in Ecology with a project on palaeoecology and decided to join the online workshop Data Manipulation and Statistical Analysis in Palaeoecology: A Masterclass in R, set up by the Palaeoecology Special Interest Group (PalaeoSIG). In this post, Midori discusses some highlights from the workshop and emphasises the importance of a research community when navigating the programming realm.
“Data has a better idea”. This was the shiny neon sign glowing at me from one of the slides that Dr. Gavin Simpson showed as part of his presentation. The growing amount of data being generated is indeed a game-changer for almost every field in Science, Technology, Engineering, Mathematics (STEM) and beyond, and food for those ‘better ideas’. As a matter of fact, researchers have witnessed an exciting widening of the research questions they can address, inevitably leading to an increasing need for up-to-date toolkits with cutting-edge statistical know-how, analytical competencies and coding skills, and perhaps also to a lurking whiff of impostor syndrome.
PalaeoSIG rose to meet this need. The event was born from the idea of sharing knowledge and expertise between members of the palaeoecological community. As such, it was set up as a three-day workshop aimed at intermediate R users, plus a mid-way seminar open to everyone who wished to have a look at the variety of R applications in palaeoecology. After expert-led training sessions in the morning, the afternoon sessions of the workshop were dedicated to individual work, where we either went through the morning exercises in detail or worked on our own data. All sessions were supported by a virtual chat room. There, people could leave their questions, share resources, and connect with each other and with the experts and organizers, who were always there to offer their help throughout the event.
Right from the start, I had the feeling of having entered a room where people at different career stages and from different corners of the globe converged to learn and exchange, regardless of education level and of the shared pandemic going on in the background. This was a challenging task, especially given that online spaces, although rapidly becoming the new normal, still hold great potential for awkwardness when it comes to interacting with people from behind a screen.
Navigating the data
It was with this premise that Professor Steve Juggins kicked off day one, before setting sail to wrangling data and workflows in R, that is, the process of cleaning and manipulating data to get them ready for later analyses. Since this is the first step of any data analysis, it was no coincidence that it was also the first topic of the workshop. We were guided through palaeo-data wrangling by comparing base R and the tidyverse ‘grammar’, so that we could appreciate the differences between the two while progressively adding complexity to our workflow and discovering a few tips and tricks along the way.
The second day was dedicated to a theme of concern to many: the need to foster communication between ecology and palaeoecology. This time, Dr. Simpson was the chaperone who led us through methods drawn from ecology with the potential to be applied to palaeoecological data, namely topic models and the analysis of the network of interactions within the communities that we study. We were shown how the former may help us deal flexibly with data that feature too many variables, while the latter would allow us to add more information on the systems of interest.
The gates were then opened to a wider audience for the seminar day, where eleven speakers were split into four thematic sessions, with presentations concerning open data and replicability, methods for organizing and plotting data, and modeling complex scenarios. The last session offered a taste of the diversity of R uses crossing palaeoecological borders. Dr. Graciela Gil-Romera showed how a long-term perspective on ecosystem dynamics can back local decision-making, using fire dynamics and management on the Ericaceous belt of the Bale Mountains in Ethiopia as an example. Dr. Triin Reitalu discussed supporting ecological knowledge by analyzing environmental features such as plant functional diversity over longer time scales. Dr. Lizzie Jones dealt with qualitative and multidisciplinary data such as those generated by citizen science and by historical and local knowledge.
Day four was then entirely dedicated to applying ‘wiggly’ models to palaeo-data, and thus to unpacking the wonders of Generalized Additive Models (GAMs), a technique that learns smooth functions from the data, following the trends in the variables better than a fixed functional form would (“LOESS must die”, a sardonic Dr. Simpson declared, referring to this well-known smoothing method).
From the theoretical background, through how to implement GAMs in R, to more fine-tuned models such as hierarchical GAMs and distributional models, Dr. Simpson led us through notions and scripts, making sure he answered every question and comment thrown at him on the way.
Building the community
A most fitting way to wrap up the journey stemmed from one of those questions, at the end of the workshop. To answer the uncertainties of someone new to the programming realm, speakers and organizers alternated words of support with sound advice on the process of learning R. The recurring points were “anyone is going through these problems”, “accept this is going to take time”, “there’s no shame in seeking help” and “everyone has different strengths, everyone will benefit from collaboration”.
And after all, collaboration, patience, and time were precisely what these four days were about. Scripts, resources, and insights were shared generously, each session was recorded and later made available, and an eye was kept on inclusivity, since we could all turn on live subtitles at will. There was an ongoing discussion between audience and speakers, ranging from specific topics to informal conversations. Above all, we were allowed to look into top-notch researchers’ worlds for a while.
We could hear some of them acknowledging that they still feel a sense of awe when facing coding twists and turns, and sharing their learning paths and advice. Even more, seeing them deal with pets, connection failures, Zoom tricks, and kids’ intrusions made everything, especially their work, feel friendlier.
It was with this sense of inclusion in the community, surprisingly magnified by the virtual space we were in, that I left the workshop: the feeling that I had entered a place where people could work, discuss, compare, find new resources, and improve, in a wholly human fashion.
I would like to thank Dr. Suzette Flantua for being the reason I could take part in this workshop in the first place and for her never-failing support. My thanks are also due to Dr. Althea Davies for her valuable comments throughout the writing process, and for always being open to help.
For more information on the Palaeoecology Special Interest Group, visit the SIG website here.