The Data Double Standard: how this article came to be

Post provided by Alison Binley

As a new Master’s student at Carleton University, I was excited to learn the ins and outs of using community science data (also commonly known as citizen science, participatory research, or crowd-sourced data) to conduct conservation research. I was working on estimating population trends using eBird, a popular, opportunistic community science platform that collects data on birds, and I was fascinated by the possibilities of having such an immense resource at my disposal. Much to my surprise, however, while attending my first ornithology conference, I discovered that hardly anyone there was using these data in their research. Furthermore, I was questioned quite rigorously on whether my data were “trustworthy” and whether they had a place alongside more conventionally collected professional datasets.

Over the years, I would continue to hear variations on this theme. What if someone submits something that is wrong? What if two or more individuals report the same bird? Will you then double-count that bird? How are these data useful if there is no survey protocol? Shockingly, I and others working with community science datasets had considered these issues previously and developed strategies to overcome them (https://doi.org/10.1111/2041-210X.13834). More and more research effort is being put into developing statistical methods to better understand and work with these datasets, and to adjust for biases associated with where people prefer to collect data, how different individuals collect data, and the potential to make mistakes.

Photo credit: Brandon Edwards

The Data Double Standard was born out of the desire not only to address these concerns, but also to flip the narrative: the methodological advances that have evolved from the need to make sense of community science data can also be used to improve data collected by experts. In fact, the complementary use of crowd-sourced and professionally collected data may give us the clearest picture of the state of biodiversity. For example, because many community science programs allow participants to collect data wherever they like, the resulting datasets are spatially biased. However, data collected by professionals, while usually subject to study design protocols that attempt to eliminate such biases, are not entirely immune to this problem (https://doi.org/10.1890/110154). Why not direct our professional monitoring efforts to fill the gaps in crowd-sourced datasets?
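To make that idea a little more concrete, here is a minimal, hypothetical sketch of one way "filling the gaps" could be approached: bin crowd-sourced checklist locations onto a coarse grid and flag the least-sampled cells as candidates for targeted professional surveys. This is not a method from the article; the function name, grid resolution, and simulated coordinates are all illustrative assumptions.

```python
# Hypothetical illustration: rank under-sampled grid cells as candidate
# locations for professional surveys, given crowd-sourced checklist coordinates.
import numpy as np

def rank_survey_gaps(lons, lats, cell_size_deg=1.0, n_cells=10):
    """Return the centres (and counts) of the grid cells with the fewest checklists."""
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)

    # Define a regular longitude/latitude grid covering the study region.
    lon_edges = np.arange(lons.min(), lons.max() + cell_size_deg, cell_size_deg)
    lat_edges = np.arange(lats.min(), lats.max() + cell_size_deg, cell_size_deg)

    # Count checklists per cell.
    counts, _, _ = np.histogram2d(lons, lats, bins=[lon_edges, lat_edges])

    # Cell centres, flattened so they pair up with the counts.
    lon_centres = (lon_edges[:-1] + lon_edges[1:]) / 2
    lat_centres = (lat_edges[:-1] + lat_edges[1:]) / 2
    grid_lon, grid_lat = np.meshgrid(lon_centres, lat_centres, indexing="ij")

    # Rank cells from least to most sampled; the least sampled are the "gaps".
    order = np.argsort(counts.ravel())[:n_cells]
    return list(zip(grid_lon.ravel()[order], grid_lat.ravel()[order], counts.ravel()[order]))

# Made-up checklist locations clustered around one city, plus scattered records.
rng = np.random.default_rng(42)
lons = np.concatenate([rng.normal(-75.7, 0.5, 400), rng.uniform(-80, -70, 100)])
lats = np.concatenate([rng.normal(45.4, 0.5, 400), rng.uniform(42, 48, 100)])
for lon, lat, n in rank_survey_gaps(lons, lats):
    print(f"cell centre ({lon:.1f}, {lat:.1f}): {int(n)} checklists")
```

In practice, a prioritization like this would also need to account for survey effort, detectability, accessibility, and habitat, but even a simple count of checklists per cell makes the spatial unevenness of opportunistic data easy to see.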

Photo credit: Brandon Edwards

Now, as a post-doctoral researcher at the Cornell Lab of Ornithology, I still use eBird data, alongside many other freely and openly available datasets, to support my research. Many species are in dire need of conservation interventions, and having these data readily at my disposal allows me to provide rapid recommendations that are still based on evidence. I am using a structured decision-making framework to optimize the implementation of temporary conservation interventions in human-dominated landscapes, aiming to maximize the benefits to biodiversity while minimizing disruptions to human activities such as agricultural production. There are still analytical challenges associated with ensuring that those recommendations are not biased by statistical artifacts propagated through the underlying data. However, I am confident that I am considering all the potential shortcomings of my data, regardless of how they were collected.

You can read the full article here.
