Organising the BES Data and Code Hackathon

Post provided by Natalie Cooper, MEE Senior Editor

In my last blog post I wrote generally about why and how to organise a hackathon. To help make those instructions a little clearer, below I provide an example from the BES Data and Code Hackathon we ran 29th-30th September 2025. Note that technically this was really a datathon rather than a hackathon!

We followed the outline plan shown in my previous post.

  1. Gather your core organisational team

Much like Nick Fury, I first assembled a crack team of BES staff and meta-research enthusiasts who are AEs at MEE (OK the last bit is less like the Avengers). Because we already knew we wanted this to be a hybrid event, we asked two of the team to take charge of the online participants.

  2. Decide on your aims/goals

The BES publishes seven journals: Ecological Solutions and Evidence (ESE), Functional Ecology, Journal of Animal Ecology, Journal of Applied Ecology, Journal of Ecology, Methods in Ecology and Evolution (MEE), and People and Nature (PAN). Data archiving has been mandatory (barring exceptional circumstances) at all seven journals since January 2014, and code archiving has been required for papers presenting simulations, new applications and non-standard analyses since 2017 (2015 for MEE). We (I/MEE/BES) wanted to know how effective these data/code sharing policies have been in the BES journals.

Additionally, we wanted to involve as many interested people as possible, as inclusivity is key to our values. We also wanted to collect as much data as possible. The nature of these goals meant that our hackathon would be large (over 140 people attended in total!), hybrid, and with participants of mixed skill levels (ranging from undergraduates to professors to software professionals).

Our final goal was to submit a paper to MEE on the results.

  3. Create a protocol/guidance document(s)

This was very complex for this project as we were planning to collect quite a lot of data.

We created a number of documents as follows.

  • A guidance document with a full protocol. This was 15 pages long! It included information on our goals and how data collection would work. It then included each question along with the available options. We also included examples for the more subjective questions. For example we asked people to score the quality of the README from 1-10, and then provided an example of what should be included in a good README.
  • A Google Form for data entry. This was designed carefully so that most questions had dropdown or multiple choice menus to reduce data input errors. 
  • A Google Form for participant details, e.g. name, affiliation and email. This was separate from the main form because it kept emails separated from the main dataset for GDPR compliance. Asking for participant emails, addresses and affiliations here saved time when prepping the publication. We reminded people to provide emails that would still be valid in a year or more, as publication can be slow and people often lose their institutional email addresses when they move on.
  • A list of all the papers published in BES journals between 2017 and 2024 that we wanted to collect data from (excluding reviews etc.). These papers were put into a randomised order and then given a unique paper number so they could be indexed.
  • Finally we set up a Discord site with appropriate channels and instructions etc. so people knew where to post comments and questions, and where they could discuss the project more broadly. This was key for ensuring engagement with online participants.
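The paper randomisation and indexing step above can be sketched in a few lines of Python. This is a hypothetical illustration (the paper titles are placeholders, not the real BES list), and the fixed seed is just so the assignment is reproducible:

```python
import random

# Placeholder paper titles standing in for the real list of BES papers
papers = ["Paper A", "Paper B", "Paper C", "Paper D"]

rng = random.Random(42)  # fixed seed so the randomised order is reproducible
shuffled = papers[:]     # copy so the original list is left untouched
rng.shuffle(shuffled)

# Assign each paper a unique 1-based paper number for indexing
paper_index = {number: title for number, title in enumerate(shuffled, start=1)}

for number, title in paper_index.items():
    print(number, title)
```

With a fixed seed, every organiser regenerating the list gets the same paper numbers, which matters when participants are assigned papers by number.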

Note that it was really useful to have everything online so it could be updated as needed on the days of the hackathon, rather than having to constantly send round new documents!

  4. Review and test the protocol/guidance document(s)

This is where we would make improvements for next time!

All organisers checked all aspects of the documentation several times. This included meta-research experts and people who had never collected this kind of data before. Next, my lab group (a mix of research assistants, Masters and PhD students) also tested it twice. All worked perfectly!

We sent everything out two weeks before the hackathon and a few participants sent feedback but not many. Our plan was to do final checks on the protocol and add any missing options to multiple choice questions for the first 2 hours of the hackathon, then begin data collection around noon…this did not entirely go to plan as I will describe later!

In future I would ensure participants read the protocol in advance and send any queries or changes at least one week before the hackathon so they could be incorporated.

  5. Plan event practicalities

We planned day 1 at BES HQ in London, with ~40 attendees. We paid for lunches, teas and coffees. Day 2 was at the NHM in London, with ~60 attendees. We paid for teas and coffees (I bought biscuits to reduce costs, including “free from” alternatives), but people had to find their own lunch as we were out of budget.

Online attendees were looked after by Graziella and Bethany on Discord. They organised coffee and lunch breaks on Discord so people could interact. Online attendees worked whenever they wanted to depending on other commitments and time zones.

Communication of all changes to the protocol etc. was done via Discord so online and in-person attendees had the same information.

  6. Attract/invite participants, share information with them

We advertised via socials, the BES Mailing list, the SORTEE mailing list and various BES SIG mailing lists. We got a great response from this!

  7. Run the hackathon

This was, I think it’s fair to say, initially (semi)organised chaos! The plan was to review the protocol between 10 am and 12 noon, then to collect data from 12 noon. I had put another organiser in charge of the in-person attendees, we had two dedicated online organisers, and my role was to add options and clarifications to the protocol and forms as needed during these two hours.

However, two things happened! 

First, at BES HQ, the WiFi went down as soon as we all started trying to access the protocol and Discord etc. We think we probably overloaded the system! In-person participants couldn’t access any of the documents. This required some problem solving and quick thinking (thanks again to Harriet and Amelia for helping to fix this!). We did get it fixed in an hour or so, but in the meantime…

Participants had LOTS of edits, questions, concerns, suggested additional questions, confusions etc. We were not expecting the sheer volume of these because we felt we’d done a good job making the protocol as clear as possible.

Therefore by the time I had access to the Discord and documents again it was absolute chaos! Thanks to Bethany Allen and Graziella Iossa who maintained things online while we were all incommunicado! 

In the time left before 12 noon, I quickly fixed what I could in the protocol. Some issues were easy to fix and based on misunderstandings. Some issues were complex and thus unfixable in the time available – in fact, some would only have been fixable if I’d had a whole day to rewrite sections. So I fixed what I could and left the rest.

The hackathon after this ran really well! People asked questions on Discord and helped each other out. There was excellent engagement in the room and online. It was a really fun and positive experience! 🙂

  8. Update participants and evaluate

In the last few hours of the in-person hackathon, Pen-Yuan Hsing and Laura Graham (AEs at MEE) did preliminary data visualisations of the data so far and shared these on Discord. This gave everyone immediate feedback on our goals, for example how many papers we had collected data from, and what were the emerging trends.

After the event ended, we provided more feedback via Discord and email and continued to update people as to the next steps with completing data collection, cleaning the dataset and drafting the paper.

Initial results were sent to everyone in late November for comments, and the draft was shared in mid-February. The manuscript was submitted in March. Note that with so many coauthors (138) we had to combine everyone into a consortium. But we were able to add all authors explicitly to the preprint on EcoEvoRxiv.

  9. Finally…achieve your aims! 

Did we achieve our aims? Yes I think so!

We had lots of highly engaged participants, online and in-person, and lots of positive (and constructive) feedback. Participants felt they learned how to share their data and code more effectively.

More concretely, we collected data on 1861 papers (~20% of papers published in our time window), and we have submitted a paper to MEE with 138 authors (this was most, though not all, of the participants; some had to withdraw for various reasons)!

What would we do differently next time?

Have a plan for there being no internet! However, you can spend too much time planning for all eventualities. I think in this situation we did the best we could, and I don’t think planning would have actually made much difference.

The main issues we encountered were related to the large numbers of participants. The sheer volume of comments and questions on Discord was really hard to cope with, but at the same time it was great to see so much engagement from across the world – participants were based in six continents and 27 countries! Definitely my most international project by far!

We learned that people interpret things in very different ways. This is hard to fix, and of course is exacerbated by the large numbers of participants. In future we would ensure that all participants checked the protocol a few weeks before the event and provided comments one week before the event so updates could be done in advance. I expect there would still be things that caused confusion, but there might be fewer to fix.

More specifically related to this project and our data collection, we realised that “write in” boxes for answers to questions are the worst! It’s very easy to make errors. One person wrote their own name in six different ways… This made cleaning the data hard and time consuming. In future I would set aside more time to clean the data. I would also more carefully consider how to divide and delegate tasks after the hackathon. A lot of the work fell on me as it was hard to divide tasks up.
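One way to catch free-text variants like that during cleaning is fuzzy string matching. Below is a minimal sketch using Python's standard-library difflib; the names are invented, the 0.75 similarity threshold is an arbitrary choice for illustration, and a real cleaning pass would still need manual review of the groupings:

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Return True if two free-text entries look like the same name."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold

# Invented examples of the kind of variants a "write in" box produces
entries = ["Jane Smith", "jane smith ", "J. Smith", "Smith, Jane", "John Doe"]

# Group each entry with the first previously seen spelling it resembles
canonical = {}
for entry in entries:
    for seen in canonical:
        if similar(entry, seen):
            canonical[seen].append(entry)
            break
    else:
        canonical[entry] = [entry]

print(canonical)
```

Note the limits of this approach: "Smith, Jane" ends up in its own group because reordered names score poorly on raw character similarity, which is exactly why dropdown menus beat write-in boxes in the first place.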

Finally, to formally calculate things like consistency scores across participants, we should have asked every participant to collect data from several of the same papers, instead of just the single shared paper we asked everyone to look at. 
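As a rough illustration of the kind of consistency score that overlapping assignments would enable, here is a sketch of simple percentage agreement between two raters on shared papers. The scores are invented, and a real analysis would use a proper inter-rater statistic (such as Cohen's kappa) that corrects for chance agreement:

```python
# Invented example: two participants scoring the same five papers
rater_a = {"paper_1": "yes", "paper_2": "no", "paper_3": "yes",
           "paper_4": "partial", "paper_5": "yes"}
rater_b = {"paper_1": "yes", "paper_2": "no", "paper_3": "no",
           "paper_4": "partial", "paper_5": "yes"}

# Only compare papers both raters actually scored
shared = rater_a.keys() & rater_b.keys()

# Fraction of shared papers on which the two raters gave the same answer
agreement = sum(rater_a[p] == rater_b[p] for p in shared) / len(shared)
print(f"Percentage agreement: {agreement:.0%}")
```

With everyone scoring a different random sample of papers plus a small shared overlap set, pairwise agreement like this can be computed across the whole participant pool.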

Hackathons are great!

In conclusion, hackathons are great and we had a lot of fun with this one. Thanks so much to all the organisers and participants! Maybe you could organise or attend a hackathon soon?!

Hang on, what were the results of the hackathon?

You’ll have to wait for the paper! But if you can’t wait the preprint is available here: https://ecoevorxiv.org/repository/view/12039/
