Avoiding Confusion: Modelling Image Identification Surveys with Classification Errors

Post provided by Jon Barry

We are a group of statisticians, ecologists and a computer scientist. Back in 2021 when this work started, we were all employed at the Centre for Environment, Fisheries and Aquaculture Science (Cefas) at Lowestoft, U.K. Since then, Robert, our computer scientist, has ‘jumped ship’ (no pun intended) to the Alan Turing Institute.

We were aware that AI image recognition was widely used in many areas of science – such as medicine, satellite remote sensing, autonomous vehicles, face recognition and robotics. And other scientists at Cefas had been using image identification for seabed mapping, shoreline change and fish identification. Everyone else was doing it, so why not us?

Given that we were still in the grips of the pandemic, there was plenty of time for thinking. And because I knew that these image recognition techniques weren’t 100% accurate, I was drawn to the following thought experiment.

Our Thought Experiment

Imagine that you are trying to distinguish between three species of plankton. Further, imagine that your AI algorithm reports the following counts: 4 of species A, 3 of species B, and 5 of species C. These counts are reflected in the number of individuals in each row of the diagram below. But, as you can see, there are mistakes. For example, the first row shows that the 4 species “A” are really 3 species A and 1 species B. Similarly, the second row shows that the 3 species “B” are actually 2 species B and 1 species A. And, if we look at all 12 images below, the correct counts are: 5 of species A, 4 of species B and 3 of species C – all different from what AI told us!
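If you'd rather see the tally as data than as a diagram, here are the same twelve images written out as (AI label, true label) pairs. The composition of the five images labelled "C" (one A, one B, three C) is the only split consistent with the totals above. A minimal check in Python:

```python
from collections import Counter

# The twelve images as (AI label, true label) pairs, row by row
images = [
    ("A", "A"), ("A", "A"), ("A", "A"), ("A", "B"),              # 4 labelled "A"
    ("B", "A"), ("B", "B"), ("B", "B"),                          # 3 labelled "B"
    ("C", "A"), ("C", "B"), ("C", "C"), ("C", "C"), ("C", "C"),  # 5 labelled "C"
]

print("AI counts:  ", Counter(ai for ai, true in images))    # A: 4, B: 3, C: 5
print("True counts:", Counter(true for ai, true in images))  # A: 5, B: 4, C: 3
```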

All this thinking leads to the obvious question: “Is there a way that we can remedy the AI mistakes?” The key to the answer is that if we understand how the mistakes are happening then we should be able to untangle things to come up with something closer to the truth. Fortunately, there is a tool for understanding the “how” of mistakes: the confusion matrix.

The Confusion Matrix

The concept of a confusion matrix is central to our work. Our model uses observed and latent (don’t ask) versions of the confusion matrix, but to keep it simple, let me try to explain the observed version.

As an example, let’s go back a few stages, to when the plankton AI recognition algorithm was trained (more details below). In short, this was done using around 57 thousand images where we knew the ‘right answer’. Once training had been done, we used some fresh images (not used in the training) to check out how our algorithm was performing – that is, to calculate the confusion matrix.

The confusion matrix summarises how images of a known category (for us: copepods, detritus and non-copepods) are labelled by the AI classifier; each row corresponds to a true category and each column to an AI label. In the example below, the top row shows that 909 (95.2%) of the 955 true copepods were correctly identified as copepods, while 7 were mistakenly identified as detritus and 39 as non-copepods. In the second row, 71 items of detritus were falsely identified as copepods, 3977 were correctly identified and 74 were incorrectly labelled as non-copepods. Put all three rows together and you get a confusion matrix like the one we used in our paper:

                 Copepods   Detritus   Non-copepods
Copepods              909          7             39
Detritus               71       3977             74
Non-copepods           42         16            547
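If you want to play with these numbers yourself, here is the matrix above in code; dividing each diagonal entry by its row total gives the proportion of each true category that was correctly labelled (only the 95.2% copepod figure is quoted in the text, but the other two rates fall out of the same calculation):

```python
import numpy as np

# The confusion matrix from the table above
# (rows: true category, columns: AI label)
cm = np.array([[909,    7,  39],
               [ 71, 3977,  74],
               [ 42,   16, 547]])

labels = ["Copepods", "Detritus", "Non-copepods"]
recall = cm.diagonal() / cm.sum(axis=1)  # per-class proportion correctly labelled
for name, r in zip(labels, recall):
    print(f"{name:>12}: {100 * r:.1f}% correctly labelled")
# e.g. Copepods: 909 / 955 = 95.2%, the figure quoted above
```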

How Our Work Developed

We began our work by deriving some simple (frequentist) results for the case where the species counts have a Poisson distribution (which might be true if the plankton were randomly distributed in space). However, we soon realised that if we wanted to analyse a far richer array of models, we needed a Bayesian framework. What we finally came up with allows us, for example, to analyse situations in which the true counts have a negative binomial distribution (e.g. where the species are clustered), where there is a mix of Poisson and negative binomial distributions amongst species, or where species counts have a zero-inflated Poisson distribution (with the number of zeros and the non-zero counts modelled in separate parts).
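To give a feel for the simplest (Poisson) case, here is an illustrative simulation rather than the estimator from our paper: hypothetical true counts are generated, pushed through misclassification probabilities taken from the confusion matrix above, and then roughly untangled by solving the linear system that links expected observed counts to true ones. The expected true counts below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Misclassification probabilities p[j, k] = P(labelled k | truly j),
# row-normalised from the confusion matrix above
p = np.array([[909,    7,  39],
              [ 71, 3977,  74],
              [ 42,   16, 547]], dtype=float)
p /= p.sum(axis=1, keepdims=True)

lam = np.array([200.0, 800.0, 150.0])  # hypothetical expected true counts
true = rng.poisson(lam)                # simulated true counts for one sample

# Each true item of category j gets an AI label drawn from row j of p
observed = np.zeros(3)
for j, n in enumerate(true):
    observed += rng.multinomial(n, p[j])

# Plug-in correction: in expectation, observed = corrected @ p,
# so solve the linear system p.T @ corrected = observed
corrected = np.linalg.solve(p.T, observed)

print("true:     ", true)
print("observed: ", observed.astype(int))
print("corrected:", corrected.round(1))
```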

Our Bayesian framework can be nicely summarised by what is called a directed acyclic graph (DAG for short). Below is Figure 1 in our paper. You’ll need to read the paper to understand fully what is going on but, in short, this graph shows the link between what we actually observe (the shaded circles: the observed counts and the confusion matrix) and the rest of the model components. We are mainly interested in the underlying parameters that generate the true counts. These could be expected levels of a species, for example. Our algorithm generates distributions for these parameters based on our model and conditional on the observed counts and confusion matrix.

The basic DAG used in our paper
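For the curious, here is a toy version of that inference step in plain Python: a random-walk Metropolis sampler for the expected counts in the Poisson case, run on the thought-experiment counts. It assumes the misclassification probabilities are known exactly, whereas our full model also treats the confusion matrix as uncertain (that is the latent version mentioned earlier), so this is a sketch rather than the algorithm from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed-known misclassification probabilities theta[j, k] = P(labelled k | truly j)
theta = np.array([[909,    7,  39],
                  [ 71, 3977,  74],
                  [ 42,   16, 547]], dtype=float)
theta /= theta.sum(axis=1, keepdims=True)

y = np.array([4.0, 3.0, 5.0])  # observed AI counts from the thought experiment

def log_post(lam):
    # If true counts are Poisson(lam_j) and items are labelled independently,
    # observed counts are independent Poissons with means mu_k = sum_j lam_j * theta[j, k].
    # Flat prior on lam; additive constants dropped.
    mu = lam @ theta
    return np.sum(y * np.log(mu) - mu)

# Random-walk Metropolis on log(lam); the log(prop / lam) term is the Jacobian
# correction needed because the flat prior is on lam, not on log(lam)
lam = y.copy()               # start at the naive, uncorrected counts
lp = log_post(lam)
samples = []
for it in range(30000):
    prop = lam * np.exp(0.2 * rng.normal(size=3))
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp + np.sum(np.log(prop / lam)):
        lam, lp = prop, lp_prop
    if it >= 5000:           # discard burn-in
        samples.append(lam)

print("posterior mean rates:", np.mean(samples, axis=0).round(2))
```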

More Stuff on Plankton

The AI classifier algorithm for plankton pre-dates most of the work in our paper. In 2021, The Alan Turing Institute hosted a data study group to look at the problem of automated plankton classification. The plankton imager (see picture below) was used to generate a dataset of labelled images consisting of copepods (10,275), non-copepods (6,716) and detritus (40,000). (Note that copepods are a group of zooplankton that deserves particular attention: they are often the dominant taxa in collected samples.) From these images, the AI algorithm used in this paper was developed.

The plankton imager: an automated instrument that takes images of all passing particles in seawater pumped on board a ship
The Cefas RV Endeavour – home of the plankton imager

Where Next?

Our method is widely applicable in many subject areas where classification errors occur. I’ve explained the concepts in terms of an AI classifier. However, the classifier doesn’t have to use AI or even computers. For example, remember the lateral flow test from the Covid pandemic? It classified saliva samples as either Covid positive or negative. And, like AI, it sometimes got things wrong.
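In the binary case, the correction even has a well-known closed form: the classic Rogan–Gladen estimator recovers the true prevalence from the observed positive rate when the test’s sensitivity and specificity are known. The numbers below are made up purely for illustration:

```python
def rogan_gladen(p_obs, sensitivity, specificity):
    """Correct an observed positive rate for known test errors."""
    return (p_obs + specificity - 1) / (sensitivity + specificity - 1)

# Hypothetical test: 80% sensitivity, 99% specificity, 9% of samples flagged positive
print(rogan_gladen(0.09, sensitivity=0.80, specificity=0.99))  # about 0.101
```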

The bigger picture for us at Cefas is that we want to use AI image classification to do environmental monitoring – so we want to get the maths right.

The next task for the statisticians on this paper is to modify our approach so that it can be used for beach litter identification from images provided by drones. There will be many more categories (up to 80) than for plankton – so this will be quite a challenge!

Read the full article here!

Post edited by Sthandiwe Kanyile
