Post provided by Mariona Roigé

The Need for Modelling

Green vegetable bug nymph (Nezara viridula). ©John Marris. Lincoln University.

Despite how far modelling has taken us in science, the use of models remains controversial. Modelling covers a huge range of common practices, from scale models of ships, used to determine the hull shape with the least resistance to water, to complex, comprehensive ‘models of everything’. A great example of the latter is the Earth System Model. This model aims to understand changes in global climate by taking into account the interactions between the physical climate, the biosphere, the atmosphere and the oceans. Basically, it is a model of how the Earth works.

The controversy over modelling centres on how accurately a model describes reality and on the level of confidence we have in its outputs. The first point can be a bit counter-intuitive: sometimes, a very simple model can be a great predictor. In fact, the conventional view in ecology is that simple models are more generalisable than complex models, although this view is being challenged. The level of confidence, or the level of uncertainty, that we have in the outputs of the model is nevertheless a crucial point. We need to be able to accurately determine our levels of uncertainty if we want people to trust our models.

Uncertainty Communication

Ecological modelling has two primary objectives:

  1. To better understand nature
  2. To use this improved understanding to help decision making

One difficulty in applying modelling to decision making is dealing with uncertainty. In modelling, uncertainty comes from several sources, as described in the most important uncertainty taxonomies and reviews. Two strategies can be used when dealing with uncertainty from a modelling point of view:

  1. Reducing uncertainty whenever possible
  2. Showing uncertainty whenever it is irreducible

In our Methods in Ecology and Evolution article ‘Cluster validity and uncertainty assessment for self-organizing map pest profile analysis’, we showed our levels of uncertainty using zeta diversity.

The Example of SOM Modelling Approach

‘Self-organising maps for analysing pest profiles’, or SOM PPA, is an ecological modelling tool that aims to assist pest risk assessment. It helps biosecurity agencies and decision makers to prioritise their efforts to detect hazardous species from a list of hundreds of potential agricultural pests.

Regions in red cells are more similar to the target region than regions located in green or white cells.

A pest profile is the assemblage of insect pest species in a region. Self-organising maps are artificial neural network algorithms that perform unsupervised classification. In SOM PPA, pest profiles for all geopolitical regions of the world are collected and their similarity is analysed. Regions clustered together are assumed to share similar biotic and abiotic conditions that have allowed their respective species assemblages to become established. The output of a SOM PPA is a rectangular map of hexagonally shaped clusters, where each hexagon is called a cell or neuron. The map preserves similarity spatially: two regions located in cells close together on the map are more similar to each other than two regions located in cells far apart. It follows that two regions in the same cell contain very similar assemblages of insect pests.

Once we have the information about which regions are similar, we then infer whether they can act as donors/recipients of pest species to one another. This is very valuable information for a biosecurity agency. Because of this, SOM PPA has a high potential to be useful in real-life applications, but how confident can we be in its resulting clusters? Until now, the SOM PPA modelling tool had no measure of confidence, which was a real problem. In the search for a suitable measure, zeta diversity stood out from a group of many candidates.

Zeta as a Similarity Measure

Zeta (ζ) diversity is a recently developed measure of the number of species that are shared by multiple species assemblages. It was developed as a way to overcome the limitations of traditional pairwise similarity measures of beta diversity. Zeta diversity can be divided into ζ1, the average number of species per assemblage (or alpha diversity); ζ2, the average number of species shared by any two assemblages (beta diversity); ζ3, the average number of species shared by any three assemblages; and so on until the maximum number of assemblages is reached (the number of assemblages indicated by a subscript is referred to as the “order” of zeta).
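
To make the definition concrete, here is a minimal Python sketch of how the zeta orders could be computed for a handful of assemblages. The function name `zeta` and the three pest profiles are made up for illustration; this is not the code used in the article.

```python
from itertools import combinations
from statistics import mean


def zeta(assemblages, order):
    """Mean number of species shared by every combination of `order` assemblages."""
    if not 1 <= order <= len(assemblages):
        raise ValueError("order must be between 1 and the number of assemblages")
    shared_counts = [
        len(set.intersection(*combo))  # species present in all members of the combo
        for combo in combinations(assemblages, order)
    ]
    return mean(shared_counts)


# Hypothetical pest profiles for three regions (species labels are placeholders).
regions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "e"},
]

# One value per order, from zeta 1 up to the number of assemblages.
zeta_by_order = {k: zeta(regions, k) for k in range(1, len(regions) + 1)}
print(zeta_by_order)
```

Enumerating every combination is only practical for small groups of assemblages; with many regions per cluster, sampling combinations would be a more realistic choice.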

Applying Zeta Diversity to the Outputs of SOM PPA

We calculated all the possible zeta diversity orders for each of the hexagonal cells/neurons resulting from our SOM PPA analysis. We favoured zeta diversity over the Jaccard or Sørensen similarity metrics because it can show what happens at higher orders, i.e. how similar three, four, or n assemblages are, whereas Jaccard and Sørensen are pairwise comparison measures. It was precisely this feature of zeta diversity that allowed us to discover different levels of similarity in our hexagonal clusters. We found three clearly different behaviours of the zeta orders:
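
A small, made-up example shows what a pairwise measure can miss. In the Python sketch below, the three assemblages are hypothetical: every pair of regions shares a species, so the Jaccard and Sørensen values are all above zero, yet no species occurs in all three regions.

```python
from itertools import combinations


def jaccard(a, b):
    """Pairwise Jaccard similarity: shared species / species found in either set."""
    return len(a & b) / len(a | b)


def sorensen(a, b):
    """Pairwise Sørensen similarity: 2 * shared species / sum of the two richness values."""
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical assemblages: each pair overlaps, but nothing is common to all three.
regions = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

pairwise_jaccard = [jaccard(x, y) for x, y in combinations(regions, 2)]
pairwise_sorensen = [sorensen(x, y) for x, y in combinations(regions, 2)]
shared_by_all_three = len(set.intersection(*regions))

print(pairwise_jaccard, pairwise_sorensen, shared_by_all_three)
```

The pairwise values alone cannot tell this group apart from one in which the same species is shared by all three regions; the count of species common to all three can.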

  1. Low uncertainty clusters: These were clusters with high ζ2 or, more precisely, high pairwise similarity and high ζ3, ζ4, …, ζmax similarity. That means two random regions picked from that cluster will share a lot of species (ζ2), three regions picked from the cluster will also share a high number of species, etc. So, we can tell that the regions in these clusters would be good candidates to act as donors/recipients for each other’s pest species.
  2. Medium uncertainty clusters: We found clusters that presented high ζ2 but not high ζ3, ζ4, …, ζmax. Pairwise comparisons of regions would look fine, but if three regions were picked and compared, their similarity would be low. We interpreted this as ‘superficial’ similarity. The regions in these clusters weren’t as good candidates to be donors/recipients for each other as the ones from the first group.
  3. High uncertainty clusters: The third group of clusters were those with poor values of pairwise similarity (low ζ2) and, therefore, poor values of higher order similarity (ζ3, ζ4, …, ζmax). These regions could not act as donors/recipients for each other’s pest species.
Diagram of the application of Zeta to the outputs of SOM PPA.
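
The three behaviours above can be sketched as a simple rule in Python. Everything specific here is an assumption made for illustration: the function names, the idea of scaling ζ2 and ζmax by ζ1, and both cutoff values. The article does not define the categories this way.

```python
from itertools import combinations
from statistics import mean


def zeta(assemblages, order):
    """Mean number of species shared by every combination of `order` assemblages."""
    return mean(
        len(set.intersection(*combo)) for combo in combinations(assemblages, order)
    )


def uncertainty_level(assemblages, pairwise_cutoff=0.5, higher_cutoff=0.3):
    """Label a cluster 'low', 'medium' or 'high' uncertainty.

    Both cutoffs are hypothetical placeholders, not values from the article.
    """
    if len(assemblages) < 3:
        raise ValueError("need at least three assemblages to look beyond pairs")
    richness = zeta(assemblages, 1)
    if richness == 0:
        raise ValueError("assemblages contain no species")
    # Express shared species as a fraction of mean richness (an assumed scaling).
    pairwise_ratio = zeta(assemblages, 2) / richness
    highest_ratio = zeta(assemblages, len(assemblages)) / richness
    if pairwise_ratio < pairwise_cutoff:
        return "high"    # low zeta 2: even pairs of regions share little
    if highest_ratio < higher_cutoff:
        return "medium"  # pairs look similar, larger groups do not
    return "low"         # similarity holds up to the highest order


# Hypothetical clusters, one per behaviour.
tight = [{"a", "b", "c", "d"}, {"a", "b", "c", "d"}, {"a", "b", "c", "e"}]
superficial = [{"a", "b", "c", "d"}, {"c", "d", "e", "f"}, {"e", "f", "a", "b"}]
loose = [{"a", "b"}, {"c", "d"}, {"e", "f"}]

for name, cluster in [("tight", tight), ("superficial", superficial), ("loose", loose)]:
    print(name, uncertainty_level(cluster))
```

Checking only the highest order is enough in this sketch because zeta cannot increase as the order grows, so a high ζmax implies high values at every intermediate order.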


Using Zeta Diversity to Communicate Modelling Results

Uncertainty is an inherent feature of research, and decision makers need confidence intervals, levels of uncertainty, categories of risk, percentages… anything to go along with the measures our research provides. In ‘Cluster validity and uncertainty assessment for self-organizing map pest profile analysis’, we took advantage of the unique multi-assemblage comparison feature of the zeta metric, which helped us discover different levels of performance and confidence in our modelling results. For example, if the region we’re running the SOM PPA model for is located in one of the clusters with high uncertainty, we would advise decision makers to remain cautious about using those results to make strong inferences about which species could become invasive in the target region. However, if the target region is located in a low uncertainty cell, we can confidently say that all the regions located in that cell share a high number of species. These regions probably share a set of biotic and abiotic conditions that makes species transfer between them highly plausible, so we can make inferences about the possibility of invasive species entering the target region.

Ultimately, when modelling, we are always subject to a high degree of uncertainty arising from multiple sources. Uncertainty in environmental modelling arises from the data we use, from the theoretical assumptions on which we base our models, from the statistical tools we use to run our models, and from the modellers and decision makers themselves.

So, in a few words, we’re doomed. However, we must be realistic. Efforts to identify and reduce all these sources of uncertainty are very important, but unlikely to be flawless. In the meantime, we still need to be able to make decisions, run our models and present our results. Finding the right tools for doing so is not always easy, and that’s why we are so glad that zeta diversity has worked so well for SOM PPA, and that it can be extended to other cluster-based similarity assessments.

To find out more about the SOM PPA and zeta diversity, read our Methods in Ecology and Evolution article ‘Cluster validity and uncertainty assessment for self-organizing map pest profile analysis’.