Post provided by Sylvain Christin

We have now entered the era of artificial intelligence. In just a few years, the number of applications using AI has grown tremendously, from self-driving cars to recommendations from your favourite streaming provider. Almost every major research field is now using AI. Behind all this, there is one constant: the reliance, in one way or another, on deep learning. Thanks to its power and flexibility, this subset of AI approaches is now everywhere, even in ecology, as we show in ‘Applications for deep learning in ecology’.

But what is deep learning exactly? What makes it so special?

Deep Learning: The Basics

Deep learning is a set of methods based on representation learning: a way for machines to automatically learn how to classify data from raw examples. This means they can detect features in data by themselves, without any prior knowledge of the system. While some models can learn without any supervision (i.e. they can learn to detect and classify objects without knowing anything about them), so far these models are outperformed by supervised models. Supervised models require labelled data to train. So, if we want a model to detect cars in pictures, it will need examples with cars in them to learn to recognise them.

At a basic level, deep learning models are multilayer neural networks. Neural networks are an assembly of processing units called ‘neurons’. They’re connected in a way that imitates how brain cells work. The concept of multilayer neural networks isn’t new. It can be traced back to the 1970s and 1980s – the earliest days of automatic pattern recognition and of ideas about how to train models from what they discovered. The idea was largely abandoned during the 1990s though, as people thought it was unfeasible. Interest was revived in 2006, and deep learning achieved mainstream popularity in 2012, thanks to the performance of a model based on a convolutional neural network in image classification tasks.

This isn’t a recent idea, so why has interest in deep learning only exploded recently? Three main factors can explain this phenomenon:

  • Data: The performance of deep learning models largely depends on the quantity of data they’re fed. We’ve only recently been able to provide enough data to the models to achieve sufficient accuracy.
  • Computer power: Neural networks require a lot of computing power. The recent breakthrough in deep learning was facilitated by improvements in graphics cards and their processors (GPUs), which allow many simple computations to be done simultaneously. By leveraging the power of GPUs, computers became powerful enough to train neural networks efficiently.
  • Performance: Thanks to the two previous factors, deep learning models were able to reach accuracies in classification tasks equal to, or even better than, those of humans. By crossing this threshold, they became increasingly popular.

How Does Deep Learning Work?

Due to their multilayered nature, deep learning models are extremely flexible. Any two models can differ in the number of layers and the composition of each layer. The most common approach is a feed-forward network: data is given to an input layer and is then transferred to one or several processing layers – called hidden layers – up to the output layer.

Each layer is composed of interconnected neurons. A neuron may or may not be connected to all the neurons of the adjacent layers. Each connection between two neurons is associated with a ‘weight’. When data is fed to the model, it goes through each layer and is transformed into increasingly abstract representations. Each layer uses the information from the previous layer to learn to detect specific features. These could be the presence of shapes like circles or squares, or the edges of objects in an image. The model then associates features with a specific object by adjusting the weights of the neural connections associated with these features.

For example, if the model tries to detect a ball in an image, the weights of neurons associated with circles will be favoured, as balls are round. If it was a box, the neurons detecting squares would have a greater weight. This adjustment is made using a method called ‘backpropagation’. It works by calculating the error between the output of the model and the expected output. Weights are then modified to minimise this error, starting from the output layer and going backwards to the input layer. By providing enough examples to the model, it can adjust the weights until it can detect the desired objects accurately. These examples don’t need to tell the model where an object is. They just need to tell it that an object is there, and it will learn to detect it by itself, just like a human would learn to recognise an object after being shown it multiple times.
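To make this more concrete, here is a minimal sketch of a small feed-forward network trained with backpropagation, written in PyTorch (one deep learning framework among several). The layer sizes, random data and learning rate below are illustrative placeholders, not values from the article.

```python
import torch
import torch.nn as nn

# A small feed-forward network: input layer -> two hidden layers -> output layer
model = nn.Sequential(
    nn.Linear(64, 32),   # input layer to first hidden layer (64 input features, arbitrary)
    nn.ReLU(),           # non-linearity
    nn.Linear(32, 16),   # second hidden layer
    nn.ReLU(),
    nn.Linear(16, 2),    # output layer: scores for 2 classes (e.g. "ball" vs "box")
)

# Placeholder training data: 100 random examples with random labels
inputs = torch.randn(100, 64)
labels = torch.randint(0, 2, (100,))

loss_fn = nn.CrossEntropyLoss()                             # error between output and expected output
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    optimizer.zero_grad()            # reset gradients from the previous step
    outputs = model(inputs)          # forward pass through every layer
    loss = loss_fn(outputs, labels)  # how wrong the model currently is
    loss.backward()                  # backpropagation: gradients computed from output layer back to input
    optimizer.step()                 # adjust the weights to reduce the error
```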

An Example: The Convolutional Neural Network

The convolutional neural network (CNN) is by far the most popular deep learning architecture. It’s a flexible architecture that works well with images and, as such, many different variants exist. They’re all based on the same building blocks though.

First are the convolution blocks. They contain at least the following layers:

  • Convolutional layers: the goal of these layers is to extract features from the input by applying a filter, producing a feature map. In the case of images, the input is a matrix of pixels to which a filter matrix is applied. The filter can vary between layers in order to extract different features. Once the feature map has been calculated, a non-linearity is applied to it (as real-world data is not expected to be linear).
  • Pooling layers: these layers immediately follow the convolutional layers. Their goal is to reduce the dimensionality of the data while still retaining the important information.
Schematic view of a CNN giving the probability (in parentheses) that the input belongs to one of the trained categories.

Each CNN can have one or several of these convolution blocks in succession. The output of these blocks represents high-level features of the input. These features then go through one or several fully connected layers – layers where each neuron is connected to all the neurons of the previous layer – where they are combined to match a specific object.
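As an illustration, here is a minimal sketch of this kind of architecture in PyTorch. The number of filters, layer sizes, image dimensions and number of categories are arbitrary choices for the example, not prescriptions.

```python
import torch
import torch.nn as nn

# A tiny CNN: two convolution blocks followed by fully connected layers
cnn = nn.Sequential(
    # Convolution block 1: convolution -> non-linearity -> pooling
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3 input channels (RGB), 16 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halve height and width, keeping the important information

    # Convolution block 2
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),

    # Fully connected layers: combine the high-level features to match a specific object
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64),                  # assumes 64x64 input images
    nn.ReLU(),
    nn.Linear(64, 5),                             # scores for 5 hypothetical categories
)

# One fake RGB image of size 64x64 pixels
image = torch.randn(1, 3, 64, 64)
scores = cnn(image)
probabilities = scores.softmax(dim=1)  # probability that the image belongs to each category
```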

The architecture of a CNN can vary in many ways: the number of convolution blocks, the type of filter used in the convolution, the non-linearity function used, the total number of layers, etc. Popular implementations include AlexNet, VGG, ResNet or Inception. Most popular deep learning frameworks include an out-of-the-box implementation of these networks.
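For example, the torchvision library lets you load one of these networks in a couple of lines. Treat the sketch below as indicative only: the exact argument for requesting pretrained weights differs between torchvision versions.

```python
import torch
from torchvision import models

# Load a ResNet architecture with weights pretrained on ImageNet
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.eval()  # evaluation mode: we only want predictions, not training

# Classify a fake image (a real use would load and normalise an actual photograph)
image = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    scores = resnet(image)
print(scores.argmax(dim=1))  # index of the most likely ImageNet category
```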

Going Further with Deep Learning

CNNs and feed-forward networks are the most popular approaches for classification and detection in images. They aren’t always the most appropriate way to handle data though, and other approaches exist. For instance, for sequential data (such as speech or time series), recurrent neural networks (RNNs) are often more appropriate. RNNs usually have only one hidden layer, but they process the elements of a sequence one at a time and keep a memory of previous elements, with the output of each step included in the input of the next one. Unrolled over all of its steps, an RNN can be seen as one very deep feed-forward network. A popular implementation is the Long Short-Term Memory network (LSTM), an architecture capable of learning long-term dependencies that has proven especially efficient for tasks such as speech recognition.
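As a minimal sketch, an LSTM layer can be applied to a toy time series in PyTorch as follows. The sequence length, number of features, hidden size and number of categories are arbitrary examples.

```python
import torch
import torch.nn as nn

# An LSTM layer followed by a linear layer that classifies the whole sequence
lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
classifier = nn.Linear(32, 2)  # two hypothetical categories of time series

# One toy time series: 50 time steps, 1 measurement per step
series = torch.randn(1, 50, 1)

outputs, (hidden, cell) = lstm(series)  # the network keeps a memory while reading the sequence step by step
scores = classifier(hidden[-1])         # use the final memory state to classify the whole series
```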

As well as playing with the number and composition of layers in a network, it’s also possible to combine several networks together. For instance, generative adversarial networks (GANs) let you generate data by pitting two networks (often CNNs) against one another: one generates data and the second evaluates whether that data seems real or not. The goal is to trick the second network by generating high-quality data. CNNs and RNNs have also been used together to generate text describing the content of an image.
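To give a flavour of the adversarial setup, here is a heavily simplified sketch of a single GAN training step in PyTorch. Tiny fully connected networks and random stand-in data are used purely for brevity; real GANs for images typically use CNNs and many training iterations.

```python
import torch
import torch.nn as nn

# Generator: turns random noise into fake data; discriminator: judges real vs fake
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
discriminator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=0.001)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.001)

real_data = torch.randn(64, 16)   # stand-in for a batch of real examples
noise = torch.randn(64, 8)
fake_data = generator(noise)

# Train the discriminator: label real data 1 and generated data 0
opt_d.zero_grad()
loss_d = loss_fn(discriminator(real_data), torch.ones(64, 1)) + \
         loss_fn(discriminator(fake_data.detach()), torch.zeros(64, 1))
loss_d.backward()
opt_d.step()

# Train the generator: try to make the discriminator believe its data is real
opt_g.zero_grad()
loss_g = loss_fn(discriminator(fake_data), torch.ones(64, 1))
loss_g.backward()
opt_g.step()
```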

It’s this flexibility, combined with good performance, that explains why deep learning has become so popular so fast. The possibilities are endless, and while it might not solve everything, it certainly opens up a lot of prospects.

To find out more about deep learning, read our Methods in Ecology and Evolution article ‘Applications for deep learning in ecology’.

To dig deeper and get some quick pointers on deep learning, we recommend the following: