I’m sure by now you’ve heard of MAXENT. Have you got the impression that it’s some revolutionary new method that sits apart from classical methods like GLM? If so, I have some big news for you.

First a little background – maximum entropy modelling (MAXENT) had its origins in the 1950s, and went quiet for some time before a resurgence in the machine learning literature, which has since spilled over into ecology in a major way. Steven Phillips et al (2006) proposed a MAXENT approach to species distribution modelling using presence-only data in a spectacularly successful paper (cited 600 times in 2012), and Bill Shipley et al (2006) proposed a MAXENT method for using traits to understand community structure in the journal Science. Ever since, there has been a flurry of work to understand MAXENT and its properties. Recent Methods papers on the topic include Charles Yackulic et al (in press) on assumption-checking (or lack thereof), and a critical review by Andy Royle et al (2012).

Pros and cons – One reason for the appeal of MAXENT is that it tends to perform well in comparisons of predictive performance; some also see the MAXENT “minimum assumption” philosophy as attractive. On the other hand, some critics argue the method is a “black box” (a property that always makes control-freaks like me a bit nervous) and point to a lack of clarity about the meaning of the quantity that is actually being modelled. Also, Phillips et al (2006) were most forthcoming in acknowledging that formal inferential and model checking tools are not available, at least not in their software, in part because the methodology is not as well understood as classical competitors such as generalised linear models.

Now for the big news – MAXENT, as implemented in presence-only analyses, has recently been shown to be exactly mathematically equivalent to a GLM, more specifically, to a Poisson regression (also known as “log-linear modelling”). Poisson regression is a standard method from statistics that is well understood and implemented in most statistical packages, e.g. R users can type glm(formula, family="poisson"). This equivalence result is big news because a modern icon of ecological modelling has now been revealed to be equivalent to an older method that went out of fashion some time ago… it’s a bit like Beyonce taking off a face mask at the Super Bowl to reveal that all along it’s actually been… Janet Jackson!

MAXENT: big since the 00s, like Beyonce

GLMs: made it big in the 80s but less in the limelight recently, like Janet

Key papers – Ian Renner and I stumbled upon this MAXENT=GLM equivalence result, and discussed some of its implications in a paper in press at Biometrics. We also made a connection with point process models (PPMs), which can often be implemented as GLMs. PPMs are pretty much THE appropriate statistical framework for modelling data that arise as a set of point locations, but their application to presence-only data was only recently appreciated. As well as a paper by myself and Leah Shepherd (2010), see Avishek Chakraborty, Alan Gelfand et al (2011) and Geert Aarts et al (2012), whose Methods paper also connected to the literature on resource selection functions. Trevor Hastie and Will Fithian also noticed the MAXENT=GLM equivalence result – their pre-print on arXiv contains a number of interesting ideas.
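
If you would like to see the equivalence in action, here is a minimal sketch on a toy example (an illustration with made-up numbers, not code from any of the papers above). It assumes the study region has been cut into grid cells with a single covariate x, that y counts the presence points in each cell, and that no LASSO penalty is applied. The MAXENT-style fit chooses the slope b of a distribution over cells proportional to exp(b*x); the Poisson regression fits log(mu) = a + b*x, which is what glm(y ~ x, family="poisson") would fit in R. Both are fitted by hand with Newton's method in plain Python so that nothing is hidden inside a package. The function names fit_maxent and fit_poisson are hypothetical.

```python
import math


def fit_maxent(x, y, iters=50):
    """Slope b maximising the MAXENT-style (multinomial) log-likelihood,
    where cell probabilities are p_j = exp(b*x_j) / sum_k exp(b*x_k)."""
    n = sum(y)                                   # total number of presence points
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # observed feature total
    b = 0.0
    for _ in range(iters):
        w = [math.exp(b * xi) for xi in x]
        z = sum(w)
        mean = sum(wi * xi for wi, xi in zip(w, x)) / z
        var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / z
        # Newton step: gradient is sxy - n*mean, curvature is n*var
        b += (sxy - n * mean) / (n * var)
    return b


def fit_poisson(x, y, iters=50):
    """Intercept a and slope b of the Poisson regression log(mu_j) = a + b*x_j."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix (2 x 2)
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Made-up toy data: covariate value and presence count in each grid cell
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
y = [0, 1, 1, 3, 4, 7]

b_maxent = fit_maxent(x, y)
a_pois, b_pois = fit_poisson(x, y)
print("MAXENT slope: ", b_maxent)
print("Poisson slope:", b_pois)
```

The two slopes should agree to numerical precision. The Poisson intercept simply absorbs the normalising constant, so that the fitted means add up to the number of presence points. This gridded toy case only illustrates the idea; the full result in the papers above is stated for point process models.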

Importance – This MAXENT=GLM equivalence result immediately changed our perspective on what MAXENT is and does – for example, in one fell swoop it dismisses all the advantages and disadvantages of MAXENT stated in this blog post (see Pros and cons). One cannot argue that MAXENT has better predictive performance than GLM when it is equivalent – any differences, e.g. those seen in the famous Elith et al paper, must be due to differences in how the methods are applied rather than due to any differences in the analysis methods per se. (The main difference in how they are applied, I think, is that MAXENT by default uses a LASSO penalty, rarely used with GLM, but seemingly highly effective.) Also, it is hard to argue philosophically that MAXENT is better because it makes minimal assumptions – it is mathematically equivalent to a method that makes all the assumptions we were trying to avoid in the first place! On the other hand, practitioners with an understanding of GLM can no longer argue that MAXENT is a black box, and the “MAXENT lacks clarity” criticism can be addressed by re-expressing it as a point process model. Finally, there is no longer a need to be concerned about any lack of inferential or model-checking tools – GLM and PPM have these in spades. While Yackulic et al (in press) bemoaned the lack of MAXENT assumption checking in the literature, practitioners no longer have much by way of excuses given the suite of model checking tools available for GLMs and PPMs. An important outcome of our model checks to date is that presence-only data often violate the all-important independence assumption, meaning that we have often needed to use alternatives to the MAXENT model.

It is interesting to think about what will happen now that we know the two methods are one and the same – will MAXENT go out of fashion or will GLM be everyone’s best friend again? Personally, as an “old-school” statistician, I’m hoping for a return to thinking more carefully about data properties and how to specify and check models that reflect the particular properties of any given dataset. If that sounds boring, would it help if we threw in a LASSO penalty?

PS: not sure if you’ve heard of the two-day Eco-Stats symposium in Sydney, Australia, July 11-12 2013? This will include a special session on maximum entropy modelling, featuring Trevor Hastie and Bill Shipley, as well as a presence-only modelling session, featuring Jane Elith and Adrian Baddeley. Beyonce is yet to confirm. So if this blog post is up your alley then please come along!

By David Warton
Associate Editor, Methods in Ecology and Evolution