Stuck between Zero and One: Modelling Non-Count Proportions with Beta and Dirichlet Regression

Post provided by JAMES WEEDON & BOB DOUMA

Chinese translation provided by Zishen Wang

This blog post is also available in Chinese

Proportion of leaf damage is a type of measurement that can lead to proportional data.

Imagine the scene: you’re presenting your exciting research results at an important international conference. Being conscientious and aware of statistical best practice, you’ve included test statistics and confidence intervals on all your result figures. Not just P values! Some of the data you are presenting involves the proportion of leaf surface damaged by an insect herbivore under different treatments. You finish your presentation (on time!) and there’s time for questions. From the audience a polite but insistent colleague asks: “Your confidence interval for that estimate goes from -0.3 to 0.5… how should we interpret a negative proportion of a leaf?”

Someone chuckles. As you nervously flick back to the slide in question, you mutter something about the difference between confidence intervals and point estimates. You start to feel dizzy. A murmur of confused voices slowly builds amongst the audience members. In the distance, a dog barks.

How can you avoid this?

Proportional Data in Ecology and Evolution

Many kinds of quantities that ecologists and evolutionary biologists routinely measure are most conveniently expressed as proportions. In many cases these proportions are derived from counts. The data are based on discrete entities that can be assigned to two or more classes: success or failure, male or female, invasive or non-invasive. In other cases the proportions are derived from continuous measurements: the proportion of time an animal spends on different activities; percent cover of a plant functional type in a vegetation survey quadrat; allocation of total plant biomass to different organs and tissues. What these data types have in common is that they can only take values between zero and one. Negative values, or values greater than one, don’t make any sense.
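To make the boundary problem concrete, here is a minimal Python sketch. The leaf-damage values are made up purely for illustration, and the helper names (`mean_ci`, `logit`, `inv_logit`) are our own. It compares a textbook t-interval computed directly on the proportions with one computed on the logit scale and transformed back. This is not beta regression, just a quick way to see why an interval that ignores the zero-to-one bounds can wander into impossible territory.

```python
import math
import statistics

# Made-up proportions of leaf area damaged (illustrative only, not real data).
damage = [0.01, 0.02, 0.02, 0.03, 0.05, 0.40]

# Two-sided 95% critical value of Student's t with n - 1 = 5 degrees of freedom.
T_CRIT_DF5 = 2.571


def mean_ci(values, t_crit):
    """Return (lower, upper) for mean +/- t_crit * standard error."""
    centre = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return centre - t_crit * se, centre + t_crit * se


def logit(p):
    """Map a proportion strictly between 0 and 1 onto the whole real line."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map any real number back into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-x))


# Interval computed directly on the proportion scale.
naive_low, naive_high = mean_ci(damage, T_CRIT_DF5)

# Interval computed on the logit scale, then mapped back into (0, 1).
logit_low, logit_high = mean_ci([logit(p) for p in damage], T_CRIT_DF5)
back_low, back_high = inv_logit(logit_low), inv_logit(logit_high)

print(f"raw scale:   ({naive_low:.3f}, {naive_high:.3f})")
print(f"logit scale: ({back_low:.3f}, {back_high:.3f})")
```

With these numbers the raw-scale interval dips below zero, while the back-transformed one cannot leave the zero-to-one range. Two caveats: the logit is undefined for proportions of exactly zero or one, and the back-transformed interval describes the centre of the data on the logit scale rather than the arithmetic mean proportion.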