Stop, think, and beware of default options

Post provided by Paula Pappalardo (with contributions from Elizabeth Hamman, Jim Bence, Bruce Hungate & Craig Osenberg)

This post is also available in Spanish.

You spent months carefully collecting data from articles addressing your favorite scientific question, you have dozens of articles neatly arranged in a spreadsheet, you found software or code to analyze the data, and now you daydream about how your publication will become the most cited in your field while you make cool plots. If that sounds familiar, you have probably done a meta-analysis. Meta-analysis uses statistical models to combine data from different publications to answer a specific question.

What you may not have realized when going down the meta-analysis rabbit hole is that small, seemingly inconsequential choices can greatly affect your results. If you want to know about one of them lurking behind the scenes… read on!

Uncertainty intervals used in ecological meta-analysis

Thanks to Cookie the Pom for sharing their work on Unsplash.

Most of us have encountered confidence intervals based on the normal distribution (also known as a z-distribution). They are widely used, often highlighted in textbooks, and tend to be the default confidence interval for calculating an overall effect size in meta-analytic software. Ideally, a confidence interval will match its nominal coverage, where coverage is the probability that the confidence interval contains the actual effect size. In other words, a 95% confidence interval should include (or “cover”) the true effect size 95% of the time. The problem is that the normal distribution relies on a large-sample approximation, and ecologists doing meta-analysis may not have enough studies to justify it. If the number of studies is low and a normal approximation is used, the confidence intervals may be too narrow, leading to low coverage and giving a false impression that your results are more precise than they actually are.
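To see this in action, here is a minimal simulation sketch in base R (all parameter values are assumptions chosen for illustration, not values from our paper). It repeatedly simulates a small random-effects meta-analysis, builds the usual z-based 95% confidence interval around the pooled mean, and counts how often that interval covers the true effect:

    set.seed(42)
    k     <- 10       # number of studies (hypothetical value)
    mu    <- 0.5      # true mean effect size (hypothetical value)
    tau2  <- 0.2      # among-study variance (high heterogeneity, assumed)
    n_sim <- 5000     # number of simulated meta-analyses
    covered <- logical(n_sim)

    for (i in seq_len(n_sim)) {
      vi <- runif(k, 0.1, 0.4)              # within-study sampling variances
      yi <- rnorm(k, mu, sqrt(tau2 + vi))   # observed effect sizes
      # DerSimonian-Laird estimate of the among-study variance
      wi <- 1 / vi
      yw <- sum(wi * yi) / sum(wi)
      Q  <- sum(wi * (yi - yw)^2)
      cc <- sum(wi) - sum(wi^2) / sum(wi)
      t2 <- max(0, (Q - (k - 1)) / cc)
      # inverse-variance pooled mean and its z-based 95% CI
      wr    <- 1 / (vi + t2)
      muhat <- sum(wr * yi) / sum(wr)
      se    <- sqrt(1 / sum(wr))
      ci    <- muhat + c(-1, 1) * qnorm(0.975) * se
      covered[i] <- ci[1] <= mu && mu <= ci[2]
    }
    mean(covered)   # typically noticeably below 0.95 with only 10 studies

With only ten studies, the observed coverage falls short of the nominal 95%, which is exactly the over-precision problem described above.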

Fortunately, there are other ways to construct confidence intervals that are less prone to this over-precision. What are they? In our recent Methods in Ecology and Evolution paper (written by myself, Kiona Ogle, Elizabeth Hamman, Jim Bence, Bruce Hungate & Craig Osenberg), we investigated how well traditional frequentist and Bayesian methods estimate a mean effect size and its uncertainty interval. In the medical literature, meta-analyses with few studies are common, for example in research on rare diseases. To achieve good coverage when the number of studies is small, methodological papers in the medical field suggest using the Hartung-Knapp-Sidik-Jonkman (HKSJ) method, which combines a refined estimate of the among-study variance with a t-distribution. In ecology, bootstrap confidence intervals have been recommended because they are robust to departures from normality. And when a meta-analysis is carried out in a Bayesian framework, we can obtain the equivalent of a confidence interval: the credible interval. Many people argue that credible intervals offer a more straightforward interpretation, but they depend on a prior distribution that needs to be defined. Even though Bayesian methods have a steep learning curve that can be intimidating for beginners, their use is increasing among ecologists.
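As an illustration of that last option, here is a minimal Bayesian sketch using the brms R package. The dataset, column names, and priors below are assumptions made for illustration; they are not from our paper:

    # simulate a small, hypothetical dataset: one row per study
    set.seed(7)
    dat <- data.frame(
      study = factor(1:15),
      yi    = rnorm(15, 0.4, 0.5),   # hypothetical effect sizes
      sei   = runif(15, 0.2, 0.5)    # hypothetical standard errors
    )

    library(brms)
    fit <- brm(
      yi | se(sei) ~ 1 + (1 | study),   # random-effects meta-analysis
      data  = dat,
      prior = c(prior(normal(0, 1), class = Intercept),
                prior(cauchy(0, 0.5), class = sd)),  # weakly informative priors
      iter = 4000, chains = 4
    )
    summary(fit)  # the 95% credible interval for the Intercept plays the
                  # role of the confidence interval around the mean effect

Note that the priors (here weakly informative ones) are a modeling choice you must make and report, just like the choice of confidence interval in a frequentist analysis.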

In “Comparing traditional and Bayesian approaches to ecological meta-analysis”, we found that most authors reported confidence intervals based on the bootstrap or the normal distribution. Surprisingly, 39% of the meta-analyses we reviewed did not report the type of confidence interval used to estimate the uncertainty around their effect sizes.

Does it matter (and what should we do about it)?

Simulated meta-analyses are useful for exploring which type of confidence (or credible) interval best measures uncertainty around a mean effect size. When running simulations, it is important to simulate meta-analytic datasets that reflect the data of interest, in this case, data encountered by ecologists. Our review of ecological meta-analyses showed that most included studies with a low number of replicates (median = 5) and combined a modest number of studies to estimate a mean effect size (median = 44). A previous review of meta-analyses in ecology and evolution had shown high heterogeneity, the last piece of the puzzle needed to simulate data representative of ecological meta-analyses. We based our simulations on these conditions.
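As a hypothetical sketch of what such a simulated dataset can look like (the parameter values below are assumptions for illustration, not the exact settings of our paper), here is one ecological-style meta-analytic dataset with few replicates per study and high among-study heterogeneity:

    set.seed(1)
    k    <- 20                                    # number of studies
    n    <- 5                                     # replicates per group (median from our review)
    tau2 <- 0.3                                   # among-study variance (assumed, high)
    theta <- rnorm(k, mean = 0.4, sd = sqrt(tau2))  # true study-level effects

    sim_study <- function(th) {
      ctrl <- rnorm(n, mean = 10, sd = 2)            # control replicates
      trt  <- rnorm(n, mean = 10 * exp(th), sd = 2)  # treatment replicates
      yi <- log(mean(trt) / mean(ctrl))              # log response ratio
      vi <- var(trt) / (n * mean(trt)^2) +           # its sampling variance
            var(ctrl) / (n * mean(ctrl)^2)
      c(yi = yi, vi = vi)
    }

    dat <- as.data.frame(t(sapply(theta, sim_study)))
    head(dat)   # effect sizes and variances, ready for rma() or a Bayesian model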

The number of studies included in the meta-analysis was the most important factor affecting the performance of the uncertainty intervals. To simplify, we refer to performance as “good” when the coverage of the confidence intervals was indistinguishable from nominal coverage (i.e., the 95% confidence intervals in our simulations included the true effect size close to 95% of the time), and as “bad” when the coverage deviated from 0.95.

As we expected, the confidence interval based on the normal distribution (the default setting in commonly used meta-analysis software) did not perform well when the number of studies was lower than 40. We also found that the bootstrap confidence interval did not perform well. Only the Bayesian credible interval and the HKSJ interval provided good coverage. Yet these methods were used in fewer than 3% of the published meta-analyses we reviewed.

Our advice for ecologists

  • Check software defaults to know what confidence interval you are using in your meta-analysis, and more importantly, report your choice.
  • If you are planning to run a random-effects meta-analysis using frequentist methods, we recommend the HKSJ confidence interval. In the R package “metafor”, currently one of the most popular for conducting meta-analyses, this can be set up by adding the argument knha = TRUE (or, equivalently in recent versions, test = "knha"); see the sketch after this list.
  • If you have a low number of studies for a particular meta-analysis, be aware that using the normal distribution or a bootstrap confidence interval can give you low coverage and yield false positives.
  • If you feel ready for a rewarding challenge, use a Bayesian approach, which also performs well (see the Bayesian sketch earlier in this post).
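Here is a minimal metafor sketch of the HKSJ recommendation above, using the package's built-in BCG vaccine dataset (a medical example shipped with metafor, used here only because it is self-contained):

    library(metafor)

    # compute log risk ratios from the package's built-in BCG dataset
    dat <- escalc(measure = "RR", ai = tpos, bi = tneg,
                  ci = cpos, di = cneg, data = dat.bcg)

    res_z    <- rma(yi, vi, data = dat)                 # default z-based CI
    res_hksj <- rma(yi, vi, data = dat, test = "knha")  # HKSJ interval
                                                        # (older versions: knha = TRUE)

    # with only 13 studies, the HKSJ interval is noticeably wider
    rbind(normal = c(res_z$ci.lb, res_z$ci.ub),
          hksj   = c(res_hksj$ci.lb, res_hksj$ci.ub))

The wider HKSJ interval is not a flaw: it is the honest price of acknowledging the extra uncertainty that comes with estimating among-study variance from few studies.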

To find out more about which uncertainty interval to use in ecological meta-analysis, read the Methods in Ecology and Evolution article, ‘Comparing traditional and Bayesian approaches to ecological meta-analysis’.
