Evaluating the predictive performance of presence-absence models: Why can the same model appear excellent or poor?
Abrego, Nerea; Ovaskainen, Otso.
Affiliation
  • Abrego N; Department of Biological and Environmental Science University of Jyväskylä Jyväskylä Finland.
  • Ovaskainen O; Department of Agricultural Sciences University of Helsinki Helsinki Finland.
Ecol Evol; 13(12): e10784, 2023 Dec.
Article in En | MEDLINE | ID: mdl-38111919
ABSTRACT
When comparing multiple models of species distribution, models yielding higher predictive performance are clearly to be favored. A more difficult question is how to decide whether even the best model is "good enough". Here, we clarify key choices and metrics related to evaluating the predictive performance of presence-absence models. We use a hierarchical case study to evaluate how four metrics of predictive performance (AUC, Tjur's R², max-Kappa, and max-TSS) relate to each other, the random and fixed effects parts of the model, the spatial scale at which predictive performance is measured, and the cross-validation strategy chosen. We demonstrate that the very same metric can achieve different values for the very same model, even when similar cross-validation strategies are followed, depending on the spatial scale at which predictive performance is measured. Among metrics, Tjur's R² and max-Kappa generally increase with species' prevalence, whereas AUC and max-TSS are largely independent of prevalence. Thus, Tjur's R² and max-Kappa often reach lower values when measured at the smallest scales considered in the study, while AUC and max-TSS reach similar values across the different spatial levels included in the study. Nonetheless, the different metrics provide complementary insights on predictive performance. The very same model may appear excellent or poor not only due to the applied metric, but also due to how exactly predictive performance is calculated, calling for great caution in the interpretation of predictive performance. The most comprehensive evaluation is obtained by combining measures that provide complementary insights. Instead of following simple rules of thumb or focusing on absolute values, we recommend comparing the achieved predictive performance to the researcher's own a priori expectations on how easy it is to make predictions related to the same question that the model is used for.
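The four metrics named above can all be computed from observed presence-absence data and predicted occurrence probabilities. As a minimal sketch (not taken from the article; the function name, toy data, and threshold-scanning scheme are illustrative assumptions), the following shows one standard way to compute AUC, Tjur's R², max-Kappa, and max-TSS:

```python
def presence_absence_metrics(y, p):
    """y: 0/1 observations; p: predicted occurrence probabilities.

    Returns (AUC, Tjur's R^2, max-Kappa, max-TSS) for one species.
    Illustrative sketch only, not the article's implementation.
    """
    pres = [pi for yi, pi in zip(y, p) if yi == 1]
    abse = [pi for yi, pi in zip(y, p) if yi == 0]

    # Tjur's R^2: mean predicted probability at presences
    # minus mean predicted probability at absences.
    tjur = sum(pres) / len(pres) - sum(abse) / len(abse)

    # AUC: probability that a random presence is ranked above
    # a random absence (ties count one half).
    wins = sum((a < b) + 0.5 * (a == b) for b in pres for a in abse)
    auc = wins / (len(pres) * len(abse))

    # max-Kappa and max-TSS: scan classification thresholds and
    # keep the best value of each metric.
    max_kappa = max_tss = -1.0
    n = len(y)
    for t in sorted(set(p)):
        tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= t)
        fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= t)
        fn = len(pres) - tp
        tn = len(abse) - fp
        sens = tp / (tp + fn)          # sensitivity (true positive rate)
        spec = tn / (tn + fp)          # specificity (true negative rate)
        max_tss = max(max_tss, sens + spec - 1)
        po = (tp + tn) / n             # observed agreement
        pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
        if pe < 1:                     # chance-expected agreement
            max_kappa = max(max_kappa, (po - pe) / (1 - pe))
    return auc, tjur, max_kappa, max_tss
```

For example, with observations [1, 1, 1, 0, 0, 0] and predictions [0.9, 0.8, 0.4, 0.6, 0.2, 0.1], the sketch yields AUC ≈ 0.89, Tjur's R² = 0.40, and max-Kappa = max-TSS ≈ 0.67, illustrating the abstract's point that different metrics can assign quite different values to the same predictions.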
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Ecol Evol Year of publication: 2023 Document type: Article Country of publication: United Kingdom