ABSTRACT
Understanding the prevalence of infections in the population of interest is critical for making data-driven public health responses to infectious disease outbreaks. Accurate prevalence estimates, however, can be difficult to calculate due to a combination of low population prevalence, imperfect diagnostic tests, and limited testing resources. In addition, strategies based on convenience samples that target only symptomatic or high-risk individuals will yield biased estimates of the population prevalence. We present Bayesian multilevel regression and poststratification models that incorporate probability sampling designs, the sensitivity and specificity of a diagnostic test, and specimen pooling to obtain unbiased prevalence estimates. These models easily incorporate all available prior information and can yield reasonable inferences even with very low base rates and limited testing resources. We examine the performance of these models with an extensive numerical study that varies the sampling design, sample size, true prevalence, and pool size. We also demonstrate the relative robustness of the models to key prior distribution assumptions via sensitivity analyses.