ABSTRACT
Estimating the true magnitude of infections was one of the significant challenges in combating the COVID-19 outbreak early on. Our inability to do so allowed unreported infections to drive up disease spread in numerous regions in the US and worldwide. Even today, identifying the true magnitude (the total number of infections) remains challenging despite the use of surveillance-based methods such as serological studies, because of their costs and biases. This paper proposes an information-theoretic approach to estimate total infections accurately. Our approach is built on top of epidemiological models based on ordinary differential equations, which have been used extensively to understand the dynamics of COVID-19, and aims to estimate the true total infections and a parameterization that "best describes" the observed reported infections. Our experiments show that the parameterization learned by our framework leads to better estimates of total infections and better forecasts of reported infections than a "baseline" parameterization learned via the usual model calibration. We also demonstrate that our framework can be leveraged to simulate what-if scenarios with non-pharmaceutical interventions. Our results support earlier findings that most COVID-19 infections were unreported and that non-pharmaceutical interventions indeed helped mitigate the COVID-19 outbreak. Our approach gives a general method for using information-theoretic techniques to improve epidemic modeling, and it can also be applied to other diseases.
ABSTRACT
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multi-model ensemble forecast that combined predictions from dozens of different research groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of the 27 individual models that consistently submitted complete forecasts of COVID-19 deaths throughout this period showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naive baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-week horizon 3-5 times larger than at a 1-week horizon. This project underscores the role that collaboration and active coordination between governmental public health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
SIGNIFICANCE STATEMENT
This paper compares the probabilistic accuracy of short-term forecasts of reported deaths due to COVID-19 during the first year and a half of the pandemic in the US. Results show high variation in accuracy between and within stand-alone models, and more consistent accuracy from an ensemble model that combined forecasts from all eligible models. This demonstrates that an ensemble model provided a reliable and comparatively accurate means of forecasting deaths during the COVID-19 pandemic that exceeded the performance of all of the models that contributed to it. This work strengthens the evidence base for synthesizing multiple models to support public health action.
ABSTRACT
How do we forecast an emerging pandemic in real time in a purely data-driven manner? How can we leverage rich heterogeneous data based on various signals such as mobility, testing, and/or disease exposure for forecasting? How can we handle noisy data and generate uncertainties in the forecast? In this paper, we present DeepCOVID, an operational deep learning framework designed for real-time COVID-19 forecasting. DeepCOVID works well with sparse data and can handle noisy heterogeneous data signals by propagating the uncertainty from the data in a principled manner, resulting in meaningful uncertainties in the forecast. The deployed framework also includes modules for both real-time and retrospective exploratory analysis to enable interpretation of the forecasts. Results from real-time predictions (featured on the CDC website and FiveThirtyEight.com) since April 2020 indicate that our approach is competitive among the methods in the COVID-19 Forecast Hub, especially for short-term predictions.