ABSTRACT
BACKGROUND: Data on notified cases of disease are often compromised by incomplete or partial information about individual cases. In an effort to enhance the value of information from enteric disease notifications in New Zealand, this study explored the use of Bayesian and Multiple Imputation (MI) models to fill risk factor data gaps. As a test case, overseas travel as a risk factor for campylobacteriosis was examined. METHODS: Two methods, namely Bayesian Specification (BAS) and Multiple Imputation (MI), were compared regarding predictive performance for various levels of artificially induced missingness of overseas travel status in campylobacteriosis notification data. Predictive performance of the models was assessed through the Brier Score, the Area Under the ROC Curve and the Percent Bias of regression coefficients. Finally, the best model was selected and applied to predict missing overseas travel status of campylobacteriosis notifications. RESULTS: No difference was observed in the predictive performance of the BAS and MI methods at a lower rate of missingness (<10 %), but the BAS approach performed better than MI at higher rates of missingness (50 %, 65 %, 80 %). The estimated proportion (95 % Credibility Intervals) of travel-related cases was greatest in the highly urban District Health Boards (DHBs) of Counties Manukau, Auckland and Waitemata, at 0.37 (0.12, 0.57), 0.33 (0.13, 0.55) and 0.28 (0.10, 0.49), respectively, whereas the lowest proportion was estimated for the more rural West Coast, Northland and Tairawhiti DHBs, at 0.02 (0.01, 0.05), 0.03 (0.01, 0.08) and 0.04 (0.01, 0.06), respectively. The national rate of travel-related campylobacteriosis cases was estimated at 0.16 (0.02, 0.48). CONCLUSION: The use of BAS offers a flexible approach to data augmentation, particularly when the rate of missingness is very high and when the Missing At Random (MAR) assumption holds.
The high rates of travel-associated cases that this approach predicts for urban regions of New Zealand are plausible given the high rate of travel in these regions, including to destinations with a higher risk of infection. An added advantage of using a Bayesian approach is that the model's predictions can be improved whenever new information becomes available.
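The three performance measures named in METHODS can be sketched as follows. This is a minimal illustration under standard textbook definitions, not the study's code: the function names are hypothetical, and the AUC is computed as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, with ties counted as one half.

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

def auc(y_true, p_pred):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    y_true = np.asarray(y_true)
    p_pred = np.asarray(p_pred, dtype=float)
    pos = p_pred[y_true == 1]
    neg = p_pred[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def percent_bias(estimate, reference):
    """Relative deviation (in %) of a coefficient estimated from incomplete data
    from a reference value, e.g. the estimate from the complete data (assumption)."""
    return 100.0 * (estimate - reference) / reference

# Hypothetical outcomes (1 = overseas travel) and predicted probabilities.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.6]
print(brier_score(y, p), auc(y, p), percent_bias(1.2, 1.0))
```

A lower Brier score and a percent bias closer to zero indicate better performance, while a higher AUC indicates better discrimination.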
Subjects
Campylobacter Infections/epidemiology , Disease Notification , Theoretical Models , Travel , Bayes Theorem , Campylobacter Infections/prevention & control , Campylobacter Infections/transmission , Humans , New Zealand/epidemiology , Risk Factors , Rural Population
ABSTRACT
The purpose of this study was to apply a novel statistical method for variable selection and a model-based approach for filling data gaps in mortality rates associated with foodborne diseases using the WHO Vital Registration mortality dataset. Correlation analysis and elastic net regularization methods were applied to drop redundant variables and to select the most meaningful subset of predictors. Whenever predictor data were missing, multiple imputation was used to fill in plausible values. Cluster analysis was applied to identify similar groups of countries based on the values of the predictors. Finally, a Bayesian hierarchical regression model was fit to the final dataset for predicting mortality rates. From 113 potential predictors, 32 were retained after correlation analysis. Out of these 32 predictors, eight with non-zero coefficients were selected using the elastic net regularization method. Based on the values of these variables, four clusters of countries were identified. The uncertainty of predictions was large for countries within clusters lacking mortality rates, and it was low for a cluster that had mortality rate information. Our results demonstrated that, using Bayesian hierarchical regression models, a data-driven clustering of countries and a meaningful subset of predictors can be used to fill data gaps in foodborne disease mortality.
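The two variable selection steps described above (dropping redundant predictors by correlation, then keeping predictors with non-zero elastic net coefficients) can be sketched as follows. This is an illustrative sketch, not the study's implementation: the function names, the 0.9 correlation threshold and the penalty settings are assumptions, and the elastic net is fitted by plain coordinate descent on standardized predictors and a centered response.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Return indices of columns kept after greedily dropping any column whose
    absolute correlation with an already kept column reaches the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def elastic_net(X, y, alpha=0.5, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/(2n))*||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1 - l1_ratio)*||b||^2.
    Assumes standardized columns of X and centered y (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - X @ b + X[:, j] * b[j]  # residual excluding predictor j
            rho = X[:, j] @ partial_resid / n
            b[j] = soft_threshold(rho, alpha * l1_ratio) / (z[j] + alpha * (1 - l1_ratio))
    return b

# Hypothetical data: five predictors, only the first drives the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X[:, 0] - X[:, 0].mean()
kept = drop_correlated(X)
coef = elastic_net(X[:, kept], y)
selected = [kept[i] for i in np.flatnonzero(coef)]  # predictors with non-zero coefficients
print(selected)
```

The L1 part of the penalty sets weak coefficients exactly to zero, which is what makes selection by non-zero coefficients possible, while the L2 part stabilizes the fit when the remaining predictors are still correlated.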