ABSTRACT
Dengue, like other arboviruses with a broad clinical spectrum, is easily misdiagnosed as another infectious disease because its signs and symptoms overlap with theirs. During large outbreaks, severe dengue cases can overwhelm the health care system, so understanding the burden of dengue hospitalizations is important for allocating medical care and public health resources. A machine learning model was developed to estimate potentially misdiagnosed dengue hospitalizations in Brazil, using data from the Brazilian public healthcare system database and the National Institute of Meteorology (INMET). The data were linked into a hospitalization-level dataset. Random Forest, Logistic Regression, and Support Vector Machine algorithms were then assessed. Each algorithm was trained by splitting the dataset into training and test sets and using cross-validation to select its best hyperparameters. The models were evaluated on accuracy, precision, recall (sensitivity), F1 score, and specificity. The best model was Random Forest, with an accuracy of 85% on the final reviewed test set. This model suggests that 3.4% (13,608) of all hospitalizations in the public healthcare system from 2014 to 2020 could have been dengue misdiagnosed as other diseases. The model was helpful in identifying potentially misdiagnosed dengue and might be a useful tool for public health decision makers planning resource allocation.
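For readers less familiar with the evaluation metrics listed above, the following is a minimal sketch of how they can be derived from a binary confusion matrix, with "positive" taken to mean a dengue hospitalization. The `ConfusionMatrix` class, the `evaluate` function, and the counts are hypothetical and chosen only for illustration; they are not the study's code or results. The sketch also assumes every denominator is non-zero. Recall and sensitivity are the same quantity, so both keys return the same value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; 'positive' means a dengue hospitalization."""
    tp: int  # dengue correctly classified as dengue
    fp: int  # non-dengue classified as dengue
    tn: int  # non-dengue correctly classified as non-dengue
    fn: int  # dengue classified as non-dengue


def evaluate(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute the metrics named in the abstract (assumes non-zero denominators)."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    precision = cm.tp / (cm.tp + cm.fp)
    recall = cm.tp / (cm.tp + cm.fn)  # identical to sensitivity
    specificity = cm.tn / (cm.tn + cm.fp)
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts for illustration only; NOT results from the study.
example = ConfusionMatrix(tp=30, fp=10, tn=50, fn=10)
metrics = evaluate(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity alongside recall matters in this setting because dengue admissions are a small share of all hospitalizations, so accuracy alone can look high even when few dengue cases are detected.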