Performance Evaluation of Regression Models for the Prediction of the COVID-19 Reproduction Rate.

Kaliappan, Jayakumar; Srinivasan, Kathiravan; Mian Qaisar, Saeed; Sundararajan, Karpagam; Chang, Chuan-Yu; C, Suganthan

Kaliappan J; School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
Srinivasan K; School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
Mian Qaisar S; Electrical and Computer Engineering Department, Effat University, Jeddah, Saudi Arabia.
Sundararajan K; School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India.
Chang CY; Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan.
C S; School of Social Sciences and Languages, Vellore Institute of Technology, Vellore, India.

Front Public Health; 9: 729795, 2021.

Article in English | MEDLINE | ID: covidwho-1448820

ABSTRACT

This paper evaluates the performance of several non-linear regression techniques, namely support-vector regression (SVR), k-nearest neighbor (KNN), Random Forest Regressor, Gradient Boosting, and XGBoost, for predicting the COVID-19 reproduction rate, and studies the impact of feature selection algorithms and hyperparameter tuning on the prediction. Sixteen features (for example, Total_cases_per_million and Total_deaths_per_million) related to significant factors such as testing, deaths, positivity rate, active cases, stringency index, and population density are considered for predicting the COVID-19 reproduction rate. These 16 features are ranked using the Random Forest, Gradient Boosting, and XGBoost feature selection algorithms, and seven of them are selected according to the ranks assigned by the majority of these algorithms. Predictions by historical statistical models are based solely on past values of the predicted variable and on the assumption that future instances resemble past occurrences. In contrast, techniques such as Random Forest, XGBoost, Gradient Boosting, KNN, and SVR take the influence of other significant features into account when predicting the result. Prediction performance is measured with the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), R-squared, relative absolute error (RAE), and root relative squared error (RRSE) metrics. The algorithms perform similarly with and without feature selection, but hyperparameter tuning makes a marked difference. The results suggest that the reproduction rate depends strongly on many features and that its prediction should not be based solely on past values. Without hyperparameter tuning, the minimum RAE is 0.117315935 with feature selection and 0.0968989 without feature selection.
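
The abstract names six evaluation metrics but does not define RAE and RRSE. The following is a minimal Python sketch, assuming the standard definitions in which the model's total error is normalized by the error of a naive predictor that always outputs the mean of the observed values (so an RAE below 1 means the model beats that baseline). The function name `regression_metrics` and the example numbers are hypothetical; this is not the authors' code.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R-squared, RAE and RRSE (assumed standard definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred              # model residuals
    dev = y_true - y_true.mean()       # residuals of the naive "always predict the mean" baseline

    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(err ** 2) / np.sum(dev ** 2)
    rae = np.sum(np.abs(err)) / np.sum(np.abs(dev))      # relative absolute error
    rrse = np.sqrt(np.sum(err ** 2) / np.sum(dev ** 2))  # root relative squared error
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "RAE": rae, "RRSE": rrse}


# Hypothetical observed and predicted reproduction rates
print(regression_metrics([1.0, 1.2, 0.9, 1.1], [1.0, 1.1, 1.0, 1.1]))
```

Under these definitions RRSE² = 1 − R², so the two metrics carry the same information on different scales, while RAE is the absolute-error counterpart of RRSE.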
KNN attains a low MAE of 0.0008 and performs well without feature selection and with hyperparameter tuning. The results show that predictions made using all features with hyperparameter tuning are more accurate than predictions made using the selected features.
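
The abstract does not state how the hyperparameters were tuned. As an illustration only, the sketch below shows one common approach for KNN: a grid search over the number of neighbors k, scoring each candidate by MAE on a held-out split, written in plain NumPy so it runs without extra dependencies. The helper names (`standardize`, `knn_predict`, `tune_k`), the 75/25 split, the candidate values of k, and the synthetic data are all assumptions, not the authors' setup.

```python
import numpy as np


def standardize(X_train, X_other):
    """Hypothetical helper: scale features by the training split's mean and std."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return (X_train - mu) / sigma, (X_other - mu) / sigma


def knn_predict(X_train, y_train, X_query, k):
    """Hypothetical helper: KNN regression, mean target of the k closest training rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def tune_k(X, y, candidate_ks, val_fraction=0.25, seed=0):
    """Hypothetical helper: grid search over k, scored by MAE on a held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(len(y) * val_fraction)
    val, train = idx[:n_val], idx[n_val:]
    X_tr, X_val = standardize(X[train], X[val])
    scores = {}
    for k in candidate_ks:
        pred = knn_predict(X_tr, y[train], X_val, k)
        scores[k] = float(np.mean(np.abs(y[val] - pred)))
    best_k = min(scores, key=scores.get)  # k with the lowest validation MAE
    return best_k, scores


# Synthetic stand-in data; the paper's 16 features are not reproduced here
rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 4))
y = 1.0 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.01, size=200)
best_k, scores = tune_k(X, y, candidate_ks=[1, 3, 5, 9, 15])
print(best_k, scores[best_k])
```

In practice, scikit-learn's `KNeighborsRegressor` combined with `GridSearchCV` (usually with cross-validation rather than a single split) covers the same ground. The standardization step matters because KNN compares raw distances, so features on large scales, such as Total_cases_per_million, would otherwise dominate the neighbor search.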

Subject(s)

COVID-19; Birth Rate; Cluster Analysis; Humans; Reproduction; SARS-CoV-2

Keywords

COVID-19; feature selection; machine learning; prediction error; regression; reproduction rate prediction

Collection: International databases
Database: MEDLINE
Main subject: COVID-19
Type of study: Experimental Studies / Prognostic study / Randomized controlled trials
Limits: Humans
Language: English
Journal: Front Public Health
Year: 2021
Document Type: Article
Affiliation country: Fpubh.2021.729795