This article is a Preprint
Preprints are preliminary research reports that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Preprints posted online allow authors to receive rapid feedback and the entire scientific community can appraise the work for themselves and respond appropriately. Those comments are posted alongside the preprints for anyone to read them and serve as a post publication assessment.
A multi-class gene classifier for SARS-CoV-2 variants based on convolutional neural network (preprint)
biorxiv; 2021.
Preprint
in English
| bioRxiv | ID: ppzbmed-10.1101.2021.11.22.469492
ABSTRACT
Surveillance of circulating variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is of great importance in controlling the coronavirus disease 2019 (COVID-19) pandemic. We propose an alignment-free in silico approach for classifying SARS-CoV-2 variants based on their genomic sequences. A deep learning model was constructed utilizing a stacked 1-D convolutional neural network and multilayer perceptron (MLP). The pre-processed genomic sequencing data of the four SARS-CoV-2 variants were first fed to three stacked convolution-pooling nets to extract local linkage patterns in the sequences. Then a 2-layer MLP was used to compute the correlations between the input and output. Finally, a logistic regression model transformed the output and returned the probability values. Learning curves and stratified 10-fold cross-validation showed that the proposed classifier enables robust variant classification. External validation of the classifier showed an accuracy of 0.9962, precision of 0.9963, recall of 0.9963 and F1 score of 0.9962, outperforming other machine learning methods, including logistic regression, K-nearest neighbor, support vector machine, and random forest. By comparing our model with an MLP model without the convolution-pooling network, we demonstrate the essential role of convolution in extracting viral variant features. Thus, our results indicate that the proposed convolution-based multi-class gene classifier is efficient for the variant classification of SARS-CoV-2.
Full text:
Available
Collection:
Preprints
Database:
bioRxiv
Main subject:
Coronavirus Infections
/
COVID-19
/
Learning Disabilities
Language:
English
Year:
2021
Document Type:
Preprint
Similar
MEDLINE
...
LILACS
LIS