Pre-processing SARS-CoV-2 Sequence Data for Application of Machine Learning Techniques for Visualization and Clustering of Virus Characteristics
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Pages 663-669, 2022.
Article
in English
| Scopus | ID: covidwho-2217960
ABSTRACT
SARS-CoV-2, first reported as a pneumonia of unknown cause on December 31, 2019, has circulated worldwide for more than two years. Over this long period of spread, many variants have emerged and a considerable amount of sequence data has accumulated. Studies are therefore under way to determine, by analyzing sequence data, which types of mutation can be distinguished and which features characterize which variants. Traditionally, this kind of sequence analysis has been dominated by analysis and visualization using phylogenetic trees. Phylogenetic trees are useful when the data set is small, but analysis and visualization become difficult with hundreds of thousands or millions of sequences. In this study, we therefore propose a method for pre-processing virus sequence data so that several machine learning techniques can be applied to analyze and visualize the data more effectively. SARS-CoV-2 sequence data are pre-processed with the proposed method, and machine learning models such as an autoencoder and DBSCAN are applied to extract important features and cluster the data. In the experiments, important features were extracted by reducing the dimensionality of the data; a large number of viruses were clearly visualized on 3-dimensional graphs according to the characteristics of the data, and they were well clustered according to virus variant. © 2022 Asia-Pacific Signal and Information Processing Association (APSIPA).
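The abstract does not say how the sequences are encoded, so the following is only an illustrative sketch of the general idea: turn nucleotide strings into fixed-length numeric vectors, then cluster them with DBSCAN. The one-hot encoding, the toy sequences, and the `eps` and `min_pts` values are all assumptions made for this example and are not taken from the paper. The autoencoder step is omitted, so clustering runs directly on the encoded vectors.

```python
from math import dist

ALPHABET = "ACGT"  # assumed alphabet; anything else (e.g. N) encodes as all zeros


def one_hot(seq, length):
    """Encode a nucleotide string as a flat 0/1 vector of size 4 * length.

    Sequences are truncated or padded to `length` (an assumed convention).
    """
    vec = []
    for i in range(length):
        base = seq[i].upper() if i < len(seq) else "-"
        vec.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vec


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with Euclidean distance; label -1 means noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


# Toy sequences (hypothetical, not real SARS-CoV-2 data).
seqs = [
    "ACGTACGT", "ACGTACGA", "ACGTACGT",
    "TTTTGGGG", "TTTTGGGC", "TTTTGGGG",
    "GACTNNCA",
]
vectors = [one_hot(s, 8) for s in seqs]
# One mismatched base gives distance sqrt(2), so eps=1.5 tolerates one mismatch.
labels = dbscan(vectors, eps=1.5, min_pts=2)
print(labels)
```

With real genomes the vectors become very large, which is where a dimensionality reduction step such as the autoencoder mentioned in the abstract would come before clustering.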
Collection:
Databases of international organizations