Pre-processing SARS-CoV-2 Sequence Data for Application of Machine Learning Techniques for Visualization and Clustering of Virus Characteristics
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 ; : 663-669, 2022.
Article in English | Scopus | ID: covidwho-2217960
ABSTRACT
SARS-CoV-2, first reported as a pneumonia of unknown cause on December 31, 2019, has circulated around the world for more than two years. Because the virus has been spreading for so long, many variants have emerged and a considerable amount of sequence data has accumulated. Studies are therefore being conducted to determine, by analyzing sequence data, which types of mutations exist and which features are found in which variants. Traditionally, this kind of sequence analysis has been dominated by analysis and visualization using phylogenetic trees. Phylogenetic trees can be useful when there is not much data, but analysis and visualization become difficult when there are hundreds of thousands or millions of sequences. In this study, we therefore propose a method for pre-processing virus sequence data so that several machine learning techniques can be applied to better analyze and visualize the data. SARS-CoV-2 sequence data is pre-processed with the proposed method, and machine learning models such as an autoencoder and DBSCAN are applied to extract important features and cluster the data. According to the experimental results, important features were extracted by reducing the dimension of the data, a large number of virus sequences were visualized well on 3-dimensional graphs according to the characteristics of the data, and the sequences were clustered well according to virus variant. © 2022 Asia-Pacific Signal and Information Processing Association (APSIPA).
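The abstract does not spell out the pre-processing scheme, so the following is only a minimal sketch of the general idea of turning sequences into numeric vectors and clustering them with DBSCAN. It assumes a simple one-hot encoding over A/C/G/T, which may differ from the authors' method, and it omits the autoencoder step entirely, clustering the encoded vectors directly. The function names (`one_hot`, `dbscan`), the toy sequences, and the `eps` and `min_pts` values are all hypothetical choices for illustration.

```python
from collections import deque

ALPHABET = "ACGT"  # assumption: any other symbol (N, gaps) encodes as all zeros


def one_hot(seq, length):
    """Encode a nucleotide string as a flat 0/1 vector of size 4 * length."""
    seq = seq.upper()[:length].ljust(length, "N")  # truncate or pad to fixed length
    vec = []
    for base in seq:
        vec.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vec


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force region query; the point itself is included.
        return [j for j, p in enumerate(points) if euclidean(points[i], p) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point joins as a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # only core points extend the cluster
                queue.extend(nbrs)
    return labels


# Toy input (made up, not real SARS-CoV-2 data).
sequences = [
    "ACGTACGT", "ACGTACGA", "ACGTACGT",
    "TTTTGGGG", "TTTTGGGC", "TTTTGGGG",
    "CACACACA",
]
vectors = [one_hot(s, 8) for s in sequences]

# With one-hot vectors, k mismatches give a distance of sqrt(2k),
# so eps=1.5 groups sequences that differ at no more than one position.
labels = dbscan(vectors, eps=1.5, min_pts=2)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

For data at the scale the abstract mentions, a brute-force neighbour search like this would be far too slow; a dimensionality reduction step (the autoencoder in the paper) followed by an indexed DBSCAN implementation is the more realistic route.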
Full text: Available Collection: Databases of international organizations Database: Scopus Language: English Journal: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 Year: 2022 Document Type: Article
