Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide.
Li, Yawei; Liu, Qingyun; Zeng, Zexian; Luo, Yuan.
  • Li Y; Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
  • Liu Q; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
  • Zeng Z; Department of Data Science, Dana Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA.
  • Luo Y; Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
Genes (Basel); 13(4), 2022 Apr 07.
Article in English | MEDLINE | ID: covidwho-1785607
ABSTRACT
Deciphering the population structure of SARS-CoV-2 is critical for informing public health management and reducing the risk of future dissemination. As SARS-CoV-2 genomes continue to accumulate worldwide, an effective way to group them is needed to organize the landscape of the virus's population structure. Taking advantage of recently published state-of-the-art machine learning algorithms, we used an unsupervised deep learning clustering algorithm to group a total of 16,873 SARS-CoV-2 genomes. Using single nucleotide polymorphisms (SNPs) as input features, we identified six major subtypes of SARS-CoV-2. The proportions of the clusters across continents revealed distinct geographical distributions. Comprehensive analysis indicated that both genetic factors and human migration shaped the specific geographical distribution of the population structure. This study offers a different approach, based on clustering methods, to studying the population structure of a novel and fast-growing species such as SARS-CoV-2. Moreover, clustering techniques can be applied in further studies of the local population structures of the proliferating virus.
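The abstract's pipeline (SNPs as input features, unsupervised clustering, then per-continent cluster proportions) can be illustrated with a minimal sketch. This is not the authors' method: the study used an unsupervised deep learning clustering algorithm, whereas this sketch swaps in plain k-means with a deterministic farthest-point initialization. It also assumes that genomes are already aligned to a reference of equal length and ignores ambiguous bases and gaps. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def encode_snps(genomes, reference):
    """Encode aligned genomes as 0/1 vectors over variable sites.

    Assumption: every genome is aligned to `reference` and has the same length.
    A site is kept if at least one genome differs from the reference there.
    """
    sites = sorted({i for g in genomes
                    for i, (base, ref) in enumerate(zip(g, reference)) if base != ref})
    vectors = [[1 if g[i] != reference[i] else 0 for i in sites] for g in genomes]
    return sites, vectors


def sq_dist(a, b):
    """Squared Euclidean distance (equals Hamming distance for 0/1 vectors)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100):
    """Plain k-means, a stand-in for the paper's deep clustering model."""
    # Farthest-point initialization: deterministic and spreads centroids apart.
    centroids = [list(map(float, vectors[0]))]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(map(float, far)))
    labels = None
    for _ in range(iters):
        # Assign each genome to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_proportions(labels, continents):
    """Fraction of each continent's genomes that falls into each cluster."""
    counts = defaultdict(Counter)
    for lab, cont in zip(labels, continents):
        counts[cont][lab] += 1
    return {cont: {lab: n / sum(c.values()) for lab, n in sorted(c.items())}
            for cont, c in counts.items()}


# Hypothetical toy data: two lineages defined by SNPs at sites 1/3 and 6/8.
REFERENCE = "AAAAAAAAAA"
GENOMES = ["ATATAAAAAA", "ATATAAAAAA", "ATAAAAAAAA",
           "AAAAAAGAGA", "AAAAAAGAGA", "AAAAAAGAAA"]
CONTINENTS = ["Asia", "Asia", "Europe", "Europe", "Europe", "Europe"]

if __name__ == "__main__":
    sites, vecs = encode_snps(GENOMES, REFERENCE)
    labels = kmeans(vecs, k=2)
    print(sites, labels)
    print(cluster_proportions(labels, CONTINENTS))
```

On the toy data, the two SNP-defined lineages separate into two clusters. Europe then shows a mixed cluster composition, which is the kind of signal the abstract summarizes as distinct geographical distributions. With real data, k (six in the study) and the clustering model would replace these toy choices.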

Full text: Available Collection: International databases Database: MEDLINE Main subject: SARS-CoV-2 / COVID-19 Type of study: Observational study / Prognostic study Limits: Humans Language: English Year: 2022 Document Type: Article Affiliation country: Genes13040648
