From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering.
Melnyk, Andrew; Mohebbi, Fatemeh; Knyazev, Sergey; Sahoo, Bikram; Hosseini, Roya; Skums, Pavel; Zelikovsky, Alex; Patterson, Murray.
  • Melnyk A; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
  • Mohebbi F; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
  • Knyazev S; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
  • Sahoo B; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
  • Hosseini R; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
  • Skums P; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
  • Zelikovsky A; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
  • Patterson M; World-Class Research Center "Digital Biodesign and Personalized Healthcare," I.M. Sechenov First Moscow State Medical University, Moscow, Russia.
J Comput Biol; 28(11): 1113-1129, 2021 Nov.
Article in English | MEDLINE | ID: covidwho-1483349
Preprint
This scientific journal article is probably based on a previously available preprint. It was identified by a machine matching algorithm; human confirmation is still pending.
ABSTRACT
The availability of millions of SARS-CoV-2 (Severe Acute Respiratory Syndrome-Coronavirus-2) sequences in public databases such as GISAID (Global Initiative on Sharing All Influenza Data) and EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute, in the United Kingdom) allows a detailed study of the evolution, genomic diversity, and dynamics of a virus as never before. Here, we identify novel variants and subtypes of SARS-CoV-2 by clustering sequences, adapting methods originally designed for haplotyping intrahost viral populations. We assess our results using clustering entropy, the first time it has been used in this context. Our clustering approach reaches lower entropies than other methods, and we are able to improve on this even further through gap filling and Monte Carlo-based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the U.K. and GISAID data sets, and is also able to detect the much less represented (<1% of the sequences) Beta (South Africa), Epsilon (California), and Gamma and Zeta (Brazil) variants in the GISAID data set. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large data sets.
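The abstract does not define clustering entropy, so the following is only a minimal sketch under stated assumptions: sequences are aligned strings of equal length, the entropy of a cluster is taken as the sum of per-position Shannon entropies, and the entropy of a clustering is the size-weighted average over its clusters. The function names are hypothetical and the article's actual formulation may differ.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed at one alignment position."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cluster_entropy(seqs):
    """Sum of per-position entropies over a cluster of aligned, equal-length sequences."""
    return sum(column_entropy(col) for col in zip(*seqs))


def clustering_entropy(clusters):
    """Size-weighted average of cluster entropies (an assumed definition, not the article's)."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Toy data: two homogeneous clusters versus the same sequences merged into one.
split = [["ACGT", "ACGT"], ["TTGA", "TTGA"]]
merged = [["ACGT", "ACGT", "TTGA", "TTGA"]]
print(clustering_entropy(split), clustering_entropy(merged))
```

Under this assumed definition, a clustering that separates the two toy sequence types scores lower than one that merges them, which is the sense in which lower entropy indicates more homogeneous clusters.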

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Cluster Analysis / Computational Biology
Topics: Variants
Limits: Humans
Country/Region as subject: Africa / North America / South America / Brazil / Europe
Language: English
Journal: J Comput Biol
Journal subject: Molecular Biology / Medical Informatics
Year: 2021
Document Type: Article
Affiliation country: Cmb.2021.0302
