Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest.
Hahn, Georg; Lee, Sanghun; Prokopenko, Dmitry; Abraham, Jonathan; Novak, Tanya; Hecker, Julian; Cho, Michael; Khurana, Surender; Baden, Lindsey R; Randolph, Adrienne G; Weiss, Scott T; Lange, Christoph.
  • Hahn G; Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, 02115, USA. ghahn@hsph.harvard.edu.
  • Lee S; Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, 02115, USA.
  • Prokopenko D; Department of Medical Consilience, Graduate School, Dankook University, Yongin, South Korea.
  • Abraham J; Genetics and Aging Research Unit, Department of Neurology, McCance Center for Brain Health, Massachusetts General Hospital, Boston, MA, 02114, USA.
  • Novak T; Department of Microbiology, Harvard Medical School, Blavatnik Institute, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA.
  • Hecker J; Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children's Hospital, Boston, MA, 02115, USA.
  • Cho M; Harvard Medical School, Harvard University, Boston, MA, 02115, USA.
  • Khurana S; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA.
  • Baden LR; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA.
  • Randolph AG; Food and Drug Administration, Silver Spring, MD, 20993, USA.
  • Weiss ST; Division of Infectious Diseases, Harvard Medical School, Brigham and Women's Hospital, Boston, MA, 02115, USA.
  • Lange C; Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children's Hospital, Boston, MA, 02115, USA.
BMC Bioinformatics ; 23(1): 547, 2022 Dec 19.
Article in English | MEDLINE | ID: covidwho-2196036
ABSTRACT
As of June 2022, the GISAID database contains more than 11 million SARS-CoV-2 genomes, including several thousand nucleotide sequences for the most common variants such as delta or omicron. These SARS-CoV-2 strains have been collected from patients around the world since the beginning of the pandemic. We start by assessing the similarity of all pairs of nucleotide sequences using the Jaccard index and principal component analysis. As shown previously in the literature, an unsupervised cluster analysis applied to the SARS-CoV-2 genomes results in clusters of sequences according to certain characteristics such as their strain or their clade. Importantly, we observe that nucleotide sequences of common variants are often outliers in clusters of sequences stemming from variants identified earlier on during the pandemic. Motivated by this finding, we are interested in applying outlier detection to nucleotide sequences. We demonstrate that nucleotide sequences of common variants (such as alpha, delta, or omicron) can be identified solely based on a statistical outlier criterion. We argue that outlier detection might be a useful surveillance tool to identify emerging variants in real time as the pandemic progresses.
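The steps named in the abstract (pairwise Jaccard similarity, principal component analysis, and a statistical outlier criterion) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that sequences are already aligned to a common reference, that each sequence is encoded as a binary vector of mismatches against that reference, and that outliers are flagged with a median absolute deviation (MAD) rule on the distance from the median in principal component space. The function names, the toy sequences, and the threshold of 3.5 are all hypothetical choices made for illustration.

```python
import numpy as np


def mismatch_matrix(seqs, reference):
    """Binary matrix: entry (i, j) is 1 if sequence i differs from the reference at position j.

    Assumes every sequence is aligned and has the same length as the reference.
    """
    ref = np.array(list(reference))
    return np.array([np.array(list(s)) != ref for s in seqs], dtype=int)


def jaccard_similarity(X):
    """Pairwise Jaccard index between the rows of a binary matrix."""
    inter = X @ X.T                          # number of shared mismatch positions
    counts = X.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    # Two sequences without any mismatch are treated as identical (similarity 1).
    return np.where(union == 0, 1.0, inter / np.maximum(union, 1))


def principal_components(S, k=2):
    """Scores on the top-k principal components of the column-centered similarity matrix."""
    centered = S - S.mean(axis=0)
    U, sing, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * sing[:k]


def flag_outliers(scores, threshold=3.5):
    """Flag points whose distance from the componentwise median is large relative to the MAD.

    The MAD rule and the threshold are assumptions of this sketch, not the paper's criterion.
    """
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    if mad == 0:
        return dist > med
    return 0.6745 * (dist - med) / mad > threshold


def mutate(reference, positions):
    """Toy helper: substitute the base at each given position with a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(reference)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGT" * 5  # hypothetical 20-base reference
    # Seven sequences sharing a mutation at position 0, and one carrying a distinct set.
    mutation_sets = [
        [0], [0, 1], [0, 2], [0, 1, 2], [0, 3], [0, 1, 3], [0, 2, 3],
        [10, 11, 12, 13, 14, 15],
    ]
    seqs = [mutate(reference, m) for m in mutation_sets]

    X = mismatch_matrix(seqs, reference)
    S = jaccard_similarity(X)
    scores = principal_components(S, k=2)
    print("flagged sequence indices:", np.flatnonzero(flag_outliers(scores)))
```

On real data, the all-pairs similarity matrix grows quadratically with the number of genomes, so a sketch like this would only be practical on a subsample or with a sparse representation of the mismatch matrix.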
Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: COVID-19 | Type of study: Diagnostic study / Prognostic study | Topics: Variants | Limits: Humans | Language: English | Journal: BMC Bioinformatics | Journal subject: Medical Informatics | Year: 2022 | Document type: Article | Affiliation country: S12859-022-05105-y
