Data-driven approaches for genetic characterization of SARS-CoV-2 lineages
Fatima Mostefai; Isabel Gamache; Jessie Huang; Justin Pelletier; Ahmad Pesaranghader; David Hamelin; Carmen Lia Murall; Raphael Poujol; Jean-Christophe Grenier; Martin Smith; Etienne Caron; Morgan Craig; Jesse Shapiro; Guy Wolf; Smita Krishnaswamy; Julie Hussin.
Affiliation
  • Fatima Mostefai; Universite de Montreal
  • Isabel Gamache; Universite de Montreal
  • Jessie Huang; Yale University
  • Justin Pelletier; Universite de Montreal
  • Ahmad Pesaranghader; McGill University
  • David Hamelin; Universite de Montreal
  • Carmen Lia Murall; McGill University
  • Raphael Poujol; Institut de Cardiologie de Montreal
  • Jean-Christophe Grenier; Institut de Cardiologie de Montreal
  • Martin Smith; Universite de Montreal
  • Etienne Caron; Universite de Montreal
  • Morgan Craig; Universite de Montreal
  • Jesse Shapiro; McGill University
  • Guy Wolf; Universite de Montreal
  • Smita Krishnaswamy; Yale University
  • Julie Hussin; Universite de Montreal
Preprint in English | bioRxiv | ID: ppbiorxiv-462270
ABSTRACT
The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, producing a tremendous amount of viral genome sequencing data. To understand the evolution of this virus in humans, and to assist in tracing infection pathways and designing preventive strategies, we present a set of computational tools that span phylogenomics, population genetics and machine learning approaches. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic, using 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets, enabling real-time analyses. Furthermore, the change in Tajima's D over time provides a powerful metric of population expansion. Unsupervised learning techniques further highlight key steps in variant detection and facilitate the study of the role of this genomic variation in the context of SARS-CoV-2 infection, with the Multiscale PHATE methodology identifying fine-scale structure in the SARS-CoV-2 genetic data that underlies the emergence of key lineages. The computational framework presented here is useful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of worldwide populations of humans and other organisms.
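The abstract names Tajima's D as a metric of population expansion but does not show how it is computed. The sketch below is not the authors' pipeline. It is a minimal illustration of the standard Tajima (1989) statistic, assuming the input is a list of aligned, equal-length sequences with no gaps or ambiguous bases. The function names `pairwise_diversity`, `segregating_sites` and `tajimas_d` are hypothetical helpers introduced here for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def pairwise_diversity(seqs: Sequence[str]) -> float:
    """Average number of pairwise differences (pi) between aligned sequences."""
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    total = 0.0
    for column in zip(*seqs):
        counts = Counter(column)
        # Number of sequence pairs that differ at this site.
        total += (n * n - sum(c * c for c in counts.values())) / 2
    return total / n_pairs


def segregating_sites(seqs: Sequence[str]) -> int:
    """Number of alignment columns with more than one allele (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def tajimas_d(seqs: Sequence[str]) -> float:
    """Tajima's D for aligned sequences; NaN when there is no variation.

    Assumes equal-length sequences without gaps or ambiguous bases.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    pi = pairwise_diversity(seqs)
    # Difference between the two diversity estimates, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants, as expected after a population expansion, pulls D below zero. To follow D over time, one could call `tajimas_d` on the sequences collected in each time window, although the windowing scheme used in the preprint is not specified in this record.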
License
cc_by_nd
Full text: Available Collections: Preprints Database: bioRxiv Study type: Experimental_studies Language: English Publication year: 2021 Document type: Preprint
...