Search | BVS Regional Portal

Data-driven approaches for genetic characterization of SARS-CoV-2 lineages

Fatima Mostefai; Isabel Gamache; Jessie Huang; Justin Pelletier; Ahmad Pesaranghader; David Hamelin; Carmen Lia Murall; Raphael Poujol; Jean-Christophe Grenier; Martin Smith; Etienne Caron; Morgan Craig; Jesse Shapiro; Guy Wolf; Smita Krishnaswamy; Julie Hussin.

Preprint in English | bioRxiv | ID: ppbiorxiv-462270

ABSTRACT

The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, producing a vast amount of viral genome sequencing data. To understand the evolution of this virus in humans, and to assist in tracing infection pathways and designing preventive strategies, we present a set of computational tools that span phylogenomics, population genetics and machine learning approaches. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic, using 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets, enabling real-time analyses. Furthermore, the change in Tajima's D over time provides a powerful metric of population expansion. Unsupervised learning techniques further highlight key steps in variant detection and facilitate the study of the role of this genomic variation in the context of SARS-CoV-2 infection, with the Multiscale PHATE methodology identifying fine-scale structure in the SARS-CoV-2 genetic data that underlies the emergence of key lineages. The computational framework presented here is useful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of worldwide populations of humans and other organisms.
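As an illustration of the Tajima's D statistic mentioned in this abstract, the following is a minimal sketch of the standard formula (Tajima, 1989), not the authors' pipeline. The function name `tajimas_d` is hypothetical, and the sketch assumes the input is a list of aligned, equal-length sequences with no gaps or ambiguity codes.

```python
import math
from collections import Counter


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (assumed gap-free)."""
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n = 2 and n = 3, so D is undefined.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0          # S: number of segregating (polymorphic) sites
    pairwise_diffs = 0.0   # total differences summed over all sequence pairs
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Pairs that differ at this site = all pairs - same-allele pairs.
            same = sum(c * (c - 1) / 2 for c in counts.values())
            pairwise_diffs += n_pairs - same
    if seg_sites == 0:
        return math.nan    # no variation: D is undefined
    pi = pairwise_diffs / n_pairs  # mean pairwise differences

    # Normalizing constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

To obtain a time series, one could apply such a function to the sequences collected in each successive time window. Negative values indicate an excess of rare variants, which is conventionally read as a signal consistent with population expansion.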

Multiscale PHATE Exploration of SARS-CoV-2 Data Reveals Multimodal Signatures of Disease

Manik Kuchroo; Jessie Huang; Patrick Wong; Jean-Christophe Grenier; Dennis Shung; Alexander Tong; Carolina Lucas; Jon Klein; Daniel Burkhardt; Scott Gigante; Abhinav Godavarthi; Benjamin Israelow; Tianyang Mao; Ji Eun Oh; Julio Silva; Takehiro Takahashi; Camila D. Odio; Arnau Casanovas-Massana; John Fournier; Yale IMPACT Team; Shelli Farhadian; Charles S. Dela Cruz; Albert I. Ko; F. Perry Wilson; Julie Hussin; Guy Wolf; Akiko Iwasaki; Smita Krishnaswamy.

Preprint in English | bioRxiv | ID: ppbiorxiv-383661

ABSTRACT

The biomedical community is producing increasingly high-dimensional datasets, integrated from hundreds of patient samples, which current computational techniques struggle to explore. To uncover biological meaning from these complex datasets, we present an approach called Multiscale PHATE, which learns abstracted biological features from data that can be directly predictive of disease. Built on a continuous coarse-graining process called diffusion condensation, Multiscale PHATE creates a tree of data granularities that can be cut at coarse levels for high-level summarizations of data, as well as at fine levels for detailed representations of subsets. We apply Multiscale PHATE to study the immune response to COVID-19 in 54 million cells from 168 hospitalized patients. Through our analysis of patient samples, we identify CD16hi CD66blo neutrophil and IFNγ+ GranzymeB+ Th17 cell responses enriched in patients who die. Further, we show that the population groupings that Multiscale PHATE discovers can be fed directly into a classifier to predict disease outcome. We also use Multiscale PHATE-derived features to construct two different manifolds of patients, one from abstracted flow cytometry features and another directly from patient clinical features, both associating immune subsets and clinical markers with outcome.
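The coarse-graining idea behind diffusion condensation can be illustrated with a toy sketch. It is not the Multiscale PHATE implementation: the function names, the fixed Gaussian bandwidth `sigma`, and the greedy merge rule with tolerance `merge_tol` are all simplifying assumptions made for illustration. Each iteration replaces every point with a kernel-weighted average of its neighbors, so nearby points drift together, and the cluster labels recorded per iteration form the levels of a hierarchy.

```python
import numpy as np


def label_by_proximity(points, tol):
    """Greedy grouping (an assumed rule): a point joins the first
    representative found within distance tol, else starts a new group."""
    reps, labels = [], []
    for p in points:
        for k, r in enumerate(reps):
            if np.linalg.norm(p - r) <= tol:
                labels.append(k)
                break
        else:
            reps.append(p)
            labels.append(len(reps) - 1)
    return np.array(labels)


def condensation_levels(X, sigma=0.5, merge_tol=1e-6, n_iter=20):
    """Return one label array per iteration, from fine to coarse."""
    points = np.asarray(X, dtype=float).copy()
    levels = [label_by_proximity(points, merge_tol)]
    for _ in range(n_iter):
        # Gaussian affinities between all current point positions.
        sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        kernel = np.exp(-sq_dists / (2 * sigma**2))
        # Row-normalize to obtain a Markov diffusion operator.
        diffusion_op = kernel / kernel.sum(axis=1, keepdims=True)
        # Diffuse: each point moves to a weighted average of its neighbors.
        points = diffusion_op @ points
        levels.append(label_by_proximity(points, merge_tol))
    return levels


# Two well-separated toy blobs of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
levels = condensation_levels(X)
```

In this toy setting the first level keeps every point separate and the last level groups each blob into one cluster. Choosing an early level corresponds to a fine-grained view and a late level to a coarse summary.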
