Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 2 de 2
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Preprint em Inglês | bioRxiv | ID: ppbiorxiv-462270

RESUMO

The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, leading to a tremendous amount of viral genome sequencing data. To understand the evolution of this virus in humans, and to assist in tracing infection pathways and designing preventive strategies, we present a set of computational tools that span phylogenomics, population genetics and machine learning approaches. To illustrate the utility of this toolbox, we detail an in depth analysis of the genetic diversity of SARS-CoV-2 in first year of the COVID-19 pandemic, using 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets, enabling real-time analyses. Furthermore, time series change of Tajimas D provides a powerful metric of population expansion. Unsupervised learning techniques further highlight key steps in variant detection and facilitate the study of the role of this genomic variation in the context of SARS-CoV-2 infection, with Multiscale PHATE methodology identifying fine-scale structure in the SARS-CoV-2 genetic data that underlies the emergence of key lineages. The computational framework presented here is useful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of worldwide populations of humans and other organisms.

2.
Preprint em Inglês | bioRxiv | ID: ppbiorxiv-456305

RESUMO

We describe a new deep learning approach for the imputation of SARS-CoV-2 variants. Our model, ImputeCoVNet, consists of a 2D ResNet Autoencoder that aims at imputing missing genetic variants in SARS-CoV-2 sequences in an efficient manner. We show that ImputeCoVNet leads to accurate results at minor allele frequencies as low as 0.0001. When compared with an approach based on Hamming distance, ImputeCoVNet achieved comparable results with significantly less computation time. We also present the provision of geographical metadata (e.g., exposed country) to decoder increases the imputation accuracy. Additionally, by visualizing the embedding results of SARS-CoV-2 variants, we show that the trained encoder of ImputeCoVNet, or the embedded results from it, recapitulates viral clades information, which means it could be used for predictive tasks using virus sequence analysis.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...