ImputeCoVNet: 2D ResNet Autoencoder for Imputation of SARS-CoV-2 Sequences

Ahmad Pesaranghader; Justin Pelletier; Jean-Christophe Grenier; Raphaël Poujol; Julie Hussin

This article is a Preprint

Affiliation

Ahmad Pesaranghader; McGill University
Justin Pelletier; Montreal Heart Institute
Jean-Christophe Grenier; Montreal Heart Institute
Raphaël Poujol; Montreal Heart Institute
Julie Hussin; Université de Montréal

Preprint in English | bioRxiv | ID: ppbiorxiv-456305

ABSTRACT

We describe a new deep learning approach for the imputation of SARS-CoV-2 variants. Our model, ImputeCoVNet, is a 2D ResNet autoencoder that efficiently imputes missing genetic variants in SARS-CoV-2 sequences. We show that ImputeCoVNet yields accurate results at minor allele frequencies as low as 0.0001. Compared with an approach based on Hamming distance, ImputeCoVNet achieved comparable results with significantly less computation time. We also show that providing geographical metadata (e.g., exposed country) to the decoder increases imputation accuracy. Additionally, by visualizing the embeddings of SARS-CoV-2 variants, we show that the trained encoder of ImputeCoVNet recapitulates viral clade information in its embeddings, which means it could be used for predictive tasks based on virus sequence analysis.
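The abstract compares ImputeCoVNet with an approach based on Hamming distance but does not describe that baseline in detail. The following is a minimal sketch of one plausible form of it, not the authors' implementation: each missing position is copied from the complete reference sequence that has the fewest mismatches with the query at its observed positions. The `MISSING` placeholder, the function names `masked_hamming` and `impute_nearest`, and the toy sequences are all assumptions made for illustration.

```python
from typing import List

MISSING = "N"  # assumed placeholder for an unobserved base


def masked_hamming(query: str, reference: str) -> int:
    """Count mismatches at positions where the query base is observed."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for q, r in zip(query, reference) if q != MISSING and q != r)


def impute_nearest(query: str, references: List[str]) -> str:
    """Fill missing positions of the query from its closest reference."""
    if not references:
        raise ValueError("at least one reference sequence is required")
    # Scans every reference once, so the cost grows with the panel size.
    nearest = min(references, key=lambda ref: masked_hamming(query, ref))
    return "".join(r if q == MISSING else q for q, r in zip(query, nearest))


# Toy aligned sequences (hypothetical, far shorter than a real genome).
references = ["ACGTACGT", "ACGTTCGA", "TCGTACGA"]
imputed = impute_nearest("ACGNTCGN", references)
print(imputed)
```

In a sketch like this, every query is compared against every reference sequence, so the work per query grows with the size of the reference panel. That kind of cost is consistent with the computation-time difference the abstract reports, although the abstract does not state the cause.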

License: CC BY

Full text: Available | Collection: Preprints | Database: bioRxiv | Type of study: Prognostic study | Language: English | Year: 2021 | Document type: Preprint
