Genomic Surveillance of COVID-19 Variants With Language Models and Machine Learning.
Nagpal, Sargun; Pal, Ridam; Tyagi, Ananya; Tripathi, Sadhana; Nagori, Aditya; Ahmad, Saad; Mishra, Hara Prasad; Malhotra, Rishabh; Kutum, Rintu; Sethi, Tavpritesh.
  • Nagpal S; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Pal R; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Ashima; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Tyagi A; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Tripathi S; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Nagori A; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Ahmad S; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Mishra HP; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Malhotra R; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Kutum R; Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Sethi T; Ashoka University, Sonipat, India.
Front Genet ; 13: 858252, 2022.
Article in English | MEDLINE | ID: covidwho-1809380
ABSTRACT
Global efforts to control COVID-19 are threatened by the rapid emergence of novel SARS-CoV-2 variants that may display undesirable characteristics such as immune escape, increased transmissibility, or increased pathogenicity. Early prediction of the emergence of new strains with these features is critical for pandemic preparedness. We present Strainflow, a supervised and causally predictive model that uses unsupervised latent space features of SARS-CoV-2 genome sequences. Strainflow was trained and validated on 0.9 million sequences from December 2019 to June 2021, and the frozen model was prospectively validated from July 2021 to December 2021. Strainflow captured the rise in cases 2 months ahead of the Delta and Omicron surges in most countries, including a prediction of a surge in India as early as the beginning of November 2021. Entropy analysis of Strainflow's unsupervised embeddings clearly reveals explore-exploit cycles in genomic feature space, adding interpretability to the deep learning-based model. We also conducted a codon-level analysis of our model to assess the interpretability and biological validity of our unsupervised features. The Strainflow application is openly available as an interactive web application for prospective genomic surveillance of COVID-19 across the globe.
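The abstract does not describe how the codon-level features or the entropy analysis are computed. As a purely illustrative sketch, and not Strainflow's actual implementation, the Python below assumes that "codon-level" means splitting a nucleotide sequence into non-overlapping 3-letter tokens, and it computes the Shannon entropy of the resulting token distribution rather than of learned embeddings. The function names `codons` and `shannon_entropy` and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def codons(sequence: str) -> list[str]:
    """Split a nucleotide sequence into non-overlapping 3-letter codons.

    Trailing bases that do not fill a complete codon are dropped.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def shannon_entropy(tokens: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up toy sequence for illustration; not real SARS-CoV-2 data.
toy = "ATGGCTATGGCTTAA"
tokens = codons(toy)
print(tokens)
print(round(shannon_entropy(tokens), 4))
```

Higher entropy here means the codon tokens are spread more evenly across distinct values; how this relates to the explore-exploit cycles reported for the embeddings is not something this sketch attempts to reproduce.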
Full text: Available Collection: International databases Database: MEDLINE Type of study: Observational study / Prognostic study Topics: Variants Language: English Journal: Front Genet Year: 2022 Document Type: Article Affiliation country: Fgene.2022.858252
