Diversity and selection of SARS-CoV-2 minority variants in the early New York City outbreak
Allison Roder; Katherine E Johnson; Marissa Knoll; Mohammed Khalfan; Bessie Wang; Stephanie Banakis; Allie Kreitman; Christopher Mederos; Jung-Ho Youn; Rachel Mercado; Wei Wang; Denis Ruchnewitz; Marie I Samanovic; Mark J Mulligan; Michael Laessig; Marta Luksza; Sanchita Das; David Gresham; Elodie Ghedin.
Affiliation
  • Allison Roder; Systems Genomics Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases
  • Katherine E Johnson; Center for Genomics and Systems Biology, Department of Biology, New York University
  • Marissa Knoll; Center for Genomics and Systems Biology, Department of Biology, New York University
  • Mohammed Khalfan; Center for Genomics and Systems Biology, Department of Biology, New York University
  • Bessie Wang; New York University
  • Stephanie Banakis; Systems Genomics Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases
  • Allie Kreitman; National Institutes of Health
  • Christopher Mederos; National Institutes of Health
  • Jung-Ho Youn; National Institutes of Health
  • Rachel Mercado; National Institutes of Health
  • Wei Wang; J Craig Venter Institute
  • Denis Ruchnewitz; Institute for Biological Physics, University of Cologne
  • Marie I Samanovic; NYU Grossman School of Medicine
  • Mark J Mulligan; New York University Langone Medical Center
  • Michael Laessig; Institute for Biological Physics, University of Cologne
  • Marta Luksza; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai
  • Sanchita Das; National Institutes of Health
  • David Gresham; New York University
  • Elodie Ghedin; Systems Genomics Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases
Preprint in English | bioRxiv | ID: ppbiorxiv-442873
ABSTRACT
High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant calling tools across a range of allele frequencies and simulated coverages. We show that the choice of variant caller and the use of replicate sequencing have the most significant impact on single nucleotide variant (SNV) discovery, and we demonstrate how allele frequency and coverage thresholds affect both false discovery and false negative rates. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intrahost viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intrahost variation, viral diversity, and viral evolution.
IMPORTANCE
When viruses replicate inside a host, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false positive data if not filtered correctly.
In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant calling tools. We used simulated and synthetic data to test their performance against a true set of variants, and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
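The interplay the abstract describes between allele frequency thresholds, coverage thresholds, and replicate sequencing can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SNV` record, the function names, the toy calls, and the cutoffs (`min_freq=0.01`, `min_depth=100`) are all hypothetical and chosen only to show how false discovery and false negative rates would be computed against a known truth set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNV:
    """Hypothetical record for one variant call (not a real tool's format)."""
    pos: int      # genome position
    alt: str      # alternate base
    freq: float   # allele frequency, between 0 and 1
    depth: int    # read coverage at this position


def filter_calls(calls, min_freq, min_depth):
    """Return the (pos, alt) keys of calls passing both thresholds."""
    return {
        (c.pos, c.alt)
        for c in calls
        if c.freq >= min_freq and c.depth >= min_depth
    }


def error_rates(called, truth):
    """Return (false discovery rate, false negative rate) for a call set."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    fdr = fp / (tp + fp) if called else 0.0
    fnr = fn / (tp + fn) if truth else 0.0
    return fdr, fnr


# Toy truth set and two technical replicates (made-up values).
truth = {(100, "A"), (200, "T"), (300, "G")}
rep1 = [
    SNV(100, "A", 0.10, 500),
    SNV(200, "T", 0.02, 800),
    SNV(300, "G", 0.04, 90),    # true variant, but below the depth cutoff
    SNV(450, "C", 0.015, 150),  # artifact seen only in replicate 1
]
rep2 = [
    SNV(100, "A", 0.11, 450),
    SNV(200, "T", 0.025, 700),
    SNV(300, "G", 0.05, 120),
    SNV(600, "T", 0.012, 300),  # artifact seen only in replicate 2
]

# Hypothetical cutoffs, for illustration only.
single = filter_calls(rep1, min_freq=0.01, min_depth=100)
both = single & filter_calls(rep2, min_freq=0.01, min_depth=100)

fdr_single, fnr_single = error_rates(single, truth)
fdr_both, fnr_both = error_rates(both, truth)
print(f"single replicate: FDR={fdr_single:.2f} FNR={fnr_single:.2f}")
print(f"both replicates:  FDR={fdr_both:.2f} FNR={fnr_both:.2f}")
```

In this toy data, requiring a call in both replicates removes the replicate-specific artifacts, while the true variant that fails the depth cutoff in one replicate is still missed, which is the kind of trade-off between false discovery and false negative rates that the thresholds control.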
Full text: Available Collection: Preprints Database: bioRxiv Type of study: Prognostic study Language: English Year: 2021 Document type: Preprint
...