A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees.
McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; Kramer, Alexander; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish.
  • McBroome J; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA.
  • Thornlow B; Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
  • Hinrichs AS; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA.
  • Kramer A; Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
  • De Maio N; Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
  • Goldman N; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA.
  • Haussler D; Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
  • Corbett-Detig R; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom.
  • Turakhia Y; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom.
Mol Biol Evol; 38(12): 5819-5824, 2021 Dec 09.
Article in English | MEDLINE | ID: covidwho-1381034
Preprint
This scientific journal article is probably based on a previously available preprint. It was identified through a machine matching algorithm; human confirmation is still pending.
ABSTRACT
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
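To make the idea of a mutation-annotated tree concrete, here is a minimal Python sketch of a tree whose branches carry mutations and whose clade roots carry a lineage label. It is not the actual MAT file format or the matUtils interface: the `Node` class, the helper functions, the sample names, and the mutation strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    # Mutations on the branch leading to this node, written as
    # <reference base><position><new base>, e.g. "C241T".
    mutations: List[str] = field(default_factory=list)
    # Optional lineage label, set only on nodes treated as clade roots.
    clade: Optional[str] = None
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def path_from_root(node: Node) -> List[Node]:
    """Return the nodes from the root down to `node`, inclusive."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return list(reversed(path))


def genotype(node: Node) -> Dict[int, str]:
    """Replay branch mutations from the root; map position -> latest allele."""
    alleles: Dict[int, str] = {}
    for step in path_from_root(node):
        for mut in step.mutations:
            position, new_base = int(mut[1:-1]), mut[-1]
            alleles[position] = new_base
    return alleles


def lineage(node: Node) -> Optional[str]:
    """Return the label of the nearest labeled ancestor (or the node itself)."""
    current: Optional[Node] = node
    while current is not None:
        if current.clade is not None:
            return current.clade
        current = current.parent
    return None


# Illustrative tree: the labels and mutations below are made up for the example.
root = Node("root")
internal = root.add_child(Node("node_1", ["C241T", "A23403G"], clade="B.1"))
sample_1 = internal.add_child(Node("sample_1", ["G28881A"]))
sample_2 = internal.add_child(Node("sample_2", ["C3037T"]))

print(genotype(sample_1))
print(lineage(sample_2))
```

Because each sample stores only the mutations on its own branch, shared mutations are recorded once on an ancestral branch rather than once per sequence, which is the intuition behind the compactness the abstract describes.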

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Phylogeny / Evolution, Molecular / SARS-CoV-2
Type of study: Prognostic study / Randomized controlled trials
Limits: Humans
Language: English
Journal: Mol Biol Evol
Journal subject: Molecular Biology
Year: 2021
Document Type: Article
Affiliation country: Molbev
