A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees.
Mol Biol Evol; 38(12): 5819-5824, 2021 Dec 09.
Article in English | MEDLINE | ID: covidwho-1381034
Preprint
This scientific journal article is probably based on a previously available preprint. It has been identified through a machine matching algorithm, human confirmation is still pending.
ABSTRACT
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
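To make the idea of a mutation-annotated tree concrete, the following is a minimal Python sketch of a tree whose branches carry mutations, together with a function that recovers a sample's differences from the root by walking its path. It is an illustration only: the `Node` class, the `genotype` function, the sample names, and the toy mutations are all assumptions made for this example, and it does not reproduce the actual MAT file format or the matUtils implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `mutations` label the branch leading to it."""
    name: str
    mutations: List[str] = field(default_factory=list)  # e.g. "C241T"
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def parse(mutation: str) -> Tuple[str, int, str]:
    """Split 'C241T' into (parent base, position, new base)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def genotype(node: Optional[Node]) -> Dict[int, Tuple[str, str]]:
    """Return {position: (root base, node base)} for sites that differ from the root."""
    path = []
    while node is not None:          # collect the path from node up to the root
        path.append(node)
        node = node.parent
    diffs: Dict[int, Tuple[str, str]] = {}
    for n in reversed(path):         # replay branch mutations from the root down
        for m in n.mutations:
            parent_base, pos, new_base = parse(m)
            root_base = diffs[pos][0] if pos in diffs else parent_base
            if new_base == root_base:
                diffs.pop(pos, None)  # back-mutation restores the root base
            else:
                diffs[pos] = (root_base, new_base)
    return diffs


# Toy tree (illustrative data, not taken from the database).
root = Node("root")
clade_a = root.add_child(Node("cladeA", ["C241T", "A23403G"]))
s1 = clade_a.add_child(Node("sample1", ["G28881A"]))
s2 = clade_a.add_child(Node("sample2", ["T241C"]))  # reverts position 241

print(genotype(s1))  # {241: ('C', 'T'), 23403: ('A', 'G'), 28881: ('G', 'A')}
print(genotype(s2))  # {23403: ('A', 'G')}
```

Because each mutation is stored once on the branch where it is inferred to have occurred, samples that share ancestry share that record rather than each repeating it, which is the intuition behind the compact encoding the abstract describes.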
Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Phylogeny / Evolution, Molecular / SARS-CoV-2
Type of study: Prognostic study / Randomized controlled trials
Limits: Humans
Language: English
Journal: Mol Biol Evol
Journal subject: Molecular Biology
Year: 2021
Document Type: Article
Affiliation country: Molbev