Taxonium, a web-based tool for exploring large phylogenetic trees.
Sanderson, Theo.
  • Sanderson T; The Francis Crick Institute, London, United Kingdom.
eLife; 11, 2022 11 15.
Article in English | MEDLINE | ID: covidwho-2203162
ABSTRACT
The COVID-19 pandemic has resulted in a step change in the scale of sequencing data, with more genomes of SARS-CoV-2 having been sequenced than any other organism on earth. These sequences reveal key insights when represented as a phylogenetic tree, which captures the evolutionary history of the virus, and allows the identification of transmission events and the emergence of new variants. However, existing web-based tools for exploring phylogenies do not scale to the size of datasets now available for SARS-CoV-2. We have developed Taxonium, a new tool that uses WebGL to allow the exploration of trees with tens of millions of nodes in the browser for the first time. Taxonium links each node to associated metadata and supports mutation-annotated trees, which are able to capture all known genetic variation in a dataset. It can either be run entirely locally in the browser, from a server-based backend, or as a desktop application. We describe insights that analysing a tree of five million sequences can provide into SARS-CoV-2 evolution, and provide a tool at cov2tree.org for exploring a public tree of more than five million SARS-CoV-2 sequences. Taxonium can be applied to any tree, and is available at taxonium.org, with source code at github.com/theosanderson/taxonium.
Since 2020, the SARS-CoV-2 virus has infected billions of people and spread to 185 countries. The virus spreads by making new copies of its genome inside human cells and exploits the cells' machinery to synthesise the viral proteins it needs to infect further cells. Each time the virus copies its genetic material there's a chance that the replication process introduces an error to the genetic sequence. Over time, these mutations accumulate, which can give rise to new variants with different properties. These new variants, originating from a common ancestor, may spread faster or be able to evade immune systems that have learnt to recognise previous variants. To understand where new variants of SARS-CoV-2 come from and how related they are to each other, scientists build family trees called 'phylogenetic trees' based on similarities in the genetic sequences of different variants of the virus. Looking at these trees, researchers can track how a variant spreads geographically, and also attempt to identify worrying new variants that might lead to a new wave of infections. The scale of the COVID-19 pandemic, together with the global effort by clinicians and researchers to sequence SARS-CoV-2 genetic material, means a library of over 13 million SARS-CoV-2 genomes now exists, making it the largest such collection for any organism. Although phylogenetic trees of viruses have been studied for a long time, exploring the SARS-CoV-2 library presents technical and practical challenges due to its sheer size. Sanderson has developed an open-source web tool called Taxonium that allows users to explore phylogenetic trees with millions of sequences. With help from collaborators at the University of California, Santa Cruz, Sanderson built a website called Cov2Tree that uses the Taxonium platform to allow immediate access to an expansive tree of all publicly available SARS-CoV-2 sequences.
Cov2Tree enables users to visualise all SARS-CoV-2 genomes in a bird's-eye view akin to a 'Google Earth for virus sequences', where anyone can zoom in on a related family of viruses down to the level of individual sequences. This can be used to compare variants and follow geographic spread. Using Taxonium, scientists can explore how virus sequences are related to each other. They can also see the individual mutations that have occurred at each branch of the tree, and can search for sequences based on mutation, geographical location, or other factors. Interestingly, a trend appearing in the SARS-CoV-2 phylogenetic tree is the emergence of identical mutations at different branches of the tree without a common origin. These mutations may be a result of convergent evolution, a phenomenon that occurs when a mutation appears independently in different variants because it confers an advantage to the virus, making such mutations more likely to persist. This means that scientists may be able to expect certain mutations to appear in more distantly related variants if they have appeared independently in several different variants already. Overall, Taxonium is an important tool for monitoring SARS-CoV-2 genomes, but it also has broader applications. The tool can be used to browse phylogenetic trees of other viruses and organisms. Furthermore, the Taxonium website offers a way to browse a tree of life, with images and links to Wikipedia. The SARS-CoV-2 library might be the largest now, but in the future even bigger datasets will likely be available, highlighting the importance of tools like Taxonium.
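The idea of identical mutations emerging on separate branches can be illustrated with a minimal Python sketch. It is not Taxonium's code or data format: the Node class, the function name, and the mutation labels are all hypothetical, and the tree is made up. It simply counts, for a toy tree annotated with the mutations on each branch, how many branches each mutation arises on, and ignores complications such as reversions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    # Mutations that occur on the branch leading to this node.
    mutations: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def count_branch_origins(root: Node) -> Counter:
    """Count, for each mutation, the number of branches on which it arises."""
    counts: Counter = Counter()
    stack = [root]  # iterative traversal, so very deep trees do not hit recursion limits
    while stack:
        node = stack.pop()
        counts.update(set(node.mutations))
        stack.extend(node.children)
    return counts


# Made-up example: "G100A" arises once on the branch to A and again on the branch to B1.
tree = Node("root", [], [
    Node("A", ["G100A"], [Node("A1", ["C200T"]), Node("A2")]),
    Node("B", ["T300C"], [Node("B1", ["G100A"]), Node("B2")]),
])

origins = count_branch_origins(tree)
# Mutations seen on more than one branch are candidates for independent emergence.
recurrent = {mutation: n for mutation, n in origins.items() if n > 1}
print(recurrent)
```

In this toy tree only G100A is reported, because it appears on two branches that do not share the mutation through a common ancestor.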
Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: SARS-CoV-2 / COVID-19
Type of study: Observational study / Prognostic study / Randomized controlled trials
Topics: Variants
Limits: Humans
Language: English
Year: 2022
Document Type: Article
Affiliation country: ELife.82392