Search | VHL Regional Portal

Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny.

Hunt, Martin; Hinrichs, Angie S; Anderson, Daniel; Karim, Lily; Dearlove, Bethany L; Knaggs, Jeff; Constantinides, Bede; Fowler, Philip W; Rodger, Gillian; Street, Teresa; Lumley, Sheila; Webster, Hermione; Sanderson, Theo; Ruis, Christopher; de Maio, Nicola; Amenga-Etego, Lucas N; Amuzu, Dominic S Y; Avaro, Martin; Awandare, Gordon A; Ayivor-Djanie, Reuben; Bashton, Matthew; Batty, Elizabeth M; Bediako, Yaw; De Belder, Denise; Benedetti, Estefania; Bergthaler, Andreas; Boers, Stefan A; Campos, Josefina; Carr, Rosina Afua Ampomah; Cuba, Facundo; Dattero, Maria Elena; Dejnirattisai, Wanwisa; Dilthey, Alexander; Duedu, Kwabena Obeng; Endler, Lukas; Engelmann, Ilka; Francisco, Ngiambudulu M; Fuchs, Jonas; Gnimpieba, Etienne Z; Groc, Soraya; Gyamfi, Jones; Heemskerk, Dennis; Houwaart, Torsten; Hsiao, Nei-Yuan; Huska, Matthew; Hölzer, Martin; Iranzadeh, Arash; Jarva, Hanna; Jeewandara, Chandima; Jolly, Bani.

bioRxiv ; 2024 Apr 30.

Article in English | MEDLINE | ID: mdl-38746185

ABSTRACT

The SARS-CoV-2 genome occupies a unique place in infection biology: it is the most highly sequenced genome on earth (making up over 20% of public sequencing datasets), with fine-scale information on sampling date and geography, and it has been subject to unprecedentedly intense analysis. As a result, these phylogenetic data are an incredibly valuable resource for science and public health. However, the vast majority of the data was sequenced by tiling amplicons across the full genome, with amplicon schemes that changed over the pandemic as mutations in the viral genome interacted with primer binding sites. In combination with the disparate set of genome assembly workflows and the lack of consistent quality control (QC) processes, the current genomes have many systematic errors that have evolved with the virus and amplicon schemes. These errors have significant impacts on the phylogeny, and therefore, over the last few years, many thousands of hours of researchers' time have been spent "eyeballing" trees, looking for artefacts, and then patching the tree. Given the huge value of this dataset, we set out to reprocess the complete set of public raw sequence data in a rigorous amplicon-aware manner and build a cleaner phylogeny. Here we provide a global tree of 3,960,704 samples, built from a consistently assembled set of high-quality consensus sequences from all available public data as of March 2023, viewable at https://viridian.taxonium.org. Each genome was constructed using a novel assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed specifically to process amplicon sequence data, eliminating artefactual errors and masking the genome at low-quality positions. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny. Phase 2 of our project will address the fact that the data in the public archives is heavily geographically biased towards the Global North. We have therefore contributed new raw data to ENA/SRA from many countries including Ghana, Thailand, Laos, Sri Lanka, India, Argentina and Singapore. We will incorporate these, along with all public raw data submitted between March 2023 and the current day, into an updated set of assemblies and an updated phylogeny. We hope the tree, consensus sequences and Viridian will be a valuable resource for researchers.
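
The abstract mentions masking the genome at low-quality positions but does not say how Viridian does this. As a rough illustration of the general idea only, the sketch below replaces consensus bases with N wherever read depth falls below a cutoff. The function names, the depth-based criterion and the threshold of 10 are assumptions made for this example; they are not taken from Viridian.

```python
from typing import Sequence


def mask_low_depth(consensus: str, depth: Sequence[int],
                   min_depth: int = 10, mask_char: str = "N") -> str:
    """Return the consensus with bases below min_depth replaced by mask_char.

    Hypothetical helper for illustration; not Viridian's actual algorithm.
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d >= min_depth else mask_char
        for base, d in zip(consensus, depth)
    )


def masked_fraction(sequence: str, mask_char: str = "N") -> float:
    """Fraction of positions that are masked (0.0 for an empty sequence)."""
    return sequence.count(mask_char) / len(sequence) if sequence else 0.0


# Toy data: positions with depth 3, 0 and 9 fall below the cutoff of 10.
masked = mask_low_depth("ACGTACGT", [30, 25, 3, 0, 40, 12, 9, 50])
print(masked)                   # ACNNACNT
print(masked_fraction(masked))  # 0.375
```

Masking an uncertain position with N, rather than reporting a possibly wrong base, keeps a doubtful call from being read as a real mutation when the sequence is placed on a tree.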

Automating the Illumina DNA library preparation kit for whole genome sequencing applications on the flowbot ONE liquid handler robot.

Meijers, Erin; Verhees, Fabienne B; Heemskerk, Dennis; Wessels, Els; Claas, Eric C J; Boers, Stefan A.

Sci Rep ; 14(1): 8159, 2024 04 08.

Article in English | MEDLINE | ID: mdl-38589623

ABSTRACT

Whole-genome sequencing (WGS) is currently making its transition from a research tool into routine (clinical) diagnostic practice. The workflow for WGS includes the highly labor-intensive library preparation (LP), one of the most critical steps in the WGS procedure. Here, we describe the automation of the LP on the flowbot ONE robot to minimize the risk of human error and reduce hands-on time (HOT). For this, the robot was equipped, programmed, and optimized to perform the Illumina DNA Prep automatically. Results obtained from 16 LPs that were performed both manually and automatically showed comparable library DNA yields (median of 1.5-fold difference), similar assembly quality values, and 100% concordance on the final core genome multilocus sequence typing results. In addition, reproducibility of results was confirmed by re-processing eight of the 16 LPs using the automated workflow. With the automated workflow, the HOT was reduced to 25 min compared to the 125 min needed when performing eight LPs using the manual workflow. The turn-around time was 170 and 200 min for the automated and manual workflow, respectively. In summary, the automated workflow on the flowbot ONE generates consistent results in terms of reliability and reproducibility, while significantly reducing HOT as compared to manual LP.
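
The timings reported above can be turned into relative savings with a short calculation. The helper name `percent_reduction` is introduced here for illustration; the input values are the ones stated in the abstract for a run of eight LPs.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# Hands-on time (HOT): 125 min manual vs 25 min automated.
hot_saving = percent_reduction(125, 25)
# Turn-around time: 200 min manual vs 170 min automated.
tat_saving = percent_reduction(200, 170)

print(hot_saving)  # 80.0
print(tat_saving)  # 15.0
```

On these figures, the automation cuts hands-on time far more than turn-around time (80% versus 15%).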

Subject(s)

Lipopolysaccharides , Robotics , Humans , Reproducibility of Results , High-Throughput Nucleotide Sequencing/methods , Gene Library , Whole Genome Sequencing , DNA , Workflow
