Comparisons of genome assembly tools for characterization of Mycobacterium tuberculosis genomes using hybrid sequencing technologies.
Trisakul, Kanwara; Hinwan, Yothin; Eisiri, Jukgarin; Salao, Kanin; Chaiprasert, Angkana; Kamolwat, Phalin; Tongsima, Sissades; Campino, Susana; Phelan, Jody; Clark, Taane G; Faksri, Kiatichai.
Affiliation
  • Trisakul K; Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand.
  • Hinwan Y; Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand.
  • Eisiri J; Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand.
  • Salao K; Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand.
  • Chaiprasert A; Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand.
  • Kamolwat P; Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand.
  • Tongsima S; Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand.
  • Campino S; Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand.
  • Phelan J; Division of Tuberculosis, Department of Disease Control, Ministry of Public Health, Bangkok, Thailand.
  • Clark TG; National Biobank of Thailand, National Center for Genetics Engineering and Biotechnology, Pathum Thani, Thailand.
  • Faksri K; Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, University of London, London, United Kingdom.
PeerJ ; 12: e17964, 2024.
Article in En | MEDLINE | ID: mdl-39221271
ABSTRACT

Background:

Next-generation sequencing of Mycobacterium tuberculosis, the infectious agent causing tuberculosis, is improving the understanding of genomic diversity of circulating lineages and strain-types, and informing knowledge of drug resistance mutations. An increasingly popular approach to characterizing M. tuberculosis genomes (size 4.4 Mbp) and variants (e.g., single nucleotide polymorphisms (SNPs)) involves the de novo assembly of sequence data.

Methods:

We compared the performance of genome assembly tools (Unicycler, RagOut, and RagTag) on sequence data from nine drug-resistant M. tuberculosis isolates (multidrug-resistant (MDR) n = 1; pre-extensively drug-resistant (pre-XDR) n = 8) generated using Illumina HiSeq, Oxford Nanopore Technologies (ONT) PromethION, and PacBio platforms.

Results:

Unicycler-based assemblies had significantly higher genome completeness (~98.7%; p = 0.01) than those from the other assembly tools (RagOut = 98.6%, and RagTag = 98.6%). Across isolates and sequencers, RagOut assemblies were significantly larger (p < 0.001) (4,418,574 ± 8,824 bp) than Unicycler and RagTag assemblies (Unicycler = 4,377,642 ± 55,257 bp, and RagTag = 4,380,711 ± 51,164 bp). RagOut-based assemblies had the fewest contigs (~32) and the largest genome size (4,418,574 bp; vs. H37Rv reference size 4,411,532 bp) and were therefore chosen for downstream analysis. Pan-genome analysis of Illumina and PacBio hybrid assemblies revealed the greatest number of detected genes (4,639 genes; H37Rv reference contains 3,976 genes), while Illumina and ONT hybrid assemblies produced the highest number of SNPs. The number of genes from hybrid assemblies with ONT and PacBio long reads (mean 4,620 genes) was greater than from short-read assembly alone (4,478 genes). All nine RagOut hybrid genome assemblies detected known mutations in genes associated with MDR-TB and pre-XDR-TB.
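As a quick arithmetic check on the assembly sizes reported above, the following minimal sketch compares each tool's mean assembly size with the H37Rv reference size. Only the numbers come from the abstract; the variable and function names are hypothetical and are not part of the study's pipeline.

```python
# Mean assembly sizes and reference size as reported in the Results paragraph.
# All identifiers here are illustrative assumptions, not the authors' code.
H37RV_SIZE_BP = 4_411_532

MEAN_ASSEMBLY_SIZE_BP = {
    "Unicycler": 4_377_642,
    "RagOut": 4_418_574,
    "RagTag": 4_380_711,
}


def size_delta_bp(tool: str) -> int:
    """Mean assembly size minus the H37Rv reference size, in bp."""
    return MEAN_ASSEMBLY_SIZE_BP[tool] - H37RV_SIZE_BP


def size_delta_percent(tool: str) -> float:
    """Same difference, expressed as a percentage of the reference size."""
    return 100.0 * size_delta_bp(tool) / H37RV_SIZE_BP


if __name__ == "__main__":
    for tool in MEAN_ASSEMBLY_SIZE_BP:
        print(f"{tool}: {size_delta_bp(tool):+,} bp "
              f"({size_delta_percent(tool):+.2f}%)")
```

On these figures, the RagOut mean lies 7,042 bp above the reference size, while the Unicycler and RagTag means fall 33,890 bp and 30,821 bp below it.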

Conclusions:

Unicycler performed best in terms of achieving contiguous genomes, whereas RagOut improved the quality of Unicycler's genome assemblies by providing a longer genome size. Overall, our approach demonstrated that short-read and long-read hybrid assembly can provide a more complete genome assembly than short-read assembly alone by detecting pan-genomes and more genes, including IS6110, and SNPs.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Genome, Bacterial / High-Throughput Nucleotide Sequencing / Mycobacterium tuberculosis Limits: Humans Language: En Journal: PeerJ Year: 2024 Document type: Article Affiliation country: Thailand Country of publication: United States
