
Fast Phylogeny Reconstruction from Genomes of Closely Related Microbes.

Haubold, Bernhard; Klötzl, Fabian.

Methods Mol Biol ; 2242: 77-89, 2021.

Article in English | MEDLINE | ID: mdl-33961219

ABSTRACT

By tracking pathogen outbreaks using whole genome sequencing, medical microbiology is currently being transformed into genomic epidemiology. This change in technology is leading to the rapid accumulation of large samples of closely related genome sequences. Summarizing such samples into phylogenies can be computationally challenging. Our program andi quickly computes accurate pairwise distances between up to thousands of bacterial genomes. Working under the UNIX command line, we show how andi can be used to transform genomes to phylogenies with support values ready to be printed or integrated into documents.

Subject(s)

DNA, Bacterial/genetics , Escherichia coli/genetics , Genome, Bacterial , Genomics , Phylogeny , Shigella/genetics , Databases, Genetic , Research Design , Software Design , Workflow

Fur: Find unique genomic regions for diagnostic PCR.

Haubold, Bernhard; Klötzl, Fabian; Hellberg, Lars; Thompson, Daniel; Cavalar, Markus.

Bioinformatics ; 37(15): 2081-2087, 2021 Aug 09.

Article in English | MEDLINE | ID: mdl-33515232

ABSTRACT

MOTIVATION: Unique marker sequences are highly sought after in molecular diagnostics. Nevertheless, there are only a few programs available to search for marker sequences, compared to the many programs for similarity search. We therefore wrote the program Fur for Finding Unique genomic Regions. RESULTS: Fur takes as input a sample of target sequences and a sample of closely related neighbors. It returns the regions present in all targets and absent from all neighbors. The recently published program genmap can also be used for this purpose and we compared it to fur. When analyzing a sample of 33 genomes representing the major phylogroups of E. coli, fur was 40 times faster than genmap but used three times more memory. On the other hand, genmap yielded three times more markers, but they were less accurate when tested in silico on a sample of 237 E. coli genomes. We also designed phylogroup-specific PCR primers based on the markers proposed by genmap and fur, and tested them by analyzing their virtual amplicons in GenBank. Finally, we used fur to design primers specific to a Lactobacillus species, and found excellent sensitivity and specificity in vitro. AVAILABILITY AND IMPLEMENTATION: Fur sources and documentation are available from https://github.com/evolbioinf/fur. The compiled software is posted as a docker container at https://hub.docker.com/r/haubold/fox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
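
The selection rule stated in the abstract, "present in all targets and absent from all neighbors", can be illustrated with a toy k-mer sketch. This is not the method Fur implements, which the abstract does not describe; the function names `kmers` and `unique_kmers`, the choice of fixed-length k-mers, and the example sequences are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(targets, neighbors, k):
    """Return k-mers found in every target and in no neighbor.

    Illustrative only: exact k-mer matching stands in for whatever
    matching a real marker-finding tool performs.
    """
    # Present in all targets: intersect the k-mer sets of the targets.
    shared = set.intersection(*(kmers(t, k) for t in targets))
    # Absent from all neighbors: subtract every k-mer seen in any neighbor.
    seen_in_neighbors = set().union(*(kmers(n, k) for n in neighbors))
    return shared - seen_in_neighbors


# Hypothetical toy sequences, not real genomes.
targets = ["ACGTTGCA", "TACGTTGC"]
neighbors = ["ACGTAGCA"]
print(sorted(unique_kmers(targets, neighbors, 4)))
# → ['CGTT', 'GTTG', 'TTGC']
```

In this toy input, `ACGT` is shared by both targets but also occurs in the neighbor, so it is dropped, which is the behavior the abstract's rule implies.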

Phylonium: fast estimation of evolutionary distances from large samples of similar genomes.

Klötzl, Fabian; Haubold, Bernhard.

Bioinformatics ; 36(7): 2040-2046, 2020 Apr 01.

Article in English | MEDLINE | ID: mdl-31790149

ABSTRACT

MOTIVATION: Tracking disease outbreaks by whole-genome sequencing leads to the collection of large samples of closely related sequences. Five years ago, we published a method to accurately compute all pairwise distances for such samples by indexing each sequence. Since indexing is slow, we now ask whether it is possible to achieve similar accuracy when indexing only a single sequence. RESULTS: We have implemented this idea in the program phylonium and show that it is as accurate as its predecessor and roughly 100 times faster when applied to all 2678 Escherichia coli genomes contained in ENSEMBL. One of the best published programs for rapidly computing pairwise distances, mash, analyzes the same dataset four times faster but, with default settings, it is less accurate than phylonium. AVAILABILITY AND IMPLEMENTATION: Phylonium runs under the UNIX command line; its C++ sources and documentation are available from github.com/evolbioinf/phylonium. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genomics , Software , Algorithms , Genome , Sequence Analysis, DNA

hotspot: software to support sperm-typing for investigating recombination hotspots.

Odenthal-Hesse, Linda; Dutheil, Julien Y; Klötzl, Fabian; Haubold, Bernhard.

Bioinformatics ; 32(16): 2554-5, 2016 Aug 15.

Article in English | MEDLINE | ID: mdl-27153632

ABSTRACT

MOTIVATION: In many organisms, including humans, recombination clusters within recombination hotspots. The standard method for de novo detection of recombinants at hotspots is sperm typing. This relies on allele-specific PCR at single nucleotide polymorphisms. Designing allele-specific primers by hand is time-consuming. We have therefore written a package to support hotspot detection and analysis. RESULTS: hotspot consists of four programs: asp looks up SNPs and designs allele-specific primers; aso constructs allele-specific oligos for mapping recombinants; xov implements a maximum-likelihood method for estimating the crossover rate; six, finally, simulates typing data. AVAILABILITY AND IMPLEMENTATION: hotspot is written in C. Sources are freely available under the GNU General Public License from http://github.com/evolbioinf/hotspot/ CONTACT: haubold@evolbio.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Recombination, Genetic , Software , Spermatozoa , Alleles , Humans , Likelihood Functions , Male

Support Values for Genome Phylogenies.

Klötzl, Fabian; Haubold, Bernhard.

Life (Basel) ; 6(1), 2016 Mar 07.

Article in English | MEDLINE | ID: mdl-26959064

ABSTRACT

We have recently developed a distance metric for efficiently estimating the number of substitutions per site between unaligned genome sequences. These substitution rates are called "anchor distances" and can be used for phylogeny reconstruction. Most phylogenies come with bootstrap support values, which are computed by resampling with replacement columns of homologous residues from the original alignment. Unfortunately, this method cannot be applied to anchor distances, as they are based on approximate pairwise local alignments rather than the full multiple sequence alignment necessary for the classical bootstrap. We explore two alternatives: pairwise bootstrap and quartet analysis, which we compare to classical bootstrap. With simulated sequences and 53 human primate mitochondrial genomes, pairwise bootstrap gives better results than quartet analysis. However, when applied to 29 E. coli genomes, quartet analysis comes closer to the classical bootstrap.
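
The classical bootstrap mentioned in the abstract resamples alignment columns with replacement. A minimal sketch of one such replicate follows; the function name `bootstrap_columns` and the three-row toy alignment are assumptions for illustration, and the sketch covers only the classical procedure, not the pairwise bootstrap or quartet analysis the paper explores.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate of a multiple sequence alignment.

    Column indices are drawn with replacement, and the same indices are
    applied to every row so that homologous residues stay together.
    """
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in cols) for row in alignment]


# Hypothetical toy alignment: three sequences of equal length.
alignment = ["ACGTACGT", "ACGTTCGT", "ACGAACGT"]
rng = random.Random(1)  # fixed seed so the replicate is reproducible
replicate = bootstrap_columns(alignment, rng)
for row in replicate:
    print(row)
```

Each replicate has the same dimensions as the original alignment. Repeating the draw many times and rebuilding the tree from each replicate is what yields the support values. The sketch also makes the abstract's point concrete: the procedure needs full alignment columns, which pairwise local alignments do not provide.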

andi: fast and accurate estimation of evolutionary distances between closely related genomes.

Haubold, Bernhard; Klötzl, Fabian; Pfaffelhuber, Peter.

Bioinformatics ; 31(8): 1169-75, 2015 Apr 15.

Article in English | MEDLINE | ID: mdl-25504847

ABSTRACT

MOTIVATION: A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes. RESULTS: Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays and our implementation requires approximately only 1 s and 45 MB RAM/Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae. AVAILABILITY AND IMPLEMENTATION: We have implemented the computation of anchor distances in the multithreaded UNIX command-line program andi for ANchor DIstances. C sources and documentation are posted at http://github.com/evolbioinf/andi/ CONTACT: haubold@evolbio.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Biological Evolution , Genome , Genomics/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Animals , Databases, Genetic , Humans , Phylogeny
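
As a rough illustration of how mismatches in ungapped aligned segments can be turned into an evolutionary distance, the sketch below counts mismatches and applies the Jukes-Cantor correction. The abstract does not say which substitution model andi uses, so Jukes-Cantor is an assumption here, as are the function names and the toy segment pairs; finding the anchoring matches with enhanced suffix arrays is not shown.

```python
import math


def mismatch_fraction(segments):
    """Fraction of mismatched positions over ungapped segment pairs.

    Each element of segments is a pair of equal-length strings that are
    assumed to be already aligned without gaps.
    """
    mismatches = sum(a != b for s, t in segments for a, b in zip(s, t))
    total = sum(len(s) for s, _ in segments)
    return mismatches / total


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from mismatch fraction p."""
    if p >= 0.75:
        return float("nan")  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical segment pairs: 20 aligned positions, 2 mismatches.
segments = [
    ("ACGTACGTAC", "ACGTTCGTAC"),
    ("GGCATTGACC", "GGCATTGTCC"),
]
p = mismatch_fraction(segments)
d = jukes_cantor(p)
print(f"mismatch fraction = {p:.3f}, distance = {d:.4f}")
```

The corrected distance is slightly larger than the raw mismatch fraction because the model accounts for multiple substitutions at the same site.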
