Search | BVS Regional Portal

EPIK: precise and scalable evolutionary placement with informative k-mers.

Romashchenko, Nikolai; Linard, Benjamin; Pardi, Fabio; Rivals, Eric.

Bioinformatics; 39(12), 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-37975872

ABSTRACT

MOTIVATION: Phylogenetic placement enables the phylogenetic analysis of massive collections of newly sequenced DNA when de novo tree inference is too unreliable or too inefficient. Assuming that a high-quality reference tree is available, the idea is to find the correct placement of each new sequence in that tree. Recently, alignment-free approaches to phylogenetic placement have emerged, both to avoid aligning the new sequences and to skip the calculations that typically follow the alignment step. A promising approach is based on inferring k-mers that may be related to the reference sequences, also called phylo-k-mers. However, its use is limited by a time- and memory-consuming preprocessing stage for the reference data and by the large number of k-mers to consider.

RESULTS: We propose a filtering method, based on mutual information, for selecting informative phylo-k-mers. It can significantly improve the efficiency of placement at the cost of a small loss in placement accuracy. This method is implemented in IPK, a new tool for computing phylo-k-mers that significantly outperforms previously available software. We also present EPIK, a new phylogenetic placement tool that supports filtered phylo-k-mer databases. Our experiments on real-world data show that EPIK is the fastest phylogenetic placement tool available when placing hundreds of thousands to millions of queries, while still providing accurate placements.

AVAILABILITY AND IMPLEMENTATION: IPK and EPIK are freely available at https://github.com/phylo42/IPK and https://github.com/phylo42/EPIK. Both are implemented in C++ and Python and are supported on Linux and macOS.

Subjects

Algorithms; Software; Phylogeny; Sequence Analysis, DNA; Base Sequence
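The mutual-information filter can be illustrated with a minimal Python sketch. It assumes a simplified model that is not necessarily the one used by IPK or EPIK: each phylo-k-mer is stored with one probability per branch of the reference tree, the branch X is drawn uniformly, and Y indicates whether the k-mer occurs, with P(Y = 1 | X = b) equal to the stored probability for branch b. The informativeness score is then I(X; Y) = H(Y) - H(Y | X). The functions filter_database and place and the parameters keep_fraction and default are hypothetical illustrations, not the IPK or EPIK interfaces.

```python
import math
from typing import Dict, List

# Hypothetical database layout: k-mer -> one probability per branch of the reference tree.
Database = Dict[str, List[float]]


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mutual_information(probs: List[float]) -> float:
    """I(X; Y) with X uniform over branches and P(Y = 1 | X = b) = probs[b]."""
    n = len(probs)
    h_y = binary_entropy(sum(probs) / n)  # H(Y), from the marginal P(Y = 1)
    h_y_given_x = sum(binary_entropy(p) for p in probs) / n  # H(Y | X)
    return h_y - h_y_given_x


def filter_database(db: Database, keep_fraction: float) -> Database:
    """Keep the top keep_fraction of k-mers ranked by mutual information (hypothetical API)."""
    ranked = sorted(db, key=lambda kmer: mutual_information(db[kmer]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return {kmer: db[kmer] for kmer in ranked[:n_keep]}


def place(query: str, db: Database, k: int, n_branches: int, default: float) -> int:
    """Return the branch maximising the summed log10-probabilities of the query's k-mers.

    Probabilities below `default`, and k-mers missing from the database, count as `default`.
    """
    scores = [0.0] * n_branches
    for i in range(len(query) - k + 1):
        probs = db.get(query[i:i + k])
        for b in range(n_branches):
            p = probs[b] if probs is not None else default
            scores[b] += math.log10(max(p, default))
    return max(range(n_branches), key=lambda b: scores[b])


db = {
    "ACG": [0.9, 0.9, 0.9],    # equally likely on every branch: uninformative
    "CGT": [0.9, 0.05, 0.05],  # points to branch 0
    "GTA": [0.05, 0.9, 0.05],  # points to branch 1
}
filtered = filter_database(db, keep_fraction=2 / 3)
print(sorted(filtered))                     # ['CGT', 'GTA']
print(place("ACGT", db, 3, 3, 1e-3))        # 0
print(place("ACGT", filtered, 3, 3, 1e-3))  # 0
```

A k-mer that is equally probable on every branch scores zero under this model and is discarded first, which is why the example query lands on the same branch before and after filtering.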

Computing Phylo-k-Mers.

Romashchenko, Nikolai; Linard, Benjamin; Rivals, Eric; Pardi, Fabio.

IEEE/ACM Trans Comput Biol Bioinform; 20(5): 2889-2897, 2023.

Article in English | MEDLINE | ID: mdl-37204943

ABSTRACT

Finding the correct position of new sequences within an established phylogenetic tree is an increasingly relevant problem in evolutionary bioinformatics and metagenomics. Recently, alignment-free approaches to this task have been proposed. One such approach is based on the concept of phylogenetically informative k-mers, or phylo-k-mers for short. In practice, phylo-k-mers are inferred from a set of related reference sequences and are equipped with scores expressing the probability of their appearance at different locations within the input reference phylogeny. Computing phylo-k-mers, however, is a computational bottleneck that limits their applicability to real-world problems such as the phylogenetic analysis of metabarcoding reads and the detection of novel recombinant viruses. Here we consider the problem of phylo-k-mer computation: how can we efficiently find all k-mers whose probability lies above a given threshold for a given tree node? We describe and analyze algorithms for this problem that rely on branch-and-bound and divide-and-conquer techniques, and we exploit the redundancy between adjacent windows of the alignment to save computation. Besides computational complexity analyses, we provide an empirical evaluation of the relative performance of implementations of these algorithms on simulated and real-world data. The divide-and-conquer algorithms are found to outperform the branch-and-bound approach, especially when many phylo-k-mers are found.
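The two algorithmic strategies can be compared on a single alignment window with a minimal Python sketch. It assumes that, for one tree node, the window is given as k per-position probability distributions over A, C, G and T, and that the probability of a k-mer is the product of its per-position probabilities. The function names are hypothetical, and the sketch omits the reuse of computations across adjacent windows. Branch-and-bound extends prefixes and prunes any prefix whose best possible completion falls below the threshold; divide-and-conquer splits the window in half, solves each half with a correspondingly relaxed threshold, and merges the two lists. A brute-force enumeration is included only as a reference.

```python
import math
from itertools import product
from typing import Dict, List

ALPHABET = "ACGT"
Window = List[Dict[str, float]]  # k columns, each a distribution over ALPHABET


def max_prob(window: Window) -> float:
    """Probability of the single most likely k-mer in the window."""
    return math.prod(max(col.values()) for col in window)


def branch_and_bound(window: Window, threshold: float) -> Dict[str, float]:
    k = len(window)
    # best_suffix[i] bounds the probability contributed by positions i..k-1
    best_suffix = [1.0] * (k + 1)
    for i in range(k - 1, -1, -1):
        best_suffix[i] = best_suffix[i + 1] * max(window[i].values())
    hits: Dict[str, float] = {}

    def extend(i: int, prefix: str, prob: float) -> None:
        if i == k:
            hits[prefix] = prob
            return
        for c in ALPHABET:
            p = prob * window[i][c]
            # Bound: prune if even the best completion stays below the threshold
            if p * best_suffix[i + 1] >= threshold:
                extend(i + 1, prefix + c, p)

    extend(0, "", 1.0)
    return hits


def divide_and_conquer(window: Window, threshold: float) -> Dict[str, float]:
    if len(window) == 1:
        return {c: p for c, p in window[0].items() if p >= threshold}
    h = len(window) // 2
    left, right = window[:h], window[h:]
    # uv can pass only if P(u) >= t / max P(right) and P(v) >= t / max P(left)
    left_hits = divide_and_conquer(left, threshold / max_prob(right))
    right_hits = divide_and_conquer(right, threshold / max_prob(left))
    right_sorted = sorted(right_hits.items(), key=lambda kv: kv[1], reverse=True)
    hits: Dict[str, float] = {}
    for u, pu in left_hits.items():
        for v, pv in right_sorted:
            if pu * pv < threshold:
                break  # the remaining right halves are even less probable
            hits[u + v] = pu * pv
    return hits


def brute_force(window: Window, threshold: float) -> Dict[str, float]:
    """Reference: score all 4^k k-mers."""
    hits = {}
    for kmer in product(ALPHABET, repeat=len(window)):
        p = math.prod(col[c] for col, c in zip(window, kmer))
        if p >= threshold:
            hits["".join(kmer)] = p
    return hits


window = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.05, "C": 0.05, "G": 0.8, "T": 0.1},
    {"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1},
]
bnb = branch_and_bound(window, 0.01)
dnc = divide_and_conquer(window, 0.01)
print(len(bnb), sorted(bnb) == sorted(dnc) == sorted(brute_force(window, 0.01)))  # 16 True
```

In this sketch, divide-and-conquer pays off when many k-mers pass the threshold, because each half-window list is built once and then merged, instead of re-walking shared prefixes.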

Rapid screening and detection of inter-type viral recombinants using phylo-k-mers.

Scholz, Guillaume E; Linard, Benjamin; Romashchenko, Nikolai; Rivals, Eric; Pardi, Fabio.

Bioinformatics; 36(22-23): 5351-5360, 2021 Apr 01.

Article in English | MEDLINE | ID: mdl-33331849

ABSTRACT

MOTIVATION: Novel recombinant viruses may have important medical and evolutionary significance, as they sometimes display new traits not present in the parental strains. This is particularly concerning when the new viruses combine fragments from phylogenetically distinct viral types. Here, we consider the task of screening large collections of sequences for such novel recombinants. A number of methods already exist for this task. However, they rely on complex models and heavy computations that are not always practical for quickly scanning a large number of sequences.

RESULTS: We have developed SHERPAS, a new program to detect novel recombinants and provide a first estimate of their parental composition. Our approach is based on precomputing a large database of 'phylogenetically-informed k-mers', an idea recently introduced in the context of phylogenetic placement in metagenomics. Our experiments show that SHERPAS is hundreds to thousands of times faster than existing software and enables the analysis of thousands of whole genomes, or long sequencing reads, within minutes or seconds, with limited loss of accuracy.

AVAILABILITY AND IMPLEMENTATION: The source code is freely available for download at https://github.com/phylo42/sherpas.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
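A sliding-window screen over phylo-k-mer scores can be sketched in Python as follows. This is an assumption-laden illustration, not SHERPAS's actual algorithm: it assumes a database mapping each k-mer to per-group (for example, per-viral-type) log-scores, gives a hypothetical default score to k-mers absent from the database, labels each window of consecutive k-mers with its highest-scoring group, and reports where the label changes as candidate recombination breakpoints. All names and parameters (window_labels, breakpoints, window, default) are hypothetical.

```python
from typing import Dict, List


def window_labels(query: str, db: Dict[str, Dict[str, float]], groups: List[str],
                  k: int, window: int, default: float) -> List[str]:
    """Label each run of `window` consecutive query k-mers with its best-scoring group."""
    # Per-position log-scores; k-mers (or groups) absent from the database get `default`
    positions = []
    for i in range(len(query) - k + 1):
        hits = db.get(query[i:i + k], {})
        positions.append({g: hits.get(g, default) for g in groups})
    if len(positions) < window:
        return []
    # Running sums: each window shift adds one position and drops one
    totals = {g: sum(pos[g] for pos in positions[:window]) for g in groups}
    labels = [max(groups, key=lambda g: totals[g])]
    for start in range(1, len(positions) - window + 1):
        entering, leaving = positions[start + window - 1], positions[start - 1]
        for g in groups:
            totals[g] += entering[g] - leaving[g]
        labels.append(max(groups, key=lambda g: totals[g]))
    return labels


def breakpoints(labels: List[str]) -> List[int]:
    """Window indices where the assigned group changes (candidate recombination points)."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


db = {
    "AC": {"type1": -0.1}, "CA": {"type1": -0.1},
    "GT": {"type2": -0.1}, "TG": {"type2": -0.1},
}
labels = window_labels("ACACAGTGTG", db, ["type1", "type2"], k=2, window=4, default=-2.0)
print(labels)               # ['type1', 'type1', 'type1', 'type2', 'type2', 'type2']
print(breakpoints(labels))  # [3]
```

Because the database is precomputed, each query only needs dictionary lookups and running sums, which is the kind of cost profile that makes a quick scan of many sequences feasible.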

PEWO: a collection of workflows to benchmark phylogenetic placement.

Linard, Benjamin; Romashchenko, Nikolai; Pardi, Fabio; Rivals, Eric.

Bioinformatics; 36(21): 5264-5266, 2021 Jan 29.

Article in English | MEDLINE | ID: mdl-32697844

ABSTRACT

MOTIVATION: Phylogenetic placement (PP) is a process of taxonomic identification for which several tools are now available. However, it remains difficult to assess which tool is best suited to particular genomic data or a particular reference taxonomy. We developed Placement Evaluation WOrkflows (PEWO), the first benchmarking tool dedicated to PP assessment. Its automated workflows can evaluate PP at many levels, from optimizing the parameters of a particular tool to selecting the most appropriate genetic marker when PP-based species identification is the goal. Our goal is for PEWO to become a community effort and a standard support for future developments and applications of PP.

AVAILABILITY AND IMPLEMENTATION: https://github.com/phylo42/PEWO.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Benchmarking; Software; Genome; Phylogeny; Workflow
