Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 14 de 14
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
BMC Bioinformatics ; 24(1): 59, 2023 Feb 22.
Artigo em Inglês | MEDLINE | ID: mdl-36814208

RESUMO

BACKGROUND: Protein-protein interaction (PPI) data is an important type of data used in functional genomics. However, high-throughput experiments are often insufficient to complete the PPI interactome of different organisms. Computational techniques are thus used to infer missing data, with link prediction being one such approach that uses the structure of the network of PPIs known so far to identify non-edges whose addition to the network would make it more sound, according to some underlying assumptions. Recently, a new idea called the L3 principle introduced biological motivation into PPI link predictions, yielding predictors that are superior to general-purpose link predictors for complex networks. Interestingly, the L3 principle can be interpreted in another way, so that other signatures of PPI networks can also be characterized for PPI predictions. This alternative interpretation uncovers candidate PPIs that the current L3-based link predictors may not be able to fully capture, underutilizing the L3 principle. RESULTS: In this article, we propose a formulation of link predictors that we call NormalizedL3 (L3N) which addresses certain missing elements within L3 predictors in the perspective of network modeling. Our computational validations show that the L3N predictors are able to find missing PPIs more accurately (in terms of true positives among the predicted PPIs) than the previously proposed methods on several datasets from the literature, including BioGRID, STRING, MINT, and HuRI, at the cost of using more computation time in some of the cases. In addition, we found that L3-based link predictors (including L3N) ranked a different pool of PPIs higher than the general-purpose link predictors did. This suggests that different types of PPIs can be predicted based on different topological assumptions, and that even better PPI link predictors may be obtained in the future by improved network modeling.


Assuntos
Mapeamento de Interação de Proteínas , Mapas de Interação de Proteínas , Mapeamento de Interação de Proteínas/métodos , Genômica
2.
J Comput Biol ; 26(9): 893-907, 2019 09.
Artigo em Inglês | MEDLINE | ID: mdl-30990336

RESUMO

The previous fastest algorithm for computing the rooted triplet distance between two input galled trees (i.e., phylogenetic networks whose cycles are vertex-disjoint) runs in [Formula: see text] time, where n is the cardinality of the leaf label set. In this article, we present an [Formula: see text]-time solution. Our strategy is to transform the input so that the answer can be obtained by applying an existing [Formula: see text]-time algorithm for the simpler case of two phylogenetic trees a constant number of times. The new algorithm has been implemented, and applying it to pairs of randomly generated galled trees with up to [Formula: see text] leaves confirms that it is fast in practice.


Assuntos
Algoritmos , Biologia Computacional/métodos , Filogenia , Biologia Computacional/normas
3.
PLoS One ; 13(4): e0195545, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-29698482

RESUMO

The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.


Assuntos
Mapeamento de Interação de Proteínas/métodos , Proteínas/química , Proteínas/metabolismo , Algoritmos , Biologia Computacional , Simulação por Computador
4.
J Comput Biol ; 25(7): 740-754, 2018 07.
Artigo em Inglês | MEDLINE | ID: mdl-29451395

RESUMO

The [Formula: see text] Consistency problem takes as input two sets [Formula: see text] and [Formula: see text] of resolved triplets and two sets [Formula: see text] and [Formula: see text] of fan triplets, and asks for a distinctly leaf-labeled tree that contains all elements in [Formula: see text] and no elements in [Formula: see text] as embedded subtrees, if such a tree exists. This article presents a detailed characterization of how the computational complexity of the problem changes under various restrictions. Our main result is an efficient algorithm for dense inputs satisfying [Formula: see text] whose running time is linear in the size of the input and therefore optimal.


Assuntos
Biologia Computacional/estatística & dados numéricos , Filogenia , Algoritmos , Humanos , Modelos Genéticos
5.
Artigo em Inglês | MEDLINE | ID: mdl-27662679

RESUMO

This article presents two new deterministic algorithms for constructing consensus trees. Given an input of  phylogenetic trees with identical leaf label sets and  leaves each, the first algorithm constructs the majority rule (+) consensus tree in time, which is optimal since the input size is , and the second one constructs the frequency difference consensus tree in time.


Assuntos
Algoritmos , Biologia Computacional/métodos , Modelos Genéticos , Filogenia , Análise por Conglomerados
6.
J Comput Biol ; 24(2): 106-126, 2017 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-27983874

RESUMO

The rooted triplet distance is a measure of the dissimilarity of two phylogenetic trees with identical leaf label sets. An algorithm by Brodal et al. that computes it in [Formula: see text] time and [Formula: see text] space, where n is the number of leaf labels, has recently been implemented in the software package tqDist. In this article, we show that replacing the hierarchical decomposition tree used in Brodal et al.'s algorithm by a centroid paths-based data structure yields an [Formula: see text]-time and [Formula: see text]-space algorithm that, although slower in theory, is faster in practice as well as less memory consuming. Simulations for values of n up to 4,000,000 support our claims experimentally.


Assuntos
Algoritmos , Modelos Genéticos , Filogenia , Simulação por Computador , Software
7.
J Comput Biol ; 19(9): 1073-88, 2012 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-22963134

RESUMO

A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.


Assuntos
Algoritmos , Filogenia , Animais , Biologia Computacional , Evolução Molecular , Humanos , Modelos Genéticos
8.
BMC Bioinformatics ; 12: 8, 2011 Jan 07.
Artigo em Inglês | MEDLINE | ID: mdl-21211059

RESUMO

BACKGROUND: To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. RESULTS: We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. CONCLUSIONS: The assignment strategy of sequencing reads introduced in this paper is a versatile and a quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned depending on the value of the q parameter. Validation of our results in an artificial dataset confirm that a combination of values of q produces the most accurate results.


Assuntos
Bactérias/classificação , Biologia Computacional/métodos , Metagenômica , Análise de Sequência de DNA/métodos , Algoritmos , Bactérias/genética , DNA Bacteriano/genética
9.
Artigo em Inglês | MEDLINE | ID: mdl-20733243

RESUMO

We investigate the computational complexity of inferring a smallest possible multilabeled phylogenetic tree (MUL tree) which is consistent with each of the rooted triplets in a given set. This problem has not been studied previously in the literature. We prove that even the very restricted case of determining if there exists a MUL tree consistent with the input and having just one leaf duplication is an NP-hard problem. Furthermore, we show that the general minimization problem is difficult to approximate, although a simple polynomial-time approximation algorithm achieves an approximation ratio close to our derived inapproximability bound. Finally, we provide an exact algorithm for the problem running in exponential time and space. As a by-product, we also obtain new, strong inapproximability results for two partitioning problems on directed graphs called ACYCLIC PARTITION and ACYCLIC TREE-PARTITION.


Assuntos
Algoritmos , Biologia Computacional/métodos , Filogenia , Modelos Genéticos , Modelos Estatísticos
10.
Algorithms Mol Biol ; 5: 7, 2010 Jan 04.
Artigo em Inglês | MEDLINE | ID: mdl-20047663

RESUMO

BACKGROUND: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. RESULTS: We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D in case the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm(k+1)). CONCLUSIONS: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.

11.
Pac Symp Biocomput ; : 3-9, 2010.
Artigo em Inglês | MEDLINE | ID: mdl-19908352

RESUMO

Ambiguities in the taxonomy dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor in a reference taxonomy of all those sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment of short reads can be made by mapping each read to the node in the reference taxonomy that provides the best precision and recall. We show that given a suffix array for the sequences in the reference taxonomy, a short read can be mapped to the node of the reference taxonomy with the best combined value of precision and recall in time linear in the size of the taxonomy subtree rooted at the lowest common ancestor of the matching sequences. An accurate taxonomic assignment of short reads can thus be made with about the same efficiency as when mapping each read to the lowest common ancestor of all matching sequences in a reference taxonomy. We demonstrate the effectiveness of our approach on several metagenomic datasets of marine and gut microbiota.


Assuntos
Bactérias/classificação , Bactérias/genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Animais , Biologia Computacional , Sistema Digestório/microbiologia , Sequenciamento de Nucleotídeos em Larga Escala/estatística & dados numéricos , Humanos , Metagenoma/genética , Metagenômica/métodos , Metagenômica/estatística & dados numéricos , Filogenia , RNA Bacteriano/genética , RNA Ribossômico 16S/genética , Alinhamento de Sequência/métodos , Alinhamento de Sequência/estatística & dados numéricos , Análise de Sequência de RNA/métodos , Análise de Sequência de RNA/estatística & dados numéricos
12.
J Bioinform Comput Biol ; 4(4): 807-32, 2006 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-17007069

RESUMO

Given a distance matrix M that specifies the pairwise evolutionary distances between n species, the phylogenetic tree reconstruction problem asks for an edge-weighted phylogenetic tree that satisfies M, if one exists. We study some extensions of this problem to rooted phylogenetic networks. Our main result is an O(n(2) log n)-time algorithm for determining whether there is an ultrametric galled network that satisfies M, and if so, constructing one. In fact, if such an ultrametric galled network exists, our algorithm is guaranteed to construct one containing the minimum possible number of nodes with more than one parent (hybrid nodes). We also prove that finding a largest possible submatrix M' of M such that there exists an ultrametric galled network that satisfies M' is NP-hard. Furthermore, we show that given an incomplete distance matrix (i.e. where some matrix entries are missing), it is also NP-hard to determine whether there exists an ultrametric galled network which satisfies it.


Assuntos
Algoritmos , Evolução Biológica , Mapeamento Cromossômico/métodos , Evolução Molecular , Modelos Genéticos , Filogenia , Locos de Características Quantitativas/genética , Análise de Sequência de DNA/métodos , Simulação por Computador
13.
J Comput Biol ; 13(3): 702-18, 2006 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-16706720

RESUMO

RNA molecules whose secondary structures contain similar substructures often have similar functions. Therefore, an important task in the study of RNA is to develop methods for discovering substructures in RNA secondary structures that occur frequently (also referred to as motifs). In this paper, we consider the problem of computing an optimal local alignment of two given labeled ordered forests F1 and F2. This problem asks for a substructure of F1 and a substructure of F2 that exhibit a high similarity. Since an RNA molecule's secondary structure can be represented as a labeled ordered forest, the problem we study has a direct application to finding potential motifs. We generalize the previously studied concept of a closed subforest to a gapped subforest and present the first algorithm for computing the optimal local gapped subforest alignment of F1 and F2. We also show that our technique can improve the time and space complexity of the previously most efficient algorithm for optimal local closed subforest alignment. Furthermore, we prove that a special case of our local gapped subforest alignment problem is equivalent to a problem known in the literature as the local sequence-structure alignment problem (lssa) and modify our main algorithm to obtain a much faster algorithm for lssa than the one previously proposed. An implementation of our algorithm is available at www.comp.nus.edu.sg/~bioinfo/LGSFAligner/. Its running time is significantly faster than the original lssa program.


Assuntos
Conformação de Ácido Nucleico , Alinhamento de Sequência , Análise de Sequência de RNA , Software
14.
J Bioinform Comput Biol ; 4(1): 59-74, 2006 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-16568542

RESUMO

To construct a phylogenetic tree or phylogenetic network for describing the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is by merging a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set C and none of the rooted triplets in another given set F. Although NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.


Assuntos
Biologia Computacional , Filogenia , Algoritmos , DNA/genética , Evolução Molecular , Modelos Genéticos
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...