Search | VHL Regional Portal

A clique-based method using dynamic programming for computing edit distance between unordered trees.

Mori, Tomoya; Tamura, Takeyuki; Fukagawa, Daiji; Takasu, Atsuhiro; Tomita, Etsuji; Akutsu, Tatsuya.

J Comput Biol ; 19(10): 1089-104, 2012 Oct.

Article in English | MEDLINE | ID: mdl-23057820

ABSTRACT

Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance problem for unordered trees is NP-hard. Therefore, it is required to develop efficient algorithms for the problem. Recently, a practical method called clique-based algorithm has been proposed, but it is not fast for large trees. This article presents an improved clique-based method for the tree edit distance problem for unordered trees. The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques to the previous clique-based method. To evaluate the efficiency of the improved method, we applied the method to comparison of real tree structured data such as glycan structures. For large tree-structures, the improved method is much faster than the previous method. In particular, for hard instances, the improved method achieved more than 100 times speed-up.

Subject(s)

Carbohydrate Conformation , Polysaccharides/chemistry , Polysaccharides/genetics , Software , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics

A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures.

Fukagawa, Daiji; Tamura, Takeyuki; Takasu, Atsuhiro; Tomita, Etsuji; Akutsu, Tatsuya.

BMC Bioinformatics ; 12 Suppl 1: S13, 2011 Feb 15.

Article in English | MEDLINE | ID: mdl-21342542

ABSTRACT

BACKGROUND: Measuring similarities between tree structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparison of tree structured data. However, it is known that computation of the edit distance for rooted unordered trees is NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees. RESULTS: In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar structure search for glycan structures. The result suggests that our proposed method can efficiently compute the edit distance for moderate size unordered trees. It also suggests that the proposed method has the accuracy comparative to those by the edit distance for ordered trees and by an existing method for glycan search. CONCLUSIONS: The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request.

Subject(s)

Algorithms , Computational Biology/methods , Polysaccharides/analysis , Software

Clique-based data mining for related genes in a biomedical database.

Matsunaga, Tsutomu; Yonemori, Chikara; Tomita, Etsuji; Muramatsu, Masaaki.

BMC Bioinformatics ; 10: 205, 2009 Jul 01.

Article in English | MEDLINE | ID: mdl-19566964

ABSTRACT

BACKGROUND: Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph. RESULTS: We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes. CONCLUSION: We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.

Subject(s)

Computational Biology/methods , Databases, Genetic , Information Storage and Retrieval/methods

Protein threading with profiles and distance constraints using clique based algorithms.

Dukka, Bahadur K C; Tomita, Etsuji; Suzuki, Jun'ichi; Horimoto, Katsuhisa; Akutsu, Tatsuya.

J Bioinform Comput Biol ; 4(1): 19-42, 2006 Feb.

Article in English | MEDLINE | ID: mdl-16568540

ABSTRACT

With the advent of experimental technologies like chemical cross-linking, it has become possible to obtain distances between specific residues of a newly sequenced protein. These types of experiments usually are less time consuming than X-ray crystallography or NMR. Consequently, it is highly desired to develop a method that incorporates this distance information to improve the performance of protein threading methods. However, protein threading with profiles in which constraints on distances between residues are given is known to be NP-hard. By using the notion of a maximum edge-weight clique finding algorithm, we introduce a more efficient method called FTHREAD for profile threading with distance constraints that is 18 times faster than its predecessor CLIQUETHREAD. Moreover, we also present a novel practical algorithm NTHREAD for profile threading with Non-strict constraints. The overall performance of FTHREAD on a data set shows that although our algorithm uses a simple threading function, our algorithm performs equally well as some of the existing methods. Particularly, when there are some unsatisfied constraints, NTHREAD (Non-strict constraints threading algorithm) performs better than threading with FTHREAD (Strict constraints threading algorithm). We have also analyzed the effects of using a number of distance constraints. This algorithm helps the enhancement of alignment quality between the query sequence and template structure, once the corresponding template structure is determined for the target sequence.

Subject(s)

Algorithms , Proteins/chemistry , Amino Acid Sequence , Computational Biology , Cross-Linking Reagents , Molecular Structure , Proteins/genetics

Multiple methods for protein side chain packing using maximum weight cliques.

Brown, J B; Dukka Bahadur, K C; Tomita, Etsuji; Akutsu, Tatsuya.

Genome Inform ; 17(1): 3-12, 2006.

Article in English | MEDLINE | ID: mdl-17503351

ABSTRACT

In this paper, we present several methods for computing a solution to the protein side chain packing problem, with all methods having a common solution approach of breaking the polymer into subpolymers and using maximum edge weight cliques to prune the search space for the optimal side chain packing. We characterize the graph sizes generated for each method and compare their prediction accuracies. These methods are demonstrated for computing proteins up to approximately 8000 residues. In addition, we update a result published previously.

Subject(s)

Computational Biology/methods , Oligopeptides/chemistry , Proteins/chemistry , Algorithms , Amino Acids/chemistry , Models, Chemical , Models, Molecular , Predictive Value of Tests , Protein Conformation

Protein side-chain packing problem: a maximum edge-weight clique algorithmic approach.

Dukka Bahadur, K C; Tomita, Etsuji; Suzuki, Jun'ichi; Akutsu, Tatsuya.

J Bioinform Comput Biol ; 3(1): 103-26, 2005 Feb.

Article in English | MEDLINE | ID: mdl-15751115

ABSTRACT

"Protein Side-chain Packing" has an ever-increasing application in the field of bio-informatics, dating from the early methods of homology modeling to protein design and to the protein docking. However, this problem is computationally known to be NP-hard. In this regard, we have developed a novel approach to solve this problem using the notion of a maximum edge-weight clique. Our approach is based on efficient reduction of protein side-chain packing problem to a graph and then solving the reduced graph to find the maximum clique by applying an efficient clique finding algorithm developed by our co-authors. Since our approach is based on deterministic algorithms in contrast to the various existing algorithms based on heuristic approaches, our algorithm guarantees of finding an optimal solution. We have tested this approach to predict the side-chain conformations of a set of proteins and have compared the results with other existing methods. We have found that our results are favorably comparable or better than the results produced by the existing methods. As our test set contains a protein of 494 residues, we have obtained considerable improvement in terms of size of the proteins and in terms of the efficiency and the accuracy of prediction.

Subject(s)

Algorithms , Crystallography/methods , Models, Chemical , Models, Molecular , Proteins/analysis , Proteins/chemistry , Sequence Analysis, Protein/methods , Computer Simulation , Likelihood Functions , Protein Conformation , Protein Structure, Secondary , Sequence Alignment/methods , Sequence Homology, Amino Acid

Point matching under non-uniform distortions and protein side chain packing based on an efficient maximum clique algorithm.

Dukka, Bahadur K C; Akutsu, Tatsuya; Tomita, Etsuji; Seki, Tomokazu; Fujiyama, Asao.

Genome Inform ; 13: 143-52, 2002.

Article in English | MEDLINE | ID: mdl-14571383

ABSTRACT

We developed maximum clique-based algorithms for spot matching for two-dimensional gel electrophoresis images, protein structure alignment and protein side-chain packing, where these problems are known to be NP-hard. Algorithms based on direct reductions to the maximum clique can find optimal solutions for instances of size (the number of points or residues) up to 50-150 using a standard PC. We also developed pre-processing techniques to reduce the sizes of graphs. Combined with some heuristics, many realistic instances can be solved approximately.

Subject(s)

Algorithms , Data Interpretation, Statistical , Protein Structure, Tertiary , Sequence Analysis, Protein/methods , Animals , Computational Biology/methods , Electrophoresis, Gel, Two-Dimensional/methods , Humans , Sequence Alignment/methods

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL