Search | Portal Regional da BVS

Efficient RNA structure comparison algorithms.

Arslan, Abdullah N; Anandan, Jithendar; Fry, Eric; Monschke, Keith; Ganneboina, Nitin; Bowerman, Jason.

J Bioinform Comput Biol ; 15(6): 1740009, 2017 Dec.

Article in English | MEDLINE | ID: mdl-29113560

ABSTRACT

The recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features that allow an RNA structure database to be stored in a suffix array. A fast substructure search algorithm based on binary search on this suffix array has been proposed. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of multiple given RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduce a new problem for comparing multiple RNA structures. This problem has a stricter similarity definition and objective, and we propose an algorithm that solves it efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in the compared RNAs. With the resulting tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan). This website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input by the search tool, and another for automatically drawing the entire RNA structure from a given structure sequence.
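As a rough illustration of the idea of searching a suffix array by binary search, the following Python sketch indexes a plain string and reports every position where a query occurs. It is not the authors' algorithm: the encoding of RNA structures used in the paper is not reproduced here, the example string and the function names `build_suffix_array` and `find_occurrences` are assumptions made for illustration, and the construction is the naive one rather than an efficient one.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction (sorts full suffixes); adequate for a small example only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of pattern in text, using binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


# Hypothetical input: a short nucleotide string, not the paper's structure format.
text = "GGCAUGCAUCC"
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "GCAU")
```

Because the suffixes are sorted, all suffixes that begin with the query form one contiguous block of the array, which is why two binary searches are enough to delimit every occurrence.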

Subjects

Algorithms; Computational Biology/methods; RNA/chemistry; Databases, Nucleic Acid; Nucleic Acid Conformation

PMBC: pattern mining from biological sequences with wildcard constraints.

Wu, Xindong; Zhu, Xingquan; He, Yu; Arslan, Abdullah N.

Comput Biol Med ; 43(5): 481-92, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23566394

ABSTRACT

Patterns/subsequences frequently appearing in sequences provide essential knowledge for domain experts, such as molecular biologists, to discover rules or patterns hidden behind the data. Due to the inherently complex nature of biological data, patterns rarely reproduce and repeat themselves exactly, but rather appear in a slightly different form in each of their appearances. A gap constraint (in this paper, also referred to as a wildcard: a character that can be substituted for any character predefined in an alphabet) provides flexibility for users to capture useful patterns even if their appearances vary in the sequences. In order to find patterns, existing tools require users to explicitly specify gap constraints beforehand. In reality, it is often nontrivial or time-consuming for users to provide proper gap constraint values. In addition, a change made to the gap values may give completely different results and require a separate, time-consuming re-mining procedure. Therefore, it is desirable to find patterns automatically and efficiently without involving user-specified gap requirements. In this paper, we study the problem of frequent pattern mining without user-specified gap constraints and propose PMBC (namely Pattern Mining from Biological sequences with wildcard Constraints) to solve the problem. Given a sequence and a support threshold value (i.e., a pattern frequency threshold), PMBC intends to discover all subsequences with support values equal to or greater than the given threshold value. The frequent subsequences then form patterns later on. Two heuristic methods (one-way vs. two-way scans) are proposed to discover frequent subsequences and estimate their frequency in the sequences. Experimental results on both synthetic and real-world DNA sequences demonstrate the performance of both methods for frequent pattern mining and pattern frequency estimation.
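To make the notions of wildcard and support threshold concrete, here is a minimal Python sketch that counts how often a pattern containing wildcards occurs in a sequence and compares that count with a threshold. It is only an exhaustive check of one given pattern, not the one-way or two-way scan heuristics of PMBC, which the abstract does not describe in detail. The wildcard symbol `?`, the function names, and the example sequence are all assumptions made for illustration.

```python
WILDCARD = "?"  # hypothetical wildcard symbol: matches any single character


def matches_at(sequence: str, pattern: str, start: int) -> bool:
    """Check whether pattern matches sequence at position start."""
    if start + len(pattern) > len(sequence):
        return False
    return all(
        p == WILDCARD or p == sequence[start + k]
        for k, p in enumerate(pattern)
    )


def support(sequence: str, pattern: str) -> int:
    """Count the start positions where pattern matches (overlaps are counted)."""
    last_start = len(sequence) - len(pattern)
    return sum(matches_at(sequence, pattern, i) for i in range(last_start + 1))


def is_frequent(sequence: str, pattern: str, threshold: int) -> bool:
    """A pattern is frequent when its support reaches the threshold."""
    return support(sequence, pattern) >= threshold


# Hypothetical DNA-like input.
sequence = "ACGTACCTAGGT"
exact_gap = support(sequence, "A?GT")   # one wildcard position
loose_gap = support(sequence, "A??T")   # two wildcard positions
```

In this example the looser pattern `A??T` matches at more positions than `A?GT`, which illustrates the abstract's point that changing the gap specification can change the mining result.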

Subjects

Data Mining/methods; Pattern Recognition, Automated/methods; Sequence Analysis/methods; Algorithms; Amino Acid Sequence; DNA/chemistry; Humans; Proteins/chemistry

A space-efficient algorithm for the constrained pairwise sequence alignment problem.

He, Dan; Arslan, Abdullah N.

Genome Inform ; 16(2): 237-46, 2005.

Article in English | MEDLINE | ID: mdl-16901106

ABSTRACT

The constrained pairwise sequence alignment (CPSA) problem aims to align two given sequences by aligning their similar subsequences in the same region under the guidance of a given pattern (constraint). Let the lengths of the sequences be m and n, where n

Subjects

Algorithms; Computational Biology/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data; Animals; Humans
