Search | BVS Regional Portal

DCJ-RNA - double cut and join for RNA secondary structures.

Badr, Ghada H; Al-Aqel, Haifa A.

BMC Bioinformatics ; 18(Suppl 12): 427, 2017 Oct 16.

Article in English | MEDLINE | ID: mdl-29072145

ABSTRACT

BACKGROUND: Genome rearrangements are essential processes for evolution and are responsible for existing varieties of genome architectures. Many studies have been conducted to obtain an algorithm that identifies the minimum number of inversions that are necessary to transform one genome into another; this allows for genome sequence representation in polynomial time. Studies have not been conducted on the topic of rearranging a genome when it is represented as a secondary structure. Unlike sequences, the secondary structure preserves the functionality of the genome. Sequences can be different, but they all share the same structure and, therefore, the same functionality. RESULTS: This paper proposes a double cut and join for RNA secondary structures (DCJ-RNA) algorithm. This algorithm allows for the description of evolutionary scenarios that are based on secondary structures rather than sequences. The main aim of this paper is to suggest an efficient algorithm that can help researchers compare two ribonucleic acid (RNA) secondary structures based on rearrangement operations. The results, which are based on real datasets, show that the algorithm is able to count the minimum number of rearrangement operations, as well as to report an optimum scenario that can increase the similarity between the two structures. CONCLUSION: The algorithm calculates the distance between structures and reports a scenario based on the minimum rearrangement operations required to make the given structure similar to the other. DCJ-RNA can also be used to measure the distance between the two structures. This can help identify the common functionalities between different species.

Subject(s)

Nucleic Acid Conformation, RNA/chemistry, Algorithms, Base Sequence, Chromosome Inversion, Nucleic Acid Databases, Genetic Models, Time Factors

TrieAMD: a scalable and efficient apriori motif discovery approach.

Al-Turaiki, Isra; Badr, Ghada; Mathkour, Hassan.

Int J Data Min Bioinform ; 13(1): 13-30, 2015.

Article in English | MEDLINE | ID: mdl-26529905

ABSTRACT

Motif discovery is the problem of finding recurring patterns in biological sequences. It is one of the hardest and longest-standing problems in bioinformatics. Apriori is a well-known data-mining algorithm for the discovery of frequent patterns in large datasets. In this paper, we apply the Apriori algorithm and use the Trie data structure to discover motifs. We propose several modifications to adapt the classic Apriori to our problem. Experiments are conducted on Tompa's benchmark to investigate the performance of our proposed algorithm, the Trie-based Apriori Motif Discovery (TrieAMD). Results show that our algorithm outperforms all of the tested tools on real datasets for the average sensitivity measure, which means that our approach is able to discover more motifs. In terms of specificity, the performance of our algorithm is comparable to the other tools. The results also confirm both linear time and linear space scalability of the algorithm.

Subject(s)

Algorithms, Data Mining/methods, Protein Databases, Proteins/genetics, Protein Sequence Analysis/methods, Software, Amino Acid Motifs, Proteins/chemistry
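
The TrieAMD abstract above combines Apriori-style, level-wise candidate generation with a trie. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes exact substring matching (real motif discovery usually tolerates mismatches), counts support as the number of sequences containing a pattern, and uses hypothetical names (`trie_insert`, `trie_contains`, `frequent_substrings`) and an arbitrary `"$end"` marker key.

```python
def trie_insert(trie, pattern, support):
    """Store a frequent pattern in a nested-dict trie, with its support at the end node."""
    node = trie
    for ch in pattern:
        node = node.setdefault(ch, {})
    node["$end"] = support  # multi-character key, so it cannot collide with a symbol


def trie_contains(trie, pattern):
    """Return True if the pattern was inserted as a complete frequent pattern."""
    node = trie
    for ch in pattern:
        node = node.get(ch)
        if node is None:
            return False
    return "$end" in node


def frequent_substrings(sequences, min_support, max_len):
    """Level-wise search for substrings found in at least min_support sequences."""
    trie, result = {}, {}
    alphabet = sorted({ch for seq in sequences for ch in seq})
    candidates = list(alphabet)
    length = 1
    while candidates and length <= max_len:
        frequent = []
        for cand in candidates:
            support = sum(cand in seq for seq in sequences)
            if support >= min_support:
                trie_insert(trie, cand, support)
                result[cand] = support
                frequent.append(cand)
        # Apriori pruning: extend a frequent pattern by one symbol only if the
        # suffix of the extension (same length as the pattern) is also frequent.
        candidates = [p + ch for p in frequent for ch in alphabet
                      if trie_contains(trie, (p + ch)[1:])]
        length += 1
    return result


patterns = frequent_substrings(["ACGTAC", "TACGA", "GGACG"], min_support=3, max_len=3)
print(patterns)
```

The trie lookup replaces a scan over the list of frequent patterns during pruning; its cost depends on the pattern length rather than on how many patterns have been stored.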

mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

Alshamlan, Hala; Badr, Ghada; Alohali, Yousef.

Biomed Res Int ; 2015: 604910, 2015.

Article in English | MEDLINE | ID: mdl-25961028

ABSTRACT

An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from a microarray profile. The new approach uses a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested on both types of datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

Subject(s)

Gene Expression Profiling, Neoplastic Gene Expression Regulation/genetics, Neoplasms/classification, Support Vector Machine, Algorithms, Humans, Microarray Analysis/methods, Neoplasms/genetics
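
The mRMR filter step named in the abstract above can be illustrated with a short greedy selection loop. This is a sketch under stated assumptions, not the paper's implementation: features are assumed to be already discretized, the score is the common "relevance minus mean redundancy" form of mRMR, and the ABC search and SVM evaluation are not shown. The function names and the toy gene names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_select(features, labels, k):
    """Greedily pick k features maximizing relevance minus mean redundancy.

    features: dict mapping a feature name to its list of discrete values.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], list(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # first maximal element wins ties
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: g2 duplicates g1, while g3 carries independent information.
genes = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
classes = [0, 1, 2, 3]
chosen = mrmr_select(genes, classes, k=2)
print(chosen)
```

In the toy data all three genes are equally relevant on their own, but the redundancy term steers the second pick away from the duplicate `g2` and toward `g3`.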

Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A.

Comput Biol Chem ; 56: 49-60, 2015 Jun.

Article in English | MEDLINE | ID: mdl-25880524

ABSTRACT

Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, as it achieved the highest classification accuracy along with the lowest average number of selected genes. This indicates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

Subject(s)

Algorithms, Artificial Intelligence, Neoplasms/genetics, Humans, Neoplasms/classification, Oligonucleotide Array Sequence Analysis, Transcriptome

IncMD: incremental trie-based structural motif discovery algorithm.

Badr, Ghada; Al-Turaiki, Isra; Turcotte, Marcel; Mathkour, Hassan.

J Bioinform Comput Biol ; 12(5): 1450027, 2014 Oct.

Article in English | MEDLINE | ID: mdl-25362841

ABSTRACT

The discovery of common RNA secondary structure motifs is an important problem in bioinformatics. The presence of such motifs is usually associated with key biological functions. However, the identification of structural motifs is far from easy. Unlike motifs in sequences, which have conserved bases, structural motifs have common structure arrangements even if the underlying sequences are different. Over the past few years, hundreds of algorithms have been published for the discovery of sequential motifs, while less work has been done for the structural motifs case. Current structural motif discovery algorithms are limited in terms of accuracy and scalability. In this paper, we present an incremental and scalable algorithm for discovering RNA secondary structure motifs, namely IncMD. We consider the structural motif discovery as a frequent pattern mining problem and tackle it using a modified Apriori algorithm. IncMD uses trie-based linked lists of prefixes (LLP) as its data structure to accelerate the search and retrieval of patterns, support counting, and candidate generation. We modify the candidate generation step in order to adapt it to the RNA secondary structure representation. IncMD constructs the frequent patterns incrementally from RNA secondary structure basic elements, using nesting and joining operations. The notion of a motif group is introduced in order to simulate an alignment of motifs that only differ in the number of unpaired bases. In addition, we use a cluster beam approach to select motifs that will survive to the next iterations of the search. Results indicate that IncMD can perform better than some of the available structural motif discovery algorithms in terms of sensitivity (Sn), positive predictive value (PPV), and specificity (Sp). The empirical results also show that the algorithm is scalable and runs faster than all of the compared algorithms.

Subject(s)

Algorithms, Nucleic Acid Conformation, RNA/chemistry, Base Sequence, Computational Biology, Computer Simulation, Data Mining, Nucleic Acid Databases, Molecular Models

Classification and assessment tools for structural motif discovery algorithms.

Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan.

BMC Bioinformatics ; 14 Suppl 9: S4, 2013.

Article in English | MEDLINE | ID: mdl-23902564

ABSTRACT

BACKGROUND: Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. METHODS: In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. RESULTS: Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. CONCLUSIONS: We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.

Subject(s)

Algorithms, Computational Biology/methods, Nucleotide Motifs, RNA Sequence Analysis/methods, Conserved Sequence, Statistical Models

Listing all parsimonious reversal sequences: new algorithms and perspectives.

Badr, Ghada; Swenson, Krister M; Sankoff, David.

J Comput Biol ; 18(9): 1201-10, 2011 Sep.

Article in English | MEDLINE | ID: mdl-21899425

ABSTRACT

In comparative genomics studies, finding a minimum-length sequence of reversals, so-called sorting by reversals, has been the topic of an extensive literature. Since there are many minimum-length sequences, another important topic has been the problem of listing all parsimonious sequences between two genomes, called the All Sorting Sequences by Reversals (ASSR) problem. In this article, we revisit the ASSR problem for uni-chromosomal genomes when no duplications are allowed and when the relative order of the genes is known. We put the current body of work in perspective by illustrating the fundamental framework that is common to all of them, a perspective that allows us for the first time to theoretically compare their running times. The article also proposes an improved framework that empirically speeds up all known algorithms.

Subject(s)

Algorithms, Computer Simulation, Genetic Models, DNA Sequence Analysis/methods

Listing all sorting reversals in quadratic time.

Swenson, Krister M; Badr, Ghada; Sankoff, David.

Algorithms Mol Biol ; 6: 11, 2011 Apr 19.

Article in English | MEDLINE | ID: mdl-21504604

ABSTRACT

We describe an average-case O(n²) algorithm to list all reversals on a signed permutation π that, when applied to π, produce a permutation that is closer to the identity. This algorithm is optimal in the sense that the time it takes to write the list is Ω(n²) in the worst case.
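
To make the notion of a "sorting reversal" in the abstract above concrete, here is a brute-force sketch. It is not the quadratic-time algorithm the paper describes: it assumes a tiny permutation, computes reversal distances by exhaustive breadth-first search from the identity, and then keeps the reversals that lower the distance by one. The function names and the zero-based, inclusive `(i, j)` position convention are assumptions made for illustration.

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each element in it."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distances(n):
    """Breadth-first search from the identity over signed permutations of size n.

    A reversal is its own inverse, so the BFS depth of a permutation equals
    its reversal distance to the identity. Only feasible for very small n.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                nxt = apply_reversal(perm, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[perm] + 1
                    queue.append(nxt)
    return dist


def sorting_reversals(perm):
    """List the (i, j) reversals that bring perm one step closer to the identity."""
    n = len(perm)
    dist = reversal_distances(n)
    return [(i, j) for i in range(n) for j in range(i, n)
            if dist[apply_reversal(perm, i, j)] == dist[perm] - 1]


print(sorting_reversals((-1, -2)))
print(sorting_reversals((1, -2, 3)))
```

The search visits every signed permutation of the given size, so its cost grows as 2ⁿ·n!; it serves only as a reference against which a faster method could be checked on small inputs.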

On optimizing syntactic pattern recognition using tries and AI-based heuristic-search strategies.

Badr, Ghada; Oommen, B John.

IEEE Trans Syst Man Cybern B Cybern ; 36(3): 611-22, 2006 Jun.

Article in English | MEDLINE | ID: mdl-16761814

ABSTRACT

This paper deals with the problem of estimating, using enhanced artificial-intelligence (AI) techniques, a transmitted string X* by processing the corresponding string Y, which is a noisy version of X*. It is assumed that Y contains substitution, insertion, and deletion (SID) errors. The best estimate X⁺ of X* is defined as that element of a dictionary H that minimizes the generalized Levenshtein distance (GLD) D(X, Y) between X and Y, for all X ∈ H. In this paper, it is shown how to evaluate D(X, Y) for every X ∈ H simultaneously, when the edit distances are general and the maximum number of errors is not given a priori, and when H is stored as a trie. A new scheme called clustered beam search (CBS) is first introduced, which is a heuristic-based search approach that enhances the well-known beam-search (BS) techniques used in AI. The new scheme is then applied to the approximate string-matching problem when the dictionary is stored as a trie. The new technique is compared with the benchmark depth-first search (DFS) trie-based technique (with respect to time and accuracy) using large and small dictionaries. The results demonstrate a marked improvement of up to 75% with respect to the total number of operations needed on three benchmark dictionaries, while yielding an accuracy comparable to the optimal. Experiments are also done to show the benefits of the CBS over the BS when the search is done on the trie. The results also demonstrate a marked improvement (more than 91%) for large dictionaries.

Subject(s)

Algorithms, Artificial Intelligence, Information Storage and Retrieval/methods, Language, Natural Language Processing, Automated Pattern Recognition/methods, Speech Recognition Software
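
The abstract above mentions a depth-first, trie-based baseline that evaluates D(X, Y) for every dictionary word at once. The sketch below shows that baseline idea only, under simplifying assumptions: unit costs for substitution, insertion, and deletion (the paper uses generalized costs), no pruning, and no clustered beam search. The helper names and the `"$end"` marker are hypothetical.

```python
def build_trie(words):
    """Build a nested-dict trie; the "$end" key marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = word
    return root


def trie_edit_distances(trie, noisy):
    """Unit-cost Levenshtein distance from `noisy` to every word in the trie.

    One dynamic-programming row is computed per trie node, so words that
    share a prefix also share the rows computed for that prefix.
    """
    distances = {}

    def visit(node, prev_row):
        if "$end" in node:
            distances[node["$end"]] = prev_row[-1]
        for ch, child in node.items():
            if ch == "$end":
                continue
            row = [prev_row[0] + 1]
            for j in range(1, len(noisy) + 1):
                row.append(min(row[j - 1] + 1,                        # insertion
                               prev_row[j] + 1,                       # deletion
                               prev_row[j - 1] + (noisy[j - 1] != ch)))  # substitution
            visit(child, row)

    visit(trie, list(range(len(noisy) + 1)))
    return distances


dictionary = build_trie(["cat", "cart", "care", "dog"])
dists = trie_edit_distances(dictionary, "cst")
best = min(dists, key=dists.get)
print(dists, best)
```

Here the rows for the shared prefixes "c", "ca", and "car" are each computed once and reused by "cat", "cart", and "care", which is where a trie saves work over comparing the noisy string with each word separately.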
