An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

Md-Rezaul KARIM; Md-Mamunur RASHID; Byeong-Soo JEONG; Ho-Jin CHOI

Genomics & Informatics ; : 51-57, 2012.

Article in English | WPRIM | ID: wpr-155514

Responsible library: WPRO

ABSTRACT

Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
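To make the term concrete, the following minimal Python sketch enumerates maximal contiguous frequent patterns by brute force. It is not the authors' method (the keywords indicate a suffix-tree approach); it only illustrates the definition. The function names, the example sequences, and the assumption that support is counted as the number of sequences containing a pattern are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Return {pattern: support} for every contiguous pattern (substring)
    contained in at least min_support of the input sequences.

    Assumption: support = number of sequences that contain the pattern.
    """
    frequent = {}
    length = 1
    while True:
        # Record which sequences contain each substring of this length.
        containing = defaultdict(set)
        for idx, seq in enumerate(sequences):
            for start in range(len(seq) - length + 1):
                containing[seq[start:start + length]].add(idx)
        level = {p: len(ids) for p, ids in containing.items()
                 if len(ids) >= min_support}
        # Every substring of a frequent pattern is itself frequent, so once
        # one length yields nothing, no longer pattern can be frequent.
        if not level:
            break
        frequent.update(level)
        length += 1
    return frequent


def maximal_contiguous(sequences, min_support):
    """Keep only frequent patterns not contained in a longer frequent one."""
    frequent = frequent_contiguous(sequences, min_support)
    return {p: s for p, s in frequent.items()
            if not any(p != q and p in q for q in frequent)}


if __name__ == "__main__":
    # Hypothetical toy input, not data from the paper.
    toy = ["ACGTAC", "TACGTT", "GACGTA"]
    print(maximal_contiguous(toy, min_support=3))
```

No pattern length is specified in advance: the search simply grows the length until no frequent pattern remains. This brute-force enumeration is only practical for small inputs, which is the cost that a more efficient approach such as the one proposed here aims to avoid.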

Subjects

Base Sequence; Computational Biology; Databases, Nucleic Acid; DNA; Mining

Keywords

DNA sequence; maximal contiguous frequent pattern; pattern mining; suffix tree

Full text: 1 | Index: WPRIM | Main subject: DNA / Base Sequence / Computational Biology / Databases, Nucleic Acid / Mining | Language: En | Journal: Genomics & Informatics | Publication year: 2012 | Document type: Article