Identification of microRNA precursors with support vector machine and string kernel / 基因组蛋白质组与生物信息学报·英文版
Genomics, Proteomics & Bioinformatics ; (4): 121-128, 2008.
Article in English | WPRIM | ID: wpr-316991
ABSTRACT
MicroRNAs (miRNAs) are a family of short (21-23 nt) regulatory non-coding RNAs processed from long (70-110 nt) miRNA precursors (pre-miRNAs). Distinguishing true from false precursors plays an important role in the computational identification of miRNAs. Numerical features have been extracted from precursor sequences and their secondary structures to suit some classification methods; however, such features may lose useful discriminative information hidden in the sequences and structures. In this study, pre-miRNA sequences and their secondary structures are used directly to construct an exponential kernel based on the weighted Levenshtein distance between two sequences. This string kernel is then combined with a support vector machine (SVM) to detect true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, two key SVM parameters are selected by 5-fold cross-validation and grid search, and five realizations with different 5-fold partitions are executed. Among 16 independent test sets from 3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNAs, our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNAs. In particular, pre-miRNAs with multiple loops, which were usually excluded in the previous work, are correctly identified in this study with an accuracy of 92.66%.
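The kernel construction described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the edit-operation weights, nor the kernel width, nor how sequence and secondary structure are encoded together, so the uniform costs `sub_cost` and `indel_cost`, the parameter `gamma`, and the function names below are all assumptions chosen for illustration, not the authors' settings.

```python
import math


def weighted_levenshtein(a, b, sub_cost=1.0, indel_cost=1.0):
    """Edit distance between strings a and b.

    sub_cost and indel_cost are hypothetical uniform weights; the paper's
    actual weighting scheme is not given in the abstract.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            cost = 0.0 if ca == cb else sub_cost
            curr.append(min(prev[j] + indel_cost,      # delete ca
                            curr[j - 1] + indel_cost,  # insert cb
                            prev[j - 1] + cost))       # match or substitute
        prev = curr
    return prev[-1]


def exponential_kernel(a, b, gamma=0.1, **weights):
    """K(a, b) = exp(-gamma * d(a, b)), with gamma an assumed width parameter."""
    return math.exp(-gamma * weighted_levenshtein(a, b, **weights))


def gram_matrix(samples, gamma=0.1, **weights):
    """Pairwise kernel values for a list of strings."""
    return [[exponential_kernel(x, y, gamma, **weights) for y in samples]
            for x in samples]
```

A Gram matrix built this way could be passed to any SVM implementation that accepts a precomputed kernel. In that setting, `gamma` and the SVM's regularization parameter would be natural candidates for the two parameters tuned by grid search, although the abstract does not name them.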
Subject(s)
Full text: Available
Index: WPRIM (Western Pacific)
Main subject: Species Specificity / Artificial Intelligence / Molecular Sequence Data / RNA Precursors / Base Sequence / Chemistry / Computational Biology / Databases, Nucleic Acid / MicroRNAs / Genetics
Type of study: Diagnostic study
Limits: Animals / Humans
Language: English
Journal: Genomics, Proteomics & Bioinformatics
Year: 2008
Document type: Article
