Results 1 - 5 of 5
1.
Genet Mol Res ; 14(1): 123-33, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25729943

ABSTRACT

Imbalanced data sets are common in bioinformatics and in other areas. A drawback of traditional machine learning methods is the relatively little attention given to small sample classification. Thus, we developed imDC, which uses an ensemble learning concept in combination with weights and sample misclassification information to effectively classify imbalanced data. Our method showed better results than other algorithms on UCI machine learning datasets and microRNA data.


Subject(s)
Algorithms , Machine Learning , MicroRNAs/genetics , Databases, Genetic , MicroRNAs/metabolism , ROC Curve , Reproducibility of Results
2.
Genet Mol Res ; 11(3): 1909-22, 2012 Jul 19.
Article in English | MEDLINE | ID: mdl-22869546

ABSTRACT

Reconstructing the evolutionary history of a set of species is an elementary problem in biology, and methods for solving it are evaluated on two characteristics: accuracy and efficiency. Neighbor-joining reconstructs phylogenetic trees by iteratively picking a pair of nodes to merge as a new node until only one node remains; due to its good accuracy and speed, it has been embraced by the phylogeny research community. With the advent of large amounts of data, faster and more precise methods for reconstructing evolutionary trees have become necessary. We improved the neighbor-joining algorithm by iteratively picking two pairs of nodes and merging them as two new nodes, until only one node remains. We found that, for a purely additive tree, each iteration of the clustering procedure admits a second pair of true neighbors that can be merged as a new node, in addition to the pair chosen by the neighbor-joining criterion. Because these new neighbors would be selected by another iteration of the neighbor-joining method, the improved algorithm constructs the same phylogenetic tree as the neighbor-joining algorithm for the same input data. By combining the improved neighbor-joining algorithm with the upper-bound computation optimization of RapidNJ and the external storage of ERapidNJ, we propose a new method for reconstructing phylogenetic trees, FastJoin. Experiments on data sets showed that this new algorithm yields a significant speed-up over classic neighbor-joining, showing empirically that FastJoin is superior to almost all other neighbor-joining implementations.
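
The classic neighbor-joining loop described in this abstract can be sketched as follows. This is a minimal illustration of the standard algorithm only, not the FastJoin code: the two-pair merge, the RapidNJ upper bounds, and the ERapidNJ external storage are not reproduced. The function names, the dict-of-dicts distance matrix, and the "U0", "U1", ... and "root" labels are assumptions made for the example.

```python
from itertools import combinations


def nj_pick_pair(dist):
    """Return the pair (i, j) that minimises the neighbor-joining Q criterion.

    dist is a symmetric dict-of-dicts distance matrix keyed by node label.
    """
    nodes = sorted(dist)
    n = len(nodes)
    # Sum of distances from each node to every other node.
    row_sum = {i: sum(dist[i][k] for k in nodes if k != i) for i in nodes}
    best = None
    for i, j in combinations(nodes, 2):
        q = (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]
        if best is None or q < best[0]:
            best = (q, i, j)
    return best[1], best[2]


def neighbor_joining(dist):
    """Classic neighbor-joining; returns a list of (child_a, child_b, new_node)."""
    dist = {i: dict(row) for i, row in dist.items()}  # work on a copy
    merges = []
    counter = 0
    while len(dist) > 2:
        i, j = nj_pick_pair(dist)
        new = f"U{counter}"  # hypothetical label for the merged node
        counter += 1
        others = [k for k in dist if k not in (i, j)]
        # Distance from the merged node to each remaining node.
        new_row = {k: 0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in others}
        del dist[i]
        del dist[j]
        for k in others:
            del dist[k][i]
            del dist[k][j]
            dist[k][new] = new_row[k]
        dist[new] = new_row
        merges.append((i, j, new))
    # Join the last two nodes so that only one node remains.
    a, b = sorted(dist)
    merges.append((a, b, "root"))
    return merges
```

Calling `neighbor_joining` on a five-taxon matrix returns four merges. Branch lengths are omitted to keep the sketch short.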


Subject(s)
Algorithms , Computational Biology/methods , Phylogeny , Databases, Genetic
3.
Genet Mol Res ; 10(3): 1986-98, 2011 Sep 09.
Article in English | MEDLINE | ID: mdl-21948761

ABSTRACT

We propose a novel representation of RNA secondary structure for a quick comparison of different structures. Secondary structure was viewed as a set of stems, and each stem was represented by two values according to its position. Using this representation, we improved both the results of the comparative sequence analysis method and the minimum free-energy model. In the comparative sequence analysis method, a novel algorithm independent of multiple sequence alignment was developed to improve performance. When dealing with a single RNA sequence, the minimum free-energy model was improved by combining it with RNA class information. Secondary structure prediction experiments were done on tRNA and RNase P RNA; sensitivity and specificity were both improved. Furthermore, software programs were developed for non-commercial use.


Subject(s)
Algorithms , Nucleic Acid Conformation , RNA, Archaeal/chemistry , RNA, Bacterial/chemistry , RNA, Protozoan/chemistry , RNA, Transfer/chemistry , Anaplasma marginale/genetics , Base Sequence , Halobacterium/genetics , Molecular Sequence Data , Plasmodium falciparum/genetics , Sequence Alignment , Sequence Analysis, RNA/methods , Thermodynamics
4.
Genet Mol Res ; 10(2): 588-603, 2011 Apr 12.
Article in English | MEDLINE | ID: mdl-21491369

ABSTRACT

To classify real and pseudo human precursor microRNA (pre-miRNA) hairpins with ab initio methods, numerous features are extracted from the primary sequence and secondary structure of pre-miRNAs. However, these include some redundant and useless features. Selecting the most representative feature subset is essential, because it improves classification accuracy. We propose a novel feature selection method based on a genetic algorithm, according to the characteristics of human pre-miRNAs. The information gain of a feature, the feature conservation relative to stem parts of pre-miRNA, and the redundancy among features are all considered. Feature conservation was introduced for the first time. Experimental results were validated by cross-validation using datasets composed of human real/pseudo pre-miRNAs. Compared with microPred, our classifier, miPredGA, achieved more reliable sensitivity and specificity. Accuracy improved by nearly 12%. The feature selection algorithm is useful for constructing more efficient classifiers for identification of real human pre-miRNAs from pseudo hairpins.
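
Of the three criteria named in this abstract, information gain has a standard definition and can be sketched as below. The example assumes discrete (already binned) feature values and uses hypothetical function names. It does not reproduce the genetic algorithm, the feature conservation measure, or the redundancy term, which the abstract does not specify.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that separates real from pseudo hairpins perfectly has a gain equal to the label entropy, and a feature unrelated to the label has a gain of zero.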


Subject(s)
Inverted Repeat Sequences/genetics , MicroRNAs , Nucleic Acid Conformation , Algorithms , Base Sequence , Computational Biology/methods , Humans , MicroRNAs/chemistry , MicroRNAs/genetics , MicroRNAs/ultrastructure , Molecular Sequence Data , RNA Precursors/chemistry , RNA Precursors/genetics , Sequence Analysis, DNA
5.
Genet Mol Res ; 9(2): 820-34, 2010 May 04.
Article in English | MEDLINE | ID: mdl-20449815

ABSTRACT

Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in publicly available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included to handle the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in cross-validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and will be useful for reducing the cost of biological analyses.
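
The abstract does not define its "measure of redundant frequency", so the sketch below is only one plausible reading: a column of an EST alignment is a candidate when a second base recurs in at least a minimum number of reads. The function name, the `min_minor_count` parameter, and its default of 2 are assumptions for illustration, not SNPDigger's actual criterion.

```python
from collections import Counter


def snp_candidates(aligned_reads, min_minor_count=2):
    """Return (column, major_base, minor_base) for columns with a recurring second base.

    aligned_reads is a list of equal-length strings from one aligned EST cluster;
    gaps and ambiguous characters are ignored when counting.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must have equal length")
    hits = []
    for col in range(length):
        counts = Counter(read[col] for read in aligned_reads if read[col] in "ACGT")
        ranked = counts.most_common(2)
        # Require the second most frequent base to be seen repeatedly,
        # which filters out isolated sequencing errors.
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            hits.append((col, ranked[0][0], ranked[1][0]))
    return hits
```

In the abstract's pipeline, candidates of this kind would then be passed to the ensemble classifier together with the extracted features.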


Subject(s)
Computational Biology/methods , Data Mining/methods , Expressed Sequence Tags , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods , Software , Animals , Base Sequence , Humans