Results 1 - 7 of 7
1.
Comput Biol Med ; 43(7): 865-9, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23746728

ABSTRACT

The classification of normal and cardiovascular disease groups with consensus models according to metal concentrations in blood/urine samples is discussed in this study. The concentrations of nine elements (i.e., chromium, iron, manganese, aluminum, cadmium, copper, zinc, nickel and selenium) were analyzed using three chemometric methods: Fisher linear discriminant analysis (FLDA), support vector machine (SVM) and decision tree (DTree). Data from 60 healthy individuals and 24 cardiovascular patients were collected and analyzed. Principal component analysis (PCA) was used in a preliminary analysis; however, it proved difficult to distinguish normal samples from cardiovascular ones with this method. Then, based on the consensus strategy, a series of classifiers was constructed and compared. In terms of three performance indices (accuracy, sensitivity and specificity), the DTree classifier exhibited the best overall performance, followed by SVM, with FLDA performing worst. In addition, analysis of blood samples was superior to analysis of urine samples. In conclusion, the combination of a consensus DTree classifier and elemental analysis of blood samples can serve as an aid for diagnosing cardiovascular diseases, especially in routine physical examinations.


Subject(s)
Cardiovascular Diseases/blood , Cardiovascular Diseases/urine , Diagnosis, Computer-Assisted/methods , Metals, Heavy/blood , Metals, Heavy/urine , Adult , Aged , Aged, 80 and over , Algorithms , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/epidemiology , Case-Control Studies , Computational Biology , Feasibility Studies , Humans , Middle Aged , Models, Cardiovascular , Models, Statistical , Principal Component Analysis , Reproducibility of Results , Sensitivity and Specificity
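The abstract does not say how the members of a consensus model are combined. A common choice is majority voting over several classifiers (for example, DTrees trained on different resampled subsets), scored with the three indices the abstract reports. The sketch below assumes majority voting; the function names and toy labels are hypothetical and do not come from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model predictions into one consensus label per sample.

    predictions[m][i] is member model m's label for sample i. Use an odd
    number of models for binary labels so that ties cannot occur.
    """
    n_samples = len(predictions[0])
    consensus = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        consensus.append(votes.most_common(1)[0][0])
    return consensus


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels: 1 = cardiovascular patient, 0 = healthy control.
y_true = [1, 1, 0, 0, 0, 1]
member_predictions = [
    [1, 0, 0, 0, 1, 1],  # e.g. a DTree trained on one resampled subset
    [1, 1, 0, 1, 0, 0],  # e.g. a DTree trained on another subset
    [0, 1, 0, 0, 0, 1],  # a third member
]
consensus = majority_vote(member_predictions)
print(consensus)  # [1, 1, 0, 0, 0, 1]
print(binary_metrics(y_true, consensus))
```

In this toy case every member makes at least one error, yet the vote recovers all six labels, which is the intuition behind a consensus classifier.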
2.
Article in English | MEDLINE | ID: mdl-23274502

ABSTRACT

Near- and mid-infrared (NIR/MIR) spectroscopy techniques have gained wide acceptance in industry because of their many applications and versatility. However, the success of an application often depends heavily on the construction of accurate and stable calibration models. For this purpose, a simple multi-model fusion strategy is proposed. It combines a Kohonen self-organizing map (KSOM), mutual information (MI) and partial least squares (PLS), and is therefore named KMICPLS. It works as follows. First, the original training set is fed into a KSOM for unsupervised clustering of samples, from which a series of training subsets is constructed. Next, on each training subset, an MI spectrum is calculated and only the variables whose MI values exceed the mean are retained; a candidate PLS model is built on these variables. Finally, a fixed number of PLS models is selected to produce a consensus model. Two NIR/MIR spectral datasets from the brewing industry are used for experiments. The results confirm its superior performance compared with two reference algorithms, i.e., conventional PLS and genetic algorithm-PLS (GAPLS). It can build more accurate and stable calibration models without increasing complexity, and can be generalized to other NIR/MIR applications.


Subject(s)
Beer/analysis , Spectrophotometry, Infrared/methods , Spectroscopy, Near-Infrared/methods , Wine/analysis , Calibration , Least-Squares Analysis , Models, Statistical
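The variable-selection step of KMICPLS (keep only the variables whose MI with the response exceeds the mean MI) can be sketched as follows. The abstract does not state how MI is estimated; this sketch assumes a simple histogram estimator with equal-width bins, and the function names and bin count are hypothetical. The KSOM clustering and PLS fitting steps are not shown.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Equal-width binning; a constant variable falls entirely into bin 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x_bins, y_bins):
    """Plug-in MI estimate (in nats) from two lists of bin indices."""
    n = len(x_bins)
    px, py = Counter(x_bins), Counter(y_bins)
    pxy = Counter(zip(x_bins, y_bins))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def select_by_mi(X, y, n_bins=4):
    """Return indices of variables with MI above the mean, plus the MI 'spectrum'.

    X is a list of samples, each a list of variable values (e.g. absorbance
    at each wavelength); y is the reference value for each sample.
    """
    y_bins = discretize(y, n_bins)
    n_vars = len(X[0])
    mi = [
        mutual_information(discretize([row[j] for row in X], n_bins), y_bins)
        for j in range(n_vars)
    ]
    mean_mi = sum(mi) / n_vars
    return [j for j, value in enumerate(mi) if value > mean_mi], mi


# Hypothetical toy data: variable 0 tracks y, variable 1 is constant,
# variable 2 is only weakly related to y.
y = [0.1, 0.4, 0.6, 0.9, 0.2, 0.8]
X = [[yi, 1.0, other] for yi, other in zip(y, [5, 5, 5, 5, 6, 6])]
selected, mi_spectrum = select_by_mi(X, y)
print(selected)  # [0]
```

In the full method this selection would be repeated on each KSOM-derived training subset before fitting a candidate PLS model on the retained variables.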
3.
Comput Biol Chem ; 35(3): 131-6, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21704258

ABSTRACT

MicroRNA (miRNA), a negative regulator of gene expression, is also known as the guide strand of the transient miRNA:miRNA* duplex. It is critical for maintaining normal physiological processes such as development, differentiation and apoptosis in many organisms. With increasing miRNA data, it is desirable to design machine learning methods to identify the guide strand. In this study, random forest models based on local sequence-structure features were proposed to identify miRNA in four species. The accuracies achieved were 86.51% for Homo sapiens, 81.66% for Ornithorhynchus anatinus, 82.33% for Mus musculus and 85.71% for Schmidtea mediterranea. Furthermore, the importance of the feature elements was analyzed using the conditional feature importance strategy. The analysis revealed that most of the significant elements were related to guanine-cytosine (GC) base pairs. We believe that our method could be beneficial for annotating the function of miRNA and could help further understanding of the RNA interference mechanism.


Subject(s)
Computational Biology , MicroRNAs/genetics , Algorithms , Animals , Base Pairing , Cytosine/analysis , Databases, Genetic , Guanine/analysis , Mice , MicroRNAs/metabolism , Planarians , Platypus , RNA Interference , ROC Curve
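The abstract reports that most significant features relate to GC base pairs but does not list the features. As a hypothetical illustration only, the sketch below counts G-C, A-U and G-U (wobble) pairs from a sequence and its secondary structure in dot-bracket notation, the kind of pairing information such features could be built on. The function names and the dot-bracket input are assumptions, not the authors' feature set.

```python
PAIR_TYPES = {"CG": "GC", "AU": "AU", "GU": "GU"}


def base_pair_counts(sequence, structure):
    """Count G-C, A-U and G-U (wobble) pairs in a dot-bracket secondary structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    counts = {"GC": 0, "AU": 0, "GU": 0, "other": 0}
    open_positions = []  # stack of indices of unmatched '('
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            # sort the two bases so that G-C and C-G map to the same key
            key = "".join(sorted(sequence[j].upper() + sequence[i].upper()))
            counts[PAIR_TYPES.get(key, "other")] += 1
    if open_positions:
        raise ValueError("unmatched '(' in structure")
    return counts


def gc_pair_fraction(counts):
    """Share of all base pairs that are G-C pairs."""
    total = sum(counts.values())
    return counts["GC"] / total if total else 0.0


# Hypothetical hairpin: one G-C, one A-U and one G-U pair in the stem.
counts = base_pair_counts("GAGAAAUUUC", "(((....)))")
print(counts)  # {'GC': 1, 'AU': 1, 'GU': 1, 'other': 0}
```

Counts like these could be fed, alongside other sequence-structure descriptors, to a random forest classifier.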
4.
Protein Pept Lett ; 18(9): 906-11, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21529343

ABSTRACT

Protein-protein interactions (PPIs) are crucial to most biochemical processes in humans. Although many human PPIs have been identified by experiments, their number is still limited compared with the number of available human protein sequences. Recently, many computational methods have been proposed to facilitate the recognition of novel human PPIs. However, existing methods concentrated only on information about individual PPIs, ignoring the systematic characteristics of protein-protein interaction networks (PINs). In this study, a new method was proposed that combines the global information of PINs with protein sequence information. A random forest (RF) algorithm was used to develop the prediction model, and a high accuracy of 91.88% was obtained. Furthermore, the RF model performed well on three independent test datasets, suggesting that our method is a useful tool both for identifying PPIs and for investigating PINs.


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Proteins/metabolism , Databases, Protein , Humans , Metabolic Networks and Pathways , Models, Biological , Sequence Analysis, Protein/methods
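The abstract combines network-level and sequence-level information but does not specify the features. The sketch below is a hypothetical illustration of one way to build an order-independent feature vector for a candidate protein pair: neighbourhood sizes and overlap from the interaction network, plus amino acid composition from the sequences. All names, features and toy data are assumptions; the resulting vectors could then be passed to a random forest (for example scikit-learn's `RandomForestClassifier`), which is not shown.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_network(edges):
    """Undirected protein interaction network as an adjacency dict."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    n = len(sequence)
    return [sequence.count(aa) / n for aa in AMINO_ACIDS]


def pair_features(graph, sequences, a, b):
    """Order-independent feature vector for the candidate pair (a, b)."""
    # Remove the pair's own edge so a known interaction cannot leak into its features.
    neigh_a = graph.get(a, set()) - {b}
    neigh_b = graph.get(b, set()) - {a}
    common = len(neigh_a & neigh_b)
    union = len(neigh_a | neigh_b)
    network = [
        min(len(neigh_a), len(neigh_b)),
        max(len(neigh_a), len(neigh_b)),
        common,
        common / union if union else 0.0,  # Jaccard similarity of neighbourhoods
    ]
    comp_a, comp_b = composition(sequences[a]), composition(sequences[b])
    seq_sum = [x + y for x, y in zip(comp_a, comp_b)]
    seq_diff = [abs(x - y) for x, y in zip(comp_a, comp_b)]
    return network + seq_sum + seq_diff


# Hypothetical toy network and sequences.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
seqs = {"P1": "MKTAYIAK", "P2": "MKLV", "P3": "GGSA", "P4": "MMM"}
graph = build_network(edges)
features = pair_features(graph, seqs, "P1", "P2")
print(features[:4])  # [1, 1, 1, 1.0]
```

Sorting the degrees and using sums and absolute differences of compositions keeps the vector identical for (a, b) and (b, a), since interactions are undirected.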
5.
Interdiscip Sci ; 1(2): 151-5, 2009 Jun.
Article in English | MEDLINE | ID: mdl-20640829

ABSTRACT

Pattern recognition methods could be of great help in disease diagnosis. In this study, a semi-supervised learning method, the Laplacian support vector machine (LapSVM), was used to predict diabetes. The dataset used in this article is the Pima Indians diabetes dataset from the UCI Repository of Machine Learning Databases; all patients in the dataset are females at least 21 years old of Pima Indian heritage. First, LapSVM was trained as a fully supervised classifier on the diabetes dataset, giving an accuracy of 79.17%. Then, it was trained as a semi-supervised classifier, giving a prediction accuracy of 82.29%, which is higher than previously reported results. The experiments suggest a promising application of LapSVM: it can solve a fully supervised learning problem by solving a semi-supervised learning problem. The results suggest that LapSVM can be of great help to physicians in diagnosing diabetes, and that it could be a very promising method in situations where much of the data is unlabeled.


Subject(s)
Artificial Intelligence , Decision Support Techniques , Diabetes Mellitus/diagnosis , Algorithms , Computer Simulation , Computers , Databases, Factual , Diabetes Mellitus/ethnology , Female , Humans , Indians, North American , Models, Statistical , Models, Theoretical , Reproducibility of Results
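What lets LapSVM exploit unlabeled samples is a manifold regularizer $\mathbf{f}^\top L \mathbf{f}$, added to the usual SVM objective and computed over labeled and unlabeled points alike, where $L = D - W$ is the graph Laplacian of a similarity graph $W$. The sketch below shows only that regularizer, assuming a fully connected Gaussian similarity graph (k-nearest-neighbour graphs are also common); the function names, `sigma` and the toy points are hypothetical, and the SVM solver itself is not shown.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Fully connected similarity graph W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def graph_laplacian(W):
    """L = D - W, where D is the diagonal matrix of row sums of W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def manifold_penalty(L, f):
    """f^T L f, which equals 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


# Hypothetical 2-D points forming two tight clusters (labeled or not).
points = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9)]
W = gaussian_weights(points)
L = graph_laplacian(W)
f_smooth = [1.0, 1.0, -1.0, -1.0]  # constant within each cluster
f_rough = [1.0, -1.0, 1.0, -1.0]   # flips sign inside each cluster
print(manifold_penalty(L, f_smooth) < manifold_penalty(L, f_rough))  # True
```

A decision function that changes sign inside a dense cluster is penalized heavily, so unlabeled points pull the boundary into low-density gaps between clusters.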
6.
J Theor Biol ; 247(4): 608-15, 2007 Aug 21.
Article in English | MEDLINE | ID: mdl-17540409

ABSTRACT

Living cells are highly responsive to specific chemicals in their environment, such as hormones and molecules in food or aromas. This is ascribed to the existence of widespread and diverse signal transduction pathways, between which crosstalk usually exists, so that together they constitute a complex signaling network. Knowledge of the topological characteristics of this network could therefore contribute substantially to understanding the diverse cellular behaviors and life phenomena that arise from it. In this presentation, signal transduction data are extracted from KEGG to construct a cellular signaling network of Homo sapiens with 931 nodes and 6798 links in total. Computing the degree distribution, we find that it is not a random network but a scale-free network following a power law $P(k) \sim k^{-\gamma}$, with $\gamma \approx 2.2$. Among three graph partition algorithms, Guimera's simulated annealing method shows the best performance and is therefore chosen to study the details of the topological structure and other properties of this cellular signaling network. To reveal the underlying biological implications, individual communities are investigated further and a sketch map of each community is drawn. The experimental data can be found in the supplementary material.


Subject(s)
Algorithms , Cell Physiological Phenomena , Computer Simulation , Models, Statistical , Signal Transduction , Computational Biology , Humans , Models, Biological
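For scale, 931 nodes and 6798 undirected links give a mean degree of $\langle k \rangle = 2 \times 6798 / 931 \approx 14.6$. The abstract does not say how $\gamma$ was estimated. The sketch below computes the degree distribution $P(k)$ from an edge list and estimates $\gamma$ with the approximate discrete maximum-likelihood formula of Clauset, Shalizi and Newman, $\hat{\gamma} \approx 1 + n \left[\sum_i \ln \frac{k_i}{k_{\min} - 1/2}\right]^{-1}$, which is a swapped-in technique rather than the authors' procedure. Function names and the choice of `k_min` are assumptions.

```python
import math
from collections import Counter


def degree_sequence(edges):
    """Degree of every node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return list(degree.values())


def degree_distribution(degrees):
    """P(k): fraction of nodes with degree k, sorted by k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}


def powerlaw_gamma_mle(degrees, k_min=1):
    """Approximate discrete MLE of gamma for the tail k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


# Hypothetical toy network (a real analysis would load the KEGG-derived edges).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
degrees = degree_sequence(edges)
print(degree_distribution(degrees))  # {1: 0.25, 2: 0.5, 3: 0.25}
print(powerlaw_gamma_mle(degrees))
```

The estimate is only meaningful for a large network and a `k_min` chosen so that the tail really follows a power law; a toy graph like this one merely exercises the code.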
7.
Protein J ; 25(4): 241-9, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16703470

ABSTRACT

A new method based on the discrete wavelet transform and a sequence-scale similarity measure was proposed for predicting mitochondrial proteins. This sequence-scale similarity, which reveals more information than conventional methods, does not rely on subcellular location information and can be applied directly to protein sequences of different lengths. In our experiments, 499 mitochondrial protein sequences, constituting a mitochondrial database, were used as the training dataset, and 681 non-mitochondrial protein sequences were tested. The system predicted these sequences with a sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) of 50.30%, 95.74%, 76.53% and 0.54, respectively. Source code for the new program is available from the authors on request.


Subject(s)
Computational Biology/methods , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/metabolism , Amino Acid Sequence , Software
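The abstract does not define its sequence-scale similarity, so the sketch below is only a hypothetical illustration of the general idea: encode each residue numerically (here with Kyte-Doolittle hydropathy, an assumed choice), apply a multi-level Haar discrete wavelet transform, summarize each scale by the mean energy of its detail coefficients (a fixed-length vector even for sequences of different lengths) and compare sequences by cosine similarity. It also includes the standard MCC formula, $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed encoding, not the authors')
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def haar_step(signal):
    """One level of the orthonormal Haar DWT; an odd trailing element is dropped."""
    half = len(signal) // 2
    s = math.sqrt(2.0)
    approx = [(signal[2 * i] + signal[2 * i + 1]) / s for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / s for i in range(half)]
    return approx, detail


def scale_energies(sequence, levels=3):
    """Mean squared detail coefficient at each of the first `levels` scales."""
    signal = [KD[aa] for aa in sequence.upper() if aa in KD]
    energies = []
    for _ in range(levels):
        if len(signal) < 2:
            energies.append(0.0)  # sequence too short for this scale
            continue
        signal, detail = haar_step(signal)
        energies.append(sum(d * d for d in detail) / len(detail))
    return energies


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sequence_scale_similarity(seq1, seq2, levels=3):
    return cosine(scale_energies(seq1, levels), scale_energies(seq2, levels))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical sequences of different lengths.
print(sequence_scale_similarity("MLSRAVCGTSRQLAPVLGYLGSRQ", "MKTAYIAKQRQISFVKSHFSRQ"))
```

A classifier built on this idea might label a query as mitochondrial when its similarity to the training set exceeds a threshold; that decision rule is also an assumption here.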