
Effect of training datasets on support vector machine prediction of protein-protein interactions.

Lo, Siaw Ling; Cai, Cong Zhong; Chen, Yu Zong; Chung, Maxey C M.

Proteomics ; 5(4): 876-84, 2005 Mar.

Article in English | MEDLINE | ID: mdl-15717327

ABSTRACT

Knowledge of protein-protein interaction is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, Support Vector Machine (SVM), has recently been explored for the prediction of protein-protein interactions using artificial shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how the prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results derived from real protein sequences with those derived from shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis in combination with subcellular localization information of interacting proteins found in the Database of Interacting Proteins. Prediction accuracy using real protein sequences is 76.9%, compared to 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be more difficult than separating a set of real protein sequences from a set of artificial sequences. The use of real protein sequences for training an SVM classification system is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful in the development of SVM classification systems for facilitating protein-protein interaction prediction.

Subject(s)

Computational Biology/methods , Databases, Protein , Proteomics/methods , Algorithms , Animals , Drosophila melanogaster , Humans , Protein Binding , Protein Conformation , Protein Folding , Proteins/chemistry , ROC Curve , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Thioredoxins/chemistry
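
As a rough illustration of the shuffled-sequence negatives described in the abstract above, the following Python sketch permutes the residues of each protein in an interacting pair to produce a hypothetical noninteracting pair. The function names, the example sequences, and the amino acid composition descriptor are assumptions made for illustration; the abstract does not specify the features or code the authors used.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def shuffle_sequence(seq, rng):
    """Return a random permutation of the residues in seq."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)

def composition(seq):
    """Fraction of each standard amino acid in seq (hypothetical descriptor)."""
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS]

def make_shuffled_negatives(positive_pairs, seed=0):
    """Build one artificial 'noninteracting' pair per interacting pair."""
    rng = random.Random(seed)  # fixed seed so the negatives are reproducible
    return [(shuffle_sequence(a, rng), shuffle_sequence(b, rng))
            for a, b in positive_pairs]

# Made-up example sequences, not taken from any database.
positives = [("MKTAYIAKQRQISFVKSHFSRQ", "MVLSPADKTNVKAAW")]
negatives = make_shuffled_negatives(positives)
```

Shuffling keeps the length and amino acid composition of each sequence and changes only the residue order, so a descriptor based purely on composition cannot tell a protein from its shuffled copy. Any classifier trained on such negatives would need order-sensitive features.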

Prediction of RNA-binding proteins from primary sequence by a support vector machine approach.

Han, Lian Yi; Cai, Cong Zhong; Lo, Siew Lin; Chung, Maxey C M; Chen, Yu Zong.

RNA ; 10(3): 355-68, 2004 Mar.

Article in English | MEDLINE | ID: mdl-14970381

ABSTRACT

Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. However, insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins were used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show prediction accuracies of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, respectively, indicating the need for a sufficient number of proteins to train an SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.

Subject(s)

RNA-Binding Proteins/chemistry , Sequence Analysis, Protein , Algorithms , Amino Acid Sequence , Animals , Computational Biology , Data Interpretation, Statistical , Databases, Protein , Humans , Protein Structure, Tertiary , RNA-Binding Proteins/classification , RNA-Binding Proteins/genetics
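
The abstract above reports accuracy separately for the binding and non-binding classes. A minimal Python sketch of that bookkeeping follows; the function name and the confusion-matrix counts are hypothetical, chosen only to reproduce rates of 40.0% and 99.9%, and are not taken from the paper.

```python
def class_accuracies(tp, fn, tn, fp):
    """Per-class and overall accuracy from confusion-matrix counts.

    tp/fn: positive-class (e.g. RNA-binding) proteins predicted right/wrong.
    tn/fp: negative-class proteins predicted right/wrong.
    """
    positive_acc = tp / (tp + fn)   # fraction of positives recovered (sensitivity)
    negative_acc = tn / (tn + fp)   # fraction of negatives recovered (specificity)
    overall_acc = (tp + tn) / (tp + fn + tn + fp)
    return positive_acc, negative_acc, overall_acc

# Hypothetical counts for a small positive class and a large negative class.
pos_acc, neg_acc, overall = class_accuracies(tp=4, fn=6, tn=999, fp=1)
print(f"positive: {pos_acc:.3f}  negative: {neg_acc:.3f}  overall: {overall:.3f}")
```

With these illustrative counts the overall accuracy stays above 99% even though most positives are missed, which is one reason to report the two classes separately when they are heavily imbalanced.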

Advances in modeling of biomolecular interactions.

Cai, Cong-Zhong; Li, Ze-Rong; Wang, Wan-Lu; Chen, Yu-Zong.

Acta Pharmacol Sin ; 25(1): 1-8, 2004 Jan.

Article in English | MEDLINE | ID: mdl-14704115

ABSTRACT

Modeling of molecular interactions is increasingly used in life science research and biotechnology development. Examples are computer-aided drug design, prediction of protein interactions with other molecules, and simulation of networks of biomolecules in a particular process in the human body. This article reviews recent progress in the related fields and provides a brief overview of the methods used in molecular modeling of biological systems.

Subject(s)

Computer Simulation , Computer-Aided Design , Drug Design , Proteins , Drug Interactions , Humans , Ligands , Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/metabolism
