
Prediction of protein-protein interactions based on feature selection and data balancing.

Liu, Liang; Lu, Wen-Cong; Cai, Yu-Dong; Feng, Kai-Yan; Peng, Chunrong; Zhu, Yubei.

Protein Pept Lett ; 20(3): 336-45, 2013 Mar.

Article in English | MEDLINE | ID: mdl-22591478

ABSTRACT

Computational approaches complement experimental ones by analyzing protein-protein interactions (PPIs) from a different angle, and they are very efficient in determining whether two proteins can interact with each other. In this paper, the K-nearest neighbors (KNN) algorithm is applied to predict PPIs by coding each protein with the physical and chemical properties of its residues, its predicted secondary structure and its amino acid composition. mRMR (minimum-redundancy maximum-relevance) feature selection is adopted to select a compact set of features considered important for determining PPI-ness. Because the negative dataset (containing non-interacting protein pairs) is much larger than the positive dataset (containing interacting protein pairs), the negative dataset is divided into 5 portions, and each portion is combined with the positive dataset for one prediction. Thus 5 predictions are performed and the final result is obtained through voting. The prediction achieves an overall accuracy of 0.8369 with a sensitivity of 0.7356. The predictor, developed in this research for predicting fruit fly PPI-ness, is available for public use at http://chemdata.shu.edu.cn/ppip.
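The balancing scheme described in this abstract (split the negative set into 5 portions, pair each portion with the full positive set, then vote) can be illustrated with a minimal sketch. It is not the authors' implementation: the value of k, the Euclidean distance, the majority-vote rule, the random shuffling, and the toy two-dimensional feature vectors are all assumptions made here for illustration, and the function names are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (vector, label) pairs; distance is squared Euclidean.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def split_negatives(negatives, n_portions=5, seed=0):
    """Shuffle the negative examples and deal them into n_portions groups."""
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_portions] for i in range(n_portions)]


def ensemble_predict(positives, negatives, query, n_portions=5, k=3):
    """Train one KNN per negative portion and combine them by majority vote."""
    votes = []
    for portion in split_negatives(negatives, n_portions):
        # Each sub-model sees all positives but only one slice of negatives,
        # so its training set is far less imbalanced than the full dataset.
        train = [(v, 1) for v in positives] + [(v, 0) for v in portion]
        votes.append(knn_predict(train, query, k))
    return 1 if sum(votes) > len(votes) / 2 else 0


# Toy data (assumed): 4 "interacting" pairs near (1, 1),
# 20 "non-interacting" pairs near (0, 0).
positives = [(1.0 + 0.01 * i, 1.0 - 0.01 * i) for i in range(4)]
negatives = [(0.01 * i, 0.02 * i) for i in range(20)]

print(ensemble_predict(positives, negatives, (0.95, 1.0)))
print(ensemble_predict(positives, negatives, (0.05, 0.05)))
```

With an odd number of portions and binary labels, the vote cannot tie, which is one plausible reason for choosing 5 sub-models.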

Subject(s)

Amino Acids/chemistry , Computational Biology/methods , Protein Binding , Proteins/chemistry , Algorithms , Protein Interaction Maps

A novel sequence-based method for phosphorylation site prediction with feature selection and analysis.

He, Zhi-Song; Shi, Xiao-He; Kong, Xiang-Ying; Zhu, Yu-Bei; Chou, Kuo-Chen.

Protein Pept Lett ; 19(1): 70-8, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21919857

ABSTRACT

Phosphorylation is one of the most important post-translational modifications, and the identification of protein phosphorylation sites is particularly important for studying disease diagnosis. However, experimental detection of phosphorylation sites is labor intensive. It would be beneficial if computational methods were available to provide an extra reference for the phosphorylation sites. Here we developed a novel sequence-based method for serine, threonine, and tyrosine phosphorylation site prediction. The Nearest Neighbor algorithm was employed as the prediction engine. Peptides around the phosphorylation sites, with a fixed length of thirteen amino acid residues, were extracted via a sliding window along the protein chains concerned. Each such peptide was coded into a vector of 6,072 features, derived from the Amino Acid Index (AAIndex) database, for classification/detection. Incremental Feature Selection, a feature selection algorithm based on the Maximum Relevancy Minimum Redundancy (mRMR) method, was used to select a compact feature set to further improve the classification performance. Three predictors were established for identifying the three types of phosphorylation sites, achieving overall accuracies of 66.64%, 66.11% and 66.69%, respectively. These rates were obtained by rigorous jackknife cross-validation tests.
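The sliding-window step in this abstract can be sketched as follows. This is an illustration only: the abstract states the window length (thirteen residues) but not how sites near the chain termini are handled, so padding with "X" is an assumption, as are the function name and the example sequence.

```python
def extract_windows(sequence, targets="STY", flank=6, pad="X"):
    """Collect fixed-length peptides centered on candidate residues.

    For each S, T, or Y in the sequence, return a tuple of
    (1-based position, residue, window), where the window holds the residue
    plus `flank` residues on each side (13 in total for flank=6).
    Positions beyond the chain ends are filled with `pad` (an assumption).
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Index i in the raw sequence is index i + flank in the padded one,
            # so the window starting at padded[i] is centered on the residue.
            windows.append((i + 1, residue, padded[i:i + 2 * flank + 1]))
    return windows


# Short made-up sequence, used only to show the output shape.
for position, residue, peptide in extract_windows("MGSAKLT"):
    print(position, residue, peptide)
```

Each extracted peptide would then be encoded into a numeric feature vector (AAIndex-derived in the paper) before classification; that encoding is not shown here.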

Subject(s)

Peptides/chemistry , Phosphoproteins/chemistry , Sequence Analysis, Protein/methods , Support Vector Machine , Binding Sites , Computational Biology , Data Mining , Databases, Protein , Peptides/metabolism , Phosphoproteins/metabolism , Phosphorylation , Predictive Value of Tests , Protein Processing, Post-Translational , Serine/metabolism , Threonine/metabolism , Tyrosine/metabolism

Prediction of interaction between enzymes and small molecules in metabolic pathways through integrating multiple classifiers.

Lu, Jin; Zhu, Yubei; Li, Yajun; Lu, Wencong; Hu, Lele; Niu, Bing; Qing, Pengfei; Gu, Lei.

Protein Pept Lett ; 17(12): 1536-41, 2010 Dec.

Article in English | MEDLINE | ID: mdl-20937036

ABSTRACT

Information about interactions between enzymes and small molecules is important for understanding various metabolic bioprocesses. In this article we applied a majority voting system to predict the interactions between enzymes and small molecules in metabolic pathways by combining several classifiers, including AdaBoost, Bagging and KNN. The advantage of such a strategy rests on the principle that a predictor based on majority voting usually provides more reliable results than any single classifier. The prediction accuracies thus obtained on a training dataset and an independent testing dataset were 82.8% and 84.8%, respectively. The prediction accuracy for the networking couples in the independent testing dataset was 75.5%, which is about 4% higher than that reported in a previous study. The web server for the prediction method presented in this paper is available at http://chemdata.shu.edu.cn/small-enz.
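The majority-voting idea in this abstract can be sketched in a few lines. The three classifiers below are hypothetical threshold rules standing in for trained AdaBoost, Bagging and KNN models, and the two-feature samples are made up; only the voting step reflects what the abstract describes.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Ask every classifier for a label and return the most common one.

    Returns (winning label, list of individual votes).
    """
    votes = [clf(sample) for clf in classifiers]
    label, _ = Counter(votes).most_common(1)[0]
    return label, votes


# Hypothetical stand-ins for trained models: each maps a feature
# vector to 1 (interaction) or 0 (no interaction).
classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(sum(x) > 1.0),
]

print(majority_vote(classifiers, (0.9, 0.2)))
print(majority_vote(classifiers, (0.2, 0.6)))
```

Using an odd number of binary classifiers avoids ties; with an even number, a tie-breaking rule would have to be chosen.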

Subject(s)

Computer Simulation , Enzymes/chemistry , Metabolic Networks and Pathways , Algorithms , Models, Biological , Models, Chemical , Protein Binding
