Search | VHL Regional Portal

EPDRNA: A Model for Identifying DNA-RNA Binding Sites in Disease-Related Proteins.

Sun, CanZhuang; Feng, YongE.

Protein J ; 43(3): 513-521, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38491248

ABSTRACT

Protein-DNA and protein-RNA interactions are involved in many biological processes, regulate many cellular functions and are related to many human diseases. To understand the molecular mechanisms of protein-DNA and protein-RNA binding, it is important to identify which residues in a protein sequence bind to DNA and RNA. At present, few methods are designed specifically to identify DNA- and RNA-binding sites in disease-related proteins. In this study, we therefore combined four machine learning algorithms into an ensemble classifier (EPDRNA) to predict DNA- and RNA-binding sites in disease-related proteins. The dataset used to build the model was collated from the UniProt and PDB databases, and position-specific scoring matrices (PSSM), physicochemical properties and amino acid type were used as features. EPDRNA adopted soft voting and achieved a best AUC of 0.73 for DNA-binding sites and a best AUC of 0.71 for RNA-binding sites in 10-fold cross-validation on the training sets. To further verify the performance of the model, we assessed EPDRNA on independent test datasets for DNA-binding and RNA-binding site prediction. EPDRNA achieved 85% recall and 25% precision on the protein-DNA interaction independent test set, and 82% recall and 27% precision on the protein-RNA interaction independent test set. The EPDRNA webserver is freely available online at http://www.s-bioinformatics.cn/epdrna .
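The soft-voting step and the recall and precision figures can be illustrated with a minimal Python sketch. It averages the positive-class probabilities returned by several member classifiers, thresholds the mean, and then scores the predictions. The abstract does not name EPDRNA's four algorithms or its decision threshold, so the member classifiers, the helper names and the 0.5 cutoff below are hypothetical placeholders rather than the published model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: any callable mapping one residue's feature vector to P(binding).
Classifier = Callable[[Sequence[float]], float]


def soft_vote(members: List[Classifier], x: Sequence[float]) -> float:
    """Average the positive-class probabilities of all member classifiers."""
    return sum(member(x) for member in members) / len(members)


def predict(members: List[Classifier], xs: List[Sequence[float]],
            threshold: float = 0.5) -> List[int]:
    """Label a residue as binding (1) when the averaged probability >= threshold."""
    return [1 if soft_vote(members, x) >= threshold else 0 for x in xs]


def recall_precision(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Toy usage with four made-up members on two-feature inputs.
members = [
    lambda x: 0.9 if x[0] > 0.5 else 0.2,
    lambda x: 0.7 if x[1] > 0.5 else 0.3,
    lambda x: 0.6,
    lambda x: 0.8 if x[0] + x[1] > 1.0 else 0.1,
]
xs = [(0.9, 0.8), (0.1, 0.2), (0.7, 0.1), (0.2, 0.9)]
print(predict(members, xs))                          # [1, 0, 0, 1]
print(recall_precision([1, 0, 1, 0], [1, 1, 0, 0]))  # (0.5, 0.5)
```

Soft voting lets one confident member outweigh several uncertain ones, which is the usual reason to prefer it over majority (hard) voting when every member outputs probabilities. A high recall paired with low precision, as reported above, typically indicates a permissive threshold on an imbalanced dataset where binding residues are the minority class.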

Subject(s)

DNA , Machine Learning , RNA , Binding Sites , RNA/metabolism , RNA/chemistry , Humans , DNA/metabolism , DNA/chemistry , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , Algorithms , Databases, Protein , Protein Binding , Computational Biology/methods

IDPsBind: a repository of binding sites for intrinsically disordered proteins complexes with known 3D structures.

Sun, CanZhuang; Feng, YongE; Fan, GuoLiang.

BMC Mol Cell Biol ; 23(1): 33, 2022 Jul 26.

Article in English | MEDLINE | ID: mdl-35883018

ABSTRACT

BACKGROUND: Intrinsically disordered proteins (IDPs) lack a stable three-dimensional structure under physiological conditions but play crucial roles in many biological processes, performing various functions by interacting with other ligands. RESULTS: Here, we present a database, IDPsBind, which records the interaction sites between IDPs and their ligands, identified with a distance-threshold method in IDP complexes of known 3D structure from the PDB database. IDPsBind contains 9626 IDP complexes and 880 experimentally verified intrinsically disordered proteins. The current release of the IDPsBind database is version 1.0. IDPsBind is freely accessible at http://www.s-bioinformatics.cn/idpsbind/home/ . CONCLUSIONS: IDPsBind provides more comprehensive interaction sites for IDP complexes of known 3D structure. It can not only support subsequent studies of the interaction mechanisms of intrinsically disordered proteins but also provide a suitable basis for developing algorithms that predict the interaction sites of intrinsically disordered proteins.
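A distance-threshold rule of the kind described for IDPsBind can be sketched as follows: a residue counts as an interaction site if any of its atoms lies within a cutoff distance of any ligand atom. The 3.5 Å default, the residue-identifier format and the function name are assumptions for illustration; the abstract does not state which threshold or atom selection IDPsBind uses.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def binding_residues(protein_atoms: Dict[str, List[Coord]],
                     ligand_atoms: List[Coord],
                     cutoff: float = 3.5) -> Set[str]:
    """Return residue IDs with at least one atom within `cutoff` (Å) of a ligand atom.

    The 3.5 Å default is a placeholder, not IDPsBind's documented threshold.
    """
    cutoff_sq = cutoff * cutoff  # compare squared distances to avoid sqrt
    sites: Set[str] = set()
    for residue_id, atoms in protein_atoms.items():
        for atom in atoms:
            if any(sum((a - b) ** 2 for a, b in zip(atom, lig)) <= cutoff_sq
                   for lig in ligand_atoms):
                sites.add(residue_id)
                break  # one close atom is enough to mark the residue
    return sites


# Toy complex: residue A:1 sits 3 Å from the ligand atom, A:2 sits 7 Å away.
protein = {"A:1": [(0.0, 0.0, 0.0)], "A:2": [(10.0, 0.0, 0.0)]}
ligand = [(3.0, 0.0, 0.0)]
print(binding_residues(protein, ligand))  # {'A:1'}
```

This brute-force version compares every atom pair, which is adequate for a single small complex; processing thousands of PDB entries would usually call for a spatial index such as a k-d tree.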

Subject(s)

Intrinsically Disordered Proteins , Algorithms , Binding Sites , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism

Identification of metal ion binding sites based on amino acid sequences.

Cao, Xiaoyong; Hu, Xiuzhen; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua.

PLoS One ; 12(8): e0183756, 2017.

Article in English | MEDLINE | ID: mdl-28854211

ABSTRACT

The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method for analyzing and identifying the binding residues of metal ions based solely on sequence information. Binding data for ten metal ions were extracted from the BioLiP database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+ and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results were achieved for these ions using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of the other metals could also be accurately identified using the Support Vector Machine algorithm with multiple feature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server based on the proposed method is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.
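The position-weight idea can be sketched as a generic log-odds position weight matrix built from sequence windows centred on known binding residues. The pseudocount, the uniform 1/20 background and the three-residue toy windows are assumptions; the abstract does not give the paper's exact scoring formula, window length or background model.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds position weight matrix from equal-length windows around binding residues.

    Uses a uniform background (1/20 per residue), which is an assumption.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        pwm.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; higher means more binding-site-like."""
    return sum(column[aa] for column, aa in zip(pwm, window))


# Toy windows centred on hypothetical Zn2+-binding residues (His/Cys rich).
pwm = build_pwm(["HCH", "HCH", "CCH"])
print(score_window(pwm, "HCH") > 0 > score_window(pwm, "AAA"))  # True
```

A query window would then be called a binding site when its score exceeds a threshold, which in practice would be tuned by cross-validation. Conserved positions contribute large positive log-odds, which is consistent with the observation above that conservation-sensitive ions suit this approach.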

Subject(s)

Amino Acid Motifs , Amino Acids/chemistry , Metals/chemistry , Proteins/chemistry , Algorithms , Amino Acid Sequence , Amino Acids/genetics , Amino Acids/metabolism , Binding Sites/genetics , Computational Biology/methods , Databases, Protein , Ions/chemistry , Ions/metabolism , Metals/metabolism , Protein Binding , Proteins/genetics , Proteins/metabolism , Support Vector Machine

Identify Secretory Protein of Malaria Parasite with Modified Quadratic Discriminant Algorithm and Amino Acid Composition.

Feng, Yong-E.

Interdiscip Sci ; 8(2): 156-161, 2016 Jun.

Article in English | MEDLINE | ID: mdl-26286010

ABSTRACT

The malaria parasite secretes various proteins into infected red blood cells for its growth and survival. Identifying these secretory proteins is therefore important for developing vaccines or drugs against malaria. In this study, a modified quadratic discriminant analysis method is presented for predicting secretory proteins. First, the 20 amino acids are divided into five types according to their physicochemical characteristics. Then, the compositions of these five amino acid types are used as inputs to the modified quadratic discriminant algorithm. Finally, the best prediction performance is obtained using the 20 amino acid compositions, with a sensitivity of 96%, a specificity of 92% and a Matthews correlation coefficient of 0.88 in the fivefold cross-validation test. The results are also compared with those of existing prediction methods, and the comparison shows that our method stands out in predicting secretory proteins.
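The two composition feature sets and the Matthews correlation coefficient can be sketched in Python. The abstract states only that the 20 amino acids were split into five physicochemical types, so the grouping below (aliphatic, aromatic, polar, positive, negative) is a hypothetical example, and the modified quadratic discriminant itself is not reproduced.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical five-way physicochemical grouping; the paper's grouping is not given.
GROUPS: Dict[str, str] = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQGP",
    "positive": "KRH",
    "negative": "DE",
}


def _clean(seq: str) -> str:
    """Keep only the 20 standard residues (upper-cased)."""
    residues = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def aa_composition(seq: str) -> List[float]:
    """20-dimensional amino acid composition (fractions, in AMINO_ACIDS order)."""
    residues = _clean(seq)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def group_composition(seq: str) -> Dict[str, float]:
    """Fraction of residues falling into each of the five hypothetical groups."""
    residues = _clean(seq)
    return {name: sum(residues.count(aa) for aa in members) / len(residues)
            for name, members in GROUPS.items()}


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(group_composition("KRDEAF"))
print(round(mcc(tp=48, tn=46, fp=4, fn=2), 3))
```

Unlike accuracy, the MCC stays informative when the secretory and non-secretory classes differ in size, because it uses all four confusion-matrix cells.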

Subject(s)

Algorithms , Amino Acids/analysis , Malaria/metabolism , Protozoan Proteins/metabolism , Animals

Identify five kinds of simple super-secondary structures with quadratic discriminant algorithm based on the chemical shifts.

Kou, Gaoshan; Feng, Yonge.

J Theor Biol ; 380: 392-8, 2015 Sep 07.

Article in English | MEDLINE | ID: mdl-26087283

ABSTRACT

The biological function of a protein is largely determined by its spatial structure. Research on the relationship between structure and function is the basis of protein structure prediction, and the prediction of super-secondary structure is an important step in predicting protein spatial structure. Many algorithms have been proposed for predicting protein super-secondary structure; however, the parameters used by these methods were primarily based on amino acid sequences. In this paper, we proposed a novel model for predicting five kinds of protein super-secondary structures based on chemical shifts (CSs). First, we analyzed the statistical distribution of the chemical shifts of six nuclei in the five kinds of super-secondary structures using analysis of variance (ANOVA). Second, we used the chemical shifts of the six nuclei as features and combined them with quadratic discriminant analysis (QDA) to predict the five kinds of super-secondary structures. Finally, we achieved an average sensitivity, average specificity and overall accuracy of 81.8%, 95.19% and 82.91%, respectively, in seven-fold cross-validation. Moreover, we performed the prediction by combining five different chemical shifts as features; the maximum overall accuracy reached 89.87% using the C, Cα, Cβ, N and Hα chemical shifts, which is clearly superior to QDA using the 20 amino acid compositions (AAC) as features in seven-fold cross-validation. These results demonstrate that chemical shifts are indeed an outstanding parameter for predicting the five kinds of super-secondary structures. In addition, we compared the predictions of QDA with those of a support vector machine (SVM) using the same six CSs as features. The results suggest that QDA with chemical shifts as features is a good predictor of protein super-secondary structures.
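A plain quadratic discriminant classifier with k-fold cross-validation can be sketched with NumPy. Each class gets its own mean vector and covariance matrix, and a sample is assigned to the class with the largest Gaussian log-discriminant. This is textbook QDA, not necessarily the exact variant used in the paper; the ridge term `reg`, the class labels and the synthetic six-feature data standing in for chemical shifts are assumptions.

```python
import numpy as np


class QDA:
    """Textbook quadratic discriminant analysis with one Gaussian per class."""

    def __init__(self, reg: float = 1e-6):
        self.reg = reg  # small ridge added to each covariance for numerical stability

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False) + self.reg * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            log_prior = np.log(len(Xc) / len(X))
            self.params_.append((mean, np.linalg.inv(cov), logdet, log_prior))
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, inv_cov, logdet, log_prior in self.params_:
            d = X - mean
            mahal = np.einsum("ij,jk,ik->i", d, inv_cov, d)  # d_i^T inv(S) d_i per row
            scores.append(-0.5 * mahal - 0.5 * logdet + log_prior)
        return np.stack(scores, axis=1)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


def cross_val_accuracy(X, y, k: int = 7, seed: int = 0) -> float:
    """k-fold cross-validated accuracy (seven folds by default, as in the abstract)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = QDA().fit(X[train_idx], y[train_idx])
        correct += int(np.sum(model.predict(X[test_idx]) == y[test_idx]))
    return correct / len(y)


# Synthetic stand-in for six chemical-shift features in two structure classes.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (60, 6)), rng.normal(3.0, 1.0, (60, 6))])
y = np.array(["class_a"] * 60 + ["class_b"] * 60)
print(cross_val_accuracy(X, y))
```

The same code handles any number of classes, so five super-secondary structure types would simply appear as five labels. Because QDA estimates a full covariance per class, each class needs comfortably more training samples than features for the covariance estimate to be stable.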

Subject(s)

Algorithms , Protein Conformation , Proteins/chemistry , Support Vector Machine

Prediction of protein secondary structure using feature selection and analysis approach.

Feng, Yonge; Lin, Hao; Luo, Liaofu.

Acta Biotheor ; 62(1): 1-14, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24052343

ABSTRACT

Predicting the secondary structure of a protein from its amino acid sequence is an important step towards predicting its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is currently about 80%, which is still far from satisfactory. In this study, we proposed a novel method that uses the binomial distribution to optimize tetrapeptide structural words and the increment of diversity combined with a quadratic discriminant to predict protein three-state secondary structure. A benchmark dataset of 2,640 proteins with less than 25% sequence identity was used to train and test the proposed method. The results indicate that an overall accuracy of 87.8% was achieved in secondary structure prediction using ten-fold cross-validation. Moreover, the accuracy of the predicted secondary structures ranges from 84% to 89% at the residue level. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words that affect the accuracy of predicted secondary structures.
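The increment of diversity can be sketched in Python using the definition common in the increment-of-diversity literature: for a count vector with total N, the diversity is D = N ln N - Σ nᵢ ln nᵢ, and the increment between two sources is ID(X, Y) = D(X + Y) - D(X) - D(Y), which is small when the two word distributions are similar. The sketch assigns a fragment to the state with the smallest ID. This is a simplification, since the abstract describes combining ID with a quadratic discriminant after a binomial-distribution word selection step, neither of which is reproduced here. The reference counts and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Mapping


def diversity(counts: Mapping[str, int]) -> float:
    """Diversity measure D = N ln N - sum_i n_i ln n_i of a count vector."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts.values() if c > 0)


def increment_of_diversity(x: Mapping[str, int], y: Mapping[str, int]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); zero when X and Y are proportional."""
    merged = Counter(x)
    merged.update(y)  # adds the counts of y to those of x
    return diversity(merged) - diversity(x) - diversity(y)


def tetrapeptide_counts(fragment: str) -> Counter:
    """Overlapping 4-residue words in a sequence fragment."""
    return Counter(fragment[i:i + 4] for i in range(len(fragment) - 3))


def predict_state(fragment: str, references: Dict[str, Mapping[str, int]]) -> str:
    """Assign the state (e.g. H/E/C) whose reference word counts give the smallest ID."""
    query = tetrapeptide_counts(fragment)
    return min(references, key=lambda state: increment_of_diversity(query, references[state]))


# Hypothetical reference counts per state (real ones would come from training data).
refs = {"H": Counter({"AELK": 12, "ELKA": 10}), "E": Counter({"VIVT": 12, "IVTV": 9})}
print(predict_state("AELKA", refs))  # prints H
```

In the full method the ID values against each state would serve as inputs to the quadratic discriminant rather than being compared directly.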

Subject(s)

Algorithms , Peptide Fragments/chemistry , Protein Structure, Secondary , Proteins/chemistry , Databases, Protein , Humans , Models, Molecular , Sequence Analysis, Protein

Use of tetrapeptide signals for protein secondary-structure prediction.

Feng, Yonge; Luo, Liaofu.

Amino Acids ; 35(3): 607-14, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18431531

ABSTRACT

This paper develops a novel sequence-based method, tetra-peptide-based increment of diversity with quadratic discriminant analysis (TPIDQD for short), for protein secondary-structure prediction. TPIDQD is based on tetra-peptide signals and predicts the structure of the central residue of a sequence fragment. The three-state overall per-residue accuracy (Q3) is about 80% in a threefold cross-validation test on 21-residue fragments from the CB513 dataset. The accuracy can be further improved by taking long-range sequence information (fragments longer than 21 residues) into account. The results show that tetra-peptide signals can indeed reflect a relationship between an amino acid sequence and its secondary structure, indicating the importance of tetra-peptide signals as a protein folding code in protein structure prediction.
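Building one 21-residue fragment per residue and scoring predictions with Q3 can be sketched as below. The function names and the "X" padding of chain termini are assumptions for illustration, and the tetra-peptide scoring of each fragment is omitted.

```python
from typing import List


def central_windows(seq: str, width: int = 21, pad: str = "X") -> List[str]:
    """One fixed-width fragment per residue, with that residue at the centre.

    Ends are padded with a placeholder symbol (an assumption; the paper's
    handling of terminal residues is not described in the abstract).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the fragment has a central residue")
    half = width // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + width] for i in range(len(seq))]


def q3(true_ss: str, pred_ss: str) -> float:
    """Fraction of residues whose three-state label (H/E/C) is predicted correctly."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("label strings must be non-empty and of equal length")
    return sum(t == p for t, p in zip(true_ss, pred_ss)) / len(true_ss)


windows = central_windows("ACDEF", width=3)
print(windows)               # ['XAC', 'ACD', 'CDE', 'DEF', 'EFX']
print(q3("HHHEC", "HHECC"))  # 0.6
```

Passing a larger odd `width` is one simple way to include the longer-range sequence context that the abstract reports as improving accuracy.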

Subject(s)

Protein Structure, Secondary , Proteins/chemistry , Algorithms , Peptides/chemistry , Sequence Analysis, Protein
