Results 1 - 18 of 18
1.
Protein Pept Lett ; 20(3): 243-8, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22591473

ABSTRACT

Protein disordered regions are associated with some critical cellular functions such as transcriptional regulation, translation and cellular signal transduction, and they are responsible for various diseases. Although experimental methods have been developed to determine these regions, they are time-consuming and expensive. Therefore, it is highly desirable to develop computational methods that can provide this kind of information in a rapid and inexpensive manner. Here we propose a sequence-based computational approach for predicting protein disordered regions by means of the Nearest Neighbor algorithm, in which conservation, amino acid factor and secondary structure status of each amino acid in a fixed-length sliding window are taken as the encoding features. Also, feature selection based on mRMR (Maximum Relevance Minimum Redundancy) is applied to obtain an optimal 51-feature set that includes 39 conservation features and 12 secondary structure features. With the optimal 51 features, our predictor yielded quite promising MCC (Matthews correlation coefficient) values: 0.371 on a rigorous benchmark dataset tested by 5-fold cross-validation and 0.219 on an independent test dataset. Our results suggest that conservation and secondary structure play important roles in intrinsically disordered proteins.
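
The MCC figures quoted in this abstract are computed from the four confusion-matrix counts. The following is a minimal sketch of that standard formula; the function name and the convention of returning 0 when the denominator vanishes are illustrative assumptions, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it a common choice when the disordered and ordered classes are unbalanced.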


Subject(s)
Amino Acids/chemistry , Protein Structure, Secondary , Proteins/chemistry , Sequence Analysis, Protein , Algorithms , Humans
2.
PLoS One ; 7(8): e42517, 2012.
Article in English | MEDLINE | ID: mdl-22880014

ABSTRACT

Bacterial pathogens continue to threaten public health worldwide. Identification of bacterial virulence factors can help to find novel drug/vaccine targets against pathogenicity. It can also help to reveal the mechanisms of the related diseases at the molecular level. With the explosive growth in protein sequences generated in the postgenomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying virulence factors according to their sequence information alone. In this study, based on the protein-protein interaction networks from the STRING database, a novel network-based method was proposed for identifying the virulence factors in the proteomes of UPEC 536, UPEC CFT073, P. aeruginosa PAO1, L. pneumophila Philadelphia 1, C. jejuni NCTC 11168 and M. tuberculosis H37Rv. Evaluated on the same benchmark datasets derived from the aforementioned species, the identification accuracies achieved by the network-based method were around 0.9, significantly higher than those achieved by sequence-based methods such as BLAST, feature selection and VirulentPred. Further analysis showed that functional associations such as gene neighborhood and co-occurrence were the primary associations between these virulence factors in the STRING database. The high success rates indicate that the network-based method is quite promising. The novel approach also holds high potential for identifying virulence factors in other organisms, because it can be easily extended to many other bacterial species as long as the relevant significant statistical data are available for them.


Subject(s)
Computational Biology/methods , Virulence Factors/chemistry , Algorithms , Bacteria/pathogenicity , Bacterial Proteins/chemistry , Databases, Protein , Protein Interaction Maps , ROC Curve , Sequence Alignment , Sequence Analysis, Protein
3.
PLoS One ; 7(6): e39308, 2012.
Article in English | MEDLINE | ID: mdl-22720092

ABSTRACT

Domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting protein domains according to the sequence information alone, so as to facilitate the structure prediction of proteins and speed up their functional annotation. However, although many efforts have been made in this regard, prediction of protein domains from sequence information remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), as well as by incorporating the features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28-40% higher than those achieved by the existing method on the same benchmark dataset. Furthermore, an in-depth analysis revealed that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool for annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with high impact on domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, and HIV protease cleavage sites, among many other important topics in protein science and biomedicine.


Subject(s)
Proteins/chemistry , Protein Conformation , Solvents/chemistry
4.
J Biomol Struct Dyn ; 29(6): 650-8, 2012.
Article in English | MEDLINE | ID: mdl-22545996

ABSTRACT

Protein oxidation is a ubiquitous post-translational modification that plays important roles in various physiological and pathological processes. Because protein oxidation can also take place as an experimental artifact or be caused by oxygen in the air during sample collection and analysis, and because it is both time-consuming and expensive to determine protein oxidation sites purely by biochemical experiments, it would be of great benefit to develop in silico methods for rapidly and effectively identifying protein oxidation sites. In this study, we developed a computational method to address this problem. Our method was based on the nearest neighbor algorithm, into which the maximum relevance minimum redundancy and incremental feature selection approaches were incorporated. From the initial 735 features, 16 features were selected as the optimal feature set. Of these 16 optimized features, 10 were associated with the position-specific scoring matrix conservation scores, three with the amino acid factors, one with the propensity of conservation of residues on the protein surface, one with the side chain count of carbon atom deviation from mean, and one with the solvent accessibility. Our prediction model achieved an overall success rate of 75.82%, indicating that it is quite encouraging and promising for practical applications. Also, the 16 optimal features obtained through this study may provide useful clues and insights for an in-depth understanding of the mechanism of protein oxidation.
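
The pipeline named in this abstract (a ranked feature list, incremental feature selection, and a nearest neighbor classifier) can be sketched as below. This is only an illustration under stated assumptions: the ranking is taken as given rather than computed by mRMR, the classifier is a plain 1-nearest-neighbor with squared Euclidean distance, each candidate subset is scored by a leave-one-out (jackknife) test, and all function names and the toy data are hypothetical.

```python
def nearest_neighbor_label(train, query):
    """Return the label of the training sample closest to `query`.

    `train` is a list of (feature_vector, label) pairs; closeness is
    squared Euclidean distance (an assumption for this sketch).
    """
    closest = min(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)),
    )
    return closest[1]


def jackknife_accuracy(samples):
    """Leave-one-out accuracy of the 1-nearest-neighbor classifier."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += nearest_neighbor_label(rest, x) == y
    return hits / len(samples)


def incremental_feature_selection(X, y, ranked):
    """Evaluate the top-1, top-2, ... ranked features in turn.

    `ranked` lists feature indices from most to least important, for
    example as produced by an mRMR ranking (not implemented here).
    Returns (best_subset, best_accuracy); ties keep the smaller subset.
    """
    best_subset, best_acc = [], -1.0
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        samples = [([row[j] for j in subset], label) for row, label in zip(X, y)]
        acc = jackknife_accuracy(samples)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 0.0], [1.0, 5.1], [1.1, 0.1]]
    y = [0, 0, 1, 1]
    print(incremental_feature_selection(X, y, ranked=[0, 1]))
```

In the toy run the noisy second feature lowers the leave-one-out accuracy, so the search keeps only the first feature; the same loop applied to a 735-feature ranking would return the prefix with the highest accuracy.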


Subject(s)
Proteins/chemistry , Algorithms , Computational Biology , Oxidation-Reduction , Protein Processing, Post-Translational , Proteins/metabolism
5.
Protein Pept Lett ; 19(6): 644-51, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22519536

ABSTRACT

Information on protein subcellular localization is vitally important for an in-depth understanding of the intricate pathways that regulate biological processes at the cellular level. With the rapidly increasing number of newly found protein sequences in the Post-Genomic Age, many automated methods have been developed to help annotate their subcellular locations in a timely manner. However, very few of them were developed using the protein-protein interaction (PPI) network information. In this paper, we have introduced a new concept called "tethering potential" by which the PPI information can be effectively fused into the formulation for protein samples. Based on such a network frame, a new predictor called Yeast-PLoc has been developed for identifying budding yeast proteins among their 19 subcellular location sites. Meanwhile, a purely sequence-based approach, called the "hybrid-property" method, is integrated into Yeast-PLoc as a fall-back to deal with those proteins without sufficient PPI information. The overall success rate by the jackknife test on the 4,683 yeast proteins in the training dataset was 70.25%. Furthermore, it was shown that the success rate of Yeast-PLoc on an independent dataset was remarkably higher than those of some other existing predictors, indicating that the current approach of incorporating the PPI information is quite promising. As a user-friendly web-server, Yeast-PLoc is freely accessible at http://yeastloc.biosino.org/.


Subject(s)
Proteomics/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Databases, Protein , Intracellular Space/metabolism , Models, Statistical , Protein Interaction Maps , Proteome/metabolism , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae Proteins/chemistry , Software , Subcellular Fractions/chemistry , Subcellular Fractions/metabolism
6.
Biochimie ; 94(4): 1017-25, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22239951

ABSTRACT

Longevity is one of the most basic and essential properties of all living organisms. Identification of genes that regulate longevity would increase understanding of the mechanisms of aging, so as to help facilitate anti-aging intervention and extend the life span. In this study, based on the network features and the biochemical/physicochemical features of the deletion network and deletion genes, as well as their functional features, a two-layer model was developed for predicting the deletion effects on yeast longevity. The first stage of our prediction approach was to identify whether the deletion of one gene would change the life span of yeast; if it did, the second stage of our procedure would automatically proceed to predict whether the deletion would increase or decrease the life span. Analysis of the predicted results showed that the functional features (such as mitochondrial function and chromatin silencing), the network features (such as the edge density and edge weight density of the deletion network), and the local centrality of the deletion gene have an important impact on predicting the deletion effects on longevity. It is anticipated that our model may become a useful tool for studying longevity from the angle of genes and networks. Moreover, it has not escaped our notice that, after some modification, the current model can also be used to study many other phenotype prediction problems from the angle of systems biology.
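
The two-stage decision described in this abstract can be sketched as a small cascade. The sketch below is hypothetical throughout: the two stage classifiers are passed in as plain callables, and the feature names and thresholds in the toy example are invented for illustration and do not come from the paper.

```python
def two_layer_predict(features, changes_lifespan, increases_lifespan):
    """Cascade of two binary classifiers.

    Stage 1 (`changes_lifespan`) decides whether deleting the gene
    alters the life span at all. Stage 2 (`increases_lifespan`) runs
    only when stage 1 says yes, and decides the direction.
    """
    if not changes_lifespan(features):
        return "no change"
    return "increase" if increases_lifespan(features) else "decrease"


if __name__ == "__main__":
    # Hypothetical stand-ins for trained models: simple thresholds
    # on made-up features of a deletion gene.
    def stage1(f):
        return f["local_centrality"] > 0.5

    def stage2(f):
        return f["edge_density"] < 0.2

    gene = {"local_centrality": 0.8, "edge_density": 0.1}
    print(two_layer_predict(gene, stage1, stage2))
```

Keeping the stages separate means each classifier can be trained and evaluated on its own question, and the second stage never sees genes that the first stage judged to have no effect.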


Subject(s)
Artificial Intelligence , Gene Deletion , Microbial Viability/genetics , Models, Genetic , Saccharomyces cerevisiae/genetics , Aging , Algorithms , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans/physiology , Computer Simulation , Genes, Fungal , Longevity , Saccharomyces cerevisiae/physiology
7.
Protein Pept Lett ; 19(1): 113-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21919852

ABSTRACT

Induced pluripotent stem cells have displayed great potential in disease investigation and drug development applications. However, selection of reprogramming factors in each cell type or disease state is both expensive and time-consuming. To deal with this situation, a fast computational framework was developed to optimize the reprogramming factors via the protein interaction network and gene functional profiles. It can be used to select reprogramming factors from millions of possibilities. It is anticipated that the novel approach will become a very useful tool for both basic research and drug development.


Subject(s)
Cellular Reprogramming/physiology , Induced Pluripotent Stem Cells/metabolism , Protein Interaction Maps/physiology , Animals , Cell Differentiation/genetics , Databases, Factual , Gene Expression Profiling , Humans , Induced Pluripotent Stem Cells/cytology , Kruppel-Like Factor 4 , Kruppel-Like Transcription Factors/chemistry , Kruppel-Like Transcription Factors/genetics , Mice , Octamer Transcription Factor-3/chemistry , Octamer Transcription Factor-3/genetics , Proto-Oncogene Proteins c-myc/chemistry , Proto-Oncogene Proteins c-myc/genetics , SOXB1 Transcription Factors/chemistry , SOXB1 Transcription Factors/genetics , Systems Biology
8.
Protein Pept Lett ; 19(1): 108-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21919853

ABSTRACT

As many diseases such as high cholesterol are related to lipid metabolism, studying the lipid metabolic pathway helps to reveal the interactions between different elements within highly complex living systems. Here, we employed a typical ensemble learning method, the Bagging learner, to study and predict the possible lipid metabolic sub-pathway of small molecules based on physical and chemical features of the compounds. As a result, the jackknife cross-validation test and the independent set test on the model reached 89.85% and 91.46%, respectively. Therefore, our predictor may be used for finding new compounds that participate in lipid metabolic procedures.


Subject(s)
Artificial Intelligence , Lipid Metabolism , Small Molecule Libraries/chemistry , Computational Biology , Databases, Factual , Metabolic Networks and Pathways , Predictive Value of Tests , Small Molecule Libraries/metabolism
9.
Protein Pept Lett ; 19(1): 91-8, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21919855

ABSTRACT

It is of great use to identify and clarify the interactions between enzymes and small molecules in order to understand the molecular and cellular functions of organisms. In this study, we developed a novel method for the prediction of enzyme-small molecule interactions based on a machine learning approach. The biochemical and physicochemical description of proteins and the functional group composition of small molecules are used for representing enzyme-small molecule pairs. Tested by jackknife cross-validation, our predictor achieved an overall accuracy of 87.47%, showing an acceptable efficiency. The 39 features selected by feature selection were analyzed for further understanding of enzyme-small molecule interactions.


Subject(s)
Algorithms , Proteins/chemistry , Sequence Analysis, Protein/methods , Small Molecule Libraries/chemistry , Software , Support Vector Machine , Amino Acid Sequence , Computational Biology , Databases, Protein , Hydrophobic and Hydrophilic Interactions , Molecular Sequence Data , Predictive Value of Tests , Protein Binding , Proteins/metabolism , Small Molecule Libraries/metabolism
10.
Protein Pept Lett ; 19(1): 23-8, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21919863

ABSTRACT

Information on protein quaternary structure can help to understand the biological functions of proteins. Because wet-lab experiments are both time-consuming and costly, we adopt a novel computational approach to assign proteins to 10 kinds of quaternary structures. After coding each protein by its biochemical and physicochemical properties, feature selection was carried out using the Incremental Feature Selection (IFS) method. The optimal feature set thus obtained consisted of 97 features, with which the prediction model was built. As a result, the overall prediction success rate is 74.90% as evaluated by the jackknife test, much higher than the overall correct rate of a random guess, 10% (1/10). Further feature analysis indicates that protein secondary structure is the feature that contributes most to the prediction of protein quaternary structure.


Subject(s)
Protein Structure, Quaternary , Proteins/chemistry , Software , Algorithms , Computational Biology , Databases, Protein , Protein Multimerization , Protein Structure, Secondary , Proteins/physiology , Structure-Activity Relationship
11.
Protein Pept Lett ; 19(1): 15-22, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21919864

ABSTRACT

It is well known that protein subcellular localizations are closely related to their functions. Although many computational methods and tools are available on the Internet, it is still necessary to develop new algorithms in this field to gain a better understanding of the complex mechanism of plant subcellular localization. Here, we provide a new web server named PSCL for plant protein subcellular localization prediction by employing optimized functional domains. After feature optimization, 848 optimal functional domains from InterPro were obtained to represent each protein. By calculating the distances to each of the seven categories, PSCL shows the possibilities of a protein being located in each of those categories in ascending order. On our dataset, PSCL achieved a first-order prediction accuracy of 75.7% by the jackknife test. Gene Ontology enrichment analysis shows that catalytic activity, cellular process and metabolic process are strongly correlated with the localization of plant proteins. Finally, PSCL, a Linux-based web interface for the predictor, was designed and is accessible for public use at http://pscl.biosino.org/.
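
The ranked output described in this abstract (distances to each category, listed in ascending order) can be illustrated with a short sketch. The abstract does not specify how PSCL defines the distance between a protein and a category, so the choices here are assumptions: proteins are vectors of domain indicators, distance is Euclidean, and a category's distance is that of its nearest member. The category names and vectors are invented.

```python
def rank_categories(query, members_by_category):
    """Order category names by distance from `query`, nearest first.

    `members_by_category` maps a category name to a list of feature
    vectors of proteins known to belong to it. The distance to a
    category is taken as the distance to its nearest member.
    """
    def euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    scored = [
        (min(euclidean(query, m) for m in members), name)
        for name, members in members_by_category.items()
    ]
    return [name for _, name in sorted(scored)]


if __name__ == "__main__":
    # Toy vectors: 1 means the protein contains that functional domain.
    known = {
        "chloroplast": [[1, 1, 0]],
        "nucleus": [[0, 0, 1]],
        "vacuole": [[1, 0, 0]],
    }
    print(rank_categories([1, 1, 0], known))
```

The first entry of the returned list corresponds to the first-order prediction; the remaining entries give the fallback candidates in order of increasing distance.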


Subject(s)
Plant Cells/chemistry , Plant Proteins/chemistry , Plants/chemistry , Software , Subcellular Fractions/chemistry , Algorithms , Biological Evolution , Computational Biology , Databases, Protein , Phylogeny , Plant Cells/physiology , Plant Proteins/genetics , Protein Structure, Tertiary
12.
J Proteomics ; 75(5): 1654-65, 2012 Feb 16.
Article in English | MEDLINE | ID: mdl-22178444

ABSTRACT

S-nitrosylation (SNO) is one of the most important and universal post-translational modifications (PTMs), regulating various cellular functions and signaling events. Identification of the exact S-nitrosylation sites in proteins may facilitate the understanding of the molecular mechanisms and biological function of S-nitrosylation. Unfortunately, traditional experimental approaches used for detecting S-nitrosylation sites are often laborious and time-consuming. Computational methods, however, could overcome this drawback. In this work, we developed a novel predictor based on the nearest neighbor algorithm (NNA) with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). The features of physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were utilized to represent the peptides concerned. Feature analysis showed that the features except residual disorder affected identification of the S-nitrosylation sites. It was also shown via the site-specific feature analysis that the features of sites away from the central cysteine might contribute to the S-nitrosylation site determination in a subtle manner. It is anticipated that our prediction method may become a useful tool for identifying protein S-nitrosylation sites and that the feature analysis described in this paper may provide useful insights for in-depth investigation into the mechanism of S-nitrosylation.


Subject(s)
Algorithms , Protein Processing, Post-Translational , Proteins/chemistry , Sequence Analysis, Protein/methods , Animals , Humans , Protein Structure, Secondary , Proteins/genetics , Proteins/metabolism
13.
PLoS One ; 6(7): e22989, 2011.
Article in English | MEDLINE | ID: mdl-21829572

ABSTRACT

Determining the body fluids into which proteins can be secreted is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict which kinds of body fluids human proteins can be secreted into. For a newly constructed benchmark dataset that consists of 529 human secreted proteins, the prediction accuracy for the most likely body fluid location predicted by our method via the jackknife test was 79.02%, significantly higher than the success rate of a random guess (29.36%). The likelihood that the predicted body fluids of the first four orders contain all the true body fluids into which the proteins can be secreted is 62.94%. Our method was further demonstrated with two independent datasets: one contains 57 proteins that can be secreted into blood, while the other contains 61 proteins that can be secreted into plasma/serum and are possible biomarkers associated with various cancers. For the 57 proteins in the first dataset, 55 were correctly predicted as blood-secreted proteins. For the 61 proteins in the second dataset, 58 were predicted to be most likely in plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas for both basic research and drug development.


Subject(s)
Biomarkers, Tumor/metabolism , Body Fluids/chemistry , Neoplasms/diagnosis , Protein Interaction Maps , Proteins/analysis , Proteins/metabolism , Algorithms , Biomarkers, Tumor/analysis , Body Fluids/metabolism , Databases, Protein , Humans , Neoplasms/metabolism , Proteins/chemistry
14.
Biopolymers ; 95(11): 763-71, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21544797

ABSTRACT

Protein methylation, one of the most important post-translational modifications, typically takes place on arginine or lysine residues. This reversible modification is involved in a series of basic cellular processes. Identification of methylated proteins and their sites will facilitate the understanding of the molecular mechanism of methylation. Compared with experimental methods, computational predictions of methylated sites are much more desirable for their convenience and speed. Here, we propose a method dedicated to predicting methylated sites of proteins. Feature selection was made on sequence conservation, physicochemical/biochemical properties, and structural disorder by applying the maximum relevance minimum redundancy and incremental feature selection methods. The prediction models were built according to the nearest neighbor algorithm and evaluated by jackknife cross-validation. We built 11 and 9 predictors for methylarginine and methyllysine, respectively, and integrated them to predict methylated sites. As a result, the average prediction accuracies are 74.25% and 77.02% for the methylarginine and methyllysine training sets, respectively. Feature analysis suggested that evolutionary information and physicochemical/biochemical properties play important roles in the recognition of methylated sites. These findings may provide valuable information for exploring the mechanisms of methylation. Our method may serve as a useful tool for biologists to find the potential methylated sites of proteins.


Subject(s)
Arginine/chemistry , Lysine/chemistry , Methylation , Models, Biological
15.
Biochimie ; 93(3): 489-96, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21075167

ABSTRACT

Palmitoylation is a universal and important lipid modification involved in a series of basic cellular processes, such as membrane trafficking, protein stability and protein aggregation. With the avalanche of new protein sequences generated in the post-genomic era, it is highly desirable to develop computational methods for rapidly and effectively identifying the potential palmitoylation sites of uncharacterized proteins so as to provide timely and useful information for revealing the mechanism of protein palmitoylation. By using the Incremental Feature Selection approach based on amino acid factors, conservation, disorder features, and specific features of the palmitoylation site, a new predictor named IFS-Palm was developed in this regard. The overall success rate thus achieved by the jackknife test on a newly constructed benchmark dataset was 90.65%. It was shown via an in-depth analysis that palmitoylation was intimately correlated with the feature of the upstream residue directly adjacent to the cysteine site as well as the conservation of the amino acid cysteine. Meanwhile, the protein disorder region might also play an important role in the post-translational modification. These findings may provide useful insights for revealing the mechanisms of palmitoylation.


Subject(s)
Computational Biology/methods , Lipoylation , Proteins/chemistry , Proteins/metabolism , Algorithms , Amino Acid Sequence , Binding Sites , Databases, Protein , Reproducibility of Results , Saccharomycetales/metabolism
16.
PLoS One ; 6(12): e29491, 2011.
Article in English | MEDLINE | ID: mdl-22220213

ABSTRACT

Given a compound, how can we effectively predict its biological function? It is a fundamentally important problem because the information thus obtained may benefit the understanding of many basic biological processes and provide useful clues for drug design. In this study, based on the information of chemical-chemical interactions, a novel method was developed that can be used to identify which of the following eleven metabolic pathway classes a query compound may be involved with: (1) Carbohydrate Metabolism, (2) Energy Metabolism, (3) Lipid Metabolism, (4) Nucleotide Metabolism, (5) Amino Acid Metabolism, (6) Metabolism of Other Amino Acids, (7) Glycan Biosynthesis and Metabolism, (8) Metabolism of Cofactors and Vitamins, (9) Metabolism of Terpenoids and Polyketides, (10) Biosynthesis of Other Secondary Metabolites, (11) Xenobiotics Biodegradation and Metabolism. The overall success rate obtained by the method via the 5-fold cross-validation test on a benchmark dataset consisting of 3,137 compounds was 77.97%, which is much higher than 10.45%, the corresponding success rate obtained by random guesses. Besides, to deal with the situation that some compounds may be involved with more than one metabolic pathway class, the method presented here is able to provide a series of potential metabolic pathway classes ranked in descending order of their likelihood for each of the query compounds concerned. Furthermore, our method was also applied to predict 5,549 compounds whose metabolic pathway classes are unknown. Interestingly, the results thus obtained are quite consistent with the deductions from the reports by other investigators. It is anticipated that, with the continuous increase of the chemical-chemical interaction data, the current method will be further enhanced in its power and accuracy, so as to become a useful complementary vehicle in annotating uncharacterized compounds for their biological functions.


Subject(s)
Metabolic Networks and Pathways , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Acetylgalactosamine/analogs & derivatives , Acetylgalactosamine/metabolism , Cyclopropanes , Databases as Topic , Reproducibility of Results
17.
PLoS One ; 5(3): e9603, 2010 Mar 11.
Article in English | MEDLINE | ID: mdl-20300175

ABSTRACT

BACKGROUND: Study of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, in silico prediction methods can provide us with very useful information in a timely manner. METHODS/PRINCIPAL FINDINGS: To realize this, drug compounds are encoded with functional groups and proteins are encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, each predicting the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. CONCLUSION/SIGNIFICANCE: Our results indicate that the network prediction system thus established is quite promising and encouraging.


Subject(s)
Pharmaceutical Preparations/chemistry , Technology, Pharmaceutical/methods , Algorithms , Binding Sites , Computational Biology/methods , Humans , Models, Statistical , Protein Conformation , Protein Structure, Secondary , Proteins/chemistry , Receptors, G-Protein-Coupled/metabolism
18.
PLoS One ; 5(12): e15917, 2010 Dec 31.
Article in English | MEDLINE | ID: mdl-21209839

ABSTRACT

BACKGROUND: Hydroxylation is an important post-translational modification and is closely related to various diseases. Besides biotechnology experiments, in silico prediction methods are alternative ways to identify potential hydroxylation sites. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites--hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent evolutionary information of amino acids; and structural disorder of amino acids, each taken over a sliding window 13 amino acids long. The prediction model was then built using the incremental feature selection method. As a result, the prediction accuracies are 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline dataset and hydroxylysine dataset, respectively. Feature analysis suggested that the physicochemical properties, biochemical properties and evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acid adjacent to the hydroxylation site tends to exert more influence than other sites on hydroxylation determination. CONCLUSIONS/SIGNIFICANCE: These findings may provide useful insights for exploring the mechanisms of hydroxylation.
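
The 13-residue sliding window mentioned in this abstract can be illustrated with a short sketch that cuts one window around every candidate proline (P) or lysine (K). Several details are assumptions rather than facts from the paper: the window is centered on the candidate residue, positions that run past either terminus are padded with "X", and the function name and example sequence are hypothetical.

```python
def site_windows(sequence, targets=("P", "K"), length=13, pad="X"):
    """Return (position, window) pairs for each candidate residue.

    `position` is 1-based. Each window has `length` characters with
    the candidate residue in the middle; `pad` fills positions that
    fall outside the sequence.
    """
    half = length // 2
    padded = pad * half + sequence + pad * half
    return [
        (i + 1, padded[i:i + length])
        for i, residue in enumerate(sequence)
        if residue in targets
    ]


if __name__ == "__main__":
    # Invented peptide, for illustration only.
    for position, window in site_windows("GPAGKDGEPGAK"):
        print(position, window)
```

Each window would then be converted to numeric features (for example AAindex values, PSSM scores and disorder scores per position) before feature selection and classification.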


Subject(s)
Computational Biology/methods , Hydroxylysine/chemistry , Hydroxyproline/chemistry , Algorithms , Amino Acids/chemistry , Binding Sites , Biochemistry/methods , Computational Biology/instrumentation , Hydroxylation , Hydroxylysine/metabolism , Hydroxyproline/metabolism , Models, Statistical , Models, Theoretical , Peptides/chemistry , Position-Specific Scoring Matrices , Protein Conformation , Reproducibility of Results