1.
Comput Biol Chem ; 66: 36-43, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27889654

ABSTRACT

Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they do not effectively learn the diverse properties of binding site residues that arise from the various interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. Using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acidic sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor that combines the results of the two predictors. We showed that when a sugar is known to be acidic, the acidic sugar binding predictor achieves the best performance, and that when a sugar is known to be nonacidic or its class is unknown, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. A support vector machine was used as the machine learning algorithm, and the position-specific scoring matrix created by the position-specific iterative basic local alignment search tool (PSI-BLAST) was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have released our system as an open-source freeware tool in a GitHub repository (https://doi.org/10.5281/zenodo.61513).
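The five-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. The function name `k_fold_indices` is hypothetical, and the abstract does not say whether the folds were shuffled or stratified, so plain contiguous folds are assumed here.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds.

    Assumption: no shuffling or stratification; the abstract does not
    specify how the folds were built.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Each sample appears in exactly one test fold, so every residue is predicted once by a model that never saw it during training.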


Subject(s)
Carbohydrates/chemistry , Proteins/metabolism , Support Vector Machine , Binding Sites , Cluster Analysis
2.
J Struct Funct Genomics ; 17(2-3): 39-49, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27400687

ABSTRACT

We present a new method for predicting protein-ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0 % of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25 % of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9 % of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in the ligand-binding site prediction.
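As a rough illustration of the probe-based energy calculation described in the abstract, the sketch below sums a Lennard-Jones 12-6 potential, a common model of van der Waals interactions, between one probe and a set of protein atoms. The abstract does not state the functional form or parameters used, so the potential, the `epsilon` and `sigma` values, the cutoff, and the function names are all assumptions.

```python
import math
from typing import Iterable, Sequence


def lennard_jones(r: float, epsilon: float = 0.1, sigma: float = 3.5) -> float:
    """Lennard-Jones 12-6 energy at distance r (illustrative parameters)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def probe_energy(probe: Sequence[float],
                 atoms: Iterable[Sequence[float]],
                 cutoff: float = 10.0) -> float:
    """Sum the pairwise energy between one probe and all atoms within the cutoff."""
    total = 0.0
    for atom in atoms:
        r = math.dist(probe, atom)
        if 0.0 < r <= cutoff:
            total += lennard_jones(r)
    return total
```

Probes with low summed energy would then be the ones passed on to clustering; how the energy is combined with the conservation score is not given in the abstract and is not sketched here.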


Subject(s)
Databases, Protein , Models, Molecular , Software , Streptavidin , Binding Sites , Sequence Analysis, Protein , Streptavidin/chemistry , Streptavidin/genetics
3.
Adv Bioinformatics ; 2015: 528097, 2015.
Article in English | MEDLINE | ID: mdl-26347773

ABSTRACT

Receptor tyrosine kinases are essential proteins involved in cellular differentiation and proliferation in vivo and are heavily involved in allergic diseases, diabetes, and the onset and proliferation of cancerous cells. Identifying the interacting partner of such a receptor, a growth factor ligand, will provide a deeper understanding of cellular proliferation, differentiation, and other cell processes. In this study, we developed a method for predicting tyrosine kinase ligand-receptor pairs from their amino acid sequences. We collected tyrosine kinase ligand-receptor pairs from the Database of Interacting Proteins (DIP) and UniProtKB, filtered them by removing sequence redundancy, and used them as a dataset for machine learning and assessment of predictive performance. Our prediction method is based on support vector machines (SVMs); we evaluated several input features suited to tyrosine kinases and compared and analyzed the results. Using sequence pattern information and domain information extracted from sequences as input features, we obtained an area under the receiver operating characteristic curve of 0.996. This accuracy is higher than that obtained from general protein-protein interaction pair predictions.
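The area under the receiver operating characteristic curve reported above equals the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. A minimal sketch of that calculation follows; the function name `roc_auc` and the example scores are illustrative and are not taken from the study.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For instance, `roc_auc([0.9, 0.4], [0.5, 0.1])` gives 0.75, because three of the four positive-negative pairs are ordered correctly.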

4.
Bioinformation ; 6(5): 204-6, 2011.
Article in English | MEDLINE | ID: mdl-21738315

ABSTRACT

Attachment of a myristoyl group to the NH2-terminus of a nascent protein, one of the protein post-translational modifications (PTMs), is called myristoylation. The myristate moiety of proteins plays an important role in their biological functions, such as regulation of membrane binding (HIV-1 Gag) and enzyme activity (AMPK). Several predictors based on protein sequences alone have been proposed. However, they produce a great number of false positive and false negative predictions, or they cannot be used for general purposes (i.e., they are taxon-specific), or the threshold values of their decision rules must be selected with caution. Here, we present novel and taxon-free predictors based on protein primary structure. To identify myristoylated proteins accurately, we employ a widely used machine learning algorithm, the support vector machine (SVM). A series of SVM predictors is developed in the present study, in which various scales representing physicochemical and biological properties of amino acids (from the AAindex database) are used for numerical transformation of protein sequences. Of the predictors, the top ten achieve accuracies of >98% (the average value is 98.34%) and area under the ROC curve (AUC) values of >0.98. Compared with previous studies, the prediction accuracies are improved by about 3 to 4%.
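The numerical transformation of sequences with amino acid scales can be sketched as a simple lookup. The abstract does not say which AAindex scales were used; the Kyte-Doolittle hydropathy scale below is chosen only as a familiar example, and the function name `encode_sequence` is hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used here only as an example scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(sequence: str, scale: Dict[str, float] = KYTE_DOOLITTLE) -> List[float]:
    """Map each residue of a protein sequence to its value on the given scale."""
    try:
        return [scale[residue] for residue in sequence.upper()]
    except KeyError as exc:
        raise ValueError(f"residue {exc.args[0]!r} is not in the scale") from None
```

A fixed-length stretch of the resulting values, for example the first residues after the initiator methionine, could then serve as an SVM input vector; the window actually used is not stated in the abstract.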

5.
Article in English | MEDLINE | ID: mdl-20936154

ABSTRACT

Carbohydrate-binding proteins are proteins that can interact with sugar chains but do not modify them. They are involved in many physiological functions, and we have developed a method for predicting them from their amino acid sequences. Our method is based on support vector machines (SVMs). We first clarified the definition of carbohydrate-binding proteins and then constructed positive and negative datasets with which the SVMs were trained. In a leave-one-out test on these datasets, our method achieved an area under the receiver operating characteristic (ROC) curve of 0.92. We also examined two amino acid grouping methods that enable effective learning of sequence patterns and evaluated the performance of these methods. When we applied our method in combination with the homology-based prediction method to the annotated human genome database, H-invDB, we found that the true positive rate of prediction was improved.

6.
Bioinformation ; 5(6): 255-8, 2010 Nov 27.
Article in English | MEDLINE | ID: mdl-21364827

ABSTRACT

Liquid Chromatography Time-of-Flight Mass Spectrometry (LC-TOF-MS) is widely used for profiling metabolite compounds. LC-TOF-MS is a chemical analysis technique that combines the physical separation capabilities of high-pressure liquid chromatography (HPLC) with the mass analysis capabilities of Time-of-Flight Mass Spectrometry (TOF-MS) which utilizes the difference in the flight time of ions due to difference in the mass-to-charge ratio. Since metabolite compounds have various chemical characteristics, their precise identification is a crucial problem of metabolomics research. Contemporaneously analyzed reference standards are commonly required for mass spectral matching and retention time matching, but there are far fewer reference standards than there are compounds in the organism. We therefore developed a retention time prediction method for HPLC to improve the accuracy of identification of metabolite compounds. This method uses a combination of Support Vector Regression and Multiple Linear Regression adaptively to the measured retention time. We achieved a strong correlation (correlation coefficient = 0.974) between measured and predicted retention times for our experimental data. We also demonstrated a successful identification of an E. coli metabolite compound that cannot be identified by precise mass alone.
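The correlation coefficient quoted in the abstract is presumably the Pearson coefficient between measured and predicted retention times; that reading is an assumption, as is the function name `pearson_r`. A minimal sketch of the calculation:

```python
import math
from typing import Sequence


def pearson_r(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0.0 or var_p == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_m * var_p)
```

A value close to 1 means that predicted retention times rise almost linearly with the measured ones, which is what allows them to narrow down candidate compounds that share the same mass.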

7.
J Gen Appl Microbiol ; 55(5): 381-93, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19940384

ABSTRACT

Computational approaches provide valuable information to start experimental surveys identifying glycosylphosphatidylinositol (GPI)-anchored proteins in protein sequence databases. We developed a new sequence-based identification system that uses an optimized classifier based on a support vector machine (SVM) algorithm to recognize appropriate COOH-terminal sequences and uses a classifier implementing a simple majority voting strategy to recognize appropriate NH2-terminal sequences. The SVM classifier showed high accuracy (96%) in 5-fold cross-validation testing, and the majority voting classifier showed high recall (98.88%) when applied to a test dataset of eukaryote proteins. When applied to S. cerevisiae protein sequences, the new identification system showed good ability to classify "unseen" data. Applying our system to protein sequences of three aspergilli, we identified 115 GPI-anchored proteins in Aspergillus fumigatus, 129 in Aspergillus nidulans, and 136 in Aspergillus oryzae. Sequence-based conserved domain search found nearly half of these proteins to have conserved domains that covered a wide range of functions.
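The simple majority voting strategy used for the NH2-terminal sequences can be sketched as follows. The abstract does not name the component predictors or say how ties are handled, so the function name `majority_vote` and the strict-majority rule are assumptions.

```python
from typing import Iterable


def majority_vote(votes: Iterable[bool]) -> bool:
    """Return True when strictly more than half of the votes are positive.

    Assumption: a tie counts as a negative call.
    """
    votes = list(votes)
    positives = sum(1 for v in votes if v)
    return positives * 2 > len(votes)
```

With three hypothetical signal-sequence predictors, a protein would be accepted when at least two of them report a positive result.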


Subject(s)
Aspergillus fumigatus/genetics , Aspergillus nidulans/genetics , Aspergillus oryzae/genetics , Databases, Protein , Glycosylphosphatidylinositols/metabolism , Computational Biology , Glycosylphosphatidylinositols/chemistry
8.
Genome Inform ; 16(2): 161-73, 2005.
Article in English | MEDLINE | ID: mdl-16901099

ABSTRACT

We describe a fast protein-protein docking algorithm using a series expansion in terms of newly designed bases to efficiently search the entire six-dimensional conformational space of rigid body molecules. This algorithm is an ab initio docking algorithm designed to list candidates of putative conformations from a global conformational space for unbound docking. In our algorithm, a scoring function is constructed from terms that are the inner products of two scalar fields expressing individual molecules. The mapping from a molecule to a scalar field can be arbitrarily defined to express an energy term. Since this scoring scheme has the same expressiveness as that of a method using a fast Fourier transform (FFT), it has the flexibility to introduce various physicochemical energies. Currently, we are using scalar fields that approximate desolvation free energy and steric hindrance energy. Fast calculation of the scoring function for each conformation of the six-dimensional search space is realized by expansion of the fields in terms of basis functions which are combinations of spherical harmonics and modified Legendre polynomials, and the use of only low-order terms, which carry most of the information on the scalar field. We have implemented this algorithm and evaluated the computation time and precision by using actual protein structure data of complexes and their monomers. This paper presents the results for six unbound cases and in all the cases we obtained at least one conformation close to the native structures (interface RMSD < 3.0 Å) within the top 1000 candidates with about 40 seconds of computation time using a single Pentium 4 2.4 GHz CPU.
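The success criterion in the abstract is stated as an interface RMSD threshold. The sketch below computes a plain RMSD between two matched coordinate sets; it assumes that the structures are already superimposed and that the interface atoms have already been selected, neither of which is described in the abstract. The function name `rmsd` is hypothetical.

```python
import math
from typing import Sequence


def rmsd(coords_a: Sequence[Sequence[float]],
         coords_b: Sequence[Sequence[float]]) -> float:
    """Root-mean-square deviation between matched coordinate sets.

    Assumption: the two sets are already superimposed and list the same
    atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

A docking candidate would count as close to the native structure when this value, computed over the interface atoms, falls below 3.0 Å.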


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Computational Biology/methods , Computational Biology/statistics & numerical data , Models, Molecular , Models, Statistical , Predictive Value of Tests , Protein Binding/physiology , Protein Interaction Mapping/statistics & numerical data