1.
Proteomics ; 24(17): e2300184, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38643383

ABSTRACT

Unconventional secretory proteins (USPs) are vital for cell-to-cell communication and are necessary for proper physiological processes. Unlike classical proteins that follow the conventional secretory pathway via the Golgi apparatus, these proteins are released using unconventional pathways. The primary modes of secretion for USPs are exosomes and ectosomes, which originate from the endoplasmic reticulum. Accurate and rapid identification of exosome-mediated secretory proteins is crucial for gaining valuable insights into the regulation of non-classical protein secretion and intercellular communication, as well as for the advancement of novel therapeutic approaches. Although computational methods based on amino acid sequence prediction exist for predicting unconventional proteins secreted by exosomes (UPSEs), they suffer from significant limitations in terms of algorithmic accuracy. In this study, we propose a novel approach to predict UPSEs by combining multiple deep learning models that incorporate both protein sequences and evolutionary information. Our approach utilizes a convolutional neural network (CNN) to extract protein sequence information, while various densely connected neural networks (DNNs) are employed to capture evolutionary conservation patterns. By combining six distinct deep learning models, we have created a superior framework that surpasses previous approaches, achieving an ACC score of 77.46% and an MCC score of 0.5406 on an independent test dataset.
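
The ACC and MCC scores quoted above are standard binary-classification metrics computed from a confusion matrix. The following is a minimal sketch of both; the counts in the example are hypothetical and are not taken from the study.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes are unbalanced.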


Subject(s)
Deep Learning , Exosomes , Exosomes/metabolism , Exosomes/chemistry , Neural Networks, Computer , Humans , Computational Biology/methods , Algorithms , Amino Acid Sequence , Proteins/metabolism , Proteins/analysis , Proteins/chemistry
2.
Comput Struct Biotechnol J ; 21: 4836-4848, 2023.
Article in English | MEDLINE | ID: mdl-37854634

ABSTRACT

Autophagy is a primary mechanism for maintaining cellular homeostasis. The synergistic actions of autophagy-related (ATG) proteins strictly regulate the whole autophagic process. Therefore, accurate identification of ATGs is a first and critical step to reveal the molecular mechanism underlying the regulation of autophagy. Current computational methods can predict ATGs from primary protein sequences, but owing to the limitations of algorithms, significant room for improvement still exists. In this research, we propose EnsembleDL-ATG, an ensemble deep learning framework that aggregates multiple deep learning models to predict ATGs from protein sequence and evolutionary information. We first evaluated the performance of individual networks for various feature descriptors to identify the most promising models. Then, we explored all possible combinations of independent models to select the most effective ensemble architecture. The final framework is built from four different deep learning models. Experimental results show that our proposed method achieves a prediction accuracy of 94.5% and MCC of 0.890, which are nearly 4% and 0.08 higher than ATGPred-FL, respectively. Overall, EnsembleDL-ATG is the first ATG machine learning predictor based on ensemble deep learning. The benchmark data and code utilized in this study can be accessed for free at https://github.com/jingry/autoBioSeqpy/tree/2.0/examples/EnsembleDL-ATG.
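
The step of exploring all possible combinations of independent models can be sketched as an exhaustive search over model subsets. The sketch below assumes that each model outputs a positive-class probability per sample and that an ensemble averages those probabilities; the abstract does not state the aggregation rule, so averaging, the model names, and the toy numbers are all illustrative assumptions.

```python
from itertools import combinations

def ensemble_accuracy(prob_lists, labels, threshold=0.5):
    """Accuracy of an ensemble that averages per-model probabilities."""
    correct = 0
    for i, label in enumerate(labels):
        avg = sum(probs[i] for probs in prob_lists) / len(prob_lists)
        pred = 1 if avg >= threshold else 0
        correct += int(pred == label)
    return correct / len(labels)

def best_ensemble(model_probs, labels):
    """Try every non-empty subset of models; return (subset, accuracy)."""
    names = sorted(model_probs)
    best_subset, best_acc = None, -1.0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            acc = ensemble_accuracy([model_probs[m] for m in subset], labels)
            if acc > best_acc:  # the first subset to reach a score is kept
                best_subset, best_acc = subset, acc
    return best_subset, best_acc

# Hypothetical validation-set outputs (not data from the study).
labels = [1, 0, 1, 0]
model_probs = {
    "cnn": [0.9, 0.6, 0.4, 0.2],
    "rnn": [0.6, 0.3, 0.7, 0.6],
    "dnn": [0.4, 0.2, 0.8, 0.1],
}
subset, acc = best_ensemble(model_probs, labels)
print(subset, acc)
```

With n candidate models the search visits 2^n - 1 subsets, which is cheap for a handful of models. The selection should be scored on a validation set rather than on the final test set, so the reported test performance is not inflated.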

3.
ACS Omega ; 8(22): 19728-19740, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37305295

ABSTRACT

N7-Methylguanosine (m7G) is a crucial post-transcriptional RNA modification that plays a pivotal role in regulating gene expression. Accurately identifying m7G sites is a fundamental step in understanding the biological functions and regulatory mechanisms associated with this modification. While whole-genome sequencing is the gold standard for RNA modification site detection, it is a time-consuming, expensive, and intricate process. Recently, computational approaches, especially deep learning (DL) techniques, have gained popularity in achieving this objective. Convolutional neural networks and recurrent neural networks are examples of DL algorithms that have emerged as versatile tools for modeling biological sequence data. However, developing an efficient network architecture with superior performance remains a challenging task, requiring significant expertise, time, and effort. To address this, we previously introduced a tool called autoBioSeqpy, which streamlines the design and implementation of DL networks for biological sequence classification. In this study, we utilized autoBioSeqpy to develop, train, evaluate, and fine-tune sequence-level DL models for predicting m7G sites. We provided detailed descriptions of these models, along with a step-by-step guide on their execution. The same methodology can be applied to other systems dealing with similar biological questions. The benchmark data and code utilized in this study can be accessed for free at http://github.com/jingry/autoBioSeqpy/tree/2.0/examples/m7G.

4.
Front Microbiol ; 14: 1175925, 2023.
Article in English | MEDLINE | ID: mdl-37275146

ABSTRACT

Post-transcriptional RNA modifications, also known as the epitranscriptome, play crucial roles in the regulation of gene expression during development. Recently, deep learning (DL) has been employed for RNA modification site prediction and has shown promising results. However, due to the lack of relevant studies, it is unclear which DL architecture is best suited for some pyrimidine modifications, such as 5-methyluridine (m5U). To fill this knowledge gap, we first performed a comparative evaluation of various commonly used DL models for epigenetic studies with the help of autoBioSeqpy. We identified optimal architectural variations for m5U site classification, optimizing the layer depth and neuron width. Second, we used this knowledge to develop Deepm5U, an improved convolutional-recurrent neural network that accurately predicts m5U sites from RNA sequences. We successfully applied Deepm5U to transcriptome-wide m5U profiling data across different sequencing technologies and cell types. Third, we showed that the techniques for interpreting deep neural networks, including LayerUMAP and DeepSHAP, can provide important insights into the internal operation and behavior of models. Overall, we offered practical guidance for the development, benchmarking, and analysis of deep learning models when designing new algorithms for RNA modifications.

5.
J Adv Res ; 41: 219-231, 2022 11.
Article in English | MEDLINE | ID: mdl-36328750

ABSTRACT

INTRODUCTION: The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery. OBJECTIVES: This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets. METHODS: Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization. RESULTS: Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model was able to achieve 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% and 0.703 on a nonredundant independent test dataset, which are comparable to those of existing methods. CONCLUSION: We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.


Subject(s)
Deep Learning , Artificial Intelligence , Neural Networks, Computer , Algorithms , Machine Learning , Proteins
6.
iScience ; 25(12): 105530, 2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36425757

ABSTRACT

Despite the impressive success of deep learning techniques in various types of classification and prediction tasks, interpreting these models and explaining their predictions are still major challenges. In this article, we present an easy-to-use command line tool capable of visualizing and analyzing alternative representations of biological observations learned by deep learning models. This new tool, namely, layerUMAP, integrates autoBioSeqpy software and the UMAP library to address learned high-level representations. An important advantage of the tool is that it provides an interactive option that enables users to visualize the outputs of hidden layers along the depth of the model. We use two different classes of examples to illustrate the potential power of layerUMAP, and the results demonstrate that layerUMAP can provide insightful visual feedback about models and further guide us to develop better models.

7.
Front Microbiol ; 13: 843425, 2022.
Article in English | MEDLINE | ID: mdl-35401453

ABSTRACT

DNA N4-methylcytosine (4mC) is a pivotal epigenetic modification that plays an essential role in DNA replication, repair, expression and differentiation. To gain insight into the biological functions of 4mC, it is critical to identify its modification sites in the genome. Deep learning has become increasingly popular in recent years and is frequently employed for 4mC site identification. However, a systematic analysis of how to build predictive models using deep learning techniques is still lacking. In this work, we first summarized all existing deep learning-based predictors and systematically analyzed their models, features and datasets. Then, using a typical standard dataset with three species (A. thaliana, C. elegans, and D. melanogaster), we assessed the contribution of different model architectures, encoding methods and the attention mechanism in establishing a deep learning-based model for 4mC site prediction. After a series of optimizations, a convolutional-recurrent neural network architecture using one-hot encoding and the attention mechanism achieved the best overall prediction performance. Extensive comparison experiments were conducted based on the same dataset. This work will be helpful for researchers who would like to build 4mC prediction models using deep learning in the future.
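
One-hot encoding, the input representation named above, maps each nucleotide to a binary vector with a single 1. The sketch below is a generic illustration; the A, C, G, T column order and the all-zero row for unknown bases such as N are assumptions, not details taken from the study.

```python
def one_hot_encode(seq, alphabet="ACGT"):
    """Return one row per base; unknown bases become an all-zero row."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded

for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
    print(base, row)
```

A sequence of length L becomes an L x 4 matrix, which is the shape a 1-D convolutional layer typically consumes.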

8.
NAR Genom Bioinform ; 3(4): lqab086, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34617013

ABSTRACT

Type III secretion systems (T3SSs) are bacterial membrane-embedded nanomachines that allow a number of human, plant and animal pathogens to inject virulence factors directly into the cytoplasm of eukaryotic cells. Export of effectors through T3SSs is critical for motility and virulence of most Gram-negative pathogens. Current computational methods can predict type III secreted effectors (T3SEs) from amino acid sequences, but due to algorithmic constraints, reliable and large-scale prediction of T3SEs in Gram-negative bacteria remains a challenge. Here, we present DeepT3 2.0 (http://advintbioinforlab.com/deept3/), a novel web server that integrates different deep learning models for genome-wide prediction of T3SEs from a bacterium of interest. DeepT3 2.0 combines various deep learning architectures including convolutional, recurrent, convolutional-recurrent and multilayer neural networks to learn N-terminal representations of proteins specifically for T3SE prediction. Outcomes from the different models are processed and integrated for discriminating T3SEs and non-T3SEs. Because it leverages diverse models and an integrative deep learning framework, DeepT3 2.0 outperforms existing methods in validation datasets. In addition, the features learned from networks are analyzed and visualized to explain how models make their predictions. We propose DeepT3 2.0 as an integrated and accurate tool for the discovery of T3SEs.

9.
Front Microbiol ; 12: 605782, 2021.
Article in English | MEDLINE | ID: mdl-33552038

ABSTRACT

Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through the type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS) and cause various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel anti-bacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) and type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionarily conserved profiles and sequence motifs. To address this challenge, we develop a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generate an amino-acid character dictionary and sequence-based features from effector proteins and subsequently feed these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training, the hybrid neural network classifies secreted effectors into two different classes with an accuracy, F-value, and recall of over 80.0%. Our approach represents the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.

10.
Mol Ther Nucleic Acids ; 22: 862-870, 2020 Dec 04.
Article in English | MEDLINE | ID: mdl-33230481

ABSTRACT

Cancer is one of the most dangerous diseases to human health. The accurate prediction of anticancer peptides (ACPs) would be valuable for the development and design of novel anticancer agents. Current deep neural network models have obtained state-of-the-art prediction accuracy for the ACP classification task. However, based on existing studies, it remains unclear which deep learning architecture achieves the best performance. Thus, in this study, we first present a systematic exploration of three important deep learning architectures: convolutional, recurrent, and convolutional-recurrent networks for distinguishing ACPs from non-ACPs. We find that the recurrent neural network with bidirectional long short-term memory cells is superior to other architectures. By utilizing the proposed model, we implement a sequence-based deep learning tool (DeepACP) to accurately predict the likelihood of a peptide exhibiting anticancer activity. The results indicate that DeepACP outperforms several existing methods and can be used as an effective tool for the prediction of anticancer peptides. Furthermore, we visualize and understand the deep learning model. We hope that our strategy can be extended to identify other types of peptides and may provide more assistance to the development of proteomics and new drugs.

11.
Comput Biol Med ; 43(9): 1177-81, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23930811

ABSTRACT

In this study, we focus on different types of Gram-negative bacterial secreted proteins, and try to analyze the relationships and differences among them. Through an extensive literature search, 1612 secreted proteins have been collected as a standard data set from three data sources, including Swiss-Prot, TrEMBL and RefSeq. To explore the relationships among different types of secreted proteins, we model this data set as a sequence similarity network. Finally, a multi-classifier named SecretP is proposed to distinguish different types of secreted proteins, and yields a high total sensitivity of 90.12% for the test set. When performed on another public independent dataset for further evaluation, a promising prediction result is obtained. Predictions can be implemented freely online at http://cic.scu.edu.cn/bioinformatics/secretPv2_1/index.htm.


Subject(s)
Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Gram-Negative Bacteria/genetics , Gram-Negative Bacteria/metabolism , Sequence Analysis, Protein/methods , Sensitivity and Specificity , Sequence Analysis, Protein/instrumentation
12.
Comput Biol Chem ; 36: 31-5, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22277674

ABSTRACT

Signal peptides play a crucial role in various biological processes, such as localization of cell surface receptors, translocation of secreted proteins and cell-cell communication. However, the amino acid mutation in signal peptides, also called non-synonymous single nucleotide polymorphisms (nsSNPs or SAPs) may lead to the loss of their functions. In the present study, a computational method was proposed for predicting deleterious nsSNPs in signal peptides based on random forest (RF) by incorporating position specific scoring matrix (PSSM) profile, SignalP score and physicochemical properties. These features were optimized by the maximum relevance minimum redundancy (mRMR) method. Then, a cost matrix was used to minimize the effect of the imbalanced data classification problem that usually occurred in nsSNPs prediction. The method achieved an overall accuracy of 84.5% and the area under the ROC curve (AUC) of 0.822 by Jackknife test, when the optimal subset included 10 features. Furthermore, on the same dataset, we compared our predictor with other existing methods, including R-score-based method and D-score-based methods, and the result of our method was superior to those of the two methods. The satisfactory performance suggests that our method is effective in predicting the deleterious nsSNPs in signal peptides.
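
A cost matrix counters class imbalance by making some errors more expensive than others. The abstract does not say how the costs enter the random forest, so the sketch below shows only the general idea under an assumed setup: given class probabilities from any classifier, pick the label with the lowest expected cost. The class names, probabilities, and cost values are hypothetical.

```python
def min_cost_class(probs, cost):
    """Pick the prediction with the lowest expected misclassification cost.

    probs: mapping true_class -> estimated probability
    cost:  cost[true_class][predicted_class], 0 on the diagonal
    """
    classes = list(probs)

    def expected_cost(pred):
        return sum(probs[true] * cost[true][pred] for true in classes)

    return min(classes, key=expected_cost)

# Hypothetical probabilities for one variant from some classifier.
probs = {"deleterious": 0.3, "neutral": 0.7}

# Equal costs reproduce the usual most-probable-class decision.
equal = {"deleterious": {"deleterious": 0, "neutral": 1},
         "neutral": {"deleterious": 1, "neutral": 0}}

# Assumed asymmetric costs: missing a deleterious variant costs 5x more.
skewed = {"deleterious": {"deleterious": 0, "neutral": 5},
          "neutral": {"deleterious": 1, "neutral": 0}}

print(min_cost_class(probs, equal))   # neutral
print(min_cost_class(probs, skewed))  # deleterious
```

Raising the cost of missing the minority class shifts the decision threshold toward it, which trades some false alarms for fewer missed positives.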


Subject(s)
Polymorphism, Single Nucleotide , Protein Sorting Signals/genetics , Algorithms , Base Sequence , Databases, Genetic , Humans , Models, Genetic , Molecular Sequence Data , Mutation , Sequence Analysis, Protein
13.
BMC Bioinformatics ; 12: 14, 2011 Jan 12.
Article in English | MEDLINE | ID: mdl-21223604

ABSTRACT

BACKGROUND: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions to other residues. RESULTS: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that study of the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The values of total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and MCC of 0.59, respectively. CONCLUSIONS: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.
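
The network topological features mentioned above (degree and centrality of a residue in a protein structure network) can be sketched as follows. This is an illustration under stated assumptions: residues are nodes, an edge joins two residues in contact, and closeness centrality stands in for the centrality measure, which the abstract does not name. The contact list is hypothetical.

```python
from collections import deque

def build_graph(contacts):
    """Undirected residue contact graph from (residue_i, residue_j) pairs."""
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph, node):
    """Number of residues in direct contact with `node`."""
    return len(graph[node])

def closeness(graph, node):
    """Reachable-node count divided by the sum of shortest-path lengths."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest paths in hops
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical contacts between residue indices, for illustration only.
g = build_graph([(1, 2), (1, 3), (1, 4), (4, 5)])
for residue in sorted(g):
    print(residue, degree(g, residue), round(closeness(g, residue), 3))
```

In this toy graph residue 1 has the highest degree and closeness, the kind of site the abstract associates with disease-related mutations.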


Subject(s)
Amino Acid Substitution , Computational Biology/methods , Genetic Association Studies/methods , Polymorphism, Single Nucleotide , Algorithms , DNA Mutational Analysis , Databases, Protein , Humans , Models, Statistical , Mutation , Proteins/analysis , Sequence Analysis, Protein
14.
J Theor Biol ; 267(1): 1-6, 2010 Nov 07.
Article in English | MEDLINE | ID: mdl-20691704

ABSTRACT

Protein secretion plays an important role in bacterial lifestyles. Secreted proteins are crucial for bacterial pathogenesis by making bacteria interact with their environments, particularly delivering pathogenic and symbiotic bacteria into their eukaryotic hosts. Therefore, identification of bacterial secreted proteins becomes an important process for the study of various diseases and the corresponding drugs. In this paper, fusing several new features into Chou's pseudo-amino acid composition (PseAAC), two support vector machine (SVM)-based ternary classifiers are developed to predict secreted proteins of Gram-negative and Gram-positive bacteria. For the two types of bacteria, high accuracies of 94.03% and 94.36%, respectively, are obtained in distinguishing classically secreted, non-classically secreted and non-secreted proteins by our method. In order to compare the practical ability of our method in identifying bacterial secreted proteins with those of six published methods, proteins in Escherichia coli and Bacillus subtilis are collected to construct the test sets of Gram-negative and Gram-positive bacteria, and the prediction results of our method are comparable to those of existing methods. When performed on two public independent data sets for predicting NCSPs, it also yields satisfactory results for Gram-negative bacterial proteins. The prediction server SecretP can be accessed at http://cic.scu.edu.cn/bioinformatics/secretPV2/index.htm.


Subject(s)
Amino Acids/analysis , Bacterial Proteins/analysis , Computational Biology/methods , Neural Networks, Computer , Bacillus subtilis/metabolism , Bacterial Proteins/metabolism , Escherichia coli/metabolism , Gram-Negative Bacteria/metabolism , Gram-Positive Bacteria/metabolism , Internet
15.
Protein J ; 29(1): 62-7, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20049515

ABSTRACT

The purpose of this article is to identify protein structural classes using a support vector machine (SVM) ensemble classifier, which is very efficient in enhancing prediction performance. First, auto covariance (AC) and pseudo-amino acid composition (PseAAC) were used in protein representation. AC focuses on adjacent effects and PseAA composition takes sequence order patterns into account. Second, SVMs were trained on the datasets represented by different descriptors. Last, an ensemble classifier, constructed from the individual classifiers through a voting strategy, gave the final prediction results. A very promising prediction accuracy of 93.14% was obtained by the jackknife test. The experimental results showed that the ensemble system can improve the prediction performance greatly and generate more stable and safer predictors. The current method, which fuses protein primary sequence information captured by AC and described by PseAA composition, may play an important complementary role in other related applications.
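
The voting strategy that combines the individual SVMs can be illustrated with a plain majority vote. The abstract does not give the exact voting rule or tie-breaking scheme, so the sketch below assumes unweighted votes and breaks ties in favour of the earliest-listed classifier; the classifier names and predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equally long label lists, one per classifier.
    Ties go to the label voted for by the earliest-listed classifier.
    """
    voted = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        for label in votes:  # scan in classifier order to break ties
            if counts[label] == top:
                voted.append(label)
                break
    return voted

# Hypothetical outputs of three SVMs trained on different descriptors.
svm_ac = ["all-alpha", "all-beta", "alpha/beta"]
svm_pseaa = ["all-alpha", "alpha+beta", "alpha/beta"]
svm_other = ["all-beta", "alpha+beta", "alpha+beta"]

print(majority_vote([svm_ac, svm_pseaa, svm_other]))
```

Because each classifier sees a different descriptor, their errors tend to differ, and the vote can correct a mistake made by any single one.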


Subject(s)
Amino Acids/analysis , Computational Biology/methods , Proteins/chemistry , Software , Protein Conformation
16.
Peptides ; 31(4): 574-8, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20045033

ABSTRACT

In contrast to a large number of classically secreted proteins (CSPs) and non-secreted proteins (NSPs), only a few proteins have been experimentally proved to enter non-classical secretory pathways. It is therefore difficult to identify non-classically secreted proteins (NCSPs), and no methods are available for distinguishing the three types of proteins simultaneously. To address this problem, we first performed data mining, collecting mammalian proteins exported via ER-Golgi-independent pathways through extensive literature searches. In this paper, a support vector machine (SVM)-based ternary classifier named SecretP is proposed to predict mammalian secreted proteins by using pseudo-amino acid composition (PseAA) and five additional features. When distinguishing the three types of proteins, SecretP yielded an accuracy of 88.79%. When the method was evaluated on an independent test set of 92 human proteins, 76 of them were correctly predicted as NCSPs. When performed on another public independent data set, the prediction result of SecretP is comparable to those of other existing computational methods. Therefore, SecretP can be a useful supplementary tool for future secretome studies. The web server SecretP and all supplementary tables listed in this paper are freely available at http://cic.scu.edu.cn/bioinformatics/secretp/index.htm.


Subject(s)
Amino Acids/chemistry , Data Mining/methods , Databases, Protein , Proteins , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computational Biology/methods , Humans , Molecular Sequence Data , Protein Sorting Signals/genetics , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Proteomics/methods , Sequence Homology, Amino Acid
17.
J Theor Biol ; 259(2): 366-72, 2009 Jul 21.
Article in English | MEDLINE | ID: mdl-19341746

ABSTRACT

The submitochondria location of a mitochondrial protein is very important for further understanding the structure and function of this protein. Hence, it is of great practical significance to develop an automated and reliable method for timely identifying the submitochondria locations of novel mitochondrial proteins. In this study, a sequence-based algorithm combining the augmented Chou's pseudo amino acid composition (Chou's PseAA) based on auto covariance (AC) is developed to predict protein submitochondria locations and membrane protein types in mitochondria inner membrane. The model fully considers the sequence-order effects between residues a certain distance apart in the sequence by AC combined with eight representative descriptors for both common proteins and membrane proteins. As a result of jackknife cross-validation tests, the method for submitochondria location prediction yields the accuracies of 91.8%, 96.4% and 66.1% for inner membrane, matrix, and outer membrane, respectively. The total accuracy is 89.7%. When predicting membrane protein types in mitochondria inner membrane, the method achieves the prediction performance with the accuracies of 98.4%, 64.3% and 86.7% for multi-pass inner membrane, single-pass inner membrane, and matrix side inner membrane, where the total accuracy is 93.6%. The overall performance of our method is better than the achievements of the previous studies. So our method can be an effective supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://chemlab.scu.edu.cn/Predict_subMITO/index.htm.


Subject(s)
Amino Acids/analysis , Mitochondrial Proteins/analysis , Models, Chemical , Animals , Chemistry, Physical , Membrane Proteins/analysis , Pattern Recognition, Automated
18.
Interdiscip Sci ; 1(4): 315-9, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20640811

ABSTRACT

Machine learning methods play a very important role in protein secondary structure prediction and other related works. For a given approach, prediction quality depends mostly on how protein sequences are represented as numeric features. In this paper, two Support Vector Machine (SVM) multi-classification strategies, "one-against-one" (1-a-1) and "one-against-all" (1-a-a), were used to identify protein structural classes. Auto covariance (AC), which transforms the physicochemical properties of the amino acids of the proteins into a data matrix, focuses on the neighboring effects and the interactions between residues in protein sequences. The "1-a-1" approach was used with SVM to predict protein structural classes and obtained a very promising overall accuracy of 90.69% by the jackknife test. This was more than 10% higher than the accuracy obtained by using "1-a-a". Experimental results led to the finding that the SVM predictor constructed by "1-a-1" can avoid the appearance of biased prediction accuracy. The current method, using the protein primary sequence information described by auto covariance (AC) and the "1-a-1" approach with SVM, should play an important complementary role in other related applications.
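
The two multi-class strategies differ in how binary classifiers are combined: "1-a-1" trains one classifier per pair of classes and lets them vote, while "1-a-a" trains one scorer per class and picks the highest score. The sketch below shows only this combination logic. The binary classifiers are hypothetical 1-D nearest-centroid stand-ins, not SVMs, and the example does not reproduce the accuracy difference reported in the abstract.

```python
from itertools import combinations

# Hypothetical 1-D class centroids standing in for trained models.
CENTROIDS = {"all-alpha": 0.0, "all-beta": 1.0, "alpha/beta": 2.0, "alpha+beta": 3.0}
classes = list(CENTROIDS)

def make_pairwise(a, b):
    """Stand-in binary classifier that chooses between classes a and b."""
    def decide(x):
        return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b
    return decide

def make_scorer(c):
    """Stand-in one-against-all scorer: higher means more like class c."""
    def score(x):
        return -abs(x - CENTROIDS[c])
    return score

def one_vs_one_predict(x, classes, pairwise):
    """Each pairwise classifier casts one vote; most votes wins."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])

def one_vs_all_predict(x, classes, scorers):
    """The class whose scorer gives the highest score wins."""
    return max(classes, key=lambda c: scorers[c](x))

pairwise = {(a, b): make_pairwise(a, b) for a, b in combinations(classes, 2)}
scorers = {c: make_scorer(c) for c in classes}

print(len(pairwise), "pairwise classifiers vs", len(scorers), "one-against-all scorers")
print(one_vs_one_predict(1.9, classes, pairwise))
print(one_vs_all_predict(1.9, classes, scorers))
```

For k classes, "1-a-1" needs k(k-1)/2 binary classifiers, each trained on two classes only, whereas "1-a-a" needs k classifiers that each face one class against all the rest, a more imbalanced training problem.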


Subject(s)
Artificial Intelligence , Computational Biology/methods , Proteins/chemistry , Proteins/classification , Algorithms , Computer Simulation , Genetic Vectors , Pattern Recognition, Automated/methods , Protein Structure, Secondary , Reproducibility of Results , Sequence Analysis, Protein/methods , Software
19.
Nucleic Acids Res ; 36(9): 3025-30, 2008 May.
Article in English | MEDLINE | ID: mdl-18390576

ABSTRACT

Compared to the available protein sequences of different organisms, the number of revealed protein-protein interactions (PPIs) is still very limited. So many computational methods have been developed to facilitate the identification of novel PPIs. However, the methods only using the information of protein sequences are more universal than those that depend on some additional information or predictions about the proteins. In this article, a sequence-based method is proposed by combining a new feature representation using auto covariance (AC) and support vector machine (SVM). AC accounts for the interactions between residues a certain distance apart in the sequence, so this method adequately takes the neighbouring effect into account. When performed on the PPI data of yeast Saccharomyces cerevisiae, the method achieved a very promising prediction result. An independent data set of 11,474 yeast PPIs was used to evaluate this prediction model and the prediction accuracy is 88.09%. The performance of this method is superior to those of the existing sequence-based methods, so it can be a useful supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://www.scucic.cn/Predict_PPI/index.htm.
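
Auto covariance (AC) turns a variable-length sequence into a fixed-length vector by correlating a physicochemical property at positions a fixed distance (lag) apart. The sketch below assumes the common form AC(lag) = 1/(n - lag) * sum over i of (x_i - mean)(x_{i+lag} - mean) and uses a single property, the Kyte-Doolittle hydropathy scale, as a stand-in. The properties, their normalization, and the maximum lag used in the article are not given in the abstract, and the example sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here as one example property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def auto_covariance(seq, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one property scale."""
    values = [scale[aa] for aa in seq]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]

# Hypothetical peptide, for illustration only.
features = auto_covariance("MKTAYIAKQRQISFVK", KD, max_lag=5)
print([round(f, 3) for f in features])
```

With several properties, the per-property vectors are concatenated, so every protein yields the same number of features regardless of its length. For a protein pair, the two proteins' vectors can then be joined to form the classifier input.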


Subject(s)
Artificial Intelligence , Protein Interaction Mapping/methods , Sequence Analysis, Protein/methods , Amino Acids/chemistry , Computational Biology/methods , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism