Results 1 - 20 of 39
1.
BMC Genomics ; 17(Suppl 13): 1025, 2016 12 22.
Article in English | MEDLINE | ID: mdl-28155657

ABSTRACT

BACKGROUND: The ability to sequence the transcriptomes of single cells using single-cell RNA sequencing (scRNA-seq) technologies represents a paradigm shift: scientists can now simultaneously investigate the complex biology of a heterogeneous population of cells at the resolution of individual cells. However, to date there has been no suitable computational methodology for analysing such an intricate deluge of data, in particular techniques that aid the identification of differences in transcriptomic profiles between the different cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). RESULTS: Using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, we identified thirty-eight key transcripts that best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed a higher discriminative power (enhanced prediction accuracy) than commonly used statistical techniques or gene set-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden regulatory networks; novel interactions from these networks could be validated in wet-lab experiments and could be useful candidates to target for the treatment of neuronal developmental diseases. CONCLUSION: The novel approach reported here is able to identify transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells from neural progenitor cells. We believe it is extensible to other single-cell RNA-seq expression profiles, such as those used to study cancer progression and treatment within a highly heterogeneous tumour.
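SVM-RFE, named in the abstract above, ranks features by the squared weights of a trained linear SVM and repeatedly discards the lowest-ranked feature. The following is a minimal, stdlib-only sketch on toy data, not the authors' pipeline: the subgradient-descent SVM trainer, its hyperparameters (`lam`, `lr`, `epochs`) and the function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Linear SVM by full-batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this point
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep: int = 1) -> Tuple[List[int], List[int]]:
    """Retrain on surviving features, drop the one with the smallest w_j**2, repeat."""
    surviving = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(surviving) > n_keep:
        X_sub = [[row[j] for j in surviving] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(surviving)), key=lambda k: w[k] ** 2)
        eliminated.append(surviving.pop(worst))
    return surviving, eliminated  # eliminated: least informative first


# Toy data: feature 0 tracks the label, features 1 and 2 do not.
X = [[1.0, 0.5, 1.0], [1.0, -0.5, 1.0], [-1.0, 0.5, 1.0], [-1.0, -0.5, 1.0]]
y = [1, 1, -1, -1]
kept, dropped = svm_rfe(X, y, n_keep=1)
```

On this toy matrix only feature 0 separates the classes, so it is the last feature standing. With thousands of transcripts, removing a block of features per round is a common way to keep the number of retraining steps manageable.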


Subject(s)
Brain/metabolism , Gene Expression Profiling , Machine Learning , Organogenesis/genetics , Single-Cell Analysis , Transcriptome , Algorithms , Biomarkers , Brain/embryology , Brain/growth & development , Models, Statistical , Neurogenesis/genetics , Organ Specificity , Reproducibility of Results , Single-Cell Analysis/methods , Support Vector Machine
2.
J Bioinform Comput Biol ; 12(4): 1450014, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25152039

ABSTRACT

Protein-protein interactions (PPIs) are important for understanding the cellular mechanisms of biological functions, but the reliability of PPIs extracted by high-throughput assays is known to be low. To address this, many current methods use multiple lines of evidence from different information sources to compute reliability scores for such PPIs. However, they often combine the evidence without taking into account the uncertainty of the evidence values, potential dependencies between the information sources used and missing values from some information sources. We propose to formulate the task of scoring PPIs using multiple information sources as a multi-criteria decision making problem that can be solved using data fusion to model potential interactions between the multiple information sources. Using data fusion, the contribution of each information source can be proportioned accordingly to systematically score the reliability of PPIs. Our experimental results showed that the reliability scores assigned by our data fusion method can effectively classify highly reliable PPIs from multiple information sources, with substantial improvement in scoring over conventional approaches such as the Adjust CD-Distance approach. In addition, the underlying interactions between the information sources used, as well as their relative importance, can also be determined with our data fusion approach. We also showed that such knowledge can be used to effectively handle missing values from information sources.
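The abstract does not name the fusion operator. One standard multi-criteria aggregation that can represent interactions (redundancy or synergy) between information sources is the discrete Choquet integral over a fuzzy measure. The sketch below assumes that operator, per-source scores already scaled to [0, 1], and a hypothetical measure `MU` over three invented source names; it is not the authors' published model.

```python
from typing import Dict, FrozenSet


def choquet_integral(scores: Dict[str, float], mu: Dict[FrozenSet[str], float]) -> float:
    """Discrete Choquet integral of per-source scores w.r.t. fuzzy measure mu.

    mu must be monotone with mu(all sources) == 1. Non-additive values of mu
    encode redundancy (joint weight below the sum) or synergy (above the sum).
    """
    order = sorted(scores, key=lambda s: scores[s])  # ascending scores
    total, prev = 0.0, 0.0
    for i, source in enumerate(order):
        coalition = frozenset(order[i:])  # sources ranked at or above this one
        total += (scores[source] - prev) * mu[coalition]
        prev = scores[source]
    return total


# Hypothetical measure: "expr" and "go" are partly redundant, so their
# joint weight (0.5) is below 0.3 + 0.3.
f = frozenset
MU = {
    f(): 0.0,
    f({"expr"}): 0.3, f({"go"}): 0.3, f({"seq"}): 0.2,
    f({"expr", "go"}): 0.5, f({"expr", "seq"}): 0.6, f({"go", "seq"}): 0.6,
    f({"expr", "go", "seq"}): 1.0,
}
ppi_score = choquet_integral({"expr": 0.9, "go": 0.4, "seq": 0.7}, MU)
```

With an additive measure the integral reduces to a weighted mean; the non-additive entries are what let the fused score discount correlated sources.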


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Decision Making, Computer-Assisted , Gene Expression , High-Throughput Screening Assays , Reproducibility of Results
3.
PLoS One ; 9(5): e97079, 2014.
Article in English | MEDLINE | ID: mdl-24816822

ABSTRACT

An increasing number of genes have been experimentally confirmed in recent years as causative genes for various human diseases. This newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Predictions based on a single data source are susceptible to bias from incompleteness and noise in the genomic data, and a single machine learning predictor is prone to bias caused by the inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method, EPU, is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results than various state-of-the-art prediction methods as well as ensemble learning classifiers. By integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms and achieve more accurate and robust disease gene predictions. Our EPU method also provides an effective framework for integrating additional biological and computational resources in the future for better disease gene predictions.


Subject(s)
Algorithms , Artificial Intelligence/trends , Computational Biology/methods , Gene Regulatory Networks/genetics , Genetic Association Studies/methods , Genetic Diseases, Inborn/genetics , Models, Genetic , Gene Ontology , Humans , Phenotype , Selection Bias
4.
J Bioinform Comput Biol ; 11(6): 1343010, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24372039

ABSTRACT

While high-throughput technologies are expected to play a critical role in clinical translational research for complex disease diagnosis, the ability to accurately and consistently discriminate disease phenotypes by determining the gene and protein expression patterns as signatures of different clinical conditions remains a challenge in translational bioinformatics. In this study, we propose a novel feature selection algorithm, the Multi-Resolution-Test (MRT-test), that produces highly accurate and consistent phenotype discrimination across a series of omics data. Our algorithm captures the features contributing to subtle data behaviors rather than the features contributing to global data behaviors, which seems to be essential for achieving clinical-level diagnosis across different expression data. Furthermore, as an effective biomarker discovery algorithm, it can achieve linear separation of high-dimensional omics data with few biomarkers. We apply our MRT-test to complex disease phenotype diagnosis by combining it with state-of-the-art classifiers and attain exceptional diagnostic results, which suggests our method's advantage in molecular diagnostics. Experimental evaluation showed that MRT-test-based diagnosis is able to generate consistent and robust clinical-level phenotype separation for various diseases. In addition, based on the seed biomarkers detected by the MRT-test, we design a novel network marker synthesis (NMS) algorithm to decipher the underlying molecular mechanisms of tumorigenesis from a systems viewpoint. Unlike existing top-down gene network building approaches, our network marker synthesis method depends less on the global network, which enables it to capture the gene regulators for different subnetwork markers and will provide biologically meaningful insights for understanding the genetic basis of complex diseases.


Subject(s)
Algorithms , Biomarkers , Computational Biology/methods , Phenotype , Breast Neoplasms/genetics , Cerebellar Neoplasms/genetics , Diagnosis, Computer-Assisted/methods , Female , Humans , Medulloblastoma/genetics , Neoplasms/genetics , Neoplasms/metabolism
5.
Methods Mol Biol ; 939: 9-20, 2013.
Article in English | MEDLINE | ID: mdl-23192537

ABSTRACT

Many important biological processes, such as signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed and easily disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch of 3-10 amino acid residues, which can be characterized by a sequence pattern, i.e., a short linear motif (SLiM). We call such interactions between domains and motifs domain-SLiM interactions. Existing methods have focused on discovering SLiMs in the sequence data of interacting proteins. With the recent increase in the number of available protein structures, we have a new opportunity to detect SLiMs directly from proteins' 3D structures instead of their linear sequences. In this chapter, we describe a computational method called SLiMDIet to directly detect SLiMs on domain interfaces extracted from 3D structures of PPIs. SLiMDIet comprises two steps: (1) interaction interfaces belonging to the same domain are extracted and grouped together using structural clustering and (2) the extracted interaction interfaces in each cluster are structurally aligned to extract the corresponding SLiM. Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for PFAM domains that have 3D structures available in the PDB database.


Subject(s)
Computational Biology/methods , Protein Interaction Domains and Motifs , Amino Acids/chemistry , Cluster Analysis , Databases, Protein , Proteins/chemistry , Sequence Alignment
6.
BMC Genomics ; 14 Suppl 5: S15, 2013.
Article in English | MEDLINE | ID: mdl-24564427

ABSTRACT

BACKGROUND: Many biological processes are carried out by proteins interacting with each other in the form of protein complexes. However, large-scale detection of protein complexes has remained constrained by experimental limitations. As such, computational detection of protein complexes by applying clustering algorithms to the abundantly available protein-protein interaction (PPI) networks is an important alternative. However, many current algorithms have overlooked the importance of selecting seeds for expansion into clusters in a way that neither excludes important proteins nor includes many noisy ones, while ensuring a high degree of functional homogeneity amongst the proteins detected for the complexes. RESULTS: We designed a novel method called Probabilistic Local Walks (PLW), which clusters regions in a PPI network with high functional similarity to find protein complex cores with high precision and efficiency in O(|V| log |V| + |E|) time. A seed selection strategy, which prioritises seeds with dense neighbourhoods, was devised. We defined a topological measure, called common neighbour similarity, to estimate the functional similarity of two proteins given the number of their common neighbours. CONCLUSIONS: Our proposed PLW algorithm achieved the highest F-measure (the harmonic mean of precision and recall) when compared to 11 state-of-the-art methods on yeast protein interaction data, with an improvement of 16.7% over the next highest score. Our experiments also demonstrated that our seed selection strategy is able to increase algorithm precision when applied to three previous protein complex mining techniques. AVAILABILITY: The software, datasets and predicted complexes are available at http://wonglkd.github.io/PLW.
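The abstract names a "common neighbour similarity" and a density-based seed selection without giving formulas. As an assumption, the sketch below scores a protein pair by the hypergeometric probability of sharing at least the observed number of neighbours by chance (smaller means more similar), and ranks seeds by the edge density of each protein's closed neighbourhood. Both definitions are illustrative stand-ins, not PLW's published ones.

```python
import math
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected PPI network as adjacency sets


def common_neighbour_pvalue(g: Graph, u: str, v: str) -> float:
    """P(X >= k), X hypergeometric: chance that u and v share at least their
    observed k neighbours if v's neighbours were drawn at random."""
    n_total = len(g)
    n_u, n_v = len(g[u]), len(g[v])
    k = len(g[u] & g[v])
    tail = sum(
        math.comb(n_u, i) * math.comb(n_total - n_u, n_v - i)
        for i in range(k, min(n_u, n_v) + 1)
    )
    return tail / math.comb(n_total, n_v)


def neighbourhood_density(g: Graph, v: str) -> float:
    """Edge density of v's closed neighbourhood {v} | N(v)."""
    members = g[v] | {v}
    if len(members) < 3:  # too small to count as a dense region
        return 0.0
    edges = sum(1 for a, b in combinations(members, 2) if b in g[a])
    return edges / math.comb(len(members), 2)


def rank_seeds(g: Graph) -> List[str]:
    """Seeds with denser neighbourhoods first (ties broken by name)."""
    return sorted(g, key=lambda v: (-neighbourhood_density(g, v), v))


g: Graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
seeds = rank_seeds(g)
```

Here the triangle members `a` and `b` rank first and the pendant protein `e` last, which is the ordering a dense-neighbourhood seed rule aims for.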


Subject(s)
Computational Biology/methods , Fungal Proteins/analysis , Yeasts/metabolism , Algorithms , Protein Interaction Mapping , Software
7.
J Bioinform Comput Biol ; 10(5): 1250012, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22849367

ABSTRACT

Living cells are realized by complex gene expression programs that are moderated by regulatory proteins called transcription factors (TFs). The TFs control the differential expression of target genes in the context of transcriptional regulatory networks (TRNs), either individually or in groups. Deciphering how the TFs control the differential expression of a target gene in a TRN is challenging, especially when multiple TFs collaboratively participate in the transcriptional regulation. To unravel the roles of the TFs in the regulatory networks, we model the underlying regulatory interactions in terms of the TF-target interactions' directions (activation or repression) and their corresponding logical roles (necessary and/or sufficient). We design a set of constraints that relate gene expression patterns to regulatory interaction models, and develop TRIM (Transcriptional Regulatory Interaction Model Inference), a new hidden Markov model, to infer the models of TF-target interactions in large-scale TRNs of complex organisms. In addition, by training TRIM with wild-type time-series gene expression data, the activation timepoints of each regulatory module can be obtained. To demonstrate the advantages of TRIM, we applied it to the yeast TRN to infer the TF-target interaction models for individual TFs as well as pairs of TFs in collaborative regulatory modules. By comparing with TF knockout and other gene expression data, we were able to show that TRIM clearly outperforms DREM (the best existing algorithm). In addition, on an individual Arabidopsis binding network, we showed that the target genes' expression correlations can be significantly improved by incorporating the TF-target regulatory interaction models inferred by TRIM into the expression data analysis, which may yield new knowledge in transcriptional dynamics and bioactivation.
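TRIM is described as a hidden Markov model, but its state space and constraints are not given in the abstract. The sketch below therefore shows only the generic forward algorithm that an HMM of this kind uses to score an observation sequence, with a hypothetical two-state model (regulator inactive/active) and expression discretised into "low"/"high"; all probabilities are invented for illustration.

```python
from itertools import product
from typing import Dict, Sequence


def forward_likelihood(
    obs: Sequence[str],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> float:
    """P(observations | model) computed with the forward algorithm."""
    # alpha[s] = P(o_1 .. o_t, state at time t = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
            for s in states
        }
    return sum(alpha.values())


STATES = ("inactive", "active")
START = {"inactive": 0.8, "active": 0.2}
TRANS = {
    "inactive": {"inactive": 0.7, "active": 0.3},
    "active": {"inactive": 0.1, "active": 0.9},
}
EMIT = {
    "inactive": {"low": 0.9, "high": 0.1},
    "active": {"low": 0.2, "high": 0.8},
}

# Sanity check: likelihoods of all length-3 sequences sum to 1.
total = sum(
    forward_likelihood(seq, STATES, START, TRANS, EMIT)
    for seq in product(["low", "high"], repeat=3)
)
```

Decoding the most likely state path (for example, to read off when a module switches to "active") would use the closely related Viterbi recursion, replacing the sum over previous states with a maximum.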


Subject(s)
Gene Regulatory Networks , Transcription Factors/genetics , Algorithms , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation , Markov Chains
8.
Bioinformatics ; 28(20): 2640-7, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22923290

ABSTRACT

BACKGROUND: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (because a non-disease gene set does not exist) to build classifiers that identify new disease genes among the unknown genes. However, such classifiers are actually built from a noisy negative set N, as N itself can contain unknown disease genes. As a result, the classifiers do not perform as well as they could. RESULT: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on the four training sets and the positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm significantly outperformed existing methods. CONCLUSION: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as an unlabeled set U instead of a negative set N. Given that many machine learning problems in biomedical research involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification.
AVAILABILITY AND IMPLEMENTATION: The executable program and data are available at http://www1.i2r.a-star.edu.sg/~xlli/PUDI/PUDI.html.
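PUDI's first step partitions U before training weighted SVMs. The sketch below illustrates only a simplified version of that first step: rank unlabeled genes by cosine similarity to the centroid of the positive set P and take the least similar fraction as reliable negatives. The similarity measure, the `fraction` threshold and the feature vectors are hypothetical; PUDI's actual four-way partition into RN, LP, LN and WN is more elaborate.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reliable_negatives(
    P: Dict[str, Vector], U: Dict[str, Vector], fraction: float = 0.5
) -> List[str]:
    """Unlabeled genes least similar to the positive centroid (hypothetical rule)."""
    c = centroid(list(P.values()))
    ranked = sorted(U, key=lambda gene: (cosine(U[gene], c), gene))
    return ranked[: int(len(U) * fraction)]


# Toy feature vectors standing in for per-gene network/annotation features.
P = {"g1": [1.0, 0.0, 1.0], "g2": [1.0, 0.0, 0.8]}
U = {
    "u1": [0.9, 0.0, 1.0], "u2": [0.0, 1.0, 0.0],
    "u3": [0.0, 1.0, 0.1], "u4": [1.0, 0.1, 0.9],
}
rn = reliable_negatives(P, U)
```

The genes left in U after this step are the ambiguous ones; it is on those that a finer split (such as likely positive versus likely negative) and sample weighting become useful.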


Subject(s)
Artificial Intelligence , Disease/genetics , Genes , Algorithms , Humans , Support Vector Machine
9.
J Comput Biol ; 19(9): 1027-42, 2012 Sep.
Article in English | MEDLINE | ID: mdl-21777084

ABSTRACT

Many cellular functions involve protein complexes that are formed by multiple interacting proteins. Tandem Affinity Purification (TAP) is a popular experimental method for detecting such multi-protein interactions. However, current computational methods that predict protein complexes from TAP data require converting the co-complex relationships in TAP data into binary interactions. The resulting pairwise protein-protein interaction (PPI) network is then mined for densely connected regions that are identified as putative protein complexes. Converting the TAP data into PPI data not only introduces errors but also loses useful information about the underlying multi-protein relationships that can be exploited to detect the internal organization (i.e., core-attachment structures) of protein complexes. In this article, we propose a method called CACHET that detects protein complexes with Core-AttaCHment structures directly from bipartitE TAP data. CACHET models the TAP data as a bipartite graph in which the two vertex sets are the baits and the preys, respectively. The edges between the two vertex sets represent bait-prey relationships. CACHET first focuses on detecting high-quality protein-complex cores from the bipartite graph. To minimize the effects of false positive interactions, the bait-prey relationships are indexed with reliability scores. Only non-redundant, reliable bicliques computed from the TAP bipartite graph are regarded as protein-complex cores. CACHET constructs protein complexes by including attachment proteins into the cores. We applied CACHET to large-scale TAP datasets and found that CACHET outperformed existing methods in terms of prediction accuracy (i.e., F-measure and functional homogeneity of predicted complexes). In addition, the protein complexes predicted by CACHET are equipped with core-attachment structures that provide useful biological insights into the inherent functional organization of protein complexes.
Our supplementary material can be found at http://www1.i2r.a-star.edu.sg/~xlli/CACHET/CACHET.htm ; binary executables can also be found there. Supplementary Material is also available at www.liebertonline.com/cmb.
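CACHET treats baits and preys as the two vertex sets of a bipartite graph and takes reliable, non-redundant bicliques as complex cores. The brute-force sketch below only enumerates maximal bicliques in a tiny graph whose edges are assumed to be already reliability-filtered; it is exponential in the number of baits, the bait and prey names are invented, and it is not CACHET's algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Biclique = Tuple[FrozenSet[str], FrozenSet[str]]


def maximal_bicliques(
    bait_preys: Dict[str, Set[str]], min_baits: int = 1, min_preys: int = 1
) -> Set[Biclique]:
    """Enumerate maximal (baits x preys) bicliques by brute force over bait subsets."""
    baits = sorted(bait_preys)
    found: Set[Biclique] = set()
    for r in range(1, len(baits) + 1):
        for subset in combinations(baits, r):
            # preys pulled down by every bait in the subset
            preys = set.intersection(*(bait_preys[b] for b in subset))
            if len(preys) < min_preys:
                continue
            # close the bait side: every bait that pulls down all of these preys
            closure = frozenset(b for b in baits if preys <= bait_preys[b])
            if len(closure) >= min_baits:
                found.add((closure, frozenset(preys)))
    return found


edges = {"B1": {"P1", "P2", "P3"}, "B2": {"P1", "P2"}, "B3": {"P4"}}
cores = maximal_bicliques(edges, min_baits=2)
```

Closing the bait side makes each reported pair maximal, and collecting results in a set removes the duplicates that different bait subsets would otherwise produce.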


Subject(s)
Chromatography, Affinity/methods , Protein Interaction Mapping/methods , Protein Interaction Maps , Proteins/metabolism , Algorithms , Databases, Protein , Models, Biological
10.
IEEE Trans Neural Netw Learn Syst ; 23(2): 317-29, 2012 Feb.
Article in English | MEDLINE | ID: mdl-24808510

ABSTRACT

Appetitive operant conditioning in Aplysia for feeding behavior via the electrical stimulation of the esophageal nerve contingently reinforces each spontaneous bite during the feeding process. This results in the acquisition of operant memory by the contingently reinforced animals. Analysis of the cellular and molecular mechanisms of the feeding motor circuitry revealed that activity-dependent neuronal modulation occurs at the interneurons that mediate feeding behaviors. This provides evidence that interneurons are possible loci of plasticity and constitute another mechanism for memory storage in addition to memory storage attributed to activity-dependent synaptic plasticity. In this paper, an associative ambiguity correction-based neuro-fuzzy network, called appetitive reward-based pseudo-outer-product-compositional rule of inference [ARPOP-CRI(S)], is trained based on an appetitive reward-based learning algorithm which is biologically inspired by the appetitive operant conditioning of the feeding behavior in Aplysia. A variant of the Hebbian learning rule called Hebbian concomitant learning is proposed as the building block in the neuro-fuzzy network learning algorithm. The proposed algorithm possesses the distinguishing features of a sequential learning algorithm. In addition, the proposed ARPOP-CRI(S) neuro-fuzzy system encodes fuzzy knowledge in the form of linguistic rules that satisfy the semantic criteria for low-level fuzzy model interpretability. ARPOP-CRI(S) is evaluated and compared against other modeling techniques using benchmark time-series datasets. Experimental results are encouraging and show that ARPOP-CRI(S) is a viable modeling technique for time-variant problem domains.


Subject(s)
Aplysia/physiology , Appetite/physiology , Biomimetics/methods , Conditioning, Operant/physiology , Feeding Behavior/physiology , Neural Networks, Computer , Algorithms , Animals , Artificial Intelligence , Fuzzy Logic , Pattern Recognition, Automated/methods , Reward
11.
J Proteome Res ; 10(12): 5285-95, 2011 Dec 02.
Article in English | MEDLINE | ID: mdl-22004555

ABSTRACT

Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER notably outperformed existing methods in discovering domain-SLiM interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of the known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by the existing biological literature. These examples show that D-SLIMMER is able to find SLiMs that are biologically relevant.


Subject(s)
Algorithms , Data Mining/methods , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Software , Amino Acid Motifs , Amino Acid Sequence , Animals , Computational Biology/methods , Databases, Protein , Humans , Mice , Molecular Sequence Data , Reproducibility of Results , Sequence Alignment , Sequence Analysis, Protein/methods
12.
PLoS One ; 6(7): e21502, 2011.
Article in English | MEDLINE | ID: mdl-21799737

ABSTRACT

BACKGROUND: Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the networks of phenome and interactome to elucidate gene-phenotype associations of diseases. METHODOLOGY/PRINCIPAL FINDINGS: We proposed a technique called RWPCN (Random Walker on Protein Complex Network) for predicting and prioritizing disease genes. The basis of RWPCN is a protein complex network constructed using existing human protein complexes and protein interaction network. To prioritize candidate disease genes for the query disease phenotypes, we compute the associations between the protein complexes and the query phenotypes in their respective protein complex and phenotype networks. We tested RWPCN on predicting gene-phenotype associations using leave-one-out cross-validation; our method was observed to outperform existing approaches. We also applied RWPCN to predict novel disease genes for two representative diseases, namely, Breast Cancer and Diabetes. CONCLUSIONS/SIGNIFICANCE: Guilt-by-association prediction and prioritization of disease genes can be enhanced by fully exploiting the underlying modular organizations of both the disease phenome and the protein interactome. Our RWPCN uses a novel protein complex network as a basis for interrogating the human phenome-interactome network. 
As the protein complex network can capture the underlying modularity in the biological interaction networks better than simple protein interaction networks, RWPCN was found to be able to detect and prioritize disease genes better than traditional approaches that used only protein-phenotype associations.
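RWPCN is described as a random walker over a protein complex network. A generic random walk with restart, p(t+1) = (1 - r) W p(t) + r p0, where W moves the walker to a uniformly chosen neighbour and p0 is uniform over the query nodes, can be sketched as below. The toy network, the restart probability r = 0.3 and the convergence tolerance are assumptions, and RWPCN's coupled complex and phenotype networks are not reproduced.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def random_walk_with_restart(
    g: Graph, seeds: Set[str], restart: float = 0.3,
    tol: float = 1e-10, max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- (1 - r) W p + r p0 until the L1 change drops below tol."""
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in g}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in g}
        for u, nbrs in g.items():
            if nbrs:
                share = (1.0 - restart) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:  # dangling node: send its walk mass back to the seeds
                for v in g:
                    nxt[v] += (1.0 - restart) * p[u] * p0[v]
        converged = sum(abs(nxt[v] - p[v]) for v in g) < tol
        p = nxt
        if converged:
            break
    return p


# Toy path network a - b - c - d, with "a" as the only query node.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = random_walk_with_restart(path, {"a"})
```

The stationary scores decay with network distance from the query, which is the guilt-by-association signal used to rank candidate genes.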


Subject(s)
Computational Biology/methods , Disease/genetics , Phenotype , Protein Interaction Maps/genetics , Algorithms , Genome, Human/genetics , Humans
13.
BMC Bioinformatics ; 11 Suppl 7: S8, 2010 Oct 15.
Article in English | MEDLINE | ID: mdl-21106130

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) play important roles in various cellular processes. However, the low quality of current PPI data detected from high-throughput screening techniques has diminished the potential usefulness of the data. We need to develop a method to address the high data noise and incompleteness of PPI data, namely, to filter out inaccurate protein interactions (false positives) and predict putative protein interactions (false negatives). RESULTS: In this paper, we propose a novel two-step method to integrate diverse biological and computational sources of supporting evidence for reliable PPIs. The first step, interaction binning or InterBIN, groups PPIs together to more accurately estimate the likelihood (Bin-Confidence score) that the protein pairs interact for each biological or computational evidence source. The second step, interaction classification or InterCLASS, integrates the collected Bin-Confidence scores to build classifiers and identify reliable interactions. CONCLUSIONS: We performed comprehensive experiments on two benchmark yeast PPI datasets. The experimental results showed that our proposed method can effectively eliminate false positives in detected PPIs and identify false negatives by predicting novel yet reliable PPIs. Our proposed method also performed significantly better than any individual evidence source used alone, illustrating the importance of integrating various biological and computational sources of data and evidence.
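InterBIN groups PPIs into bins and estimates a Bin-Confidence score per evidence source. The abstract does not give the binning rule, so the sketch below assumes equal-frequency bins over a single evidence score and defines Bin-Confidence as the fraction of pairs in a bin that belong to a hypothetical reference set of known interactions. The protein names and scores are invented.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # protein pair, stored in sorted (canonical) order


def bin_confidence(
    scores: Dict[Pair, float], reference: Set[Pair], n_bins: int
) -> Dict[Pair, float]:
    """Equal-frequency bins over one evidence score; each pair gets the fraction
    of pairs in its bin that appear in the reference set."""
    if not scores:
        return {}
    ranked = sorted(scores, key=lambda pair: (scores[pair], pair))
    size = -(-len(ranked) // n_bins)  # ceiling division: pairs per bin
    confidence: Dict[Pair, float] = {}
    for start in range(0, len(ranked), size):
        members = ranked[start:start + size]
        hits = sum(1 for pair in members if pair in reference)
        for pair in members:
            confidence[pair] = hits / len(members)
    return confidence


coexpr = {("A", "B"): 0.1, ("A", "C"): 0.2, ("B", "C"): 0.8, ("C", "D"): 0.9}
known = {("A", "C"), ("B", "C"), ("C", "D")}
conf = bin_confidence(coexpr, known, n_bins=2)
```

Repeating this per evidence source yields one Bin-Confidence feature per source, which is the kind of input a second-stage classifier such as InterCLASS would consume.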


Subject(s)
Computational Biology/methods , Saccharomyces cerevisiae Proteins/metabolism , Protein Interaction Mapping/methods , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Software
14.
BMC Genomics ; 11 Suppl 1: S3, 2010 Feb 10.
Article in English | MEDLINE | ID: mdl-20158874

ABSTRACT

BACKGROUND: Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially those involving more than two protein partners, remain relatively limited even with current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse. Thus, computational approaches for detecting protein complexes from protein interaction data are useful complements to the limited experimental methods. They can be used together with the experimental methods for mapping the interactions of proteins to understand how different proteins are organized into higher-level substructures to perform various cellular functions. RESULTS: Given the abundance of pairwise protein interaction data from high-throughput genome-wide experimental screenings, a protein interaction network can be constructed from protein interaction data by considering individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins as a link. This binary protein interaction graph can then be used for detecting protein complexes using graph clustering techniques. In this paper, we review and evaluate the state-of-the-art techniques for computational detection of protein complexes, and discuss some promising research directions in this field. CONCLUSIONS: Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes. In addition, the computational approaches have also improved in performance over the years.
Further improvements could be achieved if the quality of the underlying protein interaction data is adequately considered, to minimize the undesirable effects of irrelevant and noisy sources, and if the various types of biological evidence are better incorporated into the detection process, to maximize the exploitation of the increasing wealth of biological knowledge available.


Subject(s)
Proteins/analysis , Systems Biology/methods , Biometry , Gene Expression Profiling , Humans , Protein Binding , Protein Interaction Mapping , Proteins/metabolism
15.
Bioinformatics ; 26(8): 1036-42, 2010 Apr 15.
Article in English | MEDLINE | ID: mdl-20167627

ABSTRACT

MOTIVATION: An important class of protein interactions involves the binding of a protein's domain to a short linear motif (SLiM) on its interacting partner. Extracting such motifs, either experimentally or computationally, is challenging because of their weak binding and high degree of degeneracy. The recent rapid increase in available protein structures provides an excellent opportunity to study SLiMs directly from their 3D structures. RESULTS: Using domain interface extraction (Diet), we characterized 452 distinct SLiMs from the Protein Data Bank (PDB), of which 155 are validated to varying degrees: 40 have literature validation, 54 are supported by at least one domain-peptide structural instance, and another 61 are overrepresented in high-throughput PPI data. We further observed that the lacklustre coverage of existing computational SLiM detection methods could be due to the common assumption that most SLiMs occur outside globular domain regions. In fact, 198 of the 452 SLiMs that we reported are found on domain-domain interfaces; some of them are implicated in autoimmune and neurodegenerative diseases. We suggest that these SLiMs would be useful for designing inhibitors against the pathogenic protein complexes underlying these diseases. Our findings show that 3D structure-based SLiM detection algorithms can provide a more complete coverage of SLiM-mediated protein interactions than current sequence-based approaches.
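For contrast with the structure-based Diet approach above, sequence-based methods typically represent a SLiM as a regular expression (the notation used by the ELM resource). As a minimal illustration, assuming the pattern `P..P` (the PxxP core often bound by SH3 domains) and two invented sequences, sequence-level SLiM matching can be sketched as follows.

```python
import re
from typing import Dict, List, Tuple


def find_slim(pattern: str, sequences: Dict[str, str]) -> Dict[str, List[Tuple[int, str]]]:
    """Return (0-based start, matched peptide) for every, possibly overlapping, match."""
    rx = re.compile(f"(?=({pattern}))")  # zero-width lookahead allows overlaps
    return {
        name: [(m.start(), m.group(1)) for m in rx.finditer(seq)]
        for name, seq in sequences.items()
    }


hits = find_slim("P..P", {"s1": "MAPPLPPK", "s2": "GGGG"})
```

Because such short, degenerate patterns match often by chance, sequence matches alone say little about whether a motif is actually bound, which is part of the motivation for detecting SLiMs from 3D interfaces.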


Subject(s)
Genomics/methods , Protein Interaction Domains and Motifs , Software , Amino Acid Motifs , Databases, Protein , Sequence Analysis, Protein/methods
17.
BMC Bioinformatics ; 10: 169, 2009 Jun 02.
Article in English | MEDLINE | ID: mdl-19486541

ABSTRACT

BACKGROUND: Detecting protein complexes is an important and challenging task in the post-genomic era. As increasing amounts of protein-protein interaction (PPI) data become available, we are able to identify protein complexes from PPI networks. However, most current studies detect protein complexes based solely on the observation that dense regions in PPI networks may correspond to protein complexes, and fail to consider the inherent organization within protein complexes. RESULTS: To provide insights into the organization of protein complexes, this paper presents a novel core-attachment based method (COACH) which detects protein complexes in two stages. It first detects protein-complex cores as the "hearts" of protein complexes and then includes attachments into these cores to form biologically meaningful structures. We evaluate and analyze our predicted protein complexes from two aspects. First, we perform a comprehensive comparison between our proposed method and existing techniques by comparing the predicted complexes against benchmark complexes. Second, we validate the core-attachment structures using various biological evidence and knowledge. CONCLUSION: Our proposed COACH method has been applied to two different yeast PPI networks and the experimental results show that COACH performs significantly better than the state-of-the-art techniques. In addition, the identified complexes with core-attachment structures are demonstrated to match very well with existing biological knowledge and thus provide more insights for future biological study.
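COACH's second stage adds attachments to already-detected cores. The sketch assumes a core is given and adds every outside protein that interacts with more than half of the core's members; the one-half threshold (`min_fraction`) and the toy network are our assumptions for illustration, and COACH's core-detection stage is not shown.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected PPI network as adjacency sets


def add_attachments(g: Graph, core: Set[str], min_fraction: float = 0.5) -> Set[str]:
    """Grow a complex core with outside proteins linked to more than
    min_fraction of its members."""
    neighbours = set().union(*(g[v] for v in core)) - core
    attachments = {u for u in neighbours if len(g[u] & core) > min_fraction * len(core)}
    return core | attachments


g = {
    "a": {"b", "c", "x", "y"}, "b": {"a", "c", "x"}, "c": {"a", "b"},
    "x": {"a", "b"}, "y": {"a"},
}
complex_ = add_attachments(g, {"a", "b", "c"})
```

Here `x`, linked to two of the three core proteins, is attached, while `y`, linked to only one, is not.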


Subject(s)
Multiprotein Complexes , Protein Interaction Mapping/methods , Proteins , Software , Algorithms , Data Interpretation, Statistical , Databases, Protein , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Protein Interaction Domains and Motifs , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results
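The two-stage core-attachment idea summarized in the COACH abstract above can be illustrated with a simplified Python sketch. It is not the published COACH algorithm: the core rule (start from a seed protein and its neighbours, then prune the least-connected member until the set reaches a density threshold), the redundancy filter, and the attachment rule (a protein joins a core if it interacts with more than half of the core's members) are assumptions chosen for illustration, and all function names and the toy network are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def density(graph, nodes):
    """Fraction of possible edges that are present among `nodes`."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    present = sum(1 for u, v in combinations(nodes, 2) if v in graph[u])
    return present / (len(nodes) * (len(nodes) - 1) / 2)


def find_cores(graph, min_density=0.9, min_size=3):
    """Stage 1 (assumed rule, min_size >= 2): start from each seed's neighbourhood,
    then drop the least-connected non-seed member until the set is dense enough."""
    cores = []
    for seed in graph:
        core = {seed} | graph[seed]
        while len(core) >= min_size and density(graph, core) < min_density:
            worst = min((n for n in core if n != seed),
                        key=lambda n: (len(graph[n] & core), n))
            core.discard(worst)
        if len(core) >= min_size and density(graph, core) >= min_density:
            frozen = frozenset(core)
            if frozen not in cores:
                cores.append(frozen)
    return cores


def filter_redundant(cores, max_overlap=0.5):
    """Keep larger cores first; drop a core if more than `max_overlap` of it lies in a kept core."""
    kept = []
    for core in sorted(cores, key=len, reverse=True):
        if all(len(core & k) / len(core) <= max_overlap for k in kept):
            kept.append(core)
    return kept


def add_attachments(graph, core):
    """Stage 2 (assumed rule): attach proteins that interact with more than half of the core."""
    neighbours = set().union(*(graph[n] for n in core)) - core
    return set(core) | {p for p in neighbours if len(graph[p] & core) > len(core) / 2}


def detect_complexes(graph, min_density=0.9, min_size=3):
    """Run both stages: find and de-duplicate cores, then add attachments to each."""
    cores = filter_redundant(find_cores(graph, min_density, min_size))
    return [add_attachments(graph, core) for core in cores]


# Hypothetical toy network: a 5-protein clique A-E, F linked to A, B, C, and G linked to E only.
clique = list(combinations("ABCDE", 2))
edges = clique + [("F", "A"), ("F", "B"), ("F", "C"), ("G", "E")]
graph = build_graph(edges)
print([sorted(c) for c in detect_complexes(graph)])
```

On this toy network the clique A to E is kept as a core, F is attached because it interacts with three of the five core members, and G (a single link to E) is left out.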
18.
Ann N Y Acad Sci ; 1158: 224-33, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19348644

ABSTRACT

The protein-protein subnetwork prediction challenge presented at the 2nd Dialogue for Reverse Engineering Assessments and Methods (DREAM2) conference is an important computational problem essential to proteomic research. Given a set of proteins from the Saccharomyces cerevisiae (baker's yeast) genome, the task is to rank all possible interactions between the proteins from the most likely to the least likely. To tackle this task, we adopt a graph-based strategy to combine multiple sources of biological data and computational predictions. Using training and testing sets extracted from existing yeast protein-protein interactions, we evaluate our method and show that it can produce better predictions than any of the individual data sources. This technique is then used to produce our entry for the protein-protein subnetwork prediction challenge.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping , Saccharomyces cerevisiae Proteins , Area Under Curve , Databases, Protein , Genome, Fungal , Models, Genetic , ROC Curve , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
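The abstract does not spell out how the data sources are combined, so the following Python sketch swaps in a much simpler technique than the authors' graph-based strategy: each source assigns scores to protein pairs, every possible pair is scored by a weighted sum, the pairs are ranked from most to least likely, and the ranking is checked against known interactions with the area under the ROC curve. The source names, weights, proteins and scores are all hypothetical.

```python
from itertools import combinations


def pair(p, q):
    """Unordered protein pair used as a dictionary key."""
    return frozenset((p, q))


def combine_sources(proteins, sources, weights):
    """Score every possible pair as a weighted sum of per-source scores (missing = 0)."""
    return {
        pair(p, q): sum(w * sources[name].get(pair(p, q), 0.0) for name, w in weights.items())
        for p, q in combinations(sorted(proteins), 2)
    }


def rank_pairs(scores):
    """Order pairs from most to least likely; ties are broken by protein names."""
    return sorted(scores, key=lambda pr: (-scores[pr], sorted(pr)))


def roc_auc(scores, positives):
    """Probability that a known interaction outscores a non-interaction (ties count 0.5)."""
    pos = [s for pr, s in scores.items() if pr in positives]
    neg = [s for pr, s in scores.items() if pr not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical evidence: a yeast two-hybrid screen and a co-expression score.
proteins = ["P1", "P2", "P3", "P4"]
sources = {
    "y2h": {pair("P1", "P2"): 1.0, pair("P3", "P4"): 1.0},
    "coexpr": {pair("P1", "P2"): 0.8, pair("P1", "P3"): 0.6, pair("P2", "P4"): 0.1},
}
weights = {"y2h": 0.6, "coexpr": 0.4}
scores = combine_sources(proteins, sources, weights)
ranking = rank_pairs(scores)
known = {pair("P1", "P2"), pair("P3", "P4"), pair("P1", "P3"), pair("P2", "P3")}
print([sorted(pr) for pr in ranking[:3]], roc_auc(scores, known))
```

With these toy values P1-P2 ranks first, and the AUC is 0.8125 rather than 1.0 because the known interaction P2-P3 receives no evidence from either source.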
19.
J Neurotrauma ; 26(8): 1177-82, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19371145

ABSTRACT

Traumatic brain injury is a major socioeconomic burden, and the use of statistical models to predict outcomes after head injury can help to allocate limited health resources. Earlier prediction models analyzing admission data have achieved prediction accuracies of up to 80%. Our aim was to design statistical models utilizing a combination of both physiological and biochemical variables obtained from multimodal monitoring in the neurocritical care setting as a complement to earlier models. We used decision tree and logistic regression analysis on variables including intracranial pressure (ICP), mean arterial pressure (MAP), cerebral perfusion pressure (CPP), and pressure reactivity index (PRx), as well as multimodal monitoring parameters to assess brain tissue oxygenation (PbtO(2)), and microdialysis parameters to predict outcomes based on a dichotomized Glasgow Outcome Score. Further analysis was carried out on various subgroup combinations of physiological and biochemical parameters. The reliability of the head injury models was assessed using a 10-fold cross-validation technique. In addition, the confusion matrix was used to assess the sensitivity, specificity, and the F-ratio. In all, 2,413 time series records were extracted from 26 patients treated at our neurocritical care unit over a 1-year period. Decision tree analysis was found to be superior to logistic regression analysis in predictive accuracy of outcome. The combined use of microdialysis variables and PbtO(2), in addition to ICP, MAP, and CPP, was found to have the best predictive accuracy. The use of physiological and biochemical variables in a decision tree analysis model has been shown to improve predictive accuracy compared with previous models. The potential application is outcome prediction in the multivariate setting of advanced multimodality monitoring, and the results support the use of multimodal monitoring in the neurocritical care setting as potentially beneficial for predicting outcomes of patients with severe head injury.


Subject(s)
Brain Injuries/diagnosis , Craniocerebral Trauma/diagnosis , Models, Statistical , Blood Pressure/physiology , Decision Trees , Female , Glasgow Outcome Scale , Humans , Intracranial Pressure/physiology , Logistic Models , Male , Microdialysis , Predictive Value of Tests , Prognosis , Prospective Studies
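The evaluation scheme described in the abstract above (a decision tree model, 10-fold cross-validation, and sensitivity, specificity and an F score from the confusion matrix) can be sketched in plain Python. This is an illustrative assumption, not the authors' pipeline: the model is a one-split decision tree (a decision stump) on a single variable, the ICP values and outcome labels are hypothetical, outcome 1 is taken to mean an unfavourable dichotomized Glasgow Outcome Score, and the abstract's "F-ratio" is assumed here to be the F1 score.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, with 1 as the outcome of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and F1 (assumed meaning of 'F-ratio'); 0.0 when undefined."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return sensitivity, specificity, f1


def k_fold_indices(n, k=10):
    """Split record indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_stump(xs, ys):
    """Depth-one decision tree: pick the threshold and direction with the best training accuracy."""
    best = None
    for t in sorted(set(xs)):
        for above_positive in (True, False):
            correct = sum(int((x > t) == above_positive) == y for x, y in zip(xs, ys))
            if best is None or correct > best[0]:
                best = (correct, t, above_positive)
    _, t, above_positive = best
    return lambda x: int((x > t) == above_positive)


def cross_validate(xs, ys, fit, k=10):
    """Pool out-of-fold predictions from k folds into a single confusion matrix."""
    preds = [None] * len(ys)
    for fold in k_fold_indices(len(ys), k):
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = predict(xs[i])
    return confusion_matrix(ys, preds)


# Hypothetical records: ICP (mmHg) and outcome (1 = unfavourable).
icp = [10, 12, 14, 15, 18, 25, 28, 30, 35, 40]
outcome = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
counts = cross_validate(icp, outcome, fit_stump, k=10)
print(counts, summary_metrics(*counts))
```

With ten records and k = 10 each fold holds a single record, so this toy run is effectively leave-one-out. Pooling the out-of-fold predictions gives TP = 5, FP = 1, TN = 4 and FN = 0: the record with ICP 18 is misclassified because, once it is held out, the stump's threshold falls at 15.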
20.
Nucleic Acids Res ; 37(Database issue): D858-62, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18948286

ABSTRACT

Parkinson's disease (PD) is the second most common neurodegenerative disorder, affecting millions of people. Both environmental and genetic factors play important roles in its causation and development. Genetic analysis has shown that over 100 genes are correlated with the etiology and pathology of PD. However, accessing genetic information in a consistent and fruitful way is not an easy task. The Mutation Database for Parkinson's Disease (MDPD) is designed to meet the need for information integration so that users can easily retrieve, inspect and enhance their knowledge of PD. The database contains 2391 entries on 202 genes extracted from 576 publications and manually examined by biomedical researchers. Each genetic substitution and its resulting impact are clearly labelled and linked to the primary reference. Every reported gene has a summary page that provides information on the variation impact, mutation type, the studied population, mutation position and reference collection. In addition, MDPD provides a unique functionality that lets users compare mutation types across ethnic groups. As such, we hope that MDPD will serve as a valuable tool to bridge the gap between genetic analysis and clinical practice. MDPD is publicly accessible at http://datam.i2r.a-star.edu.sg/mdpd/.


Subject(s)
Databases, Genetic , Mutation , Parkinson Disease/genetics , Humans , Polymorphism, Single Nucleotide , Systems Integration