Results 1 - 18 of 18
1.
PeerJ ; 2: e315, 2014.
Article in English | MEDLINE | ID: mdl-24711967

ABSTRACT

Many protein domains bind to short peptide sequences, called linear motifs. Data on their sequence specificities is sparse, which is why biologists usually resort to basic pattern searches to identify new putative binding sites for experimental follow-up. Most motifs have poor specificity and prioritization of the matches is thus crucial when scanning a full proteome with a pattern. Here we present a generic method to prioritize motif occurrence predictions by using cellular contextual information. We take 2 parameters as input: the motif occurrences and one or more of the interacting domains. The potential hits are ranked based on how strongly the context network associates them with a protein containing one of the specified domains, which leads to an increased predictive performance. The method is available through a web interface at doremi.jensenlab.org, which allows for an easy application of the method. We show that this approach leads to improved predictions of binding partners for PDZ domains and the SUMO binding domain. This is consistent with the earlier observation that coupling sequence motifs with network information improves kinase-specific substrate predictions.
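The workflow described in this abstract (a basic pattern search followed by context-based ranking) can be sketched roughly as follows. This is a minimal illustration, not the DoReMi implementation: the regular expression is a simplified C-terminal pattern chosen for the example, and the sequences, protein identifiers, and context scores are invented.

```python
import re

# Illustrative pattern only (an assumption for this sketch): a simplified
# C-terminal motif of the form [ST]-x-[hydrophobic] at the end of the protein.
EXAMPLE_MOTIF = re.compile(r"[ST].[ACVILF]$")


def scan(proteins, pattern):
    """Return (protein_id, start, matched_text) for every pattern match."""
    hits = []
    for pid, seq in proteins.items():
        for m in pattern.finditer(seq):
            hits.append((pid, m.start(), m.group()))
    return hits


def rank_hits(hits, context_score):
    """Sort hits by a per-protein context score, highest first.

    context_score is a hypothetical stand-in for how strongly a context
    network associates each protein with a domain-containing protein.
    """
    return sorted(hits, key=lambda h: context_score.get(h[0], 0.0), reverse=True)


# Toy sequences and scores, invented for illustration.
proteins = {
    "P1": "MKTAYIAKQRETSV",
    "P2": "MSDNELQWKRSAL",
    "P3": "MGGHHEEKRDDPK",
}
context_score = {"P1": 0.2, "P2": 0.9}

hits = scan(proteins, EXAMPLE_MOTIF)
ranked = rank_hits(hits, context_score)
```

Here P1 and P2 both match the pattern, and the ranking step places P2 first because of its higher (invented) context score; P3 has no match and is never reported.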

2.
BMC Bioinformatics ; 14: 224, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23855714

ABSTRACT

BACKGROUND: Computational protein short linear motif discovery can use protein interaction information to search for motifs among proteins which share a common interactor. Cytoscape provides a visual interface for protein networks but there is no streamlined way to rapidly visualize motifs in a network of proteins, or to integrate computational discovery with such visualizations. RESULTS: We present SLiMScape, a Cytoscape plugin, which enables both de novo motif discovery and searches for instances of known motifs. Data is presented using Cytoscape's visualization features thus providing an intuitive interface for interpreting results. The distribution of discovered or user-defined motifs may be selectively displayed and the distribution of protein domains may be viewed simultaneously. To facilitate this SLiMScape automatically retrieves domains for each protein. CONCLUSION: SLiMScape provides a platform for performing short linear motif analyses of protein interaction networks by integrating motif discovery and search tools in a network visualization environment. This significantly aids in the discovery of novel short linear motifs and in visualizing the distribution of known motifs.


Subject(s)
Amino Acid Motifs , Software , Protein Interaction Maps , Sequence Analysis, Protein
3.
Bioinformatics ; 29(9): 1120-6, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23505299

ABSTRACT

MOTIVATION: Peptides play important roles in signalling, regulation and immunity within an organism. Many have successfully been used as therapeutic products often mimicking naturally occurring peptides. Here we present PeptideLocator for the automated prediction of functional peptides in a protein sequence. RESULTS: We have trained a machine learning algorithm to predict bioactive peptides within protein sequences. PeptideLocator performs well on training data achieving an area under the curve of 0.92 when tested in 5-fold cross-validation on a set of 2202 redundancy reduced peptide containing protein sequences. It has predictive power when applied to antimicrobial peptides, cytokines, growth factors, peptide hormones, toxins, venoms and other peptides. It can be applied to refine the choice of experimental investigations in functional studies of proteins. AVAILABILITY AND IMPLEMENTATION: PeptideLocator is freely available for academic users at http://bioware.ucd.ie/.


Subject(s)
Algorithms , Peptides/chemistry , Sequence Analysis, Protein/methods , Antimicrobial Cationic Peptides/chemistry , Artificial Intelligence , Peptides/classification , Proteins/chemistry
4.
PLoS One ; 7(10): e45012, 2012.
Article in English | MEDLINE | ID: mdl-23056189

ABSTRACT

The conventional wisdom is that certain classes of bioactive peptides have specific structural features that endow their particular functions. Accordingly, predictions of bioactivity have focused on particular subgroups, such as antimicrobial peptides. We hypothesized that bioactive peptides may share more general features, and assessed this by contrasting the predictive power of existing antimicrobial predictors as well as a novel general predictor, PeptideRanker, across different classes of peptides. We observed that existing antimicrobial predictors had reasonable predictive power to identify peptides of certain other classes i.e. toxin and venom peptides. We trained two general predictors of peptide bioactivity, one focused on short peptides (4-20 amino acids) and one focused on long peptides (> 20 amino acids). These general predictors had performance that was typically as good as, or better than, that of specific predictors. We noted some striking differences in the features of short peptide and long peptide predictions, in particular, high scoring short peptides favour phenylalanine. This is consistent with the hypothesis that short and long peptides have different functional constraints, perhaps reflecting the difficulty for typical short peptides in supporting independent tertiary structure. We conclude that there are general shared features of bioactive peptides across different functional classes, indicating that computational prediction may accelerate the discovery of novel bioactive peptides and aid in the improved design of existing peptides, across many functional classes. An implementation of the predictive method, PeptideRanker, may be used to identify among a set of peptides those that may be more likely to be bioactive.


Subject(s)
Algorithms , Computational Biology/methods , Drug Design , Peptides/chemistry , Amino Acid Sequence , Animals , Anti-Infective Agents/chemistry , Databases, Factual , Humans , Peptide Hormones/chemistry , Reproducibility of Results , Toxins, Biological/chemistry
5.
Sci Signal ; 5(243): pe40, 2012 Sep 25.
Article in English | MEDLINE | ID: mdl-23012652

ABSTRACT

Interactions between short peptides within proteins and peptide-binding domains can trigger many important cell signaling processes, and their interactions are typically of modest affinity. A study showed that this modest affinity appears to be favored by evolution. They used phage display selection to discover "superbinder" Src Homology 2 (SH2) domains, which bound peptides with much stronger affinity than naturally occurring SH2 domains. These superbinder domains had strong biological effects, such as blocking cell signaling. Although the superbinders had higher affinity, this did not appear to reduce their specificity. In contrast, SH2-binding peptides from bacterial pathogens have evolved to exhibit promiscuity of binding to multiple SH2 domains, carried within effector proteins that subvert signaling upon entry into the mammalian cell. Because there are many potential peptide binders of the SH2 domain found in numerous human proteins, modest affinity not only may optimize transient signaling mediated by reversible interactions but also may minimize off-target deleterious binding effects. The stage is set for a more thorough evaluation of the specificity and off-target impact of both naturally occurring and artificial domains and peptides. This may help define both targets and reagents for therapeutic intervention in key signaling processes mediated by short peptides.


Subject(s)
Evolution, Molecular , Models, Biological , Peptides/metabolism , Protein Interaction Domains and Motifs/physiology , Signal Transduction/physiology , Animals , Humans , Protein Binding/physiology , src Homology Domains
6.
BMC Bioinformatics ; 13: 104, 2012 May 18.
Article in English | MEDLINE | ID: mdl-22607209

ABSTRACT

BACKGROUND: Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3-10 amino acids in length, that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation. RESULTS: The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. However, the two approaches returned different subsets within both a benchmark dataset and a more realistic discovery dataset. CONCLUSIONS: Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.


Subject(s)
Amino Acid Motifs , Sequence Analysis, Protein/methods , Conserved Sequence , Databases, Protein , Humans , Protein Structure, Tertiary , Proteins/chemistry
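The contrast this entry draws between regular-expression and profile-based motif representations can be made concrete with a small sketch. It is not MEME and does not include the evolutionary weighting or masking the abstract describes; the aligned instances, the uniform background, and the pseudocount are all assumptions made for the example.

```python
import math
import re

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances, pseudocount=1.0):
    """Column-wise log-odds scores against a uniform background (an assumption)."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for i in range(len(instances[0])):
        column = [s[i] for s in instances]
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_profile_hit(profile, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(profile)
    scored = [
        (sum(profile[i][seq[s + i]] for i in range(width)), s)
        for s in range(len(seq) - width + 1)
    ]
    return max(scored)


# Invented aligned motif instances and the regular expression that summarizes them.
instances = ["PKLP", "PRLP", "PKVP", "PKLP"]
regex = re.compile(r"P[KR][LV]P")
profile = build_profile(instances)

# A variant with Q at the second position: the regular expression rejects it
# outright, while the profile still gives the window a positive score.
variant = "AAPQLPAA"
regex_match = regex.search(variant)
score, start = best_profile_hit(profile, variant)
```

The regular expression is all-or-nothing, whereas the profile degrades gradually with each mismatch, which is one way the two approaches can return different subsets of candidate sites.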
7.
Nucleic Acids Res ; 40(Database issue): D242-51, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22110040

ABSTRACT

Linear motifs are short, evolutionarily plastic components of regulatory proteins and provide low-affinity interaction interfaces. These compact modules play central roles in mediating every aspect of the regulatory functionality of the cell. They are particularly prominent in mediating cell signaling, controlling protein turnover and directing protein localization. Given their importance, our understanding of motifs is surprisingly limited, largely as a result of the difficulty of discovery, both experimentally and computationally. The Eukaryotic Linear Motif (ELM) resource at http://elm.eu.org provides the biological community with a comprehensive database of known experimentally validated motifs, and an exploratory tool to discover putative linear motifs in user-submitted protein sequences. The current update of the ELM database comprises 1800 annotated motif instances representing 170 distinct functional classes, including approximately 500 novel instances and 24 novel classes. Several older motif class entries have also been revisited, improving annotation and adding novel instances. Furthermore, addition of full-text search capabilities, an enhanced interface and simplified batch download has improved the overall accessibility of the ELM data. The motif discovery portion of the ELM resource has added conservation, and structural attributes have been incorporated to aid users to discriminate biologically relevant motifs from stochastically occurring non-functional instances.


Subject(s)
Amino Acid Motifs , Databases, Protein , Computer Graphics , Disease/genetics , Eukaryota , Sequence Analysis, Protein , User-Computer Interface , Viral Proteins/chemistry
8.
J Mol Biol ; 415(1): 193-204, 2012 Jan 06.
Article in English | MEDLINE | ID: mdl-22079048

ABSTRACT

Short linear motifs in proteins (typically 3-12 residues in length) play key roles in protein-protein interactions by frequently binding specifically to peptide binding domains within interacting proteins. Their tendency to be found in disordered segments of proteins has meant that they have often been overlooked. Here we present SLiMPred (short linear motif predictor), the first general de novo method designed to computationally predict such regions in protein primary sequences independent of experimentally defined homologs and interactors. The method applies machine learning techniques to predict new motifs based on annotated instances from the Eukaryotic Linear Motif database, as well as structural, biophysical, and biochemical features derived from the protein primary sequence. We have integrated these data sources and benchmarked the predictive accuracy of the method, and found that it performs equivalently to a predictor of protein binding regions in disordered regions, in addition to having predictive power for other classes of motif sites such as polyproline II helix motifs and short linear motifs lying in ordered regions. It will be useful in predicting peptides involved in potential protein associations and will aid in the functional characterization of proteins, especially of proteins lacking experimental information on structures and interactions. We conclude that, despite the diversity of motif sequences and structures, SLiMPred is a valuable tool for prioritizing potential interaction motifs in proteins.


Subject(s)
Amino Acid Motifs , Protein Interaction Domains and Motifs , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Artificial Intelligence , Binding Sites , Databases, Protein , Humans , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Proteome/chemistry , Proteome/genetics , Sequence Alignment/methods , Sequence Analysis, Protein/methods
9.
Nucleic Acids Res ; 39(Web Server issue): W56-60, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21622654

ABSTRACT

Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch 2.0 (Short, Linear Motif Search) web server allows researchers to identify occurrences of a user-defined SLiM in a proteome, using conservation and protein disorder context statistics to rank occurrences. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. For each motif occurrence, overlapping UniProt features and annotated SLiMs are displayed. Visualization also includes annotated multiple sequence alignments surrounding each occurrence, showing conservation and protein disorder statistics in addition to known and predicted SLiMs, protein domains and known post-translational modifications. In addition, enrichment of Gene Ontology terms and protein interaction partners is provided as an indicator of possible motif function. All web server results are available for download. Users can search motifs against the human proteome or a subset thereof defined by UniProt accession numbers or GO term. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch2.html.


Subject(s)
Amino Acid Motifs , Software , Algorithms , Humans , Internet , Proteomics , User-Computer Interface
11.
J Proteome Res ; 9(7): 3759-63, 2010 Jul 02.
Article in English | MEDLINE | ID: mdl-20496950

ABSTRACT

Antibodies are a primary research tool for a diverse range of experiments in biology, from development to pathology. Their utility is derived from their ability to specifically identify proteins at a high level of sensitivity. This diversity of experimental requirements stretches the capabilities of these key research reagents. However, antibodies seem well placed to answer the challenges of the forthcoming proteome-scale biology. Their use in such a wide variety of experimental requirements impacts on the choice of epitope used to raise the antibody. Understanding the constraints imposed by the experimental configuration is crucial to developing well-characterized affinity reagents. Their application to a wide range of biological fields and relatively low cost of manufacture have ensured that the demand for a resource of well-characterized antibodies will remain high and that they will be an important biological resource for the foreseeable future. This demand will only increase as the number of therapeutic targets continues to grow. Current tools to aid in the production of affinity reagents are disparate and not freely available. We present a freely available Web resource (http://epic.embl.de) for the proteomics community: the Epitope Choice Resource (EpiC) for the selection of epitopes and characterization of the target protein. It provides the community with a single Web-based portal for the exploration of epitopes on a target protein and connects over the Internet to a wide range of bioinformatic tools ensuring that data being presented are up to date.


Subject(s)
Antibodies , Databases, Protein , Epitopes , Immunologic Techniques , Proteomics/methods , Software , Antibody Affinity , Consensus Sequence , Humans , Internet
12.
Nucleic Acids Res ; 38(Web Server issue): W534-9, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20497999

ABSTRACT

Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein-protein interactions. The Short, Linear Motif Finder (SLiMFinder) web server is a de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. Motifs are returned with an intuitive P-value that greatly reduces the problem of false positives and is accessible to biologists of all disciplines. Input can be uploaded by the user or extracted directly from UniProt. Numerous masking options give the user great control over the contextual information to be included in the analyses. The SLiMFinder server combines these with user-friendly output and visualizations of motif context to allow the user to quickly gain insight into the validity of a putatively functional motif. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Returned motifs can also be compared with known SLiMs from the literature using CompariMotif. All results are available for download. The SLiMFinder server is available at: http://bioware.ucd.ie/slimfinder.html.


Subject(s)
Amino Acid Motifs , Software , Algorithms , Internet , Sequence Alignment , Sequence Analysis, Protein , User-Computer Interface
13.
Nucleic Acids Res ; 38(Database issue): D167-80, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19920119

ABSTRACT

Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a 'Bar Code' format, which also displays known instances from homologous proteins through a novel 'Instance Mapper' protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.


Subject(s)
Amino Acid Motifs/genetics , Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Eukaryotic Cells/chemistry , Amino Acid Sequence , Animals , Computational Biology/trends , Databases, Protein , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Homology, Amino Acid , Software
14.
Mol Cell Proteomics ; 9(1): 1-10, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19674966

ABSTRACT

Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. Further information and documentation are available on the PSI-PAR web site.


Subject(s)
Databases, Protein/standards , Proteome/analysis , Database Management Systems/standards , Humans , International Cooperation , Proteomics/methods , Terminology as Topic
15.
J Phys Condens Matter ; 21(3): 034106, 2009 Jan 21.
Article in English | MEDLINE | ID: mdl-21817251

ABSTRACT

We present the theory of thermal equivalence in the framework of the Peyrard-Bishop model and some of its anharmonic variants. The thermal equivalence gives rise to a melting index τ which maps closely the experimental DNA melting temperatures for short DNA sequences. We show that the efficient calculation of the melting index can be used to analyse the parameters of the Peyrard-Bishop model and propose an improved set of Morse potential parameters. With this new set we are able to calculate some of the experimental melting temperatures to ± 1.2 °C. We review some of the concepts of sequencing probe design and show how to use the melting index to explore the possibilities of gene coverage by tuning the model parameters.

16.
PLoS One ; 3(6): e2500, 2008 Jun 18.
Article in English | MEDLINE | ID: mdl-18563203

ABSTRACT

BACKGROUND: Sequencing by hybridisation (SBH) is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on the design of the probe library to provide the maximum information for minimum cost. Long probes provide a higher probability of non-repeated sequences but lead to an increase in the number of probes required, whereas short probes may not provide unique sequence information due to repeated sequences. We have investigated the effect of probe length, use of reference sequences, and thermal filtering on the design of probe libraries for several highly variable target DNA sequences. RESULTS: We designed overlapping probe libraries for a range of highly variable drug target genes based on known sequence information and developed a formal terminology to describe probe library design. We find that for some targets these libraries can provide good coverage of a previously unseen target whereas for others the coverage is less than 30%. The optimal probe length varies from as short as 12 nt to as long as 19 nt and depends on the sequence, its variability, and the stringency of thermal filtering. It cannot be determined from inspection of an example gene sequence. CONCLUSIONS: Optimal probe length and the optimal number of reference sequences used to design a probe library are highly target specific for highly variable sequencing targets. The optimum design cannot be determined simply by inspection of input sequences or of alignments but only by detailed analysis of each specific target. For highly variable sequences, shorter probes can in some cases provide better information than longer probes. Probe library design would benefit from a general purpose tool for analysing these issues. The formal terminology developed here and the analysis approaches it is used to describe will contribute to the development of such tools.


Subject(s)
Molecular Probes , HIV/genetics , Hepacivirus/genetics , Orthomyxoviridae/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
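The idea of building an overlapping probe library from reference sequences and measuring coverage of an unseen target can be sketched as below. This is an illustrative simplification, not the authors' method: it uses exact string matching only, ignores the thermal filtering the abstract mentions, and the sequences are invented toy data.

```python
def probe_library(references, k):
    """All distinct overlapping k-mers (probes) from the reference sequences."""
    return {ref[i:i + k] for ref in references for i in range(len(ref) - k + 1)}


def coverage(target, probes, k):
    """Fraction of target positions covered by at least one exactly matching probe."""
    if not target:
        return 0.0
    covered = [False] * len(target)
    for i in range(len(target) - k + 1):
        if target[i:i + k] in probes:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(target)


# Toy data: the unseen target differs from the reference by one substitution.
references = ["ACGTACGTTAGC"]
target = "ACGTACGATAGC"

cov_short = coverage(target, probe_library(references, 4), 4)
cov_long = coverage(target, probe_library(references, 6), 6)
```

In this toy case the 4-nt library covers 11 of 12 target positions while the 6-nt library covers only 7 of 12, because a single variant position invalidates every longer probe that spans it. That is one mechanism by which shorter probes can, in some cases, do better on highly variable targets.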
17.
Front Biosci ; 13: 6580-603, 2008 May 01.
Article in English | MEDLINE | ID: mdl-18508681

ABSTRACT

It is now clear that a detailed picture of cell regulation requires a comprehensive understanding of the abundant short protein motifs through which signaling is channeled. The current body of knowledge has slowly accumulated through piecemeal experimental investigation of individual motifs in signaling. Computational methods contributed little to this process. A new generation of bioinformatics tools will aid the future investigation of motifs in regulatory proteins, and the disordered polypeptide regions in which they frequently reside. Allied to high throughput methods such as phosphoproteomics, signaling networks are becoming amenable to experimental deconstruction. In this review, we summarise the current state of linear motif biology, which uses low affinity interactions to create cooperative, combinatorial and highly dynamic regulatory protein complexes. The discrete deterministic properties implicit to these assemblies suggest that models for cell regulatory networks in systems biology should neither be overly dependent on stochastic nor on smooth deterministic approximations.


Subject(s)
Cell Physiological Phenomena , Signal Transduction , Animals , Endoplasmic Reticulum/physiology , Homeostasis , Mammals , Models, Biological , Proteins/physiology , Reproducibility of Results
18.
Nucleic Acids Res ; 33(19): e171, 2005 Nov 07.
Article in English | MEDLINE | ID: mdl-16275781

ABSTRACT

Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20-30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.


Subject(s)
Genomics/methods , Sequence Analysis, DNA/methods , Chromosomes, Human, Pair 1 , Feasibility Studies , Genome, Bacterial , Genome, Human , Genome, Viral , Humans
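One simple proxy for the question this abstract addresses (how much of a genome can be resolved at a given read length) is the fraction of read-length windows that occur exactly once in the sequence. The sketch below computes that proxy on an invented toy sequence; it is not the analysis reported in the paper, and it ignores reverse complements, sequencing errors, and contig assembly.

```python
from collections import Counter


def unique_fraction(genome, read_length):
    """Fraction of read-length windows whose sequence occurs exactly once."""
    n = len(genome) - read_length + 1
    if n <= 0:
        return 0.0
    counts = Counter(genome[i:i + read_length] for i in range(n))
    # Count windows (not distinct k-mers) that are unique in the genome.
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / n


# Toy repetitive sequence, invented for illustration.
genome = "ACGACGACGTTT"
short_reads = unique_fraction(genome, 3)
long_reads = unique_fraction(genome, 8)
```

On this toy sequence only 3 of the 10 windows of length 3 are unique, while all 5 windows of length 8 are, which mirrors the general trend that longer reads place unambiguously in a larger share of the sequence.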