Results 1 - 14 of 14
1.
Comput Struct Biotechnol J ; 19: 6090-6097, 2021.
Article in English | MEDLINE | ID: mdl-34849210

ABSTRACT

Hidden Markov Models (HMMs) are amongst the most successful methods for predicting protein features in biological sequence analysis. However, there are biological problems where the Markovian assumption is not sufficient since the sequence context can provide useful information for prediction purposes. Several extensions of HMMs have appeared in the literature in order to overcome their limitations. We apply here a hybrid method that combines HMMs and Neural Networks (NNs), termed Hidden Neural Networks (HNNs), for biological sequence analysis in a straightforward manner. In this framework, the traditional HMM probability parameters are replaced by NN outputs. As a case study, we focus on the topology prediction of alpha-helical and beta-barrel membrane proteins. The HNNs show performance gains compared to standard HMMs and the respective predictors outperform the top-scoring methods in the field. The implementation of HNNs can be found in the package JUCHMME, downloadable from http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. The updated PRED-TMBB2 and HMM-TM prediction servers can be accessed at www.compgen.org.

2.
Front Bioinform ; 1: 646581, 2021.
Article in English | MEDLINE | ID: mdl-36303794

ABSTRACT

OMPdb (www.ompdb.org) was introduced as a database for β-barrel outer membrane proteins from Gram-negative bacteria in 2011 and then included 69,354 entries classified into 85 families. The database has been updated continuously using a collection of characteristic profile Hidden Markov Models able to discriminate between the different families of prokaryotic transmembrane β-barrels. The number of families has increased ultimately to a total of 129 families in the current, second major version of OMPdb. New additions have been made in parallel with efforts to update existing families and add novel families. Here, we present the upgrade of OMPdb, which from now on aims to become a global repository for all transmembrane β-barrel proteins, both eukaryotic and bacterial.

3.
J Proteome Res ; 19(3): 1209-1221, 2020 03 06.
Article in English | MEDLINE | ID: mdl-32008325

ABSTRACT

Even though in the last few years several families of eukaryotic β-barrel outer membrane proteins have been discovered, their computational characterization and their annotation in public databases are far from complete. The PFAM database includes only very few characteristic profiles for these families, and in most cases, the profile hidden Markov models (pHMMs) have been trained using prokaryotic and eukaryotic proteins together. Here, we present for the first time a comprehensive computational analysis of eukaryotic transmembrane β-barrels. Twelve characteristic pHMMs were built, based on an extensive literature search, which can discriminate eukaryotic β-barrels from other classes of proteins (globular and bacterial β-barrel ones), as well as between mitochondrial and chloroplastic ones. We built eight novel profiles for the chloroplastic β-barrel families that are not present in the PFAM database and also updated the profile for the MDM10 family (PF12519) in the PFAM database and divided the porin family (PF01459) into two separate families, namely, VDAC and TOM40.


Subject(s)
Eukaryota , Porins , Eukaryota/genetics , Eukaryotic Cells , Mitochondria , Proteins
4.
Bioinformatics ; 35(24): 5309-5312, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31250907

ABSTRACT

SUMMARY: JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community. AVAILABILITY AND IMPLEMENTATION: http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Sequence Analysis
5.
Bioinformatics ; 35(13): 2208-2215, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30445435

ABSTRACT

MOTIVATION: Hidden Markov Models (HMMs) are probabilistic models widely used in applications in computational sequence analysis. HMMs are basically unsupervised models. However, in the most important applications, they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input and the set of parameters that maximize the joint probability of sequences and labels is estimated. A main problem with this approach is that, in the majority of the cases, labels are hard to find and thus the amount of training data is limited. On the other hand, there are plenty of unclassified (unlabeled) sequences deposited in the public databases that could potentially contribute to the training procedure. This approach is called semi-supervised learning and could be very helpful in many applications. RESULTS: We propose here a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, where the missing labels of the unlabeled or partially labeled data are considered as the missing data. We apply the algorithm to several biological problems, namely, for the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and for the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Supervised Machine Learning , Algorithms , Markov Chains , Models, Statistical , Sequence Analysis
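
Illustrative note on entry 5: the abstract treats missing labels as missing data inside EM. One common way to handle labeled, partially labeled and unlabeled sequences in a single pass is a forward recursion that sums only over state paths consistent with whatever labels are known. The sketch below assumes that approach; it is not taken from the paper or from JUCHMME, and the two-state toy model, the function name `constrained_forward` and all probabilities are hypothetical.

```python
def constrained_forward(obs, labels, states, start, trans, emit, state_label):
    """Likelihood of `obs`, summed over state paths that agree with known labels.

    labels[t] is None where the label is missing, so an all-None list gives
    the ordinary (unsupervised) forward likelihood.
    """
    def allowed(state, t):
        return labels[t] is None or state_label[state] == labels[t]

    # Initialisation: states that contradict the first label get probability 0.
    alpha = {
        s: (start[s] * emit[s][obs[0]] if allowed(s, 0) else 0.0)
        for s in states
    }
    # Recursion: the usual forward update, masked by the label constraint.
    for t in range(1, len(obs)):
        alpha = {
            s: (sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs[t]]
                if allowed(s, t) else 0.0)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: "M" = membrane state, "G" = globular state.
STATES = ["M", "G"]
STATE_LABEL = {"M": "m", "G": "g"}
START = {"M": 0.5, "G": 0.5}
TRANS = {"M": {"M": 0.8, "G": 0.2}, "G": {"M": 0.2, "G": 0.8}}
EMIT = {"M": {"L": 0.7, "K": 0.3}, "G": {"L": 0.3, "K": 0.7}}


def toy_likelihood(obs, labels):
    return constrained_forward(obs, labels, STATES, START, TRANS, EMIT, STATE_LABEL)


unlabeled = toy_likelihood("LK", [None, None])
partial = toy_likelihood("LK", ["m", None])
labeled = toy_likelihood("LK", ["m", "g"])
```

Under these assumptions the same routine serves all three kinds of training sequence, and adding label information can only shrink the set of admissible paths, so `labeled <= partial <= unlabeled`.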
6.
J Bioinform Comput Biol ; 16(5): 1850019, 2018 10.
Article in English | MEDLINE | ID: mdl-30353782

ABSTRACT

Hidden Markov Models (HMMs) are probabilistic models widely used in computational molecular biology. However, the Markovian assumption regarding transition probabilities which dictates that the observed symbol depends only on the current state may not be sufficient for some biological problems. In order to overcome the limitations of the first order HMM, a number of extensions have been proposed in the literature to incorporate past information in HMMs conditioning either on the hidden states, or on the observations, or both. Here, we implement a simple extension of the standard HMM in which the current observed symbol (amino acid residue) depends both on the current state and on a series of observed previous symbols. The major advantage of the method is the simplicity in the implementation, which is achieved by properly transforming the observation sequence, using an extended alphabet. Thus, it can utilize all the available algorithms for the training and decoding of HMMs. We investigated the use of several encoding schemes and performed tests in a number of important biological problems previously studied by our team (prediction of transmembrane proteins and prediction of signal peptides). The evaluation shows that, when enough data are available, the performance increased by 1.8%-8.2% and the existing prediction methods may improve using this approach. The methods, for which the improvement was significant (PRED-TMBB2, PRED-TAT and HMM-TM), are available as web-servers freely accessible to academic users at www.compgen.org/tools/ .


Subject(s)
Computational Biology/methods , Markov Chains , Algorithms , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Models, Molecular , Models, Statistical , Protein Sorting Signals
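
Illustrative note on entry 6: the abstract says the dependence on previous symbols is obtained by transforming the observation sequence over an extended alphabet, so that standard HMM algorithms still apply. The abstract mentions several encoding schemes without specifying them; the sketch below assumes the simplest one, where each residue is replaced by a composite symbol made of the preceding `order` residues plus itself. The function name `extend_alphabet` and the pad character are hypothetical.

```python
def extend_alphabet(sequence: str, order: int = 1, pad: str = "-") -> list[str]:
    """Replace each residue by a composite symbol: `order` previous residues + itself.

    Positions near the start, which lack enough history, are left-padded
    with `pad` so every composite symbol has the same length.
    """
    padded = pad * order + sequence
    return [padded[i:i + order + 1] for i in range(len(sequence))]


first_order = extend_alphabet("MKTA", order=1)
second_order = extend_alphabet("MKTA", order=2)
print(first_order)   # ['-M', 'MK', 'KT', 'TA']
print(second_order)  # ['--M', '-MK', 'MKT', 'KTA']
```

The transformed sequence has the same length as the original, so a first-order HMM trained on it sees one composite symbol per position. The cost is alphabet size: with 20 amino acids an order-1 encoding already yields on the order of 400 symbols, which is consistent with the abstract's caveat that gains appear "when enough data are available".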
7.
Ann Transl Med ; 6(12): 244, 2018 Jun.
Article in English | MEDLINE | ID: mdl-30069446

ABSTRACT

Next-generation sequencing (NGS) can provide researchers with high impact information regarding alternative splice variants or transcript identifications. However, the enormous amount of data acquired from NGS platforms makes the analysis of alternative splicing events hard to accomplish. For this reason, we designed the "Alternative Splicing Detection Tool" (ASDT), an algorithm that is capable of identifying alternative splicing events, including novel ones, from high-throughput NGS data. ASDT is available as a PERL script at http://aias.biol.uoa.gr/~mtheo and can be executed on any system with PERL installed. In addition to the detection of annotated and novel alternative splicing events from high-throughput NGS data, ASDT can also analyze the intronic regions of genes, thus enabling the detection of novel cryptic exons residing in annotated introns, extensions of previously annotated exons, or even intron retentions. Consequently, ASDT demonstrates many innovative and unique features that can efficiently contribute to alternative splicing analysis of NGS data.

8.
Methods Mol Biol ; 1552: 63-82, 2017.
Article in English | MEDLINE | ID: mdl-28224491

ABSTRACT

Alpha helical transmembrane (TM) proteins constitute an important structural class of membrane proteins involved in a wide variety of cellular functions. The prediction of their transmembrane topology, as well as their discrimination in newly sequenced genomes, is of great importance for the elucidation of their structure and function. Several methods have been applied for the prediction of the transmembrane segments and the topology of alpha helical transmembrane proteins utilizing different algorithmic techniques. Hidden Markov Models (HMMs) have been efficiently used in the development of several computational methods used for this task. In this chapter we give a brief review of different available prediction methods for alpha helical transmembrane proteins pointing out sequence and structural features that should be incorporated in a prediction method. We then describe the procedure of the design and development of a Hidden Markov Model capable of predicting the transmembrane alpha helices in proteins and discriminating them from globular proteins.


Subject(s)
Computational Biology/methods , Computer Simulation , Markov Chains , Membrane Proteins/chemistry , Algorithms , Databases, Protein , Humans , Models, Molecular , Protein Conformation
9.
J Comput Aided Mol Des ; 30(6): 489-512, 2016 06.
Article in English | MEDLINE | ID: mdl-27349423

ABSTRACT

A significant amount of experimental evidence suggests that G-protein coupled receptors (GPCRs) do not act exclusively as monomers but also form biologically relevant dimers and oligomers. However, the structural determinants, stoichiometry and functional importance of GPCR oligomerization remain topics of intense speculation. In this study we attempted to evaluate the nature and dynamics of GPCR oligomeric interactions. A representative set of GPCR homodimers were studied through Coarse-Grained Molecular Dynamics simulations, combined with interface analysis and concepts from network theory for the construction and analysis of dynamic structural networks. Our results highlight important structural determinants that seem to govern receptor dimer interactions. A conserved dynamic behavior was observed among different GPCRs, including receptors belonging to different GPCR classes. Specific GPCR regions were highlighted as the core of the interfaces. Finally, correlations of motion were observed between parts of the dimer interface and GPCR segments participating in ligand binding and receptor activation, suggesting the existence of mechanisms through which dimer formation may affect GPCR function. The results of this study can be used to drive experiments aimed at exploring GPCR oligomerization, as well as in the study of transmembrane protein-protein interactions in general.


Subject(s)
Molecular Dynamics Simulation , Receptors, G-Protein-Coupled/chemistry , Structure-Activity Relationship , Humans , Models, Molecular , Receptors, G-Protein-Coupled/metabolism
10.
Biochim Biophys Acta ; 1864(5): 435-40, 2016 May.
Article in English | MEDLINE | ID: mdl-26854601

ABSTRACT

Heterotrimeric G-proteins form a major protein family, which participates in signal transduction. They are composed of three subunits, Gα, Gβ and Gγ. The Gα subunit is further divided into four distinct families Gs, Gi/o, Gq/11 and G12/13. The goal of this work was to detect and classify members of the four distinct families, plus the Gβ and the Gγ subunits of G-proteins from sequence alone. To achieve this purpose, six specific profile Hidden Markov Models (pHMMs) were built and checked for their credibility. These models were then applied to ten (10) proteomes and were able to identify all known G-proteins and classify them into the distinct families. In a separate case study, the models were applied to twenty seven (27) arthropod proteomes and were able to give more credible classification in proteins with uncertain annotation and in some cases to detect novel proteins. An online tool, GprotPRED, was developed that uses these six pHMMs. The sensitivity and specificity for all pHMMs were equal to 100% with the exception of the Gβ case, where sensitivity equals 100%, while specificity is 99.993%. In contrast to Pfam's pHMM which detects Gα subunits in general, our method not only detects Gα subunits but also classifies them into the appropriate Gα-protein family and thus could become a useful tool for the annotation of G-proteins in newly discovered proteomes. GprotPRED online tool is publicly available for non-commercial use at http://bioinformatics.biol.uoa.gr/GprotPRED and, also, a standalone version of the tool at https://github.com/vkostiou/GprotPRED.


Subject(s)
Amino Acid Sequence/genetics , GTP-Binding Protein alpha Subunits/genetics , GTP-Binding Protein beta Subunits/genetics , GTP-Binding Protein gamma Subunits/genetics , Molecular Sequence Annotation , Animals , Computational Biology , GTP-Binding Protein alpha Subunits/classification , Heterotrimeric GTP-Binding Proteins , Mammals , Markov Chains , Protein Multimerization/genetics , Proteome/genetics , Signal Transduction/genetics , Software
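
Illustrative note on entry 10: the abstract reports sensitivity and specificity without the underlying counts. As a reminder of how the two figures are defined, the sketch below computes them from a confusion matrix. The counts are invented purely for illustration and are not from the paper; they were chosen only to show that a specificity of 99.993% corresponds to roughly 7 false positives per 100,000 negatives.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real family members that the model detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-members that the model correctly rejects."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT taken from the GprotPRED evaluation.
sens = sensitivity(true_pos=50, false_neg=0)
spec = specificity(true_neg=99_993, false_pos=7)
print(f"sensitivity = {sens:.3%}, specificity = {spec:.3%}")
```

Because proteome-scale negative sets are large, even a specificity this close to 100% can still mean a handful of false hits per proteome, which is worth keeping in mind when reading such figures.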
11.
Insect Biochem Mol Biol ; 52: 51-9, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24978609

ABSTRACT

The arthropod cuticle is a composite, bipartite system, made of chitin filaments embedded in a proteinaceous matrix. The physical properties of cuticle are determined by the structure and the interactions of its two major components, cuticular proteins (CPs) and chitin. The proteinaceous matrix consists mainly of structural cuticular proteins. The majority of the structural proteins that have been described to date belong to the CPR family, and they are identified by the conserved R&R region (Rebers and Riddiford Consensus). Two major subfamilies of the CPR family, RR-1 and RR-2, have also been identified from conservation at sequence level and some correlation with the cuticle type. Recently, several novel families, also containing characteristic conserved regions, have been described. The package HMMER v3.0 (http://hmmer.janelia.org/) was used to build characteristic profile Hidden Markov Models based on the characteristic regions for 8 of these families (CPF, CPAP3, CPAP1, CPCFC, CPLCA, CPLCG, CPLCW, Tweedle). In brief, these families can be described as having: CPF (a conserved region with 44 amino acids); CPAP1 and CPAP3 (analogous to peritrophins, with 1 and 3 chitin-binding domains, respectively); CPCFC (2 or 3 C-x(5)-C repeats); and four of five low complexity (LC) families, each with characteristic domains. Using these models, as well as the models previously created for the two major subfamilies of the CPR family, RR-1 and RR-2 (Karouzou et al., 2007), we developed CutProtFam-Pred, an on-line tool (http://bioinformatics.biol.uoa.gr/CutProtFam-Pred) that allows one to query sequences from proteomes or translated transcriptomes, for the accurate detection and classification of putative structural cuticular proteins. The tool has been applied successfully to diverse arthropod proteomes including a crustacean (Daphnia pulex) and a chelicerate (Tetranychus urticae), but at this taxonomic distance only CPRs and CPAPs were recovered.


Subject(s)
Arthropod Proteins/chemistry , Arthropods/chemistry , Arthropods/genetics , Arthropods/metabolism , Chitin/chemistry , Amino Acid Sequence , Animals , Arthropod Proteins/genetics , Arthropod Proteins/metabolism , Chitin/genetics , Chitin/metabolism , Computational Biology/methods , Markov Chains , Molecular Sequence Data , Multigene Family , Phylogeny , Proteome , Sequence Alignment , Sequence Analysis, Protein
12.
J Struct Biol ; 182(3): 209-18, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23523730

ABSTRACT

G-protein coupled receptors (GPCRs) are one of the largest families of membrane receptors in eukaryotes. Heterotrimeric G-proteins, composed of α, β and γ subunits, are important molecular switches in the mediation of GPCR signaling. Receptor stimulation after the binding of a suitable ligand leads to G-protein heterotrimer activation and dissociation into the Gα subunit and Gβγ heterodimer. These subunits then interact with a large number of effectors, leading to several cell responses. We studied the interactions between Gα subunits and their binding partners, using information from structural, mutagenesis and Bioinformatics studies, and conducted a series of comparisons of sequence, structure, electrostatic properties and intermolecular energies among different Gα families and subfamilies. We identified a number of Gα surfaces that may, on several occasions, participate in interactions with receptors as well as effectors. The study of Gα interacting surfaces in terms of sequence, structure and electrostatic potential reveals features that may account for the Gα subunit's behavior towards its interacting partners. The electrostatic properties of the Gα subunits, which in some cases differ greatly not only between families but also between subfamilies, as well as the G-protein interacting surfaces of effectors and regulators of G-protein signaling (RGS) suggest that electrostatic complementarity may be an important factor in G-protein interactions. Energy calculations also support this notion. This information may be useful in future studies of G-protein interactions with GPCRs and effectors.


Subject(s)
Heterotrimeric GTP-Binding Proteins/chemistry , Protein Subunits/chemistry , RGS Proteins/chemistry , Receptors, G-Protein-Coupled/chemistry , Structure-Activity Relationship , Amino Acid Sequence , Binding Sites , GTP-Binding Proteins/chemistry , Heterotrimeric GTP-Binding Proteins/metabolism , Humans , Membrane Proteins/chemistry , Protein Conformation , Protein Interaction Domains and Motifs , Protein Subunits/metabolism , RGS Proteins/metabolism , Receptors, G-Protein-Coupled/metabolism , Signal Transduction , Surface Properties
13.
Database (Oxford) ; 2010: baq019, 2010 Aug 05.
Article in English | MEDLINE | ID: mdl-20689020

ABSTRACT

G-protein coupled receptors (GPCRs) are a major family of membrane receptors in eukaryotic cells. They play a crucial role in the communication of a cell with the environment. Ligands bind to GPCRs on the outside of the cell, activating them by causing a conformational change, and allowing them to bind to G-proteins. Through their interaction with G-proteins, several effector molecules are activated leading to many kinds of cellular and physiological responses. The great importance of GPCRs and their corresponding signal transduction pathways is indicated by the fact that they take part in many diverse disease processes and that a large part of efforts towards drug development today is focused on them. We present Human-gpDB, a database which currently holds information about 713 human GPCRs, 36 human G-proteins and 99 human effectors. The collection of information about the interactions between these molecules was done manually and the current version of Human-gpDB holds information for about 1663 connections between GPCRs and G-proteins and 1618 connections between G-proteins and effectors. Major advantages of Human-gpDB are the integration of several external data sources and the support of advanced visualization techniques. Human-gpDB is a simple yet powerful tool for researchers in the life sciences field as it integrates an up-to-date, carefully curated collection of human GPCRs, G-proteins, effectors and their interactions. The database may be a reference guide for medical and pharmaceutical research, especially in the areas of understanding human diseases and chemical and drug discovery. Database URLs: http://schneider.embl.de/human_gpdb; http://bioinformatics.biol.uoa.gr/human_gpdb/


Subject(s)
Databases, Factual , GTP-Binding Proteins/metabolism , Receptors, G-Protein-Coupled/metabolism , Computational Biology , Computer Graphics , Databases, Factual/statistics & numerical data , Databases, Protein , GTP-Binding Proteins/chemistry , GTP-Binding Proteins/classification , Humans , Ligands , Protein Interaction Mapping/statistics & numerical data , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/classification , Signal Transduction , Structural Homology, Protein , User-Computer Interface
14.
Bioinformatics ; 24(12): 1471-2, 2008 Jun 15.
Article in English | MEDLINE | ID: mdl-18441001

ABSTRACT

UNLABELLED: gpDB is a publicly accessible, relational database, containing information about G-proteins, G-protein coupled receptors (GPCRs) and effectors, as well as information concerning known interactions between these molecules. The sequences are classified according to a hierarchy of different classes, families and subfamilies based on literature search. The main innovation besides the classification of G-proteins, GPCRs and effectors is the relational model of the database, describing the known coupling specificity of GPCRs to their respective alpha subunits of G-proteins, and also the specific interaction between G-proteins and their effectors, a unique feature not available in any other database. AVAILABILITY: http://bioinformatics.biol.uoa.gr/gpDB CONTACT: shamodr@biol.uoa.gr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Database Management Systems , Databases, Protein , GTP-Binding Proteins/chemistry , Information Storage and Retrieval/methods , Protein Interaction Mapping/methods , Receptors, G-Protein-Coupled/chemistry , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Alignment/methods , User-Computer Interface