Results 1 - 8 of 8
1.
PLoS Comput Biol ; 3(2): e16, 2007 Feb 02.
Article in English | MEDLINE | ID: mdl-17274683

ABSTRACT

Protein point mutations are an essential component of the evolutionary and experimental analysis of protein structure and function. While many manually curated databases attempt to index point mutations, most experimentally generated point mutations and the biological impacts of the changes are described in the peer-reviewed published literature. We describe an application, Mutation GraB (Graph Bigram), that identifies, extracts, and verifies point mutations from biomedical literature. The principal problem of point mutation extraction is to link the point mutation with its associated protein and organism of origin. Our algorithm uses a graph-based bigram traversal to identify these relevant associations and exploits the Swiss-Prot protein database to verify this information. The graph bigram method is different from other models for point mutation extraction in that it incorporates frequency and positional data of all terms in an article to drive the point mutation-protein association. Our method was tested on 589 articles describing point mutations from the G protein-coupled receptor (GPCR), tyrosine kinase, and ion channel protein families. We evaluated our graph bigram metric against a word-proximity metric for term association on datasets of full-text literature in these three different protein families. Our testing shows that the graph bigram metric achieves a higher F-measure for the GPCRs (0.79 versus 0.76), protein tyrosine kinases (0.72 versus 0.69), and ion channel transporters (0.76 versus 0.74). Importantly, in situations where more than one protein can be assigned to a point mutation and disambiguation is required, the graph bigram metric achieves a precision of 0.84 compared with the word distance metric precision of 0.73. We believe the graph bigram search metric to be a significant improvement over previous search metrics for point mutation extraction and to be applicable to text-mining applications requiring the association of words.


Subject(s)
Algorithms , Artificial Intelligence , Databases, Protein , Mutation , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Sequence Homology, Amino Acid
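The abstract compares the graph bigram metric with a simpler word-proximity baseline. The graph bigram algorithm itself is not specified here, so the sketch below illustrates only the baseline idea, under stated assumptions: each point mutation is assigned to the protein name whose mention is closest in token distance, using a hypothetical protein-name dictionary, a simplified tokenizer and a simplified one-letter mutation pattern. It also includes the standard F-measure (harmonic mean of precision and recall) in which the results above are reported.

```python
from __future__ import annotations

import re

# Simplified one-letter point mutation pattern, e.g. "E113Q" (assumption).
MUTATION_RE = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]")
PUNCTUATION = ".,;:()[]"


def tokenize(text: str) -> list[str]:
    """Split on whitespace and strip surrounding punctuation (simplified)."""
    tokens = (t.strip(PUNCTUATION) for t in text.split())
    return [t for t in tokens if t]


def associate_by_proximity(text: str, protein_names: set[str]) -> dict[str, str | None]:
    """Assign each mutation token to the nearest protein mention by token distance."""
    tokens = tokenize(text)
    protein_mentions = [(i, t) for i, t in enumerate(tokens) if t in protein_names]
    result: dict[str, str | None] = {}
    for i, token in enumerate(tokens):
        if not MUTATION_RE.fullmatch(token):
            continue
        if not protein_mentions:
            result[token] = None  # no known protein mentioned anywhere in the text
            continue
        # min() keeps the first minimum, so ties go to the earlier mention.
        _, name = min(protein_mentions, key=lambda m: abs(m[0] - i))
        result[token] = name
    return result


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence and protein dictionary.
text = "The rhodopsin mutant E113Q was compared with D79N in the beta2AR receptor."
print(associate_by_proximity(text, {"rhodopsin", "beta2AR"}))
# → {'E113Q': 'rhodopsin', 'D79N': 'beta2AR'}
```

A pure distance rule like this picks the wrong protein whenever the nearest mention is not the intended one, which is the disambiguation setting where the abstract reports the largest precision gap between the two metrics.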
2.
Mol Endocrinol ; 20(9): 2247-55, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16543405

ABSTRACT

The glycoprotein-hormone receptor information system (GRIS) presents a comprehensive view of all available molecular data for the lutropin/choriogonadotropin receptor, follitropin receptor, and thyrotropin receptor G protein-coupled receptors. It features a mutation database presently containing 696 point mutations, combined with all sequences and the associated homology models. The mutation information was automatically extracted from the literature and manually augmented with respect to constitutivity, surface expression, sensitivity to hormones, and binding affinity. All information in this integrated system is presented in a G protein-coupled receptor specialist-friendly way. A series of interactive tools such as rotamer analysis, mutation prediction, or cavity visualization aids in the design and interpretation of experiments. A universal residue numbering system has been introduced to ease database searches as well as the use of the information in conjunction with literature data from diverse origins. Users can upload new mutations. GRIS is freely accessible at http://gris.ulb.ac.be/.


Subject(s)
Computational Biology , Glycoproteins/chemistry , Glycoproteins/metabolism , Hormones/chemistry , Hormones/metabolism , Amino Acid Sequence , Animals , Glycoproteins/genetics , Humans , Ligands , Models, Molecular , Molecular Sequence Data , Mutation/genetics , Protein Binding , Protein Structure, Tertiary , Sequence Alignment
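GRIS introduces a universal residue numbering system, but the abstract does not say how it is defined. One common way to place residues from different receptors on a shared numbering is to transfer positions through an alignment to a reference sequence. The sketch below assumes a gapped pairwise alignment (with '-' as the gap character) between a query receptor and a hypothetical reference; it is an illustration of the general technique, not GRIS's actual scheme.

```python
from __future__ import annotations


def map_to_reference(aligned_query: str, aligned_ref: str) -> dict[int, int | None]:
    """Map 1-based query residue numbers to 1-based reference residue numbers.

    Both strings are rows of the same alignment, with '-' marking gaps.
    Query residues aligned to a gap in the reference map to None.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    mapping: dict[int, int | None] = {}
    q_pos = r_pos = 0
    for q_char, r_char in zip(aligned_query, aligned_ref):
        if r_char != "-":
            r_pos += 1  # advance the reference counter on every reference residue
        if q_char != "-":
            q_pos += 1
            mapping[q_pos] = r_pos if r_char != "-" else None
    return mapping


# Hypothetical toy alignment: query "MKTLV" against reference "MTALV".
print(map_to_reference("MKT-LV", "M-TALV"))
# → {1: 1, 2: None, 3: 2, 4: 4, 5: 5}
```

With such a map, a mutation reported at query position 3 would be stored at reference position 2, so mutations described in different receptors can be compared at the same shared position.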
3.
J Mol Biol ; 341(2): 321-35, 2004 Aug 06.
Article in English | MEDLINE | ID: mdl-15276826

ABSTRACT

Literature studies, 3D structure data, and a series of sequence analysis techniques were combined to reveal important residues in the structure and function of the ligand-binding domain of nuclear hormone receptors. A structure-based multiple sequence alignment allowed for the seamless combination of data from many different studies on different receptors into one single functional model. It was recently shown that a combined analysis of sequence entropy and variability can divide residues into five classes: (1) the main function or active site, (2) support for the main function, (3) signal transduction, (4) modulator or ligand binding and (5) the rest. Mutation data extracted from the literature and intermolecular contacts observed in nuclear receptor structures were analyzed in view of this classification and showed that the main function or active site residues of the nuclear receptor ligand-binding domain are involved in cofactor recruitment. Furthermore, the sequence entropy-variability analysis identified the presence of signal transduction residues that are located between the ligand, cofactor and dimer sites, suggesting communication between these regulatory binding sites. Experimental and computational results agreed well for most residues for which mutation data and intermolecular contact data were available. This allows us to predict the role of the residues for which no functional data is available yet. This study illustrates the power of family-based approaches towards the analysis of protein function, and it points out the problems and possibilities presented by the massive amounts of data that are becoming available in the "omics era". The results shed light on the nuclear receptor family that is involved in processes ranging from cancer to infertility, and that is one of the more important targets in the pharmaceutical industry.


Subject(s)
Amino Acids/chemistry , Multigene Family/physiology , Mutation , Protein Conformation , Receptors, Cytoplasmic and Nuclear/chemistry , Amino Acids/metabolism , Binding Sites , Entropy , Humans , Ligands , Models, Molecular , Protein Binding , Receptors, Cytoplasmic and Nuclear/metabolism , Sequence Alignment , Signal Transduction
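The entropy-variability classification starts from two per-position statistics of a multiple sequence alignment. The sketch below assumes the usual definitions: entropy as the Shannon entropy of the residue frequencies in an alignment column (natural log here; the base only rescales the values) and variability as the number of distinct residue types in that column, with gaps ignored. The thresholds that split positions into the five classes are not given in the abstract, so the sketch stops at the two statistics.

```python
from __future__ import annotations

import math
from collections import Counter


def entropy_variability(column: str) -> tuple[float, int]:
    """Shannon entropy and number of distinct residue types in one column."""
    residues = [c for c in column if c != "-"]  # gaps ignored (assumption)
    if not residues:
        return 0.0, 0
    counts = Counter(residues)
    n = len(residues)
    # -p*ln(p) written as p*ln(1/p) with p = k/n
    entropy = sum((k / n) * math.log(n / k) for k in counts.values())
    return entropy, len(counts)


def alignment_profile(alignment: list[str]) -> list[tuple[float, int]]:
    """Per-position (entropy, variability) for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        entropy_variability("".join(seq[i] for seq in alignment))
        for i in range(length)
    ]


# Hypothetical four-sequence toy alignment.
for position, (e, v) in enumerate(alignment_profile(["ACD", "ACE", "AGD", "AGE"]), start=1):
    print(position, round(e, 3), v)
# prints:
# 1 0.0 1
# 2 0.693 2
# 3 0.693 2
```

A fully conserved position scores zero on both measures, while positions tolerating more residue types score higher on both; the classification described above considers the two values together rather than either one alone.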
4.
Bioinformatics ; 20(4): 557-68, 2004 Mar 01.
Article in English | MEDLINE | ID: mdl-14990452

ABSTRACT

MOTIVATION: The amount of genomic and proteomic data that is published daily in the scientific literature is outstripping the ability of experimental scientists to stay current. Reviews, the traditional medium for collating published observations, are also unable to keep pace. For some specific classes of information (e.g. sequences and protein structures), obligatory data deposition policies have helped. However, a great deal of other valuable information is spread throughout the literature, hindering coherent access. We are involved in the Molecular Class-Specific Information System (MCSIS) project, a collaborative effort to design and automate the maintenance of protein family databases. The first two databases, the GPCRDB and NucleaRDB, are focused on G protein-coupled receptors (GPCRs) and nuclear hormone receptors (NRs), respectively. The main aim of the MCSIS project is to gather heterogeneous data from across a variety of electronic and literature sources in order to draw new inferences about the target protein families. RESULTS: We present a computational method that identifies and extracts mutation data from the scientific literature. We focused on the extraction of single point mutations for the GPCR and NR superfamilies. After validation by plausibility filters, the mutation data is integrated into the corresponding MCSIS where it is combined with structural and sequence information already stored in these databases. We extracted and validated 2736 true point mutations from 914 articles on GPCRs and 785 true point mutations from 1094 articles on NRs. The current version of our automated extraction algorithm identifies 49.3% of the GPCR point mutations with a specificity of 87.9%, and 64.5% of the NR point mutations with a specificity of 85.8%. MuteXt routinely analyzes 100 electronic articles in approximately 1 h.


Subject(s)
Algorithms , Database Management Systems , Databases, Bibliographic , Information Storage and Retrieval/methods , Mutation , Periodicals as Topic , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Substitution , Databases, Protein , MEDLINE , Natural Language Processing , Proteins/classification , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/classification , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/methods , Software , Software Validation
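MuteXt's actual recognition rules and plausibility filters are not described in the abstract. As an illustrative sketch only, the code below extracts point mutations written in one-letter (A4G) or three-letter (Lys2Arg) notation with a regular expression and applies one assumed plausibility check: the stated wild-type residue must match the protein sequence at that position. The sequence, the text and the function names are hypothetical.

```python
from __future__ import annotations

import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Either one-letter (A4G) or three-letter (Lys2Arg) notation, as whole words.
MUTATION_RE = re.compile(
    rf"\b(?:(?P<wt1>[{ONE}])(?P<pos1>\d+)(?P<mt1>[{ONE}])"
    rf"|(?P<wt3>{THREE})(?P<pos3>\d+)(?P<mt3>{THREE}))\b"
)


def extract_mutations(text: str) -> list[tuple[str, int, str]]:
    """Return (wild type, position, mutant) triples in one-letter code."""
    found = []
    for m in MUTATION_RE.finditer(text):
        if m.group("wt1"):
            found.append((m.group("wt1"), int(m.group("pos1")), m.group("mt1")))
        else:
            found.append((THREE_TO_ONE[m.group("wt3")], int(m.group("pos3")),
                          THREE_TO_ONE[m.group("mt3")]))
    return found


def plausible(mutation: tuple[str, int, str], sequence: str) -> bool:
    """Assumed filter: residue changes, position exists, wild type matches."""
    wt, pos, mt = mutation
    return wt != mt and 1 <= pos <= len(sequence) and sequence[pos - 1] == wt


mutations = extract_mutations("Mutants A4G and Lys2Arg were tested; K5R was not.")
print(mutations)  # → [('A', 4, 'G'), ('K', 2, 'R'), ('K', 5, 'R')]
print([plausible(m, "MKTAYIAK") for m in mutations])  # → [True, True, False]
```

The check rejects K5R because position 5 of the toy sequence is Y, not K. A practical filter would probably also need to cope with numbering offsets between an article and the database sequence, which this sketch ignores.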
5.
Nucleic Acids Res ; 31(13): 3400-3, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824335

ABSTRACT

We present a coherent series of servers that can perform a large number of structure analyses on nuclear hormone receptors. These servers are part of the NucleaRDB project, which provides a powerful information system for nuclear hormone receptors. The computations performed by the servers include homology modelling, structure validation, calculating contacts, accessibility values, hydrogen bonding patterns, predicting mutations and a host of two- and three-dimensional visualisations. The Nuclear Receptor Structure Analysis Servers (NRSAS) are freely accessible at http://www.cmbi.kun.nl/NR/servers/html/ and in-house copies can be obtained upon request.


Subject(s)
Hormones , Receptors, Cytoplasmic and Nuclear/chemistry , Hydrogen Bonding , Internet , Models, Molecular , Mutation , Protein Conformation , Receptors, Cytoplasmic and Nuclear/genetics , Software , Structural Homology, Protein
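The servers compute contacts among other structure analyses, but the abstract does not give the contact definition. A common simple definition, assumed here, is that two residues are in contact when any pair of their atoms lies within a distance cutoff. The 4.0 Å default and the (residue label, x, y, z) atom layout are hypothetical choices for illustration, not the NRSAS implementation.

```python
from __future__ import annotations

import math
from itertools import combinations

# Hypothetical atom record: (residue label, x, y, z) with coordinates in Å.
Atom = tuple[str, float, float, float]


def residue_contacts(atoms: list[Atom], cutoff: float = 4.0) -> set[tuple[str, str]]:
    """Residue pairs with at least one atom pair within `cutoff` Å."""
    contacts = set()
    for (res_a, *xyz_a), (res_b, *xyz_b) in combinations(atoms, 2):
        # Skip atoms of the same residue; compare Euclidean distance to cutoff.
        if res_a != res_b and math.dist(xyz_a, xyz_b) <= cutoff:
            contacts.add(tuple(sorted((res_a, res_b))))
    return contacts


atoms = [
    ("A10", 0.0, 0.0, 0.0),
    ("A10", 1.0, 0.0, 0.0),
    ("B20", 3.0, 0.0, 0.0),
    ("C30", 10.0, 0.0, 0.0),
]
print(residue_contacts(atoms))  # → {('A10', 'B20')}
```

This brute-force version checks every atom pair, which is adequate for small examples; whole structures would typically call for a spatial grid or neighbor list to avoid the quadratic cost.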
6.
Nucleic Acids Res ; 31(1): 294-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520006

ABSTRACT

The GPCRDB is a molecular class-specific information system that collects, combines, validates and disseminates heterogeneous data on G protein-coupled receptors (GPCRs). The database stores data on sequences, ligand binding constants and mutations. The system also provides computationally derived data such as sequence alignments, homology models, and a series of query and visualization tools. The GPCRDB is updated automatically once every 4-5 months and is freely accessible at http://www.gpcr.org/7tm/.


Subject(s)
Databases, Protein , Receptors, Cell Surface , Amino Acid Sequence , Computational Biology , Heterotrimeric GTP-Binding Proteins/metabolism , Humans , Information Systems , Ligands , Models, Molecular , Mutation , Receptors, Cell Surface/chemistry , Receptors, Cell Surface/genetics , Receptors, Cell Surface/metabolism , Sequence Alignment
7.
Nucleic Acids Res ; 31(1): 331-3, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520016

ABSTRACT

The NRMD is a database for nuclear receptor mutation information. It includes mutation information from SWISS-PROT/TrEMBL, several web-based mutation data resources, and data extracted from the literature in a fully automatic manner. Mutations can also be added manually, and a hundred mutations were added in this way for completeness. At present, the NRMD contains information about 893 mutations in 54 nuclear receptors. A common numbering scheme for all nuclear receptors eases the use of the information for many kinds of studies. The NRMD is freely available to academia and industry as a stand-alone version at: www.receptors.org/NR/.


Subject(s)
Databases, Nucleic Acid , Mutation , Receptors, Cytoplasmic and Nuclear/genetics , Animals
8.
Mol Cell Biol ; 22(20): 7193-203, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12242296

ABSTRACT

Steroidogenic factor 1 (SF-1) is an orphan nuclear receptor with no known ligand. We showed previously that phosphorylation at serine 203 located N-terminal to the ligand binding domain (LBD) enhanced cofactor recruitment, analogous to the ligand-mediated recruitment in ligand-dependent receptors. In this study, results of biochemical analyses and an LBD helix assembly assay suggest that the SF-1 LBD adopts an active conformation, with helices 1 and 12 packed against the predicted alpha-helical bundle, in the apparent absence of ligand. Fine mapping of the previously defined proximal activation function in SF-1 showed that the activation function mapped fully to helix 1 of the LBD. Limited proteolyses demonstrate that phosphorylation of S203 in the hinge region mimics the stabilizing effects of ligand on the LBD. Moreover, similar effects were observed in an SF-1/thyroid hormone LBD chimera receptor, illustrating that the S203 phosphorylation effects are transferable to a heterologous ligand-dependent receptor. Our collective data suggest that the hinge together with helix 1 is an individualized specific motif, which is tightly associated with its cognate LBD. For SF-1, we find that this intramolecular association and hence receptor activity are further enhanced by mitogen-activated protein kinase phosphorylation, thus mimicking many of the ligand-induced changes observed for ligand-dependent receptors.


Subject(s)
DNA-Binding Proteins/metabolism , Helix-Loop-Helix Motifs , Receptors, Thyroid Hormone/metabolism , Transcription Factors/metabolism , 3T3 Cells , Amino Acid Sequence , Animals , Binding Sites , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Fushi Tarazu Transcription Factors , Homeodomain Proteins , Humans , Ligands , Mice , Mitogen-Activated Protein Kinases/metabolism , Models, Molecular , Molecular Sequence Data , Phosphorylation , Protein Structure, Tertiary , Receptors, Cytoplasmic and Nuclear , Receptors, Thyroid Hormone/chemistry , Receptors, Thyroid Hormone/genetics , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Sequence Homology, Amino Acid , Steroidogenic Factor 1 , Thyroid Hormone Receptors beta , Transcription Factors/chemistry , Transcription Factors/genetics