Results 1 - 20 of 36
1.
Environ Sci Process Impacts ; 25(11): 1788-1801, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37431591

ABSTRACT

The term "exposome" is defined as a comprehensive study of life-course environmental exposures and the associated biological responses. Humans are exposed to many different chemicals, which can pose a major threat to the well-being of humanity. Targeted or non-targeted mass spectrometry techniques are widely used to identify and characterize various environmental stressors when linking exposures to human health. However, identification remains challenging due to the huge chemical space applicable to exposomics, combined with the lack of sufficient relevant entries in spectral libraries. Addressing these challenges requires cheminformatics tools and database resources to share curated open spectral data on chemicals to improve the identification of chemicals in exposomics studies. This article describes efforts to contribute spectra relevant for exposomics to the open mass spectral library MassBank (https://www.massbank.eu) using various open source software efforts, including the R packages RMassBank and Shinyscreen. The experimental spectra were obtained from ten mixtures containing toxicologically relevant chemicals from the US Environmental Protection Agency (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT). Following processing and curation, 5582 spectra from 783 of the 1268 ENTACT compounds were added to MassBank, and through this to other open spectral libraries (e.g., MoNA, GNPS) for community benefit. Additionally, an automated deposition and annotation workflow was developed with PubChem to enable the display of all MassBank mass spectra in PubChem, which is rerun with each MassBank release. The new spectral records have already been used in several studies to increase the confidence in identification in non-target small molecule identification workflows applied to environmental and exposomics research.
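Identification typically compares a measured MS/MS spectrum against library spectra such as those deposited in MassBank. The snippet below is a minimal, illustrative sketch that scores two peak lists with a plain binned cosine similarity. This is a generic technique, not the scoring used by RMassBank, Shinyscreen or MassBank. The 0.01 m/z bin width and the example peaks are assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(query, reference, bin_width=0.01):
    """Cosine similarity (0..1 for non-negative intensities) of two peak lists."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(q[k] * r[k] for k in q.keys() & r.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_q * norm_r)


# Hypothetical MS/MS peak lists: (m/z, relative intensity).
query = [(91.054, 100.0), (119.049, 45.0), (137.060, 20.0)]
library_hit = [(91.054, 95.0), (119.049, 50.0), (65.039, 10.0)]
print(round(cosine_similarity(query, library_hit), 3))
```

Fixed-width binning can split peaks that straddle a bin edge. Practical matching tools usually pair peaks within an m/z tolerance and may weight intensities, so scores from this sketch should be read as rough.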


Subject(s)
Environmental Exposure , Software , Humans , Mass Spectrometry/methods , Environmental Exposure/analysis , Databases, Factual
2.
Nucleic Acids Res ; 51(D1): D1373-D1380, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36305812

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.


Subject(s)
Databases, Chemical , Drug Discovery , Drug Discovery/methods , Biological Assay , Proteins , Cheminformatics
3.
Nucleic Acids Res ; 49(D1): D1388-D1395, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33151290

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Property Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).


Subject(s)
COVID-19/prevention & control , Databases, Chemical , Information Storage and Retrieval/statistics & numerical data , SARS-CoV-2/isolation & purification , User-Computer Interface , COVID-19/epidemiology , COVID-19/virology , Drug Discovery/statistics & numerical data , Epidemics , Humans , Information Storage and Retrieval/methods , Internet , Public Health/statistics & numerical data , SARS-CoV-2/physiology , Software
4.
Nucleic Acids Res ; 47(D1): D1102-D1109, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30371825

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.
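PUG-View, the programmatic interface introduced here, serves the annotation content shown on PubChem record pages. The following minimal sketch assumes the URL pattern `https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/<record type>/<id>/JSON` with an optional `heading` query parameter. Confirm both against the current PUG-View documentation. The helper names `pug_view_url` and `fetch_pug_view`, and the example heading, are illustrative choices.

```python
import json
import urllib.parse
import urllib.request

# Base path of the PUG-View service (annotation content behind record pages).
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"


def pug_view_url(record_type, record_id, heading=None):
    """Build a PUG-View URL, optionally restricted to one section heading."""
    url = f"{PUG_VIEW_BASE}/{record_type}/{record_id}/JSON"
    if heading:
        # urlencode turns spaces into '+', e.g. "Experimental+Properties".
        url += "?" + urllib.parse.urlencode({"heading": heading})
    return url


def fetch_pug_view(record_type, record_id, heading=None, timeout=30):
    """Download and parse one PUG-View record (performs a real HTTP request)."""
    url = pug_view_url(record_type, record_id, heading)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only URLs are printed here; no network request is made.
print(pug_view_url("compound", 2244))
print(pug_view_url("compound", 2244, heading="Experimental Properties"))
```

Keeping URL construction separate from fetching makes the request easy to inspect or test offline before any call reaches the service.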


Subject(s)
Computational Biology/methods , Databases, Chemical , Pharmaceutical Preparations/chemistry , Small Molecule Libraries/chemistry , Animals , Biological Assay/methods , Drug Discovery/methods , High-Throughput Screening Assays/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Structure , Patents as Topic , Structure-Activity Relationship
5.
Methods Mol Biol ; 1825: 63-91, 2018.
Article in English | MEDLINE | ID: mdl-30334203

ABSTRACT

PubChem ( https://pubchem.ncbi.nlm.nih.gov ) is a key chemical information resource, developed and maintained by the US National Institutes of Health. The present chapter describes how to find potential multitarget ligands from PubChem that would be tested in further experiments. While the protocol presented here uses PubChem's Web-based interfaces to allow users to follow it interactively, it can also be implemented in computer software by using programmatic access interfaces to PubChem (such as PUG-REST or E-Utilities).
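The chapter notes that the web protocol can be automated through PUG-REST. The sketch below builds PUG-REST URLs following the service's `<input>/<operation>/<output>` pattern, for example `compound/cid/2244/property/MolecularFormula,MolecularWeight/JSON`. It does not implement the multitarget-ligand protocol itself. The helpers `pug_rest_url` and `fetch_json` are hypothetical names, and the example identifiers are arbitrary.

```python
import json
import urllib.parse
import urllib.request

# Base path of the PUG-REST service; requests follow <input>/<operation>/<output>.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(domain, namespace, identifiers, operation, output="JSON"):
    """Build a PUG-REST URL such as .../compound/cid/2244/property/.../JSON.

    `identifiers` may be a single value or a list; lists are joined with commas.
    Values are percent-encoded so that spaces in names survive (commas are kept).
    """
    if isinstance(identifiers, (list, tuple)):
        identifiers = ",".join(str(i) for i in identifiers)
    encoded = urllib.parse.quote(str(identifiers), safe=",")
    return f"{PUG_REST_BASE}/{domain}/{namespace}/{encoded}/{operation}/{output}"


def fetch_json(url, timeout=30):
    """Download and parse a JSON response (performs a real HTTP request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Look up CIDs for a name, then request two properties for a list of CIDs.
name_url = pug_rest_url("compound", "name", "acetylsalicylic acid", "cids")
props_url = pug_rest_url(
    "compound", "cid", [2244, 3672], "property/MolecularFormula,MolecularWeight"
)
print(name_url)
print(props_url)
```

Passing `name_url` or `props_url` to `fetch_json` would perform the actual lookups. Large batch jobs should respect PubChem's published usage limits.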


Subject(s)
Databases, Chemical , Drug Discovery/methods , Internet , Pharmaceutical Preparations/metabolism , Software , Humans , Ligands , National Institutes of Health (U.S.) , Pharmaceutical Preparations/chemistry , United States , User-Computer Interface
6.
Methods Mol Biol ; 1647: 221-236, 2017.
Article in English | MEDLINE | ID: mdl-28809006

ABSTRACT

We describe a computational protocol to aid the design of small molecule and peptide drugs that target protein-protein interactions, particularly for anti-cancer therapy. To achieve this goal, we explore multiple strategies, including finding binding hot spots, incorporating chemical similarity and bioactivity data, and sampling similar binding sites from homologous protein complexes. We demonstrate how to combine existing interdisciplinary resources with examples of semi-automated workflows. Finally, we discuss several major problems, including the occurrence of drug-resistant mutations, drug promiscuity, and the design of dual-effect inhibitors.


Subject(s)
Antineoplastic Agents/pharmacology , Drug Design , Molecular Targeted Therapy , Protein Interaction Mapping , Proteins/chemistry , Antineoplastic Agents/chemistry , Binding Sites , Computer Simulation , Drug Resistance, Neoplasm , Humans , Models, Molecular , Protein Binding , Protein Conformation , Proto-Oncogene Proteins c-mdm2/chemistry , Proto-Oncogene Proteins c-mdm2/metabolism , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/metabolism , Workflow
7.
Nucleic Acids Res ; 45(D1): D955-D963, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899599

ABSTRACT

PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004, providing open access to its data content for the community. PubChem accepts data submissions from researchers worldwide in academia, industry and government agencies. PubChem also exchanges data with other chemical biology database stakeholders. After more than a decade of development, it has become an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update for the PubChem BioAssay database, describing several recent developments, including added sources of research data, a redesigned BioAssay record page, a new BioAssay classification browser and new features in the Upload system that facilitate data sharing.


Subject(s)
Databases, Chemical , Databases, Nucleic Acid , RNA Interference , Search Engine , Small Molecule Libraries , Drug Discovery , Gene Expression Regulation/drug effects , Humans , Software , User-Computer Interface , Web Browser
8.
J Cheminform ; 8: 32, 2016.
Article in English | MEDLINE | ID: mdl-27293485

ABSTRACT

BACKGROUND: PubChem is an open archive consisting of a set of three primary public databases (BioAssay, Compound, and Substance). It contains information on a broad range of chemical entities, including small molecules, lipids, carbohydrates, and (chemically modified) amino acid and nucleic acid sequences (including siRNA and miRNA). Currently (as of Nov. 2015), PubChem contains more than 150 million depositor-provided chemical substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results provided from over 1 million biological assay records. DESCRIPTION: Many PubChem records (substances, compounds, and assays) include depositor-provided cross-references to scientific articles in PubMed. Some PubChem contributors provide bioactivity data extracted from scientific articles. Literature-derived bioactivity data complement high-throughput screening (HTS) data from the concluded NIH Molecular Libraries Program and other HTS projects. Some journals provide PubChem with information on chemicals that appear in their newly published articles, enabling concurrent publication of scientific articles in journals and associated data in public databases. In addition, PubChem links records to PubMed articles indexed with the Medical Subject Heading (MeSH) controlled vocabulary thesaurus. CONCLUSION: Literature information, both provided by depositors and derived from MeSH annotations, can be accessed using PubChem's web interfaces, enabling users to explore information available in literature related to PubChem records beyond typical web search results. GRAPHICAL ABSTRACT: Literature information for PubChem records is derived from various sources.

9.
Nucleic Acids Res ; 44(D1): D1202-13, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26400175

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). Over the past 11 years, PubChem has grown into a sizable system, serving as a chemical information resource for the scientific research community. PubChem consists of three inter-linked databases, Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. Biological activity data of chemical substances tested in assay experiments are contained in the BioAssay database. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also gives a brief description of PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, as well as PubChemRDF, Resource Description Framework (RDF)-formatted PubChem data for data sharing, analysis and integration with information contained in other databases.


Subject(s)
Databases, Chemical , Internet , Molecular Structure , Pharmaceutical Preparations/chemistry , Software
10.
Cancer Res ; 76(3): 561-71, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26676746

ABSTRACT

Oncogenic mutations in the monomeric Casitas B-lineage lymphoma (Cbl) gene have been found in many tumors, but their significance remains largely unknown. Several human c-Cbl (CBL) structures have recently been solved, depicting the protein at different stages of its activation cycle and thus providing mechanistic insight into how stability-activity tradeoffs in cancer-related proteins may influence disease onset and progression. In this study, we computationally modeled the effects of missense cancer mutations on structures representing four stages of the CBL activation cycle to identify driver mutations that affect CBL stability, binding, and activity. We found that recurrent, homozygous, and leukemia-specific mutations had greater destabilizing effects on CBL states than random noncancer mutations. We further tested the ability of these computational models, assessing the changes in CBL stability and its binding to ubiquitin-conjugating enzyme E2, by performing blind CBL-mediated EGFR ubiquitination assays in cells. Experimental CBL ubiquitin ligase activity was in agreement with the predicted changes in CBL stability and, to a lesser extent, with CBL-E2 binding affinity. Two thirds of all experimentally tested mutations affected the ubiquitin ligase activity by either destabilizing CBL or disrupting CBL-E2 binding, whereas about one-third of tested mutations were found to be neutral. Collectively, our findings demonstrate that computational methods incorporating multiple protein conformations and stability and binding affinity evaluations can successfully predict the functional consequences of cancer mutations on protein activity, and provide a proof of concept for mutations in CBL.


Subject(s)
Lung Neoplasms/enzymology , Lung Neoplasms/genetics , Mutation, Missense , Proto-Oncogene Proteins c-cbl/genetics , Proto-Oncogene Proteins c-cbl/metabolism , Uterine Cervical Neoplasms/enzymology , Uterine Cervical Neoplasms/genetics , Carcinoma, Non-Small-Cell Lung/enzymology , Carcinoma, Non-Small-Cell Lung/genetics , Enzyme Activation , ErbB Receptors/chemistry , ErbB Receptors/metabolism , Female , HEK293 Cells , HeLa Cells , Humans , Models, Molecular , Phosphorylation , Protein Stability , Proto-Oncogene Proteins c-cbl/chemistry , Signal Transduction , Thermodynamics , Transfection , Ubiquitination
11.
Biophys J ; 109(6): 1295-306, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26213149

ABSTRACT

Structures of protein complexes provide atomistic insights into protein interactions. Human proteins represent a quarter of all structures in the Protein Data Bank; however, available protein complexes cover less than 10% of the human proteome. Although it is theoretically possible to infer interactions in human proteins based on structures of homologous protein complexes, it is still unclear to what extent protein interactions and binding sites are conserved, and whether protein complexes from remotely related species can be used to infer interactions and binding sites. We considered biological units of protein complexes and clustered protein-protein binding sites into similarity groups based on their structure and sequence, which allowed us to identify unique binding sites. We showed that the growth rate of the number of unique binding sites in the Protein Data Bank was much slower than the growth rate of the number of structural complexes. Next, we investigated the evolutionary roots of unique binding sites and identified the major phyletic branches with the largest expansion in the number of novel binding sites. We found that many binding sites could be traced to the universal common ancestor of all cellular organisms, whereas relatively few binding sites emerged at the major evolutionary branching points. We analyzed the physicochemical properties of unique binding sites and found that the most ancient sites were the largest in size, involved many salt bridges, and were the most compact and least planar. In contrast, binding sites that appeared more recently in the evolution of eukaryotes were characterized by a larger fraction of polar and aromatic residues, and were less compact and more planar, possibly due to their more transient nature and roles in signaling processes.
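Grouping binding sites into similarity groups, and then counting the groups as unique sites, can be illustrated with single-linkage clustering over a union-find structure. This is a hedged sketch: the abstract does not state the clustering procedure, so the single combined similarity score, the 0.5 threshold and the site IDs below are all hypothetical.

```python
def cluster_binding_sites(site_ids, scored_pairs, threshold=0.5):
    """Single-linkage clustering of binding sites with a union-find structure.

    scored_pairs: iterable of (site_a, site_b, similarity) with similarity in [0, 1].
    Two sites share a group if a chain of pairs at or above `threshold`
    connects them. Returns a list of groups (lists of site IDs).
    """
    parent = {site: site for site in site_ids}

    def find(site):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    for a, b, similarity in scored_pairs:
        if similarity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for site in site_ids:
        groups.setdefault(find(site), []).append(site)
    return list(groups.values())


# Hypothetical interface IDs (entry_chainA:chainB) and similarity scores.
sites = ["1abc_A:B", "2xyz_A:C", "3def_B:D", "4ghi_A:B"]
pairs = [
    ("1abc_A:B", "2xyz_A:C", 0.82),
    ("2xyz_A:C", "4ghi_A:B", 0.61),
    ("1abc_A:B", "3def_B:D", 0.20),
]
groups = cluster_binding_sites(sites, pairs)
print(len(groups), "unique binding sites:", groups)
```

Single linkage lets a chain of moderately similar sites merge into one group, which is why the threshold choice matters. Stricter schemes such as complete linkage would split such chains.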


Subject(s)
Binding Sites/genetics , Evolution, Molecular , Protein Binding/genetics , Proteins/genetics , Proteins/metabolism , Animals , Humans , Models, Molecular
12.
Prog Biophys Mol Biol ; 116(2-3): 187-93, 2014.
Article in English | MEDLINE | ID: mdl-24931138

ABSTRACT

Protein interactions have evolved into highly precise and regulated networks adding an immense layer of complexity to cellular systems. The most accurate atomistic description of protein binding sites can be obtained directly from structures of protein complexes. The availability of structurally characterized protein interfaces significantly improves our understanding of interactomes, and the progress in structural characterization of protein-protein interactions (PPIs) can be measured by calculating the structural coverage of protein domain families. We analyze the coverage of protein domain families (defined according to CDD and Pfam databases) by structures, structural protein-protein complexes and unique protein binding sites. Structural PPI coverage of currently available protein families is about 30% without any signs of saturation in coverage growth dynamics. Given the current growth rates of domain databases and structural PPI deposition, complete domain coverage with PPIs is not expected in the near future. As a result of this study we identify families without any protein-protein interaction evidence (listed on a supporting website http://www.ncbi.nlm.nih.gov/Structure/ibis/coverage/) and propose them as potential targets for structural studies with a focus on protein interactions.


Subject(s)
Protein Interaction Mapping/methods , Protein Interaction Mapping/trends , Proteins/chemistry , Proteins/metabolism , Binding Sites , Computational Biology , Databases, Protein , Protein Structure, Tertiary
13.
Nucleic Acids Res ; 42(Database issue): D1075-82, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24198245

ABSTRACT

PubChem's BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological tests of small molecules generated through high-throughput screening experiments, medicinal chemistry studies, chemical biology research and drug discovery programs. In addition, the BioAssay database contains data from high-throughput RNA interference screening aimed at identifying critical genes responsible for a biological process or disease condition. The mission of PubChem is to serve the community by providing free and easy access to all deposited data. To this end, PubChem BioAssay is integrated into the National Center for Biotechnology Information retrieval system, making its data searchable by Entrez queries and cross-linked to other biomedical information archived at National Center for Biotechnology Information. Moreover, PubChem BioAssay provides web-based and programmatic tools allowing users to search, access and analyze bioassay test results and metadata. In this work, we provide an update for the PubChem BioAssay resource, such as information content growth, new developments supporting data integration and search, and the recently deployed PubChem Upload to streamline chemical structure and bioassay submissions.


Subject(s)
Databases, Chemical , High-Throughput Screening Assays , RNA Interference , Drug Discovery , Genes , Humans , Internet , Proteins/genetics , Small Molecule Libraries , Systems Integration
14.
PLoS One ; 8(6): e66273, 2013.
Article in English | MEDLINE | ID: mdl-23799087

ABSTRACT

Many studies have shown that missense mutations might play an important role in carcinogenesis. However, the extent to which cancer mutations might affect biomolecular interactions remains unclear. Here, we map glioblastoma missense mutations on the human protein interactome, model the structures of affected protein complexes and decipher the effect of mutations on protein-protein, protein-nucleic acid and protein-ion binding interfaces. Although some missense mutations over-stabilize protein complexes, we found that the overall effect of mutations is destabilizing, mostly affecting the electrostatic component of binding energy. We also showed that mutations on interfaces resulted in more drastic changes of amino acid physico-chemical properties than mutations occurring outside the interfaces. Analysis of glioblastoma mutations on interfaces allowed us to stratify cancer-related interactions, identify potential driver genes, and propose two dozen additional cancer biomarkers, including those specific to functions of the nervous system. Such an analysis also offered insight into the molecular mechanism of the phenotypic outcomes of mutations, including effects on complex stability, activity, binding and turnover rate. As a result of mutated protein and gene network analysis, we observed that interactions of proteins with mutations mapped on interfaces had higher bottleneck properties compared to interactions with mutations elsewhere on the protein or unaffected interactions. Such observations suggest that genes with mutations directly affecting protein binding properties are preferentially located in central network positions and may influence critical nodes and edges in signal transduction networks.


Subject(s)
Glioblastoma/genetics , Mutation, Missense , Artificial Intelligence , Binding Sites , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Glioblastoma/metabolism , Humans , Models, Biological , Phenotype , Protein Binding , Protein Interaction Domains and Motifs/genetics , Protein Interaction Maps , Protein Stability , Thermodynamics
15.
J Phys Chem B ; 117(42): 13226-34, 2013 Oct 24.
Article in English | MEDLINE | ID: mdl-23734591

ABSTRACT

The nuclear factor of activated T cells 5 (NFAT5 or TonEBP) is a Rel family transcriptional activator and is activated by hypertonic conditions. Several studies point to a possible connection between nuclear translocation and DNA binding; however, the mechanism of NFAT5 nuclear translocation and the effect of DNA binding on retaining NFAT5 in the nucleus are largely unknown. Recent experiments showed that different mutations introduced in the DNA-binding loop and dimerization interface were important for DNA binding and some of them decreased the nuclear-cytoplasm ratio of NFAT5. To understand the mechanisms of these mutations, we model their effect on protein dynamics and DNA binding. We show that the NFAT5 complex without DNA is much more flexible than the complex with DNA. Moreover, DNA binding considerably stabilizes the overall dimeric complex and the NFAT5 dimer is only marginally stable in the absence of DNA. Two sets of NFAT5 mutations from the same DNA-binding loop are found to have different mechanisms of specific and nonspecific binding to DNA. The R217A/E223A/R226A (R293A/E299A/R302A using isoform c numbering) mutant is characterized by significantly compromised binding to DNA and higher complex flexibility. On the contrary, the T222D (T298D in isoform c) mutation, a potential phosphomimetic mutation, makes the overall complex more rigid and does not significantly affect the DNA binding. Therefore, the reduced nuclear-cytoplasm ratio of NFAT5 can be attributed to reduced binding to DNA for the triple mutant, while the T222D mutant suggests an additional mechanism at work.


Subject(s)
DNA/metabolism , Transcription Factors/metabolism , Binding Sites , Humans , Molecular Dynamics Simulation , Mutation , Principal Component Analysis , Protein Binding , Protein Structure, Tertiary , Software , Transcription Factors/chemistry , Transcription Factors/genetics
16.
EMBO Rep ; 13(3): 266-71, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22261719

ABSTRACT

Although the identification of protein interactions by high-throughput (HTP) methods progresses at a fast pace, 'interactome' data sets still suffer from high rates of false positives and low coverage. To map the human protein interactome, we describe a new framework that uses experimental evidence on structural complexes, the atomic details of binding interfaces and evolutionary conservation. The structurally inferred interaction network is highly modular and more functionally coherent compared with experimental interaction networks derived from multiple literature citations. Moreover, structurally inferred and high-confidence HTP networks complement each other well, allowing us to construct a merged network to generate testable hypotheses and provide valuable experimental leads.


Subject(s)
Multiprotein Complexes/chemistry , Protein Interaction Mapping/methods , Proteomics/methods , Binding Sites , Computational Biology/methods , Databases, Genetic , Humans , Protein Binding , Protein Interaction Domains and Motifs , Software
17.
Mol Biosyst ; 8(1): 320-6, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22012032

ABSTRACT

We analyze human-specific KEGG pathways to understand the functional role of intrinsic disorder in proteins. Pathways provide a comprehensive picture of biological processes and allow better understanding of a protein's function within the specific context of its surroundings. Our study pinpoints a few specific pathways significantly enriched in disorder-containing proteins and identifies the role of these proteins within the framework of pathway relationships. Three major categories of relations are shown to be significantly enriched in disordered proteins: gene expression, protein binding and, to a lesser degree, protein phosphorylation. Finally, we find that relations involving protein activation and, to some extent, inhibition are characterized by low disorder content.
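Whether a pathway is "significantly enriched" in disorder-containing proteins is commonly judged with a one-sided hypergeometric test. The abstract does not name the test used, so treat this as an assumed, generic sketch. The counts in the example are hypothetical, and the function name `enrichment_p_value` is illustrative.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N: proteins in the background set (e.g. all proteins in the pathways studied)
    K: disorder-containing proteins in that background
    n: proteins in one pathway
    k: disorder-containing proteins observed in that pathway
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(n, K) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20 of 50 pathway proteins contain disorder, against a
# background in which 1,000 of 5,000 proteins do (about 10 expected by chance).
p = enrichment_p_value(k=20, n=50, K=1000, N=5000)
print(f"p = {p:.2e}")
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing (for example Bonferroni or a false discovery rate procedure) before a pathway is called enriched.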


Subject(s)
Metabolic Networks and Pathways , Protein Folding , Proteins/chemistry , Proteins/metabolism , Humans , Protein Binding
18.
Nucleic Acids Res ; 40(Database issue): D400-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22140110

ABSTRACT

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activity data of small molecules and RNAi reagents. The mission of PubChem is to deliver free and easy access to all deposited data, and to provide intuitive data analysis tools. The PubChem BioAssay database currently contains 500,000 descriptions of assay protocols, covering 5000 protein targets, 30,000 gene targets and providing over 130 million bioactivity outcomes. PubChem's bioassay data are integrated into the NCBI Entrez information retrieval system, thus making PubChem data searchable and accessible by Entrez queries. Also, as a repository, PubChem constantly optimizes and develops its deposition system answering many demands of both high- and low-volume depositors. The PubChem information platform allows users to search, review and download bioassay description and data. The PubChem platform also enables researchers to collect, compare and analyze biological test results through web-based and programmatic tools. In this work, we provide an update for the PubChem BioAssay resource, including information content growth, data model extension and new developments of data submission, retrieval, analysis and download tools.


Subject(s)
Databases, Factual , Drug Discovery , RNA Interference , Biological Assay , High-Throughput Screening Assays , Indicators and Reagents , Molecular Structure , Software
19.
Nucleic Acids Res ; 40(Database issue): D834-40, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22102591

ABSTRACT

We have recently developed the Inferred Biomolecular Interaction Server (IBIS) and database, which reports, predicts and integrates different types of interaction partners and locations of binding sites in proteins based on the analysis of homologous structural complexes. Here, we highlight several new IBIS features and options. The server's webpage has been redesigned to give users easier access to data for different interaction types. An entry page has been added that gives a quick summary of available results and now accepts protein sequence accessions. To elucidate the formation of protein complexes, not just binary interactions, IBIS currently presents an expandable interaction network. Previously, IBIS provided annotations for four different types of binding partners: proteins, small molecules, nucleic acids and peptides; in the current version a new protein-ion interaction type has been added. Several options provide easy downloads of IBIS data for all Protein Data Bank (PDB) protein chains and the results for each query. In this study, we show that about one-third of all RefSeq sequences can be annotated with IBIS interaction partners and binding sites. The IBIS server is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi and updated biweekly.


Subject(s)
Databases, Protein , Protein Interaction Mapping , Proteins/chemistry , Binding Sites , Computer Graphics , Ions/chemistry , Molecular Sequence Annotation , Multiprotein Complexes/chemistry , Nucleic Acids/chemistry , Peptides/chemistry , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
20.
BMC Bioinformatics ; 11: 365, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20594344

ABSTRACT

BACKGROUND: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity. RESULTS: We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones. 
CONCLUSIONS: A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
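The method builds position specific score matrices (PSSMs) from aligned binding sites and uses them, with other measures, to rank candidate sites. A minimal sketch of a log-odds PSSM follows. It assumes a uniform background over the 20 standard amino acids and a pseudocount of 1, and it skips gaps. The abstract does not give the exact weighting, background or ranking used in IBIS, and the aligned sites in the example are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned binding sites must all have the same length")
    pssm = []
    for col in range(length):
        # Count standard residues only; gaps ('-') and unknowns are skipped.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_site(pssm, site):
    """Sum column scores for a query site; gaps/unknown residues score 0."""
    if len(site) != len(pssm):
        raise ValueError("query site length must match the PSSM length")
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, site))


# Hypothetical binding-site residues aligned across homologous structures.
cluster = ["HDGY", "HDGF", "HEGY", "HDAY"]
pssm = build_pssm(cluster)
print(round(score_site(pssm, "HDGY"), 2), round(score_site(pssm, "AAAA"), 2))
```

Positive scores mark residues seen more often than the background predicts, so a query site that conserves the cluster's residues scores well above an unrelated one.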


Subject(s)
Algorithms , Binding Sites , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Cluster Analysis , Knowledge Bases , Protein Binding , Sequence Analysis, Protein , Structural Homology, Protein