Annotation of biologically relevant ligands in UniProtKB using ChEBI.

Coudert, Elisabeth; Gehant, Sebastien; de Castro, Edouard; Pozzato, Monica; Baratin, Delphine; Neto, Teresa; Sigrist, Christian J A; Redaschi, Nicole; Bridge, Alan.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36484697

ABSTRACT

MOTIVATION: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. RESULTS: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides. AVAILABILITY AND IMPLEMENTATION: Binding site annotations for cognate ligands described using ChEBI are available for UniProtKB protein sequence records in several formats (text, XML and RDF) and are freely available to query and download through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Knowledge Bases , Databases, Protein , Ligands , Amino Acid Sequence , Binding Sites , Molecular Sequence Annotation
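
The abstract above mentions that cognate ligand annotations can be queried through the UniProt REST API using ChEBI identifiers. A minimal sketch of how such a request URL might be assembled follows. The base URL, the `ft_binding` and `reviewed` field names, and the example identifier are assumptions made for illustration and are not taken from the abstract; the API help page it cites (www.uniprot.org/help/api) is the authority on the actual query syntax.

```python
from urllib.parse import urlencode

# Assumption: base URL of the search service; not stated in the abstract.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def ligand_search_url(chebi_id: str, reviewed_only: bool = True, fmt: str = "tsv") -> str:
    """Build a search URL for entries annotated with a binding site for one ligand."""
    if not chebi_id.startswith("CHEBI:"):
        raise ValueError("expected an identifier such as 'CHEBI:29105'")
    # Assumption: 'ft_binding' and 'reviewed' are illustrative query field names.
    clauses = [f'(ft_binding:"{chebi_id}")']
    if reviewed_only:
        clauses.append("(reviewed:true)")
    # urlencode percent-encodes the colon, quotes, and parentheses in the query.
    params = {"query": " AND ".join(clauses), "format": fmt}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the URL only; no network request is made.
    print(ligand_search_url("CHEBI:29105"))
```

The function only builds a string, so it can be checked without network access before the URL is handed to an HTTP client.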

Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB.

Feuermann, Marc; Boutet, Emmanuel; Morgat, Anne; Axelsen, Kristian B; Bansal, Parit; Bolleman, Jerven; de Castro, Edouard; Coudert, Elisabeth; Gasteiger, Elisabeth; Géhant, Sébastien; Lieberherr, Damien; Lombardot, Thierry; Neto, Teresa B; Pedruzzi, Ivo; Poux, Sylvain; Pozzato, Monica; Redaschi, Nicole; Bridge, Alan.

Metabolites ; 11(1), 2021 Jan 12.

Article in English | MEDLINE | ID: mdl-33445429

ABSTRACT

The UniProt Knowledgebase (UniProtKB) is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures, and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.

HAMAP as SPARQL rules - A portable annotation pipeline for genomes and proteomes.

Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine; Gehant, Sebastien; Cuche, Beatrice A; Auchincloss, Andrea H; Coudert, Elisabeth; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Xenarios, Ioannis; Redaschi, Nicole; Bridge, Alan.

Gigascience ; 9(2), 2020 02 01.

Article in English | MEDLINE | ID: mdl-32034905

ABSTRACT

BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.

Subject(s)

Genomics/methods , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software/standards , Animals , Genomics/standards , Humans , Molecular Sequence Annotation/standards , Sequence Analysis, DNA/standards , Sequence Analysis, Protein/standards
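
The abstract above describes annotation rules that are expressed as SPARQL and applied to protein data in RDF. The toy sketch below illustrates only the general idea of a rule as "conditions over triples that yield new annotation triples". It is not a HAMAP rule and does not use a SPARQL engine: the `ex:` vocabulary, the signature and taxon names, and the `apply_rule` helper are all hypothetical, and the SPARQL text is included purely for comparison with the Python logic.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary invented for this sketch; not HAMAP or UniProt terms.
MATCHES = "ex:matchesSignature"
IN_TAXON = "ex:inTaxon"
HAS_NAME = "ex:recommendedName"

# The same toy rule written as SPARQL 1.1 text (hypothetical, for comparison only).
RULE_AS_SPARQL = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ?protein ex:recommendedName "Example protein" }
WHERE {
  ?protein ex:matchesSignature ex:SIG_0001 ;
           ex:inTaxon ex:Bacteria .
}
"""


def apply_rule(triples: set[Triple]) -> set[Triple]:
    """Return one annotation triple for each protein that meets both conditions."""
    with_signature = {
        t.subject for t in triples
        if t.predicate == MATCHES and t.obj == "ex:SIG_0001"
    }
    in_taxon = {
        t.subject for t in triples
        if t.predicate == IN_TAXON and t.obj == "ex:Bacteria"
    }
    # Both conditions must hold, mirroring the two patterns in the WHERE clause.
    return {Triple(p, HAS_NAME, "Example protein") for p in with_signature & in_taxon}
```

In this sketch a protein that matches the signature but sits in a different taxon yields no annotation, which is the behavior the WHERE clause of the comparison query would also give.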

Enzyme annotation in UniProtKB using Rhea.

Morgat, Anne; Lombardot, Thierry; Coudert, Elisabeth; Axelsen, Kristian; Neto, Teresa Batista; Gehant, Sebastien; Bansal, Parit; Bolleman, Jerven; Gasteiger, Elisabeth; de Castro, Edouard; Baratin, Delphine; Pozzato, Monica; Xenarios, Ioannis; Poux, Sylvain; Redaschi, Nicole; Bridge, Alan.

Bioinformatics ; 36(6): 1896-1901, 2020 03 01.

Article in English | MEDLINE | ID: mdl-31688925

ABSTRACT

MOTIVATION: To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. RESULTS: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide. AVAILABILITY AND IMPLEMENTATION: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.

Subject(s)

Rheiformes , Animals , Databases, Protein , Knowledge Bases
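
The abstract above notes that Rhea reactions in UniProtKB can be queried through the UniProt SPARQL endpoint. The following is a minimal sketch of how such a query might be prepared, assuming the endpoint accepts a `query` parameter at the `/sparql` path and that catalytic activity annotations are reachable through the predicates shown. The predicate names, the Rhea IRI pattern, and the placeholder reaction identifier are assumptions to verify against the endpoint's own documentation.

```python
from urllib.parse import urlencode

# Assumption: '/sparql' path on the host named in the abstract.
ENDPOINT = "https://sparql.uniprot.org/sparql"

# Assumption: predicate names and the Rhea IRI pattern are illustrative.
QUERY_TEMPLATE = """\
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {{
  ?protein up:annotation ?annotation .
  ?annotation a up:Catalytic_Activity_Annotation ;
              up:catalyticActivity ?activity .
  ?activity up:catalyzedReaction <http://rdf.rhea-db.org/{rhea_id}> .
}}
LIMIT {limit}
"""


def build_query(rhea_id: int, limit: int = 10) -> str:
    """Fill the template for one Rhea reaction identifier."""
    if rhea_id <= 0 or limit <= 0:
        raise ValueError("rhea_id and limit must be positive integers")
    return QUERY_TEMPLATE.format(rhea_id=rhea_id, limit=limit)


def build_request_url(rhea_id: int, limit: int = 10) -> str:
    """Return a GET URL carrying the query; no request is sent here."""
    return f"{ENDPOINT}?{urlencode({'query': build_query(rhea_id, limit)})}"


if __name__ == "__main__":
    # 10000 is an arbitrary placeholder identifier, not a specific reaction of interest.
    print(build_query(10000))
```

Keeping query construction separate from the HTTP call makes it straightforward to inspect the SPARQL text before sending it with any client.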

FAIR adoption, assessment and challenges at UniProt.

Garcia, Leyla; Bolleman, Jerven; Gehant, Sebastien; Redaschi, Nicole; Martin, Maria.

Sci Data ; 6(1): 175, 2019 09 20.

Article in English | MEDLINE | ID: mdl-31541106

Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation.

Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud; Bolleman, Jerven; Géhant, Sébastien; Breuza, Lionel; Bridge, Alan; Poux, Sylvain; Redaschi, Nicole; Bougueleret, Lydie; Xenarios, Ioannis.

Hum Mutat ; 35(8): 927-35, 2014 Aug.

Article in English | MEDLINE | ID: mdl-24848695

ABSTRACT

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.

Subject(s)

Databases, Protein/statistics & numerical data , Genetic Association Studies , Genetics, Medical , Knowledge Bases , Proteome , Software , Amino Acid Sequence , Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation , Molecular Sequence Data , Terminology as Topic

The EBI RDF platform: linked open data for the life sciences.

Jupp, Simon; Malone, James; Bolleman, Jerven; Brandizi, Marco; Davies, Mark; Garcia, Leyla; Gaulton, Anna; Gehant, Sebastien; Laibe, Camille; Redaschi, Nicole; Wimalaratne, Sarala M; Martin, Maria; Le Novère, Nicolas; Parkinson, Helen; Birney, Ewan; Jenkinson, Andrew M.

Bioinformatics ; 30(9): 1338-9, 2014 May 01.

Article in English | MEDLINE | ID: mdl-24413672

ABSTRACT

MOTIVATION: Resource description framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI.

Subject(s)

Computational Biology/methods , Databases, Genetic , Academies and Institutes , Biomedical Research , Internet
