Results 1 - 16 of 16
1.
ArXiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38903736

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
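
The F1 scores reported above are the harmonic mean of precision and recall. A minimal sketch of that calculation, assuming exact-match scoring over sets of annotated spans; the spans and the helper name `f1_score` are hypothetical illustrations, not the authors' evaluation code:

```python
def f1_score(gold: set, predicted: set) -> float:
    """Harmonic mean of precision and recall over exact-match annotations."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (start offset, end offset, entity type).
gold = {(0, 7, "Protein"), (20, 27, "Chemical"), (40, 48, "Chemical")}
predicted = {(0, 7, "Protein"), (20, 27, "Chemical"), (50, 55, "Chemical")}

score = f1_score(gold, predicted)
print(f"F1 = {score:.2%}")
```

Here two of three predictions match two of three gold spans, so precision, recall, and F1 all equal 2/3. Published benchmarks may use relaxed matching or per-class averaging, which this sketch does not model.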

2.
Database (Oxford) ; 2022, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35411389

ABSTRACT

SwissBioPics (www.swissbiopics.org) is a freely available resource of interactive, high-resolution cell images designed for the visualization of subcellular location data. SwissBioPics provides images describing cell types from all kingdoms of life, from the specialized muscle, neuronal and epithelial cells of animals, to the rods, cocci, clubs and spirals of prokaryotes. All cell images in SwissBioPics are drawn in Scalable Vector Graphics (SVG), with each subcellular location tagged with a unique identifier from the controlled vocabulary of subcellular locations and organelles of UniProt (https://www.uniprot.org/locations/). Users can search and explore SwissBioPics cell images through our website, which provides a platform for users to learn more about how cells are organized. A web component allows developers to embed SwissBioPics images in their own websites, using the associated JavaScript and a styling template, and to highlight subcellular locations and organelles by simply providing the web component with the appropriate identifier(s) from the UniProt-controlled vocabulary or the 'Cellular Component' branch of the Gene Ontology (www.geneontology.org), as well as an organism identifier from the National Center for Biotechnology Information taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy). The UniProt website now uses SwissBioPics to visualize the subcellular locations and organelles where proteins function. SwissBioPics is freely available for anyone to use under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. DATABASE URL: www.swissbiopics.org.


Subject(s)
Proteins , Vocabulary, Controlled , Animals
4.
Metabolites ; 11(1), 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33445429

ABSTRACT

The UniProt Knowledgebase (UniProtKB) is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.
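
The programmatic SPARQL access mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the endpoint URL is assumed to be the public UniProt SPARQL service, the query is a deliberately simple one that lists reviewed proteins rather than one of the natural-product queries from the article, and the helper names `build_query` and `build_request_url` are hypothetical. The sketch builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Assumed public endpoint; confirm against the UniProt documentation before use.
ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_query(limit: int = 5) -> str:
    """Return a small SPARQL query listing reviewed UniProtKB proteins."""
    return f"""PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {{
  ?protein a up:Protein ;
           up:reviewed true .
}}
LIMIT {limit}"""


def build_request_url(query: str) -> str:
    """Encode the query as a SPARQL-protocol GET request URL."""
    # The result format is normally negotiated with an HTTP Accept header,
    # for example "application/sparql-results+json".
    return ENDPOINT + "?" + urlencode({"query": query})


query = build_query(limit=5)
url = build_request_url(query)
print(url)
```

A query relevant to natural products would add graph patterns linking proteins to Rhea reactions and ChEBI compounds; those patterns are omitted here because the abstract does not spell them out.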

5.
Bioinformatics ; 36(17): 4643-4648, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules, are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
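
The propagation step described above (a rule fires when a protein carries the required family signature and satisfies the taxonomic constraint) can be sketched in a few lines. This is a toy model, not UniRule or UniFIRE: the classes `Rule` and `Protein`, the function `apply_rule`, and every identifier in the example data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    signature: str   # required family signature (placeholder identifier)
    taxon: str       # taxonomic constraint
    annotation: str  # annotation propagated when the rule matches


@dataclass
class Protein:
    accession: str
    signatures: set
    lineage: set
    annotations: list = field(default_factory=list)


def apply_rule(rule: Rule, protein: Protein) -> bool:
    """Propagate the rule's annotation if both conditions are met."""
    if rule.signature in protein.signatures and rule.taxon in protein.lineage:
        protein.annotations.append(rule.annotation)
        return True
    return False


# Hypothetical rule and unreviewed records.
rule = Rule(signature="SIG0001", taxon="Bacteria", annotation="Example function")
matched = Protein("X00001", {"SIG0001"}, {"Bacteria", "Proteobacteria"})
unmatched = Protein("X00002", {"SIG0001"}, {"Eukaryota", "Fungi"})

for record in (matched, unmatched):
    apply_rule(rule, record)

# Coverage figure quoted in the abstract: 53 million of 178 million records.
coverage = 53_000_000 / 178_000_000
print(f"{coverage:.0%}")
```

The second record carries the signature but fails the taxonomic constraint, so it receives no annotation. Real rules combine further conditions that this sketch leaves out.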


Subject(s)
Knowledge Bases , Proteins , Chromosome Mapping , Databases, Protein , Molecular Sequence Annotation , Proteins/genetics
6.
Gigascience ; 9(2), 2020 02 01.
Article in English | MEDLINE | ID: mdl-32034905

ABSTRACT

BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.


Subject(s)
Genomics/methods , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software/standards , Animals , Genomics/standards , Humans , Molecular Sequence Annotation/standards , Sequence Analysis, DNA/standards , Sequence Analysis, Protein/standards
7.
Nucleic Acids Res ; 43(Database issue): D1064-70, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348399

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins--available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Sequence Homology, Amino Acid , Humans , Internet , Proteins/classification
8.
Nucleic Acids Res ; 41(Database issue): D584-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193261

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Proteins/classification , Eukaryota/genetics , Internet
9.
Nucleic Acids Res ; 40(Database issue): D565-70, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22123736

ABSTRACT

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years and has benefited from a wealth of checks to improve annotation correctness and consistency as well as now supplying a greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Vocabulary, Controlled , Molecular Sequence Annotation/standards
10.
Nucleic Acids Res ; 40(Database issue): D841-6, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22121220

ABSTRACT

IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As from September 2011, IntAct contains approximately 275,000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.


Subject(s)
Databases, Protein , Protein Interaction Mapping , Computer Graphics , Genes , Internet , Molecular Sequence Annotation , Sequence Analysis, Protein , Software
11.
Genome Biol ; 12(1): R7, 2011.
Article in English | MEDLINE | ID: mdl-21247460

ABSTRACT

BACKGROUND: Millions of humans and animals suffer from superficial infections caused by a group of highly specialized filamentous fungi, the dermatophytes, which exclusively infect keratinized host structures. To provide broad insights into the molecular basis of the pathogenicity-associated traits, we report the first genome sequences of two closely phylogenetically related dermatophytes, Arthroderma benhamiae and Trichophyton verrucosum, both of which induce highly inflammatory infections in humans. RESULTS: 97% of the 22.5 megabase genome sequences of A. benhamiae and T. verrucosum are unambiguously alignable and collinear. To unravel dermatophyte-specific virulence-associated traits, we compared sets of potentially pathogenicity-associated proteins, such as secreted proteases and enzymes involved in secondary metabolite production, with those of closely related onygenales (Coccidioides species) and the mould Aspergillus fumigatus. The comparisons revealed expansion of several gene families in dermatophytes and disclosed the peculiarities of the dermatophyte secondary metabolite gene sets. Secretion of proteases and other hydrolytic enzymes by A. benhamiae was proven experimentally by a global secretome analysis during keratin degradation. Molecular insights into the interaction of A. benhamiae with human keratinocytes were obtained for the first time by global transcriptome profiling. Given that A. benhamiae is able to undergo mating, a detailed comparison of the genomes further unraveled the genetic basis of sexual reproduction in this species. CONCLUSIONS: Our results enlighten the genetic basis of fundamental and putatively virulence-related traits of dermatophytes, advancing future research on these medically important pathogens.


Subject(s)
Arthrodermataceae/genetics , Arthrodermataceae/pathogenicity , Animals , Arthrodermataceae/classification , Arthrodermataceae/metabolism , Comparative Genomic Hybridization , Evolution, Molecular , Fungal Proteins/genetics , Fungal Proteins/metabolism , Gene Expression Regulation, Fungal , Genome, Fungal , Humans , Keratinocytes/metabolism , Keratinocytes/microbiology , Keratins/metabolism , Multigene Family , Peptide Hydrolases/genetics , Phylogeny , Transcriptome
12.
Cell Div ; 1: 3, 2006 Apr 03.
Article in English | MEDLINE | ID: mdl-16759348

ABSTRACT

In recent years, the general understanding of nutrient sensing and signalling, as well as the knowledge about responses triggered by altered nutrient availability have greatly advanced. While initial studies were directed to top-down elucidation of single nutrient-induced pathways, recent investigations place the individual signalling pathways into signalling networks and pursue the identification of converging effector branches that orchestrate the dynamical responses to nutritional cues. In this review, we focus on Rim15, a protein kinase required in yeast for the proper entry into stationary phase (G0). Recent studies revealed that the activity of Rim15 is regulated by the interplay of at least four intercepting nutrient-responsive pathways.

13.
EMBO J ; 24(24): 4271-8, 2005 Dec 21.
Article in English | MEDLINE | ID: mdl-16308562

ABSTRACT

Eukaryotic cell proliferation is controlled by growth factors and essential nutrients. In their absence, cells may enter into a quiescent state (G0). In Saccharomyces cerevisiae, the conserved protein kinase A (PKA) and rapamycin-sensitive TOR (TORC1) pathways antagonize G0 entry in response to carbon and/or nitrogen availability primarily by inhibiting the PAS kinase Rim15 function. Here, we show that the phosphate-sensing Pho80-Pho85 cyclin-cyclin-dependent kinase (CDK) complex also participates in Rim15 inhibition through direct phosphorylation, thereby effectively sequestering Rim15 in the cytoplasm via its association with 14-3-3 proteins. Inactivation of either Pho80-Pho85 or TORC1 causes dephosphorylation of the 14-3-3-binding site in Rim15, thus enabling nuclear import of Rim15 and induction of the Rim15-controlled G0 program. Importantly, we also show that Pho80-Pho85 and TORC1 converge on a single amino acid in Rim15. Thus, Rim15 plays a key role in G0 entry through its ability to integrate signaling from the PKA, TORC1, and Pho80-Pho85 pathways.


Subject(s)
Cyclin-Dependent Kinases/metabolism , Cyclins/chemistry , Repressor Proteins/chemistry , Resting Phase, Cell Cycle , Saccharomyces cerevisiae Proteins/chemistry , 14-3-3 Proteins/metabolism , Active Transport, Cell Nucleus , Binding Sites , Cell Nucleus/metabolism , Cyclic AMP-Dependent Protein Kinases/metabolism , Cyclin-Dependent Kinases/chemistry , Cyclins/metabolism , Cytoplasm/metabolism , Glutathione Transferase/metabolism , Green Fluorescent Proteins/metabolism , Immunoblotting , Immunoprecipitation , Models, Biological , Mutation , Phosphatidylinositol 3-Kinases/metabolism , Phosphorylation , Phosphotransferases (Alcohol Group Acceptor)/metabolism , Plasmids/metabolism , Protein Binding , Protein Kinases/metabolism , Protein Structure, Tertiary , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , TOR Serine-Threonine Kinases , Time Factors
14.
EMBO J ; 24(17): 3000-11, 2005 Sep 07.
Article in English | MEDLINE | ID: mdl-16107882

ABSTRACT

Regulated interactions between microtubules (MTs) and the cell cortex control MT dynamics and position the mitotic spindle. In eukaryotic cells, the adenomatous polyposis coli/Kar9p and dynein/dynactin pathways are involved in guiding MT plus ends and MT sliding along the cortex, respectively. Here we identify Bud14p as a novel cortical activator of the dynein/dynactin complex in budding yeast. Bud14p accumulates at sites of polarized growth and the mother-bud neck during cytokinesis. The localization to bud and shmoo tips requires an intact actin cytoskeleton and the kelch-domain-containing proteins Kel1p and Kel2p. While cells lacking Bud14p function fail to stabilize the pre-anaphase spindle at the mother-bud neck, overexpression of Bud14p is toxic and leads to elongated astral MTs and increased dynein-dependent sliding along the cell cortex. Bud14p physically interacts with the type-I phosphatase Glc7p, and localizes Glc7p to the bud cortex. Importantly, the formation of Bud14p-Glc7p complexes is necessary to regulate MT dynamics at the cortex. Taken together, our results suggest that Bud14p functions as a regulatory subunit of the Glc7p type-I phosphatase to stabilize MT interactions specifically at sites of polarized growth.


Subject(s)
Dyneins/metabolism , Phosphoprotein Phosphatases/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Actin Cytoskeleton/metabolism , Adaptor Proteins, Signal Transducing , Amino Acid Motifs , Carrier Proteins/metabolism , Conserved Sequence , Cytokinesis , Dynactin Complex , Microtubule-Associated Proteins/metabolism , Microtubules/metabolism , Mitosis , Mutation , Phosphoprotein Phosphatases/genetics , Protein Binding , Protein Phosphatase 1 , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Spindle Apparatus
15.
Mol Cell Biol ; 25(1): 488-98, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15601868

ABSTRACT

The Ccr4-Not complex is a conserved global regulator of gene expression, which serves as a regulatory platform that senses and/or transmits nutrient and stress signals to various downstream effectors. Presumed effectors of this complex in yeast are TFIID, a general transcription factor that associates with the core promoter, and Msn2, a key transcription factor that regulates expression of stress-responsive element (STRE)-controlled genes. Here we show that the constitutively high level of STRE-driven expression in ccr4-not mutants results from two independent effects. Accordingly, loss of Ccr4-Not function causes a dramatic Msn2-independent redistribution of TFIID on promoters with a particular bias for STRE-controlled over ribosomal protein gene promoters. In parallel, loss of Ccr4-Not complex function results in an alteration of the posttranslational modification status of Msn2, which depends on the type 1 protein phosphatase Glc7 and its newly identified subunit Bud14. Tests of epistasis as well as transcriptional analyses of Bud14-dependent transcription support a model in which the Ccr4-Not complex prevents activation of Msn2 via inhibition of the Bud14/Glc7 module in exponentially growing cells. Thus, increased activity of STRE genes in ccr4-not mutants may result from both altered general distribution of TFIID and unscheduled activation of Msn2.


Subject(s)
Cell Cycle Proteins/physiology , DNA-Binding Proteins/physiology , Ribonucleases/physiology , Saccharomyces cerevisiae Proteins/physiology , Transcription Factors/physiology , Transcriptional Activation , Cross-Linking Reagents/pharmacology , DNA/metabolism , Gene Expression Regulation , Genotype , Glucose/metabolism , Immunoblotting , Immunoprecipitation , Models, Biological , Mutation , Nucleic Acid Hybridization , Phosphoprotein Phosphatases/metabolism , Plasmids/metabolism , Promoter Regions, Genetic , Protein Binding , Protein Phosphatase 1 , Protein Processing, Post-Translational , RNA, Messenger/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Time Factors , Transcription Factor TFIID/chemistry , Transcription, Genetic , Two-Hybrid System Techniques
16.
Mol Cell ; 12(6): 1607-13, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14690612

ABSTRACT

The highly conserved Tor kinases (TOR) and the protein kinase A (PKA) pathway regulate cell proliferation in response to growth factors and/or nutrients. In Saccharomyces cerevisiae, loss of either TOR or PKA causes cells to arrest growth early in G(1) and to enter G(0) by mechanisms that are poorly understood. Here we demonstrate that the protein kinase Rim15 is required for entry into G(0) following inactivation of TOR and/or PKA. Induction of Rim15-dependent G(0) traits requires two discrete processes, i.e., nuclear accumulation of Rim15, which is negatively regulated both by a Sit4-independent TOR effector branch and the protein kinase B (PKB/Akt) homolog Sch9, and release from PKA-mediated inhibition of its protein kinase activity. Thus, Rim15 integrates signals from at least three nutrient-sensory kinases (TOR, PKA, and Sch9) to properly control entry into G(0), a key developmental process in eukaryotic cells.


Subject(s)
Cyclic AMP-Dependent Protein Kinases/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Phosphotransferases (Alcohol Group Acceptor)/metabolism , Protein Kinases/metabolism , Resting Phase, Cell Cycle/physiology , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction/physiology , Animals , Antifungal Agents/metabolism , Gene Expression Regulation, Fungal , Glucose/metabolism , Phenotype , Phosphoprotein Phosphatases/metabolism , Phosphorylation , Protein Phosphatase 2 , Saccharomyces cerevisiae/physiology , Sirolimus/metabolism