Results 1 - 20 of 34
1.
Genetics ; 224(1), 2023 05 04.
Article in English | MEDLINE | ID: mdl-36607068

ABSTRACT

As one of the first model organism knowledgebases, Saccharomyces Genome Database (SGD) has been supporting the scientific research community since 1993. As technologies and research evolve, so does SGD: from updates in software architecture, to curation of novel data types, to incorporation of data from, and collaboration with, other knowledgebases. We are continuing to make steps toward providing the community with an S. cerevisiae pan-genome. Here, we describe software upgrades, a new nomenclature system for genes not found in the reference strain, and additions to gene pages. With these improvements, we aim to remain a leading resource for students, researchers, and the broader scientific community.


Subject(s)
Saccharomyces , Humans , Saccharomyces/genetics , Saccharomyces cerevisiae/genetics , Genome, Fungal , Databases, Genetic , Software
2.
Genetics ; 220(4), 2022 04 04.
Article in English | MEDLINE | ID: mdl-34897464

ABSTRACT

Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.


Subject(s)
Saccharomyces , Alleles , Databases, Genetic , Genome, Fungal , Humans , Saccharomyces/genetics , Saccharomyces cerevisiae/genetics
3.
Open Biol ; 10(9): 200149, 2020 09.
Article in English | MEDLINE | ID: mdl-32875947

ABSTRACT

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
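The rule-based check described in this abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the rule set, gene names, and annotations below are hypothetical, and only the example pair (amino acid metabolism and cytokinesis) comes from the abstract. A real implementation would presumably also expand each rule to descendant terms in the ontology, which this sketch does not attempt.

```python
from itertools import combinations

# Hypothetical rule set: pairs of processes assumed to be mutually exclusive.
# Only this one pair is mentioned in the abstract.
EXCLUSION_RULES = {
    frozenset({"amino acid metabolism", "cytokinesis"}),
}


def find_violations(annotations, rules=EXCLUSION_RULES):
    """Return (gene, term_a, term_b) for every co-annotation that breaks a rule.

    annotations: dict mapping a gene identifier to an iterable of process terms.
    """
    violations = []
    for gene, terms in sorted(annotations.items()):
        # Check every unordered pair of distinct terms annotated to this gene.
        for term_a, term_b in combinations(sorted(set(terms)), 2):
            if frozenset({term_a, term_b}) in rules:
                violations.append((gene, term_a, term_b))
    return violations


# Hypothetical annotations, for illustration only.
example = {
    "geneA": ["amino acid metabolism", "cytokinesis"],
    "geneB": ["cytokinesis", "cell cycle"],
}
flagged = find_violations(example)
```

Each flagged triple is only a candidate error; the abstract describes tracing such annotations back to their sources before correcting them.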


Subject(s)
Computational Biology/methods , Gene Ontology , Molecular Sequence Annotation , Databases, Genetic , Evolution, Molecular , Genome, Fungal , Genomics/methods , Quality Control , Schizosaccharomyces/genetics , Web Browser , Workflow
4.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32559296

ABSTRACT

Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.
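The template-and-grouping idea in this abstract can be sketched roughly as follows. This is a simplified illustration, not the Alliance algorithm: the toy ontology, the template wording, and the function names are all assumptions, each term has a single parent, and the information-content criterion mentioned in the abstract is left out.

```python
# Toy ontology: child term -> parent term (hypothetical, single parent per term).
PARENT = {
    "DNA replication initiation": "DNA replication",
    "DNA strand elongation": "DNA replication",
    "mitotic spindle assembly": "spindle assembly",
}

# Hypothetical sentence template for one data category.
TEMPLATE = "{gene} is involved in {terms}."


def group_by_parent(terms, parent=PARENT):
    """Replace groups of sibling terms with their shared parent term."""
    by_parent = {}
    for term in terms:
        by_parent.setdefault(parent.get(term, term), []).append(term)
    grouped = []
    for ancestor, members in by_parent.items():
        # Collapse to the ancestor only when it covers more than one term.
        grouped.append(ancestor if len(members) > 1 else members[0])
    return grouped


def join_terms(terms):
    """Join a non-empty list of terms into a readable phrase."""
    if len(terms) == 1:
        return terms[0]
    return ", ".join(terms[:-1]) + " and " + terms[-1]


def summarize(gene, terms):
    """Build one summary sentence from a non-empty list of annotated terms."""
    return TEMPLATE.format(gene=gene, terms=join_terms(group_by_parent(terms)))


sentence = summarize(
    "geneX",
    ["DNA replication initiation", "DNA strand elongation", "mitotic spindle assembly"],
)
```

Here two sibling terms collapse into their shared parent, shortening the sentence while still covering every input term. The published method additionally weighs each term's information content when deciding what to include.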


Subject(s)
Databases, Genetic , Genomics , Molecular Sequence Annotation/methods , Gene Ontology , Information Storage and Retrieval
5.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32128557

ABSTRACT

The identification and accurate quantitation of protein abundance has been a major objective of proteomics research. Abundance studies have the potential to provide users with data that can be used to gain a deeper understanding of protein function and regulation and can also help identify cellular pathways and modules that operate under various environmental stress conditions. One of the central missions of the Saccharomyces Genome Database (SGD; https://www.yeastgenome.org) is to work with researchers to identify and incorporate datasets of interest to the wider scientific community, thereby enabling hypothesis-driven research. A large number of studies have detailed efforts to generate proteome-wide abundance data, but deeper analyses of these data have been hampered by the inability to compare results between studies. Recently, a unified protein abundance dataset was generated through the evaluation of more than 20 abundance datasets, which were normalized and converted to common measurement units, in this case molecules per cell. We have incorporated these normalized protein abundance data and associated metadata into the SGD database, as well as the SGD YeastMine data warehouse, resulting in the addition of 56 487 values for untreated cells grown in either rich or defined media and 28 335 values for cells treated with environmental stressors. Abundance data for protein-coding genes are displayed in a sortable, filterable table on Protein pages, available through Locus Summary pages. A median abundance value was incorporated, and a median absolute deviation was calculated for each protein-coding gene and incorporated into SGD. These values are displayed in the Protein section of the Locus Summary page. The inclusion of these data has enhanced the quality and quantity of protein experimental information presented at SGD and provides opportunities for researchers to access and utilize the data to further their research.
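The two per-gene statistics named in this abstract, the median and the median absolute deviation (MAD), can be computed as in the sketch below. The input values are hypothetical, and the MAD is left unscaled; whether SGD applies a scaling constant is not stated in the abstract.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation) for a non-empty sequence."""
    med = median(values)
    # MAD: the median of the absolute deviations from the median (unscaled).
    mad = median(abs(v - med) for v in values)
    return med, mad


# Hypothetical abundance values (molecules per cell) for one protein
# as reported by four different studies.
abundances = [1000, 1200, 1500, 5000]
med, mad = median_and_mad(abundances)
```

For these values the median is 1350.0 and the MAD is 250.0. The single outlying value of 5000 barely moves either statistic, which is why the pair suits data pooled from many studies.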


Subject(s)
Genome, Fungal/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Databases, Genetic , Genomics/methods , Internet , Proteome/genetics , Proteome/metabolism , Proteomics/methods , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , User-Computer Interface
6.
Nucleic Acids Res ; 48(D1): D743-D748, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31612944

ABSTRACT

The Saccharomyces Genome Database (SGD; www.yeastgenome.org) maintains the official annotation of all genes in the Saccharomyces cerevisiae reference genome and aims to elucidate the function of these genes and their products by integrating manually curated experimental data. Technological advances have allowed researchers to profile RNA expression and identify transcripts at high resolution. These data can be configured in web-based genome browser applications for display to the general public. Accordingly, SGD has incorporated published transcript isoform data in our instance of JBrowse, a genome visualization platform. This resource will help clarify S. cerevisiae biological processes by furthering studies of transcriptional regulation, untranslated regions, genome engineering, and expression quantification in S. cerevisiae.


Subject(s)
Genome, Fungal , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcriptome , Computational Biology/methods , Databases, Genetic , Genomics , Molecular Sequence Annotation , Open Reading Frames , Protein Isoforms , RNA-Seq , Reference Values , User-Computer Interface , Web Browser
7.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30715275

ABSTRACT

High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.


Subject(s)
Databases, Genetic , Gene Ontology , Genomics/methods , Molecular Sequence Annotation/methods , Animals , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
8.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30715277

ABSTRACT

Proteins seldom function individually. Instead, they interact with other proteins or nucleic acids to form stable macromolecular complexes that play key roles in important cellular processes and pathways. One of the goals of Saccharomyces Genome Database (SGD; www.yeastgenome.org) is to provide a complete picture of budding yeast biological processes. To this end, we have collaborated with the Molecular Interactions team that provides the Complex Portal database at EMBL-EBI to manually curate the complete yeast complexome. These data, from a total of 589 complexes, were previously available only in SGD's YeastMine data warehouse (yeastmine.yeastgenome.org) and the Complex Portal (www.ebi.ac.uk/complexportal). We have now incorporated these macromolecular complex data into the SGD core database and designed complex-specific reports to make these data easily available to researchers. These web pages contain referenced summaries focused on the composition and function of individual complexes. In addition, detailed information about how subunits interact within the complex, their stoichiometry and the physical structure are displayed when such information is available. Finally, we generate network diagrams displaying subunits and Gene Ontology annotations that are shared between complexes. Information on macromolecular complexes will continue to be updated in collaboration with the Complex Portal team and curated as more data become available.


Subject(s)
DNA, Fungal , Databases, Genetic , Fungal Proteins , Genome, Fungal/genetics , Saccharomyces/genetics , DNA, Fungal/chemistry , DNA, Fungal/genetics , DNA, Fungal/metabolism , Fungal Proteins/chemistry , Fungal Proteins/genetics , Fungal Proteins/metabolism , Genomics
9.
Lab Anim (NY) ; 47(10): 277-289, 2018 10.
Article in English | MEDLINE | ID: mdl-30224793

ABSTRACT

Model organism databases (MODs) have been collecting and integrating biomedical research data for 30 years and were designed to meet specific needs of each model organism research community. The contributions of model organism research to understanding biological systems would be hard to overstate. Modern molecular biology methods and cost reductions in nucleotide sequencing have opened avenues for direct application of model organism research to elucidating mechanisms of human diseases. Thus, the mandate for model organism research and databases has now grown to include facilitating use of these data in translational applications. Challenges in meeting this opportunity include the distribution of research data across many databases and websites, a lack of data format standards for some data types, and sustainability of scale and cost for genomic database resources like MODs. The issues of widely distributed data and application of data standards are some of the challenges addressed by FAIR (Findable, Accessible, Interoperable, and Re-usable) data principles. The Alliance of Genome Resources is now moving to address these challenges by bringing together expertly curated research data from fly, mouse, rat, worm, yeast, zebrafish, and the Gene Ontology consortium. Centralized multi-species data access, integration, and format standardization will lower the data utilization barrier in comparative genomics and translational applications and will provide a framework in which sustainable scale and cost can be addressed. This article presents a brief historical perspective on how the Alliance model organisms are complementary and how they have already contributed to understanding the etiology of human diseases. In addition, we discuss four challenges for using data from MODs in translational applications and how the Alliance is working to address them, in part by applying FAIR data principles. Ultimately, combined data from these animal models are more powerful than the sum of the parts.


Subject(s)
Animals, Laboratory , Databases as Topic , Translational Research, Biomedical/methods , Animals , Models, Animal
10.
Methods Mol Biol ; 1757: 21-30, 2018.
Article in English | MEDLINE | ID: mdl-29761454

ABSTRACT

The Saccharomyces Genome Database (SGD) is a well-established, key resource for researchers studying Saccharomyces cerevisiae. In addition to updating and maintaining the official genomic sequence of this highly studied organism, SGD provides integrated data regarding gene functions and phenotypes, which are extracted from the published literature. The vast amount and variety of data housed in the database can prove challenging to navigate for the first-time user. Therefore, this chapter serves as an introduction describing how to search the database in order to discover new information. We introduce the different types of pages on the website, and describe how to manipulate the tables and diagrams therein to display, download, or analyze the data using various SGD tools.


Subject(s)
Databases, Genetic , Genome, Fungal , Genomics , Saccharomyces/genetics , Computational Biology/methods , Gene Ontology , Genes, Fungal , Genomics/methods , Molecular Sequence Annotation , Phenotype , Software , Web Browser
12.
Nucleic Acids Res ; 46(D1): D736-D742, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140510

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD.


Subject(s)
Databases, Genetic , Genome, Fungal , Saccharomyces cerevisiae/genetics , Forecasting , Gene Ontology , Genes, Fungal , Genome, Human , Humans , Mutation , Species Specificity
13.
Database (Oxford) ; 2017(1), 2017 01 01.
Article in English | MEDLINE | ID: mdl-28365719

ABSTRACT

The Saccharomyces Genome Database (SGD; www.yeastgenome.org), the primary genetics and genomics resource for the budding yeast S. cerevisiae, provides free public access to expertly curated information about the yeast genome and its gene products. As the central hub for the yeast research community, SGD engages in a variety of social outreach efforts to inform our users about new developments, promote collaboration, increase public awareness of the importance of yeast to biomedical research, and facilitate scientific discovery. Here we describe these various outreach methods, from networking at scientific conferences to the use of online media such as blog posts and webinars, and include our perspectives on the benefits provided by outreach activities for model organism databases. Database URL: http://www.yeastgenome.org.


Subject(s)
Biomedical Research/education , Databases, Genetic , Genome, Fungal , Saccharomyces cerevisiae/genetics , Blogging , Congresses as Topic
14.
Database (Oxford) ; 2017(1), 2017 01 01.
Article in English | MEDLINE | ID: mdl-28365727

ABSTRACT

Due to recent advancements in the production of experimental proteomic data, the Saccharomyces Genome Database (SGD; www.yeastgenome.org) has been expanding our protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTMs) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim of facilitating cellular biology research. Database URL: www.yeastgenome.org.


Subject(s)
Databases, Protein , Genome, Fungal , Molecular Sequence Annotation , Proteome , Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Proteome/genetics , Proteome/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
15.
Nucleic Acids Res ; 45(D1): D128-D134, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794554

ABSTRACT

RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/.


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Animals , Genomics , Humans , Nucleotides/chemistry , Sequence Analysis, RNA , Species Specificity
16.
Article in English | MEDLINE | ID: mdl-27252399

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains is now available on the Sequence and Protein tabs of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org.


Subject(s)
Databases, Genetic , Genome, Fungal/genetics , Genomics/methods , Saccharomyces/genetics , Molecular Sequence Annotation , Reproducibility of Results , Saccharomyces cerevisiae/genetics , User-Computer Interface
17.
Article in English | MEDLINE | ID: mdl-26989152

ABSTRACT

In recent years, thousands of Saccharomyces cerevisiae genomes have been sequenced to varying degrees of completion. The Saccharomyces Genome Database (SGD) has long been the keeper of the original eukaryotic reference genome sequence, which was derived primarily from S. cerevisiae strain S288C. Because new technologies are pushing S. cerevisiae annotation past the limits of any system based exclusively on a single reference sequence, SGD is actively working to expand the original S. cerevisiae systematic reference sequence from a single genome to a multi-genome reference panel. We first commissioned the sequencing of additional genomes and their automated analysis using the AGAPE pipeline. Here we describe our curation strategy to produce manually reviewed high-quality genome annotations in order to elevate 11 of these additional genomes to Reference status. Database URL: http://www.yeastgenome.org/.


Subject(s)
Genome, Fungal , Saccharomyces cerevisiae/genetics , Automation , Base Sequence , Data Mining , Databases, Genetic , Open Reading Frames/genetics , Reference Standards
18.
Nucleic Acids Res ; 44(D1): D698-702, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578556

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer.


Subject(s)
Databases, Genetic , Genetic Variation , Genome, Fungal , Saccharomyces cerevisiae/genetics , Molecular Sequence Annotation , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein , User-Computer Interface
20.
PLoS One ; 10(3): e0120671, 2015.
Article in English | MEDLINE | ID: mdl-25781462

ABSTRACT

The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.
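Classifying genes by their presence or absence across strains, as this abstract describes, might look like the sketch below. The category labels ("core", "accessory", "strain-specific") are conventional pan-genome terms assumed here for illustration; the abstract does not name its classes. The strain and gene names are hypothetical, and the sketch is not the authors' pipeline.

```python
def classify_genes(presence, n_strains):
    """Map each gene to 'core', 'accessory', or 'strain-specific'.

    presence: dict mapping a gene identifier to the set of strains carrying it.
    n_strains: total number of strains surveyed.
    """
    classes = {}
    for gene, strains in presence.items():
        if len(strains) == n_strains:
            classes[gene] = "core"  # found in every strain
        elif len(strains) == 1:
            classes[gene] = "strain-specific"  # found in exactly one strain
        else:
            classes[gene] = "accessory"  # found in some, but not all, strains
    return classes


# Hypothetical presence/absence calls across three strains.
presence = {
    "gene1": {"strainA", "strainB", "strainC"},
    "gene2": {"strainA", "strainC"},
    "gene3": {"strainB"},
}
classes = classify_genes(presence, n_strains=3)
```

Once genes are grouped this way, each group can be compared against known functional and phenotypic features, which is the step the abstract describes next.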


Subject(s)
Contig Mapping/methods , Genome, Fungal , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Software