1.
Database (Oxford) ; 2017(1), 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-28365736

ABSTRACT

The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Initially intended to provide annotation for the reference human genome, we have extended our framework to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework storing a large diversity of genome annotations in one location serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list ( http://www.ensembl.org/info/about/contact/index.html ). Database URL: http://www.ensembl.org.


Subject(s)
Databases, Nucleic Acid , Genome, Human , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods , User-Computer Interface , Humans
2.
Database (Oxford) ; 2011: bar030, 2011.
Article in English | MEDLINE | ID: mdl-21785142

ABSTRACT

For a number of years the BioMart data warehousing system has proven to be a valuable resource for scientists seeking a fast and versatile means of accessing the growing volume of genomic data provided by the Ensembl project. The launch of the Ensembl Genomes project in 2009 complemented the Ensembl project by utilizing the same visualization, interactive and programming tools to provide users with a means for accessing genome data from a further five domains: protists, bacteria, metazoa, plants and fungi. The Ensembl and Ensembl Genomes BioMarts provide a point of access to the high-quality gene annotation, variation data, functional and regulatory annotation and evolutionary relationships from genomes spanning the taxonomic space. This article aims to give a comprehensive overview of the Ensembl and Ensembl Genomes BioMarts as well as some useful examples and a description of current data content and future objectives. Database URLs: http://www.ensembl.org/biomart/martview/; http://metazoa.ensembl.org/biomart/martview/; http://plants.ensembl.org/biomart/martview/; http://protists.ensembl.org/biomart/martview/; http://fungi.ensembl.org/biomart/martview/; http://bacteria.ensembl.org/biomart/martview/.


Subject(s)
Classification/methods , Databases, Genetic , Information Storage and Retrieval/methods , Animals , Anopheles/genetics , Computational Biology , Genome/genetics , Humans , Open Reading Frames/genetics , Polymorphism, Single Nucleotide/genetics , Search Engine
3.
Nucleic Acids Res ; 39(Database issue): D800-6, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21045057

ABSTRACT

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.


Subject(s)
Databases, Genetic , Genomics , Animals , Genetic Variation , Humans , Mice , Molecular Sequence Annotation , Rats , Regulatory Sequences, Nucleic Acid , Software , Zebrafish/genetics
4.
BMC Genomics ; 11: 294, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459806

ABSTRACT

BACKGROUND: Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases. RESULTS: We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor. CONCLUSIONS: Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Alleles , Animals , Computational Biology , Data Mining , Gene Expression Profiling/standards , Humans , Internet , Mice , Oligonucleotide Array Sequence Analysis/standards , Programming Languages , Quality Control , RNA, Messenger/genetics , Rats
5.
Genome Med ; 2(4): 24, 2010 Apr 15.
Article in English | MEDLINE | ID: mdl-20398331

ABSTRACT

As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org.

6.
Brief Bioinform ; 9(6): 532-44, 2008 Nov.
Article in English | MEDLINE | ID: mdl-19112082

ABSTRACT

The torrent of data emerging from the application of new technologies to functional genomics and systems biology can no longer be contained within the traditional modes of data sharing and publication, with the consequence that data is being deposited in, distributed across and disseminated through an increasing number of databases. The resulting fragmentation poses serious problems for the model organism community, which increasingly relies on data mining and computational approaches that require gathering of data from a range of sources. In the light of these problems, the European Commission has funded a coordination action, CASIMIR (coordination and sustainability of international mouse informatics resources), with a remit to assess the technical and social aspects of database interoperability that currently prevent the full realization of the potential of data integration in mouse functional genomics. In this article, we assess the current problems with interoperability, with particular reference to mouse functional genomics, and critically review the technologies that can be deployed to overcome them. We describe a typical use-case where an investigator wishes to gather data on variation, genomic context and metabolic pathway involvement for genes discovered in a genome-wide screen. We go on to develop an automated approach involving an in silico experimental workflow tool, Taverna, using web services, BioMart and MOLGENIS technologies for data retrieval. Finally, we focus on the current impediments to adopting such an approach in a wider context, and strategies to overcome them.


Subject(s)
Computational Biology/methods , Database Management Systems , Databases, Genetic , Genomics/methods , Animals , Humans , Information Storage and Retrieval , Mice , Software , User-Computer Interface
7.
Mamm Genome ; 18(3): 157-63, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17436037

ABSTRACT

Understanding the functions encoded in the mouse genome will be central to an understanding of the genetic basis of human disease. To achieve this it will be essential to be able to characterize the phenotypic consequences of variation and alterations in individual genes. Data on the phenotypes of mouse strains are currently held in a number of different forms (detailed descriptions of mouse lines, first-line phenotyping data on novel mutations, data on the normal features of inbred lines) at many sites worldwide. For the most efficient use of these data sets, we have initiated a process to develop standards for the description of phenotypes (using ontologies) and file formats for the description of phenotyping protocols and phenotype data sets. This process is ongoing and needs to be supported by the wider mouse genetics and phenotyping communities to succeed. We invite interested parties to contact us as we develop this process further.


Subject(s)
Databases, Genetic , Mice/genetics , Animals , Genomics , Mice, Inbred Strains/genetics , Mice, Mutant Strains/genetics , Phenotype
8.
Genome Res ; 14(5): 929-33, 2004 May.
Article in English | MEDLINE | ID: mdl-15123588

ABSTRACT

Systems for managing genomic data must store a vast quantity of information. Ensembl stores these data in several MySQL databases. The core software libraries provide a practical and effective means for programmers to access these data. By encapsulating the underlying database structure, the libraries present end users with a simple, abstract interface to a complex data model. Programs that use the libraries rather than SQL to access the data are unaffected by most schema changes. The architecture of the core software libraries, the schema, and the factors influencing their design are described. All code and data are freely available.


Subject(s)
Computational Biology , Software , Animals , Databases, Genetic , Humans , Software Design
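
Illustrative note on record 8: the abstract states that programs using the core libraries rather than SQL are unaffected by most schema changes. The sketch below shows that general idea only. It is not the Ensembl API and does not reproduce the Ensembl schema. The table layouts, the class names (FlatSchemaAdaptor, NormalisedSchemaAdaptor), the Gene record and the identifier GENE0001 are all hypothetical, and SQLite stands in for MySQL so the example runs without a server.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Schema-independent record handed to client code (hypothetical)."""
    stable_id: str
    region: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Coordinates are treated as 1-based and inclusive in this toy model.
        return self.end - self.start + 1


class FlatSchemaAdaptor:
    """Adaptor for a toy schema that stores the region name on the gene row."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def fetch_by_stable_id(self, stable_id: str):
        row = self._conn.execute(
            "SELECT stable_id, region, start_pos, end_pos "
            "FROM gene WHERE stable_id = ?",
            (stable_id,),
        ).fetchone()
        return Gene(*row) if row else None


class NormalisedSchemaAdaptor:
    """Adaptor for a changed toy schema where regions live in their own table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def fetch_by_stable_id(self, stable_id: str):
        row = self._conn.execute(
            "SELECT g.stable_id, r.name, g.start_pos, g.end_pos "
            "FROM gene AS g JOIN seq_region AS r "
            "ON r.seq_region_id = g.seq_region_id "
            "WHERE g.stable_id = ?",
            (stable_id,),
        ).fetchone()
        return Gene(*row) if row else None


def build_flat_db() -> sqlite3.Connection:
    """Create an in-memory database using the flat toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (stable_id TEXT, region TEXT, "
        "start_pos INTEGER, end_pos INTEGER)"
    )
    conn.execute("INSERT INTO gene VALUES ('GENE0001', 'chr1', 100, 199)")
    return conn


def build_normalised_db() -> sqlite3.Connection:
    """Create an in-memory database using the normalised toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE gene (stable_id TEXT, seq_region_id INTEGER, "
        "start_pos INTEGER, end_pos INTEGER)"
    )
    conn.execute("INSERT INTO seq_region VALUES (1, 'chr1')")
    conn.execute("INSERT INTO gene VALUES ('GENE0001', 1, 100, 199)")
    return conn


def describe(adaptor, stable_id: str) -> str:
    """Client code: talks only to the adaptor interface, never to SQL."""
    gene = adaptor.fetch_by_stable_id(stable_id)
    if gene is None:
        return f"{stable_id}: not found"
    return f"{gene.stable_id} on {gene.region}, {gene.length} bp"


if __name__ == "__main__":
    for adaptor in (
        FlatSchemaAdaptor(build_flat_db()),
        NormalisedSchemaAdaptor(build_normalised_db()),
    ):
        print(describe(adaptor, "GENE0001"))
```

In this sketch the describe function is identical for both layouts. Only the SQL inside each adaptor differs, which is the property the abstract attributes to programs written against the libraries.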
9.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subject(s)
Computational Biology/trends