Results 1 - 20 of 44
1.
Genetics ; 227(1), 2024 05 07.
Article in English | MEDLINE | ID: mdl-38531069

ABSTRACT

Mouse Genome Informatics (MGI) is a federation of expertly curated information resources designed to support experimental and computational investigations into genetic and genomic aspects of human biology and disease using the laboratory mouse as a model system. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are core MGI databases that share data and system architecture. MGI serves as the central community resource of integrated information about mouse genome features, variation, expression, gene function, phenotype, and human disease models acquired from peer-reviewed publications, author submissions, and major bioinformatics resources. To facilitate integration and standardization of data, biocuration scientists annotate using terms from controlled metadata vocabularies and biological ontologies (e.g. Mammalian Phenotype Ontology, Mouse Developmental Anatomy, Disease Ontology, Gene Ontology, etc.), and by applying international community standards for gene, allele, and mouse strain nomenclature. MGI serves basic scientists, translational researchers, and data scientists by providing access to FAIR-compliant data in both human-readable and compute-ready formats. The MGI resource is accessible at https://informatics.jax.org. Here, we present an overview of the core data types represented in MGI and highlight recent enhancements to the resource with a focus on new data and functionality for MGD and GXD.


Subject(s)
Databases, Genetic , Genome , Animals , Mice , Knowledge Bases , Genomics/methods , Computational Biology/methods , Humans
2.
Mamm Genome ; 33(1): 4-18, 2022 03.
Article in English | MEDLINE | ID: mdl-34698891

ABSTRACT

The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards. MGI's mission is to facilitate the use of the mouse as an experimental model for understanding the genetic and genomic basis of human health and disease. MGI is the authoritative source for mouse gene, allele, and strain nomenclature and is the primary source of mouse phenotype annotations, functional annotations, developmental gene expression information, and annotations of mouse models with human diseases. MGI maintains mouse anatomy and phenotype ontologies and contributes to the development of the Gene Ontology and Disease Ontology and uses these ontologies as standard terminologies for annotation. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are MGI's two major knowledgebases. Here, we highlight some of the recent changes and enhancements to MGD and GXD that have been implemented in response to changing needs of the biomedical research community and to improve the efficiency of expert curation. MGI can be accessed freely at http://www.informatics.jax.org .


Subject(s)
Databases, Genetic , Ecosystem , Alleles , Animals , Gene Ontology , Genomics , Mice
3.
Mamm Genome ; 33(1): 55-65, 2022 03.
Article in English | MEDLINE | ID: mdl-34482425

ABSTRACT

Recombinase alleles and transgenes can be used to facilitate spatio-temporal specificity of gene disruption or transgene expression. However, the versatility of this in vivo recombination system relies on having detailed and accurate characterization of recombinase expression and activity to enable selection of the appropriate allele or transgene. The CrePortal (http://www.informatics.jax.org/home/recombinase) leverages the informatics infrastructure of Mouse Genome Informatics to integrate data from the scientific literature, from direct data submissions by the scientific community at large, and from major projects developing new recombinase lines and characterizing recombinase expression and specificity patterns. Searching the CrePortal by recombinase activity or specific recombinase gene driver provides users with a tissue summary of recombinase allele and transgene activity and a matrix comparison of gene expression and recombinase activity, with links to generation details, a recombinase activity grid, and associated phenotype annotations. Future improvements will add cell type-based activity annotations. The CrePortal provides a comprehensive presentation of recombinase allele and transgene data to assist researchers in selection of the recombinase allele or transgene based on where and when recombination is desired.


Subject(s)
Integrases , Recombinases , Alleles , Animals , Integrases/genetics , Integrases/metabolism , Mice , Mice, Transgenic , Recombinases/genetics , Transgenes
5.
Bioinformatics ; 37(Suppl_1): i468-i476, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252939

ABSTRACT

MOTIVATION: Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge while supporting the research community, several biomedical databases devote significant effort to manual curation of the literature, a labor-intensive process. The first step toward biocuration requires identifying articles relevant to the specific area on which the database focuses. Thus, automatically identifying publications relevant to a specific topic within a large volume of publications is an important task toward expediting the biocuration process and, in turn, biomedical research. Current methods focus on textual contents, typically extracted from the title-and-abstract. Notably, images and captions are often used in publications to convey pivotal evidence about processes, experiments and results. RESULTS: We present a new document classification scheme, using both image and caption information, in addition to titles-and-abstracts. To use the image information, we introduce a new image representation, namely Figure-word, based on class labels of subfigures. We use word embeddings for representing captions and titles-and-abstracts. To utilize all three types of information, we introduce two information integration methods. The first combines Figure-words and textual features obtained from captions and titles-and-abstracts into a single larger vector for document representation; the second employs a meta-classification scheme. Our experiments and results demonstrate the usefulness of the newly proposed Figure-words for representing images. Moreover, the results showcase the value of Figure-words, captions and titles-and-abstracts in providing complementary information for document classification; these three sources of information, when combined, lead to an overall improved classification performance.
AVAILABILITY AND IMPLEMENTATION: Source code and the list of PMIDs of the publications in our datasets are available upon request.
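
The two integration methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (their source code is available only on request): the function names, the toy feature values, and the `mean_score` stand-in for a trained base classifier are all hypothetical. It only contrasts joining the three feature views into one vector with passing per-view scores on to a meta-classifier.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def concatenate_views(figure_words: Sequence[float],
                      caption_vec: Sequence[float],
                      abstract_vec: Sequence[float]) -> Vector:
    """Method 1 (sketch): join all three feature views into one larger vector."""
    return [*figure_words, *caption_vec, *abstract_vec]


def meta_features(base_scorers: Sequence[Callable[[Sequence[float]], float]],
                  views: Sequence[Sequence[float]]) -> Vector:
    """Method 2 (sketch): score each view separately; the scores would then
    be the input features of a meta-classifier (not implemented here)."""
    return [scorer(view) for scorer, view in zip(base_scorers, views)]


def mean_score(view: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained per-view base classifier."""
    return sum(view) / len(view)


# Toy feature values, invented for illustration only.
figure_words = [1.0, 0.0, 2.0]          # e.g. counts of subfigure class labels
caption_vec = [0.2, 0.4]                # e.g. a caption embedding
abstract_vec = [0.1, 0.3, 0.5, 0.7]     # e.g. a title-and-abstract embedding

combined = concatenate_views(figure_words, caption_vec, abstract_vec)
stacked = meta_features([mean_score] * 3,
                        [figure_words, caption_vec, abstract_vec])

print(len(combined))  # 9 features in the single concatenated vector
print(len(stacked))   # 3 per-view scores for the meta-classifier
```

In the first method one classifier sees every feature at once; in the second, each view keeps its own classifier and only their outputs are combined.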


Subject(s)
Biomedical Research , Databases, Factual
6.
Nucleic Acids Res ; 49(D1): D924-D931, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33104772

ABSTRACT

The Gene Expression Database (GXD; www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental gene expression information. For many years, GXD has collected and integrated data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot, and western blot experiments through curation of the scientific literature and by collaborations with large-scale expression projects. Since our last report in 2019, we have continued to acquire these classical types of expression data; developed a searchable index of RNA-Seq and microarray experiments that allows users to quickly and reliably find specific mouse expression studies in ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) and GEO (https://www.ncbi.nlm.nih.gov/geo/); and expanded GXD to include RNA-Seq data. Uniformly processed RNA-Seq data are imported from the EBI Expression Atlas and then integrated with the other types of expression data in GXD, and with the genetic, functional, phenotypic and disease-related information in Mouse Genome Informatics (MGI). This integration has made the RNA-Seq data accessible via GXD's enhanced searching and filtering capabilities. Further, we have embedded the Morpheus heat map utility into the GXD user interface to provide additional tools for display and analysis of RNA-Seq data, including heat map visualization, sorting, filtering, hierarchical clustering, nearest neighbors analysis and visual enrichment.


Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Gene Expression , High-Throughput Nucleotide Sequencing/methods , Animals , Cluster Analysis , Internet , Mice , Proteins/genetics , Proteins/metabolism , User-Computer Interface
8.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32294192

ABSTRACT

Gathering information from the scientific literature is essential for biomedical research, as much knowledge is conveyed through publications. However, the large and rapidly increasing publication rate makes it impractical for researchers to quickly identify all and only those documents related to their interest. As such, automated biomedical document classification attracts much interest. Such classification is critical in the curation of biological databases, because biocurators must scan through a vast number of articles to identify pertinent information within documents most relevant to the database. This is a slow, labor-intensive process that can benefit from effective automation. We present a document classification scheme aiming to identify papers containing information relevant to a specific topic, among a large collection of articles, for supporting the biocuration classification task. Our framework is based on a meta-classification scheme we have introduced before; here we incorporate into it features gathered from figure captions, in addition to those obtained from titles and abstracts. We trained and tested our classifier over a large imbalanced dataset, originally curated by the Gene Expression Database (GXD). GXD collects all the gene expression information in the Mouse Genome Informatics (MGI) resource. As part of the MGI literature classification pipeline, GXD curators identify MGI-selected papers that are relevant for GXD. The dataset consists of ~60 000 documents (5469 labeled as relevant; 52 866 as irrelevant), gathered throughout 2012-2016, in which each document is represented by the text of its title, abstract and figure captions. Our classifier attains precision 0.698, recall 0.784, f-measure 0.738 and Matthews correlation coefficient 0.711, demonstrating that the proposed framework effectively addresses the high imbalance in the GXD classification task. Moreover, our classifier's performance is significantly improved by utilizing information from image captions compared to using titles and abstracts alone; this observation clearly demonstrates that image captions provide substantial information for supporting biomedical document classification and curation. Database URL.
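
As a quick consistency check on the figures reported above, the f-measure can be recomputed from the stated precision and recall, and the class imbalance from the stated document counts. This is a minimal Python sketch that assumes the f-measure is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function and variable names are illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values as reported in the abstract.
reported_precision = 0.698
reported_recall = 0.784
relevant_docs = 5469
irrelevant_docs = 52866

f1 = f_measure(reported_precision, reported_recall)
relevant_fraction = relevant_docs / (relevant_docs + irrelevant_docs)

print(f"F1 from reported precision/recall: {f1:.4f}")
print(f"Fraction of relevant documents:    {relevant_fraction:.3f}")
```

The recomputed value lies within rounding distance of the reported 0.738 (the inputs are themselves rounded to three decimals), and fewer than one document in ten is labeled relevant, which is the imbalance the abstract refers to.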


Subject(s)
Biomedical Research/statistics & numerical data , Computational Biology/methods , Data Curation/methods , Databases, Factual , Animals , Biomedical Research/classification , Biomedical Research/methods , Computational Biology/classification , Data Mining/methods , Humans , Internet
9.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32140729

ABSTRACT

The Gene Expression Database (GXD), an extensive community resource of curated expression information for the mouse, has developed an RNA-Seq and Microarray Experiment Search (http://www.informatics.jax.org/gxd/htexp_index). This tool allows users to quickly and reliably find specific experiments in ArrayExpress and the Gene Expression Omnibus (GEO) that study endogenous gene expression in wild-type and mutant mice. Standardized metadata annotations, curated by GXD, allow users to specify the anatomical structure, developmental stage, mutated gene, strain and sex of samples of interest, as well as the study type and key parameters of the experiment. These searches, powered by controlled vocabularies and ontologies, can be combined with free text searching of experiment titles and descriptions. Search result summaries include link-outs to ArrayExpress and GEO, providing easy access to the expression data itself. Links to the PubMed entries for accompanying publications are also included. More information about this tool and GXD can be found at the GXD home page (http://www.informatics.jax.org/expression.shtml). Database URL: http://www.informatics.jax.org/expression.shtml.


Subject(s)
Data Curation/methods , Databases, Genetic , Gene Expression Profiling/methods , Metadata , Oligonucleotide Array Sequence Analysis/methods , RNA-Seq/methods , Animals , Information Storage and Retrieval/methods , Internet , Mice , User-Computer Interface
10.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31032839

ABSTRACT

Published literature is an important source of knowledge supporting biomedical research. Given the large and increasing number of publications, automated document classification plays an important role in biomedical research. Effective biomedical document classifiers are especially needed for bio-databases, in which the information stems from many thousands of biomedical publications that curators must read in detail and annotate. In addition, biomedical document classification often amounts to identifying a small subset of relevant publications within a much larger collection of available documents. As such, addressing class imbalance is essential to a practical classifier. We present here an effective classification scheme for automatically identifying papers among a large pool of biomedical publications that contain information relevant to a specific topic, which the curators are interested in annotating. The proposed scheme is based on a meta-classification framework using cluster-based under-sampling combined with named-entity recognition and statistical feature selection strategies. We examined the performance of our method over a large imbalanced data set that was originally manually curated by the Jackson Laboratory's Gene Expression Database (GXD). The set consists of more than 90 000 PubMed abstracts, of which about 13 000 documents are labeled as relevant to GXD while the others are not relevant. Our results, 0.72 precision, 0.80 recall and 0.75 f-measure, demonstrate that our proposed classification scheme effectively categorizes such a large data set in the face of data imbalance.
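
The idea of cluster-based under-sampling can be sketched in a few lines of Python. This is an assumption-laden illustration, not the published method: the abstract does not say how documents are clustered or how many are drawn per cluster, so the sketch takes precomputed cluster labels as input and samples from each cluster in proportion to its size. The name `undersample_by_cluster` and the toy document identifiers are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def undersample_by_cluster(majority_ids: Sequence[Hashable],
                           cluster_of: Dict[Hashable, int],
                           target_size: int,
                           seed: int = 0) -> List[Hashable]:
    """Draw roughly `target_size` majority-class documents, taking from each
    cluster in proportion to its size so every cluster stays represented."""
    rng = random.Random(seed)

    # Group majority-class documents by their (precomputed) cluster label.
    clusters: Dict[int, List[Hashable]] = defaultdict(list)
    for doc_id in majority_ids:
        clusters[cluster_of[doc_id]].append(doc_id)

    total = len(majority_ids)
    sample: List[Hashable] = []
    for members in clusters.values():
        # Proportional quota, at least one document per cluster.
        quota = max(1, round(target_size * len(members) / total))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Toy data: 90 "irrelevant" documents spread evenly over 3 clusters.
majority = [f"doc{i}" for i in range(90)]
labels = {doc_id: i % 3 for i, doc_id in enumerate(majority)}

reduced = undersample_by_cluster(majority, labels, target_size=9)
print(len(reduced))  # 9: three documents drawn from each cluster
```

Because the per-cluster quotas are rounded, the result can differ from `target_size` by a few documents when clusters are uneven; a real pipeline would also need the clustering step itself, which is outside this sketch.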


Subject(s)
Databases, Nucleic Acid , Information Dissemination , Polymorphism, Single Nucleotide , Programming Languages
11.
Nucleic Acids Res ; 47(D1): D774-D779, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30335138

ABSTRACT

The mouse Gene Expression Database (GXD) is an extensive, well-curated community resource freely available at www.informatics.jax.org/expression.shtml. Covering all developmental stages, GXD includes data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot and western blot experiments in wild-type and mutant mice. GXD's gene expression information is integrated with the other data in Mouse Genome Informatics and interconnected with other databases, placing these data in the larger biological and biomedical context. Since the last report, the ability of GXD to provide insights into the molecular mechanisms of development and disease has been greatly enhanced by the addition of new data and by the implementation of new web features. These include: improvements to the Differential Gene Expression Data Search, facilitating searches for genes that have been shown to be exclusively expressed in a specified structure and/or developmental stage; an enhanced anatomy browser that now provides access to expression data and phenotype data for a given anatomical structure; direct access to the wild-type gene expression data for the tissues affected in a specific mutant; and a comparison matrix that juxtaposes tissues where a gene is normally expressed against tissues where mutations in that gene cause abnormalities.


Subject(s)
Databases, Genetic , Genome/genetics , Transcriptome/genetics , Animals , Internet , Mice , User-Computer Interface
12.
Database (Oxford) ; 2017(1), 2017 01 01.
Article in English | MEDLINE | ID: mdl-28365740

ABSTRACT

The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed for supporting the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset, consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD while the other half as irrelevant to GXD. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents, for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification. Moreover, using information obtained from image captions clearly improves performance, compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area. Database URL: www.informatics.jax.org.


Subject(s)
Data Curation , Data Mining/methods , Databases, Genetic , Gene Expression Regulation , Animals , Mice
13.
Methods Mol Biol ; 1488: 47-73, 2017.
Article in English | MEDLINE | ID: mdl-27933520

ABSTRACT

The Mouse Genome Informatics (MGI) resource (www.informatics.jax.org) has existed for over 25 years, and over this time its data content, informatics infrastructure, and user interfaces and tools have undergone dramatic changes (Eppig et al., Mamm Genome 26:272-284, 2015). Change has been driven by scientific methodological advances, rapid improvements in computational software, growth in computer hardware capacity, and the ongoing collaborative nature of the mouse genomics community in building resources and sharing data. Here we present an overview of the current data content of MGI, describe its general organization, and provide examples using simple and complex searches, and tools for mining and retrieving sets of data.


Subject(s)
Computational Biology/methods , Genome , Genomics , Animals , Data Mining/methods , Databases, Genetic , Genomics/methods , Mice , Research , Software , Translational Research, Biomedical/methods , User-Computer Interface , Web Browser
14.
Nucleic Acids Res ; 45(D1): D730-D736, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899677

ABSTRACT

The Gene Expression Database (GXD; www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental expression information. Through curation of the scientific literature and by collaborations with large-scale expression projects, GXD collects and integrates data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot and western blot experiments. Expression data from both wild-type and mutant mice are included. The expression data are combined with genetic and phenotypic data in Mouse Genome Informatics (MGI) and made readily accessible to many types of database searches. At present, GXD includes over 1.5 million expression results and more than 300 000 images, all annotated with detailed and standardized metadata. Since our last report in 2014, we have added a large amount of data, we have enhanced data and database infrastructure, and we have implemented many new search and display features. Interface enhancements include: a new Mouse Developmental Anatomy Browser; interactive tissue-by-developmental stage and tissue-by-gene matrix views; capabilities to filter and sort expression data summaries; a batch search utility; gene-based expression overviews; and links to expression data from other species.


Subject(s)
Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Gene Expression , Genomics/methods , Animals , Gene Ontology , Mice , Organ Specificity , Search Engine , User-Computer Interface , Web Browser
15.
Mamm Genome ; 26(7-8): 272-84, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26238262

ABSTRACT

From its inception in 1989, the mission of the Mouse Genome Informatics (MGI) resource remains to integrate genetic, genomic, and biological data about the laboratory mouse to facilitate the study of human health and disease. This mission is ever more feasible as the revolution in genetics knowledge, the ability to sequence genomes, and the ability to specifically manipulate mammalian genomes are now at our fingertips. Through major paradigm shifts in biological research and computer technologies, MGI has adapted and evolved to become an integral part of the larger global bioinformatics infrastructure and honed its ability to provide authoritative reference datasets used and incorporated by many other established bioinformatics resources. Here, we review some of the major changes in research approaches over the last quarter century, how these changes are reflected in the MGI resource you use today, and what may be around the next corner.


Subject(s)
Databases, Genetic/history , Genome , Genomics/history , Software , Animals , Databases, Genetic/supply & distribution , Disease Models, Animal , Genomics/methods , Genomics/trends , Genotype , History, 20th Century , History, 21st Century , Humans , Mice , Mutagenesis, Site-Directed , Phenotype , Reverse Genetics
16.
Mamm Genome ; 26(7-8): 305-13, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26223881

ABSTRACT

The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource.


Subject(s)
Databases, Genetic/history , Genome , Genomics/methods , Sequence Homology, Amino Acid , Alleles , Animals , Disease Models, Animal , Genomics/history , Genotype , History, 20th Century , History, 21st Century , Humans , Mice , Molecular Sequence Annotation , Phenotype , Phylogeny
17.
Mamm Genome ; 26(9-10): 422-30, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26208972

ABSTRACT

Mouse anatomy ontologies provide standard nomenclature for describing normal and mutant mouse anatomy, and are essential for the description and integration of data directly related to anatomy such as gene expression patterns. Building on our previous work on anatomical ontologies for the embryonic and adult mouse, we have recently developed a new and substantially revised anatomical ontology covering all life stages of the mouse. Anatomical terms are organized in complex hierarchies enabling multiple relationships between terms. Tissue classification as well as partonomic, developmental, and other types of relationships can be represented. Hierarchies for specific developmental stages can also be derived. The ontology forms the core of the eMouse Atlas Project (EMAP) and is used extensively for annotating and integrating gene expression patterns and other data by the Gene Expression Database (GXD), the eMouse Atlas of Gene Expression (EMAGE) and other database resources. Here we illustrate the evolution of the developmental and adult mouse anatomical ontologies toward one combined system. We report on recent ontology enhancements, describe the current status, and discuss future plans for mouse anatomy ontology development and application in integrating data resources.


Subject(s)
Computational Biology , Organ Specificity/genetics , Software , Animals , Databases, Genetic , Gene Expression Regulation, Developmental , Mice
18.
Genesis ; 53(8): 510-22, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26045019

ABSTRACT

The Gene Expression Database (GXD) is an extensive and freely available community resource of mouse developmental expression data. GXD curates and integrates expression data from the literature, via electronic data submissions, and by collaborations with large-scale projects. As an integral component of the Mouse Genome Informatics Resource, GXD combines expression data with genetic, functional, phenotypic, and disease-related data, and provides tools for the research community to search for and analyze expression data in this larger context. Recent enhancements include: an interactive browser to navigate the mouse developmental anatomy and find expression data for specific anatomical structures; the capability to search for expression data of genes located in specific genomic regions, supporting the identification of disease candidate genes; a summary displaying all the expression images that meet specified search criteria; interactive matrix views that provide overviews of spatio-temporal expression patterns (Tissue × Stage Matrix) and enable the comparison of expression patterns between genes (Tissue × Gene Matrix); data zoom and filter utilities to iteratively refine summary displays and data sets; and gene-based links to expression data from other model organisms, such as chicken, Xenopus, and zebrafish, fostering comparative expression analysis for species that are highly relevant for developmental research.


Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Mice/genetics , Animals , Data Curation , Genomics/methods , Internet , Models, Animal
19.
Mamm Genome ; 26(7-8): 314-24, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25939429

ABSTRACT

The Gene Expression Database (GXD) is an extensive, easily searchable, and freely available database of mouse gene expression information (www.informatics.jax.org/expression.shtml). GXD was developed to foster progress toward understanding the molecular basis of human development and disease. GXD contains information about when and where genes are expressed in different tissues in the mouse, especially during the embryonic period. GXD collects different types of expression data from wild-type and mutant mice, including RNA in situ hybridization, immunohistochemistry, RT-PCR, and northern and western blot results. The GXD curators read the scientific literature and enter the expression data from those papers into the database. GXD also acquires expression data directly from researchers, including groups doing large-scale expression studies. GXD currently contains nearly 1.5 million expression results for over 13,900 genes. In addition, it has over 265,000 images of expression data, allowing users to retrieve the primary data and interpret it themselves. By being an integral part of the larger Mouse Genome Informatics (MGI) resource, GXD's expression data are combined with other genetic, functional, phenotypic, and disease-oriented data. This allows GXD to provide tools for researchers to evaluate expression data in the larger context, search by a wide variety of biologically and biomedically relevant parameters, and discover new data connections to help in the design of new experiments. Thus, GXD can provide researchers with critical insights into the functions of genes and the molecular mechanisms of development, differentiation, and disease.


Subject(s)
Data Mining/methods , Databases, Genetic , Genome , User-Computer Interface , Animals , Embryo, Mammalian , Gene Expression , Genetic Markers , Humans , Information Dissemination , Mice , Organ Specificity
20.
Dev Dyn ; 243(10): 1176-86, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24958384

ABSTRACT

Because molecular mechanisms of development are extraordinarily complex, the understanding of these processes requires the integration of pertinent research data. Using the Gene Expression Database for Mouse Development (GXD) as an example, we illustrate the progress made toward this goal, and discuss relevant issues that apply to developmental databases and developmental research in general. Since its first release in 1998, GXD has served the scientific community by integrating multiple types of expression data from publications and electronic submissions and by making these data freely and widely available. Focusing on endogenous gene expression in wild-type and mutant mice and covering data from RNA in situ hybridization, in situ reporter (knock-in), immunohistochemistry, reverse transcriptase-polymerase chain reaction, Northern blot, and Western blot experiments, the database has grown tremendously over the years in terms of data content and search utilities. Currently, GXD includes over 1.4 million annotated expression results and over 260,000 images. All these data and images are readily accessible to many types of database searches. Here we describe the data and search tools of GXD; explain how to use the database most effectively; discuss how we acquire, curate, and integrate developmental expression information; and describe how the research community can help in this process.


Subject(s)
Databases, Genetic , Gene Expression Regulation, Developmental , Gene Expression , Mice/embryology , Access to Information , Animals , Humans , Information Storage and Retrieval , Mice/genetics , User-Computer Interface