Results 1 - 20 of 41
1.
BMJ Open ; 12(10): e049657, 2022 10 12.
Article in English | MEDLINE | ID: mdl-36223959

ABSTRACT

OBJECTIVES: The enormous toll of the COVID-19 pandemic has heightened the urgency of collecting and analysing population-scale datasets in real time to monitor and better understand the evolving pandemic. The objectives of this study were to examine the relationship of risk factors to COVID-19 susceptibility and severity and to develop risk models to accurately predict COVID-19 outcomes using rapidly obtained self-reported data. DESIGN: A cross-sectional study. SETTING: AncestryDNA customers in the USA who consented to research. PARTICIPANTS: The AncestryDNA COVID-19 Study collected self-reported survey data on symptoms, outcomes, risk factors and exposures for over 563 000 adult individuals in the USA in just under 4 months, including over 4700 COVID-19 cases as measured by a self-reported positive test. RESULTS: We replicated previously reported associations between several risk factors and COVID-19 susceptibility and severity outcomes, and additionally found that differences in known exposures accounted for many of the susceptibility associations. A notable exception was elevated susceptibility for men even after adjusting for known exposures and age (adjusted OR=1.36, 95% CI=1.19 to 1.55). We also demonstrated that self-reported data can be used to build accurate risk models to predict individualised COVID-19 susceptibility (area under the curve (AUC)=0.84) and severity outcomes including hospitalisation and critical illness (AUC=0.87 and 0.90, respectively). The risk models achieved robust discriminative performance across different age, sex and genetic ancestry groups within the study. CONCLUSIONS: The results highlight the value of self-reported epidemiological data to rapidly provide public health insights into the evolving COVID-19 pandemic.
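
As a rough illustration of what the AUC values quoted above measure (this is not the study's code or data), the area under the ROC curve can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The minimal sketch below computes it by direct pairwise comparison; the function name `auc` and the scores and labels are invented for the example.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random non-case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    # Compare every case with every non-case and count the 'wins'.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes, invented for illustration only.
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
print(auc(scores, labels))  # 0.75: three of the four case/non-case pairs are ranked correctly
```

The pairwise form is quadratic in the number of participants, so for a cohort of the size described a rank-based formulation would be the more practical choice.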


Subjects
COVID-19; Adult; COVID-19/epidemiology; Cross-Sectional Studies; Humans; Male; Pandemics; Risk Factors; SARS-CoV-2
2.
Nat Genet ; 54(4): 374-381, 2022 04.
Article in English | MEDLINE | ID: mdl-35410379

ABSTRACT

Multiple COVID-19 genome-wide association studies (GWASs) have identified reproducible genetic associations indicating that there is a genetic component to susceptibility and severity risk. To complement these studies, we collected deep coronavirus disease 2019 (COVID-19) phenotype data from a survey of 736,723 AncestryDNA research participants. With these data, we defined eight phenotypes related to COVID-19 outcomes: four phenotypes that align with previously studied COVID-19 definitions and four 'expanded' phenotypes that focus on susceptibility given exposure, mild clinical manifestations and an aggregate score of symptom severity. We performed a replication analysis of 12 previously reported COVID-19 genetic associations with all eight phenotypes in a trans-ancestry meta-analysis of AncestryDNA research participants. In this analysis, we show distinct patterns of association at the 12 loci with the eight outcomes that we assessed. We also performed a genome-wide discovery analysis of all eight phenotypes, which did not yield new genome-wide significant loci but did suggest that three of the four 'expanded' COVID-19 phenotypes have enhanced power to capture protective genetic associations relative to the previously studied phenotypes. Thus, we conclude that continued large-scale ascertainment of deep COVID-19 phenotype data would likely represent a boon for COVID-19 therapeutic target identification.


Subjects
COVID-19; Genome-Wide Association Study; COVID-19/genetics; Genetic Predisposition to Disease; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics
3.
Nat Commun ; 12(1): 6442, 2021 11 08.
Article in English | MEDLINE | ID: mdl-34750360

ABSTRACT

The genetic architecture of atrial fibrillation (AF) encompasses low impact, common genetic variants and high impact, rare variants. Here, we characterize a high impact AF-susceptibility allele, KCNQ1 R231H, and describe its transcontinental geographic distribution and history. Induced pluripotent stem cell-derived cardiomyocytes procured from risk allele carriers exhibit abbreviated action potential duration, consistent with a gain-of-function effect. Using identity-by-descent (IBD) networks, we estimate the broad- and fine-scale population ancestry of risk allele carriers and their relatives. Analysis of ancestral migration routes reveals ancestors who inhabited Denmark in the 1700s, migrated to the Northeastern United States in the early 1800s, and traveled across the Midwest to arrive in Utah in the late 1800s. IBD/coalescent-based allele dating analysis reveals a relatively recent origin of the AF risk allele (~5000 years). Thus, our approach broadens the scope of study for disease susceptibility alleles to the context of human migration and ancestral origins.


Subjects
Atrial Fibrillation/genetics; Genetic Predisposition to Disease/genetics; KCNQ1 Potassium Channel/genetics; Mutation, Missense; Polymorphism, Single Nucleotide; Action Potentials; Alleles; Denmark; Emigrants and Immigrants; Female; Genotype; Geography; Humans; Induced Pluripotent Stem Cells/cytology; Induced Pluripotent Stem Cells/metabolism; Male; Middle Aged; Myocytes, Cardiac/cytology; Myocytes, Cardiac/metabolism; Myocytes, Cardiac/physiology; Pedigree; Risk Factors; Utah
4.
BMC Bioinformatics ; 22(1): 459, 2021 Sep 25.
Article in English | MEDLINE | ID: mdl-34563119

ABSTRACT

BACKGROUND: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual's ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. RESULTS: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPU). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. CONCLUSIONS: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.


Subjects
Genetics, Population; Genome, Human; Haplotypes; Humans; Polymorphism, Single Nucleotide
5.
G3 (Bethesda) ; 9(9): 2863-2878, 2019 09 04.
Article in English | MEDLINE | ID: mdl-31484785

ABSTRACT

We present a massive investigation into the genetic basis of human lifespan. Beginning with a genome-wide association (GWA) study using a de-identified snapshot of the unique AncestryDNA database - more than 300,000 genotyped individuals linked to pedigrees of over 400,000,000 people - we mapped six genome-wide significant loci associated with parental lifespan. We compared these results to a GWA analysis of the traditional lifespan proxy trait, age, and found only one locus, APOE, to be associated with both age and lifespan. By combining the AncestryDNA results with those of an independent UK Biobank dataset, we conducted a meta-analysis of more than 650,000 individuals and identified fifteen parental lifespan-associated loci. Beyond just those significant loci, our genome-wide set of polymorphisms accounts for up to 8% of the variance in human lifespan; this value represents a large fraction of the heritability estimated from phenotypic correlations between relatives.


Subjects
Genome-Wide Association Study/methods; Longevity/genetics; Aged; Aged, 80 and over; Apolipoproteins E/genetics; Carrier Proteins/genetics; Databases, Genetic; Female; Humans; Male; Nuclear Proteins/genetics; Pedigree; Polymorphism, Single Nucleotide; Prospective Studies; Proto-Oncogene Proteins/genetics
6.
Nat Commun ; 8: 14238, 2017 02 07.
Article in English | MEDLINE | ID: mdl-28169989

ABSTRACT

Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.


Subjects
Demography/statistics & numerical data; Genetics, Population/methods; Population Dynamics/trends; Population/genetics; Cluster Analysis; Demography/methods; Emigrants and Immigrants; Gene Flow/genetics; Genotyping Techniques; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide; Population Dynamics/statistics & numerical data; Sequence Analysis, DNA; United States/ethnology
7.
Nucleic Acids Res ; 42(Database issue): D677-84, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24285306

ABSTRACT

PortEco (http://porteco.org) aims to collect, curate and provide data and analysis tools to support basic biological research in Escherichia coli (and eventually other bacterial systems). PortEco is implemented as a 'virtual' model organism database that provides a single unified interface to the user, while integrating information from a variety of sources. The main focus of PortEco is to enable broad use of the growing number of high-throughput experiments available for E. coli, and to leverage community annotation through the EcoliWiki and GONUTS systems. Currently, PortEco includes curated data from hundreds of genome-wide RNA expression studies, from high-throughput phenotyping of single-gene knockouts under hundreds of annotated conditions, from chromatin immunoprecipitation experiments for tens of different DNA-binding factors and from ribosome profiling experiments that yield insights into protein expression. Conditions have been annotated with a consistent vocabulary, and data have been consistently normalized to enable users to find, compare and interpret relevant experiments. PortEco includes tools for data analysis, including clustering, enrichment analysis and exploration via genome browsers. PortEco search and data analysis tools are extensively linked to the curated gene, metabolic pathway and regulation content at its sister site, EcoCyc.


Subjects
Databases, Genetic; Escherichia coli/genetics; Alleles; DNA-Binding Proteins/metabolism; Escherichia coli/metabolism; Escherichia coli Proteins/metabolism; Genes, Bacterial; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Internet; Phenotype; RNA, Messenger/metabolism; Ribosomes/metabolism; Software
8.
Methods Mol Biol ; 719: 31-69, 2011.
Article in English | MEDLINE | ID: mdl-21370078

ABSTRACT

To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.


Subjects
Computational Biology/standards; Information Dissemination/methods; Computational Biology/methods; Delivery of Health Care/standards; Humans; Reference Standards; Research Design/standards
9.
Bioinformatics ; 26(19): 2470-1, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20733062

ABSTRACT

Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. AVAILABILITY AND IMPLEMENTATION: Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows.


Subjects
Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Software; Computational Biology/methods; Databases, Factual; Molecular Sequence Annotation; User-Computer Interface
10.
Tuberculosis (Edinb) ; 90(4): 225-35, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20488753

ABSTRACT

The Tuberculosis Database (TBDB) is an online database providing integrated access to genome sequence, expression data and literature curation for TB. TBDB currently houses genome assemblies for numerous strains of Mycobacterium tuberculosis (MTB) as well as assemblies for over 20 strains related to MTB and useful for comparative analysis. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives, including over 3000 MTB microarrays, 95 RT-PCR datasets, 2700 microarrays for human and mouse TB related experiments, and 260 arrays for Streptomyces coelicolor. To enable wide use of these data, TBDB provides a suite of tools for searching, browsing, analyzing, and downloading the data. We provide here an overview of TBDB focusing on recent data releases and enhancements. In particular, we describe the recent release of a Global Genetic Diversity dataset for TB, support for short-read re-sequencing data, new tools for exploring gene expression data in the context of gene regulation, and the integration of a metabolic network reconstruction and BioCyc with TBDB. By integrating a wide range of genomic data with tools for their use, TBDB is a unique platform for both basic science research in TB, as well as research into the discovery and development of TB drugs, vaccines and biomarkers.


Subjects
Databases, Genetic; Mycobacterium tuberculosis/genetics; Tuberculosis/microbiology; Databases, Genetic/trends; Gene Expression Regulation, Bacterial; Genetic Variation; Genome, Bacterial; Genomic Library; Genomics/methods; Humans; Metabolic Networks and Pathways/genetics; Mycobacterium tuberculosis/metabolism; Online Systems
11.
Nat Genet ; 41(2): 149-55, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19174838

ABSTRACT

Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005-2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. More strict publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.


Subjects
Databases, Genetic; Gene Expression Profiling/standards; Oligonucleotide Array Sequence Analysis/standards; Peer Review, Research; Animals; Data Interpretation, Statistical; Genome-Wide Association Study/standards; Humans; Publications/standards; Reproducibility of Results
12.
Nucleic Acids Res ; 37(Database issue): D499-508, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18835847

ABSTRACT

The effective control of tuberculosis (TB) has been thwarted by the need for prolonged, complex and potentially toxic drug regimens, by reliance on an inefficient vaccine and by the absence of biomarkers of clinical status. The promise of the genomics era for TB control is substantial, but has been hindered by the lack of a central repository that collects and integrates genomic and experimental data about this organism in a way that can be readily accessed and analyzed. The Tuberculosis Database (TBDB) is an integrated database providing access to TB genomic data and resources, relevant to the discovery and development of TB drugs, vaccines and biomarkers. The current release of TBDB houses genome sequence data and annotations for 28 different Mycobacterium tuberculosis strains and related bacteria. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives. TBDB currently hosts data for nearly 1500 public tuberculosis microarrays and 260 arrays for Streptomyces. In addition, TBDB provides access to a suite of comparative genomics and microarray analysis software. By bringing together M. tuberculosis genome annotation and gene-expression data with a suite of analysis tools, TBDB (http://www.tbdb.org/) provides a unique discovery platform for TB research.


Subjects
Databases, Genetic; Mycobacterium tuberculosis/genetics; Tuberculosis/microbiology; Biomedical Research; Computer Graphics; Gene Expression; Genome, Bacterial; Genomics; Humans; Mycobacterium tuberculosis/metabolism; Systems Integration; Tuberculosis/diagnosis; Tuberculosis/drug therapy
13.
Nucleic Acids Res ; 37(Database issue): D898-901, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18953035

ABSTRACT

Hundreds of researchers across the world use the Stanford Microarray Database (SMD; http://smd.stanford.edu/) to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD also provides access to public data, and tools with which to analyze those data, to any public user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD has been limited by available engineering resources, and in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for the purposes of publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package into the SMD code base. This extension is available with the SMD source code that is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with an enriched data analysis capability.


Subjects
Databases, Genetic; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Animals; Humans; Mice; Software
15.
Neuroinformatics ; 6(2): 117-21, 2008.
Article in English | MEDLINE | ID: mdl-18473189

ABSTRACT

Molecular biology and genomics have made notable strides in the sharing of primary data and resources. In other domains of neuroscience research, however, there has been resistance to adopting formalized strategies for data exchange, archiving, and availability. In this article, we discuss how neuroscience domains might follow the lead of molecular biology on what has been successful and what has failed in active data sharing. This considers not only the technical challenges but also the sociological concerns in making it possible. Though the process is not pain-free, with increased data availability scientists from multiple fields can enjoy greater opportunity for novel discoveries about the brain in health and disease.


Subjects
Access to Information; Computational Biology/trends; Databases, Factual/trends; Databases, Genetic/trends; Neurosciences/trends; Animals; Computational Biology/ethics; Computational Biology/standards; Databases, Factual/ethics; Databases, Factual/standards; Databases, Genetic/ethics; Databases, Genetic/standards; Genomics/ethics; Genomics/standards; Genomics/trends; Humans; Imaging, Three-Dimensional/ethics; Imaging, Three-Dimensional/standards; Imaging, Three-Dimensional/trends; Interdisciplinary Communication; Meta-Analysis as Topic; Neurosciences/ethics; Neurosciences/standards
16.
Nat Biotechnol ; 26(3): 305-12, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18327244

ABSTRACT

One purpose of the biomedical literature is to report results in sufficient detail that the methods of data collection and analysis can be independently replicated and verified. Here we present reporting guidelines for gene expression localization experiments: the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). MISFISHIE is modeled after the Minimum Information About a Microarray Experiment (MIAME) specification for microarray experiments. Both guidelines define what information should be reported without dictating a format for encoding that information. MISFISHIE describes six types of information to be provided for each experiment: experimental design, biomaterials and treatments, reporters, staining, imaging data and image characterizations. This specification has benefited the consortium within which it was developed and is expected to benefit the wider research community. We welcome feedback from the scientific community to help improve our proposal.


Subjects
Immunohistochemistry/standards; In Situ Hybridization/standards; Computational Biology/methods; Computational Biology/standards; Gene Expression Profiling/methods; Gene Expression Profiling/standards; Immunohistochemistry/methods; In Situ Hybridization/methods
17.
BMC Bioinformatics ; 9: 28, 2008 Jan 18.
Article in English | MEDLINE | ID: mdl-18205924

ABSTRACT

BACKGROUND: MAGE-ML has been promoted as a standard format for describing microarray experiments and the data they produce. Two characteristics of the MAGE-ML format compromise its use as a universal standard: First, MAGE-ML files are exceptionally large - too large to be easily read by most people, and often too large to be read by most software programs. Second, the MAGE-ML standard permits many ways of representing the same information. As a result, different producers of MAGE-ML create different documents describing the same experiment and its data. Recognizing all the variants is an unwieldy software engineering task, resulting in software packages that can read and process MAGE-ML from some, but not all producers. This Tower of MAGE-ML Babel bars the unencumbered exchange of microarray experiment descriptions couched in MAGE-ML. RESULTS: We have developed XBabelPhish - an XQuery-based technology for translating one MAGE-ML variant into another. XBabelPhish's use is not restricted to translating MAGE-ML documents. It can transform XML files independent of their DTD, XML schema, or semantic content. Moreover, it is designed to work on very large (> 200 Mb.) files, which are common in the world of MAGE-ML. CONCLUSION: XBabelPhish provides a way to inter-translate MAGE-ML variants for improved interchange of microarray experiment information. More generally, it can be used to transform most XML files, including very large ones that exceed the capacity of most XML tools.
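
To make the idea of translating one XML variant into another without loading the whole file concrete, here is a minimal sketch. It is not XBabelPhish and does not use XQuery; it swaps in Python's event-driven SAX parser, which hands over one element at a time, and simply renames elements according to a mapping. The class name `TagRenamer`, the helper `translate`, and the element names in the example are all hypothetical and are not taken from the MAGE-ML specification.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class TagRenamer(XMLGenerator):
    """Re-emit an XML stream, renaming elements according to tag_map."""

    def __init__(self, out, tag_map):
        super().__init__(out, encoding="utf-8")
        self._tag_map = tag_map

    def startElement(self, name, attrs):
        # Unmapped element names pass through unchanged.
        super().startElement(self._tag_map.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self._tag_map.get(name, name))


def translate(xml_text, tag_map):
    """Return xml_text with element names rewritten per tag_map."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), TagRenamer(out, tag_map))
    return out.getvalue()


# Hypothetical element names, invented for illustration only.
variant_a = '<Experiment><BioAssay id="a1">scan</BioAssay></Experiment>'
print(translate(variant_a, {"BioAssay": "Assay"}))
```

For a file of several hundred megabytes, the same handler could be passed to `xml.sax.parse` with a file path and an open output file, so that neither the input nor the output is held in memory as a single string.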


Subjects
Database Management Systems; Hypermedia; User-Computer Interface; Animals; Gene Expression Profiling/methods; Humans; Information Storage and Retrieval/methods; Oligonucleotide Array Sequence Analysis/methods; Systems Integration; Work Simplification
18.
Nucleic Acids Res ; 36(Database issue): D871-7, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17989087

ABSTRACT

The Stanford Tissue Microarray Database (TMAD; http://tma.stanford.edu) is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license.


Subjects
Databases, Genetic; Immunohistochemistry; In Situ Hybridization; Tissue Array Analysis; Humans; Internet; Proteins/analysis; RNA, Messenger/analysis; Software; User-Computer Interface
19.
Nat Biotechnol ; 25(10): 1127-33, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17921998

ABSTRACT

The Functional Genomics Experiment data model (FuGE) has been developed to facilitate convergence of data standards for high-throughput, comprehensive analyses in biology. FuGE models the components of an experimental activity that are common across different technologies, including protocols, samples and data. FuGE provides a foundation for describing entire laboratory workflows and for the development of new data formats. The Microarray Gene Expression Data society and the Proteomics Standards Initiative have committed to using FuGE as the basis for defining their respective standards, and other standards groups, including the Metabolomics Standards Initiative, are evaluating FuGE in their development efforts. Adoption of FuGE by multiple standards bodies will enable uniform reporting of common parts of functional genomics workflows, simplify data-integration efforts and ease the burden on researchers seeking to fulfill multiple minimum reporting requirements. Such advances are important for transparent data management and mining in functional genomics and systems biology.


Subjects
Computational Biology; Computer Simulation/standards; Genomics/standards; Models, Biological; Oligonucleotide Array Sequence Analysis/standards; Proteomics/standards; Databases, Factual
20.
BMC Bioinformatics ; 8: 338, 2007 Sep 13.
Article in English | MEDLINE | ID: mdl-17854506

ABSTRACT

BACKGROUND: Biomedical ontologies are being widely used to annotate biological data in a computer-accessible, consistent and well-defined manner. However, due to their size and complexity, annotating data with appropriate terms from an ontology is often challenging for experts and non-experts alike, because there exist few tools that allow one to quickly find relevant ontology terms to easily populate a web form. RESULTS: We have produced a tool, OntologyWidget, which allows users to rapidly search for and browse ontology terms. OntologyWidget can easily be embedded in other web-based applications. OntologyWidget is written using AJAX (Asynchronous JavaScript and XML) and has two related elements. The first is a dynamic auto-complete ontology search feature. As a user enters characters into the search box, the appropriate ontology is queried remotely for terms that match the typed-in text, and the query results populate a drop-down list with all potential matches. Upon selection of a term from the list, the user can locate this term within a generic and dynamic ontology browser, which comprises the second element of the tool. The ontology browser shows the paths from a selected term to the root as well as parent/child tree hierarchies. We have implemented web services at the Stanford Microarray Database (SMD), which provide the OntologyWidget with access to over 40 ontologies from the Open Biological Ontology (OBO) website. Each ontology is updated weekly. Adopters of the OntologyWidget can either use SMD's web services, or elect to rely on their own. Deploying the OntologyWidget can be accomplished in three simple steps: (1) install Apache Tomcat on one's web server, (2) download and install the OntologyWidget servlet stub that provides access to the SMD ontology web services, and (3) create an html (HyperText Markup Language) file that refers to the OntologyWidget using a simple, well-defined format.
CONCLUSION: We have developed OntologyWidget, an easy-to-use ontology search and display tool that can be used on any web page by creating a simple html description. OntologyWidget provides a rapid auto-complete search function paired with an interactive tree display. We have developed a web service layer that communicates between the web page interface and a database of ontology terms. We currently store 40 of the ontologies from the OBO website, as well as several others. These ontologies are automatically updated on a weekly basis. OntologyWidget can be used in any web-based application to take advantage of the ontologies we provide via web services or any other ontology that is provided elsewhere in the correct format. The full source code for the JavaScript and description of the OntologyWidget is available from http://smd.stanford.edu/ontologyWidget/.
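
The two elements described in this abstract, an auto-complete term search and a browser that shows the path from a term to the root along with parent/child relations, can be sketched in a few lines. This is an illustrative toy in Python, not the OntologyWidget source (which is JavaScript backed by web services); the function names and the tiny ontology are hypothetical, and each term is assumed to have a single parent, whereas real ontologies often allow several.

```python
# Hypothetical toy ontology: term -> parent term (None marks the root).
PARENT = {
    "anatomical entity": None,
    "organ": "anatomical entity",
    "heart": "organ",
    "heart valve": "heart",
    "liver": "organ",
}


def autocomplete(typed, terms, limit=10):
    """Return up to `limit` terms containing the typed text, case-insensitively."""
    needle = typed.strip().lower()
    if not needle:
        return []
    return sorted(t for t in terms if needle in t.lower())[:limit]


def path_to_root(term, parent):
    """Walk parent links from `term` up to the root term."""
    path = [term]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def children(term, parent):
    """List the direct children of `term`."""
    return sorted(t for t, p in parent.items() if p == term)


print(autocomplete("hea", PARENT))         # ['heart', 'heart valve']
print(path_to_root("heart valve", PARENT)) # ['heart valve', 'heart', 'organ', 'anatomical entity']
print(children("organ", PARENT))           # ['heart', 'liver']
```

In a deployment like the one the abstract describes, the matching would run on the server behind a web service and the browser would request it as the user types; the in-memory dictionary here only stands in for that database of ontology terms.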


Subjects
Computational Biology/methods; Software; Terminology as Topic; Programming Languages; User-Computer Interface; Vocabulary, Controlled