Results 1 - 6 of 6
1.
BMC Bioinformatics ; 5: 146, 2004 Oct 07.
Article in English | MEDLINE | ID: mdl-15471541

ABSTRACT

BACKGROUND: Researchers who use MEDLINE for text mining, information extraction, or natural language processing may benefit from having a copy of MEDLINE that they can manage locally. The National Library of Medicine (NLM) distributes MEDLINE in eXtensible Markup Language (XML)-formatted text files, but it is difficult to query MEDLINE in that format. We have developed software tools to parse the MEDLINE data files and load their contents into a relational database. Although the task is conceptually straightforward, the size and scope of MEDLINE make the task nontrivial. Given the increasing importance of text analysis in biology and medicine, we believe a local installation of MEDLINE will provide helpful computing infrastructure for researchers.
RESULTS: We developed three software packages that parse and load MEDLINE, and ran each package to install separate instances of the MEDLINE database. For each installation, we collected data on loading time and disk-space utilization to provide examples of the process in different settings. Settings differed in terms of commercial database-management system (IBM DB2 or Oracle 9i), processor (Intel or Sun), programming language of installation software (Java or Perl), and methods employed in different versions of the software. The loading times for the three installations were 76 hours, 196 hours, and 132 hours, and disk-space utilization was 46.3 GB, 37.7 GB, and 31.6 GB, respectively. Loading times varied due to a variety of differences among the systems. Loading time also depended on whether data were written to intermediate files or not, and on whether input files were processed in sequence or in parallel. Disk-space utilization depended on the number of MEDLINE files processed, amount of indexing, and whether abstracts were stored as character large objects or truncated.
CONCLUSIONS: Relational database (RDBMS) technology supports indexing and querying of very large datasets, and can accommodate a locally stored version of MEDLINE. RDBMS systems support a wide range of queries and facilitate certain tasks that are not directly supported by the application programming interface to PubMed. Because there is variation in hardware, software, and network infrastructures across sites, we cannot predict the exact time required for a user to load MEDLINE, but our results suggest that performance of the software is reasonable. Our database schemas and conversion software are publicly available at http://biotext.berkeley.edu.
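
As a rough illustration of the parse-and-load step described in this abstract, the following Python sketch streams citation records out of a MEDLINE-style XML file and inserts them into a relational table. It is not the authors' software (which was written in Java or Perl): SQLite stands in for IBM DB2 or Oracle 9i, the `citation` table and the function name `load_medline` are hypothetical, the sample data is made up, and only three fields are extracted from each `MedlineCitation` element.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical one-table schema; a real installation would use many tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS citation (
    pmid     INTEGER PRIMARY KEY,
    title    TEXT,
    abstract TEXT
)
"""

def load_medline(xml_source, conn):
    """Stream MedlineCitation records from xml_source into the citation table."""
    conn.execute(SCHEMA)
    loaded = 0
    # iterparse reads incrementally, so a large file is never fully in memory.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "MedlineCitation":
            continue
        pmid = int(elem.findtext("PMID"))
        title = elem.findtext("Article/ArticleTitle")
        abstract = elem.findtext("Article/Abstract/AbstractText")  # may be None
        conn.execute(
            "INSERT OR REPLACE INTO citation (pmid, title, abstract) VALUES (?, ?, ?)",
            (pmid, title, abstract),
        )
        loaded += 1
        elem.clear()  # release the children of the processed record
    conn.commit()
    return loaded

# Tiny made-up sample in the general shape of a MEDLINE citation set.
SAMPLE = b"""<MedlineCitationSet>
  <MedlineCitation>
    <PMID>1</PMID>
    <Article>
      <ArticleTitle>First example title</ArticleTitle>
      <Abstract><AbstractText>Example abstract text.</AbstractText></Abstract>
    </Article>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>2</PMID>
    <Article><ArticleTitle>Second example title</ArticleTitle></Article>
  </MedlineCitation>
</MedlineCitationSet>"""

conn = sqlite3.connect(":memory:")
count = load_medline(io.BytesIO(SAMPLE), conn)
rows = conn.execute("SELECT pmid, title FROM citation ORDER BY pmid").fetchall()
```

Once records are in a table like this, ordinary SQL (for example, a `LIKE` search over `abstract`) can be run locally. Processing input files one at a time, as here, corresponds to the "in sequence" setting mentioned in the abstract.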


Subjects
MEDLINE; Software Design; Database Management Systems; Databases, Bibliographic; Software; Software Validation; User-Computer Interface
3.
Bioinformatics ; 18 Suppl 1: S207-15, 2002.
Article in English | MEDLINE | ID: mdl-12169549

ABSTRACT

MOTIVATION: The information model chosen to store biological data affects the types of queries possible, database performance, and difficulty in updating that information model. Genetic sequence data for pharmacogenetics studies can be complex, and the best information model to use may change over time. As experimental and analytical methods change, and as biological knowledge advances, the data storage requirements and types of queries needed may also change.
RESULTS: We developed a model for genetic sequence and polymorphism data, and used XML Schema to specify the elements and attributes required for this model. We implemented this model as an ontology in a frame-based representation and as a relational model in a database system. We collected genetic data from two pharmacogenetics resequencing studies, and formulated queries useful for analysing these data. We compared the ontology and relational models in terms of query complexity, performance, and difficulty in changing the information model. Our results demonstrate benefits of evolving the schema for storing pharmacogenetics data: ontologies perform well in early design stages as the information model changes rapidly and simplify query formulation, while relational models offer improved query speed once the information model and types of queries needed stabilize.
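
The trade-off described in this abstract can be pictured with a small Python sketch that stores the same polymorphism data two ways and asks the same question of each. This is only an illustration under assumptions: the dict-based "frames", the `gene` and `polymorphism` tables, the gene symbols, and both query functions are hypothetical and are not the authors' model or schema.

```python
import sqlite3

# Frame-style representation: each frame is a dict of slots, and a slot may
# hold references to other frames (hypothetical gene and variant values).
gene_a = {"class": "Gene", "symbol": "GENE_A", "polymorphisms": []}
gene_b = {"class": "Gene", "symbol": "GENE_B", "polymorphisms": []}
for gene, position, ref, alt in [
    (gene_a, 101, "A", "G"),
    (gene_a, 250, "C", "T"),
    (gene_b, 77, "G", "A"),
]:
    gene["polymorphisms"].append(
        {"class": "Polymorphism", "position": position, "ref": ref, "alt": alt}
    )

def frame_query(gene):
    """Follow the slot directly; the query is short and needs no join."""
    return sorted(p["position"] for p in gene["polymorphisms"])

# Relational representation of the same data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT UNIQUE);
CREATE TABLE polymorphism (
    poly_id  INTEGER PRIMARY KEY,
    gene_id  INTEGER REFERENCES gene(gene_id),
    position INTEGER,
    ref      TEXT,
    alt      TEXT
);
CREATE INDEX idx_poly_gene ON polymorphism (gene_id);
""")
for gene in (gene_a, gene_b):
    cur = conn.execute("INSERT INTO gene (symbol) VALUES (?)", (gene["symbol"],))
    for p in gene["polymorphisms"]:
        conn.execute(
            "INSERT INTO polymorphism (gene_id, position, ref, alt) VALUES (?, ?, ?, ?)",
            (cur.lastrowid, p["position"], p["ref"], p["alt"]),
        )

def relational_query(symbol):
    """The same question needs a join, but it can use the index on gene_id."""
    rows = conn.execute(
        "SELECT p.position FROM polymorphism p "
        "JOIN gene g ON g.gene_id = p.gene_id "
        "WHERE g.symbol = ? ORDER BY p.position",
        (symbol,),
    )
    return [row[0] for row in rows]
```

Adding a new slot to a frame is a one-line change to a dict, whereas the relational version needs a schema change. That asymmetry is the kind of "difficulty in changing the information model" the abstract compares, though this toy example says nothing about the measured performance.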


Subjects
Database Management Systems; Databases, Genetic; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Models, Genetic; Pharmacogenetics/methods; Sequence Analysis, DNA/methods; Algorithms; Hypermedia; Sequence Alignment/methods
4.
Pac Symp Biocomput ; : 65-76, 2002.
Article in English | MEDLINE | ID: mdl-11928517

ABSTRACT

Research directed toward discovering how genetic factors influence a patient's response to drugs requires coordination of data produced from laboratory experiments, computational methods, and clinical studies. A public repository of pharmacogenetic data should accelerate progress in the field of pharmacogenetics by organizing and disseminating public datasets. We are developing a pharmacogenetics knowledge base (PharmGKB) to support the storage and retrieval of both experimental data and conceptual knowledge. PharmGKB is an Internet-based resource that integrates complex biological, pharmacological, and clinical data in such a way that researchers can submit their data and users can retrieve information to investigate genotype-phenotype correlations. Successful management of the names, meaning, and organization of concepts used within the system is crucial. We have selected a frame-based knowledge-representation system for development of an ontology of concepts and relationships that represent the domain and that permit storage of experimental data. Preliminary experience shows that the ontology we have developed for gene-sequence data allows us to accept, store, and query data submissions.


Subjects
Databases, Factual; Drug Therapy; Pharmacogenetics/methods; Antineoplastic Agents/classification; Antineoplastic Agents/therapeutic use; Carrier Proteins/genetics; Enzymes/genetics; Enzymes/metabolism; Humans; Internet
5.
Pac Symp Biocomput ; : 88-99, 2002.
Article in English | MEDLINE | ID: mdl-11928521

ABSTRACT

Ontologies are useful for organizing large numbers of concepts having complex relationships, such as the breadth of genetic and clinical knowledge in pharmacogenomics. But because ontologies change and knowledge evolves, it is time consuming to maintain stable mappings to external data sources that are in relational format. We propose a method for interfacing ontology models with data acquisition from external relational data sources. This method uses a declarative interface between the ontology and the data source, and this interface is modeled in the ontology and implemented using XML schema. Data is imported from the relational source into the ontology using XML, and data integrity is checked by validating the XML submission with an XML schema. We have implemented this approach in PharmGKB (http://www.pharmgkb.org/), a pharmacogenetics knowledge base. Our goals were to (1) import genetic sequence data, collected in relational format, into the pharmacogenetics ontology, and (2) automate the process of updating the links between the ontology and data acquisition when the ontology changes. We tested our approach by linking PharmGKB with data acquisition from a relational model of genetic sequence information. The ontology subsequently evolved, and we were able to rapidly update our interface with the external data and continue acquiring the data. Similar approaches may be helpful for integrating other heterogeneous information sources in order to make the diversity of pharmacogenetics data amenable to computational analysis.
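
A minimal Python sketch of the declarative-interface idea in this abstract follows. It is an assumption-laden illustration, not the PharmGKB implementation: the `MAPPING` table, the slot and column names, and all three functions are hypothetical, and a hand-written check stands in for XML Schema validation because the Python standard library does not include an XML Schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative mapping: ontology slot -> (relational column, type).
# When the ontology changes, only this table needs to be edited.
MAPPING = {
    "geneSymbol": ("gene_symbol", str),
    "position": ("pos", int),
    "allele": ("allele", str),
}

def row_to_xml(row):
    """Export one relational row (a dict) as an XML submission element."""
    submission = ET.Element("submission")
    for slot, (column, _type) in MAPPING.items():
        ET.SubElement(submission, slot).text = str(row[column])
    return submission

def validate(submission):
    """Return a list of problems; an empty list means the submission is valid."""
    problems = []
    for slot, (_column, slot_type) in MAPPING.items():
        text = submission.findtext(slot)
        if text is None:
            problems.append(f"missing element: {slot}")
            continue
        try:
            slot_type(text)
        except ValueError:
            problems.append(f"bad value for {slot}: {text!r}")
    unknown = [child.tag for child in submission if child.tag not in MAPPING]
    problems.extend(f"unexpected element: {tag}" for tag in unknown)
    return problems

def import_submission(submission):
    """Convert a validated submission into an ontology instance (a dict of slots)."""
    problems = validate(submission)
    if problems:
        raise ValueError("; ".join(problems))
    return {slot: slot_type(submission.findtext(slot))
            for slot, (_column, slot_type) in MAPPING.items()}

good = row_to_xml({"gene_symbol": "GENE_A", "pos": 101, "allele": "G"})
instance = import_submission(good)

bad = ET.fromstring("<submission><geneSymbol>GENE_A</geneSymbol>"
                    "<position>abc</position></submission>")
bad_problems = validate(bad)
```

Because export, validation, and import all read the same mapping, renaming or adding a slot means editing one entry rather than three pieces of code, which is the sort of rapid interface update the abstract reports.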


Subjects
Databases, Factual; Databases, Genetic; Pharmacogenetics/methods; Drug Therapy/methods; Humans; Polymerase Chain Reaction/methods; Reproducibility of Results; Sequence Analysis/methods
6.
Nucleic Acids Res ; 30(1): 163-5, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752281

ABSTRACT

The Pharmacogenetics Knowledge Base (PharmGKB; http://www.pharmgkb.org/) contains genomic, phenotype and clinical information collected from ongoing pharmacogenetic studies. Tools to browse, query, download, submit, edit and process the information are available to registered research network members. A subset of the tools is publicly available. PharmGKB currently contains over 150 genes under study, 14 Coriell populations and a large ontology of pharmacogenetics concepts. The pharmacogenetic concepts and the experimental data are interconnected by a set of relations to form a knowledge base of information for pharmacogenetic researchers. The information in PharmGKB, and its associated tools for processing that information, are tailored for leading-edge pharmacogenetics research. The PharmGKB project was initiated in April 2000 and the first version of the knowledge base went online in February 2001.


Subjects
Biotransformation/genetics; Databases, Factual; Pharmacogenetics; Database Management Systems; Drug Therapy; Enzymes/metabolism; Genetic Variation; Genome, Human; Humans; Information Storage and Retrieval; Internet; Pharmaceutical Preparations/metabolism; Phenotype; Polymorphism, Single Nucleotide; User-Computer Interface