
Extending traditional query-based integration approaches for functional characterization of post-genomic data.

Eckman, B A; Kosky, A S; Laroco, L A.

Bioinformatics; 17(7): 587-601, 2001 Jul.

Article in English | MEDLINE | ID: mdl-11448877

ABSTRACT

MOTIVATION: To identify and characterize regions of functional interest in genomic sequence requires full, flexible query access to an integrated, up-to-date view of all related information, irrespective of where it is stored (within an organization or across the Internet) and its format (traditional database, flat file, web site, results of runtime analysis). Wide-ranging multi-source queries often return unmanageably large result sets, requiring non-traditional approaches to exclude extraneous data. RESULTS: Target Informatics Net (TINet) is a readily extensible data integration system developed at GlaxoSmithKline (GSK), based on the Object-Protocol Model (OPM) multidatabase middleware system of Gene Logic Inc. Data sources currently integrated include: the Mouse Genome Database (MGD) and Gene Expression Database (GXD), GenBank, SwissProt, PubMed, GeneCards, the results of runtime BLAST and PROSITE searches, and GSK proprietary relational databases. Special-purpose class methods used to filter and augment query results include regular expression pattern-matching over BLAST HSP alignments and retrieving partial sequences derived from primary structure annotations. All data sources and methods are accessible through an SQL-like query language or a GUI, so that when new investigations arise no additional programming beyond query specification is required. The power and flexibility of this approach are illustrated in such integrated queries as: (1) 'find homologs in genomic sequence to all novel genes cloned and reported in the scientific literature within the past three months that are linked to the MeSH term "neoplasms"'; (2) 'using a neuropeptide precursor query sequence, return only HSPs where the target genomic sequences conserve the G[KR][KR] motif at the appropriate points in the HSP alignment'; and (3) 'of the human genomic sequences annotated with exon boundaries in GenBank, return only those with valid putative donor/acceptor sites and start/stop codons'.
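Query (2) in the abstract above can be pictured with a minimal Python sketch. This is an illustration only, not TINet or OPM code: the abstract does not describe how the class methods are implemented, and the names `HSP`, `conserves_motif` and `filter_hsps` are hypothetical. It assumes each HSP is available as a pair of equal-length aligned strings with gaps written as '-', and it treats a motif as conserved only when the target matches G[KR][KR] at the same alignment columns as the query, with no gaps inside the motif.

```python
import re
from typing import List, NamedTuple

# Dibasic motif named in the abstract: G followed by two K/R residues.
MOTIF = re.compile(r"G[KR][KR]")


class HSP(NamedTuple):
    """Hypothetical container for one aligned high-scoring segment pair."""
    query: str    # aligned query sequence, gaps as '-'
    subject: str  # aligned target sequence, same length as query


def conserves_motif(hsp: HSP) -> bool:
    """True if every motif occurrence in the query is matched by the
    target at the same alignment columns (assumption: at least one
    occurrence is required, and gapped motifs are not considered)."""
    columns = [m.start() for m in MOTIF.finditer(hsp.query)]
    if not columns:
        return False
    # Pattern.match(string, pos) anchors the match at column `pos`.
    return all(MOTIF.match(hsp.subject, col) for col in columns)


def filter_hsps(hsps: List[HSP]) -> List[HSP]:
    """Keep only HSPs whose target conserves the query's motif sites."""
    return [h for h in hsps if conserves_motif(h)]


if __name__ == "__main__":
    hits = [
        HSP("MSGKRAAL", "MTGRKAAL"),  # motif conserved in target
        HSP("MSGKRAAL", "MTGKAAAL"),  # third motif residue lost
    ]
    for hit in filter_hsps(hits):
        print(hit.subject)
```

A production filter would also need to map alignment columns back to sequence coordinates and decide how to treat gaps inside the motif; both are left out of this sketch.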

Subjects

Computational Biology, Genome, Animals, DNA, Complementary/genetics, Database Management Systems, Databases as Topic, Gene Expression, Genome, Human, Humans, Internet, Mice, Phosphotransferases/genetics, Software

The Merck Gene Index browser: an extensible data integration system for gene finding, gene characterization and EST data mining.

Eckman, B A; Aaronson, J S; Borkowski, J A; Bailey, W J; Elliston, K O; Williamson, A R; Blevins, R A.

Bioinformatics; 14(1): 2-13, 1998.

Article in English | MEDLINE | ID: mdl-9520496

ABSTRACT

MOTIVATION: To make effective use of the vast amounts of expressed sequence tag (EST) sequence data generated by the Merck-sponsored EST project and other similar efforts, sequences must be organized into gene classes, and scientists must be able to 'mine' the gene class data in the context of related genomic data. RESULTS: This paper presents the Merck Gene Index browser, an easily extensible, World Wide Web-based system for mining the Merck Gene Index (MGI) and related genomic data. The MGI is a non-redundant set of clones and sequences, each representing a distinct gene, constructed from all high-quality 3' EST sequences generated by the Merck-sponsored EST project. The MGI browser integrates data from a variety of sources and storage formats, both local and remote, using an eclectic integration strategy, including a federation of relational databases, a local data warehouse and simple hypertext links. Data currently integrated include: LENS cDNA clone and EST data, dbEST protein and non-EST nucleic acid similarity data, WashU sequence chromatograms, Entrez sequence and Medline entries, and UniGene gene clusters. Flatfile sequence data are accessed using the Bioapps server, an internally developed client-server system that supports generic sequence analysis applications. Browser data are retrieved and formatted by means of the Bioinformatics Data Integration Toolkit (B-DIT), a new suite of Perl routines.

Subjects

Abstracting and Indexing, DNA, Complementary, Database Management Systems, Genes, Algorithms, Computer Communication Networks, Computer Systems, Gene Expression Regulation, Humans, Sequence Homology, Amino Acid, Sequence Homology, Nucleic Acid, Software
