1.
Bioinformatics ; 14(1): 2-13, 1998.
Article in English | MEDLINE | ID: mdl-9520496

ABSTRACT

MOTIVATION: To make effective use of the vast amounts of expressed sequence tag (EST) sequence data generated by the Merck-sponsored EST project and other similar efforts, sequences must be organized into gene classes, and scientists must be able to 'mine' the gene class data in the context of related genomic data. RESULTS: This paper presents the Merck Gene Index browser, an easily extensible, World Wide Web-based system for mining the Merck Gene Index (MGI) and related genomic data. The MGI is a non-redundant set of clones and sequences, each representing a distinct gene, constructed from all high-quality 3' EST sequences generated by the Merck-sponsored EST project. The MGI browser integrates data from a variety of sources and storage formats, both local and remote, using an eclectic integration strategy, including a federation of relational databases, a local data warehouse and simple hypertext links. Data currently integrated include: LENS cDNA clone and EST data, dbEST protein and non-EST nucleic acid similarity data, WashU sequence chromatograms, Entrez sequence and Medline entries, and UniGene gene clusters. Flatfile sequence data are accessed using the Bioapps server, an internally developed client-server system that supports generic sequence analysis applications. Browser data are retrieved and formatted by means of the Bioinformatics Data Integration Toolkit (B-DIT), a new suite of Perl routines.


Subjects
Abstracting and Indexing , DNA, Complementary , Database Management Systems , Genes , Algorithms , Computer Communication Networks , Computer Systems , Gene Expression Regulation , Humans , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid , Software
2.
Genome Res ; 6(9): 829-45, 1996 Sep.
Article in English | MEDLINE | ID: mdl-8889550

ABSTRACT

A rigorous analysis of the Merck-sponsored EST data with respect to known gene sequences increases the utility of the data set and helps refine methods for building a gene index. A highly curated human transcript data base was used as a reference data set of known genes. A detailed analysis of EST sequences derived from known genes was performed to assess the accuracy of EST sequence annotation. The EST data was screened to remove low-quality and low-complexity sequences. A set of high-quality ESTs similar to the transcript data base was identified using BLAST; this subset of ESTs was compared with the set of known genes using the Smith-Waterman algorithm. Error rates of several types were assessed based on a flexible match criterion defining sequence identity. The rate of lane-tracking errors is very low, approximately 0.5%. Insert size data is accurate within approximately 20%. Reversed clone and internal priming error rates are approximately 5% and 2.5%, respectively, contributing to the incorrect identification of reads as 3' ends of genes. Follow-up investigation reveals that a significant number of clones, miscategorized as reversed, represent overlapping genes on the opposite strand of entries in the transcript data base. Relevance of these results to the creation of a high-quality index to the human genome capable of supporting diverse genomic investigations is discussed.


Subjects
Base Sequence , Chromosome Mapping , Databases, Factual , Genome, Human , Sequence Tagged Sites , Algorithms , Chimera , Cloning, Molecular , Female , Humans , Infant , Reproducibility of Results , Transcription, Genetic
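
The abstract above compares a BLAST-filtered subset of ESTs against known genes using the Smith-Waterman algorithm. A minimal sketch of Smith-Waterman local alignment scoring follows. The function name, the linear gap penalty, and the match, mismatch, and gap values are illustrative assumptions; the abstract does not state the scoring scheme or match criterion the authors used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Assumed scoring (not from the paper): +2 match, -1 mismatch,
    -2 per gap position (linear gap penalty).
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical EST read and reference transcript fragment.
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Only two rows of the matrix are kept because the sketch reports a score, not the alignment itself; recovering the aligned region would require storing the full matrix and tracing back from the best cell.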
3.
Rapid Commun Mass Spectrom ; 9(15): 1546-51, 1995.
Article in English | MEDLINE | ID: mdl-8652878

ABSTRACT

The analysis of matrix-assisted laser desorption ionization post-source decay (MALDI-PSD) mass spectra of peptides by using the cross-correlation method for database searching is illustrated. MALDI-PSD mass spectra are shown to contain sufficient fragmentation information to uniquely identify the correct amino acid sequence from large protein databases (approximately 160,000 entries). A search employing the MALDI-PSD mass spectrum of a phosphorylated peptide that correctly identifies the amino acid sequence and the site of phosphorylation is also illustrated.


Subjects
Databases, Factual , Gas Chromatography-Mass Spectrometry/methods , Peptides/analysis , Amino Acid Sequence , Humans , Molecular Sequence Data , Phosphopeptides/analysis
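
The abstract above relies on a cross-correlation method to match a measured fragment spectrum against spectra predicted for database sequences. The sketch below shows one common way such a score can be formed: bin both spectra to integer m/z, take the correlation at zero offset, and subtract the mean correlation over nearby non-zero offsets. The function names, the unit-mass binning, and the offset window of 75 are assumptions for illustration, not details taken from the paper, and the predicted fragment masses are supplied directly instead of being derived from a peptide sequence.

```python
def bin_spectrum(peaks, size):
    """Sum intensities of (m/z, intensity) peaks into integer m/z bins."""
    bins = [0.0] * size
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx < size:
            bins[idx] += intensity
    return bins


def correlation(x, y, lag):
    """Dot product of x with y shifted by `lag` bins."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i + lag
        if 0 <= j < n:
            total += x[i] * y[j]
    return total


def xcorr_score(observed, theoretical, max_lag=75):
    """Zero-lag correlation minus the mean correlation at non-zero lags."""
    zero = correlation(observed, theoretical, 0)
    lags = [lag for lag in range(-max_lag, max_lag + 1) if lag != 0]
    background = sum(correlation(observed, theoretical, lag) for lag in lags) / len(lags)
    return zero - background


if __name__ == "__main__":
    # Hypothetical spectra: one candidate matches the observed peaks, one does not.
    observed = bin_spectrum([(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)], 400)
    good = bin_spectrum([(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)], 400)
    bad = bin_spectrum([(150.0, 1.0), (250.0, 1.0), (350.0, 1.0)], 400)
    print(xcorr_score(observed, good), xcorr_score(observed, bad))
```

In a database search, each candidate sequence would be scored this way and the candidates ranked by score; subtracting the background term penalizes candidates whose peaks align with the observed spectrum only by chance at shifted offsets.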
4.
J Comput Biol ; 1(1): 3-14, 1994.
Article in English | MEDLINE | ID: mdl-8790449

ABSTRACT

We have developed a general system, QGB, for performing complex queries on the information in the DDBJ/EMBL/GenBank databases, including queries over the structural features of sequences implied in the FEATURE TABLE. Queries are formed in a Structured Query Language (SQL)-like syntax with language extensions to support complex types (e.g., sets, ordered sets, and records) appropriate for representing and querying sequence data. A novel aspect of QGB is its ability to deduce missing features and infer relationships among features as a consequence of constructing a parse tree of sequence structure from information described in the FEATURE TABLE. The grammar for the parse tree is implemented in a customized form of the Definite Clause Grammar syntax of the logic programming language Prolog. The logic grammar formalism was chosen because it provides a perspicuous representation for features and constraints, and Prolog provides an execution model for the grammar rules. Construction of the parse tree also identifies inconsistencies and errors in the FEATURE TABLE that can in some cases be corrected automatically and used to generate an augmented version of the table.


Subjects
Base Sequence , Database Management Systems , Databases, Factual , Information Storage and Retrieval , Hemoglobins/genetics , Humans , Karyotyping , Molecular Sequence Data , Programming Languages
5.
Article in English | MEDLINE | ID: mdl-7584350

ABSTRACT

We describe various methods designed to discover knowledge in the GenBank nucleic acid sequence database. Using a grammatical model of gene structure, we create a parse tree of a gene using features listed in the FEATURE TABLE. The parse tree infers features that are not explicitly listed, but which follow from the listed features. This method discovers 30% more introns and 40% more exons when applied to a globin gene subset of GenBank. Parse tree construction also entails resolving ambiguity and inconsistency within a FEATURE TABLE. We transform the parse tree into an augmented FEATURE TABLE that represents inferred gene structure explicitly and unambiguously, thereby greatly improving the utility of the FEATURE TABLE to researchers. We then describe various analogical reasoning techniques designed to exploit the homologous nature of genes. We build a classification hierarchy that reflects the evolutionary relationship between genes. Descriptive grammars of gene classes are then induced from the instance grammars of genes. Case based reasoning techniques use these abstract gene class descriptions to predict the presence and location of regulatory features not listed in the FEATURE TABLE. A cross-validation test shows a success rate of 87% on a globin gene subset of GenBank.


Subjects
Base Sequence , Databases, Factual , Globins/genetics , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Algorithms , Artificial Intelligence , Multigene Family
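
The abstract above describes inferring features that are not listed in a FEATURE TABLE but follow from the listed ones, and detecting inconsistencies along the way. The sketch below illustrates one simple case of that idea: deriving intron coordinates from the gaps between listed exons and rejecting overlapping exons. It is a deliberately simplified stand-in, not the grammar-based parse tree the authors describe; the function name and the 1-based inclusive (start, end) tuples are assumptions.

```python
def infer_introns(exons):
    """Infer intron intervals from exon intervals of a single gene.

    exons: list of (start, end) tuples, 1-based inclusive (assumed format).
    Returns the gaps between consecutive exons as (start, end) tuples.
    Raises ValueError if two exons overlap, which this sketch treats as
    an inconsistency in the listed features.
    """
    ordered = sorted(exons)
    introns = []
    for (start1, end1), (start2, end2) in zip(ordered, ordered[1:]):
        if start2 <= end1:
            raise ValueError(
                f"overlapping exons: {(start1, end1)} and {(start2, end2)}"
            )
        if start2 - end1 > 1:  # directly adjacent exons leave no intron
            introns.append((end1 + 1, start2 - 1))
    return introns


if __name__ == "__main__":
    # Hypothetical exon coordinates for a three-exon gene.
    print(infer_introns([(1, 100), (201, 300), (451, 500)]))
    # [(101, 200), (301, 450)]
```

The inferred intervals could then be written back alongside the original entries, which is the spirit of the augmented FEATURE TABLE mentioned in the abstract.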