1.
Bioinformatics ; 17 Suppl 1: S199-206, 2001.
Article in English | MEDLINE | ID: mdl-11473010

ABSTRACT

We present an approach to integrate physical properties of DNA, such as DNA bendability or GC content, into our probabilistic promoter recognition system McPROMOTER. In the new model, a promoter is represented as a sequence of consecutive segments represented by joint likelihoods for DNA sequence and profiles of physical properties. Sequence likelihoods are modeled with interpolated Markov chains, physical properties with Gaussian distributions. The background uses two joint sequence/profile models for coding and non-coding sequences, each consisting of a mixture of a sense and an anti-sense submodel. On a large Drosophila test set, we achieved a reduction of about 30% of false positives when compared with a model solely based on sequence likelihoods.
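
The idea of scoring a segment jointly on its sequence and on a physical-property profile can be illustrated with a minimal sketch. It is not the McPROMOTER implementation: it assumes a plain first-order Markov chain instead of an interpolated one, uses mean GC content as the only property, treats the two terms as independent, and all parameter values and names (`promoter`, `background`, `gc_mean`, `gc_std`) are hypothetical.

```python
import math


def gc_content(seq):
    """Fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in seq) / len(seq)


def gaussian_log_pdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def markov_log_likelihood(seq, initial, transition):
    """Log-likelihood of seq under a first-order Markov chain."""
    log_p = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        log_p += math.log(transition[prev][cur])
    return log_p


def joint_log_likelihood(seq, model):
    """Sequence term plus property term (independence is an assumption of this sketch)."""
    seq_term = markov_log_likelihood(seq, model["initial"], model["transition"])
    prop_term = gaussian_log_pdf(gc_content(seq), model["gc_mean"], model["gc_std"])
    return seq_term + prop_term


def uniform_chain():
    """Uninformative chain, so that only the property term separates the toy models."""
    bases = "ACGT"
    return {b: 0.25 for b in bases}, {a: {b: 0.25 for b in bases} for a in bases}


init, trans = uniform_chain()
# Hypothetical models that differ only in their GC-content distribution.
promoter = {"initial": init, "transition": trans, "gc_mean": 0.6, "gc_std": 0.1}
background = {"initial": init, "transition": trans, "gc_mean": 0.4, "gc_std": 0.1}

segment = "GCGCGGCCATGCGC"
log_odds = joint_log_likelihood(segment, promoter) - joint_log_likelihood(segment, background)
print(f"log-odds (promoter vs background): {log_odds:.2f}")
```

Because the two toy models share the same sequence term, the log-odds here come entirely from the Gaussian property term, which shows how a physical property can shift a decision that the sequence alone leaves open.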


Subjects
Computational Biology , DNA/chemistry , DNA/genetics , Genetic Models , Genetic Promoter Regions , Animals , Chemical Phenomena , Physical Chemistry , Nucleic Acid Databases , Drosophila/genetics , Likelihood Functions , Markov Chains , Statistical Models , Computer Neural Networks , Stochastic Processes
2.
Trends Genet ; 17(2): 56-60, 2001 Feb.
Article in English | MEDLINE | ID: mdl-11173099

ABSTRACT

The DNA sequence of several higher eukaryotes is now complete, and we know the expression patterns of thousands of genes under a variety of conditions. This gives us the opportunity to identify and analyze the parts of a genome believed to be responsible for most transcription control: the promoters. This article gives a short overview of the state-of-the-art techniques for computational promoter localization and analysis, and comments on the most recent advances in the field.


Subjects
Computational Biology , Eukaryotic Cells , Genetic Promoter Regions , Base Sequence , DNA , Nucleic Acid Regulatory Sequences
3.
Pac Symp Biocomput ; : 380-91, 2000.
Article in English | MEDLINE | ID: mdl-10902186

ABSTRACT

We present a new statistical approach for eukaryotic polymerase II promoter recognition. We apply stochastic segment models in which each state represents a functional part of the promoter. The segments are trained in an unsupervised way. We compare segment models with three and five states with our previous system which modeled the promoters as a whole, i.e. as a single state. Results on the classification of a representative collection of human and D. melanogaster promoter and non-promoter sequences show great improvements. The practical importance is demonstrated on the mining of large contiguous sequences.
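
One way to picture a model in which each state covers one functional part of the promoter is a left-to-right segmentation found by dynamic programming. The sketch below is an illustration under strong simplifications, not the authors' system: each state emits bases independently from a fixed distribution, there is no duration model, and the state distributions (`gc_rich`, `at_rich`) are hypothetical. Unsupervised training could alternate this segmentation step with re-estimating the state distributions, which is not shown.

```python
import math


def best_segmentation(seq, state_probs):
    """Split seq into len(state_probs) consecutive non-empty segments, one per state,
    maximizing the summed log-probability. Returns (score, segment start positions)."""
    n, k = len(seq), len(state_probs)
    assert n >= k
    # prefix[s][i]: log-probability of seq[:i] if every base were emitted by state s
    prefix = []
    for probs in state_probs:
        acc = [0.0]
        for base in seq:
            acc.append(acc[-1] + math.log(probs[base]))
        prefix.append(acc)
    neg_inf = float("-inf")
    # best[s][i]: best score for seq[:i] using states 0..s, with state s ending at i
    best = [[neg_inf] * (n + 1) for _ in range(k)]
    back = [[0] * (n + 1) for _ in range(k)]
    for i in range(1, n + 1):
        best[0][i] = prefix[0][i]
    for s in range(1, k):
        for i in range(s + 1, n + 1):
            for j in range(s, i):  # state s covers seq[j:i]
                score = best[s - 1][j] + prefix[s][i] - prefix[s][j]
                if score > best[s][i]:
                    best[s][i], back[s][i] = score, j
    # Trace back the start position of states k-1 down to 1.
    boundaries, i = [], n
    for s in range(k - 1, 0, -1):
        i = back[s][i]
        boundaries.append(i)
    return best[k - 1][n], boundaries[::-1]


# Hypothetical three-state layout: GC-rich flank, AT-rich core, GC-rich flank.
gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
sequence = "GCGCGC" + "TATAAA" + "GCCGCG"
score, boundaries = best_segmentation(sequence, [gc_rich, at_rich, gc_rich])
print(boundaries)
```

The returned boundaries are the start positions of the second and later segments; a single-state model corresponds to the special case with no boundaries at all.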


Subjects
Genetic Models , Genetic Promoter Regions , Algorithms , Animals , Computer Simulation , DNA Polymerase II/genetics , Factual Databases , Drosophila melanogaster/genetics , Genome , Humans , Stochastic Processes
4.
Genome Res ; 10(4): 483-501, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10779488

ABSTRACT

Computational methods for automated genome annotation are critical to our community's ability to make full use of the large volume of genomic sequence being generated and released. To explore the accuracy of these automated feature prediction tools in the genomes of higher organisms, we evaluated their performance on a large, well-characterized sequence contig from the Adh region of Drosophila melanogaster. This experiment, known as the Genome Annotation Assessment Project (GASP), was launched in May 1999. Twelve groups, applying state-of-the-art tools, contributed predictions for features including gene structure, protein homologies, promoter sites, and repeat elements. We evaluated these predictions using two standards, one based on previously unreleased high-quality full-length cDNA sequences and a second based on the set of annotations generated as part of an in-depth study of the region by a group of Drosophila experts. Although these standard sets only approximate the unknown distribution of features in this region, we believe that when taken in context the results of an evaluation based on them are meaningful. The results were presented as a tutorial at the conference on Intelligent Systems in Molecular Biology (ISMB-99) in August 1999. Over 95% of the coding nucleotides in the region were correctly identified by the majority of the gene finders, and the correct intron/exon structures were predicted for >40% of the genes. Homology-based annotation techniques recognized and associated functions with almost half of the genes in the region; the remainder were only identified by the ab initio techniques. This experiment also presents the first assessment of promoter prediction techniques for a significant number of genes in a large contiguous region. We discovered that the promoter predictors' high false-positive rates make their predictions difficult to use. 
Integrating gene finding and cDNA/EST alignments with promoter predictions decreases the number of false-positive classifications but discovers less than one-third of the promoters in the region. We believe that by establishing standards for evaluating genomic annotations and by assessing the performance of existing automated genome annotation tools, this experiment establishes a baseline that contributes to the value of ongoing large-scale annotation projects and should guide further research in genome informatics.
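
Figures such as the share of coding nucleotides correctly identified, or the burden of false positives, can be computed by comparing predicted intervals with a reference annotation. The sketch below shows one common nucleotide-level formulation; the abstract does not specify the exact measures used in the assessment, so the definitions here (sensitivity and precision over covered positions) and the example coordinates are assumptions.

```python
def covered_positions(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_level_scores(predicted, annotated):
    """Return (sensitivity, precision) measured over individual nucleotides."""
    pred, truth = covered_positions(predicted), covered_positions(annotated)
    true_positives = len(pred & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(pred) if pred else 0.0
    return sensitivity, precision


# Hypothetical coordinates: two annotated features, three predictions.
annotated = [(100, 200), (300, 400)]
predicted = [(110, 200), (300, 380), (500, 520)]
sensitivity, precision = nucleotide_level_scores(predicted, annotated)
print(f"sensitivity={sensitivity:.3f} precision={precision:.3f}")
```

The third prediction overlaps no annotated feature, so it lowers precision without affecting sensitivity, which is the pattern a high false-positive rate produces.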


Subjects
Computational Biology/methods , Drosophila melanogaster/genetics , Insect Genes , Genome , Alcohol Dehydrogenase/chemistry , Alcohol Dehydrogenase/genetics , Animals , Complementary DNA , Factual Databases/trends , Drosophila melanogaster/enzymology , Expressed Sequence Tags , Genetic Promoter Regions/genetics , Amino Acid Sequence Homology
5.
Genome Res ; 10(4): 539-42, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10779494

ABSTRACT

We describe our statistical system for promoter recognition in genomic DNA with which we took part in the Genome Annotation Assessment Project (GASP1). We applied two versions of the system: the first uses a region-based approach toward transcription start site identification, namely, interpolated Markov chains; the second is a hybrid approach combining regions and signals within a stochastic segment model. We compare the results of both versions with each other and examine how well the application on a genomic scale compares with the results we previously obtained on smaller data sets.


Subjects
Alcohol Dehydrogenase/genetics , Factual Databases , Drosophila melanogaster/enzymology , Drosophila melanogaster/genetics , Genetic Promoter Regions/genetics , Software , Animals , Insect Genes/genetics , Probability , DNA Sequence Analysis/statistics & numerical data
6.
Bioinformatics ; 15(5): 362-9, 1999 May.
Article in English | MEDLINE | ID: mdl-10366656

ABSTRACT

MOTIVATION: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems. RESULTS: A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou (Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.
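
The general mechanism of interpolating Markov chains of several orders can be sketched as follows. This is a simplified illustration, not the system described in the abstract: the interpolation weights are fixed by hand rather than estimated automatically, one global weight per order is used, the per-order estimates are add-one smoothed, and the class name, training sequences, and weights are all hypothetical.

```python
import math
from collections import defaultdict


class InterpolatedMarkovChain:
    """Mixes add-one-smoothed estimates of orders 0..max_order with fixed weights."""

    def __init__(self, max_order, weights, alphabet="ACGT"):
        # Hypothetical fixed weights, one per order; they must sum to 1.
        assert len(weights) == max_order + 1
        assert weights[0] > 0 and abs(sum(weights) - 1.0) < 1e-9
        self.max_order = max_order
        self.weights = weights
        self.alphabet = alphabet
        # counts[k][context][symbol]: how often symbol followed a length-k context
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order + 1)]

    def train(self, sequences):
        for seq in sequences:
            for i, symbol in enumerate(seq):
                for k in range(min(i, self.max_order) + 1):
                    self.counts[k][seq[i - k:i]][symbol] += 1

    def prob(self, context, symbol):
        """Interpolated P(symbol | context), using only orders the context can support."""
        usable = min(len(context), self.max_order)
        weights = self.weights[:usable + 1]
        norm = sum(weights)
        total = 0.0
        for k, weight in enumerate(weights):
            seen = self.counts[k].get(context[len(context) - k:], {})
            estimate = (seen.get(symbol, 0) + 1) / (sum(seen.values()) + len(self.alphabet))
            total += weight / norm * estimate
        return total

    def log_likelihood(self, seq):
        return sum(
            math.log(self.prob(seq[max(0, i - self.max_order):i], seq[i]))
            for i in range(len(seq))
        )


def log_odds(seq, positive, negative):
    """Positive values favor the first model, negative values the second."""
    return positive.log_likelihood(seq) - negative.log_likelihood(seq)


promoter_model = InterpolatedMarkovChain(2, [0.2, 0.3, 0.5])
background_model = InterpolatedMarkovChain(2, [0.2, 0.3, 0.5])
# Toy training data, for illustration only.
promoter_model.train(["TATAAATATATAAAT", "TTATATAAAGTATAAA"])
background_model.train(["GCGGCCGCGGGCCGC", "CCGCGGGCGCCGGCG"])

score = log_odds("TATAAA", promoter_model, background_model)
print(f"log-odds for TATAAA: {score:.2f}")
```

Mixing in the lower orders keeps the estimate stable when a long context was rarely or never seen in training, which is the motivation for interpolation given in the abstract. To scan a contiguous sequence, the same log-odds could be computed over a sliding window.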


Subjects
DNA/analysis , Markov Chains , Genetic Promoter Regions , Algorithms , Animals , Drosophila melanogaster/genetics , Electronic Data Processing , Eukaryotic Cells , Humans