Results 1 - 4 of 4
1.
Phys Rev E Stat Nonlin Soft Matter Phys ; 64(4 Pt 1): 041917, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11690062

ABSTRACT

We study statistical patterns in the DNA sequence of human chromosome 22, the first completely sequenced human chromosome. We find that (i) the 33.4 × 10^6 nucleotide long human chromosome exhibits long-range power-law correlations over more than four orders of magnitude, (ii) the entropies H(n) of the frequency distribution of oligonucleotides of length n (n-mers) grow sublinearly with increasing n, indicating the presence of higher-order correlations for all of the studied lengths 1
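
As a rough illustration of the n-mer entropy H(n) mentioned in this abstract, the following sketch computes the Shannon entropy of the empirical distribution of overlapping n-mers. The function name `block_entropy`, the use of overlapping windows, and the choice of bits as the unit are assumptions made for this example; the abstract does not state how the authors estimated H(n), and this naive estimator is biased downward once the number of possible n-mers approaches the sequence length.

```python
import math
from collections import Counter


def block_entropy(seq: str, n: int) -> float:
    """Shannon entropy (in bits) of the observed overlapping n-mers in seq.

    Hypothetical helper for illustration only; not the estimator from the paper.
    """
    if n < 1 or n > len(seq):
        raise ValueError("n must satisfy 1 <= n <= len(seq)")
    # Count every overlapping window of length n.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    # H = -sum p * log2(p) over the observed n-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    toy = "ACGT" * 100  # perfectly periodic toy sequence, not real data
    for n in (1, 2, 3):
        print(n, round(block_entropy(toy, n), 3))
```

For an uncorrelated sequence, H(n) would grow roughly as n times H(1); growth slower than that is the kind of sublinear behaviour the abstract attributes to correlations between neighbouring symbols.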

Subjects
Human chromosome pair 22/ultrastructure, DNA/ultrastructure, Algorithms, Alu elements, Entropy, Human genome, Humans, Statistical models, Oligonucleotides/chemistry, Nucleic acid repetitive sequences, Thermodynamics
2.
J Theor Biol ; 206(4): 525-37, 2000 Oct 21.
Article in English | MEDLINE | ID: mdl-11013113

ABSTRACT

We study the coding potential of human DNA sequences, using the positional asymmetry function D(p) and the positional information function I(q). Both D(p) and I(q) are based on the positional dependence of single nucleotide frequencies. We investigate the accuracy of D(p) and I(q) in distinguishing coding and non-coding DNA as a function of the parameters p and q, respectively, and explore at which parameters p_opt and q_opt both D(p) and I(q) distinguish coding and non-coding DNA most accurately. We compare our findings with classically used parameter values and find that optimized coding potentials yield accuracies comparable to those of classical frame-independent coding potentials trained on prior data. We find that p_opt and q_opt vary only slightly with the sequence length.
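
The abstract does not define D(p) or I(q), so the sketch below does not attempt to reproduce them. It only illustrates the raw ingredient the abstract names, the positional dependence of single nucleotide frequencies, by tabulating nucleotide frequencies separately for each position class i mod p. Treating p as a period in this way, and the helper name `positional_frequencies`, are assumptions made for this example.

```python
from collections import Counter


def positional_frequencies(seq: str, p: int) -> list[dict[str, float]]:
    """Relative nucleotide frequencies for each position class i mod p.

    Hypothetical helper for illustration; it is not D(p) or I(q) themselves.
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    columns = [Counter() for _ in range(p)]
    for i, base in enumerate(seq):
        columns[i % p][base] += 1
    # Normalise each position class separately; empty classes yield {}.
    return [
        {base: count / sum(col.values()) for base, count in col.items()}
        for col in columns
    ]


if __name__ == "__main__":
    # With p = 3 the three classes correspond to the three codon positions
    # of a sequence read in frame.
    for k, freqs in enumerate(positional_frequencies("ATGGCCATGGCA", 3)):
        print(k, freqs)
```

In protein-coding DNA the three codon positions typically have different nucleotide compositions, which is why position-resolved frequencies of this kind can carry a signal about coding function.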


Subjects
Codon, Human genome, Genetic models, DNA sequence analysis, Humans
3.
J Mol Evol ; 51(4): 353-62, 2000 Oct.
Article in English | MEDLINE | ID: mdl-11040286

ABSTRACT

It has been hypothesized that a large fraction of the 24% noncoding DNA in R. prowazekii consists of degraded genes. This hypothesis has been based on the relatively high G+C content of noncoding DNA. However, a comparison with other genomes also having a low overall G+C content shows that this argument would also apply to other bacteria. To test this hypothesis, we study the coding potential in sets of genes, pseudogenes, and intergenic regions. We find that the correlation function and the χ²-measure are clearly indicative of the coding function of genes and pseudogenes. However, both coding potentials give almost no indication of a preexisting reading frame in the remaining 23% of noncoding DNA. We simulate the degradation of genes due to single-nucleotide substitutions and insertions/deletions and quantify the number of mutations required to remove indications of the reading frame. We discuss a reduced selection pressure as another possible origin of this comparatively large fraction of noncoding sequences.
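
The degradation simulation described in this abstract can be pictured with a minimal sketch that applies random single-nucleotide substitutions, insertions, and deletions to a sequence. Everything specific here is an assumption for illustration: the function name `mutate`, the `indel_fraction` parameter, uniformly chosen positions and bases, and equal odds for insertion versus deletion. The abstract does not give the authors' mutation model or rates.

```python
import random

BASES = "ACGT"


def mutate(seq, n_mutations, indel_fraction=0.2, rng=None):
    """Apply n_mutations random substitutions or indels to seq.

    Hypothetical toy model: positions and bases are drawn uniformly, and an
    indel is an insertion or a deletion with equal probability.
    """
    rng = rng or random.Random()
    s = list(seq)
    for _ in range(n_mutations):
        if s and rng.random() >= indel_fraction:
            # Substitution: replace one base by a different one.
            i = rng.randrange(len(s))
            s[i] = rng.choice([b for b in BASES if b != s[i]])
        elif s and rng.random() < 0.5:
            # Deletion of a single base (shifts the downstream reading frame).
            del s[rng.randrange(len(s))]
        else:
            # Insertion of a single random base.
            s.insert(rng.randrange(len(s) + 1), rng.choice(BASES))
    return "".join(s)


if __name__ == "__main__":
    gene = "ATGGCCAAGCTGTAA"  # made-up toy sequence, not real data
    print(mutate(gene, 3, rng=random.Random(42)))
```

One could apply such a routine with an increasing number of mutations and re-evaluate a coding potential after each step to see when the reading-frame signal is lost, which is the quantity the abstract says the authors measured.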


Subjects
Intergenic DNA, Bacterial genes, Rickettsia prowazekii/genetics, Genetic models, Point mutation, Single nucleotide polymorphism
4.
Pac Symp Biocomput ; : 614-23, 2000.
Article in English | MEDLINE | ID: mdl-10902209

ABSTRACT

One basic problem in the analysis of DNA sequences is the recognition of protein-coding genes. Computer algorithms to facilitate gene identification have become important as genome sequencing projects have turned from mapping to large-scale sequencing, resulting in an exponentially growing number of sequenced nucleotides that await their annotation. Many statistical patterns have been discovered that are different in coding and noncoding DNA, but most of them vary from species to species, and hence require prior training on organism-specific data sets. Here, we investigate whether there exist species-independent statistical patterns that are different in coding and noncoding DNA. We introduce an information-theoretic quantity, the average mutual information (AMI), and we find that the probability distribution functions of the AMI are significantly different in coding and noncoding DNA, while they are almost identical for different species. This finding suggests that the AMI might be useful for the recognition of protein-coding regions in genomes for which training sets do not exist.
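
The abstract does not spell out how the AMI is defined or averaged, so the following is only a sketch of a standard building block: the mutual information between nucleotides separated by a distance k, estimated from pair counts. The function name `mutual_information` and the plug-in estimate from raw frequencies are assumptions made for this example, not the authors' procedure.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Mutual information (bits) between symbols k positions apart in seq.

    Hypothetical helper; a plain plug-in estimate with no bias correction.
    """
    if k < 1 or k >= len(seq):
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    pairs = Counter(zip(seq, seq[k:]))  # joint counts of (x_i, x_{i+k})
    total = sum(pairs.values())
    left, right = Counter(), Counter()  # marginal counts of each member
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    # I(k) = sum p_ab * log2(p_ab / (p_a * p_b)), written with raw counts.
    return sum(
        (c / total) * math.log2(c * total / (left[a] * right[b]))
        for (a, b), c in pairs.items()
    )


if __name__ == "__main__":
    toy = "ATGGCC" * 50  # periodic toy sequence, not real data
    for k in (1, 2, 3, 6):
        print(k, round(mutual_information(toy, k), 3))
```

Values near zero indicate that symbols k apart are statistically independent, while larger values indicate dependence; averaging such values over a range of k is one plausible reading of "average", but that reading is an assumption here.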


Subjects
DNA/genetics, Genetic models, Algorithms, Animals, Codon/genetics, Computer simulation, Proteins/genetics, Species specificity