RESUMO
Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes are measured in word lengths of two to ten letters. It is found that in a scale-dependent way, the Shannon information in complete genomes are much greater than that in matching random sequences--thousands of times greater in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belong to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduces the larger-scale Shannon information. The inference drawn from the present study is that the large-scale and coarse-grained growth of genomes was selectively neutral and this suggests an independent corroboration of Kimura's neutral theory of evolution.
Assuntos
Mapeamento Cromossômico/métodos , Análise Mutacional de DNA/métodos , DNA/genética , Evolução Molecular , Armazenamento e Recuperação da Informação/métodos , Modelos Genéticos , Alinhamento de Sequência/métodos , Análise de Sequência de DNA/métodos , Algoritmos , Biologia Computacional/métodos , DNA/química , Variação Genética/genética , Modelos EstatísticosRESUMO
Shannon information (SI) and its special case, divergence, are defined for a DNA sequence in terms of probabilities of chemical words in the sequence and are computed for a set of complete genomes highly diverse in length and composition. We find the following: SI (but not divergence) is inversely proportional to sequence length for a random sequence but is length independent for genomes; the genomic SI is always greater and, for shorter words and longer sequences, hundreds to thousands times greater than the SI in a random sequence whose length and composition match those of the genome; genomic SIs appear to have word-length dependent universal values. The universality is inferred to be an evolution footprint of a universal mode for genome growth.
Assuntos
Biologia Computacional/métodos , Genoma , DNA/química , Bases de Dados Genéticas , Entropia , Evolução Molecular , Genes Bacterianos , Genoma Bacteriano , Modelos Estatísticos , Dados de Sequência Molecular , Análise de Sequência de DNA , Software , TermodinâmicaRESUMO
Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes are measured in word lengths of two to ten letters. It is found that in a scale-dependent way, the Shannon information in complete genomes are much greater than that in matching random sequences - thousands of times greater in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belong to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduces the larger-scale Shannon information. The inference drawn from the present study is that the large-scale and coarse-grained growth of genomes was selectively neutral and this suggests an independent corroboration of Kimura's neutral theory of evolution.