Search | VHL Regional Portal

Driven progressive evolution of genome sequence complexity in Cyanobacteria.

Moya, Andrés; Oliver, José L; Verdú, Miguel; Delaye, Luis; Arnau, Vicente; Bernaola-Galván, Pedro; de la Fuente, Rebeca; Díaz, Wladimiro; Gómez-Martín, Cristina; González, Francisco M; Latorre, Amparo; Lebrón, Ricardo; Román-Roldán, Ramón.

Sci Rep ; 10(1): 19073, 2020 11 04.

Article in English | MEDLINE | ID: mdl-33149190

ABSTRACT

Progressive evolution, or the tendency towards increasing complexity, is a controversial issue in biology, whose resolution entails a proper measurement of complexity. Genomes are the best entities to address this challenge, as they encode the historical information of a species' biotic and environmental interactions. As a case study, we have measured genome sequence complexity in the ancient phylum Cyanobacteria. To arrive at an appropriate measure of genome sequence complexity, we have chosen metrics that do not decipher biological functionality but that show a strong phylogenetic signal. Using a ridge regression of those metrics against root-to-tip distance, we detected positive trends towards higher complexity in three of them. Lastly, we applied three standard tests to detect whether progressive evolution is passive or driven: the minimum, ancestor-descendant, and sub-clade tests. These results provide evidence for driven progressive evolution at the genome level in the phylum Cyanobacteria.
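The trend-detection and test steps above can be illustrated with a minimal Python sketch. It assumes a plain, non-phylogenetic ridge regression with a single predictor (root-to-tip distance) and a simplified sign-test reading of the ancestor-descendant test; it is not the authors' model or code. The function names `ridge_slope` and `ancestor_descendant_test`, the penalty `lam`, and all input values are hypothetical.

```python
import math


def ridge_slope(x, y, lam=1.0):
    """Slope of a one-predictor ridge regression of y on x.

    x and y are centered first, so only the slope is penalized;
    lam >= 0 is the (hypothetical) L2 penalty, and lam = 0 gives
    ordinary least squares.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / (sxx + lam)


def ancestor_descendant_test(pairs):
    """One-sided sign test on ancestor -> descendant changes in a metric.

    pairs: iterable of (ancestor_value, descendant_value).
    Ties are discarded. Returns (increases, decreases, p), where p is the
    binomial probability of at least this many increases if increases
    and decreases were equally likely.
    """
    pairs = list(pairs)
    ups = sum(1 for a, d in pairs if d > a)
    downs = sum(1 for a, d in pairs if d < a)
    n = ups + downs
    p = sum(math.comb(n, k) for k in range(ups, n + 1)) / 2 ** n
    return ups, downs, p


# Hypothetical values, not data from the paper.
root_to_tip = [0.1, 0.2, 0.3, 0.4, 0.5]
complexity = [1.0, 1.2, 1.1, 1.5, 1.6]
print(ridge_slope(root_to_tip, complexity, lam=0.01))  # positive slope
print(ancestor_descendant_test([(1.0, 1.2), (1.2, 1.5), (1.5, 1.4), (1.1, 1.3)]))
# (3, 1, 0.3125)
```

A positive slope corresponds to a trend towards higher values of the metric. In the sign test, a surplus of increases over decreases is what a driven trend would predict, whereas under a passive trend increases and decreases are expected to be roughly balanced among lineages away from the lower bound.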

Subject(s)

Cyanobacteria/genetics , Evolution, Molecular , Genome, Bacterial , Cyanobacteria/classification , Phylogeny

Compositional searching of CpG islands in the human genome.

Luque-Escamilla, Pedro Luis; Martínez-Aroza, José; Oliver, José L; Gómez-Lopera, Juan Francisco; Román-Roldán, Ramón.

Phys Rev E Stat Nonlin Soft Matter Phys ; 71(6 Pt 1): 061925, 2005 Jun.

Article in English | MEDLINE | ID: mdl-16089783

ABSTRACT

We report on an entropic edge detector based on the local calculation of the Jensen-Shannon divergence, with application to the search for CpG islands. CpG islands are regions of the genome related to gene expression and cell differentiation, and thus to cancer formation. Searching for these CpG islands is a major task in genetics and bioinformatics. Several algorithms proposed in the literature rely on moving statistics in a sliding window, but the window size may greatly influence the results. The local use of the Jensen-Shannon divergence is a completely different strategy: because the nucleotide composition inside the islands differs from that of their surroundings, a statistical distance between the compositions of two adjacent windows (here, the Jensen-Shannon divergence) may be used as a measure of their dissimilarity. Sliding this double window over the entire sequence allows us to segment it compositionally. These segments must then be fused into larger ones that satisfy certain identification criteria to obtain the definitive results. We find that, compared with other algorithms in the literature, the local use of the Jensen-Shannon divergence is well suited to processing DNA sequences when searching for compositionally distinct structures such as CpG islands.
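The double-window idea can be sketched in a few lines of Python. Assuming the standard length-weighted definition D = H(w1*F1 + w2*F2) - w1*H(F1) - w2*H(F2), where H is Shannon entropy in bits, F1 and F2 are the window compositions, and w1 and w2 are their relative lengths, the sketch computes D at every position of a sequence. The helper names, the `half_window` parameter, and the toy sequence are illustrative assumptions; the paper's segment-fusion and island-identification criteria are not reproduced.

```python
import math
from collections import Counter


def entropy_bits(counts, total):
    """Shannon entropy (bits) of a composition given symbol counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def jensen_shannon(left, right):
    """Jensen-Shannon divergence (bits) between the compositions of two
    windows, each weighted by its length."""
    cl, cr = Counter(left), Counter(right)
    nl, nr = len(left), len(right)
    n = nl + nr
    return (entropy_bits(cl + cr, n)
            - (nl / n) * entropy_bits(cl, nl)
            - (nr / n) * entropy_bits(cr, nr))


def jsd_profile(seq, half_window):
    """Slide a double window along seq. At position i, compare
    seq[i - half_window:i] with seq[i:i + half_window]."""
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    return [(i, jensen_shannon(seq[i - half_window:i], seq[i:i + half_window]))
            for i in range(half_window, len(seq) - half_window + 1)]


# Toy sequence: an AT-only stretch followed by a CG-only stretch.
seq = "ATATATATAT" + "CGCGCGCGCG"
peak = max(jsd_profile(seq, half_window=5), key=lambda t: t[1])
print(peak)  # peak at position 10 (D = 1 bit, up to rounding)
```

Because the two halves of the toy sequence share no symbols, D reaches its maximum at the boundary; in real genomic DNA the peaks are lower and noisier, which is why a subsequent fusion step is needed.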

Subject(s)

Algorithms , Chromosome Mapping/methods , CpG Islands/genetics , Databases, Nucleic Acid , Genome, Human , Models, Genetic , Sequence Analysis, DNA/methods , Base Composition/genetics , Base Sequence , Computer Simulation , DNA/analysis , DNA/chemistry , DNA/genetics , Humans , Models, Chemical , Molecular Sequence Data , Pattern Recognition, Automated/methods

Isochore chromosome maps of the human genome.

Oliver, José L; Carpena, Pedro; Román-Roldán, Ramón; Mata-Balaguer, Trinidad; Mejías-Romero, Andrés; Hackenberg, Michael; Bernaola-Galván, Pedro.

Gene ; 300(1-2): 117-27, 2002 Oct 30.

Article in English | MEDLINE | ID: mdl-12468093

ABSTRACT

The human genome is a mosaic of isochores, which are long DNA segments (>300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores.
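Once a sequence has been partitioned into homogeneous regions, summarizing each region's G+C level and isochore class is straightforward. The Python sketch below computes G+C per segment and assigns an isochore family. The segment coordinates are assumed inputs, and the family boundaries (L1, L2, H1, H2, H3 split at 37, 41, 46, and 53% G+C) are an assumption based on commonly cited ranges, not limits taken from this paper. The toy segments are also far shorter than real isochores.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous A/C/G/T bases (N etc. ignored)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("segment contains no A/C/G/T bases")
    return gc / (gc + at)


# ASSUMED upper bounds (% G+C) for the classic isochore families,
# following commonly cited ranges: L1 < 37, L2 37-41, H1 41-46,
# H2 46-53, H3 >= 53. The paper's own class limits may differ.
ISOCHORE_UPPER_BOUNDS = [(37.0, "L1"), (41.0, "L2"), (46.0, "H1"), (53.0, "H2")]


def isochore_class(gc_percent):
    """Map a G+C percentage to an (assumed) isochore family."""
    for upper, name in ISOCHORE_UPPER_BOUNDS:
        if gc_percent < upper:
            return name
    return "H3"


def classify_segments(seq, segments):
    """Label each half-open segment (start, end) with its G+C % and family.

    `segments` is assumed to come from a prior segmentation step."""
    labelled = []
    for start, end in segments:
        gc_percent = 100.0 * gc_content(seq[start:end])
        labelled.append((start, end, round(gc_percent, 2), isochore_class(gc_percent)))
    return labelled


# Toy example with hypothetical boundaries.
toy = "AT" * 50 + "GC" * 50
print(classify_segments(toy, [(0, 100), (100, 200)]))
# [(0, 100, 0.0, 'L1'), (100, 200, 100.0, 'H3')]
```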

Subject(s)

Genome, Human , Isochores/genetics , Alu Elements/genetics , Base Composition , Chromosome Mapping , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , DNA/chemistry , DNA/genetics , Genes/genetics , Humans , Long Interspersed Nucleotide Elements/genetics , Polymorphism, Single Nucleotide/genetics

Analysis of symbolic sequences using the Jensen-Shannon divergence.

Grosse, Ivo; Bernaola-Galván, Pedro; Carpena, Pedro; Román-Roldán, Ramón; Oliver, Jose; Stanley, H Eugene.

Phys Rev E Stat Nonlin Soft Matter Phys ; 65(4 Pt 1): 041905, 2002 Apr.

Article in English | MEDLINE | ID: mdl-12005871

ABSTRACT

We study statistical properties of the Jensen-Shannon divergence D, which quantifies the difference between probability distributions, and which has been widely applied to analyses of symbolic sequences. We present three interpretations of D in the framework of statistical physics, information theory, and mathematical statistics, and obtain approximations of the mean, the variance, and the probability distribution of D in random, uncorrelated sequences. We present a segmentation method based on D that is able to segment a nonstationary symbolic sequence into stationary subsequences, and apply this method to DNA sequences, which are known to be nonstationary on a wide range of different length scales.
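A minimal recursive version of D-based segmentation is sketched below in Python: split at the position that maximizes D between the left and right parts, keep the split if it passes a criterion, and recurse on both halves. As a simplifying assumption, the sketch uses a fixed `threshold` on D and a minimum segment length `min_len` (both hypothetical) in place of the statistical significance criterion that the paper derives from the distribution of D in random, uncorrelated sequences.

```python
import math
from collections import Counter


def entropy_bits(counts, total):
    """Shannon entropy (bits) of a composition given symbol counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def js_divergence_at(seq, i):
    """D (bits) between seq[:i] and seq[i:], each weighted by its length."""
    left, right = Counter(seq[:i]), Counter(seq[i:])
    n, nl = len(seq), i
    nr = n - i
    return (entropy_bits(left + right, n)
            - nl / n * entropy_bits(left, nl)
            - nr / n * entropy_bits(right, nr))


def segment(seq, threshold=0.1, min_len=10, offset=0):
    """Recursively split seq at the point of maximal D.

    A split is kept only if max D exceeds `threshold`; candidate split
    points leave at least `min_len` symbols on each side. Returns a list
    of half-open (start, end) intervals in original coordinates.
    """
    if len(seq) < 2 * min_len:
        return [(offset, offset + len(seq))]
    best_i, best_d = max(
        ((i, js_divergence_at(seq, i)) for i in range(min_len, len(seq) - min_len + 1)),
        key=lambda t: t[1],
    )
    if best_d <= threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:best_i], threshold, min_len, offset)
            + segment(seq[best_i:], threshold, min_len, offset + best_i))


# Toy nonstationary sequence: three compositionally distinct blocks.
toy = "AT" * 20 + "GC" * 20 + "AT" * 20
print(segment(toy))  # [(0, 40), (40, 80), (80, 120)]
```

This version recomputes compositions at every candidate point, which is quadratic in sequence length; maintaining cumulative symbol counts would make each scan linear, a worthwhile change for genome-scale inputs.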

Subject(s)

Models, Statistical , Sequence Analysis, DNA/statistics & numerical data , Computational Biology/statistics & numerical data , DNA/genetics , DNA, Bacterial/genetics , Entropy , Escherichia coli/genetics , Genome, Bacterial , Humans , Information Theory , Major Histocompatibility Complex/genetics , Models, Theoretical , Probability
