Results 1 - 20 of 33
1.
Genome Res ; 11(11): 1854-60, 2001 Nov.
Article in English | MEDLINE | ID: mdl-11691850

ABSTRACT

In an attempt to understand the origin of CpG islands (CGIs) in mammalian genomes, we have studied their location and structure according to the expression pattern of genes and to the G + C content of isochores in which they are embedded. We show that CGIs located over the transcription start site (named start CGIs) are very different structurally from the others (named no-start CGIs): (1) 61.6% of the no-start CGIs are due to repeated sequences (79% are due to Alus), whereas only 5.6% of the start CGIs are due to such repeats; (2) start CGIs are longer and display a higher CpG o/e (observed/expected) ratio and G + C level than no-start CGIs. The frequency of tissue-specific genes associated with a start CGI varies according to the genomic G + C content, from 25% in G + C-poor isochores to 64% in G + C-rich isochores. Conversely, the frequency of housekeeping genes associated with a start CGI (90%) is independent of the isochore context. Interestingly, the structure of start CGIs is very similar for tissue-specific and housekeeping genes. Moreover, 93% of genes expressed in the early embryo are found to exhibit a CpG island over their transcription start point. These observations are consistent with the hypothesis that the occurrence of these CGIs is the consequence of gene expression at this stage, when the methylation pattern is installed.


Subject(s)
CpG Islands/genetics , Embryo, Mammalian/metabolism , Gene Expression Regulation, Developmental/genetics , Transcription Initiation Site , Animals , Base Composition , Embryo, Mammalian/chemistry , Expressed Sequence Tags , GC Rich Sequence , Gene Expression Profiling/methods , Genes/genetics , Humans , Mice , Organ Specificity/genetics , Repetitive Sequences, Nucleic Acid
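
The abstract above compares CpG islands by their G + C level and their CpG observed/expected (o/e) ratio but does not spell out how these are computed. As a minimal sketch, assuming the widely used definition o/e = (number of CpG dinucleotides × sequence length) / (number of C × number of G), the two statistics could be computed as follows. The function names are hypothetical and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CpG * length) / (#C * #G).

    Assumed definition; the abstract does not state the exact formula used.
    Returns 0.0 when the sequence has no C or no G.
    """
    s = seq.upper()
    c_count = s.count("C")
    g_count = s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cpg_count = s.count("CG")
    return cpg_count * len(s) / (c_count * g_count)
```

In practice these statistics are usually computed in a sliding window along a genomic sequence rather than on one whole sequence; the window size and thresholds would be further assumptions.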
3.
J Mol Evol ; 53(1): 70-6, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11683325

ABSTRACT

We compared nonsynonymous substitution rates (Ka) of nuclear coding genes between four major groups of living sauropsids (reptiles): birds, squamates, crocodiles, and turtles. Since only 9 orthologous genes are known in all four taxonomic groups, we searched for orthologous genes known in chicken and in at least one representative of the poikilotherm sauropsids. Thus, we analyzed three additional data sets: 28 genes identified in chicken and various squamates, 24 genes identified in chicken and crocodilians, and 20 genes identified in chicken and turtles. To compare nonsynonymous substitution rates between all lineages of sauropsids, we used the relative-rate test with human genes as the outgroup. We show that 22/28 nuclear coding genes of squamates, especially snakes (15/16), have a higher evolutionary rate than those in chicken (on average, 30-40% faster). However, no such difference is detected between crocodiles, turtles and chicken. A higher substitution rate in squamate nuclear coding genes than in chicken, and probably than in other sauropsids, could explain some of the difficulties in resolving the molecular phylogeny of reptiles.


Subject(s)
Chickens/genetics , Evolution, Molecular , Reptiles/genetics , Animals , Cell Nucleus/genetics , Humans , Phylogeny , Statistics as Topic
4.
Proc Natl Acad Sci U S A ; 98(10): 5688-92, 2001 May 08.
Article in English | MEDLINE | ID: mdl-11320215

ABSTRACT

Understanding the factors responsible for variations in mutation patterns and selection efficacy along chromosomes is a prerequisite for deciphering genome sequences. Population genetics models predict a positive correlation between the efficacy of selection at a given locus and the local rate of recombination because of Hill-Robertson effects. Codon usage is considered one of the most striking examples that support this prediction at the molecular level. In a wide range of species including Caenorhabditis elegans and Drosophila melanogaster, codon usage is essentially shaped by selection acting for translational efficiency. Codon usage bias correlates positively with recombination rate in Drosophila, apparently supporting the hypothesis that selection on codon usage is improved by recombination. Here we present an exhaustive analysis of codon usage in C. elegans and D. melanogaster complete genomes. We show that in both genomes there is a positive correlation between recombination rate and the frequency of optimal codons. However, we demonstrate that in both species, this effect is due to a mutational bias toward G and C bases in regions of high recombination rate, possibly as a direct consequence of the recombination process. The correlation between codon usage bias and recombination rate in these species appears to be essentially determined by recombination-dependent mutational patterns, rather than selective effects. This result highlights that it is necessary to take into account the mutagenic effect of recombination to understand the evolutionary role and impact of recombination.


Subject(s)
Caenorhabditis elegans/genetics , Codon , Drosophila melanogaster/genetics , Recombination, Genetic , Selection, Genetic , Animals
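
The abstract above relies on the frequency of optimal codons as its measure of codon usage bias. As a rough illustration, assuming the common definition (optimal codons divided by all codons that have synonymous alternatives), it could be computed as below. The function name, the default set of ignored codons (Met, Trp and the three stop codons of the standard genetic code), and the set of optimal codons passed by the caller are all assumptions; the species-specific optimal codons used in the paper are not given in the abstract.

```python
# Codons with no synonymous alternative (Met, Trp) plus stop codons,
# assuming the standard genetic code.
NON_INFORMATIVE = frozenset({"ATG", "TGG", "TAA", "TAG", "TGA"})


def frequency_of_optimal_codons(cds, optimal_codons, ignored=NON_INFORMATIVE):
    """Fraction of informative codons in `cds` that belong to `optimal_codons`.

    `cds` is read in frame from its first base; a trailing partial codon
    is dropped. Returns 0.0 if no informative codon is present.
    """
    s = cds.upper()
    usable = len(s) - len(s) % 3
    codons = [s[i:i + 3] for i in range(0, usable, 3)]
    informative = [codon for codon in codons if codon not in ignored]
    if not informative:
        return 0.0
    n_optimal = sum(1 for codon in informative if codon in optimal_codons)
    return n_optimal / len(informative)
```

For example, with a purely illustrative optimal set `{"CTG"}`, the sequence `ATGCTGCTGTTATAA` has three informative codons (CTG, CTG, TTA), two of which are optimal.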
6.
Genome Res ; 10(5): 672-8, 2000 May.
Article in English | MEDLINE | ID: mdl-10810090

ABSTRACT

The human genome is estimated to contain 23,000 to 33,000 retropseudogenes. To study the properties of genes giving rise to these retroelements, we compared the structure and expression of genes with or without known retropseudogenes. Four main features have emerged from the analysis of 181 genes associated with retropseudogenes: reverse-transcribed genes are (1) widely expressed, (2) highly conserved, (3) short, and (4) GC-poor. The first two properties probably reflect the fact that genes giving rise to retropseudogenes have to be expressed in the germ line. The latter two points suggest that reverse transcription and transposition are more efficient for short GC-poor mRNAs. In addition, this analysis allowed us to reject previous hypotheses that widely expressed genes are GC-rich. Rather, globally, genes with a wide tissue distribution are GC-poor.


Subject(s)
Pseudogenes/genetics , Retroelements/genetics , Base Sequence/genetics , Computational Biology , Conserved Sequence/genetics , Databases, Factual , Evolution, Molecular , GC Rich Sequence , Gene Expression Regulation/genetics , Humans , Organ Specificity/genetics
7.
Mol Biol Evol ; 17(1): 68-74, 2000 Jan.
Article in English | MEDLINE | ID: mdl-10666707

ABSTRACT

To determine whether gene expression patterns affect mutation rates and/or selection intensity in mammalian genes, we studied the relationships between substitution rates and tissue distribution of gene expression. For this purpose, we analyzed 2,400 human/rodent and 834 mouse/rat orthologous genes, and we measured (using expressed sequence tag data) their expression patterns in 19 tissues from three developmental stages. We show that substitution rates at nonsynonymous sites are strongly negatively correlated with tissue distribution breadth: almost threefold lower in ubiquitous than in tissue-specific genes. Nonsynonymous substitution rates also vary considerably according to the tissues: the average rate is twofold lower in brain-, muscle-, retina- and neuron-specific genes than in lymphocyte-, lung-, and liver-specific genes. Interestingly, 5' and 3' untranslated regions (UTRs) show exactly the same trend. These results demonstrate that the expression pattern is an essential factor in determining the selective pressure on functional sites in both coding and noncoding regions. Conversely, silent substitution rates do not vary with expression pattern, even in ubiquitously expressed genes. This latter result thus suggests that synonymous codon usage is not constrained by selection in mammals. Furthermore, this result also indicates that there is no reduction of mutation rates in genes expressed in the germ line, contrary to what had been hypothesized based on the fact that transcribed DNA is more efficiently repaired than nontranscribed DNA.


Subject(s)
Evolution, Molecular , Mammals/genetics , Mutation , Animals , Humans
9.
Mol Biol Evol ; 16(11): 1521-7, 1999 Nov.
Article in English | MEDLINE | ID: mdl-10555283

ABSTRACT

The genomes of warm-blooded vertebrates are characterized by a strong heterogeneity in base composition, with GC-rich and GC-poor isochores. The GC content of sequences, especially in third codon positions, is highly correlated with that of the isochore they are embedded in. In amphibian and fish genomes, GC-rich isochores are nearly absent. Thus, it has been proposed that the GC increase in a part of mammalian and avian genomes represents an adaptation to homeothermy. To test this selective hypothesis, we sequenced marker protein genes in two cold-blooded vertebrates, the Nile crocodile Crocodylus niloticus (10 genes) and the red-eared slider Trachemys scripta elegans (6 genes). The analysis of base composition at third codon positions of this original data set shows that the Nile crocodile and the turtle also exhibit GC-rich isochores, which rules out the homeothermy hypothesis. Instead, we propose that the GC increase results from a mutational bias that took place earlier than the adaptation to homeothermy in birds and before the turtle/crocodile divergence. Surprisingly, the isochore structure appears more similar between the red-eared slider and the Nile crocodile than between the chicken and the Nile crocodile. This point questions the phylogenetic position of turtles as a basal lineage of extant reptiles. We also observed a regular molecular clock in the Archosauria, which enables us, by using a more extended data set, to confirm Kumar and Hedges's dating of the bird-crocodile split.


Subject(s)
Alligators and Crocodiles/genetics , Turtles/genetics , Animals , Evolution, Molecular , Female , GC Rich Sequence , Genetic Variation , Genome , Phylogeny , Reverse Transcriptase Polymerase Chain Reaction
10.
Bioinformatics ; 15(5): 424-5, 1999 May.
Article in English | MEDLINE | ID: mdl-10366663

ABSTRACT

SUMMARY: JaDis is a Java application for computing evolutionary distances between nucleic acid sequences and G+C base frequencies. It allows specific comparison of coding sequences, of non-coding sequences or of a non-coding sequence with coding sequences. AVAILABILITY: http://pbil.univ-lyon1.fr/software/jadis.html


Subject(s)
Sequence Analysis, DNA/methods , Software , Base Composition
11.
Proc Natl Acad Sci U S A ; 96(8): 4482-7, 1999 Apr 13.
Article in English | MEDLINE | ID: mdl-10200288

ABSTRACT

We measured the expression pattern and analyzed codon usage in 8,133, 1,550, and 2,917 genes, respectively, from Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana. In those three species, we observed a clear correlation between codon usage and gene expression levels and showed that this correlation is not due to a mutational bias. This provides direct evidence for selection on silent sites in those three distantly related multicellular eukaryotes. Surprisingly, there is a strong negative correlation between codon usage and protein length. This effect is not due to a smaller size of highly expressed proteins. Thus, for a given expression pattern, the selective pressure on codon usage appears to be lower in genes encoding long rather than short proteins. This puzzling observation is not predicted by any of the current models of selection on codon usage and thus raises the question of how translation efficiency affects fitness in multicellular organisms.


Subject(s)
Arabidopsis/genetics , Biological Evolution , Caenorhabditis elegans/genetics , Codon/genetics , Drosophila melanogaster/genetics , Gene Expression Regulation , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Expressed Sequence Tags , Gene Expression Regulation, Developmental , Gene Expression Regulation, Plant , Mutation , RNA, Messenger/analysis , Selection, Genetic
12.
Genetics ; 150(4): 1577-84, 1998 Dec.
Article in English | MEDLINE | ID: mdl-9832533

ABSTRACT

Codon usage in mammals is mainly determined by the spatial arrangement of genomic G + C-content, i.e., the isochore structure. Ancestral G + C-content at third codon positions of 27 nuclear protein-coding genes of eutherian mammals was estimated by maximum-likelihood analysis on the basis of a nonhomogeneous DNA substitution model, accounting for variable base compositions among present-day sequences. Data consistently supported a human-like ancestral pattern, i.e., highly variable G + C-content among genes. The mouse genomic structure (a narrower G + C-content distribution) would be a derived state. The circumstances of isochore evolution are discussed with respect to this result. A possible relationship between G + C-content homogenization in murid genomes and high mutation rate is proposed, consistent with the negative selection hypothesis for isochore maintenance in mammals.


Subject(s)
Evolution, Molecular , Mammals/genetics , Animals , Cytosine , Guanine , Humans , Mice , Models, Genetic , Phylogeny
13.
Mol Biol Evol ; 15(9): 1091-8, 1998 Sep.
Article in English | MEDLINE | ID: mdl-9729873

ABSTRACT

Relative-rate tests may be used to compare substitution rates between more than two sequences, which raises two main questions: What influence does the number of sequences have on relative-rate tests, and what is the influence of the sampling strategy as characterized by the phylogenetic relationships between sequences? Using both simulations and analysis of real data from murids (APRT and LCAT nuclear genes), we show that comparing large numbers of species significantly improves the power of the test. This effect is stronger if species are more distantly related. On the other hand, it appears to be less rewarding to increase outgroup sampling than to use the single nearest outgroup sequence. Rates may be compared between paraphyletic ingroups and using paraphyletic outgroups, but unbalanced taxonomic sampling can bias the test. We present a simple phylogenetic weighting scheme which takes taxonomic sampling into account and significantly improves the relative-rate test in cases of unbalanced sampling. The answers are thus: (1) large taxonomic sampling of compared groups improves relative-rate tests, (2) sampling many outgroups does not bring significant improvement, (3) the only constraint on sampling strategy is that the outgroup be valid, and (4) results are more accurate when phylogenetic relationships between the investigated sequences are taken into account. Given current limitations of the maximum-likelihood and nonparametric approaches, the relative-rate test generalized to any number of species with phylogenetic weighting appears to be the most general test available to compare rates between lineages.


Subject(s)
Phylogeny , Adenine Phosphoribosyltransferase/genetics , Animals , Humans , Phosphatidylcholine-Sterol O-Acyltransferase/genetics
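
To make the idea of a relative-rate test concrete, the sketch below implements a different and much simpler variant than the generalized, phylogenetically weighted test described in the abstract above: Tajima's nonparametric three-sequence test. It counts aligned sites where only sequence 1 differs (m1) and sites where only sequence 2 differs (m2), with the outgroup deciding which lineage changed, and computes (m1 - m2)² / (m1 + m2), which is compared against a chi-square distribution with one degree of freedom. The function name is hypothetical, and the input is assumed to be three pre-aligned sequences of equal length.

```python
def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative-rate test on three aligned sequences.

    Returns (m1, m2, chi2), where m1 counts sites at which only seq1
    differs from the other two, m2 counts sites at which only seq2
    differs, and chi2 = (m1 - m2)**2 / (m1 + m2).
    Sites containing gaps or ambiguous bases are skipped.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if a not in "ACGT" or b not in "ACGT" or o not in "ACGT":
            continue
        if a != b and b == o:
            m1 += 1  # change assigned to the lineage of seq1
        elif a != b and a == o:
            m2 += 1  # change assigned to the lineage of seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0
    return m1, m2, (m1 - m2) ** 2 / (m1 + m2)
```

A chi2 value above 3.84 would reject equal rates at the 5% level. Unlike the approach in the abstract, this variant handles only two ingroup sequences and applies no phylogenetic weighting.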
14.
Mol Biol Evol ; 14(8): 823-8, 1997 Aug.
Article in English | MEDLINE | ID: mdl-9254920

ABSTRACT

The most deviant isochore pattern within mammals was found in rat and mouse; most other mammals possess a different kind of isochore organization called the "general pattern." However, isochore patterns remain largely unknown in rodents other than mouse and rat. To investigate the taxonomic distribution of isochore patterns in rodents, we sequenced the nuclear gene LCAT (lecithin:cholesterol acyltransferase) from 17 rodent species (bringing the total of LCAT sequences in rodents to 19) and compared their GC contents at third codon positions and in introns. We also analyzed an extensive sequence database from rodents other than rat and mouse. All murid LCAT sequences are much poorer in GC than all nonrodent LCAT sequences, and the hamster sequence database shows exactly the same isochore pattern as rat and mouse. Thus, all murids share the same special isochore pattern: GC homogenization. LCAT sequences are GC-poor in hystricomorphs too, but the guinea pig sequence database indicates that large changes in GC content occur without an overall modification of the isochore pattern. This novel mode of isochore evolution is called GC reordering. LCAT sequences also show that the evolution of isochores in sciurids and glirids is nonconservative in comparison with that in nonrodents. Thus, at least two novel patterns of isochore evolution were found. No rodent investigated to date shared the general mammalian pattern.


Subject(s)
Evolution, Molecular , Rodentia/genetics , Animals , Chickens/genetics , Cricetinae , Genes , Genetic Markers , Guinea Pigs/genetics , Mammals/genetics , Mice , Molecular Sequence Data , Multigene Family , Phosphatidylcholine-Sterol O-Acyltransferase/genetics , Phylogeny , Rats , Rodentia/classification , Sciuridae/genetics , Species Specificity
15.
J Mol Evol ; 44 Suppl 1: S44-51, 1997.
Article in English | MEDLINE | ID: mdl-9071011

ABSTRACT

The vertebrate genome underwent two major compositional transitions, between therapsids and mammals and between dinosaurs and birds. These transitions concerned a sizable part (roughly one-third) of the genome, the gene-richest part of it, and consisted of an increase in GC levels (GC is the molar fraction of guanine + cytosine in DNA) which affected both coding sequences (especially third codon positions) and noncoding sequences. These major transitions were studied here by comparing GC3 levels (GC3 is the GC of third codon positions) of orthologous genes from Xenopus, chicken, calf, and man.


Subject(s)
Base Composition , Genome , Vertebrates/genetics , Animals , Evolution, Molecular , Humans , Molecular Sequence Data , Phylogeny
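
Several abstracts in this list use GC3, defined above as the GC level of third codon positions. A minimal sketch of that calculation follows, assuming the input is an in-frame coding sequence starting at the first base of a codon. The function name is hypothetical.

```python
def gc3(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence.

    A trailing partial codon is ignored, as are third positions holding
    anything other than A, C, G or T. Returns 0.0 if nothing is usable.
    """
    s = cds.upper()
    usable = len(s) - len(s) % 3
    # Third positions are at 0-based indices 2, 5, 8, ...
    third = [s[i] for i in range(2, usable, 3) if s[i] in "ACGT"]
    if not third:
        return 0.0
    return sum(base in "GC" for base in third) / len(third)
```

For instance, `gc3("ATGGCCAAATTC")` looks at the third bases G, C, A and C of the codons ATG, GCC, AAA and TTC.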
16.
Gene ; 205(1-2): 317-22, 1997 Dec 31.
Article in English | MEDLINE | ID: mdl-9461406

ABSTRACT

Murid nuclear genomes are more homogeneous in GC content than those of most mammals, which leads to the question of how such important compositional changes have accumulated. This paper reports on relationships between frequencies of synonymous differences and GC change in the lineages leading to human and murids. For this, we used the four-species approach: GC changes between human and murids were compared to the frequencies of synonymous differences, measured between two independent species without GC change (bovine and pig), by using orthologous genes common to all four species. We report three conclusions: (1) Among genes with little GC change, 60% of the variability of synonymous substitution frequencies is explained by the gene-specific rate component. (2) GC changes in murid genomes are independent of the gene-specific rate component. Slowly evolving genes in the pig-bovine comparison can show strong GC change in murids. (3) By using a GC-independent estimate of the substitution rate, we show that GC changes in murid genomes increase synonymous substitution frequencies. The GC homogenization considerably weakens the gene-specific conservation of substitution rates in murids, and could explain part of the increase of evolutionary rates observed in this group. We present a mechanism that can account for the evolution of the GC homogenization in murids.


Subject(s)
Cytosine/analysis , Guanine/analysis , Muridae/genetics , Animals , Cattle , Evolution, Molecular , Genome , Humans , Muridae/physiology
17.
Mol Phylogenet Evol ; 8(3): 423-34, 1997 Dec.
Article in English | MEDLINE | ID: mdl-9417899

ABSTRACT

Phylogenetic relationships among 19 extant species of rodents, with special emphasis on rats, mice, and allied Muroidea, were studied using sequences of the nuclear protein-coding gene LCAT (lecithin:cholesterol acyltransferase), an enzyme of cholesterol metabolism. Analysis of 705 base pairs from the exonic regions of LCAT confirmed known groupings in and around Muroidea. Strong support was found for the families Sciuridae (squirrel and marmot) and Gliridae (dormice) and for suprafamilial taxa Muroidea and Caviomorpha (guinea pig and allies). Within Muroidea, the first branching leads to the fossorial mole rats Spalacinae and bamboo rats Rhizomyinae. The other Muroidea appear as a polytomy from which arise Gerbillinae (gerbils), Murinae (rats and mice), Sigmodontinae (New World cricetids), Cricetinae (hamsters), and Arvicolinae (voles). Evidence from LCAT sequences agrees with that from a number of previous molecular and morphological studies, both concerning branching orders inside Muroidea and the bush-like radiation of rodent suprafamilial taxa (caviomorphs, sciurids, glirids, muroids), thus suggesting that this nuclear gene is an appropriate candidate for addressing questions of rodent relationships.


Subject(s)
Cell Nucleus/enzymology , Muridae/genetics , Phylogeny , Sterol O-Acyltransferase/genetics , Animals , Base Sequence , DNA , Molecular Sequence Data , Muridae/classification , Sequence Homology, Nucleic Acid , Species Specificity
18.
Mol Phylogenet Evol ; 5(1): 2-12, 1996 Feb.
Article in English | MEDLINE | ID: mdl-8673288

ABSTRACT

As the correlations between GC levels in third codon positions (GC3) and intergenic sequence GC levels can be used to assess the distribution of genes in the human genome, they were studied in detail. Previous work from our laboratory has demonstrated the existence of linear correlations between GC levels of exons, introns, third codon positions, 5' flanking regions of genes, and long genomic DNA sequences (≥ 10 kb) or DNA molecules (50-100 kb) in which the genes are embedded. The present study confirms and extends the previous results using a larger set of data. Furthermore, an analysis of 4270 human genomic DNA and cDNA sequences has allowed us to confirm a correlation of GC3 against GC1+2. Recent additions to the sequence database have also allowed separate analyses of the 5' flanking regions of CpG island and non-CpG island genes as well as analyses of 3' flanking regions, which suggest that the GC levels of 3' flanking regions are closer to those of intergenic DNA than are those of other regions of genes.


Subject(s)
Codon/genetics , DNA/chemistry , DNA/genetics , Base Composition , CpG Islands , Databases, Factual , Exons , Genome, Human , Humans , Introns , Regression Analysis
19.
J Mol Evol ; 40(3): 308-17, 1995 Mar.
Article in English | MEDLINE | ID: mdl-7723057

ABSTRACT

We compared the exon/intron organization of vertebrate genes belonging to different isochore classes, as predicted by their GC content at third codon position. Two main features have emerged from the analysis of sequences published in GenBank: (1) genes coding for long proteins (i.e., ≥ 500 aa) are almost two times more frequent in GC-poor than in GC-rich isochores; (2) intervening sequences (= sum of introns) are on average three times longer in GC-poor than in GC-rich isochores. These patterns are observed among human, mouse, rat, cow, and even chicken genes and are therefore likely to be common to all warm-blooded vertebrates. Analysis of Xenopus sequences suggests that the same patterns exist in cold-blooded vertebrates. It could be argued that such results do not reflect the reality because sequence databases are not representative of entire genomes. However, analysis of biases in GenBank revealed that the observed discrepancies between GC-rich and GC-poor isochores are not artifactual, and are probably largely underestimated. We investigated the distribution of microsatellites and interspersed repeats in introns of human and mouse genes from different isochores. This analysis confirmed previous studies showing that L1 repeats are almost absent from GC-rich isochores. Microsatellites and SINEs (Alu, B1, B2) are found at roughly equal frequencies in introns from all isochore classes. Globally, the presence of repeated sequences does not account for the increased intron length in GC-poor isochores. The relationships between gene structure and global genome organization and evolution are discussed.


Subject(s)
Genes , Vertebrates/genetics , Animals , Base Composition , Base Sequence , Biological Evolution , DNA, Complementary/genetics , Humans , Introns , Mice , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid
20.
J Mol Evol ; 40(1): 107-13, 1995 Jan.
Article in English | MEDLINE | ID: mdl-7714909

ABSTRACT

The frequencies of synonymous substitutions of mammalian genes cover a much wider range than previously thought. We report here that the different frequencies found in homologous genes from a given mammalian pair are correlated with those in the same homologous genes from a different mammalian pair. This indicates that the frequencies of synonymous substitutions are gene-specific (as are the frequencies of nonsynonymous substitutions), or, in other words, that "fast" and "slow" genes in one mammal are fast and slow, respectively, in any other one. Moreover, the frequencies of synonymous substitutions are correlated with the frequencies of nonsynonymous substitution in the same genes.


Subject(s)
Gene Frequency , Mammals/genetics , Animals , Biological Evolution , Computer Simulation , DNA Repair , Humans , Models, Biological