1.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37139943

ABSTRACT

The formation of new genes during evolution is an important motor of functional innovation, but the rate at which new genes originate and the likelihood that they persist over longer evolutionary periods are still poorly understood questions. Two important mechanisms by which new genes arise are gene duplication and de novo formation from a previously noncoding sequence. Does the mechanism of formation influence the evolutionary trajectories of the genes? Proteins arisen by gene duplication retain the sequence and structural properties of the parental protein, and thus they may be relatively stable. In contrast, de novo originated proteins are often species specific and thought to be more evolutionarily labile. Despite these differences, here we show that both types of genes share a number of similarities, including low sequence constraints in their initial evolutionary phases, high turnover rates at the species level, and comparable persistence rates in deeper branches, in both yeast and flies. In addition, we show that putative de novo proteins have an excess of substitutions between charged amino acids compared with the neutral expectation, which is reflected in the rapid loss of their initial highly basic character. The study supports high evolutionary dynamics of different kinds of new genes at the species level, in sharp contrast with the stability observed at later stages.


Subject(s)
Evolution, Molecular , Proteins , Proteins/genetics , Gene Duplication , Saccharomyces cerevisiae/genetics , Phylogeny
2.
Int J Mol Sci ; 24(3)2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36769294

ABSTRACT

A large part of the genome is known to be transcribed as non-coding DNA including some tandem repeats (satellites) such as telomeric/centromeric satellites in different species. However, there has been no detailed study on the eventual transcription of the interspersed satellites found in many species. In the present paper, we studied the transcription of the abundant DNA satellites in the nematode Caenorhabditis elegans using available RNA-Seq results. We found that many of them have been transcribed, but usually in an irregular manner; different regions of a satellite have been transcribed with variable efficiency. Satellites with a similar repeat sequence also have a different transcription pattern depending on their position in the genome. We also describe the peculiar features of satellites associated with Helitron transposons in C. elegans. Our demonstration that some satellite RNAs are transcribed adds a new family of non-coding RNAs, a new element in the world of RNA interference, with new paths for the control of mRNA translation. This is a field that requires further investigation and will provide a deeper understanding of gene expression and control.


Subject(s)
Caenorhabditis elegans , DNA, Satellite , Animals , DNA, Satellite/genetics , Caenorhabditis elegans/genetics , Repetitive Sequences, Nucleic Acid , DNA , Centromere
3.
Genes (Basel) ; 12(11)2021 10 20.
Article in English | MEDLINE | ID: mdl-34828257

ABSTRACT

It has been shown in recent years that many repeated sequences in the genome are expressed as RNA transcripts, although the role of such RNAs is poorly understood. Some isolated and tandem repeats (satellites) have been found to be transcribed, such as mammalian Alu sequences and telomeric/centromeric satellites in different species. However, there is no detailed study on the eventual transcription of the interspersed satellites found in many species. Therefore, we decided to study for the first time the transcription of the abundant DNA satellites in the bacterium Bacillus coagulans and in the nematode Caenorhabditis elegans. We have updated the data for C. elegans satellites using the latest version of the genome. We analyzed the transcription of satellites in both species in available RNA-seq results and found that they are widely transcribed. Our demonstration that satellite RNAs are transcribed adds a new family of non-coding RNAs. This is a field that requires further investigation and will provide a deeper understanding of gene expression and control.


Subject(s)
Bacillus coagulans/genetics , Caenorhabditis elegans/genetics , DNA, Satellite/genetics , Animals , Bacteria/genetics , Eukaryota/genetics , Gene Expression Regulation , Genome/genetics , RNA, Untranslated/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA , Sequence Analysis, RNA , Transcription, Genetic
4.
Int J Mol Sci ; 22(10)2021 May 20.
Article in English | MEDLINE | ID: mdl-34065296

ABSTRACT

Little is known about DNA tandem repeats across prokaryotes. We have recently described an enigmatic group of tandem repeats in bacterial genomes with a constant repeat size but variable sequence. These findings strongly suggest that tandem repeat size in some bacteria is under strong selective constraints. Here, we extend these studies and describe tandem repeats in a large set of Bacillus. Some species have very few repeats, while other species have a large number. Most tandem repeats have repeats with a constant size (either 52 or 20-21 nt), but a variable sequence. We characterize in detail these intriguing tandem repeats. Individual species have several families of tandem repeats with the same repeat length and different sequence. This result is in strong contrast with eukaryotes, where tandem repeats of many sizes are found in any species. We discuss the possibility that they are transcribed as small RNA molecules. They may also be involved in the stabilization of the nucleoid through interaction with proteins. We also show that the distribution of tandem repeats in different species has a taxonomic significance. The data we present for all tandem repeats and their families in these bacterial species will be useful for further genomic studies.


Subject(s)
Bacillus/genetics , Tandem Repeat Sequences/genetics , Bacteria/genetics , Eukaryota/genetics , Genome, Bacterial/genetics , Genomics/methods , Prokaryotic Cells/physiology , Species Specificity
5.
Nat Commun ; 12(1): 604, 2021 01 27.
Article in English | MEDLINE | ID: mdl-33504782

ABSTRACT

De novo gene origination has been recently established as an important mechanism for the formation of new genes. In organisms with a large genome, intergenic and intronic regions provide plenty of raw material for new transcriptional events to occur, but little is known about how de novo transcripts originate in more densely packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown in two different conditions. We find that about half of the de novo transcripts are expressed from regions which already harbor other genes in the opposite orientation; these transcripts show similar expression changes in response to stress as their overlapping counterparts, and some appear to translate small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes.


Subject(s)
Genes, Fungal , Saccharomyces cerevisiae/genetics , Transcriptome/genetics , Conserved Sequence/genetics , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Open Reading Frames/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
6.
J Bacteriol ; 202(21)2020 10 08.
Article in English | MEDLINE | ID: mdl-32839174

ABSTRACT

DNA tandem repeats, or satellites, are well described in eukaryotic species, but little is known about their prevalence across prokaryotes. Here, we performed the most complete characterization to date of satellites in bacteria. We identified 121,638 satellites from 12,233 fully sequenced and assembled bacterial genomes with a very uneven distribution. We also determined the families of satellites which have a related sequence. There are 85 genomes that are particularly satellite rich and contain several families of satellites of yet unknown function. Interestingly, we only found two main types of noncoding satellites, depending on their repeat sizes, 22/44 or 52 nucleotides (nt). An intriguing feature is the constant size of the repeats in the genomes of different species, whereas their sequences show no conservation. Individual species also have several families of satellites with the same repeat length and different sequences. This result is in marked contrast with previous findings in eukaryotes, where noncoding satellites of many sizes are found in any species investigated. We describe in greater detail these noncoding satellites in the spirochete Leptospira interrogans and in several bacilli. These satellites undoubtedly play a specific role in the species which have acquired them. We discuss the possibility that they represent binding sites for transcription factors not previously described or that they are involved in the stabilization of the nucleoid through interaction with proteins. IMPORTANCE: We found an enigmatic group of noncoding satellites in 85 bacterial genomes with a constant repeat size but variable sequence. This pattern of DNA organization is unique and had not been previously described in bacteria. These findings strongly suggest that satellite size in some bacteria is under strong selective constraints and thus that satellites are very likely to play a fundamental role. We also provide a list and properties of all satellites in 12,233 genomes, which may be used for further genomic analysis.


Subject(s)
Bacteria/genetics , DNA, Bacterial/genetics , DNA, Satellite/genetics , Genome, Bacterial , Leptospira interrogans/genetics , Databases, Genetic , Genomics
7.
BMC Evol Biol ; 19(1): 181, 2019 09 18.
Article in English | MEDLINE | ID: mdl-31533616

ABSTRACT

BACKGROUND: Satellites or tandem repeats are very abundant in many eukaryotic genomes. Occasionally they have been reported to be present in some prokaryotes, but to our knowledge there is no general comparative study on their occurrence. For this reason we present here an overview of the distribution and properties of satellites in a set of representative species. Our results provide novel insights into the evolutionary relationship between eukaryotes, Archaea and Bacteria. RESULTS: We have searched all possible satellites present in the NCBI reference group of genomes in Archaea (142 species) and in Bacteria (119 species), detecting 2735 satellites in Archaea and 1067 in Bacteria. We have found that the distribution of satellites is very variable in different organisms. The archaeal Methanosarcina class stands out for the large number of satellites in their genomes. Satellites from a few species have similar characteristics to those in eukaryotes, but most species have very few satellites: only 21 species in Archaea and 18 in Bacteria have more than 4 satellites/Mb. The distribution of satellites in these species is reminiscent of what is found in eukaryotes, but we find two significant differences: most satellites have a short length and many of them correspond to segments of genes coding for amino acid repeats. Transposition of non-coding satellites throughout the genome occurs rarely: only in the bacterium Leptospira interrogans and the archaeon Methanocella conradii have we detected satellite families of transposed satellites with long repeats. CONCLUSIONS: Our results demonstrate that the presence of satellites in the genome is not an exclusive feature of eukaryotes. We have described a few prokaryotes which do contain satellites. We present a discussion on their eventual evolutionary significance.


Subject(s)
DNA, Satellite/genetics , Prokaryotic Cells/metabolism , Archaea/genetics , Bacteria/genetics , Microsatellite Repeats/genetics , Phylogeny
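
The abstract of record 7 uses satellite density (satellites/Mb) with a cutoff of more than 4 satellites/Mb to single out satellite-rich species. A minimal sketch of that arithmetic follows; the function names and the example counts and genome sizes are hypothetical and are not taken from the paper.

```python
def satellites_per_mb(n_satellites, genome_size_bp):
    """Satellite density: number of satellites per megabase of genome."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_satellites * 1_000_000 / genome_size_bp


def satellite_rich(genomes, threshold=4.0):
    """Names of genomes whose density exceeds the threshold (satellites/Mb).

    `genomes` maps a name to a (satellite count, genome size in bp) pair.
    """
    return [
        name
        for name, (count, size_bp) in genomes.items()
        if satellites_per_mb(count, size_bp) > threshold
    ]


# Hypothetical numbers, for illustration only.
example = {
    "species_A": (30, 5_000_000),  # 6 satellites/Mb
    "species_B": (4, 2_000_000),   # 2 satellites/Mb
}
rich = satellite_rich(example)
```

With these made-up inputs only species_A passes the cutoff; real counts would come from a satellite search over each assembled genome.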
8.
Genes (Basel) ; 9(10)2018 Oct 16.
Article in English | MEDLINE | ID: mdl-30332836

ABSTRACT

Repetitive genome regions have been difficult to sequence, mainly because of the comparatively small size of the fragments used in assembly. Satellites or tandem repeats are very abundant in nematodes and offer an excellent playground to evaluate different assembly methods. Here, we compare the structure of satellites found in three different assemblies of the Caenorhabditis elegans genome: the original sequence obtained by Sanger sequencing, an assembly based on PacBio technology, and an assembly using Nanopore sequencing reads. In general, satellites were found in equivalent genomic regions, but the new long-read methods (PacBio and Nanopore) tended to result in longer assembled satellites. Important differences exist between the assemblies resulting from the two long-read technologies, such as the sizes of long satellites. Our results also suggest that the lengths of some annotated genes with internal repeats which were assembled using Sanger sequencing are likely to be incorrect.

9.
Nat Ecol Evol ; 2(5): 890-896, 2018 05.
Article in English | MEDLINE | ID: mdl-29556078

ABSTRACT

Accumulating evidence indicates that some protein-coding genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that are translated at significant levels and that can at some point acquire new functions. Here, we use deep ribosome-profiling sequencing data, together with proteomics and single nucleotide polymorphism information, to search for these peptides. We find hundreds of open reading frames that are translated and that show no evolutionary conservation or selective constraints. These data suggest that the translation of these neutrally evolving peptides may be facilitated by the chance occurrence of open reading frames with a favourable codon composition. We conclude that the pervasive translation of the transcriptome provides plenty of material for the evolution of new functional proteins.


Subject(s)
Evolution, Molecular , Peptides/chemistry , Polymorphism, Single Nucleotide , Ribosomes/chemistry , Animals , High-Throughput Nucleotide Sequencing , Humans , Mice , Mice, Inbred BALB C , Proteomics
10.
Genes (Basel) ; 8(12)2017 Nov 28.
Article in English | MEDLINE | ID: mdl-29182550

ABSTRACT

The availability of the genome sequence of the unisexual (male-female) Caenorhabditis nigoni offers an opportunity to compare its non-coding features with the related hermaphroditic species Caenorhabditis briggsae and to understand the evolutionary dynamics of their tandem repeat sequences (satellites), as a result of evolution from the unisexual ancestor. We take advantage of the previously developed SATFIND program to build satellite families defined by a consensus sequence. The relative number of satellites (satellites/Mb) in C. nigoni is 24.6% larger than in C. briggsae. Some satellites in C. nigoni have developed from a proto-repeat present in the ancestor species and are conserved as an isolated sequence in C. briggsae. We also identify unique satellites which occur only once and joint satellite families with a related sequence in both species. Some of these families are only found in C. nigoni, which indicates a recent appearance; they contain conserved adjacent 5' and 3' regions, which may favor transposition. Our results show that the number, length and turnover of satellites are restricted in the hermaphrodite C. briggsae when compared with the unisexual C. nigoni. We hypothesize that this results from differences in unequal recombination during meiotic chromosome pairing, which limits satellite turnover in hermaphrodites.

11.
BMC Evol Biol ; 15: 218, 2015 Oct 05.
Article in English | MEDLINE | ID: mdl-26438045

ABSTRACT

BACKGROUND: The high density of tandem repeat sequences (satellites) in nematode genomes and the availability of genome sequences from several species in the group offer a unique opportunity to better understand the evolutionary dynamics and the functional role of these sequences. We take advantage of the previously developed SATFIND program to study the satellites in four Caenorhabditis species and investigate these questions. METHODS: The identification and comparison of satellites is carried out in three steps. First we find all the satellites present in each species with the SATFIND program. Each satellite is defined by its length, number of repeats, and repeat sequence. Only satellites with at least ten repeats are considered. In the second step we build satellite families with a newly developed alignment program. Satellite families are defined by a consensus sequence and the number of satellites in the family. Finally we compare the consensus sequence of satellite families in different species. RESULTS: We give a catalog of individual satellites in each species. We have also identified satellite families with a related sequence and compare them in different species. We analyze the turnover of satellites: they increased in size through duplications of fragments of 100-300 bases. It appears that in many cases they have undergone an explosive expansion. In C. elegans we have identified a subset of large satellites that have strong affinity for the centromere protein CENP-A. We have also compared our results with those obtained from other species, including one nematode and three mammals. CONCLUSIONS: Most satellite families found in Caenorhabditis are species-specific; in particular those with long repeats. A subset of these satellites may facilitate the formation of kinetochores in mitosis. Other satellite families in C. elegans are either related to Helitron transposons or to meiotic pairing centers.


Subject(s)
Caenorhabditis/classification , Caenorhabditis/genetics , DNA, Helminth/genetics , Animals , Autoantigens/genetics , Biological Evolution , Caenorhabditis elegans/genetics , Centromere , Centromere Protein A , Chromosomal Proteins, Non-Histone/genetics , DNA, Satellite/genetics , Repetitive Sequences, Nucleic Acid , Species Specificity
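
The METHODS of record 11 define each satellite by its length, number of repeats, and repeat sequence, and keep only satellites with at least ten repeats. The sketch below illustrates that definition with a naive scan for exact tandem copies of a fixed-size unit. It is not the SATFIND algorithm, whose details the abstract does not give; the names `Satellite` and `find_exact_satellites` are hypothetical, and real satellites usually contain mismatches that this exact matcher would miss.

```python
from typing import NamedTuple


class Satellite(NamedTuple):
    start: int    # 0-based start position in the sequence
    repeat: str   # repeat unit
    copies: int   # number of consecutive exact copies

    @property
    def length(self) -> int:
        """Total length of the satellite in bases."""
        return len(self.repeat) * self.copies


def find_exact_satellites(seq, unit_len, min_copies=10):
    """Find runs of at least `min_copies` exact tandem copies of a unit_len-mer."""
    if unit_len < 1:
        raise ValueError("unit_len must be at least 1")
    found = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next window equals the repeat unit.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            found.append(Satellite(i, unit, copies))
            i = j  # continue after the reported satellite
        else:
            i += 1
    return found


# Toy sequence: 12 copies of ACGT flanked by unrelated bases.
toy = "GGG" + "ACGT" * 12 + "TTT"
hits = find_exact_satellites(toy, unit_len=4)
```

The ten-repeat threshold is the only parameter taken from the abstract; the fixed `unit_len` is a simplification, since a real search would try a range of repeat sizes.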
12.
Elife ; 3: e03523, 2014 Sep 16.
Article in English | MEDLINE | ID: mdl-25233276

ABSTRACT

Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show coding potential and sequence constraints similar to those of evolutionarily young protein-coding sequences, indicating that they play an important role in de novo protein evolution.


Subject(s)
Open Reading Frames/genetics , Peptides/genetics , Protein Biosynthesis , RNA, Long Noncoding/genetics , Animals , Arabidopsis , Drosophila melanogaster , Evolution, Molecular , Gene Expression Profiling , Humans , Mice , Proteomics , RNA, Long Noncoding/metabolism , Ribosomes/genetics , Ribosomes/metabolism , Saccharomyces cerevisiae , Selection, Genetic , Sequence Analysis, RNA , Species Specificity , Zebrafish
13.
PLoS One ; 8(4): e62221, 2013.
Article in English | MEDLINE | ID: mdl-23638010

ABSTRACT

Centromere sequences in the genome are associated with the formation of kinetochores, where spindle microtubules grow in mitosis. Centromere sequences usually have long tandem repeats (satellites). In holocentric nematodes it is not clear how kinetochores are formed during mitosis; they are distributed throughout the chromosomes. For this reason it appeared of interest to study the satellites in nematodes in order to determine if they offer any clue on how kinetochores are assembled in these species. We have studied the satellites in the genome of six nematode species. We found that the presence of satellites depends on whether the nematode chromosomes are holocentric or monocentric. It turns out that holocentric nematodes are unique because they have a large number of satellites scattered throughout their genome. Their number, length and composition are different in each species: they apparently have very little evolutionary conservation. In contrast, no scattered satellites are found in the monocentric nematode Trichinella spiralis. It appears that the absence/presence of scattered satellites in the genome distinguishes monocentric from holocentric nematodes. We conclude that the presence of satellites is related to the holocentric nature of the chromosomes of most nematodes. Satellites may stabilize a higher order structure of chromatin and facilitate the formation of kinetochores. We also present a new program, SATFIND, which is suited to find satellite sequences.


Subject(s)
DNA, Satellite , Genome, Helminth , Trichinella spiralis/genetics , Animals , Base Composition , Base Sequence , Centromere/genetics , Genome Size , Nucleotide Motifs , Phylogeny , Trichinella spiralis/classification
14.
Mol Biosyst ; 8(8): 2085-96, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22710377

ABSTRACT

We carried out a systems-level study of the mechanisms underlying organ-specific metastases of breast cancer. We followed a network-based approach using microarray expression data from human breast cancer metastases to select organ-specific proteins that exert a range of functions allowing cell survival and growth in the microenvironment of distant organs. MinerProt, a home-made software application, was used to group organ-specific signatures of brain (1191 genes), bone (1623 genes), liver (977 genes) and lung (254 genes) metastases by function and select the most differentially expressed gene in each function. As a result, we obtained 19 functional representative proteins in brain, 23 in bone, 15 in liver and 9 in lung, with which we constructed four organ-specific protein-protein interaction networks. The network taxonomy included seven proteins that interacted in brain metastasis, which were mainly associated with signal transduction. Proteins related to immune response functions were bone specific, while those involved in proteolysis, signal transduction and hepatic glucose metabolism were found in liver metastasis. No experimental protein-protein interaction was found in lung metastasis; thus, computationally determined interactions were included in this network. Moreover, three of these selected genes (CXCL12, DSC2 and TFDP2) were associated with progression to specific organs when tested in an independent dataset. In conclusion, we present a network-based approach to filter information by selecting key protein functions as metastatic markers or therapeutic targets.


Subject(s)
Breast Neoplasms/complications , Breast Neoplasms/metabolism , Protein Interaction Maps , Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Brain Neoplasms/secondary , Breast Neoplasms/genetics , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Liver Neoplasms/secondary , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Lung Neoplasms/secondary , Software
15.
J Theor Biol ; 283(1): 28-34, 2011 Aug 21.
Article in English | MEDLINE | ID: mdl-21635904

ABSTRACT

There are general features of chromosome dynamics, such as homologue recognition in early meiosis, which are expected to involve related sequence motifs in non-coding DNA, with a similar distribution in different species. A search for such motifs is presented here. It has been carried out with the CONREPP programme. It has been found that short alternating AT sequences (10-20 bases) have a similar distribution in most eukaryotic organisms, with some exceptions related to unique meiotic features. All other microsatellite and repeat sequences vary significantly in different organisms. It is concluded that the unique structural features and uniform distribution of alternating AT sequences indicate that they may facilitate homologous chromosome pairing in the early preleptotene stage of meiosis. They may also play a role in the compaction of DNA in mitotic chromosomes.


Subject(s)
Eukaryota/genetics , Meiosis/genetics , Microsatellite Repeats/genetics , Animals , Base Sequence , Caenorhabditis elegans/genetics , Chromosome Pairing/genetics , Databases, Nucleic Acid , Genome , Humans , Molecular Sequence Data , Nucleosomes/genetics , Saccharomyces cerevisiae Proteins/genetics , Species Specificity
16.
Nucleic Acids Res ; 38(4): 1172-81, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19966278

ABSTRACT

The purpose of this work is to determine the most frequent short sequences in non-coding DNA. They may play a role in maintaining the structure and function of eukaryotic chromosomes. We present a simple method for the detection and analysis of such sequences in several genomes, including Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. We also study two chromosomes of man and mouse with a length similar to the whole genomes of the other species. We provide a list of the most common sequences of 9-14 bases in each genome. As expected, they are present in human Alu sequences. Our programs may also give a graph and a list of their position in the genome. Detection of clusters is also possible. In most cases, these sequences contain few alternating regions. Their intrinsic structure and their influence on nucleosome formation are not known. In particular, we have found new features of short sequences in C. elegans, which are distributed in heterogeneous clusters. They appear as punctuation marks in the chromosomes. Such clusters are not found in either A. thaliana or D. melanogaster. We discuss the possibility that they play a role in centromere function and homolog recognition in meiosis.


Subject(s)
DNA, Intergenic/chemistry , Alu Elements , Animals , Arabidopsis/genetics , Base Sequence , Caenorhabditis elegans/genetics , Genomics , Humans , Mice , Microsatellite Repeats , Short Interspersed Nucleotide Elements
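
Record 16 describes listing the most common sequences of 9-14 bases in a genome. A minimal sketch of that kind of count, using a sliding window and a hash-based counter, is given below. This is a generic k-mer tally and not the authors' program: the function name is hypothetical, only one strand is counted, and windows containing characters other than A, C, G, T are skipped as an assumption about how ambiguous bases would be handled.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def most_common_kmers(seq, k, top=5):
    """Return the `top` most frequent k-mers in `seq` with their counts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows with ambiguous or non-nucleotide characters.
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts.most_common(top)


# Toy example with a short word size; the abstract uses k between 9 and 14.
top_hit = most_common_kmers("ACGACGACG", 3, top=1)
```

For whole genomes the same loop works chromosome by chromosome, although memory use grows with the number of distinct k-mers.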
17.
Article in English | MEDLINE | ID: mdl-19407343

ABSTRACT

Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.


Subject(s)
DNA/chemistry , Interspersed Repetitive Sequences , Markov Chains , Sequence Alignment/methods , Sequence Homology, Nucleic Acid , Base Sequence , Computer Simulation , DNA, Bacterial/chemistry , Genome, Bacterial , Models, Statistical , Molecular Sequence Data , Mycoplasma genitalium/genetics , Software
18.
Gene ; 408(1-2): 124-32, 2008 Jan 31.
Article in English | MEDLINE | ID: mdl-18022767

ABSTRACT

We present an analysis of tandem repeats of short sequence motifs (microsatellites) in twelve eukaryotes for which a large part of the genome has been sequenced and assembled. The pattern of motif abundance varies significantly in different species, but it is very similar in different chromosomes of the same species. The most abundant repeats can be classified in two main families. The first family has a rigid conformation, with purines in one strand and pyrimidines in the complementary strand, mainly A(n)/T(n) and (AG)(n)/(CT)(n). The second family has alternating, flexible sequences, such as (AT)(n), (AC)(n) and related sequences. In the pluricellular organisms the relative frequency of both families is rather constant. These observations indicate that microsatellites have structural information and may be involved in the organization of chromatin fibers and in chromosome architecture in general. An additional intriguing finding is the absence of microsatellites with sequences which appear to be forbidden, such as (AATT)(n).


Subject(s)
Genome , Microsatellite Repeats , Animals , Base Sequence , Humans , Species Specificity
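
Record 18 groups the most abundant microsatellite motifs into two families: one with purines in one strand and pyrimidines in the other, such as A(n)/T(n) and (AG)(n)/(CT)(n), and one with alternating sequences such as (AT)(n) and (AC)(n). The sketch below is one possible reading of that grouping, assuming that "alternating" means strict purine/pyrimidine alternation that also holds across the boundary between repeat units. The function name and the "other" label are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def classify_motif(motif):
    """Assign a repeat motif to one of the two families, or to 'other'."""
    m = motif.upper()
    if not m or not set(m) <= PURINES | PYRIMIDINES:
        raise ValueError("motif must be a non-empty string over A, C, G, T")
    # Family 1: all purines or all pyrimidines on this strand.
    if set(m) <= PURINES or set(m) <= PYRIMIDINES:
        return "homopurine/homopyrimidine"
    # Family 2: purines and pyrimidines alternate; the motif is treated as
    # circular so the pattern continues when the unit is repeated in tandem.
    is_purine = [base in PURINES for base in m]
    n = len(m)
    if all(is_purine[i] != is_purine[(i + 1) % n] for i in range(n)):
        return "alternating"
    return "other"


families = {motif: classify_motif(motif) for motif in ["A", "AG", "CT", "AT", "AC", "AATT"]}
```

Under this reading (AATT)(n), which the abstract reports as absent, falls in neither family.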
19.
Methods Mol Biol ; 396: 135-52, 2007.
Article in English | MEDLINE | ID: mdl-18025691

ABSTRACT

During the course of evolution, genomes can undergo large-scale mutation events such as rearrangement and lateral transfer. Such mutations can result in significant variations in gene order and gene content among otherwise closely related organisms. The Mauve genome alignment system can successfully identify such rearrangement and lateral transfer events in comparisons of multiple microbial genomes even under high levels of recombination. This chapter outlines the main features of Mauve and provides examples that describe how to use Mauve to conduct a rigorous multiple genome comparison and study evolutionary patterns.


Subject(s)
Biological Evolution , Genome, Bacterial , Sequence Alignment , Gene Transfer, Horizontal , Phylogeny , Yersinia pestis/genetics
20.
Genome Biol ; 8(7): R140, 2007.
Article in English | MEDLINE | ID: mdl-17626644

ABSTRACT

BACKGROUND: Understanding the constraints that operate in mammalian gene promoter sequences is of key importance to understand the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is related in a significant manner to the extent of promoter sequence conservation. RESULTS: We show that mammalian housekeeping genes, expressed in all or nearly all tissues, show significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues. In addition, we evaluate the effect of gene function, CpG island content and protein evolutionary rate on promoter sequence conservation. Finally, we identify a subset of transcription factors that bind to motifs that are specifically over-represented in housekeeping gene promoters. CONCLUSION: This is the first report that shows that the promoters of housekeeping genes show reduced sequence conservation with respect to genes expressed in a more tissue-restricted manner. This is likely to be related to simpler gene expression, requiring a smaller number of functional cis-regulatory motifs.


Subject(s)
CpG Islands , Promoter Regions, Genetic , Animals , Base Sequence , Conserved Sequence , Evolution, Molecular , Gene Expression , Genetic Variation , Humans , Mice , Molecular Sequence Data