1.
Biosystems ; 49(2): 83-103, 1999 Feb.
Article in English | MEDLINE | ID: mdl-10203190

ABSTRACT

The subset X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} of 20 trinucleotides has a preferential occurrence in the frame 0 (reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. This subset X0 is a complementary maximal circular code with two permutated maximal circular codes X1 and X2 in the frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5'-3' direction). X0 is called a C3 code (Arquès and Michel, 1997, Biosystems 44, 107-134). A quantitative study of these three subsets X0, X1 and X2 in the three frames 0, 1 and 2 of eukaryotic protein genes shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of X0, X1 and X2 in the frame 0 of eukaryotic protein genes are 48.5%, 29% and 22.5% respectively. These properties are not observed in the 5' and 3' regions of eukaryotes where X0, X1 and X2 occur with variable frequencies around the random value (1/3). Several frequency asymmetries unexpectedly observed, e.g. the frequency difference between X1 and X2 in the frame 0, are related to a new property of the C3 code X0 involving substitutions. An evolutionary analytical model with three parameters (p, q, t), based on an independent mixing of the 20 codons (trinucleotides in the frame 0) of X0 with equiprobability (1/20) followed by t ≈ 4 substitutions per codon according to the proportions p ≈ 0.1, q ≈ 0.1 and r = 1 - p - q ≈ 0.8 in the three codon sites respectively, retrieves the frequencies of X0, X1 and X2 observed in the three frames of protein genes and explains these asymmetries. The complex behaviour of these analytical curves is totally unexpected and a priori difficult to imagine. Finally, the evolutionary analytical method developed could be applied to phylogenetic tree reconstruction and DNA sequence alignment.
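
The mixing-then-substitution process described in this abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' analytical model: it assumes that each substitution replaces a base by one of the three other bases with equal probability, and that the total number of substitutions is fixed at t times the number of codons. The function names `simulate_gene` and `subset_frequencies` are hypothetical.

```python
import random

X0 = set("AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC GAG GAT GCC GGC GGT "
         "GTA GTC GTT TAC TTC".split())


def rotate(word, k=1):
    """Circular permutation of a trinucleotide by k positions (AAC -> ACA for k=1)."""
    return word[k:] + word[:k]


X1 = {rotate(w, 1) for w in X0}
X2 = {rotate(w, 2) for w in X0}


def simulate_gene(n_codons, t, p, q, rng):
    """Mix the 20 codons of X0 with equal probability, then substitute bases.

    Assumption: round(t * n_codons) substitutions are applied in total; each one
    picks a random codon, picks site 1, 2 or 3 with proportions p, q, 1 - p - q,
    and replaces the base by a different base chosen uniformly.
    """
    codons = sorted(X0)
    seq = [base for _ in range(n_codons) for base in rng.choice(codons)]
    weights = (p, q, 1.0 - p - q)
    for _ in range(round(t * n_codons)):
        codon_index = rng.randrange(n_codons)
        site = rng.choices((0, 1, 2), weights=weights)[0]
        pos = 3 * codon_index + site
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return "".join(seq)


def subset_frequencies(seq, frame):
    """Frequencies of X0, X1 and X2 among the trinucleotides read in the given frame."""
    words = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    n = len(words)
    return tuple(sum(w in s for w in words) / n for s in (X0, X1, X2))
```

With t = 0 the frame 0 contains only X0 by construction; increasing t lets one observe how the three frequencies drift in each frame. How closely such a simulation tracks the published curves depends on the substitution assumptions stated above.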


Subject(s)
Biological Evolution , Computer Simulation , Genetic Code , Models, Genetic , Animals , Base Sequence , Molecular Sequence Data , Oligodeoxyribonucleotides
2.
Bull Math Biol ; 60(1): 163-94, 1998 Jan.
Article in English | MEDLINE | ID: mdl-9530018

ABSTRACT

The self-complementary subset T0 = X0 ∪ {AAA, TTT} with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} of 22 trinucleotides has a preferential occurrence in the frame 0 (reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. The subsets T1 = X1 ∪ {CCC} and T2 = X2 ∪ {GGG} of 21 trinucleotides have a preferential occurrence in the shifted frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5'-3' direction). T1 and T2 are complementary to each other. The subset T0 contains the subset X0 which has the rarity property (6 × 10^-8) to be a complementary maximal circular code with two permutated maximal circular codes X1 and X2 in the frames 1 and 2 respectively. X0 is called a C3 code. A quantitative study of these three subsets T0, T1, T2 in the three frames 0, 1, 2 of protein genes, and the 5' and 3' regions of eukaryotes, shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of T0, T1, T2 in the frame 0 of protein genes are 49, 28.5 and 22.5% respectively. In contrast, the frequencies of T0, T1, T2 in the 5' and 3' regions of eukaryotes are independent of the frame. Indeed, the frequency of T0 in the three frames of 5' (respectively 3') regions is equal to 35.5% (respectively 38%) and is greater than the frequencies of T1 and T2, both equal to 32.25% (respectively 31%) in the three frames. Several frequency asymmetries unexpectedly observed (e.g. the frequency difference between T1 and T2 in the frame 0) are related to a new property of the subset T0 involving substitutions. An evolutionary analytical model with three parameters (p, q, t), based on an independent mixing of the 22 codons (trinucleotides in frame 0) of T0 with equiprobability (1/22) followed by t ≈ 4 substitutions per codon according to the proportions p ≈ 0.1, q ≈ 0.1 and r = 1 - p - q ≈ 0.8 in the three codon sites respectively, retrieves the frequencies of T0, T1, T2 observed in the three frames of protein genes and explains these asymmetries. Furthermore, the same model (0.1, 0.1, t) after t ≈ 22 substitutions per codon retrieves the statistical properties observed in the three frames of the 5' and 3' regions. The complex behaviour of these analytical curves is totally unexpected and a priori difficult to imagine.
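
The set relations stated in this abstract (T0 self-complementary, T1 and T2 complementary to each other, X1 and X2 obtained from X0 by permutation) can be checked mechanically. The sketch below assumes that "complementary" means the reverse complement of a trinucleotide and that "permutated" means a circular shift by one nucleotide; the helper names are hypothetical.

```python
X0 = frozenset("AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC GAG GAT GCC GGC GGT "
               "GTA GTC GTT TAC TTC".split())

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(word):
    """DNA complementarity map C (reverse complement), e.g. AAC -> GTT."""
    return word.translate(_COMPLEMENT)[::-1]


def permute(word):
    """Circular permutation map P (shift by one nucleotide), e.g. AAC -> ACA."""
    return word[1:] + word[0]


def image(func, words):
    """Apply func to every trinucleotide of a set."""
    return frozenset(func(w) for w in words)


X1 = image(permute, X0)
X2 = image(permute, X1)
T0 = X0 | {"AAA", "TTT"}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}

print(image(complement, T0) == T0)  # T0 is self-complementary
print(image(complement, T1) == T2)  # T1 and T2 are complementary to each other
print(len(T0 | T1 | T2))            # the three subsets cover all 64 trinucleotides
```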


Subject(s)
Biological Evolution , Genetic Code , Models, Genetic , Proteins/genetics , Animals , Codon/genetics , Humans , Mammals , Probability , Rodentia , Vertebrates
3.
J Theor Biol ; 185(2): 241-53, 1997 Mar 21.
Article in English | MEDLINE | ID: mdl-9135803

ABSTRACT

The subset X0 = [sequence: see text] of 20 trinucleotides has a preferential occurrence in frame 0 (a reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. This subset X0 has the rarity property (6 × 10^-8) to be a complementary maximal circular code with two permutated maximal circular codes X1 and X2 in frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5'-3' direction). X0 is called a C3 code. A quantitative study of these three subsets X0, X1 and X2 in the three frames 0, 1 and 2 of eukaryotic protein genes shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of X0, X1 and X2 in frame 0 of the eukaryotic protein genes are 48.5%, 29% and 22.5% respectively. These properties are not observed in the 5' and 3' regions of eukaryotes where X0, X1 and X2 occur with variable frequencies around the random value (1/3). Several frequency asymmetries unexpectedly observed, e.g. the frequency difference between X1 and X2 in the frame 0, are related to a new property of the C3 code X0 involving substitutions. An evolutionary model with three parameters (p, q, k), based on an independent mixing of the 20 codons (trinucleotides in frame 0) of X0 with equiprobability (1/20) followed by k ≈ 5 substitutions per codon in the three codon sites in proportions p ≈ 0.1, q ≈ 0.1 and r = 1 - p - q ≈ 0.8 respectively, retrieves the frequencies of X0, X1 and X2 observed in the three frames of protein genes and explains these asymmetries.


Subject(s)
DNA, Circular , DNA, Complementary , Evolution, Molecular , Genetic Code , Models, Genetic , Animals , Eukaryotic Cells/physiology , Proteins/genetics
4.
Biosystems ; 44(2): 107-34, 1997.
Article in English | MEDLINE | ID: mdl-9429747

ABSTRACT

A statistical analysis with 12,288 autocorrelation functions applied in protein (coding) genes of prokaryotes and eukaryotes identifies three subsets of trinucleotides in their three frames: T0 = X0 ∪ {AAA, TTT} with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} in frame 0 (the reading frame established by the ATG start trinucleotide), T1 = X1 ∪ {CCC} in frame 1 and T2 = X2 ∪ {GGG} in frame 2 (the frames 1 and 2 being the frame 0 shifted by one and two nucleotides, respectively, to the right). These three subsets are identical in these two gene populations and have five important properties: (i) the property of maximal (20 trinucleotides) circular code for X0 (resp. X1, X2), allowing the frame 0 (resp. 1, 2) to be retrieved automatically in any region of the gene without a start codon; (ii) the DNA complementarity property C (e.g. C(AAC) = GTT): C(T0) = T0, C(T1) = T2 and C(T2) = T1, allowing the two paired reading frames of a DNA double helix simultaneously to code for amino acids; (iii) the circular permutation property P (e.g. P(AAC) = ACA): P(X0) = X1 and P(X1) = X2, implying that the two subsets X1 and X2 can be deduced from X0; (iv) the rarity property with an occurrence probability of X0 = 6 × 10^-8; and (v) the concatenation properties in favour of an evolutionary code: a high frequency (27.5%) of misplaced trinucleotides in the shifted frames, a maximum (13 nucleotides) length of the minimal window to retrieve the frame automatically and an occurrence of the four types of nucleotides in the three trinucleotide sites. In the Discussion, a simulation based on an independent mixing of the trinucleotides of T0 retrieves the two subsets T1 and T2. Then, the identified subsets T0, T1 and T2, expressed in the 2-letter genetic alphabet {R, Y} (R = purine = A or G, Y = pyrimidine = C or T), retrieve the RNY model (N = R or Y) and explain previous works in the alphabet {R, Y}. Then, these three subsets are related to the genetic code. The trinucleotides of T0 code for 13 amino acids: Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Lys, Phe, Thr, Tyr and Val. Finally, a strong correlation between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins is observed, as six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in proteins of both prokaryotes and eukaryotes.
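
Property (i), frame retrieval without a start codon, can be illustrated on a toy gene model. The sketch below is a simplification and not the procedure of the paper: it only tests, for each of the three offsets, whether every complete trinucleotide of the window belongs to X0, and it ignores the partial words at both ends. The function name `candidate_frames` and the example sequence are hypothetical.

```python
X0 = frozenset("AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC GAG GAT GCC GGC GGT "
               "GTA GTC GTT TAC TTC".split())


def candidate_frames(window):
    """Offsets (0, 1, 2) at which all complete trinucleotides of the window are in X0."""
    frames = []
    for offset in range(3):
        words = [window[i:i + 3] for i in range(offset, len(window) - 2, 3)]
        if words and all(w in X0 for w in words):
            frames.append(offset)
    return frames


# A gene model formed by a series of trinucleotides of X0 (arbitrary choice of words).
gene_model = "".join(["GAA", "CTG", "ATC", "GGT", "TAC",
                      "AAC", "GCC", "GTT", "CAG", "TTC"])

print(candidate_frames(gene_model))      # reading frame starts at offset 0
print(candidate_frames(gene_model[1:]))  # one leading nucleotide removed
print(candidate_frames(gene_model[2:]))  # two leading nucleotides removed
```

For this example each call returns a single offset, which locates the construction frame even though the start of the sequence has been cut off. The windows used here (28 to 30 nucleotides) are deliberately longer than the 13-nucleotide bound quoted in the abstract, because the simplified test above discards the end fragments.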


Subject(s)
Computer Simulation , Genetic Code , Models, Biological , Proteins/genetics , Animals , Base Sequence , Humans , Molecular Sequence Data
5.
J Theor Biol ; 189(3): 273-90, 1997 Dec 07.
Article in English | MEDLINE | ID: mdl-9441820

ABSTRACT

A new maximal circular code X0(MIT) with two permutated maximal circular codes X1(MIT) and X2(MIT) is identified in the protein coding genes of mitochondria. The three subsets of 20 trinucleotides X0(MIT) = {ACA, ACC, ATA, ATC, CTA, CTC, GAA, GAC, GAT, GCA, GCC, GCT, GGA, GGC, GGT, GTA, GTC, GTT, TTA, TTC}, X1(MIT) and X2(MIT) are in frame 0 (reading frame), 1 and 2 respectively. X1(MIT) and X2(MIT) are deduced by one and two circular permutations of X0(MIT) respectively. The code X0(MIT) has four important properties: a length of the minimal window to automatically retrieve frame 0 which is equal to five nucleotides; an occurrence probability equal to 6.3 × 10^-5; a low frequency (12% on average) of misplaced trinucleotides in the shifted frames; and an occurrence of four types of nucleotides in the first and second trinucleotide sites but no nucleotide G in the third trinucleotide site. Several biological consequences are presented in the Discussion.
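
The site-composition property of X0(MIT) and the construction of X1(MIT) and X2(MIT) by circular permutation can be verified directly from the listed trinucleotides. This is a small illustrative check; the function name `site_composition` is hypothetical.

```python
X0_MIT = ("ACA ACC ATA ATC CTA CTC GAA GAC GAT GCA GCC GCT GGA GGC GGT "
          "GTA GTC GTT TTA TTC").split()


def site_composition(code):
    """Sorted list of the nucleotides found at each of the three trinucleotide sites."""
    return [sorted({word[site] for word in code}) for site in range(3)]


# One and two circular permutations of each trinucleotide of X0(MIT).
X1_MIT = {w[1:] + w[0] for w in X0_MIT}
X2_MIT = {w[2:] + w[:2] for w in X0_MIT}

print(site_composition(X0_MIT))
# -> [['A', 'C', 'G', 'T'], ['A', 'C', 'G', 'T'], ['A', 'C', 'T']]
print(len(set(X0_MIT) | X1_MIT | X2_MIT))
# -> 60
```

The second result shows that the three subsets are pairwise disjoint and together cover the 60 trinucleotides other than AAA, CCC, GGG and TTT.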


Subject(s)
DNA, Mitochondrial , Mitochondria/genetics , Proteins/genetics , Amino Acids/genetics , Animals , Base Sequence , Codon , Eukaryotic Cells , Molecular Sequence Data , Prokaryotic Cells , Reading Frames
6.
J Theor Biol ; 182(1): 45-58, 1996 Sep 07.
Article in English | MEDLINE | ID: mdl-8917736

ABSTRACT

Recently, shifted periodicities 1 modulo 3 and 2 modulo 3 have been identified in protein (coding) genes of both prokaryotes and eukaryotes with autocorrelation functions analysing eight of 64 trinucleotides (Arquès et al., 1995). This observation suggests that the trinucleotides are associated with frames in protein genes. In order to verify this hypothesis, a distribution of the 64 trinucleotides AAA, ..., TTT is studied in both gene populations by using a simple method based on the trinucleotide frequencies per frame. In protein genes, the trinucleotides can be read in three frames: the reading frame 0 established by the ATG start trinucleotide and frame 1 (resp. 2) which is the frame 0 shifted by 1 (resp. 2) nucleotide in the 5'-3' direction. Then, the occurrence frequencies of the 64 trinucleotides are computed in the three frames. By classifying each of the 64 trinucleotides in its preferential occurrence frame, i.e. the frame associated with its highest frequency, three subsets of trinucleotides can be identified in the three frames. This approach is applied in the two gene populations. Unexpectedly, the same three subsets of trinucleotides are identified in these two gene populations: T0 = X0 ∪ {AAA, TTT} with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} in frame 0, T1 = X1 ∪ {CCC} in frame 1 and T2 = X2 ∪ {GGG} in frame 2, each subset X0, X1 and X2 having 20 trinucleotides. Surprisingly, these three subsets have five important properties: (i) the property of maximal circular code for X0 (resp. X1, X2) allowing the automatic retrieval of frame 0 (resp. 1, 2) in any region of a protein gene model (formed by a series of trinucleotides of X0) without using a start codon; (ii) the DNA complementarity property C (e.g. C(AAC) = GTT): C(T0) = T0, C(T1) = T2 and C(T2) = T1 allowing the two paired reading frames of a DNA double helix simultaneously to code for amino acids; (iii) the circular permutation property P (e.g. P(AAC) = ACA): P(X0) = X1 and P(X1) = X2 implying that the two subsets X1 and X2 can be deduced from X0; (iv) the rarity property with an occurrence probability of X0 equal to 6 × 10^-8; and (v) the concatenation property with: a high frequency (27.5%) of misplaced trinucleotides in the shifted frames, a maximum (13 nucleotides) length of the minimal window to automatically retrieve the frame and an occurrence of the four types of nucleotides in the three trinucleotide sites, in favour of an evolutionary code. In the Discussion, the identified subsets T0, T1 and T2, expressed in the three two-letter genetic alphabets purine/pyrimidine, amino/keto and strong/weak interaction, allow us to deduce that the RNY model (R = purine = A or G, Y = pyrimidine = C or T, N = R or Y) (Eigen & Schuster, 1978) is the closest two-letter codon model to the trinucleotides of T0. Then, these three subsets are related to the genetic code. The trinucleotides of T0 code for 13 amino acids: Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Lys, Phe, Thr, Tyr and Val. Finally, a strong correlation between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins is observed, as six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in proteins of both prokaryotes and eukaryotes.
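
The classification method described in this abstract (compute the frequency of each trinucleotide in each frame, then assign it to the frame where its frequency is highest) can be sketched as follows. This is an illustrative reading of the method: the function names are hypothetical, ties are resolved in favour of the lowest frame (an assumption), and trinucleotides that never occur are left unclassified.

```python
from collections import Counter
from itertools import product


def frame_frequencies(genes):
    """Relative frequency of each observed trinucleotide in frames 0, 1 and 2."""
    counts = [Counter() for _ in range(3)]
    for gene in genes:
        for frame in range(3):
            for i in range(frame, len(gene) - 2, 3):
                counts[frame][gene[i:i + 3]] += 1
    totals = [sum(c.values()) for c in counts]
    return [{w: c[w] / total for w in c} for c, total in zip(counts, totals)]


def preferential_frames(genes):
    """Assign each trinucleotide to the frame in which its frequency is highest."""
    freqs = frame_frequencies(genes)
    subsets = [set(), set(), set()]
    for word in map("".join, product("ACGT", repeat=3)):
        per_frame = [f.get(word, 0.0) for f in freqs]
        best = max(per_frame)
        if best > 0:
            # list.index returns the first maximum, so ties go to the lowest frame
            subsets[per_frame.index(best)].add(word)
    return subsets


# Toy input: a single artificial "gene" made of one repeated codon.
print(preferential_frames(["GAA" * 10]))
# -> [{'GAA'}, {'AAG'}, {'AGA'}]
```

Applied to real gene populations, each sequence passed in `genes` would need to start at its ATG so that offset 0 coincides with the reading frame.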


Subject(s)
Eukaryotic Cells/physiology , Genetic Code , Models, Genetic , Prokaryotic Cells/physiology , Proteins/genetics , Amino Acids/genetics , Animals , Codon , DNA/genetics , Trinucleotide Repeats
7.
Int J Biol Macromol ; 19(2): 131-8, 1996 Aug.
Article in English | MEDLINE | ID: mdl-8842776

ABSTRACT

The collagens constitute an important population of proteins providing the structural support in vertebrate tissues. A collagen is mainly based on a series of tripeptides of the type GX1X2 (G = Glycine, X1 and X2 being any residues). The nine amino acids occurring with significant frequencies in the X1 and X2 residue sites and G form the reduced protein alphabet Q = [A,D,E,G,K,L,P,Q,R,S] (A = Alanine, D = Aspartic acid, E = Glutamic acid, K = Lysine, L = Leucine, P = Proline, Q = Glutamine, R = Arginine, S = Serine). Surprisingly, the method based on the autocorrelation function w(X)iw', analysing the probability that an amino acid w' in Q occurs any i residues X after an amino acid w in Q (called i-motif w(X)iw'), identifies six types of modulo 3 periodicities in collagens: three basic types 0, 1 and 2 modulo 3 and three combined types (0,1), (0,2) and (1,2) modulo 3. Furthermore, the classification of these 100 i-motifs according to the types of periodicities shows several strong relations between four subsets of Q: [G], [A,D,P,S], [E,L] and [K,Q,R]. Then, these relations allow the construction of a simple automaton for the generation of model collagen sequences. Indeed, this automaton can simulate the six types of periodicities and it retrieves the types of periodicities for almost all i-motifs. Finally, the autocorrelation function based on the subset [K,Q,R] identifies segments of 18 amino acids in collagens which may correspond to the exons (segments of genes of 54 nucleotides) coding for those collagens.


Subject(s)
Collagen/chemistry , Models, Chemical , Algorithms , Amino Acids/analysis , Amino Acids/chemistry , Computer Simulation , Peptides/chemistry , Probability
8.
J Theor Biol ; 175(4): 533-44, 1995 Aug 21.
Article in English | MEDLINE | ID: mdl-7475089

ABSTRACT

The mutation process is a classical evolutionary genetic process mainly based on the (random) substitutions of one base (A = Adenine, C = Cytosine, G = Guanine, T = Thymine) for another. Two analytical solutions derived here allow us to analyse in genes the occurrence probabilities of motifs (e.g. dinucleotides) after substitutions (in the evolutionary sense: from the past to the present) and, unexpectedly, also before substitutions (after back substitutions, in the inverse evolutionary sense: from the present to the past). We generalize to the alphabet [A, C, G, T] the analytical solutions and the properties derived on the alphabet [R, Y] (R = purine = A or G, Y = pyrimidine = C or T). Application of the theory is based on the analytical solution giving the probabilities of the 16 dinucleotides AA, ..., TT in the protein (coding) genes of (nuclear) eukaryotes, viruses and prokaryotes and in (eukaryotic) introns after back substitutions (called primitive genes). After back substitutions, four of the 16 dinucleotides (CG, TA, GT and AC) occur with low probabilities in each of these four primitive gene populations, except for CG in the primitive prokaryotic protein genes. In the primitive eukaryotic protein genes, the dinucleotide AT also has a significantly low probability. We present the properties of the two analytical solutions, and the functions that these five dinucleotides may have in primitive genes are described in terms of biological signals.


Subject(s)
Biological Evolution , Models, Genetic , Mutation , Sequence Analysis, DNA , Animals , Base Sequence , Probability
9.
J Theor Biol ; 172(3): 279-91, 1995 Feb 07.
Article in English | MEDLINE | ID: mdl-7715198

ABSTRACT

The distribution of nucleotides in protein coding genes is studied with autocorrelation functions. The autocorrelation function YRY(N)iYRY, analysing the occurrence probability of the i-motif YRY(N)iYRY (two motifs YRY separated by any i bases N, R = purine = Adenine or Guanine, Y = pyrimidine = Cytosine or Thymine, N = R or Y) in the protein coding genes of eukaryotes, prokaryotes and viruses, reveals the classical periodicity 0 modulo 3 associated with the normal frame 0 (maximal values of the function at i = 0, 3, 6, etc). The specification of YRY(N)iYRY on the alphabet [A, C, G, T] leads to 64 i-motifs: CAC(N)iCAC, CAC(N)iCAT, ..., TGT(N)iTGT. The 64 autocorrelation functions associated with these 64 i-motifs in protein coding genes all have a periodicity modulo 3, but, surprisingly, not always the expected periodicity 0 modulo 3. Two new types of periodicities are identified: a periodicity 1 modulo 3 associated with the shifted frame +1 (maximal values of the function at i = 1, 4, 7, etc) and a periodicity 2 modulo 3 associated with the shifted frame -1 (maximal values of the function at i = 2, 5, 8, etc). Furthermore, the classification of i-motifs according to the type of periodicity demonstrates a strong coherence relation between the 64 i-motifs, which is, in addition, common to the three gene populations, as the same i-motifs in the three gene populations have the same periodicities. The three periodicities 0, 1 and 2 modulo 3 can be simulated by an evolutionary model with two successive processes. The simulated genes are generated by a process of gene construction with a stochastic automaton, followed by a process of gene evolution with random insertions and deletions of trinucleotides simulating RNA editing. For almost all i-motifs, the autocorrelation functions in these simulated genes are strongly correlated with those in protein coding genes, for both the type and the probability level of periodicities. This paper describes the process of ribosomal frameshifting leading to the shifted periodicities, which may reveal overlapping genes or concatenated genes from different frames. It also presents the evolutionary aspects of the shifted periodicities. The shifted periodicities cannot be associated with the RNY model (Eigen & Schuster, 1978, Naturwissenschaften 65, 341-369) or the RRY model (Crick et al., 1976, Origins of Life 7, 389-397), but are compatible with the oligonucleotide mixing model (Arquès & Michel, 1990, Bull. math. Biol. 52, 741-772). Finally, a variant of the primitive translation model of Crick et al. (1976) is proposed to explain the shifted periodicities.
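
An autocorrelation function of the kind described here can be sketched in a few lines. The version below is an assumption about the estimator, not the authors' implementation: it projects the sequence onto the R/Y alphabet and, for each i, divides the number of start positions where YRY(N)iYRY occurs by the number of positions where the whole i-motif fits. The function names are hypothetical.

```python
_TO_RY = str.maketrans("AGCT", "RRYY")


def to_ry(seq):
    """Project a nucleotide sequence onto the purine/pyrimidine alphabet."""
    return seq.upper().translate(_TO_RY)


def yry_autocorrelation(seq, max_i):
    """Estimated occurrence probability of YRY(N)iYRY for i = 0 .. max_i."""
    ry = to_ry(seq)
    probs = []
    for i in range(max_i + 1):
        span = 6 + i                      # two YRY motifs plus i spacer bases
        positions = len(ry) - span + 1    # start positions where the i-motif fits
        hits = sum(
            ry[k:k + 3] == "YRY" and ry[k + 3 + i:k + span] == "YRY"
            for k in range(positions)
        )
        probs.append(hits / positions if positions > 0 else 0.0)
    return probs


# Toy sequence with a strict period of 3 on the R/Y alphabet (CAT -> YRY).
probs = yry_autocorrelation("CAT" * 20, 6)
print([round(p, 3) for p in probs])
```

On this artificial sequence the function is nonzero only at i = 0, 3 and 6, which is the periodicity 0 modulo 3 pattern; real genes give a noisier curve whose maxima have to be read against the background level.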


Subject(s)
Biological Evolution , Eukaryotic Cells/physiology , Genetic Code , Models, Genetic , Prokaryotic Cells/physiology , Viruses/genetics , Animals , Proteins/genetics
10.
Math Biosci ; 123(1): 103-25, 1994 Sep.
Article in English | MEDLINE | ID: mdl-7949744

ABSTRACT

The mutation process is a classical evolutionary genetic process. The type of mutations studied here is the random substitutions of a purine base R (adenine or guanine) by a pyrimidine base Y (cytosine or thymine) and reciprocally (transversions). The analytical expressions derived allow us to analyze in genes the occurrence probabilities of motifs and d-motifs (two motifs separated by any d bases) on the R/Y alphabet under transversions. These motif probabilities can be obtained after transversions (in the evolutionary sense; from the past to the present) and, unexpectedly, also before transversions (after back transversions, in the inverse evolutionary sense, from the present to the past). This theoretical part in Section 2 is a first generalization of a particular formula recently derived. The application in Section 3 is based on the analytical expression giving the autocorrelation function (the d-motif probabilities) before transversions. It allows us to study primitive genes from actual genes. This approach solves a biological problem. The protein coding genes of chloroplasts and mitochondria have a preferential occurrence of the 6-motif YRY(N)6YRY (maximum of the autocorrelation function for d = 6, N = R or Y) with a periodicity modulo 3. The YRY(N)6YRY preferential occurrence without the periodicity modulo 3 is also observed in the RNA coding genes (ribosomal, transfer, and small nuclear RNA genes) and in the noncoding genes (introns and 5' regions of eukaryotic nuclei). However, there are two exceptions to this YRY(N)6YRY rule: the protein coding genes of eukaryotic nuclei, and prokaryotes, where YRY(N)6YRY has the second highest value after YRY(N)0YRY (YRYYRY) with a periodicity modulo 3. When we go backward in time with the analytical expression, the protein coding genes of both eukaryotic nuclei and prokaryotes retrieve the YRY(N)6YRY preferential occurrence with a periodicity modulo 3 after 0.2 back transversions per base. 
In other words, the actual protein coding genes of chloroplasts and mitochondria are similar to the primitive protein coding genes of eukaryotic nuclei and prokaryotes. On the other hand, this application represents the first result concerning the mutation process in the model of DNA sequence evolution we recently proposed. According to this model, the actual genes on the R/Y alphabet derive from two successive evolutionary genetic processes: an independent mixing of a few nonrandom types of oligonucleotides leading to genes called primitive followed by a mutation process in these primitive genes. (ABSTRACT TRUNCATED AT 400 WORDS)
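
The idea of reading a probability "after" and "before" transversions can be illustrated on the simplest possible case, a single base on the R/Y alphabet. The sketch below uses a textbook symmetric two-state model, assumed here for illustration only; it is not the analytical expression derived in the paper, which concerns motifs and d-motifs. In this assumed model, t is the mean number of transversions per base and the probability of R relaxes towards 1/2 as exp(-2t). The function names are hypothetical.

```python
import math


def after_transversions(p_r0, t):
    """Probability of a purine after t transversions per base (past -> present)."""
    return 0.5 + (p_r0 - 0.5) * math.exp(-2.0 * t)


def before_transversions(p_rt, t):
    """Probability of a purine before t transversions per base (present -> past).

    Inverting the relaxation amplifies the deviation from 1/2, so the result
    leaves [0, 1] once t is too large; no primitive probability exists then.
    """
    p = 0.5 + (p_rt - 0.5) * math.exp(2.0 * t)
    if not 0.0 <= p <= 1.0:
        raise ValueError("no valid primitive probability for this many back transversions")
    return p


present = after_transversions(0.7, 0.2)
print(present)                             # closer to 0.5 than the initial 0.7
print(before_transversions(present, 0.2))  # recovers the initial value up to rounding
```

Even in this toy model the backward direction is only defined up to a maximal number of back transversions, which depends on how far the present probability is from 1/2.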


Subject(s)
Mutation , Base Sequence , Biological Evolution , DNA/genetics , Mathematics , Models, Genetic , Probability , Proteins/genetics , Purines , Pyrimidines
11.
Bull Math Biol ; 55(6): 1025-38, 1993 Nov.
Article in English | MEDLINE | ID: mdl-8281128

ABSTRACT

Recently, we proposed a new model of DNA sequence evolution (Arquès and Michel. 1990b. Bull. math. Biol. 52, 741-772) according to which actual genes on the purine/pyrimidine (R/Y) alphabet (R = purine = adenine or guanine, Y = pyrimidine = cytosine or thymine) are the result of two successive evolutionary genetic processes: (i) a mixing (independent) process of non-random oligonucleotides (words of base length less than 10: YRY(N)6, YRYRYR and YRYYRY are so far identified; N = R or Y) leading to primitive genes (words of several hundreds of base length) and followed by (ii) a random mutation process, i.e., transformations of a base R (respectively Y) into the base Y (respectively R) at random sites in these primitive genes. Following this model the problem investigated here is the study of the variation of the 8 R/Y codon probabilities RRR, ..., YYY under random mutations. Two analytical expressions solved here allow analysis of this variation in the classical evolutionary sense (from the past to the present, i.e., after random mutations), but also in the inverted evolutionary sense (from the present to the past, i.e., before random mutations). Different properties are also derived from these formulae. Finally, a few applications of these formulae are presented. They prove the proposition in Arquès and Michel (1990b. Bull. math. Biol. 52, 741-772), Section 3.3.2, with the existence of a maximal mean number of random mutations per base of the order 0.3 in the protein coding genes. They also confirm the mixing process of oligonucleotides by excluding the purine/pyrimidine contiguous and alternating tracts from the formation process of primitive genes.


Subject(s)
Codon , DNA/genetics , Mathematics , Models, Genetic , Mutation , Base Sequence , DNA/chemistry , Probability , Purines , Pyrimidines
12.
J Theor Biol ; 161(3): 329-42, 1993 Apr 07.
Article in English | MEDLINE | ID: mdl-8331957

ABSTRACT

The autocorrelation function analysing the occurrence probability of the i-motif YRY(N)iYRY in genes allows the identification of mainly two periodicities (modulo 2 and modulo 3) and the preferential occurrence of the motif YRY(N)6YRY (R = purine = adenine or guanine, Y = pyrimidine = cytosine or thymine, N = R or Y). These non-random genetic statistical properties can be simulated by an independent mixing of the three oligonucleotides YRYRYR, YRYYRY and YRY(N)6 (Arquès & Michel, 1990b). The problem investigated in this study is whether new properties can be identified in genes with other autocorrelation functions and also simulated with an oligonucleotide mixing model. The two autocorrelation functions analysing the occurrence probability of the i-motifs RRR(N)iRRR and YYY(N)iYYY simultaneously identify three new non-random genetic statistical properties: a short linear decrease, local maxima for i ≡ 3 [6] (i = 3, 9, etc) and a large exponential decrease. Furthermore, these properties are common to three different populations of eukaryotic non-coding genes: 5' regions, introns and 3' regions (see section 2). These three non-random properties can also be simulated by an independent mixing of the four oligonucleotides R8, Y8, RRRYRYRRR, YYYRYRYYY and large alternating R/Y series. The short linear decrease is a result of R8 and Y8; the local maxima for i ≡ 3 [6], of RRRYRYRRR and YYYRYRYYY; and the large exponential decrease, of large alternating R/Y series (section 3). The biological meaning of these results and their relation to the previous oligonucleotide mixing model are presented in the Discussion.


Subject(s)
Genes/genetics , Models, Genetic , Models, Statistical , Animals , Biological Evolution , Mutation/genetics , Oligonucleotides/genetics , Probability
13.
Biochimie ; 75(5): 399-407, 1993.
Article in English | MEDLINE | ID: mdl-8347726

ABSTRACT

The nucleotide distribution in protein coding genes, introns and transfer RNA genes of eukaryotic subpopulations (primates, rodents and mammals) is studied by autocorrelation functions. The autocorrelation function analysing the occurrence probability of the i-motif YRY(N)iYRY (YRY-function) in protein coding genes and transfer RNA genes of these three eukaryotic subpopulations retrieves the preferential occurrence of YRY(N)6YRY (R = purine = adenine or guanine, Y = pyrimidine = cytosine or thymine, N = R or Y). The autocorrelation functions analysing the occurrence probability of the i-motifs RRR(N)iRRR (RRR-function) and YYY(N)iYYY (YYY-function) identify new non-random genetic statistical properties in these three eukaryotic subpopulations, mainly: i) in their protein coding genes: local maxima for i ≡ 6 [12] (peaks for i = 6, 18, 30, 42) with the RRR-function and local maxima for i ≡ 8 [10] (peaks for i = 8, 18, 28) with the YYY-function; and ii) in their introns: local maxima for i ≡ 3 [6] (peaks for i = 3, 9, 15) and a short linear decrease followed by a large exponential decrease, both with the RRR- and YYY-functions. The non-random properties identified in eukaryotic intron subpopulations are modelled with a process of random insertions and deletions of nucleotides simulating RNA editing.


Subject(s)
Genes , Sequence Analysis, DNA , Animals , Base Sequence , Data Interpretation, Statistical , Introns , Mammals , Models, Genetic , Primates , Probability , Proteins/genetics , RNA Editing , RNA, Transfer/genetics , Rodentia
14.
J Theor Biol ; 156(1): 113-27, 1992 May 07.
Article in English | MEDLINE | ID: mdl-1379311

ABSTRACT

Recently, a new genetic process termed RNA editing has been identified showing insertions and deletions of nucleotides in particular RNA molecules. On the other hand, there are a few non-random statistical properties in genes: in particular, the periodicity modulo 3 (P3) associated with an open reading frame, the periodicity modulo 2 (P2) associated with alternating purine/pyrimidine stretches, the YRY(N)6YRY preferential occurrence (R = purine = adenine or guanine, Y = pyrimidine = cytosine or thymine, N = R or Y) representing a "code" of the DNA helix pitch, etc. The problem investigated here is whether a process of the RNA editing type can lead to the non-random statistical properties commonly observed in genes. This paper shows in particular that: (i) the process of insertions and deletions of mononucleotides in the initial sequence [YRY(N)3]* [series of YRY(N)3] can lead to the periodicity modulo 2 (P2); (ii) the process of insertions and deletions of trinucleotides in the initial sequence [YRY(N)6]* [series of YRY(N)6] can lead to the periodicity modulo 3 (P3) and the YRY(N)6YRY preferential occurrence. Furthermore, both processes correlate strongly with real sequences: the mononucleotide insertion/deletion process with the 5' eukaryotic regions, and the trinucleotide insertion/deletion process with the eukaryotic protein coding genes.


Subject(s)
Computer Simulation , Models, Genetic , Mutagenesis/genetics , RNA/genetics , Chromosome Deletion , Humans
15.
Comput Appl Biosci ; 8(1): 5-14, 1992 Feb.
Article in English | MEDLINE | ID: mdl-1568126

ABSTRACT

The software AGE (Analysis of Gene Evolution) has been developed both to study a genetic reality, i.e. the identification of statistical properties in genes (e.g. periodicities), and to simulate this observed genetic reality, by models of molecular evolution. AGE has two types of models: (i) models of sequence creation from oligonucleotides: concatenation model in series of an oligonucleotide, independent (or Markov) mixing model of oligonucleotides according to given probabilities (or a Markov matrix); (ii) models of sequence evolution from created sequences: insertion/deletion process of (mono,di,tri)nucleotides, base mutation process. The study of a reality and the development of simulation models are based on several new algorithms: approximated simulation and exact calculus to compute various autocorrelation functions, Fourier transformation of autocorrelation curves, recognition of a curve form, etc. AGE is implemented on IBM or compatible microcomputers and can be used by biologists without any computer knowledge to identify statistical properties in their newly determined DNA sequence and to explain them by models of molecular evolution.


Subject(s)
Biological Evolution , Genes , Software , Algorithms , Base Sequence , DNA/genetics , Models, Genetic , Molecular Biology , Oligonucleotides/genetics , Repetitive Sequences, Nucleic Acid
16.
J Theor Biol ; 143(3): 307-18, 1990 Apr 05.
Article in English | MEDLINE | ID: mdl-2385108

ABSTRACT

Gene population statistical studies of protein coding genes and introns have identified two types of periodicities on the purine/pyrimidine alphabet: (i) the modulo 3 periodicity or coding periodicity (periodicity P3) in protein coding genes of eukaryotes, prokaryotes, viruses, chloroplasts, mitochondria, plasmids and in introns of viruses and mitochondria, and (ii) the modulo 2 periodicity (periodicity P2) in the eukaryotic introns. The periodicity study is herein extended to the 5' and 3' regions of eukaryotes, prokaryotes and viruses and shows: (i) the periodicity P3 in the 5' and 3' regions of eukaryotes, prokaryotes and viruses, and (ii) the periodicity P2 in the 5' and 3' regions of eukaryotes. Therefore, these observations suggest a unitary and dynamic concept for the genes since, for a given genome, the 5' and 3' regions have the genetic information for protein coding genes and for introns: (1) In the eukaryotic genome, the 5' (P2 and P3) and 3' (P2 and P3) regions have the information for protein coding genes (P3) and for introns (P2). The intensity of P3 is high in 5' regions and weak in 3' regions, while the intensity of P2 is weak in 5' regions and high in 3' regions. (2) In the prokaryotic genome, the 5' (P3) and 3' (P3) regions have the information for protein coding genes (P3). (3) In the viral genome, the 5' (P3) and 3' (P3) regions have the information for protein coding genes (P3) and for introns (P3). The absence of P2 in viral introns (in contrast to eukaryotic introns) may be related to the absence of P2 in 5' and 3' regions of viruses.


Subject(s)
Genes , Repetitive Sequences, Nucleic Acid , Animals , Base Sequence , Eukaryotic Cells , Introns , Prokaryotic Cells , Purine Nucleotides , Pyrimidine Nucleotides , Viruses/genetics
17.
Bull Math Biol ; 52(6): 741-72, 1990.
Article in English | MEDLINE | ID: mdl-2279193

ABSTRACT

Statistical studies of gene populations on the purine/pyrimidine alphabet have shown that the mean occurrence probability of the i-motif YRY(N)iYRY (R = purine, Y = pyrimidine, N = R or Y) is not uniform when i varies in the range, but presents a maximum at i = 6 in the following populations: protein coding genes of eukaryotes, prokaryotes, chloroplasts and mitochondria, and also viral introns, ribosomal RNA genes and transfer RNA genes (Arquès and Michel, 1987b, J. theor. Biol. 128, 457-461). From the "universality" of this observation, we suggested that the oligonucleotide YRY(N)6 is a primitive one and that it has a central function in DNA sequence evolution (Arquès and Michel, 1987b, J. theor. Biol. 128, 457-461). Following this idea, we introduce a concept of a model of DNA sequence evolution which will be validated according to a schema presented in three parts. In the first part, using the latest version of the gene database, the YRY(N)6YRY preferential occurrence (maximum at i = 6) is confirmed for the populations mentioned above and is extended to some newly analysed populations: chloroplast introns, chloroplast 5' regions, mitochondrial 5' regions and small nuclear RNA genes. In addition, the YRY(N)6YRY preferential occurrence and periodicities are used in order to classify 18 gene populations. In the second part, we will demonstrate that several statistical features characterizing different gene populations (in particular the YRY(N)6YRY preferential occurrence and the periodicities) can be retrieved from a simple Markov model based on the mixing of the two oligonucleotides YRY(N)6 and YRY(N)3 and based on the percentages of RYR and YRY in the unspecified trinucleotides (N)3 of YRY(N)6 and YRY(N)3. Several properties are identified and prove in particular that the oligonucleotide mixing is an independent process and that several different features are functions of a unique parameter. In the third part, the return of the model to the reality shows a strong correlation between reality and simulation concerning the presence of large alternating purine/pyrimidine stretches and of periodicities. It also contributes to a greater understanding of biological reality, e.g. the presence or the absence of large alternating purine/pyrimidine stretches can be explained as being a simple consequence of the mixing of two particular oligonucleotides. Finally, we believe that such an approach is the first step toward a unified model of DNA sequence evolution allowing the molecular understanding of both the origin of life and the present-day biological reality.
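
As a rough illustration of the i-motif statistic discussed in this abstract, the following Python sketch maps a DNA sequence onto the purine/pyrimidine alphabet and measures how often YRY(N)iYRY occurs for a given i. The function names (`to_ry`, `imotif_frequency`) are hypothetical, and the per-sequence frequency used here is an assumed simplification: the abstract refers to a mean occurrence probability over gene populations, whose exact definition is not given on this page.

```python
def to_ry(seq):
    """Map a DNA sequence to the purine/pyrimidine alphabet (A, G -> R; C, T -> Y)."""
    table = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
    return "".join(table[base] for base in seq.upper())

def imotif_frequency(seq, i):
    """Fraction of start positions where YRY(N)iYRY occurs in `seq`.

    Assumption: the frequency is taken over all positions at which the
    full motif of length 6 + i fits in the sequence.
    """
    ry = to_ry(seq)
    positions = len(ry) - (6 + i) + 1
    if positions <= 0:
        return 0.0
    hits = sum(
        1
        for p in range(positions)
        if ry[p:p + 3] == "YRY" and ry[p + 3 + i:p + 6 + i] == "YRY"
    )
    return hits / positions

# Profile of the statistic for i = 0..9 on a toy sequence.
toy = "CACAAAAAACACGGTTCATGGAGCTCAC"
profile = [imotif_frequency(toy, i) for i in range(10)]
```

Averaging such profiles over all sequences of a population, and then looking for the value of i at which the curve peaks, would be one way to approach the kind of comparison the abstract describes.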


Subject(s)
DNA/genetics , Models, Genetic , Base Sequence , Biological Evolution , Genetics, Population
18.
J Theor Biol ; 128(4): 457-61, 1987 Oct 21.
Article in English | MEDLINE | ID: mdl-3446957

ABSTRACT

A statistical parameter identifies, with a high degree of significance, a motif which is present in protein-coding sequences of eukaryotes, prokaryotes, chloroplasts, mitochondria, viral introns, ribosomal RNA genes, and transfer RNA genes. The random probability of occurrence of such a situation is 10^-12. This motif has the following properties: (i) its significant presence in almost all present-day genes explains why it can be considered as a primitive oligonucleotide, (ii) its nucleotide order is YRY(N)6YRY, R being a purine base, Y a pyrimidine one and N any base, (iii) its length and its terminal trinucleotides YRY suggest a primordial function related to the spatial structure of the DNA sequences. This motif is found in some viral protein-coding genes, but not in eukaryotic introns.


Subject(s)
Genes , Purines , Pyrimidines , Base Sequence , DNA/classification , Phylogeny , Probability
19.
Nucleic Acids Res ; 15(18): 7581-92, 1987 Sep 25.
Article in English | MEDLINE | ID: mdl-3658704

ABSTRACT

The sequence information for the splicing process of introns is found in the consensus sequences at the two splice sites. For long introns, of 300 or more nucleotides, the middle regions may provide additional specificity for splicing, which can be investigated by defining an adequate quantitative parameter. This methodology makes it possible to retrieve the coding periodicity in the viral and mitochondrial introns and to identify, with statistical significance, a surprising alternating purine-pyrimidine base sequence (i.e. a modulo 2 periodicity) in the eukaryotic introns, and particularly in the vertebrate introns. This alternating structure suggests that the vertebrate introns do not have the genetic information to code for proteins, but rather carry structural and regulatory functions.


Subject(s)
Introns , RNA Splicing , Models, Genetic
20.
Gene ; 44(1): 147-50, 1986.
Article in English | MEDLINE | ID: mdl-2945762

ABSTRACT

We have found that the amino acid (aa) sequence of the tip of phage T4 tail fibre (gene 37) shows more than 50% homology with the aa sequence predicted from an open reading frame (ORF314) in the phage lambda genome. ORF314 is near the 3' end of the late morphogenetic operon, beyond gene J coding for the lambda tail fibre. The homologous sequences are for the most part composed of repeated aa, the most remarkable of which is a Gly-X-His-Y-His motif where X and Y are small, uncharged aa, found six times in the T4 protein and seven times in the lambda ORF314 sequence.


Subject(s)
Bacteriophage lambda/genetics , Escherichia coli/genetics , Genes, Viral , Genes , Operon , T-Phages/genetics , Viral Proteins/genetics , Amino Acid Sequence , Repetitive Sequences, Nucleic Acid , Sequence Homology, Nucleic Acid