Results 1 - 16 of 16
1.
Gene ; 543(1): 45-52, 2014 Jun 10.
Article in English | MEDLINE | ID: mdl-24709107

ABSTRACT

The standard classification scheme of the genetic code is organized for alphabetic ordering of nucleotides. Here we introduce the new, "ideal" classification scheme in compact form, for the first time generated by codon sextets encoding Ser, Arg and Leu amino acids. The new scheme creates the known purine/pyrimidine, codon-anticodon, and amino/keto type symmetries and a novel A+U rich/C+G rich symmetry. This scheme is built from "leading" and "nonleading" groups of 32 codons each. In the ensuing 4 × 16 scheme, based on trinucleotide quadruplets, Ser has a central role as initial generator. Six codons encoding Ser and six encoding Arg extend continuously along a linear array in the "leading" group, and together with four of six Leu codons uniquely define construction of the "leading" group. The remaining two Leu codons enable construction of the "nonleading" group. The "ideal" genetic code suggests the evolution of genetic code with serine as an initiator.


Subject(s)
Amino Acids/genetics , Codon/chemistry , Codon/classification , Genetic Code/genetics , Nucleic Acid Conformation , Amino Acid Sequence/physiology , Base Sequence/physiology , Codon/genetics , Evolution, Molecular , Models, Genetic , Serine/genetics
2.
Gene ; 531(2): 184-90, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24042127

ABSTRACT

The origin and logic of the genetic code are two of the greatest mysteries of the life sciences. Analyzing DNA sequences, we showed that the start/stop trinucleotides have broader importance than just marking the start and stop of exons in coding DNA. On this basis, we here introduced a new classification of trinucleotides and showed that all A+T rich trinucleotides consisting of three different nucleotides arise from start-ATG, stop-TGA and stop-TAG using their complement, reverse complement and reverse transformations. Due to the same transformations during generations of crossing-over they can switch from one form to the other. By a direct process the start-ATG and stop-TAG can irreversibly transform into stop-TAA. By transformation into A+T rich trinucleotides and 16/32 C+G rich trinucleotides they can lose the start/stop function and take the role of a sense codon in a reversible way. The remaining 16 C+G rich trinucleotides cannot directly transform into start/stop trinucleotides and thus remain a firm skeleton for structuring the C+G rich DNA. We showed that start/stops strongly enrich the A+T rich noncoding DNA through frequently extended forms. From the evolutionary viewpoint the start/stops are chief creators of the prevailing A+T rich noncoding DNA, and of the more stable coding DNA. We propose that start/stops have a basic role as "seeds" in the trinucleotide evolution of noncoding and coding sequences and lead to asymmetry between A+T and C+G rich DNA. Through dynamical transformations during evolution they enabled pronounced phylogenetic broadness while keeping the regulator function.
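The transformation claim in this abstract can be checked with a short sketch. The function names below (`complement`, `reverse`, `reverse_complement`, `orbit`) are illustrative and not taken from the paper. The sketch closes the three start/stop seeds under the three transformations and compares the result with the set of A+T rich trinucleotides made of three different nucleotides.

```python
from itertools import product

# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(tri):
    """Replace every base by its complement, keeping the order."""
    return "".join(COMPLEMENT[b] for b in tri)

def reverse(tri):
    """Read the trinucleotide backwards."""
    return tri[::-1]

def reverse_complement(tri):
    """Complement every base and read the result backwards."""
    return complement(tri)[::-1]

def orbit(seeds):
    """Close a set of trinucleotides under the three transformations."""
    seen = set(seeds)
    frontier = list(seeds)
    while frontier:
        tri = frontier.pop()
        for transform in (complement, reverse, reverse_complement):
            image = transform(tri)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen

def is_at_rich_all_different(tri):
    """Three distinct bases, two of which are A or T."""
    return len(set(tri)) == 3 and sum(b in "AT" for b in tri) == 2

# Seeds named in the abstract: start-ATG, stop-TGA, stop-TAG.
generated = orbit({"ATG", "TGA", "TAG"})
target = {"".join(t) for t in product("ACGT", repeat=3)
          if is_at_rich_all_different("".join(t))}
print(sorted(generated))
print(generated == target)
```

Each seed yields four trinucleotides, so the three seeds together cover twelve, which is the full set of A+T rich trinucleotides with three different nucleotides.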


Subject(s)
Codon, Initiator/genetics , Codon, Terminator/genetics , Codon/classification , Genetic Code/physiology , Regulatory Sequences, Nucleic Acid/physiology , Base Sequence/physiology , Codon/genetics , DNA/genetics , Evolution, Molecular , Fossils , Humans , Molecular Sequence Data , Saccharomyces cerevisiae/genetics
3.
Article in English | MEDLINE | ID: mdl-19179707

ABSTRACT

A novel approach for gene classification, which adopts codon usage bias as the input feature vector for classification by support vector machines (SVM), is proposed. The DNA sequence is first converted to a 59-dimensional feature vector where each element corresponds to the relative synonymous usage frequency of a codon. As the input to the classifier is independent of sequence length and variance, our approach is useful when the sequences to be classified are of different lengths, a condition under which homology-based methods tend to fail. The method is demonstrated by using 1,841 Human Leukocyte Antigen (HLA) sequences which are classified into two major classes: HLA-I and HLA-II; each major class is further subdivided into sub-groups of HLA-I and HLA-II molecules. Using codon usage frequencies, binary SVM achieved an accuracy rate of 99.3% for HLA major class classification and multi-class SVM achieved accuracy rates of 99.73% and 98.38% for sub-class classification of HLA-I and HLA-II molecules, respectively. The results show that gene classification based on codon usage bias is consistent with the molecular structures and biological functions of HLA molecules.
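A minimal sketch of how such a 59-dimensional vector might be built, assuming the common definition of relative synonymous codon usage (observed count divided by the mean count over the synonymous family) and the standard genetic code. The 59 codons are the 61 sense codons minus the single-codon families Met and Trp. The name `rscu_vector` and the choice to return 0.0 for amino acids absent from the sequence are assumptions, not details from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons into synonymous families keyed by amino acid.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    FAMILIES.setdefault(aa, []).append(codon)

# Drop stops and single-codon families (Met, Trp): 59 codons remain.
FEATURE_CODONS = sorted(c for aa, fam in FAMILIES.items()
                        if aa != "*" and len(fam) > 1 for c in fam)

def rscu_vector(seq):
    """Return RSCU values for the 59 feature codons of an in-frame sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    vector = []
    for codon in FEATURE_CODONS:
        family = FAMILIES[CODON_TABLE[codon]]
        total = sum(counts[c] for c in family)
        # Assumption: an amino acid that never occurs contributes 0.0.
        vector.append(counts[codon] * len(family) / total if total else 0.0)
    return vector

features = rscu_vector("GCTGCTGCC")  # three alanine codons
print(len(features))
```

Because each value is normalised within its synonymous family, the vector has the same length for any input sequence, which is what makes it usable as a fixed-size classifier input.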


Subject(s)
Artificial Intelligence , Codon/classification , Genes, MHC Class II , Genes, MHC Class I , Genes , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Algorithms , Codon/genetics , Databases, Genetic , Discriminant Analysis , Genetic Code , HLA Antigens/classification , HLA Antigens/genetics , Humans , Major Histocompatibility Complex/genetics , Normal Distribution , Reproducibility of Results
4.
Genomics ; 89(5): 596-601, 2007 May.
Article in English | MEDLINE | ID: mdl-17234378

ABSTRACT

Based on the huge variety of different genomes, one may expect a correspondingly large variety of the frequency distribution of their trinucleotides ("triplet profiles"). Yet, this article reports the unexpected finding that there are essentially only three kinds of triplet profiles among the large number of genomes examined here. None of the classes included random profiles, and all of them contained members from vastly different taxa and species. Since the three classes of genomes do not reflect the phylogeny of their member organisms, I propose that these classes may reflect species-independent mechanisms of genome evolution.


Subject(s)
Classification/methods , Codon/classification , Genome , Genomics/classification , Phylogeny , Animals , Evolution, Molecular , Mitochondria/genetics , Organelles/genetics , Plants/genetics
5.
Mol Plant Microbe Interact ; 19(12): 1322-8, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17153916

ABSTRACT

In all, 238 and 155 transfer (t)RNA genes were predicted from the genomes of Phytophthora sojae and P. ramorum, respectively. After omitting pseudogenes and undetermined types of tRNA genes, there remained 208 P. sojae tRNA genes and 140 P. ramorum tRNA genes. There were 45 types of tRNA genes, with distinct anticodons, in each species. Fourteen common anticodon types of tRNAs are missing altogether from the genome in the two species; however, these appear to be compensated by wobbling of other tRNA anticodons in a manner which is tied to the codon bias in Phytophthora genes. The most abundant tRNA class was arginine in both P. sojae and P. ramorum. A codon usage table was generated for these two organisms from a total of 9,803,525 codons in P. sojae and 7,496,598 codons in P. ramorum. The most abundant codon type detected from the codon usage tables was GAG (encoding glutamic acid), whereas the most numerous tRNA gene had a methionine anticodon (CAT). The correlation between the frequencies of tRNA genes and the codon frequencies in protein-coding genes was very low (0.12 in P. sojae and 0.19 in P. ramorum); however, the correlation between amino acid tRNA gene frequency and the corresponding amino acid codon frequency in P. sojae and P. ramorum was substantially higher (0.53 in P. sojae and 0.77 in P. ramorum). The codon usage frequencies of P. sojae and P. ramorum were very strongly correlated (0.99), as were tRNA gene frequencies (0.77). Approximately 60% of orthologous tRNA gene pairs in P. sojae and P. ramorum are located in regions that have conserved synteny in the two species.


Subject(s)
Codon/classification , Genome , Phytophthora/genetics , RNA, Transfer/genetics , Anticodon/classification , Codon/physiology , Gene Dosage , RNA, Transfer/classification , Synteny
6.
Orig Life Evol Biosph ; 35(3): 275-95, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16228642

ABSTRACT

To explore how chemical structures of both nucleobases and amino acids may have played a role in shaping the genetic code, numbers of sp2 hybrid nitrogen atoms in nucleobases were taken as a determinative measure for empirical stereo-electronic property to analyze the genetic code. Results revealed that amino acid hydropathy correlates strongly with the sp2 nitrogen atom numbers in nucleobases rather than with the overall electronic property such as redox potentials of the bases, reflecting that stereo-electronic property of bases may play a role. In the rearranged code, five simple but stereo-structurally distinctive amino acids (Gly, Pro, Val, Thr and Ala) and their codon quartets form a crossed intersection "core". Secondly, a re-categorization of the amino acids according to their beta-carbon stereochemistry, verified by charge density (at beta-carbon) calculation, results in five groups of stereo-structurally distinctive amino acids, the group leaders of which are Gly, Pro, Val, Thr and Ala, remarkably overlapping the above "core". These two lines of independent observations provide empirical arguments for a contention that a seemingly "frozen" "core" could have formed at a certain evolutionary stage. The possible existence of this codon "core" is in conformity with a previous evolutionary model whereby stereochemical interactions may have shaped the code. Moreover, the genetic code listed in UCGA succession together with this codon "core" has recently facilitated an identification of the unprecedented icosikaioctagon symmetry and bi-pyramidal nature of the genetic code.


Subject(s)
Amino Acids/chemistry , Codon/chemistry , Evolution, Molecular , Amino Acids/classification , Codon/classification
7.
J Mol Evol ; 59(5): 598-605, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15693616

ABSTRACT

Since the early days of the discovery of the genetic code nonrandom patterns have been searched for in the code in the hope of providing information about its origin and early evolution. Here we present a new classification scheme of the genetic code that is based on a binary representation of the purines and pyrimidines. This scheme reveals known patterns more clearly than the common one, for instance, the classification of strong, mixed, and weak codons as well as the ordering of codon families. Furthermore, new patterns have been found that have not been described before: Nearly all quantitative amino acid properties, such as Woese's polarity and the specific volume, show a perfect correlation to Lagerkvist's codon-anticodon binding strength. Our new scheme leads to new ideas about the evolution of the genetic code. It is hypothesized that it started with a binary doublet code and developed via a quaternary doublet code into the contemporary triplet code. Furthermore, arguments are presented against suggestions that a "simpler" code, where only the midbase was informational, was at the origin of the genetic code.


Subject(s)
Codon/classification , Codon/genetics , Genetic Code/genetics , Amino Acids/genetics , Anticodon/genetics , Base Sequence , Evolution, Molecular , Models, Genetic , Nucleotides/genetics
8.
Biochem Biophys Res Commun ; 312(2): 285-91, 2003 Dec 12.
Article in English | MEDLINE | ID: mdl-14637134

ABSTRACT

In species having a strong correlation of expressivity and codon bias it has been shown that heterologous expression can be optimized by changing codons of the introduced gene towards the set of codons that the host organism naturally uses in its highly expressed genes. Even though two lactic acid bacteria are fully sequenced, there are no reports of attempts at codon optimization in the literature. In this report it is demonstrated that codons used in highly expressed genes tend to differ from the codons in lowly expressed genes, and that there is a strong correlation of codon bias and empirical expressivity (codon adaptation index) in Lactococcus lactis and Lactobacillus plantarum. This strongly suggests that codon optimization strategies could be applied to expression systems with lactic acid bacteria as producer strains. A good example of a candidate for codon optimization is the mouse interleukin-2 gene, which in its natural form has an extremely low codon adaptation index for expression in Lc. lactis.
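The codon adaptation index mentioned here is usually computed as the geometric mean of per-codon weights, where each weight is the codon's count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. The sketch below follows that common definition. The reference sequence is a toy placeholder rather than real Lactococcus data, the function names are hypothetical, and skipping codons that are absent from the reference is a simplifying assumption (many implementations use a pseudo-count instead).

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codons(seq):
    """Split an in-frame sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def relative_adaptiveness(reference_seqs):
    """Weight of each codon relative to the most used synonymous codon."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "*MW":  # stops and single-codon families carry no information
            continue
        best = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if best and counts[codon]:
            weights[codon] = counts[codon] / best
    return weights

def cai(seq, weights):
    """Geometric mean of the weights of the codons in seq."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

# Hypothetical reference set: GCT is used three times, GCC once.
toy_weights = relative_adaptiveness(["GCTGCTGCTGCC"])
print(cai("GCTGCT", toy_weights), cai("GCTGCC", toy_weights))
```

A gene that uses only the preferred codons of the reference set scores 1.0, and every rarer synonymous choice pulls the score down.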


Subject(s)
Codon/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Bacterial/genetics , Lactobacillus/genetics , Lactococcus lactis/genetics , Models, Genetic , Protein Engineering/methods , Recombinant Proteins/genetics , Algorithms , Animals , Codon/classification , Computer Simulation , Genetic Variation , Interleukin-2/genetics , Interleukin-2/metabolism , Lactobacillus/metabolism , Lactococcus lactis/metabolism , Mice , Quality Control , Recombinant Proteins/metabolism , Species Specificity
9.
Bioinformatics ; 19(8): 987-98, 2003 May 22.
Article in English | MEDLINE | ID: mdl-12761062

ABSTRACT

MOTIVATION: The effect of two neighboring codons (codon pairs) on gene expression is mediated via the interaction of their cognate tRNAs occupying the two functional ribosomal sites during the translation elongation step. For steric reasons it is reasonable to assume that not all combinations of codons and therefore of tRNAs are equally favorable when situated on the ribosome surface. Aiming to identify preferential and rare codon pairs, we have determined the frequency of occurrence of all possible combinations of codon pairs in the entire genome of Escherichia coli (E. coli). RESULTS: The frequency of occurrence of the 3904 codon pairs comprising both sense:sense and sense:stop codon pairs in the full set of 4289 E. coli ORFs was found to vary from zero to 4913 times. For most of the pairs we have observed a significant difference between the real and statistically predicted frequency of occurrence. The analysis of 334 highly expressed and 303 poorly expressed E. coli genes showed that codon pair usage is different for the two gene categories. Using a specially defined criterion (Delta(REG)), the codon pairs are classified as 'hypothetically attenuating' (HAP) and 'hypothetically non-attenuating' (HNAP) and their possible effect on translation is discussed. AVAILABILITY: The program used in this study is available at http://www.bio21.bas.bg/codonpairs/
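The figure of 3904 pairs follows from 61 sense codons in the first position and any of the 64 codons in the second (61 × 61 sense:sense plus 61 × 3 sense:stop). The sketch below counts adjacent codon pairs and compares them with an expected count. It assumes the expected count comes from independent single-codon frequencies, which may differ from the statistical prediction used in the paper. The function name `codon_pair_stats` is hypothetical, and this is not the authors' program.

```python
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
SENSE = [c for c in ALL_CODONS if c not in STOPS]

# A sense codon followed by any codon: 61 * 64 = 3904 pairs.
PAIRS = [(first, second) for first in SENSE for second in ALL_CODONS]

def codon_pair_stats(orfs):
    """Map each pair to (observed count, expected count under independence)."""
    single = Counter()
    pairs = Counter()
    for orf in orfs:
        cs = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
        single.update(cs)
        pairs.update(zip(cs, cs[1:]))
    n_codons = sum(single.values())
    n_pairs = sum(pairs.values())
    stats = {}
    for first, second in PAIRS:
        # Assumption: expected = number of pairs * product of codon frequencies.
        expected = (n_pairs * (single[first] / n_codons)
                    * (single[second] / n_codons)) if n_codons else 0.0
        stats[(first, second)] = (pairs[(first, second)], expected)
    return stats

# Toy ORF with codons ATG, AAA, AAA, TAA.
toy_stats = codon_pair_stats(["ATGAAAAAATAA"])
print(len(PAIRS), toy_stats[("AAA", "AAA")])
```

Pairs whose observed count departs strongly from the expected one are the candidates for preferential or rare pairs.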


Subject(s)
Codon/classification , Codon/genetics , Escherichia coli/genetics , Genome, Bacterial , Protein Biosynthesis/genetics , Sequence Analysis, DNA/methods , Algorithms , Databases, Nucleic Acid , Gene Expression Regulation, Bacterial/genetics , Gene Frequency/genetics
10.
Phys Rev E Stat Nonlin Soft Matter Phys ; 65(2 Pt 1): 021912, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11863568

ABSTRACT

Group theoretical concepts are invoked in a specific model to explain how only twenty amino acids occur in nature out of a possible sixty-four. The methods we use enable us to justify the occurrence of the recently discovered 21st amino acid selenocysteine, and also enable us to predict the possible existence of two more, as yet undiscovered, amino acids.


Subject(s)
Amino Acids/genetics , Codon/genetics , Models, Genetic , Biophysical Phenomena , Biophysics , Codon/classification , Protein Biosynthesis , Proteins/genetics
11.
J Mol Biol ; 285(5): 1977-91, 1999 Feb 05.
Article in English | MEDLINE | ID: mdl-9925779

ABSTRACT

While genomic sequences are accumulating, finding the location of the genes remains a major issue that can be solved only for about a half of them by homology searches. Prediction methods are thus required, but unfortunately are not fully satisfying. Most prediction methods implicitly assume a unique model for genes. This is an oversimplification as demonstrated by the possibility to group coding sequences into several classes in Escherichia coli and other genomes. As no classification existed for Arabidopsis thaliana, we classified genes according to the statistical features of their coding sequences. A clustering algorithm using a codon usage model was developed and applied to coding sequences from A. thaliana, E. coli, and a mixture of both. By using it, Arabidopsis sequences were clustered into two classes. The CU1 and CU2 classes differed essentially by the choice of pyrimidine bases at the codon silent sites: CU2 genes often use C whereas CU1 genes prefer T. This classification discriminated the Arabidopsis genes according to their expressiveness, highly expressed genes being clustered in CU2 and genes expected to have a lower expression, such as the regulatory genes, in CU1. The algorithm separated the sequences of the Escherichia-Arabidopsis mixed data set into five classes according to the species, except for one class. This mixed class contained 89 % Arabidopsis genes from CU1 and 11 % E. coli genes, mostly horizontally transferred. Interestingly, most genes encoding organelle-targeted proteins, except the photosynthetic and photoassimilatory ones, were clustered in CU1. By tailoring the GeneMark CDS prediction algorithm to the observed coding sequence classes, its quality of prediction was greatly improved. Similar improvement can be expected with other prediction systems.
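The C versus T preference that separates CU1 from CU2 can be illustrated with a simple statistic. In the standard genetic code every NNC/NNT codon pair is synonymous, so the choice between C and T in the third position never changes the amino acid. The sketch below only measures that choice for one coding sequence. It is not the clustering algorithm described in the abstract, and the function name is hypothetical.

```python
def c_fraction_at_pyrimidine_third_sites(cds):
    """Fraction of C among codons whose third base is a pyrimidine (C or T)."""
    cds = cds.upper().replace("U", "T")
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    c_count = thirds.count("C")
    t_count = thirds.count("T")
    total = c_count + t_count
    return c_count / total if total else float("nan")

# Toy sequence with codons GCC, GCT, GCC, AAA: third bases C, T, C, A.
print(c_fraction_at_pyrimidine_third_sites("GCCGCTGCCAAA"))
```

Under the abstract's description, CU2-like genes would be expected to give higher values than CU1-like genes, although no threshold is stated in the source.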


Subject(s)
Arabidopsis/genetics , Codon/classification , Genes, Plant , Models, Genetic , Algorithms , Arabidopsis/classification , Cell Nucleus/genetics , Classification/methods , Exons , Gene Expression , Organelles/genetics
12.
Cladistics ; 12: 65-82, 1996.
Article in English | MEDLINE | ID: mdl-11541749

ABSTRACT

Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that the combination of nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. 
Second, one character set may display no information concerning a phylogenetic hypothesis while the other character set may impart information to a hypothesis. These two possibilities are cases of non-independence; however, we argue that congruence in such cases can be thought of as increasing the weight of the particular phylogenetic hypothesis that is supported by those characters. In the third case, the two sources of character information for a particular codon may be entirely incongruent with respect to phylogenetic hypotheses concerning the taxa examined. In this last case the two character sets are independent in that information from neither can predict the character states of the other. Examples of these possibilities are discussed and the general applicability of combining these two sources of information for protein coding genes is presented using sequences from the homeobox region of 46 homeobox genes from Drosophila melanogaster to develop a hypothesis of genealogical relationship of these genes in this large multigene family.


Subject(s)
Drosophila melanogaster/genetics , Genes, Homeobox/genetics , Genes, Insect , Phylogeny , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid , Algorithms , Amino Acid Sequence/genetics , Animals , Base Sequence/genetics , Codon/classification , Codon/genetics , Drosophila melanogaster/classification , Evolution, Molecular , Models, Genetic
14.
Microbiol Rev ; 56(1): 229-64, 1992 Mar.
Article in English | MEDLINE | ID: mdl-1579111

ABSTRACT

The genetic code, formerly thought to be frozen, is now known to be in a state of evolution. This was first shown in 1979 by Barrell et al. (G. Barrell, A. T. Bankier, and J. Drouin, Nature [London] 282:189-194, 1979), who found that the universal codons AUA (isoleucine) and UGA (stop) coded for methionine and tryptophan, respectively, in human mitochondria. Subsequent studies have shown that UGA codes for tryptophan in Mycoplasma spp. and in all nonplant mitochondria that have been examined. Universal stop codons UAA and UAG code for glutamine in ciliated protozoa (except Euplotes octacarinatus) and in a green alga, Acetabularia. E. octacarinatus uses UAA for stop and UGA for cysteine. Candida species, which are yeasts, use CUG (leucine) for serine. Other departures from the universal code, all in nonplant mitochondria, are CUN (leucine) for threonine (in yeasts), AAA (lysine) for asparagine (in platyhelminths and echinoderms), UAA (stop) for tyrosine (in planaria), and AGR (arginine) for serine (in several animal orders) and for stop (in vertebrates). We propose that the changes are typically preceded by loss of a codon from all coding sequences in an organism or organelle, often as a result of directional mutation pressure, accompanied by loss of the tRNA that translates the codon. The codon reappears later by conversion of another codon and emergence of a tRNA that translates the reappeared codon with a different assignment. Changes in release factors also contribute to these revised assignments. We also discuss the use of UGA (stop) as a selenocysteine codon and the early history of the code.
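The reassignments listed in this abstract can be expressed as overrides on the standard codon table. The sketch below builds a vertebrate mitochondrial table from the changes the abstract names (AUA to methionine, UGA to tryptophan, AGR to stop) and translates the same RNA under both tables. The table and function names are illustrative.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa
            for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Overrides taken from the abstract: AUA -> Met, UGA -> Trp, AGR -> stop.
VERTEBRATE_MITO = {**STANDARD, "AUA": "M", "UGA": "W", "AGA": "*", "AGG": "*"}

def translate(rna, table=STANDARD):
    """Translate an in-frame RNA (or DNA) sequence until the first stop codon."""
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

message = "AUGAUAUGAGGC"  # codons AUG, AUA, UGA, GGC
print(translate(message), translate(message, VERTEBRATE_MITO))
```

Under the standard table the message stops at UGA after two residues, whereas the mitochondrial table reads through it, which is why the correct table matters when annotating organelle genes.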


Subject(s)
Biological Evolution , Codon/classification , Genetic Code , Base Sequence , Cell Nucleus/chemistry , Mitochondria/chemistry , Molecular Sequence Data , Nucleic Acid Conformation
15.
Physiologie ; 23(3): 209-12, 1986.
Article in English | MEDLINE | ID: mdl-3095864

ABSTRACT

According to a criterion of symmetry-asymmetry, the triplets of the genetic code can be divided into four classes. In the genes of viruses and human mitochondria, the frequency with which a codon is followed by a codon of the same class is higher than that theoretically estimated. This is a consequence of the fact that in an initial stage of evolution many codons were duplicated.


Subject(s)
Codon/classification , RNA, Messenger/classification , Base Sequence , Genes, Viral , Genetic Code , Humans , Mitochondria/analysis