1.
J Bacteriol ; 183(17): 5025-40, 2001 Sep.
Article in English | MEDLINE | ID: mdl-11489855

ABSTRACT

Predicted highly expressed (PHX) genes are characterized for the completely sequenced genomes of the four fast-growing bacteria Escherichia coli, Haemophilus influenzae, Vibrio cholerae, and Bacillus subtilis. Our approach to ascertaining gene expression levels relates to codon usage differences among certain gene classes: the collection of all genes (average gene), the ensemble of ribosomal protein genes, major translation/transcription processing factors, and genes for polypeptides of chaperone/degradation complexes. A gene is predicted highly expressed (PHX) if its codon frequencies are close to those of the ribosomal proteins, major translation/transcription processing factor, and chaperone/degradation standards but strongly deviant from the average gene codon frequencies. PHX genes identified by their codon usage frequencies among prokaryotic genomes commonly include those for ribosomal proteins, major transcription/translation processing factors (several occurring in multiple copies), and major chaperone/degradation proteins. Also PHX genes generally include those encoding enzymes of essential energy metabolism pathways of glycolysis, pyruvate oxidation, and respiration (aerobic and anaerobic), genes of fatty acid biosynthesis, and the principal genes of amino acid and nucleotide biosyntheses. Gene classes generally not PHX include most repair protein genes, virtually all vitamin biosynthesis genes, genes of two-component sensor systems, most regulatory genes, and most genes expressed in stationary phase or during starvation. Members of the set of PHX aminoacyl-tRNA synthetase genes contrast sharply between genomes. There are also subtle differences among the PHX energy metabolism genes between E. coli and B. subtilis, particularly with respect to genes of the tricarboxylic acid cycle. The good agreement of PHX genes of E. coli and B. subtilis with high protein abundances, as assessed by two-dimensional gel determination, is verified. 
Relationships of PHX genes with stoichiometry, multifunctionality, and operon structures are also examined. The spatial distribution of PHX genes within each genome reveals clusters and significantly long regions without PHX genes.


Subject(s)
Bacillus subtilis/genetics , Escherichia coli/genetics , Genes, Bacterial/genetics , Haemophilus influenzae/genetics , Vibrio cholerae/genetics , Amino Acyl-tRNA Synthetases/genetics , Codon , Energy Metabolism/genetics , Gene Expression Regulation, Bacterial , Gene Frequency , Internet , Models, Genetic , Open Reading Frames , Protein Biosynthesis , Signal Transduction/genetics , Transcription, Genetic , Vitamins/biosynthesis
2.
Trends Microbiol ; 9(7): 335-43, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11435108

ABSTRACT

A gene in a genome is defined as putative alien (pA) if its codon usage difference from the average gene exceeds a high threshold and codon usage differences from ribosomal protein genes, chaperone genes and protein-synthesis-processing factors are also high. pA gene clusters in bacterial genomes are relevant for detecting genomic islands (GIs), including pathogenicity islands (PAIs). Four other analyses appropriate to this task are G+C genome variation (the standard method); genomic signature divergences (dinucleotide bias); extremes of codon bias; and anomalies of amino acid usage. For example, the cagA domain of Helicobacter pylori is highly deviant in its genome signature and codon bias from the rest of the genome. Using these methods we can detect two potential PAIs in the Neisseria meningitidis genome, which contain hemagglutinin and/or hemolysin-related genes. Additionally, G+C variation and genome signature differences of the Mycobacterium tuberculosis genome indicate two pA gene clusters.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Multigene Family , Bacteria/pathogenicity , Base Sequence , DNA, Bacterial , Sequence Analysis, DNA
4.
Genome Res ; 11(4): 540-6, 2001 Apr.
Article in English | MEDLINE | ID: mdl-11282969

ABSTRACT

We examined dinucleotide relative abundances and their biases in recent sequences of eukaryotic genomes and chromosomes, including human chromosomes 21 and 22, Saccharomyces cerevisiae, Arabidopsis thaliana, and Drosophila melanogaster. We found that dinucleotide relative abundances are remarkably constant across human chromosomes and within the DNA of a particular species. The dinucleotide biases differ between species, providing a genome signature that is characteristic of the bulk properties of an organism's DNA. We detail the relations between species genome signatures and suggest possible mechanisms for their origin and maintenance.


Subject(s)
Dinucleotide Repeats/genetics , Eukaryotic Cells/chemistry , Genes, Helminth/genetics , Genes, Insect/genetics , Genome , Animals , Base Composition , Base Pairing , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Genome, Fungal , Genome, Human , Genome, Protozoan , Humans , Leishmania major/genetics , Mice , Plasmodium falciparum/genetics , Saccharomyces cerevisiae/genetics , Species Specificity
5.
Proc Natl Acad Sci U S A ; 98(9): 5240-5, 2001 Apr 24.
Article in English | MEDLINE | ID: mdl-11296249

ABSTRACT

Predicted highly expressed (PHX) and putative alien genes determined by codon usages are characterized in the genome of Deinococcus radiodurans (strain R1). Deinococcus radiodurans (DEIRA) can survive very high doses of ionizing radiation that are lethal to virtually all other organisms. It has been argued that DEIRA is endowed with enhanced repair systems that provide protection and stability. However, predicted expression levels of DNA repair proteins with the exception of RecA tend to be low and do not distinguish DEIRA from other prokaryotes. In this paper, the capability of DEIRA to resist extreme doses of ionizing and UV radiation is attributed to an unusually high number of PHX chaperone/degradation, protease, and detoxification genes. Explicitly, compared with all current complete prokaryotic genomes, DEIRA contains the greatest number of PHX detoxification and protease proteins. Other sources of environmental protection against severe conditions of UV radiation, desiccation, and thermal effects for DEIRA are the several S-layer (surface structure) PHX proteins. The top PHX gene of DEIRA is the multifunctional tricarboxylic acid (TCA) gene aconitase, which, apart from its role in respiration, also alerts the cell to oxidative damage.


Subject(s)
DNA Damage/radiation effects , Genes, Bacterial/genetics , Radiation Tolerance/genetics , Thermus/genetics , Thermus/radiation effects , Chromosomes, Bacterial/genetics , Codon/genetics , DNA Damage/genetics , DNA Repair/genetics , Desiccation , Endopeptidases/metabolism , Escherichia coli/enzymology , Escherichia coli/genetics , Genetic Code/genetics , Molecular Chaperones/metabolism , Multigene Family/genetics , Physical Chromosome Mapping , Radiation, Ionizing , Thermus/enzymology , Thermus/metabolism , Ultraviolet Rays
6.
Nucleic Acids Res ; 29(7): 1590-601, 2001 Apr 01.
Article in English | MEDLINE | ID: mdl-11266562

ABSTRACT

Comparisons of codon frequencies of genes to several gene classes are used to characterize highly expressed and alien genes on the Synechocystis PCC6803 genome. The primary gene classes include the ensemble of all genes (average gene), ribosomal protein (RP) genes, translation processing factors (TF) and genes encoding chaperone/degradation proteins (CH). A gene is predicted highly expressed (PHX) if its codon usage is close to that of the RP/TF/CH standards but strongly deviant from the average gene. Putative alien (PA) genes are those for which codon usage is significantly different from all four classes of gene standards. In Synechocystis, 380 genes were identified as PHX. The genes with the highest predicted expression levels include many that encode proteins vital for photosynthesis. Nearly all of the genes of the RP/TF/CH gene classes are PHX. The principal glycolysis enzymes, which may also function in CO2 fixation, are PHX, while none of the genes encoding TCA cycle enzymes are PHX. The PA genes are mostly of unknown function or encode transposases. Several PA genes encode polypeptides that function in lipopolysaccharide biosynthesis. Both PHX and PA genes often form significant clusters (operons). The proteins encoded by PHX and PA genes are described with respect to functional classifications, their organization in the genome and their stoichiometry in multi-subunit complexes.
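
The PHX and PA definitions in this abstract amount to a decision rule over four codon usage differences. The sketch below is only an illustration of that rule: the abstract gives no numeric criteria, so the `CodonBias` container, the `classify` function, and the `close` and `far` cutoffs are all hypothetical placeholders, and the differences themselves are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class CodonBias:
    """Codon usage differences of one gene from four standard gene classes."""
    to_all: float  # difference from the average gene
    to_rp: float   # difference from ribosomal protein genes
    to_tf: float   # difference from translation processing factors
    to_ch: float   # difference from chaperone/degradation genes


def classify(b: CodonBias, close: float = 0.45, far: float = 0.55) -> str:
    """Label a gene PHX, PA, or other.

    `close` and `far` are hypothetical cutoffs chosen for illustration;
    they are not taken from the paper.
    """
    standards = (b.to_rp, b.to_tf, b.to_ch)
    deviant_from_average = b.to_all >= far
    # PA: codon usage differs strongly from all four gene classes.
    if deviant_from_average and all(d >= far for d in standards):
        return "PA"
    # PHX: deviant from the average gene but close to RP/TF/CH standards.
    if deviant_from_average and all(d <= close for d in standards):
        return "PHX"
    return "other"


print(classify(CodonBias(to_all=0.60, to_rp=0.30, to_tf=0.35, to_ch=0.40)))
```

Requiring closeness to all three standards is one possible reading of "close to that of the RP/TF/CH standards"; a rule based on a majority of the standards, or on ratios of differences, would fit the wording equally well.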


Subject(s)
Cyanobacteria/genetics , Gene Expression , Genome , Codon/genetics , Open Reading Frames/genetics , Peptide Elongation Factors/genetics , Photosynthesis/genetics , Protein Biosynthesis , Ribosomal Proteins/genetics , Transcription Factors/genetics
7.
Proc Natl Acad Sci U S A ; 97(21): 11348-53, 2000 Oct 10.
Article in English | MEDLINE | ID: mdl-11027334

ABSTRACT

Heat shock proteins 60 (GroEL) are highly expressed essential proteins in eubacterial genomes and in eukaryotic organelles. These chaperone proteins have been advanced as propitious marker sequences for tracing the evolution of mitochondrial (Mt) genomes. Similarities among HSP60 sequences based on significant segment pair alignment calculations are used to deduce associations of sequences taking into account GroEL functional/structural domain differences and to relate HSP60 duplications pervasive in alpha-proteobacterial lineages to the dynamics of lateral transfer and plasmid integration. Multiple alignments with consensuses are determined for 10 natural groups. The group consensuses sharpen the similarity contrasts among individual sequences. In particular, the Mt group matches best with the classical alpha-proteobacteria and closely with Rickettsia but significantly worse with the rickettsial groups Ehrlichia and Orientia. However, across broad protein sequence comparisons, there appears to be no consistent prokaryote whose protein sequences align best with animal Mt genomes. There are plausible scenarios indicating that the nuclear-encoded HSP60 (and HSP70) sequences functioning in Mt are results of lateral transfer and are probably derived from an alpha-proteobacterium. This hypothesis relates to the plethora of duplicated HSP60 sequences among the classical alpha-proteobacteria contrasted with no duplications of HSP60 among other clades of proteobacterial genomes. Evolutionary relations are confounded by differential selection pressures, convergence, variable mutational rates, site variability, and lateral gene transfer.


Subject(s)
Chaperonin 60/genetics , Evolution, Molecular , Gene Duplication , Gene Transfer Techniques , Mitochondria/genetics , Sequence Homology, Nucleic Acid , Bacteria/genetics , Fungi/genetics , Plants/genetics
8.
J Bacteriol ; 182(18): 5238-50, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10960111

ABSTRACT

Our approach in predicting gene expression levels relates to codon usage differences among gene classes. In prokaryotic genomes, genes that deviate strongly in codon usage from the average gene but are sufficiently similar in codon usage to ribosomal protein genes, to translation and transcription processing factors, and to chaperone-degradation proteins are predicted highly expressed (PHX). By these criteria, PHX genes in most prokaryotic genomes include those encoding ribosomal proteins, translation and transcription processing factors, and chaperone proteins and genes of principal energy metabolism. In particular, for the fast-growing species Escherichia coli, Vibrio cholerae, Bacillus subtilis, and Haemophilus influenzae, major glycolysis and tricarboxylic acid cycle genes are PHX. In Synechocystis, prime genes of photosynthesis are PHX, and in methanogens, PHX genes include those essential for methanogenesis. Overall, the three protein families (ribosomal proteins, protein synthesis factors, and chaperone complexes) are needed at many stages of the life cycle, and apparently bacteria have evolved codon usage to maintain appropriate growth, stability, and plasticity. A new interpretation of the capacity of Deinococcus radiodurans for resistance to high doses of ionizing radiation is based on an excess of PHX chaperone-degradation genes and detoxification genes. Expression levels of selected classes of genes, including those for flagella, electron transport, detoxification, histidine kinases, and others, are analyzed. Flagellar PHX genes are conspicuous among spirochete genomes. PHX genes are positively correlated with strong Shine-Dalgarno signal sequences. Specific regulatory proteins, e.g., two-component sensor proteins, are rarely PHX. Genes involved in pathways for the synthesis of vitamins record low predicted expression levels. Several distinctive PHX genes of the available complete prokaryotic genomes are highlighted. Relationships of PHX genes with stoichiometry, multifunctionality, and operon structures are discussed. Our methodology may be used as a complement to experimental expression analysis.


Subject(s)
Archaea/genetics , Archaeal Proteins/genetics , Bacteria/genetics , Bacterial Proteins/genetics , Gene Expression Regulation, Archaeal , Gene Expression Regulation, Bacterial , Escherichia coli/genetics , Protein Biosynthesis , Ribosomal Proteins/genetics , Transcription Factors/genetics , Transcription, Genetic
9.
Protein Sci ; 9(3): 476-86, 2000 Mar.
Article in English | MEDLINE | ID: mdl-10752609

ABSTRACT

The chaperonin HSP60 (GroEL) proteins are essential in eubacterial genomes and in eukaryotic organelles. Functional regions inferred from mutation studies and the Escherichia coli GroEL 3D crystal complexes are evaluated in a multiple alignment across 43 diverse HSP60 sequences, centering on ATP/ADP and Mg2+ binding sites, on residues interacting with substrate, on GroES contact positions, on interface regions between monomers and domains, and on residues important in allosteric conformational changes. The most evolutionarily conserved residues relate to the ATP/ADP and Mg2+ binding sites. Hydrophobic residues that contribute to substrate binding are also significantly conserved. A large number of charged residues line the central cavity of the GroEL-GroES complex in the substrate-releasing conformation. These span statistically significant intra- and inter-monomer three-dimensional (3D) charge clusters that are highly conserved among sequences and presumably play an important role in interacting with the substrate. Unaligned short segments between blocks of alignment are generally exposed at the outside wall of the Anfinsen cage complex. The multiple alignment reveals regions of divergence common to specific evolutionary groups. For example, rickettsial sequences diverge in the ATP/ADP binding domain and gram-positive sequences diverge in the allosteric transition domain. The evolutionary information of the multiple alignment proffers attractive sites for mutational studies.


Subject(s)
Chaperonin 60/chemistry , Adenosine Triphosphate/chemistry , Amino Acid Sequence , Binding Sites , Conserved Sequence , Evolution, Molecular , Magnesium/chemistry , Models, Molecular , Molecular Sequence Data , Protein Structure, Quaternary
10.
Proc Natl Acad Sci U S A ; 96(22): 12500-5, 1999 Oct 26.
Article in English | MEDLINE | ID: mdl-10535951

ABSTRACT

The residue environment in protein structures is studied with respect to the density of carbon (C), oxygen (O), and nitrogen (N) atoms within a certain distance (say 5 Å) of each residue. Two types of environments are evaluated: one based on side-chain atom contacts (abbreviated S-S) and the other based on all atom (side-chain + backbone) contacts (abbreviated A-A). Different atom counts are observed about nine residue structural categories defined by three solvent accessibility levels and three secondary structure states. Among the structural categories, the S-S atom count ratios generally vary more than the A-A atom count ratios because the backbone (O) and (N) atoms contribute equal counts. Secondary structure affects the (C) density for the A-A contacts whereas secondary structure has little influence on the (C) density for the S-S contacts. For S-S contacts, a greater density of (O) over (N) atom neighbors stands out in the environment of most amino acid types. By contrast, for A-A contacts, independent of the solvent accessibility levels, the ratio (O)/(N) is approximately 1 in helical states, consistent with the geometry of alpha-helical residues whose side-chains tilt oppositely to the amino to carboxy alpha-helical axis. The highest ratio of neighbor (O)/(N) is achieved under solvent exposed conditions. This (O) vs. (N) prevalence is advantageous at the protein surface that generally exhibits an acid excess that helps to enhance protein solubility in the cell and to avoid nonspecific interactions with phosphate groups of DNA, RNA, and other plasma constituents.


Subject(s)
Protein Conformation , Proteins/chemistry
11.
Proc Natl Acad Sci U S A ; 96(22): 12494-9, 1999 Oct 26.
Article in English | MEDLINE | ID: mdl-10535950

ABSTRACT

A hierarchy of residue density assessments and packing properties in protein structures are contrasted, including a regular density, a variety of charge densities, a hydrophobic density, a polar density, and an aromatic density. These densities are investigated by alternative distance measures and also at the interface of multiunit structures. Amino acids are divided into nine structural categories according to three secondary structure states and three solvent accessibility levels. To take account of amino acid abundance differences across protein structures, we normalize the observed density by the expected density defining a density index. Solvent accessibility levels exert the predominant influence in determinations of the regular residue density. Explicitly, the regular density values vary approximately linearly with respect to solvent accessibility levels, the linearity parameters depending on the amino acid. The charge index reveals pronounced inequalities between lysine and arginine in their interactions with acidic residues. The aromatic density calculations in all structural categories parallel the regular density calculations, indicating that the aromatic residues are distributed as a random sample of all residues. Moreover, aromatic residues are found to be over-represented in the neighborhood of all amino acids. This result might be attributed to nucleation sites and protein stability being substantially associated with aromatic residues.


Subject(s)
Protein Conformation , Amino Acids/chemistry
12.
Proc Natl Acad Sci U S A ; 96(16): 9184-9, 1999 Aug 03.
Article in English | MEDLINE | ID: mdl-10430917

ABSTRACT

Our basic observation is that each genome has a characteristic "signature" defined as the ratios between the observed dinucleotide frequencies and the frequencies expected if neighbors were chosen at random (dinucleotide relative abundances). The remarkable fact is that the signature is relatively constant throughout the genome; i.e., the patterns and levels of dinucleotide relative abundances of every 50-kb segment of the genome are about the same. Comparison of the signatures of different genomes provides a measure of similarity which has the advantage that it looks at all the DNA of an organism and does not depend on the ability to align homologous sequences of specific genes. Genome signature comparisons show that plasmids, both specialized and broad-range, and their hosts have substantially compatible (similar) genome signatures. Mammalian mitochondrial (Mt) genomes are very similar, and animal and fungal Mt are generally moderately similar, but they diverge significantly from plant and protist Mt sets. Moreover, Mt genome signature differences between species parallel the corresponding nuclear genome signature differences, despite large differences between Mt and host nuclear signatures. In signature terms, we find that the archaea are not a coherent clade. For example, Sulfolobus and Halobacterium are extremely divergent. There is no consistent pattern of signature differences among thermophiles. More generally, grouping prokaryotes by environmental criteria (e.g., habitat propensities, osmolarity tolerance, chemical conditions) reveals no correlations in genome signature.
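
The signature defined in this abstract (observed dinucleotide frequency divided by the frequency expected from independent neighbors) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the mean absolute difference used to compare two signatures is an assumption, and the sketch works on a single strand, whereas genome-signature analyses are commonly computed on the sequence combined with its reverse complement.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def signature(seq: str) -> dict:
    """Dinucleotide relative abundances rho_xy = f_xy / (f_x * f_y)."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence has no valid dinucleotides")
    rho = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = di[x + y] / n_di
        # Undefined when a base is absent from the sequence.
        rho[x + y] = observed / expected if expected > 0 else float("nan")
    return rho


def signature_difference(a: str, b: str) -> float:
    """Mean absolute difference over the 16 dinucleotides (assumed measure).

    Both sequences must contain all four bases, otherwise the result is NaN.
    """
    ra, rb = signature(a), signature(b)
    return sum(abs(ra[k] - rb[k]) for k in ra) / 16


print(signature("AACC")["AA"])
```

A value near 1 means the dinucleotide occurs about as often as its base composition predicts; values well above or below 1 indicate over- or under-representation.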


Subject(s)
Archaea/genetics , Bacteria/genetics , DNA, Mitochondrial/genetics , DNA/chemistry , Evolution, Molecular , Fungi/genetics , Genome , Plants/genetics , Plasmids/genetics , Animals , Base Sequence , DNA/genetics , Humans
13.
Proc Natl Acad Sci U S A ; 96(16): 9190-5, 1999 Aug 03.
Article in English | MEDLINE | ID: mdl-10430918

ABSTRACT

We provide data and analysis to support the hypothesis that the ancestor of animal mitochondria (Mt) and many primitive amitochondrial (a-Mt) eukaryotes was a fusion microbe composed of a Clostridium-like eubacterium and a Sulfolobus-like archaebacterium. The analysis is based on several observations: (i) The genome signatures (dinucleotide relative abundance values) of Clostridium and Sulfolobus are compatible (sufficiently similar) and each has significantly more similarity in genome signatures with animal Mt sequences than do all other available prokaryotes. That stable fusions may require compatibility in genome signatures is suggested by the compatibility of plasmids and hosts. (ii) The expanded energy metabolism of the fusion organism was strongly selective for cementing such a fusion. (iii) The molecular apparatus of endospore formation in Clostridium serves as raw material for the development of the nucleus and cytoplasm of the eukaryotic cell.


Subject(s)
Archaea/genetics , Bacteria/genetics , Biological Evolution , Chimera , DNA, Mitochondrial/genetics , Mitochondria/genetics , Models, Genetic , Amino Acid Sequence , Animals , Clostridium/genetics , Energy Metabolism/genetics , Eukaryotic Cells , Heat-Shock Proteins/genetics , Humans , Proteins/chemistry , Proteins/genetics , Sulfolobus/genetics , Vertebrates
15.
Ann N Y Acad Sci ; 870: 314-29, 1999 May 18.
Article in English | MEDLINE | ID: mdl-10415493

ABSTRACT

We present new methods for calculating codon bias of a group of genes or an individual gene relative to a standard gene class. These methods are suitable for identifying alien (e.g., horizontally transferred) and highly expressed genes. In yeast and several bacterial genomes, highly expressed genes typically include ribosomal protein genes, elongation factors, chaperonins (heat shock proteins), and a subset of genes involved in glycolysis generally essential in exponential growth. Highly expressed genes of the Synechocystis genome feature several photosystem II genes, and highly expressed genes in several methanogens (Methanococcus jannaschii, Methanobacterium thermoautotrophicum) are essential for methanogenesis. Alien genes mostly consist of ORFs of unknown function, transposases, prophage genes, and restriction/modification enzymes. Notably, nuclear ribosomal proteins of yeast are highly expressed, whereas mitochondrial ribosomal protein genes appear to be alien genes. Alien genes often occur in clusters, suggesting in these cases that transfer events entail several genes.
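
The abstract does not state the formula for codon bias relative to a standard gene class, so the sketch below assumes one plausible form: codon frequencies are normalized within each amino acid, the absolute differences between the gene group and the standard are summed over synonymous codons, and each amino acid is weighted by its frequency in the gene group. The function names and this weighting are assumptions made for illustration; the codon table is the standard genetic code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(genes):
    """Count sense codons over a list of in-frame coding sequences."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - len(gene) % 3, 3):
            codon = gene[i:i + 3]
            # Skip stop codons and codons with ambiguous bases.
            if CODON_TO_AA.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def per_amino_acid_freqs(counts):
    """Normalize codon counts so synonymous codons sum to 1 per amino acid."""
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    freqs = {c: n / totals[CODON_TO_AA[c]] for c, n in counts.items()}
    return freqs, totals


def codon_bias(family, standard):
    """Assumed bias of `family` relative to `standard` (0 = identical usage)."""
    f, f_totals = per_amino_acid_freqs(codon_counts(family))
    s, _ = per_amino_acid_freqs(codon_counts(standard))
    n = sum(f_totals.values())
    bias = 0.0
    for codon, aa in CODON_TO_AA.items():
        if aa == "*" or f_totals.get(aa, 0) == 0:
            continue
        weight = f_totals[aa] / n  # amino acid frequency in the gene group
        bias += weight * abs(f.get(codon, 0.0) - s.get(codon, 0.0))
    return bias


print(codon_bias(["GCTGCT"], ["GCCGCC"]))
```

Under this form, a gene group with the same synonymous codon preferences as the standard scores 0, and the score grows as the preferences diverge.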


Subject(s)
Genome, Bacterial , Borrelia burgdorferi Group/genetics , Codon , Cyanobacteria/genetics , Genes, Bacterial , Haemophilus influenzae/genetics , Methanococcus/genetics
16.
Proc Natl Acad Sci U S A ; 96(12): 7011-6, 1999 Jun 08.
Article in English | MEDLINE | ID: mdl-10359830

ABSTRACT

The severity of Helicobacter pylori-related disease is correlated with a pathogenicity island (the Cag region of about 26 genes) whose presence is associated with the up-regulation of an IL-8 cytokine inflammatory response in gastric epithelial cells. Statistical analysis of the Cag gene sequences calculated from the complete genome of strain 26695 revealed several unusual features. The Cag7 sequence (1,927 aa) has two repeat regions. Repeat region I runs 317 aa in a form of AAA proximal to the protein N terminal; repeat region II extends 907 aa in the middle of the protein sequence consisting of 74 contiguous segments composed from selections among six consensus sequences and includes 58 regularly distributed cysteine residues with consecutive cysteines mostly 12, 18, or 24 aa apart. This "regular" cysteine arrangement may provide a scaffolding of linker elements stabilized by disulfide bridges. When Cag7 homologues from different strains were compared, differences were found almost exclusively in the repeat regions, resulting from deletion and/or insertion of repeating units. These observations suggest that the anomalous repetitive structure of the sequence plays an important role in the conformation of the Cag7 gene product and potentially in the function of the pathogenicity island. Other facets of the Cag7 sequence show significant charge clusters, high multiplet count, and extremes of amino acid usage.


Subject(s)
Bacterial Proteins/genetics , Genes, Bacterial , Genome, Bacterial , Helicobacter pylori/genetics , Amino Acid Sequence , Base Sequence , Gastric Mucosa/microbiology , Helicobacter Infections/microbiology , Helicobacter pylori/pathogenicity , Humans , Molecular Sequence Data , Multigene Family , Repetitive Sequences, Nucleic Acid , Sequence Deletion , Virulence/genetics
18.
J Mol Evol ; 47(5): 565-77, 1998 Nov.
Article in English | MEDLINE | ID: mdl-9797407

ABSTRACT

The heat shock protein 70 kDa sequences (HSP70) are of great importance as molecular chaperones in protein folding and transport. They are abundant under conditions of cellular stress. They are highly conserved in all domains of life: Archaea, eubacteria, eukaryotes, and organelles (mitochondria, chloroplasts). A multiple alignment of a large collection of these sequences was obtained employing our symmetric-iterative ITERALIGN program (Brocchieri and Karlin 1998). Assessments of conservation are interpreted in evolutionary terms and with respect to functional implications. Many archaeal sequences (methanogens and halophiles) tend to align best with the Gram-positive sequences. These two groups also miss a signature segment [about 25 amino acids (aa) long] present in all other HSP70 species (Gupta and Golding 1993). We observed a second signature sequence of about 4 aa absent from all eukaryotic homologues, significantly aligned in all prokaryotic sequences. Consensus sequences were developed for eight groups [Archaea, Gram-positive, proteobacterial Gram-negative, singular bacteria, mitochondria, plastids, eukaryotic endoplasmic reticulum (ER) isoforms, eukaryotic cytoplasmic isoforms]. All group consensus comparisons tend to summarize better the alignments than do the individual sequence comparisons. The global individual consensus "matches" 87% with the consensus of consensuses sequence. A functional analysis of the global consensus identifies a (new) highly significant mixed charge cluster proximal to the carboxyl terminus of the sequence highlighting the hypercharge run EEDKKRRER (one-letter aa code used). The individual Archaea and Gram-positive sequences contain a corresponding significant mixed charge cluster in the location of the charge cluster of the consensus sequence. In contrast, the four Gram-negative proteobacterial sequences of the alignment do not have a charge cluster (even at the 5% significance level). 
All eukaryotic HSP70 sequences have the analogous charge cluster. Strikingly, several of the eukaryotic isoforms show multiple mixed charged clusters. These clusters were interpreted with supporting data related to HSP70 activity in facilitating chaperone, transport, and secretion function. We observed that the consensus contains only a single tryptophan residue and a single conserved cysteine. This is interpreted with respect to the target rule for disaggregating misfolded proteins. The mitochondrial HSP70 connections to bacterial HSP70 are analyzed, suggesting a polyphyletic split of Trypanosoma and Leishmania protist mitochondrial (Mt) homologues separated from Mt-animal/fungal/plant homologues. Moreover, the HSP70 sequences from the amitochondrial Entamoeba histolytica and Trichomonas vaginalis species were analyzed. The E. histolytica HSP70 is most similar to the higher eukaryotic cytoplasmic sequences, with significantly weaker alignments to ER sequences and much diminished matching to all eubacterial, mitochondrial, and chloroplast sequences. This appears to be at variance with the hypothesis that E. histolytica rather recently lost its mitochondrial organelle. T. vaginalis contains two HSP70 sequences, one Mt-like and the second similar to eukaryotic cytoplasmic sequences suggesting two diverse origins.


Subject(s)
Evolution, Molecular , HSP70 Heat-Shock Proteins/genetics , Amino Acid Sequence , Consensus Sequence , Conserved Sequence , HSP70 Heat-Shock Proteins/physiology , Molecular Sequence Data , Sequence Homology, Amino Acid , Species Specificity
19.
Mol Microbiol ; 29(6): 1341-55, 1998 Sep.
Article in English | MEDLINE | ID: mdl-9781873

ABSTRACT

A new measure for assessing codon bias of one group of genes with respect to a second group of genes is introduced. In this formulation, codon bias correlations for Escherichia coli genes are evaluated for level of expression, for contrasts along genes, for genes in different 200 kb (or longer) contigs around the genome, for effects of gene size, for variation over different function classes, for codon bias in relation to possible lateral transfer and for dicodon bias for some gene classes. Among the function classes, codon biases of ribosomal proteins are the most deviant from the codon frequencies of the average E. coli gene. Other classes of 'highly expressed genes' (e.g. amino acyl tRNA synthetases, chaperonins, modification genes essential to translation activities) show less extreme codon biases. Consistently for genes with experimentally determined expression rates in the exponential growth phase, those of highest molar abundances are more deviant from the average gene codon frequencies and are more similar in codon frequencies to the average ribosomal protein gene. Independent of gene size, the codon biases in the 5' third of genes deviate by more than a factor of two from those in the middle and 3' thirds. In this context, there appear to be conflicting selection pressures imposed by the constraints of ribosomal binding, or more generally the early phase of protein synthesis (about the first 50 codons) may be more biased than the complete nascent polypeptide. In partitioning the E. coli genome into 10 equal lengths, pronounced differences in codon site 3 G+C frequencies accumulate. Genes near to oriC have 5% greater codon site 3 G+C frequencies than do genes from the ter region. This difference also is observed between small (100-300 codons) and large (>800 codons) genes. This result contrasts with that for eukaryotic genomes (including human, Caenorhabditis elegans and yeast) where long genes tend to have site 3 more AT rich than short genes. 
Many of the above results are special for E. coli genes and do not apply to genes of most bacterial genomes. A gene is defined as alien (possibly horizontally transferred) if its codon bias relative to the average gene exceeds a high threshold and the codon bias relative to ribosomal proteins is also appropriately high. These are identified, including four clusters (operons). The bulk of these genes have no known function.
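
The codon site 3 G+C comparison across equal-length genome partitions described in this abstract can be sketched as follows. This is an illustrative outline only: the function names, the `(start, sequence)` input format, and the choice to average per-gene values within each segment are assumptions, not details given in the abstract.

```python
def gc3(gene: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    gene = gene.upper()
    usable = len(gene) - len(gene) % 3
    third = gene[2:usable:3]
    if not third:
        raise ValueError("gene has no complete codon")
    return sum(base in "GC" for base in third) / len(third)


def gc3_by_segment(genes, genome_length: int, n_segments: int = 10):
    """Mean per-gene GC3 in each of `n_segments` equal genome partitions.

    `genes` is an iterable of (start_position, coding_sequence) pairs;
    segments without genes yield None.
    """
    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for start, seq in genes:
        k = min(start * n_segments // genome_length, n_segments - 1)
        sums[k] += gc3(seq)
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


print(gc3_by_segment([(0, "ATGGCC"), (90, "ATGAAA")], genome_length=100))
```

Comparing the segment containing oriC with the segment containing the ter region would then give the kind of contrast the abstract reports.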


Subject(s)
Codon/genetics , Escherichia coli/genetics , Genes, Bacterial , Genome, Bacterial , Amino Acyl-tRNA Synthetases/genetics , Animals , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Base Composition , Chromosomes, Bacterial/genetics , Coliphages/genetics , DNA, Bacterial/genetics , DNA, Viral/genetics , Escherichia coli/enzymology , Escherichia coli/growth & development , Gene Expression , Humans , Operon , Protein Biosynthesis , Protein Folding , Ribosomal Proteins/genetics , Species Specificity
20.
Curr Opin Struct Biol ; 8(3): 346-54, 1998 Jun.
Article in English | MEDLINE | ID: mdl-9666331

ABSTRACT

Genome sequencing efforts will soon generate hundreds of millions of bases of human genomic DNA containing thousands of novel genes. In the past year, the accuracy of computational gene-finding methods has improved significantly, to the point where a reasonable approximation of the gene structures within an extended genomic region can often be predicted in advance of more detailed experimental studies.


Subject(s)
DNA/chemistry , DNA/genetics , Genes , Animals , Exons , Genetic Techniques , Genome, Human , Humans , Markov Chains , Models, Genetic , Protein Biosynthesis , Repetitive Sequences, Nucleic Acid , Transcription, Genetic