Search | VHL Regional Portal

1.

Profiling expression strategies for a type III polyketide synthase in a lysate-based, cell-free system.

Sword, Tien T; Dinglasan, Jaime Lorenzo N; Abbas, Ghaeath S K; Barker, J William; Spradley, Madeline E; Greene, Elijah R; Gooden, Damian S; Emrich, Scott J; Gilchrist, Michael A; Doktycz, Mitchel J; Bailey, Constance B.

Sci Rep ; 14(1): 12983, 2024 06 06.

Article in English | MEDLINE | ID: mdl-38839808

ABSTRACT

Some of the most metabolically diverse species of bacteria (e.g., Actinobacteria) have higher GC content in their DNA, differ substantially in codon usage, and have distinct protein folding environments compared to tractable expression hosts like Escherichia coli. Consequently, expressing biosynthetic gene clusters (BGCs) from these bacteria in E. coli often results in a myriad of unpredictable issues with regard to protein expression and folding, delaying the biochemical characterization of new natural products. Current strategies to achieve soluble, active expression of these enzymes in tractable hosts can be a lengthy trial-and-error process. Cell-free expression (CFE) has emerged as a valuable expression platform and a testbed for rapidly prototyping expression parameters. Here, we use a type III polyketide synthase from Streptomyces griseus, RppA, which catalyzes the formation of the red pigment flaviolin, as a reporter to investigate BGC refactoring techniques. We applied a library of constructs with different combinations of promoters and rppA coding sequences to investigate the synergies between promoter and codon usage. Subsequently, we assess the utility of cell-free systems for prototyping these refactoring tactics prior to their implementation in cells. Overall, codon harmonization improves natural product synthesis more than traditional codon optimization across cell-free and cellular environments. More importantly, the choice of coding sequences and promoters impacts protein expression synergistically, which should be considered for future efforts to use CFE for high-yield protein expression. The promoter strategy when applied to RppA was not completely correlated with that observed with GFP, indicating that different promoter strategies should be applied for different proteins. In vivo experiments suggest that there is correlation, but not complete alignment, between expression in cell-free systems and in vivo.
Refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs, which advances CFE as a tool for natural product research.

Subject(s)

Cell-Free System , Promoter Regions, Genetic , Streptomyces griseus/enzymology , Streptomyces griseus/genetics , Streptomyces griseus/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Multigene Family , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Polyketide Synthases/genetics , Polyketide Synthases/metabolism , Codon/genetics , Acyltransferases

2.

Profiling Expression Strategies for a Type III Polyketide Synthase in a Lysate-Based, Cell-free System.

Sword, Tien T; Dinglasan, Jaime Lorenzo N; Abbas, Ghaeath S K; Barker, J William; Spradley, Madeline E; Greene, Elijah R; Gooden, Damian S; Emrich, Scott J; Gilchrist, Michael A; Doktycz, Mitchel J; Bailey, Constance B.

bioRxiv ; 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38077034

ABSTRACT

Some of the most metabolically diverse species of bacteria (e.g., Actinobacteria) have higher GC content in their DNA, differ substantially in codon usage, and have distinct protein folding environments compared to tractable expression hosts like Escherichia coli. Consequently, expressing biosynthetic gene clusters (BGCs) from these bacteria in E. coli frequently results in a myriad of unpredictable issues with protein expression and folding, delaying the biochemical characterization of new natural products. Current strategies to achieve soluble, active expression of these enzymes in tractable hosts, such as BGC refactoring, can be a lengthy trial-and-error process. Cell-free expression (CFE) has emerged as 1) a valuable expression platform for enzymes that are challenging to synthesize in vivo, and as 2) a testbed for rapid prototyping that can improve cellular expression. Here, we use a type III polyketide synthase from Streptomyces griseus, RppA, which catalyzes the formation of the red pigment flaviolin, as a reporter to investigate BGC refactoring techniques. We synergistically tune promoter and codon usage to improve flaviolin production from cell-free expressed RppA. We then assess the utility of cell-free systems for prototyping these refactoring tactics prior to their implementation in cells. Overall, codon harmonization improves natural product synthesis more than traditional codon optimization across cell-free and cellular environments. Refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs. By showing the correspondence between CFE and in vivo expression, this work advances CFE as a tool for natural product research.

3.

Quantifying shifts in natural selection on codon usage between protein regions: a population genetics approach.

Cope, Alexander L; Gilchrist, Michael A.

BMC Genomics ; 23(1): 408, 2022 May 30.

Article in English | MEDLINE | ID: mdl-35637464

ABSTRACT

BACKGROUND: Codon usage bias (CUB), the non-uniform usage of synonymous codons, occurs across all domains of life. Adaptive CUB is hypothesized to result from various selective pressures, including selection for efficient ribosome elongation, accurate translation, mRNA secondary structure, and/or protein folding. Given the critical link between protein folding and protein function, numerous studies have analyzed the relationship between codon usage and protein structure. The results from these studies have often been contradictory, likely reflecting the differing methods used for measuring codon usage and the failure to appropriately control for confounding factors, such as differences in amino acid usage between protein structures and changes in the frequency of different structures with gene expression. RESULTS: Here we take an explicit population genetics approach to quantify codon-specific shifts in natural selection related to protein structure in S. cerevisiae and E. coli. Unlike other metrics of codon usage, our approach explicitly separates the effects of natural selection, scaled by gene expression, and mutation bias while naturally accounting for a region's amino acid usage. Bayesian model comparisons suggest selection on codon usage varies only slightly between helix, sheet, and coil secondary structures and, similarly, between structured and intrinsically-disordered regions. Similarly, in contrast to previous findings, we find selection on codon usage only varies slightly at the termini of helices in E. coli. Using simulated data, we show that the previous finding that "non-optimal" codons are enriched at the beginning of helices in S. cerevisiae was due to a failure to control for various confounding factors (e.g., amino acid biases, gene expression), rather than selection to modulate cotranslational folding.
CONCLUSIONS: Our results reveal a weak relationship between codon usage and protein structure, indicating that differences in selection on codon usage between structures are slight. In addition to the magnitude of differences in selection between protein structures being slight, the observed shifts appear to be idiosyncratic and largely codon-specific rather than systematic reversals in the nature of selection. Overall, our work demonstrates the statistical power and benefits of studying selective shifts on codon usage or other genomic features from an explicitly evolutionary approach. Limitations of this approach and future potential research avenues are discussed.

Subject(s)

Codon Usage , Saccharomyces cerevisiae , Amino Acids/genetics , Bayes Theorem , Codon/genetics , Escherichia coli/genetics , Genetics, Population , Saccharomyces cerevisiae/genetics , Selection, Genetic

4.

A Spatially Explicit Model of Stabilizing Selection for Improving Phylogenetic Inference.

Beaulieu, Jeremy M; O'Meara, Brian C; Gilchrist, Michael A.

Mol Biol Evol ; 38(4): 1641-1652, 2021 04 13.

Article in English | MEDLINE | ID: mdl-33306127

ABSTRACT

Ultraconserved elements (UCEs) are stretches of hundreds of nucleotides with highly conserved cores flanked by variable regions. Although the selective forces responsible for the preservation of UCEs are unknown, they are nonetheless believed to contain phylogenetically meaningful information from deep to shallow divergence events. Phylogenetic applications of UCEs assume the same degree of rate heterogeneity applies across the entire locus, including variable flanking regions. We present a Wright-Fisher model of selection on nucleotides (SelON) which includes the effects of mutation, drift, and spatially varying, stabilizing selection for an optimal nucleotide sequence. The SelON model assumes the strength of stabilizing selection follows a position-dependent Gaussian function whose exact shape can vary between UCEs. We evaluate SelON by comparing its performance to a simpler and spatially invariant GTR+Γ model using an empirical data set of 400 vertebrate UCEs used to determine the phylogenetic position of turtles. We observe much improvement in model fit of SelON over the GTR+Γ model, and support for turtles as sister to lepidosaurs. Overall, the UCE-specific parameters SelON estimates provide a compact way of quantifying the strength and variation in selection within and across UCEs. SelON can also be extended to include more realistic mapping functions between sequence and stabilizing selection as well as allow for greater levels of rate heterogeneity. By more explicitly modeling the nature of selection on UCEs, SelON and similar approaches can be used to better understand the biological mechanisms responsible for their preservation across highly divergent taxa and long evolutionary time scales.

Subject(s)

Models, Genetic , Selection, Genetic , Base Sequence , Conserved Sequence , Phylogeny

5.

Unlocking a signal of introgression from codons in Lachancea kluyveri using a mutation-selection model.

Landerer, Cedric; O'Meara, Brian C; Zaretzki, Russell; Gilchrist, Michael A.

BMC Evol Biol ; 20(1): 109, 2020 08 26.

Article in English | MEDLINE | ID: mdl-32842959

ABSTRACT

BACKGROUND: For decades, codon usage has been used as a measure of adaptation for translational efficiency and translation accuracy of a gene's coding sequence. These patterns of codon usage reflect both the selective and mutational environment in which the coding sequences evolved. Over this same period, gene transfer between lineages has become widely recognized as an important biological phenomenon. Nevertheless, most studies of codon usage implicitly assume that all genes within a genome evolved under the same selective and mutational environment, an assumption violated when introgression occurs. In order to better understand the effects of introgression on codon usage patterns and vice versa, we examine the patterns of codon usage in Lachancea kluyveri, a yeast which has experienced a large introgression. We quantify the effects of mutation bias and selection for translation efficiency on the codon usage pattern of the endogenous and introgressed exogenous genes using a Bayesian mixture model, ROC SEMPPR, which is built on mechanistic assumptions about protein synthesis and grounded in population genetics. RESULTS: We find substantial differences in codon usage between the endogenous and exogenous genes, and show that these differences can be largely attributed to differences in mutation bias favoring A/T ending codons in the endogenous genes while favoring C/G ending codons in the exogenous genes. Recognizing the two different signatures of mutation bias and selection improves our ability to predict protein synthesis rate by 42% and allowed us to accurately assess the decaying signal of endogenous codon mutation and preferences. 
In addition, using our estimates of mutation bias and selection, we identify Eremothecium gossypii as the closest relative to the exogenous genes, providing an alternative hypothesis about the origin of the exogenous genes, estimate that the introgression occurred ~6×10^8 generations ago, and estimate its historic and current selection against mismatched codon usage. CONCLUSIONS: Our work illustrates how mechanistic, population genetic models like ROC SEMPPR can separate the effects of mutation and selection on codon usage and provide quantitative estimates from sequence data.
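The contrast drawn above between A/T-ending and C/G-ending codons can be summarized per gene by the share of codons whose third position is G or C. The following is only a minimal sketch of that summary statistic, not the ROC SEMPPR analysis used in the study; the function name `gc3_fraction` and the toy sequences are assumptions made for illustration.

```python
def gc3_fraction(cds: str) -> float:
    """Fraction of complete codons in a coding sequence that end in G or C.

    Illustrative helper (hypothetical name); trailing bases that do not
    form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # drop any incomplete last codon
    third_bases = [cds[i + 2] for i in range(0, usable, 3)]
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


if __name__ == "__main__":
    # Toy sequences, not real genes: one A/T-ending, one C/G-ending.
    print(gc3_fraction("AAAGAATTT"))
    print(gc3_fraction("AAGGACTTC"))
```

A raw statistic like this cannot by itself separate mutation bias from selection, which is the reason the abstract gives for using a mechanistic model instead.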

Subject(s)

Codon Usage , Genetics, Population , Models, Genetic , Saccharomycetales/genetics , Selection, Genetic , Bayes Theorem , Mutation

6.

Gene expression of functionally-related genes coevolves across fungal species: detecting coevolution of gene expression using phylogenetic comparative methods.

Cope, Alexander L; O'Meara, Brian C; Gilchrist, Michael A.

BMC Genomics ; 21(1): 370, 2020 May 20.

Article in English | MEDLINE | ID: mdl-32434474

ABSTRACT

BACKGROUND: Researchers often measure changes in gene expression across conditions to better understand the shared functional roles and regulatory mechanisms of different genes. Analogous to this is comparing gene expression across species, which can improve our understanding of the evolutionary processes shaping the evolution of both individual genes and functional pathways. One area of interest is determining genes showing signals of coevolution, which can also indicate potential functional similarity, analogous to co-expression analysis often performed across conditions for a single species. However, as with any trait, comparing gene expression across species can be confounded by the non-independence of species due to shared ancestry, making standard hypothesis testing inappropriate. RESULTS: We compared RNA-Seq data across 18 fungal species using a multivariate Brownian Motion phylogenetic comparative method (PCM), which allowed us to quantify coevolution between protein pairs while directly accounting for the shared ancestry of the species. Our work indicates that proteins which physically interact show stronger signals of coevolution than randomly generated pairs. Interactions with stronger empirical and computational evidence also show stronger signals of coevolution. We examined the effects of number of protein interactions and gene expression levels on coevolution, finding both factors are overall poor predictors of the strength of coevolution between a protein pair. Simulations further demonstrate the potential issues of analyzing gene expression coevolution without accounting for shared ancestry in a standard hypothesis testing framework.
Furthermore, our simulations indicate the use of a randomly-generated null distribution as a means of determining statistical significance for detecting coevolving genes with phylogenetically-uncorrected correlations, as has previously been done, is less accurate than PCMs, although it is a significant improvement over standard hypothesis testing. These methods are further improved by using a phylogenetically-corrected correlation metric. CONCLUSIONS: Our work highlights potential benefits of using PCMs to detect gene expression coevolution from high-throughput omics scale data. This framework can be built upon to investigate other evolutionary hypotheses, such as changes in transcription regulatory mechanisms across species.

Subject(s)

Evolution, Molecular , Fungal Proteins/genetics , Fungi/genetics , Gene Expression , Fungal Proteins/metabolism , Fungi/classification , Fungi/metabolism , Models, Genetic , Phenotype , Phylogeny , Protein Binding

7.

Population Genetics Based Phylogenetics Under Stabilizing Selection for an Optimal Amino Acid Sequence: A Nested Modeling Approach.

Beaulieu, Jeremy M; O'Meara, Brian C; Zaretzki, Russell; Landerer, Cedric; Chai, Juanjuan; Gilchrist, Michael A.

Mol Biol Evol ; 36(4): 834-851, 2019 04 01.

Article in English | MEDLINE | ID: mdl-30521036

ABSTRACT

We present a new phylogenetic approach, selection on amino acids and codons (SelAC), whose substitution rates are based on a nested model linking protein expression to population genetics. Unlike simpler codon models that assume a single substitution matrix for all sites, our model more realistically represents the evolution of protein-coding DNA under the assumption of consistent, stabilizing selection using a cost-benefit approach. This cost-benefit approach allows us to generate a set of 20 optimal amino acid-specific matrix families using just a handful of parameters and naturally links the strength of stabilizing selection to protein synthesis levels, which we can estimate. Using a yeast data set of 100 orthologs for 6 taxa, we find SelAC fits the data much better than popular models by 10^4-10^5 Akaike information criterion units adjusted for small sample bias. Our results also indicated that nested, mechanistic models better predict observed data patterns highlighting the improvement in biological realism in amino acid sequence evolution that our model provides. Additional parameters estimated by SelAC indicate that a large amount of nonphylogenetic, but biologically meaningful, information can be inferred from existing data. For example, SelAC prediction of gene-specific protein synthesis rates correlates well with both empirical (r=0.33-0.48) and other theoretical predictions (r=0.45-0.64) for multiple yeast species. SelAC also provides estimates of the optimal amino acid at each site. Finally, because SelAC is a nested approach based on clearly stated biological assumptions, future modifications, such as including shifts in the optimal amino acid sequence within or across lineages, are possible.

Subject(s)

Amino Acid Substitution , Genetic Techniques , Models, Genetic , Phylogeny , Selection, Genetic , Genetics, Population/methods

8.

Quantifying codon usage in signal peptides: Gene expression and amino acid usage explain apparent selection for inefficient codons.

Cope, Alexander L; Hettich, Robert L; Gilchrist, Michael A.

Biochim Biophys Acta Biomembr ; 1860(12): 2479-2485, 2018 12.

Article in English | MEDLINE | ID: mdl-30279149

ABSTRACT

The Sec secretion pathway is found across all domains of life. A critical feature of Sec secreted proteins is the signal peptide, a short peptide with distinct physicochemical properties located at the N-terminus of the protein. Previous work indicates signal peptides are biased towards translationally inefficient codons, which is hypothesized to be an adaptation driven by selection to improve the efficacy and efficiency of the protein secretion mechanisms. We investigate codon usage in the signal peptides of E. coli using the Codon Adaptation Index (CAI), the tRNA Adaptation Index (tAI), and the ribosomal overhead cost formulation of the stochastic evolutionary model of protein production rates (ROC-SEMPPR). Comparisons between signal peptides and 5'-end of cytoplasmic proteins using CAI and tAI are consistent with a preference for inefficient codons in signal peptides. Simulations reveal these differences are due to amino acid usage and gene expression - we find these differences disappear when accounting for both factors. In contrast, ROC-SEMPPR, a mechanistic population genetics model capable of separating the effects of selection and mutation bias, shows codon usage bias (CUB) of the signal peptides is indistinguishable from the 5'-ends of cytoplasmic proteins. Additionally, we find CUB at the 5'-ends is weaker than later segments of the gene. Results illustrate the value in using models grounded in population genetics to interpret genetic data. We show failure to account for mutation bias and the effects of gene expression on the efficacy of selection against translation inefficiency can lead to a misinterpretation of codon usage patterns.
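For readers unfamiliar with the Codon Adaptation Index mentioned above, the sketch below shows its usual definition: each codon gets a relative adaptiveness weight (its count in a reference gene set divided by the count of the most used synonymous codon), and a gene's CAI is the geometric mean of those weights. The two-amino-acid synonym table and the reference counts are hypothetical toy values, not data from the study, and the helper names are assumptions.

```python
import math

# Hypothetical toy synonym table (a real analysis would use the full genetic code).
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
}


def relative_adaptiveness(reference_counts: dict) -> dict:
    """Weight of each codon: its count divided by the top count in its synonym family."""
    weights = {}
    for codons in SYNONYMS.values():
        top = max(reference_counts.get(codon, 0) for codon in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon in codons:
            weights[codon] = reference_counts.get(codon, 0) / top
    return weights


def cai(sequence: str, weights: dict) -> float:
    """Geometric mean of codon weights over the codons of `sequence`.

    Codons without a positive weight are skipped in this sketch.
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Made-up counts from an imagined set of highly expressed genes.
    reference = {"AAA": 80, "AAG": 20, "GAA": 60, "GAG": 30}
    w = relative_adaptiveness(reference)
    print(cai("AAAGAA", w), cai("AAGGAG", w))
```

Because the weights depend only on counts in the reference set, an index of this kind reflects amino acid composition and expression level as well as selection, which is the confounding the abstract describes.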

Subject(s)

Amino Acids/metabolism , Codon , Escherichia coli/genetics , Gene Expression , Protein Sorting Signals/genetics , Genes, Bacterial , Mutation , Protein Biosynthesis , RNA, Transfer/genetics

9.

AnaCoDa: analyzing codon data with Bayesian mixture models.

Landerer, Cedric; Cope, Alexander; Zaretzki, Russell; Gilchrist, Michael A.

Bioinformatics ; 34(14): 2496-2498, 2018 07 15.

Article in English | MEDLINE | ID: mdl-29522124

ABSTRACT

Summary: AnaCoDa is an R package for estimating biologically relevant parameters of mixture models, such as selection against translation inefficiency, nonsense errors and ribosome pausing time, from genomic and high-throughput datasets. AnaCoDa provides an adaptive Bayesian MCMC algorithm, fully implemented in C++ for high performance with an ergonomic R interface to improve usability. AnaCoDa employs a generic object-oriented design to allow users to extend the framework and implement their own models. Current models implemented in AnaCoDa can accurately estimate biologically relevant parameters given either protein coding sequences or ribosome footprinting data. Optionally, AnaCoDa can utilize additional data sources, such as gene expression measurements, to aid model fitting and parameter estimation. By utilizing a hierarchical object structure, some parameters can vary between sets of genes while others can be shared. Genes may be assigned to clusters or membership may be estimated by AnaCoDa. This flexibility allows users to estimate the same model parameter under different biological conditions and categorize genes into different sets based on shared model properties embedded within the data. AnaCoDa also allows users to generate simulated data which can be used to aid model development and model analysis as well as evaluate model adequacy. Finally, AnaCoDa contains a set of visualization routines and the ability to revisit or re-initiate previous model fitting, providing researchers with a well-rounded, easy-to-use framework to analyze genome-scale data. Availability and implementation: AnaCoDa is freely available under the Mozilla Public License 2.0 on CRAN (https://cran.r-project.org/web/packages/AnaCoDa/).

Subject(s)

Codon , Genomics/methods , Models, Genetic , Sequence Analysis, DNA/methods , Software , Algorithms , Bayes Theorem

10.

A codon model of nucleotide substitution with selection on synonymous codon usage.

Kubatko, Laura; Shah, Premal; Herbei, Radu; Gilchrist, Michael A.

Mol Phylogenet Evol ; 94(Pt A): 290-7, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26358614

ABSTRACT

The quality of phylogenetic inference made from protein-coding genes depends, in part, on the realism with which the codon substitution process is modeled. Here we propose a new mechanistic model that combines the standard M0 substitution model of Yang (1997) with a simplified model from Gilchrist (2007) that includes selection on synonymous substitutions as a function of codon-specific nonsense error rates. We tested the newly proposed model by applying it to 104 protein-coding genes in brewer's yeast, and compared the fit of the new model to the standard M0 model and to the mutation-selection model of Yang and Nielsen (2008) using the AIC. Our new model provided significantly better fit in approximately 85% of the cases considered for the basic M0 model and in approximately 25% of the cases for the M0 model with estimated codon frequencies, but only in a few cases when the mutation-selection model was considered. However, our model includes a parameter that can be interpreted as a measure of the rate of protein production, and the estimates of this parameter were highly correlated with an independent measure of protein production for the yeast genes considered here. Finally, we found that in some cases the new model led to the preference of a different phylogeny for a subset of the genes considered, indicating that substitution model choice may have an impact on the estimated phylogeny.

Subject(s)

Codon/genetics , Genetic Code , Models, Genetic , Selection, Genetic , Genes, Fungal , Nucleotides/genetics , Phylogeny , Point Mutation , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae Proteins

11.

Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone.

Gilchrist, Michael A; Chen, Wei-Chen; Shah, Premal; Landerer, Cedric L; Zaretzki, Russell.

Genome Biol Evol ; 7(6): 1559-79, 2015 May 14.

Article in English | MEDLINE | ID: mdl-25977456

ABSTRACT

Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome scale measurements of gene expression. Here, we demonstrate that it is possible to both extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). 
Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and mRNA and ribosome profiling footprint-based estimates of gene expression ([Formula: see text]) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid's "optimal" codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome scale patterns of codon usage, and that accessing this information does not require gene expression measurements, but instead carefully formulated, biologically interpretable models.
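To make the idea of separating mutation bias from expression-scaled selection concrete, the sketch below uses one common way of writing such a model: within an amino acid, the probability of seeing a codon is proportional to exp(-dM - dEta * phi), where dM is a mutation-bias term, dEta a selection term, and phi the gene's expression level. This functional form, the symbol names, and all numbers are assumptions for illustration; the abstract itself does not state the equations, and this is not the AnaCoDa or ROC SEMPPR implementation.

```python
import math


def codon_probabilities(delta_m: dict, delta_eta: dict, phi: float) -> dict:
    """Softmax over synonymous codons of the score -delta_m - delta_eta * phi.

    delta_m   : hypothetical per-codon mutation-bias terms
    delta_eta : hypothetical per-codon selection (inefficiency) terms
    phi       : hypothetical gene expression level (scales selection)
    """
    scores = {c: -delta_m[c] - delta_eta[c] * phi for c in delta_m}
    top = max(scores.values())                      # subtract max for numerical stability
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(expd.values())
    return {c: v / total for c, v in expd.items()}


if __name__ == "__main__":
    # Made-up values: mutation favors AAG, selection favors AAA.
    delta_m = {"AAA": 0.0, "AAG": -0.5}
    delta_eta = {"AAA": 0.0, "AAG": 0.4}
    print(codon_probabilities(delta_m, delta_eta, phi=0.0))   # low expression
    print(codon_probabilities(delta_m, delta_eta, phi=10.0))  # high expression
```

In this toy setting the mutationally favored codon dominates at low expression and the selectively favored codon dominates at high expression, which is the kind of gene-by-gene shift such models exploit to estimate expression from sequence alone.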

Subject(s)

Codon , Evolution, Molecular , Genomics/methods , Mutation , Protein Biosynthesis , Selection, Genetic , Gene Expression , Models, Genetic , Saccharomyces cerevisiae/genetics

12.

Evidence for finely-regulated asynchronous growth of Toxoplasma gondii cysts based on data-driven model selection.

Sullivan, Adam M; Zhao, Xiaopeng; Suzuki, Yasuhiro; Ochiai, Eri; Crutcher, Stephen; Gilchrist, Michael A.

PLoS Comput Biol ; 9(11): e1003283, 2013.

Article in English | MEDLINE | ID: mdl-24244117

ABSTRACT

Toxoplasma gondii establishes a chronic infection by forming cysts preferentially in the brain. This chronic infection is one of the most common parasitic infections in humans and can be reactivated to develop life-threatening toxoplasmic encephalitis in immunocompromised patients. Host-pathogen interactions during the chronic infection include growth of the cysts and their removal by both natural rupture and elimination by the immune system. Analyzing these interactions is important for understanding the pathogenesis of this common infection. We developed a differential equation framework of cyst growth and employed Akaike Information Criteria (AIC) to determine the growth and removal functions that best describe the distribution of cyst sizes measured from the brains of chronically infected mice. The AIC strongly support models in which T. gondii cysts grow at a constant rate such that the per capita growth rate of the parasite is inversely proportional to the number of parasites within a cyst, suggesting finely-regulated asynchronous replication of the parasites. Our analyses were also able to reject the models where cyst removal rate increases linearly or quadratically in association with increase in cyst size. The modeling and analysis framework may provide a useful tool for understanding the pathogenesis of infections with other cyst producing parasites.
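The model selection step described above rests on the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L the maximized likelihood; lower values indicate better support. The sketch below shows that calculation together with Akaike weights for ranking candidates. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not results from the study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values: dict) -> dict:
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_values.values())
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aic_values.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


if __name__ == "__main__":
    # Hypothetical fits of two cyst-growth models (made-up numbers).
    fits = {
        "constant_growth": {"log_likelihood": -120.0, "n_params": 3},
        "exponential_growth": {"log_likelihood": -125.0, "n_params": 3},
    }
    scores = {name: aic(f["log_likelihood"], f["n_params"]) for name, f in fits.items()}
    print(scores)
    print(akaike_weights(scores))
```

With equal parameter counts the comparison reduces to a difference in log-likelihood; the penalty term matters when candidate growth or removal functions differ in complexity.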

Subject(s)

Cysts/parasitology , Host-Pathogen Interactions/physiology , Models, Biological , Models, Statistical , Toxoplasma/growth & development , Animals , Brain/parasitology , Computational Biology , Female , Mice , Toxoplasma/pathogenicity

13.

Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift.

Shah, Premal; Gilchrist, Michael A.

Proc Natl Acad Sci U S A ; 108(25): 10231-6, 2011 Jun 21.

Article in English | MEDLINE | ID: mdl-21646514

ABSTRACT

The genetic code is redundant with most amino acids using multiple codons. In many organisms, codon usage is biased toward particular codons. Understanding the adaptive and nonadaptive forces driving the evolution of codon usage bias (CUB) has been an area of intense focus and debate in the fields of molecular and evolutionary biology. However, their relative importance in shaping genomic patterns of CUB remains unresolved. Using a nested model of protein translation and population genetics, we show that observed gene level variation of CUB in Saccharomyces cerevisiae can be explained almost entirely by selection for efficient ribosomal usage, genetic drift, and biased mutation. The correlation between observed codon counts within individual genes and our model predictions is 0.96. Although a variety of factors shape patterns of CUB at the level of individual sites within genes, our results suggest that selection for efficient ribosome usage is a central force in shaping codon usage at the genomic scale. In addition, our model allows direct estimation of codon-specific mutation rates and elongation times and can be readily applied to any organism with high-throughput expression datasets. More generally, we have developed a natural framework for integrating models of molecular processes with population genetics models to quantitatively estimate parameters underlying fundamental biological processes, such as protein translation.

Subject(s)

Codon , Genetic Code , Genetic Drift , Mutation , Protein Biosynthesis , Biological Evolution , Genome, Fungal , Models, Genetic , Saccharomyces cerevisiae/genetics

14.

Memory T cells are enriched in lymph nodes of selectin-ligand-deficient mice.

Harp, John R; Gilchrist, Michael A; Onami, Thandi M.

J Immunol ; 185(10): 5751-61, 2010 Nov 15.

Article in English | MEDLINE | ID: mdl-20937846

ABSTRACT

Fucosyltransferase-IV and -VII double knockout (FtDKO) mice reveal profound impairment in T cell trafficking to lymph nodes (LNs) due to an inability to synthesize selectin ligands. We observed an increase in the proportion of memory/effector (CD44(high)) T cells in LNs of FtDKO mice. We infected FtDKO mice with lymphocytic choriomeningitis virus to generate and track Ag-specific CD44(high)CD8 T cells in secondary lymphoid organs. Although frequencies were similar, total Ag-specific effector CD44(high)CD8 T cells were significantly reduced in LNs, but not blood, of FtDKO mice at day 8. In contrast, frequencies of Ag-specific memory CD44(high)CD8 T cells were up to 8-fold higher in LNs of FtDKO mice at day 60. Because wild-type mice treated with anti-CD62L also showed increased frequencies of CD44(high) T cells in LNs, we hypothesized that memory T cells were preferentially retained in, or preferentially migrated to, FtDKO LNs. We analyzed T cell entry and egress in LNs using adoptive transfer of bona fide naive or memory T cells. Memory T cells were not retained longer in LNs compared with naive T cells; however, T cell exit slowed significantly as T cell numbers declined. Memory T cells were profoundly impaired in entering LNs of FtDKO mice; however, memory T cells exhibited greater homeostatic proliferation in FtDKO mice. These results suggest that memory T cells are enriched in LNs with T cell deficits by several mechanisms, including longer T cell retention and increased homeostatic proliferation.

Subject(s)

CD8-Positive T-Lymphocytes/cytology , Chemotaxis, Leukocyte/immunology , Immunologic Memory , Lymph Nodes/cytology , Selectins/immunology , T-Lymphocyte Subsets/cytology , Animals , CD8-Positive T-Lymphocytes/immunology , Cell Proliferation , Cell Separation , Flow Cytometry , Fucosyltransferases/deficiency , Hyaluronan Receptors/immunology , Ligands , Lymph Nodes/immunology , Mice , Mice, Inbred C57BL , Mice, Knockout , T-Lymphocyte Subsets/immunology

15.

Effect of correlated tRNA abundances on translation errors and evolution of codon usage bias.

Shah, Premal; Gilchrist, Michael A.

PLoS Genet ; 6(9): e1001128, 2010 Sep 16.

Article in English | MEDLINE | ID: mdl-20862306

ABSTRACT

Despite the fact that tRNA abundances are thought to play a major role in determining translation error rates, their distribution across the genetic code and the resulting implications have received little attention. In general, studies of codon usage bias (CUB) assume that codons with higher tRNA abundance have lower missense error rates. Using a model of protein translation based on tRNA competition and intra-ribosomal kinetics, we show that this assumption can be violated when tRNA abundances are positively correlated across the genetic code. Examining the distribution of tRNA abundances across 73 bacterial genomes from 20 different genera, we find a consistent positive correlation between tRNA abundances across the genetic code. This work challenges one of the fundamental assumptions made in over 30 years of research on CUB that codons with higher tRNA abundances have lower missense error rates and that missense errors are the primary selective force responsible for CUB.

Subject(s)

Codon/genetics , Evolution, Molecular , Protein Biosynthesis/genetics , RNA, Transfer/genetics , Bias , Escherichia coli/genetics , Genetic Variation , Genome, Bacterial/genetics , Models, Genetic , Prokaryotic Cells/metabolism , Species Specificity

16.

Is thermosensing property of RNA thermometers unique?

Shah, Premal; Gilchrist, Michael A.

PLoS One ; 5(7): e11308, 2010 Jul 02.

Article in English | MEDLINE | ID: mdl-20625392

ABSTRACT

A large number of studies have been dedicated to identifying the structural and sequence-based features of RNA thermometers, mRNAs that regulate their translation initiation rate with temperature. It has been shown that the melting of the ribosome-binding site (RBS) plays a prominent role in this thermosensing process. However, little is known as to how widespread this melting phenomenon is, as earlier studies on the subject have worked with a small sample of known RNA thermometers. We have developed a novel method of studying the melting of RNAs with temperature by computationally sampling the distribution of the RNA structures at various temperatures using the RNA folding software Vienna. In this study, we compared the thermosensing property of 100 randomly selected mRNAs and three well-known thermometers: rpoH, ibpA and agsA sequences from E. coli. We also compared the rpoH sequences from 81 mesophilic proteobacteria. Although both rpoH and ibpA show a higher rate of melting at their RBS compared with the mean of non-thermometers, contrary to our expectations these higher rates are not significant. Surprisingly, we also do not find any significant differences between rpoH thermometers from other gamma-proteobacteria and E. coli non-thermometers.

Subject(s)

RNA, Bacterial/chemistry , RNA, Bacterial/metabolism , Temperature , Thermometers , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Gammaproteobacteria/genetics , Gammaproteobacteria/metabolism , Heat-Shock Proteins/chemistry , Heat-Shock Proteins/metabolism , Nucleic Acid Conformation , Sigma Factor/chemistry , Sigma Factor/metabolism , Software

17.

Bias correction and Bayesian analysis of aggregate counts in SAGE libraries.

Zaretzki, Russell L; Gilchrist, Michael A; Briggs, William M; Armagan, Artin.

BMC Bioinformatics ; 11: 72, 2010 Feb 03.

Article in English | MEDLINE | ID: mdl-20128916

ABSTRACT

BACKGROUND: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow for multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power. RESULTS: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression. A description is given as to how these algorithms can be applied in that context. CONCLUSIONS: Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm, providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and negative findings due to variation in tag formation probabilities across samples when testing for differential expression.

Subject(s)

Bayes Theorem , Bias , Gene Expression Profiling , RNA, Messenger/genetics

18.

The h subunit of eIF3 promotes reinitiation competence during translation of mRNAs harboring upstream open reading frames.

Roy, Bijoyita; Vaughn, Justin N; Kim, Byung-Hoon; Zhou, Fujun; Gilchrist, Michael A; Von Arnim, Albrecht G.

RNA ; 16(4): 748-61, 2010 Apr.

Article in English | MEDLINE | ID: mdl-20179149

ABSTRACT

Upstream open reading frames (uORFs) are protein coding elements in the 5' leader of messenger RNAs. uORFs generally inhibit translation of the main ORF because ribosomes that perform translation elongation suffer either permanent or conditional loss of reinitiation competence. After conditional loss, reinitiation competence may be regained by, at the minimum, reacquisition of a fresh methionyl-tRNA. The conserved h subunit of Arabidopsis eukaryotic initiation factor 3 (eIF3) mitigates the inhibitory effects of certain uORFs. Here, we define more precisely how this occurs, by combining gene expression data from mutated 5' leaders of Arabidopsis AtbZip11 (At4g34590) and yeast GCN4 with a computational model of translation initiation in wild-type and eif3h mutant plants. Of the four phylogenetically conserved uORFs in AtbZip11, three are inhibitory to translation, while one is anti-inhibitory. The mutation in eIF3h has no major effect on uORF start codon recognition. Instead, eIF3h supports efficient reinitiation after uORF translation. Modeling suggested that the permanent loss of reinitiation competence during uORF translation occurs at a faster rate in the mutant than in the wild type. Thus, eIF3h ensures that a fraction of uORF-translating ribosomes retain their competence to resume scanning. Experiments using the yeast GCN4 leader provided no evidence that eIF3h fosters tRNA reacquisition. Together, these results attribute a specific molecular function in translation initiation to an individual eIF3 subunit in a multicellular eukaryote.

Subject(s)

5' Untranslated Regions , Eukaryotic Initiation Factor-3/metabolism , Open Reading Frames , Peptide Chain Initiation, Translational , Protein Subunits/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Codon, Initiator , Eukaryotic Initiation Factor-3/genetics , Mutation , Protein Biosynthesis , Protein Subunits/genetics , RNA, Messenger/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism

19.

Measuring and detecting molecular adaptation in codon usage against nonsense errors during protein translation.

Gilchrist, Michael A; Shah, Premal; Zaretzki, Russell.

Genetics ; 183(4): 1493-505, 2009 Dec.

Article in English | MEDLINE | ID: mdl-19822731

ABSTRACT

Codon usage bias (CUB) has been documented across a wide range of taxa and is the subject of numerous studies. While most explanations of CUB invoke some type of natural selection, most measures of CUB adaptation are heuristically defined. In contrast, we present a novel and mechanistic method for defining and contextualizing CUB adaptation to reduce the cost of nonsense errors during protein translation. Using a model of protein translation, we develop a general approach for measuring the protein production cost in the face of nonsense errors of a given allele as well as the mean and variance of these costs across its coding synonyms. We then use these results to define the nonsense error adaptation index (NAI) of the allele or a contiguous subset thereof. Conceptually, the NAI value of an allele is a relative measure of its elevation on a specific and well-defined adaptive landscape. To illustrate its utility, we calculate NAI values for the entire coding sequence and across a set of nonoverlapping windows for each gene in the Saccharomyces cerevisiae S288c genome. Our results provide clear evidence of adaptation to reduce the cost of nonsense errors and increasing adaptation with codon position and expression. The magnitude and nature of this adaptation are also largely consistent with simulation results in which nonsense errors are the only selective force driving CUB evolution. Because NAI is derived from mechanistic models, it is both easier to interpret and more amenable to future refinement than other commonly used measures of codon bias. Further, our approach can also be used as a starting point for developing other mechanistically derived measures of adaptation such as for translational accuracy.

Subject(s)

Adaptation, Physiological , Codon, Nonsense , Codon/genetics , Codon/metabolism , Protein Biosynthesis/genetics , Alleles , Genome, Fungal/genetics , Models, Genetic , Saccharomyces cerevisiae/genetics

20.

Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework.

Gilchrist, Michael A; Qin, Hong; Zaretzki, Russell.

BMC Bioinformatics ; 8: 403, 2007 Oct 18.

Article in English | MEDLINE | ID: mdl-17945026

ABSTRACT

BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome. RESULTS: Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA. CONCLUSION: With a mechanistic, model-based approach for SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE-based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.
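
How cleavage efficiency and the number of AE sites could influence tag formation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the model from the paper: it assumes each AE site is cut independently with the same probability `p` and that the tag is taken from the 3'-most site that was cut. The function names and the `informative` flags are hypothetical.

```python
def tag_site_probabilities(n_sites, p):
    """Probability that the tag comes from site i, for i = 1..n_sites.

    Site 1 is the 3'-most AE site. Under the assumed model, site i yields
    the tag only if it is cut and the i - 1 sites 3' of it are not.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return [p * (1.0 - p) ** (i - 1) for i in range(1, n_sites + 1)]


def prob_no_tag(n_sites, p):
    """Probability that no site is cut, so the transcript yields no tag."""
    return (1.0 - p) ** n_sites


def prob_informative_tag(informative, p):
    """Probability of a tag from an informative site.

    `informative` lists one boolean per AE site, ordered from the 3' end.
    """
    probs = tag_site_probabilities(len(informative), p)
    return sum(q for q, ok in zip(probs, informative) if ok)


if __name__ == "__main__":
    # Three sites, the middle one uninformative, 80% cleavage efficiency.
    print(tag_site_probabilities(3, 0.8))
    print(prob_informative_tag([True, False, True], 0.8))
```

Under these assumptions, two genes with equal mRNA frequency contribute differently to the tag pool whenever their site counts or the positions of their uninformative sites differ, which is the kind of bias the abstract describes.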

Subject(s)

Expressed Sequence Tags , Gene Expression Profiling/methods , Models, Genetic , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Databases, Genetic , Reproducibility of Results , Sensitivity and Specificity
