Search | VHL Regional Portal

Generating information-dense promoter sequences with optimal string packing.

Andreani, Virgile; South, Eric J; Dunlop, Mary J.

bioRxiv ; 2024 Feb 02.

Article in English | MEDLINE | ID: mdl-37961203

ABSTRACT

Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, coined here as the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then leverage modern integer linear programming solvers. Our method optimally packs libraries of 20-100 binding sites into dense nucleotide arrays of 50-300 base pairs in 0.05-10 seconds. Unlike approximation algorithms or meta-heuristics, our approach finds provably optimal solutions. We demonstrate how our method can generate large sets of diverse sequences suitable for library generation, where the frequency of binding site usage across the returned sequences can be controlled by modulating the objective function. As an example, we then show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The nucleotide string packing approach we present can accelerate the design of sequences with complex DNA-protein interactions. When used in combination with synthesis and high-throughput screening, this design strategy could help interrogate how complex binding site arrangements impact either gene expression or biomolecular mechanisms in varied cellular contexts. 
Author Summary: The way protein binding sites are arranged on DNA can control the regulation and transcription of downstream genes. Areas with a high concentration of binding sites can enable complex interplay between transcription factors, a feature that is exploited by natural promoters. However, designing synthetic promoters that contain dense arrangements of binding sites is a challenge. The task involves overlapping many binding sites, each typically about 10 nucleotides long, within a constrained sequence area, which becomes increasingly difficult as sequence length decreases and binding site variety increases. We frame the design of nucleotide sequences with optimally packed protein binding sites as the nucleotide String Packing Problem (SPP). We show that the SPP can be solved efficiently using integer linear programming to identify the densest arrangements of binding sites for a specified sequence length. We also show how adding constraints, such as the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The presented approach enables the rapid design and study of nucleotide sequences with complex, dense binding site architectures.
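
To make the packing idea concrete, the following is a minimal sketch of the problem on a toy instance. It is not the authors' method: the abstract describes a reduction to an Orienteering Problem solved with integer linear programming, whereas this sketch simply enumerates every ordering of the sites, which is only feasible for a handful of them. The function names (`overlap`, `merge`, `pack`) and the four example sites are hypothetical, and the sketch considers a single strand only, ignoring reverse complements.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(order) -> str:
    """Join sites in the given order, sharing the maximal overlap between neighbours."""
    seq = order[0]
    for prev, site in zip(order, order[1:]):
        seq += site[overlap(prev, site):]
    return seq


def pack(sites, max_len):
    """Brute-force stand-in for an exact solver: try every ordered subset of sites.

    Returns the sequence of at most `max_len` nucleotides that contains the
    most sites, preferring the shorter sequence when counts are equal.
    """
    best_seq, best_count = "", 0
    for r in range(1, len(sites) + 1):
        for order in permutations(sites, r):
            seq = merge(order)
            if len(seq) > max_len:
                continue  # exceeds the length budget
            count = sum(site in seq for site in sites)
            if count > best_count or (count == best_count and len(seq) < len(best_seq)):
                best_seq, best_count = seq, count
    return best_seq, best_count


# Hypothetical 6-nucleotide sites, chosen so that neighbours overlap.
sites = ["ACGTAC", "TACGGA", "GGATCC", "CCTTGA"]
print(pack(sites, 12))  # ('ACGTACGGATCC', 3)
```

In the orienteering view mentioned in the abstract, each site is a node, the distance from site `a` to site `b` would be `len(b) - overlap(a, b)`, and the sequence length acts as the travel budget. The enumeration above grows factorially with the number of sites, which is why an exact solver is needed at the library sizes the abstract reports.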

Comprehensive Screening of a Light-Inducible Split Cre Recombinase with Domain Insertion Profiling.

Tague, Nathan; Andreani, Virgile; Fan, Yunfan; Timp, Winston; Dunlop, Mary J.

ACS Synth Biol ; 12(10): 2834-2842, 2023 10 20.

Article in English | MEDLINE | ID: mdl-37788288

ABSTRACT

Splitting proteins with light- or chemically inducible dimers provides a mechanism for post-translational control of protein function. However, current methods for engineering stimulus-responsive split proteins often require significant protein engineering expertise and the laborious screening of individual constructs. To address this challenge, we use a pooled library approach that enables rapid generation and screening of nearly all possible split protein constructs in parallel, where results can be read out by using sequencing. We perform our method on Cre recombinase with optogenetic dimers as a proof of concept, resulting in comprehensive data on the split sites throughout the protein. To improve the accuracy in predicting split protein behavior, we develop a Bayesian computational approach to contextualize errors inherent to experimental procedures. Overall, our method provides a streamlined approach for achieving inducible post-translational control of a protein of interest.

Subject(s)

Genetic Engineering , Integrases , Genetic Engineering/methods , Bayes Theorem , Integrases/genetics , Integrases/metabolism , Protein Engineering , Proteins

PyMC: a modern, and comprehensive probabilistic programming framework in Python.

Abril-Pla, Oriol; Andreani, Virgile; Carroll, Colin; Dong, Larry; Fonnesbeck, Christopher J; Kochurov, Maxim; Kumar, Ravin; Lao, Junpeng; Luhmann, Christian C; Martin, Osvaldo A; Osthege, Michael; Vieira, Ricardo; Wiecki, Thomas; Zinkov, Robert.

PeerJ Comput Sci ; 9: e1516, 2023.

Article in English | MEDLINE | ID: mdl-37705656

ABSTRACT

PyMC is a probabilistic programming library for Python that provides tools for constructing and fitting Bayesian models. It offers an intuitive, readable syntax that is close to the natural syntax statisticians use to describe models. PyMC leverages the symbolic computation library PyTensor, allowing it to be compiled into a variety of computational backends, such as C, JAX, and Numba, which in turn offer access to different computational architectures including CPU, GPU, and TPU. Being a general modeling framework, PyMC supports a variety of models including generalized hierarchical linear regression and classification, time series, ordinary differential equations (ODEs), and non-parametric models such as Gaussian processes (GPs). We demonstrate PyMC's versatility and ease of use with examples spanning a range of common statistical models. Additionally, we discuss the positive role of PyMC in the development of the open-source ecosystem for probabilistic programming.

Comprehensive screening of a light-inducible split Cre recombinase with domain insertion profiling.

Tague, Nathan; Andreani, Virgile; Fan, Yunfan; Timp, Winston; Dunlop, Mary J.

bioRxiv ; 2023 May 26.

Article in English | MEDLINE | ID: mdl-37293111

ABSTRACT

Splitting proteins with light- or chemically-inducible dimers provides a mechanism for post-translational control of protein function. However, current methods for engineering stimulus-responsive split proteins often require significant protein engineering expertise and laborious screening of individual constructs. To address this challenge, we use a pooled library approach that enables rapid generation and screening of nearly all possible split protein constructs in parallel, where results can be read out using sequencing. We perform our method on Cre recombinase with optogenetic dimers as a proof of concept, resulting in comprehensive data on split sites throughout the protein. To improve accuracy in predicting split protein behavior, we develop a Bayesian computational approach to contextualize errors inherent to experimental procedures. Overall, our method provides a streamlined approach for achieving inducible post-translational control of a protein of interest.

Dynamic gene expression and growth underlie cell-to-cell heterogeneity in Escherichia coli stress response.

Sampaio, Nadia M V; Blassick, Caroline M; Andreani, Virgile; Lugagne, Jean-Baptiste; Dunlop, Mary J.

Proc Natl Acad Sci U S A ; 119(14): e2115032119, 2022 04 05.

Article in English | MEDLINE | ID: mdl-35344432

ABSTRACT

Cell-to-cell heterogeneity in gene expression and growth can have critical functional consequences, such as determining whether individual bacteria survive or die following stress. Although phenotypic variability is well documented, the dynamics that underlie it are often unknown. This information is important because dramatically different outcomes can arise from gradual versus rapid changes in expression and growth. Using single-cell time-lapse microscopy, we measured the temporal expression of a suite of stress-response reporters in Escherichia coli, while simultaneously monitoring growth rate. In conditions without stress, we found several examples of pulsatile expression. Single-cell growth rates were often anticorrelated with reporter levels, with changes in growth preceding changes in expression. These dynamics have functional consequences, which we demonstrate by measuring survival after challenging cells with the antibiotic ciprofloxacin. Our results suggest that fluctuations in both gene expression and growth dynamics in stress-response networks have direct consequences on survival.

Subject(s)

Escherichia coli , Gene Expression Regulation, Bacterial , Stress, Physiological , Escherichia coli/genetics , Escherichia coli/growth & development , Gene Expression , Phenotype , Single-Cell Analysis , Stress, Physiological/genetics

Applying ecological resistance and resilience to dissect bacterial antibiotic responses.

Meredith, Hannah R; Andreani, Virgile; Ma, Helena R; Lopatkin, Allison J; Lee, Anna J; Anderson, Deverick J; Batt, Gregory; You, Lingchong.

Sci Adv ; 4(12): eaau1873, 2018 12.

Article in English | MEDLINE | ID: mdl-30525104

ABSTRACT

An essential property of microbial communities is the ability to survive a disturbance. Survival can be achieved through resistance, the ability to absorb effects of a disturbance without a notable change, or resilience, the ability to recover after being perturbed by a disturbance. These concepts have long been applied to the analysis of ecological systems, although their interpretations are often subject to debate. Here, we show that this framework readily lends itself to the dissection of the bacterial response to antibiotic treatment, where both terms can be unambiguously defined. The ability to tolerate the antibiotic treatment in the short term corresponds to resistance, which primarily depends on traits associated with individual cells. In contrast, the ability to recover after being perturbed by an antibiotic corresponds to resilience, which primarily depends on traits associated with the population. This framework effectively reveals the phenotypic signatures of bacterial pathogens expressing extended-spectrum β-lactamases (ESBLs) when treated by a β-lactam antibiotic. Our analysis has implications for optimizing treatment of these pathogens using a combination of a β-lactam and a β-lactamase (Bla) inhibitor. In particular, our results underscore the need to dynamically optimize combination treatments based on the quantitative features of the bacterial response to the antibiotic or the Bla inhibitor.

Subject(s)

Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacterial Physiological Phenomena , Drug Resistance, Bacterial , Bacteria/genetics , Humans , Microbial Viability/drug effects , Models, Biological , Phenotype , beta-Lactamases/genetics , beta-Lactamases/metabolism
