Results 1 - 20 of 87
1.
PLoS Comput Biol ; 20(7): e1012224, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38995959

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has become a popular experimental method to study variation of gene expression within a population of cells. However, obtaining an accurate picture of the diversity of distinct gene expression states that are present in a given dataset is highly challenging because of the sparsity of the scRNA-seq data and its inhomogeneous measurement noise properties. Although a vast number of different methods are applied in the literature for clustering cells into subsets with 'similar' expression profiles, these methods generally lack rigorously specified objectives, involve multiple complex layers of normalization, filtering, feature selection, and dimensionality reduction, employ ad hoc measures of distance or similarity between cells, often ignore the known measurement noise properties of scRNA-seq measurements, and include a large number of tunable parameters. Consequently, it is virtually impossible to assign concrete biophysical meaning to the clusterings that result from these methods. Here we address the following problem: Given raw unique molecular identifier (UMI) counts of an scRNA-seq dataset, partition the cells into subsets such that the gene expression states of the cells in each subset are statistically indistinguishable, and each subset corresponds to a distinct gene expression state. That is, we aim to partition cells so as to maximally reduce the complexity of the dataset without removing any of its meaningful structure. We show that, given the known measurement noise structure of scRNA-seq data, this problem is mathematically well-defined, and we derive its unique solution from first principles. We have implemented this solution in a tool called Cellstates, which operates directly on the raw data and automatically determines the optimal partition and cluster number, with zero tunable parameters. We show that, on synthetic datasets, Cellstates almost perfectly recovers optimal partitions.
On real data, Cellstates robustly identifies subtle substructure within groups of cells that are traditionally annotated as a common cell type. Moreover, we show that the diversity of gene expression states that Cellstates identifies systematically depends on the tissue of origin and not on technical features of the experiments such as the total number of cells and total UMI count per cell. In addition to the Cellstates tool we also provide a small toolbox of software to place the identified cellstates into a hierarchical tree of higher-order clusters, to identify the most important differentially expressed genes at each branch of this hierarchy, and to visualize these results.

2.
Nat Commun ; 15(1): 4110, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750024

ABSTRACT

Maturation of eukaryotic pre-mRNAs via splicing and polyadenylation is modulated across cell types and conditions by a variety of RNA-binding proteins (RBPs). Although there exist over 1,500 RBPs in human cells, their binding motifs and functions still remain to be elucidated, especially in the complex environment of tissues and in the context of diseases. To overcome the lack of methods for the systematic and automated detection of sequence motif-guided pre-mRNA processing regulation from RNA sequencing (RNA-Seq) data, we have developed MAPP (Motif Activity on Pre-mRNA Processing). Applying MAPP to RBP knock-down experiments reveals that many RBPs regulate both splicing and polyadenylation of nascent transcripts by acting on similar sequence motifs. MAPP not only infers these sequence motifs, but also unravels the position-dependent impact of the RBPs on pre-mRNA processing. Interestingly, all investigated RBPs that act on both splicing and 3' end processing exhibit a consistently repressive or activating effect on both processes, providing a first glimpse into the underlying mechanism. Applying MAPP to normal and malignant brain tissue samples unveils that the motifs bound by the PTBP1 and RBFOX RBPs coordinately drive the oncogenic splicing program active in glioblastomas, demonstrating that MAPP paves the way for characterizing pre-mRNA processing regulators under physiological and pathological conditions.


Subject(s)
Polyadenylation , RNA Precursors , RNA Splicing , RNA-Binding Proteins , Humans , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , RNA Precursors/metabolism , RNA Precursors/genetics , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Neoplasms/metabolism , Nucleotide Motifs , Polypyrimidine Tract-Binding Protein/metabolism , Polypyrimidine Tract-Binding Protein/genetics , RNA Splicing Factors/metabolism , RNA Splicing Factors/genetics , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , Heterogeneous-Nuclear Ribonucleoproteins/genetics , RNA, Messenger/metabolism , RNA, Messenger/genetics
4.
Genome Biol ; 24(1): 77, 2023 04 17.
Article in English | MEDLINE | ID: mdl-37069586

ABSTRACT

We present RCRUNCH, an end-to-end solution to CLIP data analysis for identification of binding sites and sequence specificity of RNA-binding proteins. RCRUNCH can analyze not only reads that map uniquely to the genome but also those that map to multiple genome locations or across splice boundaries and can consider various types of background in the estimation of read enrichment. By applying RCRUNCH to the eCLIP data from the ENCODE project, we have constructed a comprehensive and homogeneous resource of in-vivo-bound RBP sequence motifs. RCRUNCH automates the reproducible analysis of CLIP data, enabling studies of post-transcriptional control of gene expression.


Subject(s)
RNA-Binding Proteins , RNA , RNA/metabolism , Sequence Analysis, RNA , Binding Sites/genetics , Protein Binding , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
5.
Proc Natl Acad Sci U S A ; 120(8): e2211091120, 2023 02 21.
Article in English | MEDLINE | ID: mdl-36780518

ABSTRACT

Microbes in the wild face highly variable and unpredictable environments and are naturally selected for their average growth rate across environments. Apart from using sensory regulatory systems to adapt in a targeted manner to changing environments, microbes employ bet-hedging strategies where cells in an isogenic population switch stochastically between alternative phenotypes. Yet, bet-hedging suffers from a fundamental trade-off: Increasing the phenotype-switching rate increases the rate at which maladapted cells explore alternative phenotypes but also increases the rate at which cells switch out of a well-adapted state. Consequently, it is currently believed that bet-hedging strategies are effective only when the number of possible phenotypes is limited and when environments last for sufficiently many generations. However, recent experimental results show that gene expression noise generally decreases with growth rate, suggesting that phenotype-switching rates may systematically decrease with growth rate. Such growth rate dependent stability (GRDS) causes cells to be more explorative when maladapted and more phenotypically stable when well-adapted, and we show that GRDS can almost completely overcome the trade-off that limits bet-hedging, allowing for effective adaptation even when environments are diverse and change rapidly. We further show that even a small decrease in switching rates of faster-growing phenotypes can substantially increase long-term fitness of bet-hedging strategies. Together, our results suggest that stochastic strategies may play an even bigger role for microbial adaptation than hitherto appreciated.


Subject(s)
Acclimatization , Biological Evolution , Phenotype , Adaptation, Physiological/genetics
6.
EMBO J ; 41(24): e111132, 2022 12 15.
Article in English | MEDLINE | ID: mdl-36345783

ABSTRACT

The cerebral cortex contains billions of neurons, and their disorganization or misspecification leads to neurodevelopmental disorders. Understanding how the plethora of projection neuron subtypes are generated by cortical neural stem cells (NSCs) is a major challenge. Here, we focused on elucidating the transcriptional landscape of murine embryonic NSCs, basal progenitors (BPs), and newborn neurons (NBNs) throughout cortical development. We uncover dynamic shifts in transcriptional space over time and heterogeneity within each progenitor population. We identified signature hallmarks of NSC, BP, and NBN clusters and predict active transcriptional nodes and networks that contribute to neural fate specification. We find that the expression of receptors, ligands, and downstream pathway components is highly dynamic over time and throughout the lineage implying differential responsiveness to signals. Thus, we provide an expansive compendium of gene expression during cortical development that will be an invaluable resource for studying neural developmental processes and neurodevelopmental disorders.


Subject(s)
Neural Stem Cells , Neurons , Animals , Mice , Cell Differentiation , Cell Lineage/genetics , Cerebral Cortex , Embryonic Stem Cells , Neurogenesis/genetics , Neurons/metabolism
7.
Nat Genet ; 54(7): 1037-1050, 2022 07.
Article in English | MEDLINE | ID: mdl-35789323

ABSTRACT

Zebrafish, a popular organism for studying embryonic development and for modeling human diseases, has so far lacked a systematic functional annotation program akin to those in other animal models. To address this, we formed the international DANIO-CODE consortium and created a central repository to store and process zebrafish developmental functional genomic data. Our data coordination center ( https://danio-code.zfin.org ) combines a total of 1,802 sets of unpublished and re-analyzed published genomic data, which we used to improve existing annotations and show its utility in experimental design. We identified over 140,000 cis-regulatory elements throughout development, including classes with distinct features dependent on their activity in time and space. We delineated the distinct distance topology and chromatin features between regulatory elements active during zygotic genome activation and those active during organogenesis. Finally, we matched regulatory elements and epigenomic landscapes between zebrafish and mouse and predicted functional relationships between them beyond sequence similarity, thus extending the utility of zebrafish developmental genomics to mammals.


Subject(s)
Databases, Genetic , Gene Expression Regulation, Developmental , Genome , Genomics , Regulatory Sequences, Nucleic Acid , Zebrafish Proteins , Zebrafish , Animals , Chromatin/genetics , Genome/genetics , Humans , Mice , Molecular Sequence Annotation , Organogenesis/genetics , Regulatory Sequences, Nucleic Acid/genetics , Zebrafish/embryology , Zebrafish/genetics , Zebrafish Proteins/genetics
8.
PLoS Biol ; 19(12): e3001491, 2021 12.
Article in English | MEDLINE | ID: mdl-34919538

ABSTRACT

Although it is well appreciated that gene expression is inherently noisy and that transcriptional noise is encoded in a promoter's sequence, little is known about the extent to which noise levels of individual promoters vary across growth conditions. Using flow cytometry, we here quantify transcriptional noise in Escherichia coli genome-wide across 8 growth conditions and find that noise levels systematically decrease with growth rate, with a condition-dependent lower bound on noise. Whereas constitutive promoters consistently exhibit low noise in all conditions, regulated promoters are both more noisy on average and more variable in noise across conditions. Moreover, individual promoters show highly distinct variation in noise across conditions. We show that a simple model of noise propagation from regulators to their targets can explain a significant fraction of the variation in relative noise levels and identifies TFs that most contribute to both condition-specific and condition-independent noise propagation. In addition, analysis of the genome-wide correlation structure of various gene properties shows that gene regulation, expression noise, and noise plasticity are all positively correlated genome-wide and vary independently of variations in absolute expression, codon bias, and evolutionary rate. Together, our results show that while absolute expression noise tends to decrease with growth rate, relative noise levels of genes are highly condition-dependent and determined by the propagation of noise through the gene regulatory network.


Subject(s)
Escherichia coli/genetics , Gene Expression Regulation, Bacterial/genetics , Promoter Regions, Genetic/genetics , Escherichia coli Proteins/genetics , Gene Expression/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Genes, Reporter/genetics , Transcriptome/genetics
10.
Nat Biotechnol ; 39(8): 1008-1016, 2021 08.
Article in English | MEDLINE | ID: mdl-33927416

ABSTRACT

Despite substantial progress in single-cell RNA-seq (scRNA-seq) data analysis methods, there is still little agreement on how to best normalize such data. Starting from the basic requirements that inferred expression states should correct for both biological and measurement sampling noise and that changes in expression should be measured in terms of fold changes, we here derive a Bayesian normalization procedure called Sanity (SAmpling-Noise-corrected Inference of Transcription activitY) from first principles. Sanity estimates expression values and associated error bars directly from raw unique molecular identifier (UMI) counts without any tunable parameters. Using simulated and real scRNA-seq datasets, we show that Sanity outperforms other normalization methods on downstream tasks, such as finding nearest-neighbor cells and clustering cells into subtypes. Moreover, we show that by systematically overestimating the expression variability of genes with low expression and by introducing spurious correlations through mapping the data to a lower-dimensional representation, other methods yield severely distorted pictures of the data.


Subject(s)
RNA-Seq/methods , Single-Cell Analysis/methods , Transcriptome/genetics , Animals , Bayes Theorem , Cells, Cultured , Cluster Analysis , Databases, Genetic , Humans , Mice , Models, Statistical
11.
Elife ; 10, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33416498

ABSTRACT

Although recombination is accepted to be common in bacteria, for many species robust phylogenies with well-resolved branches can be reconstructed from whole genome alignments of strains, and these are generally interpreted to reflect clonal relationships. Using new methods based on the statistics of single-nucleotide polymorphism (SNP) splits, we show that this interpretation is incorrect. For many species, each locus has recombined many times along its line of descent, and instead of many loci supporting a common phylogeny, the phylogeny changes many thousands of times along the genome alignment. Analysis of the patterns of allele sharing among strains shows that bacterial populations cannot be approximated as either clonal or freely recombining but are structured such that recombination rates between lineages vary over several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect distributions of recombination rates.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Phylogeny , Recombination, Genetic , Bacillus subtilis/classification , Bacillus subtilis/genetics , Bacteria/classification , Escherichia coli/classification , Escherichia coli/genetics , Evolution, Molecular , Helicobacter pylori/classification , Helicobacter pylori/genetics , Mycobacterium tuberculosis/classification , Mycobacterium tuberculosis/genetics , Polymorphism, Single Nucleotide , Salmonella enterica/classification , Salmonella enterica/genetics , Sequence Analysis, DNA , Staphylococcus aureus/classification , Staphylococcus aureus/genetics , Whole Genome Sequencing
12.
PLoS Biol ; 18(12): e3000952, 2020 12.
Article in English | MEDLINE | ID: mdl-33270631

ABSTRACT

Populations of bacteria often undergo a lag in growth when switching conditions. Because growth lags can be large compared to typical doubling times, variations in growth lag are an important but often overlooked component of bacterial fitness in fluctuating environments. We here explore how growth lag variation is determined for the archetypical switch from glucose to lactose as a carbon source in Escherichia coli. First, we show that single-cell lags are bimodally distributed and controlled by a single-molecule trigger. That is, gene expression noise causes the population before the switch to divide into subpopulations with zero and nonzero lac operon expression. While "sensorless" cells with zero preexisting lac expression at the switch have long lags because they are unable to sense the lactose signal, any nonzero lac operon expression suffices to ensure a short lag. Second, we show that the growth lag at the population level depends crucially on the fraction of sensorless cells and that this fraction in turn depends sensitively on the growth condition before the switch. Consequently, even small changes in basal expression can significantly affect the fraction of sensorless cells, and thereby population lags and fitness under switching conditions, and may thus be subject to significant natural selection. Indeed, we show that condition-dependent population lags vary across wild E. coli isolates. Since many sensory genes are naturally expressed at low levels in conditions where their inducer is not present, bimodal responses due to subpopulations of sensorless cells may be a general mechanism inducing phenotypic heterogeneity and controlling population lags in switching environments. This mechanism also illustrates how gene expression noise can turn even a simple sensory gene circuit into a bet hedging module and underlines the profound role of gene expression noise in regulatory responses.


Subject(s)
Escherichia coli/metabolism , Gene Expression Regulation, Bacterial/genetics , Genetic Fitness/physiology , Bacteria/genetics , Bacteria/metabolism , Environment , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial/physiology , Gene Regulatory Networks/genetics , Gene-Environment Interaction , Genetic Fitness/genetics , Glucose/metabolism , Lac Operon , Lactose/metabolism , Phenotype
13.
PLoS One ; 15(10): e0240233, 2020.
Article in English | MEDLINE | ID: mdl-33045012

ABSTRACT

Fluorescence flow cytometry is increasingly being used to quantify single-cell expression distributions in bacteria in high-throughput. However, there has been no systematic investigation into the best practices for quantitative analysis of such data, what systematic biases exist, and what accuracy and sensitivity can be obtained. We investigate these issues by measuring the same E. coli strains carrying fluorescent reporters using both flow cytometry and microscopic setups and systematically comparing the resulting single-cell expression distributions. Using these results, we develop methods for rigorous quantitative inference of single-cell expression distributions from fluorescence flow cytometry data. First, we present a Bayesian mixture model to separate debris from viable cells using all scattering signals. Second, we show that cytometry measurements of fluorescence are substantially affected by autofluorescence and shot noise, which can be mistaken for intrinsic noise in gene expression, and present methods to correct for these using calibration measurements. Finally, we show that because forward- and side-scatter signals scale non-linearly with cell size, and are also affected by a substantial shot noise component that cannot be easily calibrated unless independent measurements of cell size are available, it is not possible to accurately estimate the variability in the sizes of individual cells using flow cytometry measurements alone. To aid other researchers with quantitative analysis of flow cytometry expression data in bacteria, we distribute E-Flow, an open-source R package that implements our methods for filtering debris and for estimating true biological expression means and variances from the fluorescence signal. The package is available at https://github.com/vanNimwegenLab/E-Flow.


Subject(s)
Escherichia coli/genetics , Flow Cytometry , Genes, Bacterial , Single-Cell Analysis , Transcriptome , Flow Cytometry/methods , Fluorescence , Green Fluorescent Proteins/genetics , Microscopy, Fluorescence
14.
Sci Rep ; 10(1): 4625, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32170161

ABSTRACT

Neural stem cells (NSCs) generate neurons of the cerebral cortex with distinct morphologies and functions. How specific neuron production, differentiation and migration are orchestrated is unclear. Hippo signaling regulates gene expression through Tead transcription factors (TFs). We show that Hippo transcriptional coactivators Yap1/Taz and the Teads have distinct functions during cortical development. Yap1/Taz promote NSC maintenance and Satb2+ neuron production at the expense of Tbr1+ neuron generation. However, Teads have moderate effects on NSC maintenance and do not affect Satb2+ neuron differentiation. Conversely, whereas Tead2 blocks Tbr1+ neuron formation, Tead1 and Tead3 promote this early fate. In addition, we found that Hippo effectors regulate neuronal migration to the cortical plate (CP) in a reciprocal fashion, that ApoE, Dab2 and Cyr61 are Tead targets, and these contribute to neuronal fate determination and migration. Our results indicate that multifaceted Hippo signaling is pivotal in different aspects of cortical development.


Subject(s)
Cerebral Cortex/growth & development , DNA-Binding Proteins/genetics , Signal Transduction , Transcription Factors/metabolism , Animals , Cell Adhesion Molecules, Neuronal/genetics , Cell Line , Cerebral Cortex/metabolism , Chromatin Immunoprecipitation , DNA-Binding Proteins/metabolism , Extracellular Matrix Proteins/genetics , Female , Hippo Signaling Pathway , Humans , Mice , Nerve Tissue Proteins/genetics , Neural Stem Cells , Organ Specificity , Protein Serine-Threonine Kinases/genetics , Reelin Protein , Serine Endopeptidases/genetics , TEA Domain Transcription Factors , Transcription Factors/genetics
15.
Elife ; 8, 2019 11 11.
Article in English | MEDLINE | ID: mdl-31710292

ABSTRACT

Living cells proliferate by completing and coordinating two cycles, a division cycle controlling cell size and a DNA replication cycle controlling the number of chromosomal copies. It remains unclear how bacteria such as Escherichia coli tightly coordinate those two cycles across a wide range of growth conditions. Here, we used time-lapse microscopy in combination with microfluidics to measure growth, division and replication in single E. coli cells in both slow and fast growth conditions. To compare different phenomenological cell cycle models, we introduce a statistical framework assessing their ability to capture the correlation structure observed in the data. In combination with stochastic simulations, our data indicate that the cell cycle is driven from one initiation event to the next rather than from birth to division and is controlled by two adder mechanisms: the added volume since the last initiation event determines the timing of both the next division and replication initiation events.


Subject(s)
Cell Cycle/genetics , Chromosomes, Bacterial/genetics , DNA Replication/genetics , DNA, Bacterial/genetics , Escherichia coli/genetics , Cell Division/genetics , Escherichia coli/cytology , Escherichia coli/growth & development , Microfluidic Analytical Techniques/methods , Microscopy, Fluorescence , Microscopy, Phase-Contrast , Models, Genetic , Single-Cell Analysis/methods , Time-Lapse Imaging/methods
16.
Genome Res ; 29(7): 1164-1177, 2019 07.
Article in English | MEDLINE | ID: mdl-31138617

ABSTRACT

Although ChIP-seq has become a routine experimental approach for quantitatively characterizing the genome-wide binding of transcription factors (TFs), computational analysis procedures remain far from standardized, making it difficult to compare ChIP-seq results across experiments. In addition, although genome-wide binding patterns must ultimately be determined by local constellations of DNA-binding sites, current analysis is typically limited to identifying enriched motifs in ChIP-seq peaks. Here we present Crunch, a completely automated computational method that performs all ChIP-seq analysis from quality control through read mapping and peak detecting and that integrates comprehensive modeling of the ChIP signal in terms of known and novel binding motifs, quantifying the contribution of each motif and annotating which combinations of motifs explain each binding peak. By applying Crunch to 128 data sets from the ENCODE Project, we show that Crunch outperforms current peak finders and find that TFs naturally separate into "solitary TFs," for which a single motif explains the ChIP-peaks, and "cobinding TFs," for which multiple motifs co-occur within peaks. Moreover, for most data sets, the motifs that Crunch identified de novo outperform known motifs, and both the set of cobinding motifs and the top motif of solitary TFs are consistent across experiments and cell lines. Crunch is implemented as a web server, enabling standardized analysis of any collection of ChIP-seq data sets by simply uploading raw sequencing data. Results are provided both in a graphical web interface and as downloadable files.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Computational Biology/methods , Transcription Factors/metabolism , Amino Acid Motifs , Animals , Binding Sites , Datasets as Topic , Humans , Nucleotide Motifs , Quality Control , Regulatory Sequences, Nucleic Acid
17.
Mol Syst Biol ; 14(8): e8266, 2018 08 27.
Article in English | MEDLINE | ID: mdl-30150282

ABSTRACT

miRNAs are small RNAs that regulate gene expression post-transcriptionally. By repressing the translation and promoting the degradation of target mRNAs, miRNAs may reduce the cell-to-cell variability in protein expression, induce correlations between target expression levels, and provide a layer through which targets can influence each other's expression as "competing RNAs" (ceRNAs). However, experimental evidence for these behaviors is limited. Combining mathematical modeling with RNA sequencing of individual human embryonic kidney cells in which the expression of two distinct miRNAs was induced over a wide range, we have inferred parameters describing the response of hundreds of miRNA targets to miRNA induction. Individual targets have widely different response dynamics, and only a small proportion of predicted targets exhibit high sensitivity to miRNA induction. Our data reveal for the first time the response parameters of the entire network of endogenous miRNA targets to miRNA induction, demonstrating that miRNAs correlate target expression and at the same time increase the variability in expression of individual targets across cells. The approach is generalizable to other miRNAs and post-transcriptional regulators to improve the understanding of gene expression dynamics in individual cell types.


Subject(s)
Gene Regulatory Networks/genetics , MicroRNAs/genetics , RNA, Messenger/genetics , Single-Cell Analysis , Computational Biology , Gene Expression Profiling , Gene Expression Regulation/genetics , HEK293 Cells , Humans , Models, Theoretical , Sequence Analysis, RNA
18.
Genome Biol ; 19(1): 44, 2018 03 28.
Article in English | MEDLINE | ID: mdl-29592812

ABSTRACT

The length of 3' untranslated regions (3' UTRs) is regulated in relation to cellular state. To uncover key regulators of poly(A) site use in specific conditions, we have developed PAQR, a method for quantifying poly(A) site use from RNA sequencing data, and KAPAC, an approach that infers activities of oligomeric sequence motifs on poly(A) site choice. Application of PAQR and KAPAC to RNA sequencing data from normal and tumor tissue samples uncovers motifs that can explain changes in cleavage and polyadenylation in specific cancers. In particular, our analysis points to polypyrimidine tract binding protein 1 as a regulator of poly(A) site choice in glioblastoma.


Subject(s)
3' Untranslated Regions , Polyadenylation , Sequence Analysis, RNA , Glioblastoma/genetics , Glioblastoma/metabolism , Humans , Male , Nucleotide Motifs , Polypyrimidine Tract-Binding Protein/metabolism , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , RNA-Binding Proteins/metabolism , mRNA Cleavage and Polyadenylation Factors/metabolism
19.
Nat Commun ; 9(1): 212, 2018 01 15.
Article in English | MEDLINE | ID: mdl-29335514

ABSTRACT

Much is still not understood about how gene regulatory interactions control cell fate decisions in single cells, in part due to the difficulty of directly observing gene regulatory processes in vivo. We introduce here a novel integrated setup consisting of a microfluidic chip and accompanying analysis software that enable long-term quantitative tracking of growth and gene expression in single cells. The dual-input Mother Machine (DIMM) chip enables controlled and continuous variation of external conditions, allowing direct observation of gene regulatory responses to changing conditions in single cells. The Mother Machine Analyzer (MoMA) software achieves unprecedented accuracy in segmenting and tracking cells, and streamlines high-throughput curation with a novel leveraged editing procedure. We demonstrate the power of the method by uncovering several novel features of an iconic gene regulatory program: the induction of Escherichia coli's lac operon in response to a switch from glucose to lactose.


Subject(s)
Gene Expression Regulation, Bacterial , Microfluidic Analytical Techniques/methods , Single-Cell Analysis/methods , Software , Algorithms , Cell Tracking/instrumentation , Cell Tracking/methods , Escherichia coli/cytology , Escherichia coli/drug effects , Escherichia coli/genetics , Glucose/pharmacology , Lac Operon/genetics , Lactose/pharmacology , Single-Cell Analysis/instrumentation
20.
PLoS Comput Biol ; 13(7): e1005176, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28753602

ABSTRACT

Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult to use in practice for non-experts. Here we present a new regulatory motif model, called dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch, that allow users to automatically perform 'motif finding', i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT 'dilogo' motifs.
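
As a reader's aid, the independence assumption behind a PSWM can be illustrated with a minimal sketch. This is not the DWT method or any code from the authors; the function names, the toy counts, the uniform background of 0.25, and the pseudocount of 1 are all assumptions chosen for illustration. The point is only that a PSWM score is a plain sum of per-position terms, so it cannot express dependencies between positions.

```python
import math

BASES = "ACGT"

def pswm_log_odds(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds weights.

    `counts` is a list with one dict per motif position (hypothetical input
    format). A uniform background and a fixed pseudocount are assumed.
    """
    weights = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + len(BASES) * pseudocount
        weights.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return weights

def score_site(weights, site):
    """Score a site as a sum over positions (the independence assumption)."""
    if len(site) != len(weights):
        raise ValueError("site length must match motif length")
    return sum(w[base] for w, base in zip(weights, site))

# Toy counts for a 3-bp motif (illustrative only, not real data).
toy_counts = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 4, "T": 4},
]
weights = pswm_log_odds(toy_counts)
best = score_site(weights, "ACG")   # matches the most frequent bases
worst = score_site(weights, "TGA")  # mismatches at every position
```

Because each position contributes its own additive term, swapping the base at one position changes the score by the same amount regardless of the bases at other positions. A model with pairwise dependencies, such as the one described in the abstract, relaxes exactly this restriction.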


Subject(s)
Binding Sites/genetics , Computational Biology/methods , DNA , Nucleotide Motifs/genetics , Transcription Factors , DNA/chemistry , DNA/genetics , DNA/metabolism , Models, Statistical , RNA/chemistry , RNA/genetics , RNA/metabolism , Sequence Analysis, DNA , Transcription Factors/chemistry , Transcription Factors/genetics , Transcription Factors/metabolism