Results 1 - 14 of 14
1.
Proc Natl Acad Sci U S A ; 120(4): e2213264120, 2023 01 24.
Article in English | MEDLINE | ID: mdl-36649423

ABSTRACT

Adaptive immunity is driven by specific binding of hypervariable receptors to diverse molecular targets. The sequence diversity of receptors and targets are both individually known but because multiple receptors can recognize the same target, a measure of the effective "functional" diversity of the human immune system has remained elusive. Here, we show that sequence near-coincidences within T cell receptors that bind specific epitopes provide a new window into this problem and allow the quantification of how binding probability covaries with sequence. We find that near-coincidence statistics within epitope-specific repertoires imply a measure of binding degeneracy to amino acid changes in receptor sequence that is consistent across disparate experiments. Paired data on both chains of the heterodimeric receptor are particularly revealing since simultaneous near-coincidences are rare and we show how they can be exploited to estimate the number of epitope responses that created the memory compartment. In addition, we find that paired-chain coincidences are strongly suppressed across donors with different human leukocyte antigens, evidence for a central role of antigen-driven selection in making paired chain receptors public. These results demonstrate the power of coincidence analysis to reveal the sequence determinants of epitope binding in receptor repertoires.
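
As a rough illustration of what counting "near-coincidences" can look like, the sketch below tallies pairs of receptor sequences by Hamming distance. It is not the authors' pipeline: the function names, the restriction to equal-length pairs, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def coincidence_histogram(seqs, max_dist=2):
    """Count sequence pairs at each Hamming distance 0..max_dist.

    Pairs of unequal length are skipped (a simplifying assumption).
    """
    counts = Counter()
    for a, b in combinations(seqs, 2):
        if len(a) != len(b):
            continue
        d = hamming(a, b)
        if d <= max_dist:
            counts[d] += 1
    return counts


# Hypothetical toy repertoire, not real data.
toy = ["CASSL", "CASSL", "CASSF", "CATSF", "CAS"]
hist = coincidence_histogram(toy)
```

Here `hist[0]` counts exact coincidences and `hist[1]`, `hist[2]` count near-coincidences. The pairwise loop is quadratic in the number of sequences, so a real repertoire would likely need a hashing or indexing scheme instead.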


Subject(s)
Epitopes, T-Lymphocyte , Receptors, Antigen, T-Cell , Humans , Amino Acid Sequence
2.
Bioinformatics ; 35(17): 2974-2981, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30657870

ABSTRACT

MOTIVATION: High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability of generation by V(D)J recombination of T- and B-cell receptors of any specific nucleotide sequence. These generation probabilities are very non-homogeneous, ranging over 20 orders of magnitude in real repertoires. Since the function of a receptor really depends on its protein sequence, it is important to be able to predict this probability of generation at the amino acid level. However, brute-force summation over all the nucleotide sequences with the correct amino acid translation is computationally intractable. The purpose of this paper is to present a solution to this problem. RESULTS: We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model prediction shows an excellent agreement with published data. We suggest that OLGA may be a useful tool to guide vaccine design. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/zsethna/OLGA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
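
To illustrate why dynamic programming helps here, the sketch below compares brute-force summation over all nucleotide sequences encoding an amino acid sequence with a forward recursion over codons. It does not use OLGA or its recombination model: the first-order Markov chain over nucleotides, the transition values, and all function names are assumptions chosen only to keep the example small.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODONS_FOR = {}
for codon, aa in zip(CODONS, AMINO):
    CODONS_FOR.setdefault(aa, []).append(codon)

# Toy nucleotide model (assumed): uniform start, first-order Markov transitions.
INIT = {b: 0.25 for b in BASES}
TRANS = {
    "T": {"T": 0.4, "C": 0.3, "A": 0.2, "G": 0.1},
    "C": {"T": 0.1, "C": 0.4, "A": 0.3, "G": 0.2},
    "A": {"T": 0.2, "C": 0.1, "A": 0.4, "G": 0.3},
    "G": {"T": 0.3, "C": 0.2, "A": 0.1, "G": 0.4},
}


def nt_probability(seq):
    """Probability of one nucleotide sequence under the toy Markov model."""
    p = INIT[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= TRANS[a][b]
    return p


def brute_force_probability(aa_seq):
    """Sum over every nucleotide sequence that translates to aa_seq."""
    choices = [CODONS_FOR[aa] for aa in aa_seq]
    return sum(nt_probability("".join(c)) for c in product(*choices))


def dp_probability(aa_seq):
    """Same sum, computed codon by codon (aa_seq must be non-empty).

    forward[b] is the total probability of all nucleotide prefixes that
    encode the amino acids seen so far and end in base b.
    """
    forward = dict.fromkeys(BASES, 0.0)
    for codon in CODONS_FOR[aa_seq[0]]:
        forward[codon[2]] += nt_probability(codon)
    for aa in aa_seq[1:]:
        new = dict.fromkeys(BASES, 0.0)
        for c in CODONS_FOR[aa]:
            within = TRANS[c[0]][c[1]] * TRANS[c[1]][c[2]]
            new[c[2]] += within * sum(forward[b] * TRANS[b][c[0]] for b in BASES)
        forward = new
    return sum(forward.values())


bf = brute_force_probability("CASSL")  # enumerates 2*4*6*6*6 = 1728 sequences
dp = dp_probability("CASSL")           # work grows linearly with length
```

The brute-force cost grows as the product of codon degeneracies, while the recursion only tracks four running totals per position. The two values should agree up to floating-point rounding.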


Subject(s)
Receptors, Antigen, T-Cell , Software , Algorithms , Amino Acid Sequence , Humans , Immunoglobulins , Likelihood Functions , V(D)J Recombination
3.
Immunol Rev ; 284(1): 167-179, 2018 07.
Article in English | MEDLINE | ID: mdl-29944757

ABSTRACT

Despite the extreme diversity of T-cell repertoires, many identical T-cell receptor (TCR) sequences are found in a large number of individual mice and humans. These widely shared sequences, often referred to as "public," have been suggested to be over-represented due to their potential immune functionality or their ease of generation by V(D)J recombination. Here, we show that even for large cohorts, the observed degree of sharing of TCR sequences between individuals is well predicted by a model accounting for the known quantitative statistical biases in the generation process, together with a simple model of thymic selection. Whether a sequence is shared by many individuals is predicted to depend on the number of queried individuals and the sampling depth, as well as on the sequence itself, in agreement with the data. We introduce the degree of publicness conditional on the queried cohort size and the size of the sampled repertoires. Based on these observations, we propose a public/private sequence classifier, "PUBLIC" (Public Universal Binary Likelihood Inference Classifier), based on the generation probability, which performs very well even for small cohort sizes.


Subject(s)
Receptors, Antigen, T-Cell/genetics , T-Lymphocytes/immunology , V(D)J Recombination/genetics , Algorithms , Animals , Humans , Mice , Receptors, Antigen, T-Cell/immunology , V(D)J Recombination/immunology
4.
Proc Natl Acad Sci U S A ; 114(9): 2253-2258, 2017 02 28.
Article in English | MEDLINE | ID: mdl-28196891

ABSTRACT

The ability of the adaptive immune system to respond to arbitrary pathogens stems from the broad diversity of immune cell surface receptors. This diversity originates in a stochastic DNA editing process (VDJ recombination) that acts on the surface receptor gene each time a new immune cell is created from a stem cell. By analyzing T-cell receptor (TCR) sequence repertoires taken from the blood and thymus of mice of different ages, we quantify the changes in the VDJ recombination process that occur from embryo to young adult. We find a rapid increase with age in the number of random insertions and a dramatic increase in diversity. Because the blood accumulates thymic output over time, blood repertoires are mixtures of different statistical recombination processes, and we unravel the mixture statistics to obtain a picture of the time evolution of the early immune system. Sequence repertoire analysis also allows us to detect the statistical impact of selection on the output of the VDJ recombination process. The effects we find are nearly identical between thymus and blood, suggesting that our analysis mainly detects selection for proper folding of the TCR receptor protein. We further find that selection is weaker in laboratory mice than in humans and it does not affect the diversity of the repertoire.


Subject(s)
Adaptive Immunity , Receptors, Antigen, T-Cell , T-Lymphocytes/immunology , V(D)J Recombination , Adaptive Immunity/genetics , Adaptive Immunity/immunology , Aging , Animals , Genetic Variation/genetics , Genetic Variation/immunology , Humans , Mice , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/immunology , Thymus Gland/immunology , V(D)J Recombination/genetics , V(D)J Recombination/immunology , VDJ Exons/genetics , VDJ Exons/immunology
5.
Philos Trans R Soc Lond B Biol Sci ; 370(1676), 2015 Sep 05.
Article in English | MEDLINE | ID: mdl-26194757

ABSTRACT

We quantify the VDJ recombination and somatic hypermutation processes in human B cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, owing to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site.


Subject(s)
Antibody Diversity , B-Lymphocytes/immunology , Algorithms , Clonal Selection, Antigen-Mediated , Humans , Models, Genetic , Models, Immunological , Receptors, Antigen, B-Cell/genetics , Somatic Hypermutation, Immunoglobulin , V(D)J Recombination
6.
Proc Natl Acad Sci U S A ; 111(27): 9875-80, 2014 Jul 08.
Article in English | MEDLINE | ID: mdl-24941953

ABSTRACT

The efficient recognition of pathogens by the adaptive immune system relies on the diversity of receptors displayed at the surface of immune cells. T-cell receptor diversity results from an initial random DNA editing process, called VDJ recombination, followed by functional selection of cells according to the interaction of their surface receptors with self and foreign antigenic peptides. Using high-throughput sequence data from the β-chain of human T-cell receptors, we infer factors that quantify the overall effect of selection on the elements of receptor sequence composition: the V and J gene choice and the length and amino acid composition of the variable region. We find a significant correlation between biases induced by VDJ recombination and our inferred selection factors together with a reduction of diversity during selection. Both effects suggest that natural selection acting on the recombination process has anticipated the selection pressures experienced during somatic evolution. The inferred selection factors differ little between donors or between naive and memory repertoires. The number of sequences shared between donors is well-predicted by our model, indicating a stochastic origin of such public sequences. Our approach is based on a probabilistic maximum likelihood method, which is necessary to disentangle the effects of selection from biases inherent in the recombination process.


Subject(s)
Receptors, Antigen, T-Cell, alpha-beta/genetics , Selection, Genetic , CD4-Positive T-Lymphocytes/immunology , Humans
7.
Proc Natl Acad Sci U S A ; 109(40): 16161-6, 2012 Oct 02.
Article in English | MEDLINE | ID: mdl-22988065

ABSTRACT

Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.


Subject(s)
Adaptive Immunity/genetics , Antibody Diversity/genetics , CD4-Positive T-Lymphocytes/metabolism , Genes, T-Cell Receptor beta/genetics , Models, Biological , V(D)J Recombination/genetics , Algorithms , Base Sequence , Computational Biology/methods , Humans , Likelihood Functions , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA
8.
Nat Biotechnol ; 30(3): 271-7, 2012 Feb 26.
Article in English | MEDLINE | ID: mdl-22371084

ABSTRACT

Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here we describe a massively parallel reporter assay (MPRA) that facilitates the systematic dissection of transcriptional regulatory elements. In MPRA, microarray-synthesized DNA regulatory elements and unique sequence tags are cloned into plasmids to generate a library of reporter constructs. These constructs are transfected into cells and tag expression is assayed by high-throughput sequencing. We apply MPRA to compare >27,000 variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon-β enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to design enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.


Subject(s)
Biological Assay/methods , Enhancer Elements, Genetic , Genes, Reporter , Transcription Factors/genetics , Base Sequence , Binding Sites , Humans , Models, Genetic , Molecular Sequence Data , Mutagenesis , Sequence Alignment , Transcription Factors/metabolism , Transcription, Genetic
9.
Proc Natl Acad Sci U S A ; 107(20): 9158-63, 2010 May 18.
Article in English | MEDLINE | ID: mdl-20439748

ABSTRACT

Cells use protein-DNA and protein-protein interactions to regulate transcription. A biophysical understanding of this process has, however, been limited by the lack of methods for quantitatively characterizing the interactions that occur at specific promoters and enhancers in living cells. Here we show how such biophysical information can be revealed by a simple experiment in which a library of partially mutated regulatory sequences are partitioned according to their in vivo transcriptional activities and then sequenced en masse. Computational analysis of the sequence data produced by this experiment can provide precise quantitative information about how the regulatory proteins at a specific arrangement of binding sites work together to regulate transcription. This ability to reliably extract precise information about regulatory biophysics in the face of experimental noise is made possible by a recently identified relationship between likelihood and mutual information. Applying our experimental and computational techniques to the Escherichia coli lac promoter, we demonstrate the ability to identify regulatory protein binding sites de novo, determine the sequence-dependent binding energy of the proteins that bind these sites, and, importantly, measure the in vivo interaction energy between RNA polymerase and a DNA-bound transcription factor. Our approach provides a generally applicable method for characterizing the biophysical basis of transcriptional regulation by a specified regulatory sequence. The principles of our method can also be applied to a wide range of other problems in molecular biology.


Subject(s)
Gene Expression Regulation/genetics , Models, Biological , Mutation/genetics , Promoter Regions, Genetic/genetics , Base Sequence , Binding Sites/genetics , Biophysics , Computational Biology/methods , Escherichia coli , Flow Cytometry , Gene Expression Regulation/physiology , Green Fluorescent Proteins/metabolism , Lac Operon/genetics , Likelihood Functions , Molecular Sequence Data , Monte Carlo Method , Sequence Analysis, DNA , Thermodynamics
10.
Proc Natl Acad Sci U S A ; 107(12): 5405-10, 2010 Mar 23.
Article in English | MEDLINE | ID: mdl-20212159

ABSTRACT

Recognition of pathogens relies on families of proteins showing great diversity. Here we construct maximum entropy models of the sequence repertoire, building on recent experiments that provide a nearly exhaustive sampling of the IgM sequences in zebrafish. These models are based solely on pairwise correlations between residue positions but correctly capture the higher order statistical properties of the repertoire. By exploiting the interpretation of these models as statistical physics problems, we make several predictions for the collective properties of the sequence ensemble: The distribution of sequences obeys Zipf's law, the repertoire decomposes into several clusters, and there is a massive restriction of diversity because of the correlations. These predictions are completely inconsistent with models in which amino acid substitutions are made independently at each site and are in good agreement with the data. Our results suggest that antibody diversity is not limited by the sequences encoded in the genome and may reflect rapid adaptation to antigenic challenges. This approach should be applicable to the study of the global properties of other protein families.


Subject(s)
Antibody Diversity , Models, Immunological , Zebrafish/genetics , Zebrafish/immunology , Amino Acid Sequence , Animals , Base Sequence , Biophysical Phenomena , DNA/chemistry , DNA/genetics , Entropy , Evolution, Molecular , Immunoglobulin M/chemistry , Immunoglobulin M/genetics , Zebrafish Proteins/chemistry , Zebrafish Proteins/genetics
11.
Phys Rev E Stat Nonlin Soft Matter Phys ; 78(1 Pt 1): 011910, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18763985

ABSTRACT

Changes in a cell's external or internal conditions are usually reflected in the concentrations of the relevant transcription factors. These proteins in turn modulate the expression levels of the genes under their control and sometimes need to perform nontrivial computations that integrate several inputs and affect multiple genes. At the same time, the activities of the regulated genes would fluctuate even if the inputs were held fixed, as a consequence of the intrinsic noise in the system, and such noise must fundamentally limit the reliability of any genetic computation. Here we use information theory to formalize the notion of information transmission in simple genetic regulatory elements in the presence of physically realistic noise sources. The dependence of this "channel capacity" on noise parameters, cooperativity and cost of making signaling molecules is explored systematically. We find that, in the range of parameters probed by recent in vivo measurements, capacities higher than one bit should be achievable. It is of course generally accepted that gene regulatory elements must, in order to function properly, have a capacity of at least one bit. The central point of our analysis is the demonstration that simple physical models of noisy gene transcription, with realistic parameters, can indeed achieve this capacity: it was not self-evident that this should be so. We also demonstrate that capacities significantly greater than one bit are possible, so that transcriptional regulation need not be limited to simple "on-off" components. The question whether real systems actually exploit this richer possibility is beyond the scope of this investigation.
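
For readers unfamiliar with the term, the sketch below computes the channel capacity of a small discrete channel with the Blahut-Arimoto iteration. It is a generic textbook method, not the calculation in the paper, which treats continuous concentrations and physical noise models; the function name and the example channels are assumptions for illustration.

```python
from math import log2


def blahut_arimoto(W, tol=1e-9, max_iter=10000):
    """Capacity (in bits) of a discrete memoryless channel.

    W[x][y] is the probability of output y given input x; rows sum to 1.
    Returns a lower bound on the capacity that is within tol of the true
    value once the loop has converged, plus the input distribution.
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from a uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # Relative entropy (bits) between each row of W and q.
        D = [
            sum(W[x][y] * log2(W[x][y] / q[y]) for y in range(n_out) if W[x][y] > 0)
            for x in range(n_in)
        ]
        weights = [2.0 ** d for d in D]
        z = sum(p[x] * weights[x] for x in range(n_in))
        lower, upper = log2(z), max(D)  # lower <= capacity <= upper
        p = [p[x] * weights[x] / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower, p


# Hypothetical example channels (rows: inputs, columns: outputs).
noiseless = [[1.0, 0.0], [0.0, 1.0]]      # perfect on/off switch
symmetric = [[0.9, 0.1], [0.1, 0.9]]      # 10% read-out errors
three_level = [[0.9, 0.1, 0.0], [0.05, 0.9, 0.05], [0.0, 0.1, 0.9]]

cap_noiseless, _ = blahut_arimoto(noiseless)
cap_symmetric, _ = blahut_arimoto(symmetric)
cap_three, _ = blahut_arimoto(three_level)
```

A perfect binary switch carries exactly one bit, a noisy binary switch carries less, and a channel with three distinguishable output levels can exceed one bit when the noise is small enough.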


Subject(s)
Biophysics/methods , Regulatory Sequences, Nucleic Acid , Algorithms , Animals , Computational Biology , Diffusion , Humans , Models, Biological , Models, Genetic , Models, Statistical , Transcription Factors/genetics , Transcription, Genetic
12.
Proc Natl Acad Sci U S A ; 105(34): 12376-81, 2008 Aug 26.
Article in English | MEDLINE | ID: mdl-18723669

ABSTRACT

We present a genomewide cross-species analysis of regulation for broad-acting transcription factors in yeast. Our model for binding site evolution is founded on biophysics: the binding energy between transcription factor and site is a quantitative phenotype of regulatory function, and selection is given by a fitness landscape that depends on this phenotype. The model quantifies conservation, as well as loss and gain, of functional binding sites in a coherent way. Its predictions are supported by direct cross-species comparison between four yeast species. We find ubiquitous compensatory mutations within functional sites, such that the energy phenotype and the function of a site evolve in a significantly more constrained way than does its sequence. We also find evidence for substantial evolution of regulatory function involving point mutations as well as sequence insertions and deletions within binding sites. Genes lose their regulatory link to a given transcription factor at a rate similar to the neutral point mutation rate, from which we infer a moderate average fitness advantage of functional over nonfunctional sites. In a wider context, this study provides an example of inference of selection acting on a quantitative molecular trait.


Subject(s)
Binding Sites/genetics , Evolution, Molecular , Fungal Proteins/genetics , Models, Genetic , Selection, Genetic , Thermodynamics , Transcription Factors/genetics , Epistasis, Genetic , Genome, Fungal , Mutation , Quantitative Trait, Heritable , Saccharomyces/genetics , Stochastic Processes
13.
Proc Natl Acad Sci U S A ; 105(34): 12265-70, 2008 Aug 26.
Article in English | MEDLINE | ID: mdl-18719112

ABSTRACT

In the simplest view of transcriptional regulation, the expression of a gene is turned on or off by changes in the concentration of a transcription factor (TF). We use recent data on noise levels in gene expression to show that it should be possible to transmit much more than just one regulatory bit. Realizing this optimal information capacity would require that the dynamic range of TF concentrations used by the cell, the input/output relation of the regulatory module, and the noise in gene expression satisfy certain matching relations, which we derive. These results provide parameter-free, quantitative predictions connecting independently measurable quantities. Although we have considered only the simplified problem of a single gene responding to a single TF, we find that these predictions are in surprisingly good agreement with recent experiments on the Bicoid/Hunchback system in the early Drosophila embryo and that this system achieves approximately 90% of its theoretical maximum information transmission.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Information Theory , Transcription, Genetic , Animals , DNA-Binding Proteins , Drosophila/genetics , Drosophila Proteins , Homeodomain Proteins , Transcription Factors
14.
Proc Natl Acad Sci U S A ; 104(2): 501-6, 2007 Jan 09.
Article in English | MEDLINE | ID: mdl-17197415

ABSTRACT

A cell's ability to regulate gene transcription depends in large part on the energy with which transcription factors (TFs) bind their DNA regulatory sites. Obtaining accurate models of this binding energy is therefore an important goal for quantitative biology. In this article, we present a principled likelihood-based approach for inferring physical models of TF-DNA binding energy from the data produced by modern high-throughput binding assays. Central to our analysis is the ability to assess the relative likelihood of different model parameters given experimental observations. We take a unique approach to this problem and show how to compute likelihood without any explicit assumptions about the noise that inevitably corrupts such measurements. Sampling possible choices for model parameters according to this likelihood function, we can then make probabilistic predictions for the identities of binding sites and their physical binding energies. Applying this procedure to previously published data on the Saccharomyces cerevisiae TF Abf1p, we find models of TF binding whose parameters are determined with remarkable precision. Evidence for the accuracy of these models is provided by an astonishing level of phylogenetic conservation in the predicted energies of putative binding sites. Results from in vivo and in vitro experiments also provide highly consistent characterizations of Abf1p, a result that contrasts with a previous analysis of the same data.


Subject(s)
DNA/chemistry , DNA/metabolism , Transcription Factors/chemistry , Transcription Factors/metabolism , Binding Sites , Biophysical Phenomena , Biophysics , DNA, Fungal/chemistry , DNA, Fungal/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Likelihood Functions , Models, Chemical , Protein Array Analysis , Protein Binding , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Thermodynamics