Results 1 - 5 of 5
1.
Cell Syst ; 14(11): 968-978.e3, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37909046

ABSTRACT

Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial-intelligence-driven protein design. However, we lack a sufficient understanding of how very large-scale models and data play a role in effective protein model development. We introduce a suite of protein language models, named ProGen2, that are scaled up to 6.4B parameters and trained on different sequence datasets drawn from over a billion proteins from genomic, metagenomic, and immune repertoire databases. ProGen2 models show state-of-the-art performance in capturing the distribution of observed evolutionary sequences, generating novel viable sequences, and predicting protein fitness without additional fine-tuning. As large model sizes and raw numbers of protein sequences continue to become more widely accessible, our results suggest that a growing emphasis needs to be placed on the data distribution provided to a protein sequence model. Our models and code are open sourced for widespread adoption in protein engineering. A record of this paper's Transparent Peer Review process is included in the supplemental information.


Subject(s)
Artificial Intelligence , Proteins , Proteins/genetics , Amino Acid Sequence , Language , Databases, Factual
2.
J Mach Learn Res ; 24(23)2023.
Article in English | MEDLINE | ID: mdl-37206375

ABSTRACT

Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic (such as a subset of variables) that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing data selection, the "Stein volume criterion (SVC)", that does not require fitting a nonparametric model. The SVC takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback-Leibler divergence. We prove that the SVC is consistent for data selection, and establish consistency and asymptotic normality of the corresponding generalized posterior on parameters. We apply the SVC to the analysis of single-cell RNA sequencing data sets using probabilistic principal components analysis and a spin glass model of gene regulation.
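The SVC replaces the Kullback-Leibler divergence with a kernelized Stein discrepancy (KSD), which compares samples against a model using only the model's score function (the gradient of log p), so the model's normalizing constant is never needed. The following is a minimal sketch of the standard U-statistic KSD estimate, assuming a 1-D Gaussian model and an RBF kernel with bandwidth `h`. Both choices are illustrative assumptions, not taken from the paper, and the sketch covers only the discrepancy ingredient, not the full SVC.

```python
import numpy as np


def gaussian_score(x, mu, sigma):
    """Score function d/dx log N(x; mu, sigma^2) of a 1-D Gaussian model."""
    return -(x - mu) / sigma**2


def ksd_squared(x, score, h=1.0):
    """U-statistic estimate of the squared kernelized Stein discrepancy
    between 1-D samples x and a model specified only by its score function,
    using an RBF kernel with bandwidth h (assumed kernel choice)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    s = score(x)                          # model score at each sample
    d = x[:, None] - x[None, :]           # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * h**2))        # RBF kernel matrix k(x_i, x_j)
    dk_dx = -d / h**2 * k                 # derivative of k w.r.t. x_i
    dk_dy = d / h**2 * k                  # derivative of k w.r.t. x_j
    d2k = (1 / h**2 - d**2 / h**4) * k    # mixed second derivative
    # Stein kernel u_p(x_i, x_j)
    u = (s[:, None] * s[None, :] * k
         + s[:, None] * dk_dy
         + s[None, :] * dk_dx
         + d2k)
    np.fill_diagonal(u, 0.0)              # U-statistic: drop i == j terms
    return u.sum() / (n * (n - 1))


rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=500)
good = ksd_squared(samples, lambda t: gaussian_score(t, 0.0, 1.0))
bad = ksd_squared(samples, lambda t: gaussian_score(t, 3.0, 1.0))
print(good, bad)
```

With samples drawn from N(0, 1), the estimate against the matching model should be close to zero, while the mismatched model with mean 3 yields a clearly positive value.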

3.
Nat Ecol Evol ; 6(5): 590-603, 2022 05.
Article in English | MEDLINE | ID: mdl-35361892

ABSTRACT

Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutations. Here, through the comprehensive analysis of a bacterial toxin-antitoxin system, we identified all possible single substitutions in the toxin that enable it to tolerate otherwise interface-disrupting mutations in its antitoxin. Strikingly, the majority of enabling mutations in the toxin do not contact the antitoxin and promote tolerance non-specifically to many different antitoxin mutations, despite covariation in homologues occurring primarily between specific pairs of contacting residues across the interface. In addition, the enabling mutations we identified expand future mutational paths that both maintain old toxin-antitoxin interactions and form new ones. These non-specific mutations are missed by widely used covariation and machine learning methods. Identifying such enabling mutations will be critical for ensuring continued binding of therapeutically relevant proteins, such as antibodies, aimed at evolving targets.


Subject(s)
Antitoxins , Bacterial Toxins , Amino Acid Sequence , Antitoxins/chemistry , Antitoxins/genetics , Antitoxins/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bacterial Toxins/chemistry , Bacterial Toxins/genetics , Bacterial Toxins/metabolism , Mutation
4.
Stem Cell Reports ; 10(6): 1991-2004, 2018 06 05.
Article in English | MEDLINE | ID: mdl-29779896

ABSTRACT

Human induced pluripotent stem cell (iPSC)-derived neurons are an attractive substrate for modeling disease, yet the heterogeneity of these cultures presents a challenge for functional characterization by manual patch-clamp electrophysiology. Here, we describe an optimized all-optical electrophysiology, "Optopatch," pipeline for high-throughput functional characterization of human iPSC-derived neuronal cultures. We demonstrate the method in a human iPSC-derived motor neuron (iPSC-MN) model of amyotrophic lateral sclerosis (ALS). In a comparison of iPSC-MNs with an ALS-causing mutation (SOD1 A4V) with their genome-corrected controls, the mutants showed elevated spike rates under weak or no stimulus and greater likelihood of entering depolarization block under strong optogenetic stimulus. We compared these results with numerical simulations of simple conductance-based neuronal models and with literature results in this and other iPSC-based models of ALS. Our data and simulations suggest that deficits in slowly activating potassium channels may underlie the changes in electrophysiology in the SOD1 A4V mutation.
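The abstract attributes the elevated spike rates to deficits in slowly activating potassium channels. The following is a minimal sketch of that intuition, assuming a leaky integrate-and-fire neuron whose slow potassium current is approximated as a spike-triggered conductance that decays with a slow time constant. This simplification is ours and is not the conductance-based models used in the study, and all parameter values are hypothetical. Under the same steady drive, removing the slow potassium conductance raises the spike count. The sketch does not capture depolarization block.

```python
def count_spikes(g_k_step, i_ext=400.0, t_max=1000.0, dt=0.1):
    """Forward-Euler simulation of a leaky integrate-and-fire neuron with a
    spike-triggered slow potassium conductance (simplified, hypothetical model).
    Units: ms, mV, pF, nS, pA. All parameter values are illustrative."""
    c_m, g_l, e_l = 200.0, 10.0, -70.0   # capacitance, leak conductance, leak reversal
    e_k, tau_k = -90.0, 200.0            # K+ reversal, slow K+ decay time constant
    v_th, v_reset = -50.0, -65.0         # spike threshold and reset potential
    v, g_k, spikes = e_l, 0.0, 0
    for _ in range(int(t_max / dt)):
        i_leak = g_l * (v - e_l)          # leak current
        i_k = g_k * (v - e_k)             # slow K+ current
        v += dt * (-i_leak - i_k + i_ext) / c_m
        g_k -= dt * g_k / tau_k           # slow K+ conductance decays between spikes
        if v >= v_th:
            v = v_reset
            g_k += g_k_step               # each spike recruits more slow K+ conductance
            spikes += 1
    return spikes


intact = count_spikes(g_k_step=2.0)      # with slow K+ conductance
deficient = count_spikes(g_k_step=0.0)   # slow K+ conductance removed
print(intact, deficient)
```

Because the slow conductance accumulates with each spike and pulls the membrane toward the K+ reversal potential, it limits sustained firing. Setting `g_k_step` to zero removes that brake.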


Subject(s)
Electrophysiological Phenomena , Induced Pluripotent Stem Cells/cytology , Motor Neurons/cytology , Motor Neurons/physiology , Action Potentials , Amyotrophic Lateral Sclerosis , Biomarkers , Gene Editing , Gene Expression , Humans , Molecular Imaging , Mutation , Superoxide Dismutase-1/genetics , Superoxide Dismutase-1/metabolism
5.
J Neurosci ; 36(43): 11059-11073, 2016 10 26.
Article in English | MEDLINE | ID: mdl-27798186

ABSTRACT

Recent advances in optogenetics have enabled simultaneous optical perturbation and optical readout of membrane potential in diverse cell types. Here, we develop and characterize a Cre-dependent transgenic Optopatch2 mouse line that we call Floxopatch. The animals expressed a blue-shifted channelrhodopsin, CheRiff, and a near-infrared Archaerhodopsin-derived voltage indicator, QuasAr2, via targeted knock-in at the rosa26 locus. In Optopatch-expressing animals, we tested for overall health, genetically targeted expression, and function of the optogenetic components. In offspring of Floxopatch mice crossed with a variety of Cre driver lines, we observed spontaneous and optically evoked activity in vitro in acute brain slices and in vivo in somatosensory ganglia. Cell-type-specific expression allowed classification and characterization of neuronal subtypes based on their firing patterns. The Floxopatch mouse line is a useful tool for fast and sensitive characterization of neural activity in genetically specified cell types in intact tissue. SIGNIFICANCE STATEMENT: Optical recordings of neural activity offer the promise of rapid and spatially resolved mapping of neural function. Calcium imaging has been widely applied in this mode, but is insensitive to the details of action potential waveforms and subthreshold events. Simultaneous optical perturbation and optical readout of single-cell electrical activity ("Optopatch") has been demonstrated in cultured neurons and in organotypic brain slices, but not in acute brain slices or in vivo. Here, we describe a transgenic mouse in which expression of Optopatch constructs is controlled by the Cre-recombinase enzyme. This animal enables fast and robust optical measurements of single-cell electrical excitability in acute brain slices and in somatosensory ganglia in vivo, opening the door to rapid optical mapping of neuronal excitability.


Subject(s)
Action Potentials/physiology , Integrases/genetics , Neurons/physiology , Optogenetics/methods , Voltage-Sensitive Dye Imaging/methods , Animals , Cells, Cultured , Gene Targeting , Luminescent Proteins/genetics , Male , Mice , Mice, Transgenic , Neurons/cytology , Recombinant Proteins/genetics