Search | VHL Regional Portal

Optimal trade-off control in machine learning-based library design, with application to adeno-associated virus (AAV) for gene therapy.

Zhu, Danqing; Brookes, David H; Busia, Akosua; Carneiro, Ana; Fannjiang, Clara; Popova, Galina; Shin, David; Donohue, Kevin C; Lin, Li F; Miller, Zachary M; Williams, Evan R; Chang, Edward F; Nowakowski, Tomasz J; Listgarten, Jennifer; Schaffer, David V.

Sci Adv ; 10(4): eadj3786, 2024 Jan 26.

Article in English | MEDLINE | ID: mdl-38266077

ABSTRACT

Adeno-associated viruses (AAVs) hold tremendous promise as delivery vectors for gene therapies. AAVs have been successfully engineered (for instance, for more efficient and/or cell-specific delivery to numerous tissues) by creating large, diverse starting libraries and selecting for desired properties. However, these starting libraries often contain a high proportion of variants unable to assemble or package their genomes, a prerequisite for any gene delivery goal. Here, we present and showcase a machine learning (ML) method for designing AAV peptide insertion libraries that achieve fivefold higher packaging fitness than the standard NNK library with negligible reduction in diversity. To demonstrate our ML-designed library's utility for downstream engineering goals, we show that it yields approximately 10-fold more successful variants than the NNK library after selection for infection of human brain tissue, leading to a promising glial-specific variant. Moreover, our design approach can be applied to other types of libraries for AAV and beyond.

Subject(s)

Dependovirus , Genetic Therapy , Humans , Dependovirus/genetics , Peptide Library , Brain , Machine Learning

On the sparsity of fitness functions and implications for learning.

Brookes, David H; Aghazadeh, Amirali; Listgarten, Jennifer.

Proc Natl Acad Sci U S A ; 119(1)2022 01 04.

Article in English | MEDLINE | ID: mdl-34937698

ABSTRACT

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model's interpretable parameters (sequence length, alphabet size, and assumed interactions between sequence positions) on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.

Subject(s)

Epistasis, Genetic , Learning/physiology , Algorithms , Models, Theoretical

Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions.

Aghazadeh, Amirali; Nisonoff, Hunter; Ocal, Orhan; Brookes, David H; Huang, Yijie; Koyluoglu, O Ozan; Listgarten, Jennifer; Ramchandran, Kannan.

Nat Commun ; 12(1): 5225, 2021 09 01.

Article in English | MEDLINE | ID: mdl-34471113

ABSTRACT

Despite recent advances in high-throughput combinatorial mutagenesis assays, the number of labeled sequences available to predict molecular functions has remained small for the vastness of the sequence space combined with the ruggedness of many fitness functions. While deep neural networks (DNNs) can capture high-order epistatic interactions among the mutational sites, they tend to overfit to the small number of labeled sequences available for training. Here, we developed Epistatic Net (EN), a method for spectral regularization of DNNs that exploits evidence that epistatic interactions in many fitness functions are sparse. We built a scalable extension of EN, usable for larger sequences, which enables spectral regularization using fast sparse recovery algorithms informed by coding theory. Results on several biological landscapes show that EN consistently improves the prediction accuracy of DNNs and enables them to outperform competing models which assume other priors. EN estimates the higher-order epistatic interactions of DNNs trained on massive sequence spaces, a computational problem that otherwise takes years to solve.

Subject(s)

Algorithms , Neural Networks, Computer , Bacteria , Green Fluorescent Proteins

Improvements to the APBS biomolecular solvation software suite.

Jurrus, Elizabeth; Engel, Dave; Star, Keith; Monson, Kyle; Brandi, Juan; Felberg, Lisa E; Brookes, David H; Wilson, Leighton; Chen, Jiahui; Liles, Karina; Chun, Minju; Li, Peter; Gohara, David W; Dolinsky, Todd; Konecny, Robert; Koes, David R; Nielsen, Jens Erik; Head-Gordon, Teresa; Geng, Weihua; Krasny, Robert; Wei, Guo-Wei; Holst, Michael J; McCammon, J Andrew; Baker, Nathan A.

Protein Sci ; 27(1): 112-128, 2018 01.

Article in English | MEDLINE | ID: mdl-28836357

ABSTRACT

The Adaptive Poisson-Boltzmann Solver (APBS) software was developed to solve the equations of continuum electrostatics for large biomolecular assemblages that have provided impact in the study of a broad range of chemical, biological, and biomedical applications. APBS addresses the three key technology challenges for understanding solvation and electrostatics in biomedical applications: accurate and efficient models for biomolecular solvation and electrostatics, robust and scalable software for applying those theories to biomolecular systems, and mechanisms for sharing and analyzing biomolecular electrostatics data in the scientific community. To address new research applications and advancing computational capabilities, we have continually updated APBS and its suite of accompanying software since its release in 2001. In this article, we discuss the models and capabilities that have recently been implemented within the APBS software package including a Poisson-Boltzmann analytical and a semi-analytical solver, an optimized boundary element solver, a geometry-based geometric flow solvation model, a graph theory-based algorithm for determining pKa values, and an improved web-based visualization tool for viewing electrostatics.

Subject(s)

Models, Molecular , Software , Static Electricity

PB-AM: An open-source, fully analytical linear poisson-boltzmann solver.

Felberg, Lisa E; Brookes, David H; Yap, Eng-Hui; Jurrus, Elizabeth; Baker, Nathan A; Head-Gordon, Teresa.

J Comput Chem ; 38(15): 1275-1282, 2017 06 05.

Article in English | MEDLINE | ID: mdl-27804145

ABSTRACT

We present the open source distributed software package Poisson-Boltzmann Analytical Method (PB-AM), a fully analytical solution to the linearized PB equation, for molecules represented as non-overlapping spherical cavities. The PB-AM software package includes the generation of output files appropriate for visualization using visual molecular dynamics, a Brownian dynamics scheme that uses periodic boundary conditions to simulate dynamics, the ability to specify docking criteria, and offers two different kinetics schemes to evaluate biomolecular association rate constants. Given that PB-AM defines mutual polarization completely and accurately, it can be refactored as a many-body expansion to explore 2- and 3-body polarization. Additionally, the software has been integrated into the Adaptive Poisson-Boltzmann Solver (APBS) software package to make it more accessible to a larger group of scientists, educators, and students that are more familiar with the APBS framework. © 2016 Wiley Periodicals, Inc.

Subject(s)

Molecular Dynamics Simulation , Proteins/chemistry , Software , Algorithms , Kinetics , Static Electricity

Finding Our Way in the Dark Proteome.

Bhowmick, Asmit; Brookes, David H; Yost, Shane R; Dyson, H Jane; Forman-Kay, Julie D; Gunter, Daniel; Head-Gordon, Martin; Hura, Gregory L; Pande, Vijay S; Wemmer, David E; Wright, Peter E; Head-Gordon, Teresa.

J Am Chem Soc ; 138(31): 9730-42, 2016 08 10.

Article in English | MEDLINE | ID: mdl-27387657

ABSTRACT

The traditional structure-function paradigm has provided significant insights for well-folded proteins in which structures can be easily and rapidly revealed by X-ray crystallography beamlines. However, approximately one-third of the human proteome is comprised of intrinsically disordered proteins and regions (IDPs/IDRs) that do not adopt a dominant well-folded structure, and therefore remain "unseen" by traditional structural biology methods. This Perspective considers the challenges raised by the "Dark Proteome", in which determining the diverse conformational substates of IDPs in their free states, in encounter complexes of bound states, and in complexes retaining significant disorder requires an unprecedented level of integration of multiple and complementary solution-based experiments that are analyzed with state-of-the-art molecular simulation, Bayesian probabilistic models, and high-throughput computation. We envision how these diverse experimental and computational tools can work together through formation of a "computational beamline" that will allow key functional features to be identified in IDP structural ensembles.

Subject(s)

Computational Biology , Intrinsically Disordered Proteins/chemistry , Proteome , Bayes Theorem , Chromatography, Gel , Crystallography, X-Ray , Genome, Human , Humans , Kinetics , Magnetic Resonance Spectroscopy , Molecular Dynamics Simulation , Probability , Protein Conformation , Protein Folding , Proteomics/methods , Software

Experimental Inferential Structure Determination of Ensembles for Intrinsically Disordered Proteins.

Brookes, David H; Head-Gordon, Teresa.

J Am Chem Soc ; 138(13): 4530-8, 2016 Apr 06.

Article in English | MEDLINE | ID: mdl-26967199

ABSTRACT

We develop a Bayesian approach to determine the most probable structural ensemble model from candidate structures for intrinsically disordered proteins (IDPs) that takes full advantage of NMR chemical shifts and J-coupling data, their known errors and variances, and the quality of the theoretical back-calculation from structure to experimental observables. Our approach differs from previous formulations in the optimization of experimental and back-calculation nuisance parameters that are treated as random variables with known distributions, as opposed to structural or ensemble weight optimization or use of a reference ensemble. The resulting experimental inferential structure determination (EISD) method is size extensive with O(N) scaling, with N = number of structures, that allows for the rapid ranking of large ensemble data comprising tens of thousands of conformations. We apply the EISD approach on singular folded proteins and a corresponding set of ~25 000 misfolded states to illustrate the problems that can arise using Boltzmann weighted priors. We then apply the EISD method to rank IDP ensembles most consistent with the NMR data and show that the primary error for ranking or creating good IDP ensembles resides in the poor back-calculation from structure to simulated experimental observable. We show that a reduction by a factor of 3 in the uncertainty of the back-calculation error can improve the discrimination among qualitatively different IDP ensembles for the amyloid-beta peptide.

Subject(s)

Intrinsically Disordered Proteins/chemistry , Models, Molecular , Amyloid beta-Peptides/chemistry , Bayes Theorem , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation , Protein Folding

Family of Oxygen-Oxygen Radial Distribution Functions for Water.

Brookes, David H; Head-Gordon, Teresa.

J Phys Chem Lett ; 6(15): 2938-43, 2015 Aug 06.

Article in English | MEDLINE | ID: mdl-26267185

ABSTRACT

In a typical X-ray diffraction experiment, the elastically scattered intensity, I(Q), is the experimental observable. I(Q) contains contributions from both intramolecular as well as intermolecular correlations embodied in the scattering factors, H_OO(Q) and H_OH(Q), with negligible contributions from H_HH(Q). Thus, to accurately define the oxygen-oxygen radial distribution function, g_OO(r), a model of the electron density is required to accurately weigh the H_OO(Q) component relative to the intramolecular and oxygen-hydrogen correlations from the total intensity observable. In this work, we carefully define the electron density model and its underlying assumptions and more explicitly utilize two restraints on the allowable g_OO(r) functions, which must conform to both very low experimental errors at high Q and the need to satisfy the isothermal compressibility at low Q. Although highly restrained by these conditions, the underdetermined nature of the problem is such that we present a family of g_OO(r) values that provide equally good agreement with the high-Q intensity and compressibility restraints and with physically correct behavior at small r.

Subject(s)

Oxygen/chemistry , Water/chemistry , Models, Theoretical , X-Ray Diffraction

Role of hydrophilicity and length of diblock arms for determining star polymer physical properties.

Felberg, Lisa E; Brookes, David H; Head-Gordon, Teresa; Rice, Julia E; Swope, William C.

J Phys Chem B ; 119(3): 944-57, 2015 Jan 22.

Article in English | MEDLINE | ID: mdl-25254622

ABSTRACT

We present a molecular simulation study of star polymers consisting of 16 diblock copolymer arms bound to a small adamantane core by varying both arm length and the outer hydrophilic block when attached to the same hydrophobic block of poly-δ-valerolactone. Here we consider two biocompatible star polymers in which the hydrophilic block is composed of polyethylene glycol (PEG) or polymethyloxazoline (POXA) in addition to a polycarbonate-based polymer with a pendant hydrophilic group (PC1). We find that the different hydrophilic blocks of the star polymers show qualitatively different trends in their interactions with aqueous solvent, orientational time correlation functions, and orientational correlation between pairs of monomers of their polymeric arms in solution, in which we find that the PEG polymers are more thermosensitive compared with the POXA and PC1 star polymers over the physiological temperature range we have investigated.

Subject(s)

Hydrophobic and Hydrophilic Interactions , Physical Phenomena , Polymers/chemistry , Drug Carriers/chemistry , Models, Molecular , Molecular Conformation , Temperature
