1.
Bioinformatics ; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38905502

ABSTRACT

SUMMARY: The design of two overlapping genes in a microbial genome is an emerging technique for adding more reliable control mechanisms in engineered organisms for increased stability. The design of functional overlapping gene pairs is a challenging procedure, and computational design tools are used to improve the efficiency to deploy successful designs in genetically engineered systems. GENTANGLE (Gene Tuples ArraNGed in overLapping Elements) is a high-performance containerized pipeline for the computational design of two overlapping genes translated in different reading frames of the genome. This new software package can be used to design and test gene entanglements for microbial engineering projects using arbitrary sets of user-specified gene pairs. AVAILABILITY AND IMPLEMENTATION: The GENTANGLE source code and its submodules are freely available on GitHub at https://github.com/BiosecSFA/gentangle. The DATANGLE (DATA for genTANGLE) repository contains related data and results and is freely available on GitHub at https://github.com/BiosecSFA/datangle. The GENTANGLE container is freely available on Singularity Cloud Library at https://cloud.sylabs.io/library/khyox/gentangle/gentangle.sif. The GENTANGLE repository wiki (https://github.com/BiosecSFA/gentangle/wiki), website (https://biosecsfa.github.io/gentangle/), and user manual contain detailed instructions on how to use the different components of software and data, including examples and reproducing the results. The code is licensed under the GNU Affero General Public License version 3 (https://www.gnu.org/licenses/agpl.html).


Subject(s)
Software , Computational Biology/methods , Genome, Microbial , Genetic Engineering/methods
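
To make the phrase "translated in different reading frames" in the abstract above concrete, the following minimal sketch translates one DNA string in two frames using the standard genetic code. It only illustrates the concept and is not GENTANGLE's design algorithm; the function name `translate` and the toy sequence are assumptions introduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` starting at offset `frame` (0, 1, or 2).

    Trailing bases that do not fill a complete codon are ignored.
    Stop codons are rendered as '*'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Hypothetical toy sequence: the same nucleotides give different peptides per frame.
toy = "ATGGCATAA"
frame0 = translate(toy, 0)  # codons ATG GCA TAA
frame1 = translate(toy, 1)  # codons TGG CAT
```

Because both frames read the same nucleotides, changing a single base can alter both encoded proteins at once, which is why designing a functional overlapping pair is a constrained problem.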
2.
bioRxiv ; 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38645011

ABSTRACT

Rubisco is the primary CO2 fixing enzyme of the biosphere yet has slow kinetics. The roles of evolution and chemical mechanism in constraining the sequence landscape of rubisco remain debated. In order to map sequence to function, we developed a massively parallel assay for rubisco using an engineered E. coli where enzyme function is coupled to growth. By assaying >99% of single amino acid mutants across CO2 concentrations, we inferred enzyme velocity and CO2 affinity for thousands of substitutions. We identified many highly conserved positions that tolerate mutation and rare mutations that improve CO2 affinity. These data suggest that non-trivial kinetic improvements are readily accessible and provide a comprehensive sequence-to-function mapping for enzyme engineering efforts.

3.
bioRxiv ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38617247

ABSTRACT

Structured RNA lies at the heart of many central biological processes, from gene expression to catalysis. While advances in deep learning enable the prediction of accurate protein structural models, RNA structure prediction is not possible at present due to a lack of abundant high-quality reference data. Furthermore, available sequence data are generally not associated with organismal phenotypes that could inform RNA function. We created GARNET (Gtdb Acquired RNa with Environmental Temperatures), a new database for RNA structural and functional analysis anchored to the Genome Taxonomy Database (GTDB). GARNET links RNA sequences derived from GTDB genomes to experimental and predicted optimal growth temperatures of GTDB reference organisms. This enables construction of deep and diverse RNA sequence alignments to be used for machine learning. Using GARNET, we define the minimal requirements for a sequence- and structure-aware RNA generative model. We also develop a GPT-like language model for RNA in which triplet tokenization provides optimal encoding. Leveraging hyperthermophilic RNAs in GARNET and these RNA generative models, we identified mutations in ribosomal RNA that confer increased thermostability to the Escherichia coli ribosome. The GTDB-derived data and deep learning models presented here provide a foundation for understanding the connections between RNA sequence, structure, and function.
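
The triplet tokenization mentioned in the abstract above can be sketched as follows. This assumes non-overlapping 3-nucleotide tokens with any short remainder kept as its own token; the abstract does not specify these details, and the names `triplet_tokenize` and `build_vocab` are hypothetical.

```python
from typing import Dict, Iterable, List


def triplet_tokenize(rna: str) -> List[str]:
    """Split an RNA sequence into non-overlapping 3-nucleotide tokens.

    Assumption: a trailing remainder of 1 or 2 nucleotides becomes its own token.
    DNA-style 'T' is mapped to 'U' so inputs are treated uniformly.
    """
    rna = rna.upper().replace("T", "U")
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign integer ids to tokens in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


tokens = triplet_tokenize("AUGGCUA")
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]
```

With four nucleotides, full triplets give a base vocabulary of 64 tokens, compared with 4 for single-nucleotide tokenization, so each token carries more context at the cost of a larger embedding table.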

4.
Nucleic Acids Res ; 51(22): 12414-12427, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37971304

ABSTRACT

RNA-guided endonucleases form the crux of diverse biological processes and technologies, including adaptive immunity, transposition, and genome editing. Some of these enzymes are components of insertion sequences (IS) in the IS200/IS605 and IS607 transposon families. Both IS families encode a TnpA transposase and a TnpB nuclease, an RNA-guided enzyme ancestral to CRISPR-Cas12s. In eukaryotes, TnpB homologs occur as two distinct types, Fanzor1s and Fanzor2s. We analyzed the evolutionary relationships between prokaryotic TnpBs and eukaryotic Fanzors, which revealed that both Fanzor1s and Fanzor2s stem from a single lineage of IS607 TnpBs with unusual active site arrangement. The widespread nature of Fanzors implies that the properties of this particular lineage of IS607 TnpBs were particularly suited to adaptation in eukaryotes. Biochemical analysis of an IS607 TnpB and Fanzor1s revealed common strategies employed by TnpBs and Fanzors to co-evolve with their cognate transposases. Collectively, our results provide a new model of sequential evolution from IS607 TnpBs to Fanzor2s, and Fanzor2s to Fanzor1s that details how genes of prokaryotic origin evolve to give rise to new protein families in eukaryotes.


Subject(s)
Bacteria , Endonucleases , Evolution, Molecular , Bacteria/enzymology , Bacteria/genetics , DNA Transposable Elements , Endonucleases/genetics , Endonucleases/metabolism , Prokaryotic Cells/enzymology , Transposases/metabolism , Eukaryotic Cells/enzymology
5.
ACS Synth Biol ; 12(11): 3242-3251, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37888887

ABSTRACT

Predicting properties of proteins is of interest for basic biological understanding and protein engineering alike. Increasingly, machine learning (ML) approaches are being used for this task. However, the accuracy of such ML models typically degrades as test proteins stray further from the training data distribution. On the other hand, models that are more data-free, such as biophysics-based models, are typically uniformly accurate over all of the protein space, even if inferior for test points close to the training distribution. Consequently, being able to cohesively blend these two types of information within one model, as appropriate in different parts of the protein space, will improve overall performance. Herein, we tackle just this problem to yield a simple, practical, and scalable approach that can be easily implemented. In particular, we use a Bayesian formulation to integrate biophysical knowledge into neural networks. However, in doing so, a technical challenge arises: Bayesian neural networks (BNNs) enable the user to specify prior information only on the neural network weight parameters, rather than on the function values given to us from a typical biophysics-based model. Consequently, we devise a principled probabilistic method to overcome this challenge. Our approach yields intuitively pleasing results: predictions rely more heavily on the biophysical prior information when the BNN epistemic uncertainty (uncertainty arising from a lack of training data rather than sensor noise) is large and more heavily on the neural network when the epistemic uncertainty is small. We demonstrate this approach on an illustrative synthetic example, on two examples of protein property prediction (fluorescence and binding), and for generality on one small molecule property prediction problem.


Subject(s)
Machine Learning , Neural Networks, Computer , Bayes Theorem , Proteins
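
The qualitative behavior described in the abstract above (lean on the biophysical prior when the network is uncertain, and on the network when it is confident) can be illustrated with a much simpler stand-in: precision-weighted averaging of two Gaussian estimates. This is not the paper's method, which places function-space prior information into a Bayesian neural network; the function `blend` and its arguments are assumptions made for illustration only.

```python
def blend(prior_mean: float, prior_var: float, nn_mean: float, nn_var: float):
    """Combine two Gaussian estimates by inverse-variance (precision) weighting.

    prior_mean, prior_var: prediction and variance from a biophysics-style model.
    nn_mean, nn_var: prediction and epistemic variance from a neural network.
    Returns the blended mean and variance.
    """
    w_prior = 1.0 / prior_var
    w_nn = 1.0 / nn_var
    mean = (w_prior * prior_mean + w_nn * nn_mean) / (w_prior + w_nn)
    var = 1.0 / (w_prior + w_nn)
    return mean, var


# Confident network (tiny variance): the blend follows the network.
confident_mean, _ = blend(0.0, 1.0, 10.0, 1e-6)
# Uncertain network (huge variance): the blend falls back to the prior.
uncertain_mean, _ = blend(0.0, 1.0, 10.0, 1e6)
# Equal variances: the blend is the midpoint with halved variance.
equal_mean, equal_var = blend(0.0, 1.0, 10.0, 1.0)
```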
6.
J Med Chem ; 66(19): 13384-13399, 2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37774359

ABSTRACT

Protein tyrosine phosphatase SHP2 mediates RAS-driven MAPK signaling and has emerged in recent years as a target of interest in oncology, both for treating with a single agent and in combination with a KRAS inhibitor. We were drawn to the pharmacological potential of SHP2 inhibition, especially following the initial observation that drug-like compounds could bind an allosteric site and enforce a closed, inactive state of the enzyme. Here, we describe the identification and characterization of GDC-1971 (formerly RLY-1971), a SHP2 inhibitor currently in clinical trials in combination with KRAS G12C inhibitor divarasib (GDC-6036) for the treatment of solid tumors driven by a KRAS G12C mutation.

7.
bioRxiv ; 2023 Aug 10.
Article in English | MEDLINE | ID: mdl-37609353

ABSTRACT

RNA-guided endonucleases form the crux of diverse biological processes and technologies, including adaptive immunity, transposition, and genome editing. Some of these enzymes are components of insertion sequences (IS) in the IS200/IS605 and IS607 transposon families. Both IS families encode a TnpA transposase and a TnpB nuclease, an RNA-guided enzyme ancestral to CRISPR-Cas12. In eukaryotes and their viruses, TnpB homologs occur as two distinct types, Fanzor1 and Fanzor2. We analyzed the evolutionary relationships between prokaryotic TnpBs and eukaryotic Fanzors, revealing that a clade of IS607 TnpBs with an unusual active site arrangement found primarily in Cyanobacteriota likely gave rise to both types of Fanzors. The widespread nature of Fanzors implies that the properties of this particular group of IS607 TnpBs were particularly suited to adaptation and evolution in eukaryotes and their viruses. Experimental characterization of a prokaryotic IS607 TnpB and virally encoded Fanzor1s uncovered features that may have fostered coevolution between TnpBs/Fanzors and their cognate transposases. Our results provide insight into the evolutionary origins of a ubiquitous family of RNA-guided proteins that shows remarkable conservation across domains of life.

8.
J Chem Inf Model ; 63(9): 2644-2650, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37086179

ABSTRACT

Fragment-based drug discovery has led to six approved drugs, but the small sizes of the chemical fragments used in such methods typically result in only weak interactions between the fragment and its target molecule, which makes it challenging to experimentally determine the three-dimensional poses fragments assume in the bound state. One computational approach that could help address this difficulty is long-timescale molecular dynamics (MD) simulations, which have been used in retrospective studies to recover experimentally known binding poses of fragments. Here, we present the results of long-timescale MD simulations that we used to prospectively discover binding poses for two series of fragments in allosteric pockets on a difficult and important pharmaceutical target, protein tyrosine phosphatase 1b (PTP1b). Our simulations reversibly sampled the fragment association and dissociation process. One of the binding pockets found in the simulations has not to our knowledge been previously observed with a bound fragment, and the other pocket adopted a very rare conformation. We subsequently obtained high-resolution crystal structures of members of each fragment series bound to PTP1b, and the experimentally observed poses confirmed the simulation results. To the best of our knowledge, our findings provide the first demonstration that MD simulations can be used prospectively to determine fragment binding poses to previously unidentified pockets.


Subject(s)
Molecular Dynamics Simulation , Protein Tyrosine Phosphatase, Non-Receptor Type 1 , Protein Tyrosine Phosphatase, Non-Receptor Type 1/chemistry , Crystallography, X-Ray , Retrospective Studies , Drug Discovery/methods , Protein Binding , Binding Sites
9.
Nat Biotechnol ; 40(7): 1114-1122, 2022 07.
Article in English | MEDLINE | ID: mdl-35039677

ABSTRACT

Machine learning-based models of protein fitness typically learn from either unlabeled, evolutionarily related sequences or variant sequences with experimentally measured labels. For regimes where only limited experimental data are available, recent work has suggested methods for combining both sources of information. Toward that goal, we propose a simple combination approach that is competitive with, and on average outperforms, more sophisticated methods. Our approach uses ridge regression on site-specific amino acid features combined with one probability density feature from modeling the evolutionary data. Within this approach, we find that a variational autoencoder-based probability density model showed the best overall performance, although any evolutionary density model can be used. Moreover, our analysis highlights the importance of systematic evaluations and sufficient baselines.


Subject(s)
Machine Learning , Proteins , Proteins/chemistry , Proteins/genetics
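
The combination described in the abstract above (ridge regression on site-specific amino acid features plus one evolutionary density feature) might look roughly like the sketch below. It assumes one-hot encoding for the site-specific features and a closed-form ridge solver with a single shared penalty; the toy sequences, density scores, and labels are invented for illustration and do not come from the paper.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> np.ndarray:
    """Site-specific one-hot encoding, flattened to length len(seq) * 20."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def build_features(seqs, density_scores) -> np.ndarray:
    """Append one density feature (e.g. a model log-likelihood) to the one-hot features."""
    site = np.stack([one_hot(s) for s in seqs])
    dens = np.asarray(density_scores, dtype=float).reshape(-1, 1)
    return np.hstack([site, dens])


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return X @ w + b


# Hypothetical toy data: four equal-length variants, their density scores, and labels.
seqs = ["ACD", "ACE", "GCD", "GCE"]
density = [-1.0, -1.5, -2.0, -3.0]
y = np.array([1.0, 0.8, 0.4, 0.1])

X = build_features(seqs, density)
w, b = fit_ridge(X, y, alpha=1.0)
preds = predict(X, w, b)
```

In practice the density feature would come from a model fit to evolutionarily related sequences, and the penalty strength would be chosen by cross-validation; both choices are left open here.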
10.
Nat Commun ; 12(1): 5225, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34471113

ABSTRACT

Despite recent advances in high-throughput combinatorial mutagenesis assays, the number of labeled sequences available to predict molecular functions has remained small for the vastness of the sequence space combined with the ruggedness of many fitness functions. While deep neural networks (DNNs) can capture high-order epistatic interactions among the mutational sites, they tend to overfit to the small number of labeled sequences available for training. Here, we developed Epistatic Net (EN), a method for spectral regularization of DNNs that exploits evidence that epistatic interactions in many fitness functions are sparse. We built a scalable extension of EN, usable for larger sequences, which enables spectral regularization using fast sparse recovery algorithms informed by coding theory. Results on several biological landscapes show that EN consistently improves the prediction accuracy of DNNs and enables them to outperform competing models which assume other priors. EN estimates the higher-order epistatic interactions of DNNs trained on massive sequence spaces, a computational problem that otherwise takes years to solve.


Subject(s)
Algorithms , Neural Networks, Computer , Bacteria , Green Fluorescent Proteins
11.
Sci Adv ; 7(17), 2021 04.
Article in English | MEDLINE | ID: mdl-33883145

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) of tissues has revealed remarkable heterogeneity of cell types and states but does not provide information on the spatial organization of cells. To better understand how individual cells function within an anatomical space, we developed XYZeq, a workflow that encodes spatial metadata into scRNA-seq libraries. We used XYZeq to profile mouse tumor models to capture spatially barcoded transcriptomes from tens of thousands of cells. Analyses of these data revealed the spatial distribution of distinct cell types and a cell migration-associated transcriptomic program in tumor-associated mesenchymal stem cells (MSCs). Furthermore, we identify localized expression of tumor suppressor genes by MSCs that vary with proximity to the tumor core. We demonstrate that XYZeq can be used to map the transcriptome and spatial localization of individual cells in situ to reveal how cell composition and cell states can be affected by location within complex pathological tissue.


Subject(s)
Neoplasms , Single-Cell Analysis , Animals , Gene Expression Profiling , Mice , Neoplasms/genetics , Sequence Analysis, RNA , Transcriptome , Tumor Microenvironment/genetics , Exome Sequencing
12.
J Chem Inf Model ; 60(10): 4487-4496, 2020 10 26.
Article in English | MEDLINE | ID: mdl-32697578

ABSTRACT

Drug discovery projects entail cycles of design, synthesis, and testing that yield a series of chemically related small molecules whose properties, such as binding affinity to a given target protein, are progressively tailored to a particular drug discovery goal. The use of deep-learning technologies could augment the typical practice of using human intuition in the design cycle, and thereby expedite drug discovery projects. Here, we present DESMILES, a deep neural network model that advances the state of the art in machine learning approaches to molecular design. We applied DESMILES to a previously published benchmark that assesses the ability of a method to modify input molecules to inhibit the dopamine receptor D2, and DESMILES yielded a 77% lower failure rate compared to state-of-the-art models. To explain the ability of DESMILES to hone molecular properties, we visualize a layer of the DESMILES network, and further demonstrate this ability by using DESMILES to tailor the same molecules used in the D2 benchmark test to dock more potently against seven different receptors.


Subject(s)
Deep Learning , Drug Discovery , Humans , Machine Learning , Neural Networks, Computer
13.
J Comput Chem ; 39(30): 2494-2507, 2018 11 15.
Article in English | MEDLINE | ID: mdl-30368845

ABSTRACT

We present OSPREY 3.0, a new and greatly improved release of the OSPREY protein design software. OSPREY 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of OSPREY when running the same algorithms on the same hardware. Moreover, OSPREY 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of OSPREY, OSPREY 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that OSPREY 3.0 accurately predicts the effect of mutations on protein-protein binding. OSPREY 3.0 is available at http://www.cs.duke.edu/donaldlab/osprey.php as free and open-source software. © 2018 Wiley Periodicals, Inc.


Subject(s)
Protein Conformation , Proteins/chemistry , Software , Algorithms , Models, Molecular , Protein Binding
14.
Glob Public Health ; 13(4): 503-517, 2018 04.
Article in English | MEDLINE | ID: mdl-27416306

ABSTRACT

Although approximately one half of Guatemalans are indigenous, the Guatemalan Maya account for 72% of the extremely poor within the country. While some biomedical services are available in these communities, many Maya utilise traditional medicine as a significant, if not primary, source of health care. While existing medical anthropological research characterises these modes of medicine as medically dichotomous or pluralistic, our research in a Maya community of the Western Highlands, Concepción Huista, builds on previous studies and finds instead a syncretistic, imbricated local health system. We find significant overlap and interpenetration of the biomedical and traditional medical models that are described best as a framework where practitioners in both settings employ elements of the other in order to best meet community needs. By focusing on the practitioner's perspective, we demonstrate that in addition to patients' willingness to seek care across health systems, practitioners converse across seemingly distinct systems via incorporation of certain elements of the 'other'. Interventions to date have not accounted for this imbrication. Guatemalan governmental policies to support local healers have led to little practical change in the health-care landscape of the country. Therefore, understanding this complex imbrication is crucial for interventions and policy changes.


Subject(s)
Delivery of Health Care/organization & administration , Health Services, Indigenous/organization & administration , Medicine, Traditional , Rural Health Services/organization & administration , Adult , Aged , Aged, 80 and over , Anthropology, Medical , Female , Guatemala , Humans , Male , Middle Aged
15.
Curr Opin Struct Biol ; 39: 16-26, 2016 08.
Article in English | MEDLINE | ID: mdl-27086078

ABSTRACT

Computational structure-based protein design programs are becoming an increasingly important tool in molecular biology. These programs compute protein sequences that are predicted to fold to a target structure and perform a desired function. The success of a program's predictions largely relies on two components: first, the input biophysical model, and second, the algorithm that computes the best sequence(s) and structure(s) according to the biophysical model. Improving both the model and the algorithm in tandem is essential to improving the success rate of current programs, and here we review recent developments in algorithms for protein design, emphasizing how novel algorithms enable the use of more accurate biophysical models. We conclude with a list of algorithmic challenges in computational protein design that we believe will be especially important for the design of therapeutic proteins and protein assemblies.


Subject(s)
Algorithms , Computational Biology/methods , Protein Engineering/methods , Proteins/genetics , Heuristics , Proteins/chemistry , Proteins/metabolism , Thermodynamics