Results 1 - 14 of 14
1.
J Cheminform ; 15(1): 59, 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37291633

ABSTRACT

The vast size of chemical space necessitates computational approaches to automate and accelerate the design of molecular sequences to guide experimental efforts for drug discovery. Genetic algorithms provide a useful framework to incrementally generate molecules by applying mutations to known chemical structures. Recently, masked language models have been applied to automate the mutation process by leveraging large compound libraries to learn commonly occurring chemical sequences (i.e., using tokenization) and predict rearrangements (i.e., using mask prediction). Here, we consider how language models can be adapted to improve molecule generation for different optimization tasks. We use two different generation strategies for comparison, fixed and adaptive. The fixed strategy uses a pre-trained model to generate mutations; the adaptive strategy trains the language model on each new generation of molecules selected for target properties during optimization. Our results show that the adaptive strategy allows the language model to more closely fit the distribution of molecules in the population. Therefore, for enhanced fitness optimization, we suggest the use of the fixed strategy during an initial phase followed by the use of the adaptive strategy. We demonstrate the impact of adaptive training by searching for molecules that optimize both heuristic metrics, drug-likeness and synthesizability, as well as predicted protein binding affinity from a surrogate model. Our results show that the adaptive strategy provides a significant improvement in fitness optimization compared to the fixed pre-trained model, empowering the application of language models to molecular design tasks.
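The fixed and adaptive mutation strategies described above can be illustrated with a toy genetic algorithm. This is a minimal sketch under our own assumptions, not the authors' implementation. Molecules are short strings over a hypothetical four-token alphabet, the `fitness` function is a hypothetical placeholder for drug-likeness or predicted binding affinity, and a smoothed unigram token-frequency model stands in for the masked language model. The "fixed" model is fit once on the initial population, the "adaptive" model is refit on each new generation, and a hypothetical "hybrid" option runs the fixed phase first and then switches, following the suggestion in the abstract.

```python
import random
from collections import Counter

# Hypothetical toy vocabulary; real work would use SMILES/SELFIES tokens.
ALPHABET = ["C", "N", "O", "S"]


def fitness(seq):
    # Hypothetical objective standing in for drug-likeness or binding affinity.
    return seq.count("N")


def fit_token_model(population):
    # Add-one-smoothed unigram frequencies: a crude stand-in for a masked LM.
    counts = Counter(tok for seq in population for tok in seq)
    total = sum(counts.values()) + len(ALPHABET)
    return {tok: (counts[tok] + 1) / total for tok in ALPHABET}


def mutate(seq, model, rng):
    # "Mask" one position and fill it by sampling from the token model.
    i = rng.randrange(len(seq))
    tokens = list(model)
    fill = rng.choices(tokens, weights=[model[t] for t in tokens], k=1)[0]
    return seq[:i] + fill + seq[i + 1:]


def run_ga(population, generations, strategy="fixed", switch_at=0, seed=0):
    rng = random.Random(seed)
    size = len(population)
    # "Pre-trained" once on the initial data; never updated in the fixed strategy.
    model = fit_token_model(population)
    for gen in range(generations):
        adaptive = strategy == "adaptive" or (strategy == "hybrid" and gen >= switch_at)
        if adaptive:
            # Adaptive strategy: refit the model on the currently selected population.
            model = fit_token_model(population)
        children = [mutate(rng.choice(population), model, rng) for _ in range(size)]
        # Elitist selection keeps the fittest `size` individuals, best first.
        population = sorted(population + children, key=fitness, reverse=True)[:size]
    return population
```

For example, `run_ga(["CCOSCC", "CNCCOS", "SSCCOO", "OCCNCC"], 30, strategy="hybrid", switch_at=10)[0]` returns the best sequence found. Because selection is elitist, the best fitness never decreases; refitting the token model concentrates sampling on tokens already common in the selected population, which is the intuition behind the adaptive strategy fitting the population distribution more closely.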

2.
Cancer Biomark ; 33(2): 185-198, 2022.
Article in English | MEDLINE | ID: mdl-35213361

ABSTRACT

BACKGROUND: With the use of artificial intelligence and machine learning techniques for biomedical informatics, security and privacy concerns over the data and subject identities have also become an important issue and an essential research topic. Without intentional safeguards, machine learning models may find patterns and features to improve task performance that are associated with private personal information. OBJECTIVE: The privacy vulnerability of deep learning models for information extraction from medical textual content needs to be quantified, since the models are exposed to private health information and personally identifiable information. The objective of the study is to quantify the privacy vulnerability of deep learning models for natural language processing and to explore a proper way of securing patients' information to mitigate confidentiality breaches. METHODS: The target model is the multitask convolutional neural network for information extraction from cancer pathology reports, where the data for training the model are from multiple state population-based cancer registries. This study proposes the following schemes to collect vocabularies from the cancer pathology reports: (a) words appearing in multiple registries, and (b) words that have higher mutual information. We performed membership inference attacks on the models in high-performance computing environments. RESULTS: The comparison outcomes suggest that the proposed vocabulary selection methods resulted in lower privacy vulnerability while maintaining the same level of clinical task performance.


Subject(s)
Confidentiality , Deep Learning , Information Storage and Retrieval/methods , Natural Language Processing , Neoplasms/epidemiology , Artificial Intelligence , Deep Learning/standards , Humans , Neoplasms/pathology , Registries
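The two vocabulary-collection schemes in the METHODS section can be sketched as follows. This is an illustrative sketch, assuming each report is represented as a set of words and that mutual information is computed between word presence and a single class label. The function names `shared_vocab`, `mutual_information`, and `top_mi_vocab` are hypothetical, and the study's exact MI formulation and thresholds are not given in the abstract.

```python
import math
from collections import Counter


def shared_vocab(docs_by_registry, min_registries=2):
    # Scheme (a): keep words that occur in at least `min_registries` registries.
    seen_in = Counter()
    for docs in docs_by_registry.values():
        seen_in.update(set().union(*docs))
    return {w for w, c in seen_in.items() if c >= min_registries}


def mutual_information(docs, labels, word):
    # MI (in nats) between the binary event "word in report" and the class label.
    n = len(docs)
    joint = Counter((word in doc, y) for doc, y in zip(docs, labels))
    presence = Counter(word in doc for doc in docs)
    label_counts = Counter(labels)
    mi = 0.0
    for (present, y), c in joint.items():
        p_xy = c / n
        p_x = presence[present] / n
        p_y = label_counts[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def top_mi_vocab(docs, labels, k):
    # Scheme (b): keep the k words with the highest MI; ties broken alphabetically.
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return set(ranked[:k])
```

Restricting the vocabulary this way drops rare, registry-specific tokens, which is the kind of feature a membership inference attack could otherwise exploit.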
3.
IEEE J Biomed Health Inform ; 26(6): 2796-2803, 2022 06.
Article in English | MEDLINE | ID: mdl-35020599

ABSTRACT

Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominates. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e. keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set of keywords, including short phrases, associated with each class. The keywords are then used as additional data during each batch of model training, resulting in a training loss that has contributions from both raw data and keywords. We evaluate our approach on classification of cancer pathology reports, which shows a substantial increase in model performance for rare classes. Furthermore, we analyze the impact of keywords on model output probabilities for bigrams, providing a straightforward method to identify model difficulties for limited training data.


Subject(s)
Reproducibility of Results , Data Collection , Humans
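The keyword-augmented training loss described above can be sketched in a framework-agnostic way. This is a minimal sketch under stated assumptions: `model` is any hypothetical callable returning a dictionary of class probabilities, each batch draws a fixed number of keywords per class, and the relative weight `keyword_weight` is our own illustrative parameter, not a value reported by the authors.

```python
import math
import random


def cross_entropy(probs, label):
    # Negative log-likelihood of the true class under the model's probabilities.
    return -math.log(probs[label])


def sample_keyword_batch(keywords_by_class, rng, per_class=1):
    # Draw `per_class` keywords/short phrases for every class, labeled with that class.
    return [(kw, label)
            for label, kws in keywords_by_class.items()
            for kw in rng.sample(kws, k=min(per_class, len(kws)))]


def keyword_augmented_loss(model, batch, keyword_batch, keyword_weight=1.0):
    # Loss with contributions from both raw documents and class keywords.
    raw = sum(cross_entropy(model(text), y) for text, y in batch) / len(batch)
    kw = sum(cross_entropy(model(text), y) for text, y in keyword_batch) / len(keyword_batch)
    return raw + keyword_weight * kw
```

In a real training loop this scalar would be minimized with a gradient-based optimizer; the sketch only shows how every class, including rare ones, contributes keyword terms to each batch's loss.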
4.
Int J High Perform Comput Appl ; 36(5-6): 587-602, 2022 Nov.
Article in English | MEDLINE | ID: mdl-38603308

ABSTRACT

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.

5.
PLoS Comput Biol ; 17(3): e1008168, 2021 03.
Article in English | MEDLINE | ID: mdl-33735192

ABSTRACT

Spatial expansion of a population of cells can arise from growth of microorganisms, plant cells, and mammalian cells. It underlies normal or dysfunctional tissue development, and it can be exploited as the foundation for programming spatial patterns. This expansion is often driven by continuous growth and division of cells within a colony, which in turn pushes the peripheral cells outward. This process generates a repulsion velocity field at each location within the colony. Here we show that this process can be approximated as coarse-grained repulsive-expansion kinetics. This framework enables accurate and efficient simulation of growth and gene expression dynamics in radially symmetric colonies with homogeneous z-directional distribution. It is robust even if cells are not spherical and vary in size. The simplicity of the resulting mathematical framework also greatly facilitates generation of mechanistic insights.


Subject(s)
Cell Proliferation , Gene Expression , Animals , Gene Regulatory Networks , Kinetics , Models, Biological , N-Acetylmuramoyl-L-alanine Amidase/genetics
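The idea of a repulsion velocity field generated by growth inside the colony can be made concrete with a short derivation. This is a simplified illustration, assuming constant cell density ρ, a growth rate g(r) that depends only on the radial distance r, and a two-dimensional radially symmetric colony of radius R; it is not the coarse-grained kinetics of the paper itself.

```latex
\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{v}) = g\,\rho,
\quad \rho = \text{const}
\;\Longrightarrow\;
\nabla\cdot\mathbf{v} = g(r)
\;\Longrightarrow\;
\frac{1}{r}\frac{d}{dr}\bigl(r\,v_r\bigr) = g(r)
\;\Longrightarrow\;
v_r(r) = \frac{1}{r}\int_0^r g(s)\,s\,ds .

\text{Uniform growth } g(r) = g:\qquad
v_r(r) = \frac{g\,r}{2},\qquad
\frac{dR}{dt} = v_r(R) = \frac{g\,R}{2}
\;\Rightarrow\; R(t) = R_0\,e^{g t/2}.
```

For uniform growth the colony area πR² increases as e^{gt}, matching the per-cell growth rate, and the outward velocity is largest at the periphery, which is why peripheral cells are pushed outward fastest.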
6.
J Cheminform ; 13(1): 14, 2021 Feb 23.
Article in English | MEDLINE | ID: mdl-33622401

ABSTRACT

The process of drug discovery involves a search over the space of all possible chemical compounds. Generative Adversarial Networks (GANs) provide a valuable tool towards exploring chemical space and optimizing known compounds for a desired functionality. Standard approaches to training GANs, however, can result in mode collapse, in which the generator primarily produces samples closely related to a small subset of the training data. In contrast, the search for novel compounds necessitates exploration beyond the original data. Here, we present an approach to training GANs that promotes incremental exploration and limits the impacts of mode collapse using concepts from Genetic Algorithms. In our approach, valid samples from the generator are used to replace samples from the training data. We consider both random and guided selection along with recombination during replacement. By tracking the number of novel compounds produced during training, we show that updates to the training data drastically outperform the traditional approach, increasing potential applications for GANs in drug discovery.
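The training-data replacement step described above can be sketched as follows. This is a minimal sketch, assuming molecules are represented as strings and that `is_valid` and `score` are caller-supplied hypothetical functions (for example, a SMILES validity check and a property score). Recombination is omitted; the sketch illustrates random versus guided replacement and novelty counting, not the authors' code.

```python
import random


def replace_training_data(train, generated, is_valid, rng, mode="random", score=None):
    """Return a new training set in which valid generated samples replace old ones."""
    valid = [s for s in generated if is_valid(s)]
    new_train = list(train)  # leave the caller's list untouched
    if mode == "random":
        # Random selection: overwrite randomly chosen training entries.
        slots = rng.sample(range(len(new_train)), k=min(len(valid), len(new_train)))
        for i, s in zip(slots, valid):
            new_train[i] = s
    elif mode == "guided":
        # Guided selection: best valid samples replace the worst entries,
        # but only while the new sample scores strictly higher.
        worst_first = sorted(range(len(new_train)), key=lambda i: score(new_train[i]))
        for i, s in zip(worst_first, sorted(valid, key=score, reverse=True)):
            if score(s) <= score(new_train[i]):
                break
            new_train[i] = s
    else:
        raise ValueError(f"unknown mode: {mode}")
    return new_train


def count_novel(samples, reference):
    # Distinct generated samples not present in the original training data.
    return len(set(samples) - set(reference))
```

Calling `replace_training_data` once per training epoch lets the training distribution drift toward generated compounds, which is the mechanism the abstract credits for limiting mode collapse.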

7.
Sci Adv ; 4(11): eaau0695, 2018 11.
Article in English | MEDLINE | ID: mdl-30474057

ABSTRACT

In microbial communities, social interactions such as competition occur ubiquitously across multiple spatial scales from local proximity to remote distance. However, it remains unclear how such a spatial variation of interaction contributes to the structural development of microbial populations. Here, we developed synthetic consortia, biophysical theory, and simulations to elucidate the role of spatial interference scale in governing ecosystem organization during range expansion. For consortia with unidirectional interference, we discovered that, at growing fronts, the extinction time of toxin-sensitive species is reciprocal to the spatial interference scale. In contrast, for communities with bidirectional interference, their structures diverge into distinct monoculture colonies under different initial conditions, with the corresponding separatrix set by the spatial scale of interference. Near the separatrix, ecosystem development becomes noise-driven and yields opposite structures. Our results establish spatial interaction scale as a key determinant for microbial range expansion, providing insights into microbial spatial organization and synthetic ecosystem engineering.


Subject(s)
Ecosystem , Microbiota , Models, Theoretical , Spatio-Temporal Analysis
8.
Biophys J ; 114(3): 737-746, 2018 02 06.
Article in English | MEDLINE | ID: mdl-29414718

ABSTRACT

Quantitative modeling of gene circuits is fundamentally important to synthetic biology, as it offers the potential to transform circuit engineering from trial-and-error construction to rational design and, hence, to facilitate the advance of the field. Currently, typical models regard gene circuits as isolated entities and focus only on the biochemical processes within the circuits. However, this standard paradigm is being challenged by increasing experimental evidence suggesting that circuits and their host are intimately connected, and that their interactions can potentially impact circuit behaviors. Here we systematically examined the roles of circuit-host coupling in shaping circuit dynamics by using a self-activating gene switch as a model circuit. Through a combination of deterministic modeling, stochastic simulation, and Fokker-Planck equation formalism, we found that circuit-host coupling alters switch behaviors across multiple scales. At the single-cell level, it slows the switch dynamics in the high protein production regime and enlarges the difference between stable steady-state values. At the population level, it favors cells with low protein production through differential growth amplification. Together, the two-level coupling effects induce both quantitative and qualitative modulations of the switch, with the primary component of the effects determined by the circuit's architectural parameters. This study illustrates the complexity and importance of circuit-host coupling in modulating circuit behaviors, demonstrating the need for a new paradigm (integrated modeling of the circuit-host system) for quantitative understanding of engineered gene networks.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Proteins/metabolism , Synthetic Biology , Systems Biology , Humans , Models, Genetic , Proteins/genetics
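The effect of circuit-host coupling on a self-activating switch can be illustrated with a one-variable toy model. This sketch uses our own model form and parameters (Hill-type self-activation, and a host growth rate that falls with protein burden so that dilution slows at high expression); none of these equations or values come from the paper. Integrated from low and high initial conditions, the model is bistable, and coupling raises the high steady state, which is qualitatively consistent with the enlarged gap between stable steady states reported above.

```python
def simulate_switch(x0, coupling, t_end=100.0, dt=0.01,
                    basal=0.1, vmax=4.0, K=2.0, n=2, growth0=1.0):
    """Euler integration of a toy self-activating switch with growth-coupled dilution.

    All parameter names and values are illustrative assumptions.
    """
    x = x0
    for _ in range(int(t_end / dt)):
        production = basal + vmax * x**n / (K**n + x**n)  # Hill-type self-activation
        growth = growth0 / (1.0 + coupling * x)  # burden slows host growth (assumption)
        x += dt * (production - growth * x)  # dilution by growth is the only loss term
    return x
```

With `coupling=0.05`, starting from `x0=0.0` settles near the low state and starting from `x0=5.0` settles near the high state; setting `coupling=0.0` removes the burden feedback and gives a lower high state for comparison.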
9.
Nat Microbiol ; 2(12): 1658-1666, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28947816

ABSTRACT

One fundamental challenge in synthetic biology is the lack of quantitative tools that accurately describe and predict the behaviours of engineered gene circuits. This challenge arises from multiple factors, among which the complex interdependence of circuits and their host is a leading cause. Here we present a gene circuit modelling framework that explicitly integrates circuit behaviours with host physiology through bidirectional circuit-host coupling. The framework consists of a coarse-grained but mechanistic description of host physiology that involves dynamic resource partitioning, multilayered circuit-host coupling including both generic and system-specific interactions, and a detailed kinetic module of exogenous circuits. We showed that, following training, the framework was able to capture and predict a large set of experimental data concerning the host and its foreign gene overexpression. To demonstrate its utility, we applied the framework to examine a growth-modulating feedback circuit whose dynamics are qualitatively altered by circuit-host interactions. Using an extended version of the framework, we further systematically revealed the behaviours of a toggle switch across scales from single-cell dynamics to population structure and to spatial ecology. This work advances our quantitative understanding of gene circuit behaviours and also benefits the rational design of synthetic gene networks.


Subject(s)
Gene Regulatory Networks/genetics , Gene Regulatory Networks/physiology , Genes, Synthetic/genetics , Models, Genetic , Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Genetic Engineering , Kinetics , Synthetic Biology
10.
Nucleic Acids Res ; 45(2): 1005-1014, 2017 01 25.
Article in English | MEDLINE | ID: mdl-27899571

ABSTRACT

Controllable spatial patterning is a major goal for the engineering of biological systems. Recently, synthetic gene circuits have become promising tools to achieve the goal; however, they need to possess both functional robustness and tunability in order to facilitate future applications. Here we show that, by harnessing the dual signaling and antibiotic features of nisin, simple synthetic circuits can direct Lactococcus lactis populations to form programmed spatial band-pass structures that do not require fine-tuning and are robust against environmental and cellular context perturbations. Although robust, the patterns are highly tunable, with their band widths specified by the external nisin gradient and cellular nisin immunity. Additionally, the circuits can direct cells to consistently generate designed patterns, even when the gradient is driven by structured nisin-producing bacteria and the patterning cells are composed of multiple species. A mathematical model successfully reproduces all of the observed patterns. Furthermore, the circuits allow us to establish predictable structures of synthetic communities and controllable arrays of cellular stripes and spots in space. This study offers new synthetic biology tools to program spatial structures. It also demonstrates that a deep mining of natural functionalities of living systems is a valuable route to build circuit robustness and tunability.


Subject(s)
Gene Regulatory Networks , Models, Biological , Bacteria/genetics , Bacteria/metabolism , Computer Simulation , Environment , Gene Expression Profiling , Gene Expression Regulation, Bacterial , Gene-Environment Interaction , Nisin/biosynthesis , Signal Transduction , Synthetic Biology
11.
BMC Syst Biol ; 9: 59, 2015 Sep 16.
Article in English | MEDLINE | ID: mdl-26377684

ABSTRACT

BACKGROUND: Social interactions have been increasingly recognized as one of the major factors that contribute to the dynamics and function of bacterial communities. To understand their functional roles and enable the design of robust synthetic consortia, one fundamental step is to determine the relationship between the social interactions of individuals and the spatiotemporal structures of communities. RESULTS: We present a systematic computational survey on this relationship for two-species communities by developing and utilizing a hybrid computational framework that combines discrete element techniques with reaction-diffusion equations. We found that deleterious interactions cause an increased variance in relative abundance, a drastic decrease in surviving lineages, and a rough expanding front. In contrast, beneficial interactions contribute to a reduced variance in relative abundance, an enhancement in lineage number, and a smooth expanding front. We also found that mutualism promotes spatial homogeneity and population robustness while competition increases spatial segregation and population fluctuations. To examine the generality of these findings, a large set of initial conditions with varying density and species abundance was tested and analyzed. In addition, a simplified mathematical model was developed to provide an analytical interpretation of the findings. CONCLUSIONS: This work advances our fundamental understanding of bacterial social interactions and population structures and, simultaneously, benefits synthetic biology for facilitated engineering of artificial microbial consortia.


Subject(s)
Bacteria , Computational Biology/methods , Microbial Interactions , Bacteria/cytology , Bacteria/growth & development , Models, Biological , Spatio-Temporal Analysis
12.
ACS Synth Biol ; 4(3): 240-8, 2015 Mar 20.
Article in English | MEDLINE | ID: mdl-24635143

ABSTRACT

One promising frontier for synthetic biology is the development of synthetic ecologies, whereby interacting species form an additional layer of connectivity for engineered gene circuits. Toward this goal, an important step is to understand different types of bacterial interactions in natural settings, among which competition is the most prevalent. By constructing a two-species population dynamics model, here, we mimicked bacterial growth in nature with resource-limited fluctuating environments and searched for optimal strategies for bacterial exploitative competition. In a simple game with two strategy options (constant or susceptible growth), we found that the species playing the constant growth strategy always outplays or is evenly matched with its competitor, suggesting that constant growth is a "no-loss" good bet. We also showed that adoption of sophisticated strategies enables a species to maximize its fitness when its competitor grows susceptibly. The pursuit of fitness maximization is, however, associated with potential loss if both species are capable of strategy adjustment, indicating an intrinsic risk-return trade-off. These findings offer new insights into bacterial competition and may also facilitate the engineering of microbial consortia for synthetic biology applications.


Subject(s)
Bacterial Physiological Phenomena , Microbial Consortia/physiology , Microbial Interactions/physiology , Models, Biological
13.
Biophys J ; 107(9): 2112-21, 2014 Nov 04.
Article in English | MEDLINE | ID: mdl-25418096

ABSTRACT

The SNARE complex plays a vital role in vesicle fusion arising during neuronal exocytosis. Key components in the regulation of SNARE complex formation, and ultimately fusion, are the transmembrane and linker regions of the vesicle-associated protein, synaptobrevin. However, the membrane-embedded structure of synaptobrevin in its prefusion state, which determines its interaction with other SNARE proteins during fusion, is largely unknown. This study reports all-atom molecular-dynamics simulations of the prefusion configuration of synaptobrevin in a lipid bilayer, aimed at characterizing the insertion depth and the orientation of the protein in the membrane, as well as the nature of the amino acids involved in determining these properties. By characterizing the structural properties of both wild-type and mutant synaptobrevin, the effects of C-terminal additions on tilt and insertion depth of membrane-embedded synaptobrevin are determined. The simulations suggest a robust, highly tilted state for membrane-embedded synaptobrevin with a helical connection between the transmembrane and linker regions, leading to an apparently new characterization of structural elements in prefusion synaptobrevin and providing a framework for interpreting past mutation experiments.


Subject(s)
Lipid Bilayers/chemistry , R-SNARE Proteins/chemistry , Amino Acid Sequence , Molecular Dynamics Simulation , Mutation , Phosphatidylcholines/chemistry , R-SNARE Proteins/genetics
14.
BMC Syst Biol ; 8: 23, 2014 Feb 27.
Article in English | MEDLINE | ID: mdl-24576330

ABSTRACT

BACKGROUND: Contact-dependent inhibition (CDI) has been recently revealed as an intriguing but ubiquitous mechanism for bacterial competition in which a species injects toxins into its competitors through direct physical contact for growth suppression. Although the molecular and genetic aspects of CDI systems are being increasingly explored, a quantitative and systematic picture of how CDI systems benefit population competition and hence alter corresponding competition outcomes is not well elucidated. RESULTS: By constructing a mathematical model for a population consisting of CDI+ and CDI- species, we have systematically investigated the dynamics and possible outcomes of population competition. In the well-mixed case, we found that the two species are mutually exclusive: Competition always results in extinction for one of the two species, with the winner determined by the tradeoff between the competitive benefit of the CDI+ species and its growth disadvantage from increased metabolic burden. Initial conditions in certain circumstances can also alter the outcome of competition. In the spatial case, in addition to exclusive extinction, coexistence and localized patterns may emerge from population competition. For spatial coexistence, population diffusion is also important in influencing the outcome. Using a set of illustrative examples, we further showed that our results hold true when the competition of the population is extended from one to two dimensional space. CONCLUSIONS: We have revealed that the competition of a population with CDI can produce diverse patterns, including extinction, coexistence, and localized aggregation. The emergence, relative abundance, and characteristic features of these patterns are collectively determined by the competitive benefit of CDI and its growth disadvantage for a given rate of population diffusion. Thus, this study provides a systematic and statistical view of CDI-based bacterial population competition, expanding the spectrum of our knowledge about CDI systems and possibly facilitating new experimental tests for a deeper understanding of bacterial interactions.


Subject(s)
Bacteria/cytology , Contact Inhibition , Extinction, Biological , Models, Biological
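The well-mixed CDI+ versus CDI- competition can be illustrated with a two-variable toy model. This sketch assumes a logistic-type model with a shared, normalized carrying capacity, a lower growth rate for the CDI+ species representing its metabolic burden, a dilution rate `d`, and a contact-killing rate `k`; the model form and all parameter values are our assumptions, not the authors' equations. With these choices one species always goes extinct: without killing (k = 0) the faster-growing CDI- species wins, while with strong killing (k = 1) the winner depends on the initial abundances.

```python
def simulate_cdi(x0, y0, k, r_plus=0.8, r_minus=1.0, d=0.2, t_end=1000.0, dt=0.01):
    """Euler integration of a toy well-mixed CDI+ (x) vs CDI- (y) competition model.

    Model form and parameter values are illustrative assumptions.
    """
    x, y = x0, y0
    for _ in range(int(t_end / dt)):
        free = 1.0 - x - y  # unused share of the normalized carrying capacity
        dx = r_plus * x * free - d * x  # CDI+ grows slower due to metabolic burden
        dy = r_minus * y * free - d * y - k * x * y  # contact-dependent killing of CDI-
        x, y = x + dt * dx, y + dt * dy
    return x, y
```

For example, `simulate_cdi(0.5, 0.5, k=0.0)` ends with the CDI- species near 0.8 and the CDI+ species near zero, whereas `simulate_cdi(0.9, 0.05, k=1.0)` ends with the CDI+ species near 0.75 and the CDI- species extinct.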