Results 1 - 18 of 18
1.
Nat Mach Intell ; 6(1): 25-39, 2024.
Article in English | MEDLINE | ID: mdl-38274364

ABSTRACT

Time-series single-cell RNA sequencing (scRNA-seq) datasets provide unprecedented opportunities to learn dynamic processes of cellular systems. Due to the destructive nature of sequencing, it remains challenging to link the scRNA-seq snapshots sampled at different time points. Here we present TIGON, a dynamic, unbalanced optimal transport algorithm that reconstructs dynamic trajectories and population growth simultaneously as well as the underlying gene regulatory network from multiple snapshots. To tackle the high-dimensional optimal transport problem, we introduce a deep learning method using a dimensionless formulation based on the Wasserstein-Fisher-Rao (WFR) distance. TIGON is evaluated on simulated data and compared with existing methods for its robustness and accuracy in predicting cell state transition and cell population growth. Using three scRNA-seq datasets, we show the importance of growth in the temporal inference, TIGON's capability in reconstructing gene expression at unmeasured time points and its applications to temporal gene regulatory networks and cell-cell communication inference.
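
TIGON's dynamic Wasserstein-Fisher-Rao solver is not reproduced here. As a simpler, swapped-in illustration of what "unbalanced" means (total mass may grow or shrink between snapshots), the following is a minimal sketch of static entropic unbalanced optimal transport with KL-relaxed marginals, solved by generalized Sinkhorn scaling. The function name `unbalanced_sinkhorn` and the parameters `eps` and `rho` are hypothetical choices for this sketch, not part of TIGON.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.05, rho=1.0, n_iter=2000):
    """Static entropic unbalanced OT with KL-relaxed marginals (illustrative sketch).

    a, b : nonnegative mass vectors for two snapshots (totals may differ)
    C    : cost matrix of shape (len(a), len(b))
    eps  : entropic regularization strength
    rho  : marginal-relaxation strength; rho -> infinity recovers balanced Sinkhorn
    """
    K = np.exp(-C / eps)            # Gibbs kernel
    tau = rho / (rho + eps)         # damping exponent induced by the KL marginal penalty
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** tau    # row scaling
        v = (b / (K.T @ u)) ** tau  # column scaling
    return u[:, None] * K * v[None, :]  # transport plan P = diag(u) K diag(v)

# Hypothetical snapshots: 5 cells whose total mass grows from 1.0 to 1.5.
x = np.linspace(0.0, 1.0, 5)
y = x + 0.2
P = unbalanced_sinkhorn(np.full(5, 0.2), np.full(5, 0.3), (x[:, None] - y[None, :]) ** 2)
```

With a finite `rho`, the row and column sums of the returned plan only approximately follow `a` and `b`, which is how mass change between time points is absorbed; as `rho` grows, the scheme approaches balanced Sinkhorn.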

2.
Nat Comput Sci ; 3(2): 149-163, 2023 Feb.
Article in English | MEDLINE | ID: mdl-37637776

ABSTRACT

Protein engineering iteratively optimizes protein fitness by screening a gigantic mutational space and is constrained by experimental capacity, but various machine learning models have substantially expedited it. Three-dimensional protein structures promise further advantages, but their intricate geometric complexity hinders their application in deep mutational screening. Persistent homology, an established algebraic topology tool for reducing protein structural complexity, fails to capture the homotopic shape evolution during the filtration of given data. This work introduces a Topology-offered protein Fitness (TopFit) framework to complement protein sequence and structure embeddings. Equipped with an ensemble regression strategy, TopFit integrates the persistent spectral theory, a new topological Laplacian, and two auxiliary sequence embeddings to capture mutation-induced topological invariants, shape evolution, and sequence disparity in the protein fitness landscape. The performance of TopFit is assessed on 34 benchmark datasets with 128,634 variants, involving a wide variety of protein structure acquisition modalities and training set sizes.
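
For readers new to the persistent homology mentioned above, a minimal sketch of its simplest case follows: the 0-dimensional (connected-component) barcode of a Vietoris-Rips filtration on a point cloud, computed with union-find. This is not TopFit, which uses persistent spectral theory and a topological Laplacian; the function name `h0_barcode` and the example points are hypothetical.

```python
import itertools
import math

def h0_barcode(points):
    """0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Every point is born at filtration value 0. Edges are added in order of
    length; each edge that merges two components kills one bar at that length.
    One bar (the final component) never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                      # edge merges two components
            parent[ri] = rj
            bars.append((0.0, r))
    bars.append((0.0, math.inf))          # the surviving component
    return bars

# Two tight pairs of "atoms" separated by a gap of length 4.
print(h0_barcode([(0, 0), (1, 0), (5, 0), (5, 1)]))
```

The long bar dying at 4 records the gap between the two pairs; bars like these are what a persistence-based featurization summarizes.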

3.
ArXiv ; 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37547662

ABSTRACT

Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulated protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.

4.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37580175

ABSTRACT

Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulated protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.


Subject(s)
Artificial Intelligence , Protein Engineering , Natural Language Processing , Antibodies , Data Analysis
5.
Commun Biol ; 6(1): 536, 2023 05 18.
Article in English | MEDLINE | ID: mdl-37202415

ABSTRACT

Virtual screening (VS) is a critical technique for understanding biomolecular interactions, particularly in drug design and discovery. However, the accuracy of current VS models relies heavily on three-dimensional (3D) structures obtained through molecular docking, which is often unreliable because of its low accuracy. To address this issue, we introduce sequence-based virtual screening (SVS), a new generation of VS models that uses advanced natural language processing (NLP) algorithms and optimized deep K-embedding strategies to encode biomolecular interactions without relying on 3D structure-based docking. We demonstrate that SVS outperforms state-of-the-art methods on four regression datasets involving protein-ligand binding, protein-protein binding, protein-nucleic acid binding, and ligand inhibition of protein-protein interactions, and on five classification datasets for protein-protein interactions in five biological species. SVS has the potential to transform current practices in drug discovery and protein engineering.


Subject(s)
Algorithms , Proteins , Molecular Docking Simulation , Ligands , Proteins/metabolism , Drug Discovery/methods
6.
J Chem Inf Model ; 63(1): 335-342, 2023 01 09.
Article in English | MEDLINE | ID: mdl-36577010

ABSTRACT

Accurate and reliable forecasting of emerging dominant severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants enables policymakers and vaccine makers to prepare for future waves of infections. The last three waves of SARS-CoV-2 infections caused by dominant variants, Omicron (BA.1), BA.2, and BA.4/BA.5, were accurately foretold by our artificial intelligence (AI) models built with biophysics, genotyping of viral genomes, experimental data, algebraic topology, and deep learning. On the basis of newly available experimental data, we analyzed the impacts of all possible viral spike (S) protein receptor-binding domain (RBD) mutations on SARS-CoV-2 infectivity. Our analysis sheds light on viral evolutionary mechanisms, i.e., natural selection through infectivity strengthening and antibody resistance. We forecast that BP.1, BL*, BA.2.75*, BQ.1*, and particularly BN.1* have a high potential to become the new dominant variants to drive the next surge. Our key projection about the dominance of these variants, made on Oct. 18, 2022 (see arXiv:2210.09485), became reality in late November 2022.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Artificial Intelligence , Antibodies
7.
Comput Biol Med ; 151(Pt A): 106262, 2022 12.
Article in English | MEDLINE | ID: mdl-36379191

ABSTRACT

Due to its high transmissibility, Omicron BA.1 ousted the Delta variant to become a dominant variant in late 2021 and was replaced by the more transmissible Omicron BA.2 in March 2022. An important question is which new variants will dominate in the future. Topology-based deep learning models have had tremendous success in forecasting emerging variants in the past. However, topology is insensitive to homotopic shape evolution in virus-human protein-protein binding, which is crucial to viral evolution and transmission. This challenge is tackled with the persistent Laplacian, which is able to capture both the topological change and the homotopic shape evolution of data. Persistent Laplacian-based deep learning models are developed to systematically evaluate variant infectivity. Our comparative analysis of Alpha, Beta, Gamma, Delta, Lambda, Mu, and Omicron BA.1, BA.1.1, BA.2, BA.2.11, BA.2.12.1, BA.3, BA.4, and BA.5 unveils that Omicron BA.2.11, BA.2.12.1, BA.3, BA.4, and BA.5 are more contagious than BA.2. In particular, BA.4 and BA.5 are about 36% more infectious than BA.2 and are projected to become new dominant variants by natural selection. Moreover, the proposed models outperform the state-of-the-art methods on three major benchmark datasets for mutation-induced protein-protein binding free energy changes. Our key projection about BA.4 and BA.5's dominance, made on May 1, 2022 (see arXiv:2205.00532), became a reality in late June 2022.


Subject(s)
Benchmarking , Humans , Mutation
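
The abstract above contrasts topology, which ignores homotopic shape evolution, with Laplacian spectra, which capture it. A minimal sketch of that idea at its simplest level follows: the combinatorial graph Laplacian L0 = D - A of the Vietoris-Rips graph at several radii. The multiplicity of eigenvalue 0 equals the number of connected components (Betti-0), while the nonzero eigenvalues change with geometry. This is only the 0-dimensional, non-persistent Laplacian at each radius, not the persistent Laplacian used in the paper; function names and points are hypothetical.

```python
import numpy as np

def rips_graph_laplacian(points, r):
    """Graph Laplacian L0 = D - A of the Vietoris-Rips 1-skeleton at radius r."""
    X = np.asarray(points, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = ((dist <= r) & (dist > 0)).astype(float)   # connect distinct points within r
    return np.diag(A.sum(axis=1)) - A

def laplacian_spectrum_along_filtration(points, radii, tol=1e-9):
    """For each radius, return (Betti-0, sorted nonzero eigenvalues)."""
    out = []
    for r in radii:
        w = np.linalg.eigvalsh(rips_graph_laplacian(points, r))  # ascending eigenvalues
        betti0 = int(np.sum(w < tol))                # multiplicity of eigenvalue 0
        out.append((betti0, np.round(w[w >= tol], 6)))
    return out

pts = [(0, 0), (1, 0), (5, 0), (5, 1)]
for betti0, nonzero in laplacian_spectrum_along_filtration(pts, [0.5, 1.0, 4.0]):
    print(betti0, nonzero)
```
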
8.
ArXiv ; 2022 Oct 18.
Article in English | MEDLINE | ID: mdl-36299737

ABSTRACT

Accurate and reliable forecasting of emerging dominant severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants enables policymakers and vaccine makers to prepare for future waves of infections. The last three waves of SARS-CoV-2 infections caused by dominant variants Omicron (BA.1), BA.2, and BA.4/BA.5 were accurately foretold by our artificial intelligence (AI) models built with biophysics, genotyping of viral genomes, experimental data, algebraic topology, and deep learning. Based on newly available experimental data, we analyzed the impacts of all possible viral spike (S) protein receptor-binding domain (RBD) mutations on SARS-CoV-2 infectivity. Our analysis sheds light on viral evolutionary mechanisms, i.e., natural selection through infectivity strengthening and antibody resistance. We forecast that BA.2.10.4, BA.2.75, BQ.1.1, and particularly BA.2.75+R346T have a high potential to become new dominant variants to drive the next surge.

9.
J Chem Inf Model ; 62(19): 4629-4641, 2022 10 10.
Article in English | MEDLINE | ID: mdl-36154171

ABSTRACT

Directed evolution, a revolutionary biotechnology in protein engineering, optimizes protein fitness by searching an astronomical mutational space via expensive experiments. Cluster learning-assisted directed evolution (CLADE) efficiently explores the mutational space via a combination of unsupervised hierarchical clustering and supervised learning. However, the initial-stage sampling in CLADE treats all clusters equally, even though many clusters contain a large portion of non-functional mutations. Recent statistical and deep learning tools enable evolutionary density modeling to assess protein fitness in an unsupervised manner. In this work, we construct an ensemble of multiple evolutionary scores to guide the initial sampling in CLADE. The resulting evolutionary score-enhanced CLADE, called CLADE 2.0, efficiently selects a training set within a small informative space using evolution-driven clustering sampling. CLADE 2.0 is validated on two benchmark libraries, each with 160,000 sequences from four-site mutational combinations. Extensive computational experiments and comparisons with existing cutting-edge methods indicate that CLADE 2.0 is a new state-of-the-art tool for machine learning-assisted directed evolution.


Subject(s)
Machine Learning , Proteins , Cluster Analysis , Protein Engineering/methods , Proteins/genetics
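
The abstract above does not say how the evolutionary scores are ensembled. A minimal sketch, assuming a simple z-score average (a common baseline, not necessarily CLADE 2.0's rule), shows how several unsupervised scores on different scales could be combined to pick a small informative subset of variants. The function names `ensemble_zscore` and `top_fraction` and the example scores are hypothetical.

```python
import numpy as np

def ensemble_zscore(score_matrix):
    """Average of per-model z-scores.

    score_matrix: shape (n_models, n_variants); higher means predicted fitter.
    Each row is standardized so that models on different scales contribute equally
    (a row with zero variance would divide by zero and must be dropped beforehand).
    """
    S = np.asarray(score_matrix, dtype=float)
    z = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
    return z.mean(axis=0)

def top_fraction(scores, frac):
    """Indices of the top `frac` of variants by ensemble score."""
    k = max(1, int(round(frac * len(scores))))
    return np.argsort(scores)[::-1][:k]

# Two hypothetical evolutionary scores for four variants, on very different scales.
ens = ensemble_zscore([[1, 2, 3, 4], [10, 20, 30, 40]])
print(top_fraction(ens, 0.5))
```
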
10.
Chem Rev ; 122(13): 11287-11368, 2022 07 13.
Article in English | MEDLINE | ID: mdl-35594413

ABSTRACT

Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Models, Molecular
11.
J Pharmacokinet Pharmacodyn ; 49(1): 39-50, 2022 02.
Article in English | MEDLINE | ID: mdl-34637069

ABSTRACT

Quantitative systems pharmacology (QSP) is an important approach in pharmaceutical research and development that facilitates in silico generation of quantitative mechanistic hypotheses and enables in silico trials. As demonstrated by applications from numerous industry groups and interest from regulatory authorities, QSP is becoming an increasingly critical component in clinical drug development. With rapidly evolving computational tools and methods, QSP modeling has achieved important progress in pharmaceutical research and development, including for heart failure (HF). However, various challenges exist in the QSP modeling and clinical characterization of HF. Machine/deep learning (ML/DL) methods have had success in a wide variety of fields and disciplines. They provide data-driven approaches to HF diagnosis and modeling, and offer a novel strategy to inform QSP model development and calibration. The combination of ML/DL and QSP modeling is an emerging direction in understanding HF and in the clinical development of new therapies. In this work, we review the current status and achievements of QSP and ML/DL for HF, and discuss remaining challenges and future perspectives in the field.


Subject(s)
Heart Failure , Pharmacology , Calibration , Heart Failure/diagnosis , Heart Failure/drug therapy , Humans , Machine Learning , Models, Biological , Network Pharmacology
12.
PLoS Comput Biol ; 17(6): e1009077, 2021 06.
Article in English | MEDLINE | ID: mdl-34161317

ABSTRACT

The vertebrate hindbrain is segmented into rhombomeres (r) initially defined by distinct domains of gene expression. Previous studies have shown that noise-induced gene regulation and cell sorting are critical for the sharpening of rhombomere boundaries, which start out rough in the forming neural plate (NP) and sharpen over time. However, the mechanisms controlling simultaneous formation of multiple rhombomeres and accuracy in their sizes are unclear. We have developed a stochastic multiscale cell-based model that explicitly incorporates dynamic morphogenetic changes (i.e. convergent-extension of the NP), multiple morphogens, and gene regulatory networks to investigate the formation of rhombomeres and their corresponding boundaries in the zebrafish hindbrain. During pattern initiation, the short-range signal, fibroblast growth factor (FGF), works together with the longer-range morphogen, retinoic acid (RA), to specify all of these boundaries and maintain accurately sized segments with sharp boundaries. At later stages of patterning, we show a nonlinear change in the shape of rhombomeres with rapid left-right narrowing of the NP followed by slower dynamics. Rapid initial convergence improves boundary sharpness and segment size by regulating cell sorting and cell fate both independently and coordinately. Overall, multiple morphogens and tissue dynamics synergize to regulate the sizes and boundaries of multiple segments during development.


Subject(s)
Body Patterning/physiology , Models, Biological , Zebrafish/embryology , Animals , Body Patterning/genetics , Computational Biology , Embryonic Development/genetics , Embryonic Development/physiology , Fibroblast Growth Factors/physiology , Gene Expression Regulation, Developmental , Growth Substances/physiology , Rhombencephalon/cytology , Rhombencephalon/embryology , Signal Transduction , Stochastic Processes , Tretinoin/physiology , Zebrafish/genetics
13.
Nat Comput Sci ; 1(12): 809-818, 2021 Dec.
Article in English | MEDLINE | ID: mdl-35811998

ABSTRACT

Directed evolution, a strategy for protein engineering, optimizes protein properties (i.e., fitness) by expensive and time-consuming screening or selection of a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), that combines hierarchical unsupervised clustering sampling and supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions via supervised learning models improve final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library with five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% for the GB1 and PhoQ datasets, respectively, improved from 18.6% and 7.2% obtained by random-sampling-based MLDE.
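
The library sizes above follow from simple arithmetic: 20 canonical amino acids at each of four sites give 20^4 = 160,000 variants, and screening 480 of them in five equal batches means 96 variants per batch, or 0.3% of the library. The sketch below checks this and shows a flattened one-hot encoding, a common baseline featurization assumed here for illustration (CLADE's own encodings may differ). The names `AMINO_ACIDS` and `one_hot` are hypothetical.

```python
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 canonical residues

def one_hot(variant):
    """Flattened one-hot encoding of a multi-site variant (4 sites -> 80 features)."""
    x = np.zeros((len(variant), len(AMINO_ACIDS)))
    for pos, aa in enumerate(variant):
        x[pos, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()

library = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=4)]
n_screened, n_batches = 480, 5
print(len(library))                  # 160000 = 20**4
print(n_screened / len(library))     # 0.003, i.e. 0.3% of the library
print(n_screened // n_batches)       # 96 variants per batch
```
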

14.
Dev Cell ; 53(6): 724-739.e14, 2020 06 22.
Article in English | MEDLINE | ID: mdl-32574592

ABSTRACT

Gradients of decapentaplegic (Dpp) pattern Drosophila wing imaginal discs, establishing gene expression boundaries at specific locations. As discs grow, Dpp gradients expand, keeping relative boundary positions approximately stationary. Such scaling fails in mutants for Pentagone (pent), a gene repressed by Dpp that encodes a diffusible protein that expands Dpp gradients. Although these properties fit a recent mathematical model of automatic gradient scaling, that model requires an expander that spreads with minimal loss throughout a morphogen field. Here, we show that Pent's actions are confined to within just a few cell diameters of its site of synthesis and can be phenocopied by manipulating non-diffusible Pent targets strictly within the Pent expression domain. Using genetics and mathematical modeling, we develop an alternative model of scaling driven by feedback downregulation of Dpp receptors and co-receptors. Among the model's predictions is a size beyond which scaling fails, something we observe directly in wing discs.


Subject(s)
Drosophila Proteins/genetics , Extracellular Matrix Proteins/genetics , Gene Expression Regulation, Developmental , Morphogenesis , Animals , Down-Regulation , Drosophila Proteins/metabolism , Drosophila melanogaster , Extracellular Matrix Proteins/metabolism , Feedback, Physiological , Imaginal Discs/embryology , Imaginal Discs/metabolism , Models, Theoretical
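
What "scaling" means in the abstract above can be illustrated with the textbook steady-state gradient from synthesis at a source, diffusion, and first-order degradation: c(x) = c0 exp(-x/λ) with λ = sqrt(D/k), so a threshold θ is crossed at x* = λ ln(c0/θ). The sketch below compares a fixed decay length with one that grows in proportion to tissue length L. It is not the paper's receptor-downregulation feedback model, and all parameter values and function names are hypothetical.

```python
import math

def boundary_position(c0, theta, decay_length):
    """Position where c(x) = c0 * exp(-x / decay_length) falls to theta."""
    return decay_length * math.log(c0 / theta)

def relative_boundaries(tissue_lengths, c0=1.0, theta=0.1, lam0=10.0, scaling=True):
    """Boundary position as a fraction of tissue length L.

    scaling=True : decay length grows in proportion to L (lam = lam0 * L / L0)
    scaling=False: decay length stays fixed at lam0
    L0 is taken to be the first entry of tissue_lengths.
    """
    L0 = tissue_lengths[0]
    out = []
    for L in tissue_lengths:
        lam = lam0 * L / L0 if scaling else lam0
        out.append(boundary_position(c0, theta, lam) / L)
    return out

print(relative_boundaries([50, 100, 200], scaling=True))   # constant fraction
print(relative_boundaries([50, 100, 200], scaling=False))  # boundary drifts toward the source
```
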
15.
Exp Dermatol ; 28(4): 493-502, 2019 04.
Article in English | MEDLINE | ID: mdl-30801791

ABSTRACT

Following injury, skin activates a complex wound healing programme. While cellular and signalling mechanisms of wound repair have been extensively studied, the principles of epidermal-dermal interactions and their effects on wound healing outcomes are only partially understood. To gain new insight into the effects of epidermal-dermal interactions, we developed a multiscale, hybrid mathematical model of skin wound healing. The model takes into consideration interactions between epidermis and dermis across the basement membrane via diffusible signals, defined as activator and inhibitor. Simulations revealed that epidermal-dermal interactions are critical for proper extracellular matrix deposition in the dermis, suggesting these signals may influence how wound scars form. Our model makes several theoretical predictions. First, basal levels of epidermal activator and inhibitor help to maintain dermis in a steady state, whereas their absence results in a raised, scar-like dermal phenotype. Second, wound-triggered increase in activator and inhibitor production by basal epidermal cells, coupled with fast re-epithelialization kinetics, reduces dermal scar size. Third, high-density fibrin clot leads to a raised, hypertrophic scar phenotype, whereas low-density fibrin clot leads to a hypotrophic phenotype. Fourth, shallow wounds, compared to deep wounds, result in overall reduced scarring. Taken together, our model predicts the important role of signalling across the dermal-epidermal interface and the effect of fibrin clot density and wound geometry on scar formation. This hybrid modelling approach may also be applicable to other complex tissue systems, enabling the simulation of dynamic processes that would otherwise be computationally prohibitive with fully discrete models because of the large number of variables.


Subject(s)
Dermis/metabolism , Epidermis/metabolism , Models, Biological , Wound Healing , Animals , Cicatrix/etiology , Fibrin/metabolism , Fibroblasts/metabolism , Keratinocytes/metabolism
16.
Discrete Continuous Dyn Syst Ser B ; 24(8): 3971-3994, 2019 Aug.
Article in English | MEDLINE | ID: mdl-32269502

ABSTRACT

During epithelium tissue maintenance, lineages of cells differentiate and proliferate in a coordinated way to provide the desirable size and spatial organization of different types of cells. While mathematical models based on deterministic descriptions have been used to dissect the role of feedback regulation in tissue layer size and stratification, how stochastic effects influence tissue maintenance remains largely unknown. Here we present a stochastic continuum model for cell lineages to investigate how both layer thickness and layer stratification are affected by noise. We find that cell-intrinsic noise often causes reduction and oscillation of layer size, whereas cell-extrinsic noise increases the thickness and sometimes leads to uncontrollable growth of the tissue layer. Layer stratification usually deteriorates as the noise level increases in the cell lineage systems. Interestingly, morphogen noise, which mixes both cell-intrinsic and cell-extrinsic noise, can lead to a larger layer size with little impact on layer stratification. By investigating different combinations of the three types of noise, we find that layer thickness variability is reduced when the cell-extrinsic noise level is high or the morphogen noise level is low. Moreover, there exists a tradeoff between low thickness variability and strong layer stratification due to competition among the three types of noise, suggesting that robust layer homeostasis requires balanced levels of different types of noise in the cell lineage systems.
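
As a generic illustration of how noise enters a growth model like the one described above, the following is a minimal Euler-Maruyama sketch for a logistic layer-size equation with multiplicative noise, dX = r X (1 - X/K) dt + σ X dW. This is not the paper's stochastic cell-lineage model: the logistic drift and the single noise term are hypothetical stand-ins, and the paper's distinction between cell-intrinsic, cell-extrinsic, and morphogen noise is not modeled. The function name `simulate_layer` is also hypothetical.

```python
import numpy as np

def simulate_layer(x0=0.1, r=1.0, K=1.0, sigma=0.0, dt=0.01, n_steps=2000, seed=0):
    """Euler-Maruyama for dX = r X (1 - X/K) dt + sigma X dW, clipped at zero.

    The logistic drift stands in for feedback-limited layer growth; sigma * X * dW
    is one simple, hypothetical choice of multiplicative noise.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))         # Brownian increment ~ N(0, dt)
        drift = r * x[n] * (1.0 - x[n] / K)
        x[n + 1] = max(x[n] + drift * dt + sigma * x[n] * dW, 0.0)  # keep size nonnegative
    return x

deterministic = simulate_layer(sigma=0.0)   # settles at the carrying capacity K
noisy = simulate_layer(sigma=0.3, seed=1)   # fluctuates around K
```

Comparing many noisy trajectories (different seeds) against the deterministic one is the kind of experiment used to quantify thickness variability.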

17.
Discrete Continuous Dyn Syst Ser B ; 24(12): 6387-6417, 2019 Dec.
Article in English | MEDLINE | ID: mdl-32405272

ABSTRACT

The second-order implicit integration factor method (IIF2) is effective at solving stiff reaction-diffusion equations owing to its favorable stability properties. IIF has previously been applied primarily to systems in which the reaction contained no explicitly time-dependent terms and the boundary conditions were homogeneous. If applied to a system with explicitly time-dependent reaction terms, we find that IIF2 requires prohibitively small time steps, proportional to the square of the spatial grid size, to attain its theoretical second-order temporal accuracy. Although the second-order implicit exponential time differencing (iETD2) method can accurately handle explicitly time-dependent reactions, it is more computationally expensive than IIF2. In this paper, we develop a hybrid approach that combines the advantages of both methods, applying IIF2 to reaction terms that are not explicitly time-dependent and applying iETD2 to those that are. The second-order hybrid IIF-ETD method (hIFE2) inherits the lower complexity of IIF2 and, from iETD2, the ability to remain second-order accurate in time for large time steps. It also inherits the unconditional stability of the IIF2 and iETD2 methods for dealing with the stiffness in reaction-diffusion systems. Through a transformation, hIFE2 can handle nonhomogeneous boundary conditions accurately and efficiently. In addition, this approach can be naturally combined with the compact and array representations of IIF and ETD for systems in higher spatial dimensions. Various numerical simulations containing linear and nonlinear reactions are presented to demonstrate the superior stability, accuracy, and efficiency of the new hIFE method.
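
To make the IIF2 half of the method above concrete, here is a minimal sketch for the 1D model problem u_t = D u_xx - k u on [0, 1] with homogeneous Dirichlet boundaries. The IIF2 step is u_{n+1} = E (u_n + (Δt/2) F(u_n)) + (Δt/2) F(u_{n+1}) with E = exp(A Δt), where A is the discretized diffusion operator. Because the reaction F(u) = -k u is linear and not explicitly time-dependent, the implicit equation can be solved in closed form; a nonlinear F would need a Newton or fixed-point solve each step. The iETD2 half and the boundary transformation of hIFE2 are not shown, and the function names are hypothetical.

```python
import numpy as np

def diffusion_matrix(n, D):
    """Second-order finite-difference Laplacian on n interior points of [0, 1],
    homogeneous Dirichlet boundaries, scaled by diffusivity D."""
    h = 1.0 / (n + 1)
    A = np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return D * A / h**2

def iif2_linear_decay(u0, D, k, dt, n_steps):
    """IIF2 for u_t = D u_xx - k u, using the closed-form implicit solve:
    u_{n+1} (1 + k dt/2) = E (u_n (1 - k dt/2))."""
    A = diffusion_matrix(len(u0), D)
    w, V = np.linalg.eigh(A)                 # A is symmetric: exp(A dt) = V diag(e^{w dt}) V^T
    E = (V * np.exp(w * dt)) @ V.T
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        u = E @ (u - 0.5 * dt * k * u) / (1.0 + 0.5 * dt * k)
    return u

# Single sine mode: exact solution is exp(-(D pi^2 + k) t) sin(pi x).
n, D, k, dt, n_steps = 50, 0.1, 1.0, 0.01, 10
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
u0 = np.sin(np.pi * x)
u = iif2_linear_decay(u0, D, k, dt, n_steps)
exact = np.exp(-(D * np.pi**2 + k) * dt * n_steps) * u0
```

Computing the exponential once by eigendecomposition is affordable here because A is small and symmetric; larger or higher-dimensional problems motivate the compact and array representations mentioned in the abstract.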

18.
Zhongguo Zhen Jiu ; 37(2): 215-218, 2017 Feb 12.
Article in Chinese | MEDLINE | ID: mdl-29231491

ABSTRACT

Acupuncture expectation refers to the subjective estimation of the effect of the acupuncture to be applied. Because sham acupuncture is usually used in acupuncture randomized clinical trials, acupuncture expectation affects the subjects, so it is necessary to evaluate and standardize it. The factors that influence the evaluation standard of acupuncture expectations include differences in expectation-value assessment, evaluation criteria, and time points, all of which affect the evaluation of clinical efficacy. It is urgent to establish a unified evaluation standard to improve its reliability.


Subject(s)
Acupuncture Therapy/psychology , Humans , Placebo Effect , Randomized Controlled Trials as Topic , Reproducibility of Results , Treatment Outcome