Results 1 - 20 of 201
1.
Comput Biol Med ; 182: 109101, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39243518

ABSTRACT

The COVID-19 pandemic has driven substantial evolution of the SARS-CoV-2 virus, yielding subvariants that exhibit enhanced infectiousness in humans. However, this adaptive advantage may not universally extend to zoonotic transmission. In this work, we hypothesize that viral adaptations favoring animal hosts do not necessarily correlate with increased human infectivity. In addition, we consider the potential for gain-of-function mutations that could facilitate the virus's rapid evolution in humans following adaptation in animal hosts. Specifically, we identify the SARS-CoV-2 receptor-binding domain (RBD) mutations that enhance human-animal cross-transmission. To this end, we construct a multitask deep learning model, MT-TopLap, trained on multiple deep mutational scanning datasets, to accurately predict the binding free energy changes upon mutation for the RBD to ACE2 of various species, including humans, cats, bats, deer, and hamsters. By analyzing these changes, we identify key RBD mutations, such as Q498H in SARS-CoV-2 and R493K in the BA.2 variant, that are likely to increase the potential for human-animal cross-transmission.

2.
Biophys J ; 123(17): E1-E3, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39173628
3.
J Chem Inf Model ; 64(16): 6676-6683, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39116039

ABSTRACT

AlphaFold 3 (AF3), the latest version of protein structure prediction software, goes beyond its predecessors by predicting protein-protein complexes. It could revolutionize drug discovery and protein engineering, marking a major step toward comprehensive, automated protein structure prediction. However, independent validation of AF3's predictions is necessary. In this work, we evaluate AF3 complex structures using the SKEMPI 2.0 database, which involves 317 protein-protein complexes and 8338 mutations. AF3 complex structures, when applied to the most advanced topological deep learning (TDL) model, MT-TopLap (MultiTask-Topological Laplacian), give rise to a very good Pearson correlation coefficient of 0.86 for predicting protein-protein binding free energy changes upon mutation, which is slightly less than the 0.88 achieved earlier with the Protein Data Bank (PDB) structures. Nonetheless, AF3 complex structures led to an 8.6% increase in the prediction RMSE compared to original PDB complex structures. Additionally, some of AF3's complex structures have large errors, which were not captured in its ipTM performance metric. Finally, it is found that AF3's complex structures are not reliable for intrinsically flexible regions or domains.


Subject(s)
Databases, Protein , Mutation , Protein Binding , Proteins , Software , Thermodynamics , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Protein Conformation , Models, Molecular
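
The two metrics quoted in the abstract above, the Pearson correlation coefficient and the RMSE, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' evaluation code; the function names and the toy binding free energy values are hypothetical and are not taken from SKEMPI 2.0.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_increase(new, baseline):
    """Percentage increase of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical experimental and predicted free energy changes (toy values).
observed = [0.5, 1.2, -0.3, 2.1, 0.0]
predicted = [0.4, 1.0, -0.1, 1.8, 0.3]

r = pearson(predicted, observed)
error = rmse(predicted, observed)
```

A statement such as "an 8.6% increase in RMSE" corresponds to `relative_increase(rmse_new, rmse_baseline)` evaluated on the two sets of predictions.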
5.
ArXiv ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-38947927

ABSTRACT

Data sets with imbalanced class sizes, where one class size is much smaller than that of others, occur exceedingly often in many applications, including those with biological foundations, such as disease diagnosis and drug discovery. Therefore, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to do so can result in heavy costs. Nonetheless, many data classification procedures do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this work, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) approaches and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification tasks on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed technique not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer procedure based on an attention mechanism for self-supervised learning. In addition, the model implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed method is validated using six molecular data sets and compared to other related techniques. The computational experiments show that the proposed technique is superior to competing approaches even in the case of a high class imbalance ratio.
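
The decision threshold adjustment mentioned in the abstract above can be illustrated generically. The sketch below is not the BTDT-MBO procedure; it only assumes that a classifier outputs a score for the minority class and sweeps candidate thresholds to maximize F1 on held-out data. All names and toy values are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 score for the positive (minority) class, labeled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def predict(scores, threshold):
    """Label a sample positive when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def best_threshold(scores, y_true):
    """Return the candidate threshold (taken from the scores) with highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_score(y_true, predict(scores, t)))


# Hypothetical validation scores: the two positives score below 0.5,
# so the default threshold of 0.5 misses the minority class entirely.
scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.45]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

default_f1 = f1_score(labels, predict(scores, 0.5))
tuned = best_threshold(scores, labels)
tuned_f1 = f1_score(labels, predict(scores, tuned))
```

In this toy case lowering the threshold recovers both minority samples, which is the kind of effect a threshold adjustment is meant to achieve on imbalanced data.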

6.
ArXiv ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38883239

ABSTRACT

AlphaFold 3 (AF3), the latest version of protein structure prediction software, goes beyond its predecessors by predicting protein-protein complexes. It could revolutionize drug discovery and protein engineering, marking a major step towards comprehensive, automated protein structure prediction. However, independent validation of AF3's predictions is necessary. Evaluated using the SKEMPI 2.0 database, which involves 317 protein-protein complexes and 8338 mutations, AF3 complex structures give rise to a very good Pearson correlation coefficient of 0.86 for predicting protein-protein binding free energy changes upon mutation, slightly less than the 0.88 achieved earlier with the Protein Data Bank (PDB) structures. Nonetheless, AF3 complex structures led to an 8.6% increase in the prediction RMSE compared to original PDB complex structures. Additionally, some of AF3's complex structures have large errors, which were not captured in its ipTM performance metric. Finally, it is found that AF3's complex structures are not reliable for intrinsically flexible regions or domains.

8.
J Chem Inf Model ; 64(8): 3558-3568, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38572676

ABSTRACT

RNA velocity can capture cell dynamic information in biological processes; yet a comprehensive analysis of cell state transitions and their associated chemical and biological processes is still lacking. In this work, we apply the Hodge decomposition, coupled with discrete exterior calculus (DEC), to unveil cell dynamics by examining the decomposed curl-free, divergence-free, and harmonic components of the RNA velocity field in a low-dimensional representation, such as a UMAP or a t-SNE representation. Decomposition results show that the decomposed components distinctly reveal key cell dynamic features such as cell cycle, bifurcation, and cell lineage differentiation, regardless of the choice of the low-dimensional representation. The consistency across different representations demonstrates that the Hodge decomposition is a reliable and robust way to extract these cell dynamic features, offering unique analysis and insightful visualization of single-cell RNA velocity fields.


Subject(s)
RNA , Single-Cell Analysis , RNA/chemistry , RNA/metabolism , Humans
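
The three components named in the abstract above can be illustrated with a discrete Hodge decomposition of an edge flow on a tiny simplicial complex. This is a sketch under stated assumptions, not the authors' DEC pipeline: the complex, the flow values, and all function names are hypothetical, edges are oriented from the lower to the higher node index, and the graph is assumed connected.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def project(cols, f):
    """Orthogonal projection of f onto the span of the column vectors."""
    gram = [[dot(u, v) for v in cols] for u in cols]
    coef = solve(gram, [dot(u, f) for u in cols])
    return [sum(c * col[e] for c, col in zip(coef, cols)) for e in range(len(f))]


def hodge_decompose(n_nodes, edges, triangles, flow):
    """Split an edge flow into gradient, curl, and harmonic parts."""
    index = {e: i for i, e in enumerate(edges)}
    # Gradient basis: one column per node except node 0 (fixes the constant).
    grad_cols = [
        [(1.0 if j == v else 0.0) - (1.0 if i == v else 0.0) for i, j in edges]
        for v in range(1, n_nodes)
    ]
    # Curl basis: boundary of each filled triangle (a, b, c) with a < b < c.
    curl_cols = []
    for a, b, c in triangles:
        col = [0.0] * len(edges)
        col[index[(a, b)]] += 1.0
        col[index[(b, c)]] += 1.0
        col[index[(a, c)]] -= 1.0
        curl_cols.append(col)
    grad = project(grad_cols, flow)   # curl-free component
    curl = project(curl_cols, flow)   # divergence-free component
    harm = [x - g - c for x, g, c in zip(flow, grad, curl)]  # harmonic rest
    return grad, curl, harm


# Four nodes; triangle (0, 1, 2) is filled, cycle 0-2-3 is an unfilled hole.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]
triangles = [(0, 1, 2)]
flow = [1.0, 2.0, 0.5, -1.0, 1.5]
grad, curl, harm = hodge_decompose(4, edges, triangles, flow)
```

The harmonic part is nonzero here only because the cycle 0-2-3 is left unfilled; filling it with a second triangle would leave just the gradient and curl components.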
9.
Comput Biol Med ; 175: 108497, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38678944

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and L2,1 norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses the many limitations of the traditional persistent homology. Rather than inducing filtration via the varying of a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and show that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins. For example, tPCA provides up to 628%, 78%, and 149% improvements to UMAP, tSNE, and NMF, respectively, on classification in the F1 metric, and kNN-tPCA offers 53%, 63%, and 32% improvements to UMAP, tSNE, and NMF, respectively, on clustering in the ARI metric.


Subject(s)
Principal Component Analysis , Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Algorithms , RNA-Seq/methods
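
The kNN-induced filtration described in the abstract above, where the graph grows by increasing the number of neighbors rather than a distance threshold, can be sketched as follows. This only builds the nested family of kNN graphs and their ordinary graph Laplacians; it does not compute persistent Laplacians or the tPCA regularizer. The points and function names are hypothetical.

```python
import math


def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        # Sort by (distance, index) so ties are broken deterministically.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A as a list of rows."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L


def n_components(n, edges):
    """Number of connected components (the nullity of L) via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


# Two well-separated toy clusters of three points each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
filtration = [knn_edges(points, k) for k in (1, 2, 3)]
laplacians = [laplacian(len(points), e) for e in filtration]
components = [n_components(len(points), e) for e in filtration]
```

Because the k nearest neighbors of a point are always among its k+1 nearest neighbors, the edge sets are nested, which is what makes the sequence a filtration. In this toy example the two clusters merge only at k = 3.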
10.
J Chem Inf Model ; 64(7): 2125-2128, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38587006
11.
ArXiv ; 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38495558

ABSTRACT

As COVID-19 enters its fifth year, it continues to pose a significant global health threat, with the constantly mutating SARS-CoV-2 virus challenging drug effectiveness. A comprehensive understanding of virus-drug interactions is essential for predicting and improving drug effectiveness, especially in combating drug resistance during the pandemic. In response, the Path Laplacian Transformer-based Prospective Analysis Framework (PLFormer-PAF) has been proposed, integrating historical data analysis and predictive modeling strategies. This dual-strategy approach utilizes path topology to transform protein-ligand complexes into topological sequences, enabling the use of advanced large language models for analyzing protein-ligand interactions and enhancing its reliability with factual insights garnered from historical data. It has shown unparalleled performance in predicting binding affinity tasks across various benchmarks, including specific evaluations related to SARS-CoV-2, and assesses the impact of virus mutations on drug efficacy, offering crucial insights into potential drug resistance. The predictions align with observed mutation patterns in SARS-CoV-2, indicating that the widespread use of the Pfizer drug has led to viral evolution and reduced drug efficacy. PLFormer-PAF's capabilities extend beyond identifying drug-resistant strains, positioning it as a key tool in drug discovery research and the development of new therapeutic strategies against fast-mutating viruses like COVID-19.

12.
J Comput Appl Math ; 445, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38464901

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).

13.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38499497

ABSTRACT

The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study embarked on an innovative and rigorous strategy to unearth potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We initiated our approach by conducting differential gene expression analysis on addiction-related transcriptomic data to identify key genes. We propose a novel topological differentiation to identify key genes from a protein-protein interaction network derived from DEGs. This method utilizes persistent Laplacians to accurately single out pivotal nodes within the network, conducting this analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5 and NMDAR, for drug repurposing from DrugBank. We crafted machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust predictive ability in gauging binding affinities of DrugBank compounds to selected targets. Furthermore, we elucidated the interactions of promising drugs with the targets and evaluated their drug-likeness. This study delineates a multi-faceted and comprehensive analytical framework, amalgamating bioinformatics, topological data analysis and machine learning, for drug repurposing in addiction treatment, setting the stage for subsequent experimental validation. The versatility of the methods we developed allows for applications across a range of diseases and transcriptomic datasets.


Subject(s)
Drug Repositioning , Transcriptome , United States , Drug Repositioning/methods , Reproducibility of Results , Gene Expression Profiling , Computational Biology/methods
14.
J Magn Reson Imaging ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38358090

ABSTRACT

In recent years, magnetic particle imaging (MPI) has emerged as a promising imaging technique offering high sensitivity and spatial resolution. It originated in the early 2000s, when it was proposed as a new approach to overcome the low spatial resolution achieved by using relaxometry to measure magnetic fields. MPI provides 2D and 3D images with high temporal resolution, no ionizing radiation, and optimal visual contrast due to its lack of background tissue signal. Traditionally, images were reconstructed by converting the induced voltage signal using system matrix-based and X-space-based methods. Because image reconstruction and analysis play an integral role in obtaining precise information from MPI signals, newer artificial intelligence-based methods are continuously being researched and developed. In this work, we summarize and review the significance and use of machine learning and deep learning models for applications with MPI and the potential they hold for the future. LEVEL OF EVIDENCE: 5. TECHNICAL EFFICACY: Stage 1.

15.
Comput Biol Med ; 171: 108211, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38422960

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology, offering unparalleled insights into the intricate landscape of cellular diversity and gene expression dynamics. scRNA-seq analysis represents a challenging and cutting-edge frontier within the field of biological research. Differential geometry serves as a powerful mathematical tool in various applications of scientific research. In this study, we introduce, for the first time, a multiscale differential geometry (MDG) strategy for addressing the challenges encountered in scRNA-seq data analysis. We assume that intrinsic properties of cells lie on a family of low-dimensional manifolds embedded in the high-dimensional space of scRNA-seq data. Multiscale cell-cell interactive manifolds are constructed to reveal complex relationships in the cell-cell network, where curvature-based features for cells can decipher the intricate structural and biological information. We showcase the utility of our novel approach by demonstrating its effectiveness in classifying cell types. This innovative application of differential geometry in scRNA-seq analysis opens new avenues for understanding the intricacies of biological networks and holds great potential for network analysis in other fields.


Subject(s)
Data Analysis , Gene Expression Profiling , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Cluster Analysis
16.
Res Sq ; 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38405777

ABSTRACT

Pre-trained deep Transformers have had tremendous success in a wide variety of disciplines. However, in computational biology, essentially all Transformers are built upon biological sequences, which ignores vital stereochemical information and may result in crucial errors in downstream predictions. On the other hand, three-dimensional (3D) molecular structures are incompatible with the sequential architecture of Transformer and natural language processing (NLP) models in general. This work addresses this foundational challenge with a topological Transformer (TopoFormer). TopoFormer is built by integrating NLP and a multiscale topology technique, the persistent topological hyperdigraph Laplacian (PTHL), which systematically converts intricate 3D protein-ligand complexes at various spatial scales into an NLP-admissible sequence of topological invariants and homotopic shapes. Element-specific PTHLs are further developed to embed crucial physical, chemical, and biological interactions into topological sequences. TopoFormer surges ahead of conventional algorithms and recent deep learning variants and gives rise to exemplary scoring accuracy and superior performance in ranking, docking, and screening tasks in a number of benchmark datasets. The proposed topological sequences can be extracted from all kinds of structural data in data science to facilitate various NLP models, heralding a new era in AI-driven discovery.

17.
Biophys J ; 123(17): 2807-2814, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-38356263

ABSTRACT

Electrostatics is of paramount importance to chemistry, physics, biology, and medicine. The Poisson-Boltzmann (PB) theory is a primary model for electrostatic analysis. However, it is highly challenging to compute accurate PB electrostatic solvation free energies for macromolecules due to the nonlinearity, dielectric jumps, charge singularity, and geometric complexity associated with the PB equation. The present work introduces a PB-based machine learning (PBML) model for biomolecular electrostatic analysis. Trained with the second-order accurate MIBPB solver, the proposed PBML model is found to be more accurate and faster than several eminent PB solvers in electrostatic analysis. The proposed PBML model can provide highly accurate PB electrostatic solvation free energy of new biomolecules or new conformations generated by molecular dynamics with much reduced computational cost.


Subject(s)
Machine Learning , Static Electricity , Molecular Dynamics Simulation , Poisson Distribution , Thermodynamics
18.
Comput Biol Med ; 169: 107918, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38194782

ABSTRACT

Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pre-trained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves the state of the art by up to 15%.


Subject(s)
Machine Learning , Solubility , Amino Acid Sequence , Mutation
19.
J Comput Chem ; 45(6): 306-320, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-37830273

ABSTRACT

The Poisson-Boltzmann (PB) model is a widely used electrostatic model for biomolecular solvation analysis. Formulated as an elliptic interface problem, the PB model can be numerically solved on either Eulerian meshes using finite difference/finite element methods or Lagrangian meshes using boundary element methods. Molecular surface generators, which produce the discretized dielectric interfaces between solutes and solvents, are critical factors in determining the accuracy and efficiency of the PB solvers. In this work, we investigate the utility of the Eulerian Solvent Excluded Surface (ESES) software for rendering conjugated Eulerian and Lagrangian surface representations, which enables us to numerically validate and compare the quality of Eulerian PB solvers, such as the MIBPB solver, and the Lagrangian PB solvers, such as the TABI-PB solver. Furthermore, with the ESES software and its associated PB solvers, we are able to numerically validate an interesting and useful but often neglected source-target symmetric property associated with the linearized PB model.
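
For orientation, the elliptic interface problem referred to in the abstract above is commonly written in the following form. This is a textbook-style sketch, not the formulation from the cited article: the dimensionless potential $\phi$, the piecewise dielectric coefficient $\epsilon$, the modified Debye-Hueckel parameter $\bar{\kappa}$, and the constant $C$ that absorbs unit conventions are notational assumptions.

```latex
% Nonlinear PB equation for a solute with point charges q_i at positions r_i
-\nabla \cdot \bigl( \epsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \bigr)
  + \bar{\kappa}^{2}(\mathbf{r}) \sinh \phi(\mathbf{r})
  = C \sum_{i=1}^{N} q_i \, \delta(\mathbf{r} - \mathbf{r}_i)

% Linearized PB equation, using sinh(phi) ~ phi for small potentials
-\nabla \cdot \bigl( \epsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \bigr)
  + \bar{\kappa}^{2}(\mathbf{r}) \, \phi(\mathbf{r})
  = C \sum_{i=1}^{N} q_i \, \delta(\mathbf{r} - \mathbf{r}_i)

% The linearized operator is self-adjoint, so its Green's function satisfies
G(\mathbf{r}, \mathbf{r}') = G(\mathbf{r}', \mathbf{r})
```

One plausible reading of the source-target symmetric property is this reciprocity of the Green's function: the potential produced at a target location by a unit charge at a source location is unchanged when the two locations are swapped. Whether the article uses exactly this statement is an assumption here.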

20.
Small ; 20(5): e2305300, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37735143

ABSTRACT

Caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), coronavirus disease 2019 (COVID-19) has shown extensive lung manifestations in vulnerable individuals, putting lung imaging and monitoring at the forefront of early detection and treatment. Magnetic particle imaging (MPI) is an imaging modality, which can bring excellent contrast, sensitivity, and signal-to-noise ratios to lung imaging for the development of new theranostic approaches for respiratory diseases. Advances in MPI tracers would offer additional improvements and increase the potential for clinical translation of MPI. Here, a high-performance nanotracer based on shape anisotropy of magnetic nanoparticles is developed and its use in MPI imaging of the lung is demonstrated. Shape anisotropy proves to be a critical parameter for increasing signal intensity and resolution and exceeding those properties of conventional spherical nanoparticles. The 0D nanoparticles exhibit a 2-fold increase, while the 1D nanorods have a > 5-fold increase in signal intensity when compared to VivoTrax. Newly designed 1D nanorods displayed high signal intensities and excellent resolution in lung images. A spatiotemporal lung imaging study in mice revealed that this tracer offers new opportunities for monitoring disease and guiding intervention.


Subject(s)
Magnetite Nanoparticles , Nanoparticles , Mice , Animals , Anisotropy , Diagnostic Imaging/methods , Magnetics , Magnetic Phenomena , Magnetic Resonance Imaging