Results 1 - 20 of 26
1.
bioRxiv ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38915555

ABSTRACT

LMNA-Related Dilated Cardiomyopathy (DCM) is an autosomal-dominant genetic condition with cardiomyocyte and conduction system dysfunction often resulting in heart failure or sudden death. The condition is caused by mutation in the Lamin A/C (LMNA) gene encoding Type-A nuclear lamin proteins involved in nuclear integrity, epigenetic regulation of gene expression, and differentiation. Molecular mechanisms of disease are not completely understood, and there are no definitive treatments to reverse progression or prevent mortality. We investigated possible mechanisms of LMNA-Related DCM using induced pluripotent stem cells derived from a family with a heterozygous LMNA c.357-2A>G splice-site mutation. We differentiated one LMNA mutant iPSC line derived from an affected female (Patient) and two non-mutant iPSC lines derived from her unaffected sister (Control) and conducted single-cell RNA sequencing for 12 samples (4 Patient and 8 Control) across seven time points: Day 0, 2, 4, 9, 16, 19, and 30. Our bioinformatics workflow identified 125,554 cells in raw data and 110,521 (88%) high-quality cells in sequentially processed data. Unsupervised clustering, cell annotation, and trajectory inference found complex heterogeneity: ten main cell types; many possible subtypes; and lineage bifurcation for Cardiac Progenitors to Cardiomyocytes (CM) and Epicardium-Derived Cells (EPDC). Data integration and comparative analyses of Patient and Control cells found cell type and lineage differentially expressed genes (DEG) with enrichment to support pathway dysregulation. Top DEG and enriched pathways included: 10 ZNF genes and RNA polymerase II transcription in Pluripotent cells (PP); BMP4 and TGF Beta/BMP signaling, sarcomere gene subsets and cardiogenesis, CDH2 and EMT in CM; LMNA and epigenetic regulation and DDIT4 and mTORC1 signaling in EPDC. Top DEG also included: XIST and other X-linked genes, six imprinted genes: SNRPN, PWAR6, NDN, PEG10, MEG3, MEG8, and enriched gene sets in metabolism, proliferation, and homeostasis. We confirmed Lamin A/C haploinsufficiency by allelic expression and Western blot. Our complex Patient-derived iPSC model for Lamin A/C haploinsufficiency in PP, CM, and EPDC provided support for dysregulation of genes and pathways, many previously associated with Lamin A/C defects, such as epigenetic gene expression, signaling, and differentiation. Our findings support disruption of epigenomic developmental programs as proposed in other LMNA disease models. We recognized other factors influencing epigenetics and differentiation; thus, our approach needs improvement to further investigate this mechanism in an iPSC-derived model.

2.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38632952

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.


Subject(s)
Algorithms; Single-Cell Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Gene Expression Profiling/methods; Transcriptome; Cluster Analysis
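
As a rough illustration of what "curvature of a network" can mean in entry 2, the sketch below computes the simplest combinatorial Forman-Ricci curvature of each edge in an unweighted graph, F(u, v) = 4 - deg(u) - deg(v). The abstract does not state which curvature definition scGeom uses, so this formula, the function name `forman_curvature`, and the toy graph are assumptions chosen for illustration and are not the tool's API.

```python
from collections import defaultdict


def forman_curvature(edges):
    """Return {edge: 4 - deg(u) - deg(v)} for an unweighted, undirected graph.

    Assumes each edge is listed once and there are no self-loops.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}


# Hypothetical toy graph: node "a" touches three edges, node "c" touches two.
toy_edges = [("a", "b"), ("a", "c"), ("a", "e"), ("c", "d")]
curvatures = forman_curvature(toy_edges)
```

Under this definition, edges between highly connected nodes receive the most negative values, which is one reason curvature is sometimes read as a marker of bridge-like or transitional regions of a graph.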
3.
Biophys J ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38356263

ABSTRACT

Electrostatics is of paramount importance to chemistry, physics, biology, and medicine. The Poisson-Boltzmann (PB) theory is a primary model for electrostatic analysis. However, it is highly challenging to compute accurate PB electrostatic solvation free energies for macromolecules due to the nonlinearity, dielectric jumps, charge singularity, and geometric complexity associated with the PB equation. The present work introduces a PB-based machine learning (PBML) model for biomolecular electrostatic analysis. Trained with the second-order accurate MIBPB solver, the proposed PBML model is found to be more accurate and faster than several eminent PB solvers in electrostatic analysis. The proposed PBML model can provide highly accurate PB electrostatic solvation free energy of new biomolecules or new conformations generated by molecular dynamics with much reduced computational cost.

4.
Nat Methods ; 20(2): 218-228, 2023 02.
Article in English | MEDLINE | ID: mdl-36690742

ABSTRACT

Spatial transcriptomic technologies and spatially annotated single-cell RNA sequencing datasets provide unprecedented opportunities to dissect cell-cell communication (CCC). However, incorporation of the spatial information and complex biochemical processes required in the reconstruction of CCC remains a major challenge. Here, we present COMMOT (COMMunication analysis by Optimal Transport) to infer CCC in spatial transcriptomics, which accounts for the competition between different ligand and receptor species as well as spatial distances between cells. A collective optimal transport method is developed to handle complex molecular interactions and spatial constraints. Furthermore, we introduce downstream analysis tools to infer spatial signaling directionality and genes regulated by signaling using machine learning models. We apply COMMOT to simulation data and eight spatial datasets acquired with five different technologies to show its effectiveness and robustness in identifying spatial CCC in data with varying spatial resolutions and gene coverages. Finally, COMMOT identifies new CCCs during skin morphogenesis in a case study of human epidermal development.


Subject(s)
Cell Communication; Transcriptome; Humans; Cell Communication/genetics; Gene Expression Profiling; Signal Transduction; Computer Simulation; Single-Cell Analysis
5.
Nat Commun ; 13(1): 4076, 2022 07 14.
Article in English | MEDLINE | ID: mdl-35835774

ABSTRACT

One major challenge in analyzing spatial transcriptomic datasets is to simultaneously incorporate the cell transcriptome similarity and their spatial locations. Here, we introduce SpaceFlow, which generates spatially-consistent low-dimensional embeddings by incorporating both expression similarity and spatial information using spatially regularized deep graph networks. Based on the embedding, we introduce a pseudo-Spatiotemporal Map that integrates the pseudotime concept with spatial locations of the cells to unravel spatiotemporal patterns of cells. By comparing with multiple existing methods on several spatial transcriptomic datasets at both spot and single-cell resolutions, SpaceFlow is shown to produce a robust domain segmentation and identify biologically meaningful spatiotemporal patterns. Applications of SpaceFlow reveal evolving lineage in heart developmental data and tumor-immune interactions in human breast cancer data. Our study provides a flexible deep learning framework to incorporate spatiotemporal information in analyzing spatial transcriptomic data.


Subject(s)
Transcriptome; Humans; Transcriptome/genetics
6.
Commun Biol ; 5(1): 220, 2022 03 10.
Article in English | MEDLINE | ID: mdl-35273328

ABSTRACT

The rapid development of spatial transcriptomics (ST) techniques has allowed the measurement of transcriptional levels across many genes together with the spatial positions of cells. This has led to an explosion of interest in computational methods and techniques for harnessing both spatial and transcriptional information in analysis of ST datasets. The wide diversity of approaches in aim, methodology and technology for ST provides great challenges in dissecting cellular functions in spatial contexts. Here, we synthesize and review the key problems in analysis of ST data and methods that are currently applied, while also expanding on open questions and areas of future development.


Subject(s)
Transcriptome
7.
Cell Rep ; 37(12): 110140, 2021 12 21.
Article in English | MEDLINE | ID: mdl-34936864

ABSTRACT

Neural crest (NC) cells migrate throughout vertebrate embryos to give rise to a huge variety of cell types, but when and where lineages emerge and their regulation remain unclear. We have performed single-cell RNA sequencing (RNA-seq) of cranial NC cells from the first pharyngeal arch in zebrafish over several stages during migration. Computational analysis combining pseudotime and real-time data reveals that these NC cells first adopt a transitional state, becoming specified mid-migration, with the first lineage decisions being skeletal and pigment, followed by neural and glial progenitors. In addition, by computationally integrating these data with RNA-seq data from a transgenic Wnt reporter line, we identify gene cohorts with similar temporal responses to Wnts during migration and show that one, Atp6ap2, is required for melanocyte differentiation. Together, our results show that cranial NC cell lineages arise progressively and uncover a series of spatially restricted cell interactions likely to regulate such cell-fate decisions.


Subject(s)
Cell Lineage; Neural Crest/metabolism; Wnt Proteins/metabolism; Zebrafish Proteins/genetics; Zebrafish Proteins/metabolism; Zebrafish/genetics; Zebrafish/metabolism; Animals; Animals, Genetically Modified; Branchial Region/metabolism; Cell Communication; Cell Differentiation; Cell Movement; Cranial Nerves/metabolism; Embryo, Nonmammalian/metabolism; Gene Expression Profiling/methods; Gene Expression Regulation, Developmental; RNA-Seq; Signal Transduction; Single-Cell Analysis
8.
Ann Biomed Eng ; 49(12): 3524-3539, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34585335

ABSTRACT

Genetic mutations to the Lamin A/C gene (LMNA) can cause heart disease, but the mechanisms making cardiac tissues uniquely vulnerable to the mutations remain largely unknown. Further, patients with LMNA mutations have highly variable presentation of heart disease progression and type. In vitro patient-specific experiments could provide a powerful platform for studying this phenomenon, but the use of induced pluripotent stem cell-derived cardiomyocytes (iPSC-CM) introduces heterogeneity in maturity and function thus complicating the interpretation of the results of any single experiment. We hypothesized that integrating single cell RNA sequencing (scRNA-seq) with analysis of the tissue architecture and contractile function would elucidate some of the probable mechanisms. To test this, we investigated five iPSC-CM lines, three controls and two patients with a (c.357-2A>G) mutation. The patient iPSC-CM tissues had significantly weaker stress generation potential than control iPSC-CM tissues demonstrating the viability of our in vitro approach. Through scRNA-seq, differentially expressed genes between control and patient lines were identified. Some of these genes, linked to quantitative structural and functional changes, were cardiac specific, explaining the targeted nature of the disease progression seen in patients. The results of this work demonstrate the utility of combining in vitro tools in exploring heart disease mechanics.


Subject(s)
Cardiomyopathy, Dilated/genetics; Cardiomyopathy, Dilated/physiopathology; Gene Expression; Induced Pluripotent Stem Cells/cytology; Lamin Type A/genetics; Myocardial Contraction; Myocytes, Cardiac/physiology; Adult; Aged; Cell Line; Humans; Middle Aged
9.
Insect Biochem Mol Biol ; 137: 103625, 2021 10.
Article in English | MEDLINE | ID: mdl-34358664

ABSTRACT

Scorpion α-toxins bind at the pharmacologically-defined site-3 on the sodium channel and inhibit channel inactivation by preventing the outward movement of the voltage sensor in domain IV (IVS4), whereas scorpion β-toxins bind at site-4 on the sodium channel and enhance channel activation by trapping the voltage sensor of domain II (IIS4) in its outward position. However, limited information is available on the role of the voltage-sensing modules (VSM, comprising S1-S4) of domains I and III in toxin actions. We have previously shown that charge reversing substitutions of the innermost positively-charged residues in IIIS4 (R4E, R5E) increase the activity of an insect-selective site-4 scorpion toxin, Lqh-dprIT3-c, on BgNav1-1a, a cockroach sodium channel. Here we show that substitutions R4E and R5E in IIIS4 also increase the activity of two site-3 toxins, LqhαIT from Leiurus quinquestriatus hebraeus and insect-selective Av3 from Anemonia viridis. Furthermore, charge reversal of either of two conserved negatively-charged residues, D1K and E2K, in IIIS2 also increases the action of the site-3 and site-4 toxins. Homology modeling suggests that S2-D1 and S2-E2 interact with S4-R4 and S4-R5 in the VSM of domain III (III-VSM), respectively, in the activated state of the channel. However, charge swapping between S2-D1 and S4-R4 had no compensatory effects on gating or toxin actions, suggesting that charged residue interactions are complex. Collectively, our results highlight the involvement of III-VSM in the actions of both site 3 and site 4 toxins, suggesting that charge reversing substitutions in III-VSM allosterically facilitate IIS4 or IVS4 voltage sensor trapping by these toxins.


Subject(s)
Cnidarian Venoms/pharmacology; Drosophila melanogaster/genetics; Insect Proteins/genetics; Scorpion Venoms/pharmacology; Sodium Channels/genetics; Animals; Drosophila melanogaster/drug effects; Drosophila melanogaster/metabolism; Insect Proteins/metabolism; Oocytes/drug effects; Oocytes/metabolism; Sodium Channels/metabolism
10.
Curr Opin Syst Biol ; 26: 12-23, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33969247

ABSTRACT

Cell-cell communication is a fundamental process that shapes biological tissue. Historically, studies of cell-cell communication have been feasible for one or two cell types and a few genes. With the emergence of single-cell transcriptomics, we are now able to examine the genetic profiles of individual cells at unprecedented scale and depth. The availability of such data presents an exciting opportunity to construct a more comprehensive description of cell-cell communication. This review discusses the recent explosion of methods that have been developed to infer cell-cell communication from non-spatial and spatial single-cell transcriptomics, two promising technologies which have complementary strengths and limitations. We propose several avenues to propel this rapidly expanding field forward in meaningful ways.

11.
Front Genet ; 12: 636743, 2021.
Article in English | MEDLINE | ID: mdl-33833776

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) data provides unprecedented information on cell fate decisions; however, the spatial arrangement of cells is often lost. Several recent computational methods have been developed to impute spatial information onto a scRNA-seq dataset through analyzing known spatial expression patterns of a small subset of genes known as a reference atlas. However, there is a lack of comprehensive analysis of the accuracy, precision, and robustness of the mappings, along with the generalizability of these methods, which are often designed for specific systems. We present a system-adaptive deep learning-based method (DEEPsc) to impute spatial information onto a scRNA-seq dataset from a given spatial reference atlas. By introducing a comprehensive set of metrics that evaluate the spatial mapping methods, we compare DEEPsc with four existing methods on four biological systems. We find that while DEEPsc has comparable accuracy to other methods, an improved balance between precision and robustness is achieved. DEEPsc provides a data-adaptive tool to connect scRNA-seq datasets and spatial imaging datasets to analyze cell fate decisions. Our implementation with a uniform API can serve as a portal with access to all the methods investigated in this work for spatial exploration of cell fate decisions in scRNA-seq data. All methods evaluated in this work are implemented as an open-source software with a uniform interface.

12.
PLoS Comput Biol ; 17(3): e1008571, 2021 03.
Article in English | MEDLINE | ID: mdl-33684098

ABSTRACT

During early mammalian embryo development, a small number of cells make robust fate decisions at particular spatial locations in a tight time window to form inner cell mass (ICM), and later epiblast (Epi) and primitive endoderm (PE). While recent single-cell transcriptomics data allows scrutinization of heterogeneity of individual cells, consistent spatial and temporal mechanisms the early embryo utilizes to robustly form the Epi/PE layers from ICM remain elusive. Here we build a multiscale three-dimensional model for mammalian embryo to recapitulate the observed patterning process from zygote to late blastocyst. By integrating the spatiotemporal information reconstructed from multiple single-cell transcriptomic datasets, the data-informed modeling analysis suggests two major processes critical to the formation of Epi/PE layers: a selective cell-cell adhesion mechanism (via EphA4/EphrinB2) for fate-location coordination and a temporal attenuation mechanism of cell signaling (via Fgf). Spatial imaging data and distinct subsets of single-cell gene expression data are then used to validate the predictions. Together, our study provides a multiscale framework that incorporates single-cell gene expression datasets to analyze gene regulations, cell-cell communications, and physical interactions among cells in complex geometries at single-cell resolution, with direct application to late-stage development of embryogenesis.


Subject(s)
Embryonic Development/genetics; Germ Layers; Models, Biological; Transcriptome/genetics; Animals; Embryo, Mammalian/cytology; Embryo, Mammalian/metabolism; Embryo, Mammalian/physiology; Germ Layers/cytology; Germ Layers/metabolism; Germ Layers/physiology; Mice; Single-Cell Analysis
13.
BMVC ; 32, 2021 Nov.
Article in English | MEDLINE | ID: mdl-36227018

ABSTRACT

Complex biological tissues consist of numerous cells in a highly coordinated manner and carry out various biological functions. Therefore, segmenting a tissue into spatial and functional domains is critically important for understanding and controlling the biological functions. The emerging spatial transcriptomics technologies allow simultaneous measurements of thousands of genes with precise spatial information, providing an unprecedented opportunity for dissecting biological tissues. However, how to utilize such noisy, sparse, and high dimensional data for tissue segmentation remains a major challenge. Here, we develop a deep learning-based method, named SCAN-IT, by transforming the spatial domain identification problem into an image segmentation problem, with cells mimicking pixels and expression values of genes within a cell representing the color channels. Specifically, SCAN-IT relies on geometric modeling, graph neural networks, and an informatics approach, DeepGraphInfomax. We demonstrate that SCAN-IT can handle datasets from a wide range of spatial transcriptomics techniques, including the ones with high spatial resolution but low gene coverage as well as those with low spatial resolution but high gene coverage. We show that SCAN-IT outperforms state-of-the-art methods using a benchmark dataset with ground truth domain annotations.

14.
Proc Natl Acad Sci U S A ; 117(36): 22146-22156, 2020 09 08.
Article in English | MEDLINE | ID: mdl-32848056

ABSTRACT

Packing interaction is a critical driving force in the folding of helical membrane proteins. Despite the importance, packing defects (i.e., cavities including voids, pockets, and pores) are prevalent in membrane-integral enzymes, channels, transporters, and receptors, playing essential roles in function. Then, a question arises regarding how the two competing requirements, packing for stability vs. cavities for function, are reconciled in membrane protein structures. Here, using the intramembrane protease GlpG of Escherichia coli as a model and cavity-filling mutation as a probe, we tested the impacts of native cavities on the thermodynamic stability and function of a membrane protein. We find several stabilizing mutations which induce substantial activity reduction without distorting the active site. Notably, these mutations are all mapped onto the regions of conformational flexibility and functional importance, indicating that the cavities facilitate functional movement of GlpG while compromising the stability. Experiment and molecular dynamics simulation suggest that the stabilization is induced by the coupling between enhanced protein packing and weakly unfavorable lipid desolvation, or solely by favorable lipid solvation on the cavities. Our result suggests that, stabilized by the relatively weak interactions with lipids, cavities are accommodated in membrane proteins without severe energetic cost, which, in turn, serve as a platform to fine-tune the balance between stability and flexibility for optimal activity.


Subject(s)
DNA-Binding Proteins/chemistry; Endopeptidases/chemistry; Escherichia coli Proteins/chemistry; Membrane Proteins/chemistry; Catalytic Domain; DNA-Binding Proteins/metabolism; Endopeptidases/metabolism; Escherichia coli Proteins/metabolism; Humans; Membrane Proteins/metabolism; Models, Molecular; Molecular Dynamics Simulation; Mutation; Protein Conformation; Protein Folding; Protein Stability; Serine Endopeptidases/chemistry
15.
Nat Commun ; 11(1): 2084, 2020 04 29.
Article in English | MEDLINE | ID: mdl-32350282

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) provides details for individual cells; however, crucial spatial information is often lost. We present SpaOTsc, a method relying on structured optimal transport to recover spatial properties of scRNA-seq data by utilizing spatial measurements of a relatively small number of genes. A spatial metric for individual cells in scRNA-seq data is first established based on a map connecting it with the spatial measurements. The cell-cell communications are then obtained by "optimally transporting" signal senders to target signal receivers in space. Using partial information decomposition, we next compute the intercellular gene-gene information flow to estimate the spatial regulations between genes across cells. Four datasets are employed for cross-validation of spatial gene expression prediction and comparison to known cell-cell communications. SpaOTsc has broader applications, both in integrating non-spatial single-cell measurements with spatial data, and directly in spatial single-cell transcriptomics data to reconstruct spatial cellular dynamics in tissues.


Subject(s)
Signal Transduction/genetics; Single-Cell Analysis; Transcriptome/genetics; Animals; Cell Communication; Cluster Analysis; Databases, Genetic; Drosophila/embryology; Drosophila/genetics; Gene Expression Regulation, Developmental; Reproducibility of Results; Sequence Analysis, RNA; Visual Cortex/metabolism; Zebrafish/embryology; Zebrafish/genetics
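
To make the idea of "optimally transporting" signal senders to receivers in entry 15 more concrete, here is a minimal sketch of entropy-regularized optimal transport solved with Sinkhorn iterations. It is a generic textbook scheme, not SpaOTsc's structured optimal transport; the function name `sinkhorn`, the regularization value `eps`, and the one-dimensional toy positions are all assumptions made for illustration.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iter=500):
    """Return an entropic optimal transport plan with row sums a and column sums b."""
    n, m = len(a), len(b)
    # Gibbs kernel: cheap sender/receiver pairs receive large entries.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy data: three sender cells and two receiver cells on a line.
senders = [0.0, 1.0, 2.0]
receivers = [0.5, 2.5]
cost = [[abs(s - r) for r in receivers] for s in senders]  # spatial distance as cost
a = [1 / 3] * 3  # signal mass emitted by each sender
b = [1 / 2] * 2  # signal mass accepted by each receiver
plan = sinkhorn(a, b, cost)
```

Entry `plan[i][j]` can be read as the amount of signal sent from sender `i` to receiver `j`; nearby pairs receive most of the mass because the cost is the spatial distance.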
16.
Cell Rep ; 30(11): 3932-3947.e6, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32187560

ABSTRACT

Our knowledge of transcriptional heterogeneities in epithelial stem and progenitor cell compartments is limited. Epidermal basal cells sustain cutaneous tissue maintenance and drive wound healing. Previous studies have probed basal cell heterogeneity in stem and progenitor potential, but a comprehensive dissection of basal cell dynamics during differentiation is lacking. Using single-cell RNA sequencing coupled with RNAScope and fluorescence lifetime imaging, we identify three non-proliferative and one proliferative basal cell state in homeostatic skin that differ in metabolic preference and become spatially partitioned during wound re-epithelialization. Pseudotemporal trajectory and RNA velocity analyses predict a quasi-linear differentiation hierarchy where basal cells progress from Col17a1Hi/Trp63Hi state to early-response state, proliferate at the juncture of these two states, or become growth arrested before differentiating into spinous cells. Wound healing induces plasticity manifested by dynamic basal-spinous interconversions at multiple basal transcriptional states. Our study provides a systematic view of epidermal cellular dynamics, supporting a revised "hierarchical-lineage" model of homeostasis.


Subject(s)
Epidermis/metabolism; Epidermis/pathology; Gene Expression Profiling; Homeostasis/genetics; Single-Cell Analysis; Wound Healing/genetics; Animals; Cell Movement/genetics; Female; Inflammation/genetics; Inflammation/pathology; Mice, Inbred C57BL; Mice, Transgenic; Up-Regulation/genetics
17.
Phys Chem Chem Phys ; 22(8): 4343-4367, 2020 Feb 26.
Article in English | MEDLINE | ID: mdl-32067019

ABSTRACT

Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.


Subject(s)
Computational Biology; Models, Biological; Algorithms; Molecular Sequence Data
18.
Nat Mach Intell ; 2(2): 116-123, 2020.
Article in English | MEDLINE | ID: mdl-34170981

ABSTRACT

The ability to predict protein-protein interactions is crucial to our understanding of a wide range of biological activities and functions in the human body, and for guiding drug discovery. Despite considerable efforts to develop suitable computational methods, predicting protein-protein interaction binding affinity changes following mutation (ΔΔG) remains a severe challenge. Algebraic topology, a champion in recent worldwide competitions for protein-ligand binding affinity predictions, is a promising approach to simplifying the complexity of biological structures. Here we introduce element- and site-specific persistent homology (a new branch of algebraic topology) to simplify the structural complexity of protein-protein complexes and embed crucial biological information into topological invariants. We also propose a new deep learning algorithm called NetTree to take advantage of convolutional neural networks and gradient-boosting trees. A topology-based network tree is constructed by integrating the topological representation and NetTree for predicting protein-protein interaction ΔΔG. Tests on major benchmark datasets indicate that the proposed topology-based network tree is an important improvement over the current state of the art in predicting ΔΔG.

19.
J Appl Comput Topol ; 4(4): 481-507, 2020 Dec.
Article in English | MEDLINE | ID: mdl-34179350

ABSTRACT

While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.
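
Entry 19 approximates interactions by weighted graph Laplacians. The sketch below builds one such matrix, L = D - W, from point coordinates. The abstract does not give the weighting scheme, so the Gaussian kernel w_ij = exp(-(d_ij / eta)^2), the scale `eta`, the function name `weighted_laplacian`, and the toy coordinates are assumptions for illustration only.

```python
import math


def weighted_laplacian(points, eta=1.0):
    """Build L = D - W with assumed Gaussian weights w_ij = exp(-(d_ij / eta)**2)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance
            W[i][j] = W[j][i] = math.exp(-((d / eta) ** 2))
    # Diagonal holds the weighted degree; off-diagonal holds the negated weight.
    return [
        [sum(W[i]) if i == j else -W[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy "molecule": three atoms in the plane z = 0.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
L = weighted_laplacian(atoms)
```

Each row of `L` sums to zero, which is the property that makes a Laplacian a natural diffusive coupling term between oscillators attached to the nodes.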

20.
SIAM J Math Data Sci ; 2(2): 396-418, 2020.
Article in English | MEDLINE | ID: mdl-34222831

ABSTRACT

Persistent homology is a powerful tool for characterizing the topology of a data set at various geometric scales. When applied to the description of molecular structures, persistent homology can capture the multiscale geometric features and reveal certain interaction patterns in terms of topological invariants. However, in addition to the geometric information, there is a wide variety of nongeometric information of molecular structures, such as element types, atomic partial charges, atomic pairwise interactions, and electrostatic potential functions, that is not described by persistent homology. Although element-specific homology and electrostatic persistent homology can encode some nongeometric information into geometry based topological invariants, it is desirable to have a mathematical paradigm to systematically embed both geometric and nongeometric information, i.e., multicomponent heterogeneous information, into unified topological representations. To this end, we propose a persistent cohomology based framework for the enriched representation of data. In our framework, nongeometric information can either be distributed globally or reside locally on the datasets in the geometric sense and can be properly defined on topological spaces, i.e., simplicial complexes. Using the proposed persistent cohomology based framework, enriched barcodes are extracted from datasets to represent heterogeneous information. We consider a variety of datasets to validate the present formulation and illustrate the usefulness of the proposed method based on persistent cohomology. It is found that the proposed framework outperforms or at least matches the state-of-the-art methods in the protein-ligand binding affinity prediction from massive biomolecular datasets without resorting to any deep learning formulation.
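
For readers unfamiliar with the barcodes mentioned in entry 20, the sketch below computes the simplest case: the 0-dimensional persistent homology of a point cloud, which tracks when connected components merge as the distance threshold grows. It illustrates plain persistent homology only, not the enriched persistent cohomology framework the abstract proposes; the function name `h0_barcode`, the use of pairwise distance as the filtration scale, and the toy point cloud are assumptions.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return death scales of 0-dimensional bars; every bar is born at scale 0."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing pairwise distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # two components merge, so one bar ends here
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # one additional bar never dies (the final component)


# Hypothetical toy point cloud on a line.
cloud = [(0.0,), (1.0,), (3.0,)]
bars = h0_barcode(cloud)
```

Long bars correspond to well-separated groups of points; the geometric part of a molecular barcode is built from the same kind of birth and death scales, extended to loops and cavities in higher dimensions.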
