Results 1 - 15 of 15
1.
NPJ Syst Biol Appl ; 10(1): 38, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38594351

ABSTRACT

Acute myeloid leukemia (AML) is characterized by uncontrolled proliferation of poorly differentiated myeloid cells, with a heterogenous mutational landscape. Mutations in IDH1 and IDH2 are found in 20% of the AML cases. Although much effort has been made to identify genes associated with leukemogenesis, the regulatory mechanism of AML state transition is still not fully understood. To alleviate this issue, here we develop a new computational approach that integrates genomic data from diverse sources, including gene expression and ATAC-seq datasets, curated gene regulatory interaction databases, and mathematical modeling to establish models of context-specific core gene regulatory networks (GRNs) for a mechanistic understanding of tumorigenesis of AML with IDH mutations. The approach adopts a new optimization procedure to identify the top network according to its accuracy in capturing gene expression states and its flexibility to allow sufficient control of state transitions. From GRN modeling, we identify key regulators associated with the function of IDH mutations, such as DNA methyltransferase DNMT1, and network destabilizers, such as E2F1. The constructed core regulatory network and outcomes of in-silico network perturbations are supported by survival data from AML patients. We expect that the combined bioinformatics and systems-biology modeling approach will be generally applicable to elucidate the gene regulation of disease progression.


Subject(s)
Leukemia, Myeloid, Acute , Nucleophosmin , Humans , Gene Regulatory Networks/genetics , Isocitrate Dehydrogenase/genetics , Leukemia, Myeloid, Acute/genetics , Carcinogenesis
2.
bioRxiv ; 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37577526

ABSTRACT

Acute myeloid leukemia (AML) is characterized by uncontrolled proliferation of poorly differentiated myeloid cells, with a heterogenous mutational landscape. Mutations in IDH1 and IDH2 are found in 20% of the AML cases. Although much effort has been made to identify genes associated with leukemogenesis, the regulatory mechanism of AML state transition is still not fully understood. To alleviate this issue, here we develop a new computational approach that integrates genomic data from diverse sources, including gene expression and ATAC-seq datasets, curated gene regulatory interaction databases, and mathematical modeling to establish models of context-specific core gene regulatory networks (GRNs) for a mechanistic understanding of tumorigenesis of AML with IDH mutations. The approach adopts a novel optimization procedure to identify the optimal network according to its accuracy in capturing gene expression states and its flexibility to allow sufficient control of state transitions. From GRN modeling, we identify key regulators associated with the function of IDH mutations, such as DNA methyltransferase DNMT1, and network destabilizers, such as E2F1. The constructed core regulatory network and outcomes of in-silico network perturbations are supported by survival data from AML patients. We expect that the combined bioinformatics and systems-biology modeling approach will be generally applicable to elucidate the gene regulation of disease progression.

3.
Bioinform Adv ; 3(1): vbad032, 2023.
Article in English | MEDLINE | ID: mdl-37038446

ABSTRACT

Motivation: Biological processes are regulated by underlying genes and their interactions that form gene regulatory networks (GRNs). Dysregulation of these GRNs can cause complex diseases such as cancer, Alzheimer's and diabetes. Hence, accurate GRN inference is critical for elucidating gene function, allowing for the faster identification and prioritization of candidate genes for functional investigation. Several statistical and machine learning-based methods have been developed to infer GRNs based on biological and synthetic datasets. Here, we developed a method named AGRN that infers GRNs by employing an ensemble of machine learning algorithms. Results: From the idea that a single method may not perform well on all datasets, we calculate the gene importance scores using three machine learning methods: random forest, extra tree and support vector regressors. We calculate the importance scores from Shapley Additive Explanations, a recently published method to explain machine learning models. We have found that the importance scores from Shapley values perform better than the traditional importance scoring methods on almost all the benchmark datasets. We have analyzed the performance of AGRN using the datasets from the DREAM4 and DREAM5 challenges for GRN inference. The proposed method, AGRN, an ensemble machine learning method with Shapley values, outperforms the existing methods on both the DREAM4 and DREAM5 datasets. With improved accuracy, we believe that AGRN-inferred GRNs would enhance our mechanistic understanding of biological processes in health and disease. Availability and implementation: https://github.com/DuaaAlawad/AGRN. Supplementary information: Supplementary data are available at Bioinformatics online.

4.
Genome Biol ; 23(1): 270, 2022 Dec 27.
Article in English | MEDLINE | ID: mdl-36575445

ABSTRACT

A major question in systems biology is how to identify the core gene regulatory circuit that governs the decision-making of a biological process. Here, we develop a computational platform, named NetAct, for constructing core transcription factor regulatory networks using both transcriptomics data and literature-based transcription factor-target databases. NetAct robustly infers regulators' activity using target expression, constructs networks based on transcriptional activity, and integrates mathematical modeling for validation. Our in silico benchmark test shows that NetAct outperforms existing algorithms in inferring transcriptional activity and gene networks. We illustrate the application of NetAct to model networks driving TGF-β-induced epithelial-mesenchymal transition and macrophage polarization.


Subject(s)
Computational Biology , Transcription Factors , Transcription Factors/metabolism , Gene Expression Regulation , Gene Regulatory Networks , Systems Biology , Algorithms
5.
Comput Syst Oncol ; 1(2), 2021 Jun.
Article in English | MEDLINE | ID: mdl-34164628

ABSTRACT

Epithelial-mesenchymal transition (EMT) is an important biological process through which epithelial cells undergo phenotypic transitions to mesenchymal cells by losing cell-cell adhesion and gaining migratory properties that cells use in embryogenesis, wound healing, and cancer metastasis. An important research topic is to identify the underlying gene regulatory networks (GRNs) governing the decision making of EMT and develop predictive models based on the GRNs. The advent of recent genomic technology, such as single-cell RNA sequencing, has opened new opportunities to improve our understanding about the dynamical controls of EMT. In this article, we review three major types of computational and mathematical approaches and methods for inferring and modeling GRNs driving EMT. We emphasize (1) the bottom-up approaches, where GRNs are constructed through literature search; (2) the top-down approaches, where GRNs are derived from genome-wide sequencing data; (3) the combined top-down and bottom-up approaches, where EMT GRNs are constructed and simulated by integrating bioinformatics and mathematical modeling. We discuss the methodologies and applications of each approach and the available resources for these studies.

6.
Bioinformatics ; 37(9): 1327-1329, 2021 Jun 09.
Article in English | MEDLINE | ID: mdl-33279968

ABSTRACT

SUMMARY: GeneEx is an interactive web-app that uses an ODE-based mathematical modeling approach to simulate, visualize and analyze gene regulatory circuits (GRCs) for an explicit kinetic parameter set or for a large ensemble of random parameter sets. GeneEx offers users the freedom to modify many aspects of the simulation such as the parameter ranges, the levels of gene expression noise and the GRC network topology itself. This degree of flexibility allows users to explore a variety of hypotheses by providing insight into the number and stability of attractors for a given GRC. Moreover, users have the option to upload, and subsequently compare, experimental gene expression data to simulated data generated from the analysis of a built or uploaded custom circuit. Finally, GeneEx offers a curated database that contains circuit motifs and known biological GRCs to facilitate further inquiry into these. Overall, GeneEx enables users to investigate the effects of parameter variation, stochasticity and/or topological changes on gene expression for GRCs using a systems-biology approach. AVAILABILITY AND IMPLEMENTATION: GeneEx is available at https://geneex.jax.org. This web-app is released under the MIT license and is free and open to all users and there is no mandatory login requirement. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Regulatory Networks , Mobile Applications , Databases, Factual , Kinetics , Software , Systems Biology
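The ODE-based circuit simulation described in the abstract above can be illustrated with a minimal sketch. It integrates a two-gene mutual-inhibition (toggle switch) circuit for one explicit parameter set using forward Euler steps. The circuit, the Hill-type repression term, the parameter values, and the function name `simulate_toggle` are all assumptions chosen for illustration; this is not GeneEx's implementation.

```python
def hill_repression(x, threshold, hill_n):
    """Fraction of maximal production left when repressor level is x."""
    return 1.0 / (1.0 + (x / threshold) ** hill_n)


def simulate_toggle(a0, b0, steps=5000, dt=0.01,
                    production=10.0, degradation=1.0,
                    threshold=1.0, hill_n=4):
    """Integrate dA/dt = g*H(B) - k*A and dB/dt = g*H(A) - k*B (forward Euler).

    All parameter values are hypothetical defaults, not taken from the source.
    Returns the final (A, B) expression levels.
    """
    a, b = a0, b0
    for _ in range(steps):
        da = production * hill_repression(b, threshold, hill_n) - degradation * a
        db = production * hill_repression(a, threshold, hill_n) - degradation * b
        a, b = a + dt * da, b + dt * db
    return a, b


# Same parameters, two initial conditions: each run settles in a different state.
a_high = simulate_toggle(a0=5.0, b0=0.1)
b_high = simulate_toggle(a0=0.1, b0=5.0)
```

Starting from different initial conditions, the same parameter set settles into different attractors, which is the kind of multistability question such a simulation is meant to explore.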
7.
iScience ; 23(6): 101150, 2020 Jun 26.
Article in English | MEDLINE | ID: mdl-32450514

ABSTRACT

Many biological processes involve precise cellular state transitions controlled by complex gene regulation. Here, we use the budding yeast cell cycle as a model system and explore how a gene regulatory circuit encodes essential information of state transitions. We present a generalized random circuit perturbation method for circuits containing heterogeneous regulation types and its usage to analyze both steady and oscillatory states from an ensemble of circuit models with random kinetic parameters. The stable steady states form robust clusters with a circular structure that are associated with cell cycle phases. This circular structure in the clusters is consistent with single-cell RNA sequencing data. The oscillatory states specify the irreversible state transitions along cell cycle progression. Furthermore, we identify possible mechanisms to understand the irreversible state transitions from the steady states. We expect this approach to be robust and generally applicable to unbiasedly predict dynamical transitions of a gene regulatory circuit.

8.
IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1537-1549, 2019.
Article in English | MEDLINE | ID: mdl-28961123

ABSTRACT

Modeling the interface region of a protein complex paves the way for understanding its dynamics and functionalities. Existing works model the interface region of a complex by using different approaches, such as the residue composition at the interface region, the geometry of the interface residues, or the structural alignment of interface regions. These approaches are useful for ranking a set of docked conformations or for building scoring functions for protein-protein docking, but they do not provide a generic and scalable technique for the extraction of interface patterns leading to functional motif discovery. In this work, we model the interface region of a protein complex by graphs and extract interface patterns of the given complex in the form of frequent subgraphs. To achieve this, we develop a scalable algorithm for frequent subgraph mining. We show that a systematic review of the mined subgraphs provides an effective method for the discovery of functional motifs that exist along the interface region of a given protein complex. In our experiments, we use three PDB protein structure datasets. The first two datasets are composed of PDB structures from different conformations of two dimeric protein complexes: HIV-1 protease (329 structures), and triosephosphate isomerase (TIM) (86 structures). The third dataset is a collection of different enzyme protein structures from the six top-level enzyme classes, namely: Oxidoreductase, Transferase, Hydrolase, Lyase, Isomerase, and Ligase. We show that for the first two datasets, our method captures the locking mechanism at the dimeric interface by taking into account the spatial positioning of the interfacial residues through graphs. Indeed, our frequent subgraph mining based approach discovers the patterns representing the dimerization lock which is formed at the base of the structure in 323 of the 329 HIV-1 protease structures. Similarly, for 86 TIM structures, our approach discovers the dimerization lock formation in 50 structures. For the enzyme structures, we show that we are able to capture the functional motifs (active sites) that are specific to each of the six top-level classes of enzymes through frequent subgraphs.


Subject(s)
Amino Acid Motifs , Computational Biology/methods , Proteins , Algorithms , Data Mining , Databases, Protein , Models, Molecular , Protein Conformation , Protein Subunits , Proteins/chemistry , Proteins/metabolism
9.
J Theor Biol ; 389: 60-71, 2016 Jan 21.
Article in English | MEDLINE | ID: mdl-26549467

ABSTRACT

Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate prediction of SS can provide crucial features to form the next higher level of 3D structure of a protein accurately. SS has three different major components, helix (H), beta (E) and coil (C). Most of the SS predictors express imbalanced accuracies by claiming higher prediction performances in predicting H and C, and on the contrary having low accuracy in E predictions. E component being in low count, a predictor may show very good overall performance by over-predicting H and C and under-predicting E, which can make such predictors biologically inapplicable. In this work we are motivated to develop a balanced SS predictor by incorporating 33 physicochemical properties into 15-tuple peptides via Chou's general PseAAC, which allowed obtaining higher accuracies in predicting all three SS components. Our approach uses three different support vector machines for binary classification of the major classes and then forms an optimized multiclass predictor using a genetic algorithm (GA). The trained three binary SVMs are E versus non-E (i.e., E/¬E), C/¬C and H/¬H. This GA based optimized and combined three class predictor, called cSVM, is further combined with SPINE X to form the proposed final balanced predictor, called MetaSSPred. This novel paradigm assists us in optimizing the precision and recall. We prepared two independent test datasets (CB471 and N295) to compare the performance of our predictors with SPINE X. MetaSSPred significantly increases beta accuracy (QE) for both the datasets. QE scores of MetaSSPred on CB471 and N295 were 71.7% and 74.4% respectively. These scores are 20.9% and 19.0% improvement over the QE scores given by SPINE X alone on CB471 and N295 datasets respectively. Standard deviations of the accuracies across three SS classes of MetaSSPred on CB471 and N295 datasets were 4.2% and 2.3% respectively. On the other hand, for SPINE X, these values are 12.9% and 10.9% respectively. These findings suggest that the proposed MetaSSPred is a well-balanced SS predictor compared to the state-of-the-art SPINE X predictor.


Subject(s)
Computational Biology/methods , Protein Structure, Secondary , Proteins/chemistry , Algorithms , Databases, Protein , HIV Protease/chemistry , Internet , Probability , Reproducibility of Results , Sequence Analysis, Protein , Support Vector Machine
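The balance argument in the abstract above rests on per-class accuracies (for example QE for beta) and their spread across the three classes. The sketch below shows one plausible way to compute them from a true and a predicted SS string. The example strings, the function name `per_class_accuracy`, and the use of the population standard deviation are assumptions; the abstract does not specify which standard deviation estimator the authors used.

```python
from statistics import pstdev


def per_class_accuracy(true_ss, pred_ss, classes="HEC"):
    """Return {class: fraction of residues of that class predicted correctly}."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have equal length")
    accuracy = {}
    for c in classes:
        total = sum(1 for t in true_ss if t == c)
        correct = sum(1 for t, p in zip(true_ss, pred_ss) if t == c and p == c)
        accuracy[c] = correct / total if total else float("nan")
    return accuracy


# Hypothetical toy data: one of the two E residues is mispredicted as C.
true_ss = "HHHHEECCCC"
pred_ss = "HHHHECCCCC"

q = per_class_accuracy(true_ss, pred_ss)          # Q_H, Q_E, Q_C
q3 = sum(t == p for t, p in zip(true_ss, pred_ss)) / len(true_ss)  # overall
spread = pstdev(list(q.values()))                 # imbalance across classes
```

In this toy case the overall accuracy is high while the E accuracy is only one half, which is the imbalance the abstract describes for predictors that under-predict E.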
10.
Biochemistry ; 54(22): 3543-54, 2015 Jun 09.
Article in English | MEDLINE | ID: mdl-25982518

ABSTRACT

Aldolases are essential enzymes in the glycolysis pathway and catalyze the reaction cleaving fructose/tagatose 1,6-bisphosphate into dihydroxyacetone phosphate and glyceraldehyde 3-phosphate. To determine how the aldolase motions relate to its catalytic process, we studied the dynamics of three different class II aldolase structures through simulations. We employed coarse-grained elastic network normal-mode analyses to investigate the dynamics of Escherichia coli fructose 1,6-bisphosphate aldolase, E. coli tagatose 1,6-bisphosphate aldolase, and Thermus aquaticus fructose 1,6-bisphosphate aldolase and compared their motions in different oligomeric states. The first one is a dimer, and the second and third are tetramers. Our analyses suggest that oligomerization not only stabilizes the aldolase structures, showing fewer fluctuations at the subunit interfaces, but also allows the enzyme to achieve the required dynamics for its functional loops. The essential mobility of these loops in the functional oligomeric states can facilitate the enzymatic mechanism, substrate recruitment in the open state, bringing the catalytic residues into their required configuration in the closed bound state, and moving back to the open state to release the catalytic products and repositioning the enzyme for its next catalytic cycle. These findings suggest that the aldolase global motions are conserved among aldolases having different oligomeric states to preserve its catalytic mechanism. The coarse-grained approaches taken permit an unprecedented view of the changes in the structural dynamics and how these relate to the critical structural stabilities essential for catalysis. The results are supported by experimental findings from many previous studies.


Subject(s)
Aldehyde-Lyases/chemistry , Escherichia coli Proteins/chemistry , Escherichia coli/enzymology , Fructose-Bisphosphate Aldolase/chemistry , Thermus/enzymology , Aldehyde-Lyases/genetics , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Fructose-Bisphosphate Aldolase/genetics , Protein Multimerization , Protein Structure, Quaternary , Protein Structure, Secondary , Thermus/genetics
11.
AIMS Biophys ; 2(4): 613-629, 2015.
Article in English | MEDLINE | ID: mdl-28133628

ABSTRACT

The spatial organization of nucleosomes in 30-nm fibers remains unknown in detail. To tackle this problem, we analyzed all stereochemically possible configurations of two-start chromatin fibers with DNA linkers L = 10-70 bp (nucleosome repeat length NRL = 157-217 bp). In our model, the energy of a fiber is a sum of the elastic energy of the linker DNA, steric repulsion, electrostatics, and the H4 tail-acidic patch interaction between two stacked nucleosomes. We found two families of energetically feasible conformations of the fibers: one observed earlier, and the other novel. The fibers from the two families are characterized by different DNA linking numbers; that is, they are topologically different. Remarkably, the optimal geometry of a fiber and its topology depend on the linker length: the fibers with linkers L = 10n and 10n + 5 bp have DNA linking numbers per nucleosome ΔLk ≈ -1.5 and -1.0, respectively. In other words, the level of DNA supercoiling is directly related to the length of the inter-nucleosome linker in the chromatin fiber (and therefore, to NRL). We hypothesize that this topological polymorphism of chromatin fibers may play a role in the process of transcription, which is known to generate different levels of DNA supercoiling upstream and downstream from RNA polymerase. A genome-wide analysis of the NRL distribution in active and silent yeast genes yielded results consistent with this assumption.
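The relation between linker length, NRL, and linking number stated in the abstract above can be written down as a small lookup. The 147 bp core length is inferred from the abstract's own ranges (L = 10 bp pairs with NRL = 157 bp); the function names are hypothetical, and the sketch returns no value for linker lengths the abstract does not cover.

```python
# Inferred from the abstract: NRL = 157 bp when L = 10 bp, so the core is 147 bp.
NUCLEOSOME_CORE_BP = 147


def nucleosome_repeat_length(linker_bp):
    """NRL in bp for a given linker length L in bp."""
    return linker_bp + NUCLEOSOME_CORE_BP


def linking_number_per_nucleosome(linker_bp):
    """Approximate delta-Lk per nucleosome as reported in the abstract.

    L = 10n     -> about -1.5
    L = 10n + 5 -> about -1.0
    Other linker lengths are not covered by the abstract, so None is returned.
    """
    remainder = linker_bp % 10
    if remainder == 0:
        return -1.5
    if remainder == 5:
        return -1.0
    return None


# Tabulate the linker lengths in the studied range that the abstract covers.
table = [
    (L, nucleosome_repeat_length(L), linking_number_per_nucleosome(L))
    for L in range(10, 71, 5)
]
```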

12.
Methods Mol Biol ; 1215: 213-36, 2015.
Article in English | MEDLINE | ID: mdl-25330965

ABSTRACT

The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.


Subject(s)
HIV Protease/chemistry , Models, Molecular , Databases, Protein , Entropy , Humans , Principal Component Analysis
13.
Protein Sci ; 23(2): 213-28, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24318986

ABSTRACT

Triosephosphate isomerase (TIM) catalyzes the reaction to convert dihydroxyacetone phosphate into glyceraldehyde 3-phosphate, and vice versa. In most organisms, its functional oligomeric state is a homodimer; however, tetramer formation in hyperthermophiles is required for functional activity. The tetrameric TIM structure also provides added stability to the structure, enabling it to function at more extreme temperatures. We apply Principal Component Analysis to find that the TIM structure space is clearly divided into two groups: the open and the closed TIM structures. The distribution of the structures in the open set is much sparser than that in the closed set, showing a greater conformational diversity of the open structures. We also apply the Elastic Network Model to four different TIM structures: an engineered monomeric structure, a dimeric structure from a mesophile, Trypanosoma brucei, and two tetrameric structures from hyperthermophiles Thermotoga maritima and Pyrococcus woesei. We find that dimerization not only stabilizes the structures, it also enhances their functional dynamics. Moreover, tetramerization of the hyperthermophilic structures increases their functional loop dynamics, enabling them to function in the destabilizing environment of extreme temperatures. Computations also show that the functional loop motions, especially loops 6 and 7, are highly coordinated. In summary, our computations reveal the underlying mechanism of the allosteric regulation of the functional loops of the TIM structures, and show that tetramerization of the structure as found in the hyperthermophilic organisms is required to maintain the coordination of the functional loops at a level similar to that in the dimeric mesophilic structure.


Subject(s)
Protein Conformation , Triose-Phosphate Isomerase/chemistry , Trypanosoma brucei brucei/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Dimerization , Principal Component Analysis , Protein Structure, Quaternary , Pyrococcus/enzymology , Thermotoga maritima/enzymology
14.
Phys Biol ; 9(1): 014001, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22314977

ABSTRACT

Loops in proteins that connect secondary structures, such as alpha-helices and beta-sheets, are often on the surface and may play a critical role in some functions of a protein. The mobility of loops is central for the motional freedom and flexibility requirements of active-site loops and may play a critical role for some functions. The structures and behaviors of loops have not been studied much in the context of the whole structure and its overall motions, especially how these might be coupled. Here we investigate loop motions by using coarse-grained structures (Cα atoms only) to solve the motions of the system by applying Lagrange equations with elastic network models to learn about which loops move in an independent fashion and which move in coordination with domain motions, faster and slower, respectively. The normal modes of the system are calculated using eigen-decomposition of the stiffness matrix. The contribution of individual modes and groups of modes is investigated for their effects on all residues in each loop by using Fourier analyses. Our results indicate overall that the motions of functional sets of loops behave in similar ways as the whole structure. But overall only relatively few loops move in coordination with the dominant slow modes of motion, and these are often closely related to function.
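The eigen-decomposition step mentioned in the abstract above can be sketched on a toy system. The example below is a simplification and an assumption, not the authors' method: it builds the scalar connectivity (Kirchhoff) matrix of a Gaussian-network-style elastic network from bead coordinates and a distance cutoff, then obtains its eigenvalues with a plain cyclic Jacobi iteration. The three-bead chain, the cutoff, and the function names are hypothetical.

```python
import math


def kirchhoff(coords, cutoff):
    """Connectivity matrix: -1 for bead pairs within cutoff, degree on the diagonal."""
    n = len(coords)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                g[i][j] = g[j][i] = -1.0
                g[i][i] += 1.0
                g[j][j] += 1.0
    return g


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Toy chain of three beads spaced 1 unit apart; only neighbours are connected.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
modes = jacobi_eigenvalues(kirchhoff(coords, cutoff=1.5))
# The smallest eigenvalue is (numerically) zero: the rigid-body mode.
```

In elastic network analyses the zero eigenvalue is discarded and the smallest nonzero eigenvalues correspond to the slow, collective modes. A real application would use a numerical library's symmetric eigensolver instead of this hand-written iteration.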

15.
BMC Struct Biol ; 10 Suppl 1: S4, 2010 May 17.
Article in English | MEDLINE | ID: mdl-20487511

ABSTRACT

BACKGROUND: Currently a huge amount of protein-protein interaction data is available from high throughput experimental methods. In a large network of protein-protein interactions, groups of proteins can be identified as functional clusters having related functions where a single protein can occur in multiple clusters. However, experimental methods are error-prone and thus the interactions in a functional cluster may include false positives or there may be unreported interactions. Therefore, correctly identifying a functional cluster of proteins requires the knowledge of whether any two proteins in a cluster interact, whether an interaction can exclude other interactions, or how strong the affinity between two interacting proteins is. METHODS: In the present work the yeast protein-protein interaction network is clustered using a spectral clustering method proposed by us in 2006 and the individual clusters are investigated for functional relationships among the member proteins. 3D structural models of the proteins in one cluster have been built; the protein structures are retrieved from the Protein Data Bank or predicted using a comparative modeling approach. A rigid body protein docking method (ClusPro) is used to predict the protein-protein interaction complexes. Binding sites of the docked complexes are characterized by their buried surface areas in the docked complexes, as a measure of the strength of an interaction. RESULTS: The clustering method yields functionally coherent clusters. Some of the interactions in a cluster exclude other interactions because of shared binding sites. New interactions among the interacting proteins are uncovered, and thus higher order protein complexes in the cluster are proposed. Also the relative stability of each of the protein complexes in the cluster is reported. CONCLUSIONS: Although the methods used are computationally expensive and require human intervention and judgment, they can identify the interactions that could occur together or ones that are mutually exclusive. In addition, indirect interactions through another intermediate protein can be identified. These theoretical predictions might be useful for crystallographers to select targets for the X-ray crystallographic determination of protein complexes.


Subject(s)
Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Cluster Analysis , Models, Biological , Models, Molecular , Protein Binding , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae Proteins/chemistry