Results 1 - 19 of 19
1.
Front Mol Biosci ; 10: 1258902, 2023.
Article in English | MEDLINE | ID: mdl-38028548

ABSTRACT

Background: Rare endocrine cancers such as adrenocortical carcinoma (ACC) present serious diagnostic and prognostic challenges. Knowledge of ACC pathogenesis is incomplete, and patients have limited therapeutic options. Molecular drivers and effective biomarkers must be identified to enable timely diagnosis and to stratify patients toward the most beneficial treatments. In this study we demonstrate how machine learning methods integrating multi-omics data, combined with systems biology tools, can contribute to the identification of new prognostic biomarkers for ACC. Methods: ACC gene expression and DNA methylation datasets were downloaded from the Xena Browser (GDC TCGA Adrenocortical Carcinoma cohort). A highly correlated multi-omics signature discriminating groups of samples was identified with the data integration analysis for biomarker discovery using latent components (DIABLO) method. Additional regulators of the identified signature were discovered using the Clarivate CBDD (Computational Biology for Drug Discovery) network propagation and hidden nodes algorithms on a curated network of molecular interactions (MetaBase™). The discriminative power of the multi-omics signature and its regulators was assessed by training a random forest classifier on 55 samples using 10-fold cross-validation with five iterations. The prognostic value of the identified biomarkers was further assessed on an external ACC dataset from GEO (GSE49280) using the Kaplan-Meier estimator. Finally, an optimal prognostic signature that allowed samples to be categorized into high- and low-risk groups was derived by stepwise selection based on the Akaike Information Criterion (AIC). Results: A multi-omics signature including genes, microRNAs and methylation sites was generated. Systems biology tools identified additional genes regulating the features included in the multi-omics signature.
RNA-seq, miRNA-seq and DNA methylation feature sets showed high power to classify patients with stage I-II versus stage III-IV disease, outperforming previously identified prognostic biomarkers. In an independent dataset, associations of the signature genes with overall survival (OS) showed that patients with differential expression of 8 genes and 4 microRNAs had a statistically significant decrease in OS. We also found an independent prognostic signature for ACC with potential use in clinical practice, combining 9 gene/microRNA features, that successfully predicted high-risk ACC patients. Conclusion: Machine learning and integrative analysis of multi-omics data, in combination with Clarivate CBDD systems biology tools, identified a set of biomarkers with high prognostic value for ACC. Multi-omics data are a promising resource for identifying drivers and new prognostic biomarkers in rare diseases that could be used in clinical practice.
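The Kaplan-Meier step used to assess survival can be illustrated with a plain product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths at t_i and n_i the number of patients still at risk. This is a minimal sketch, not the authors' pipeline: the function name and the toy follow-up data are hypothetical, and in a risk-group analysis one curve would be computed per group (for example high versus low risk).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times: follow-up time for each patient
    events: 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (event_time, S(event_time)) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)  # d_i per time
    leaving = Counter(times)  # everyone leaving the risk set at each time
    at_risk = len(times)  # n_i: patients still under observation
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d_i / n_i)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Patients censored at t still count as at risk at t, then drop out
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data (months of follow-up), not taken from the study
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Treating patients censored at an event time as still at risk at that time follows the usual Kaplan-Meier convention.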

2.
Int J Mol Sci ; 23(1)2021 Dec 22.
Article in English | MEDLINE | ID: mdl-35008491

ABSTRACT

Predicting protein-protein interactions is a longstanding challenge in the study of cardiac remodeling and heart failure. Here, we use the MetaCore network and Google matrix algorithms to predict protein-protein interactions that drive cardiac fibrosis, a primary cause of end-stage heart failure. The developed algorithms allow identification of interactions between key proteins and predict new actors orchestrating fibroblast activation linked to fibrosis in mouse and human tissues. These data hold great promise for uncovering new therapeutic targets to limit myocardial fibrosis.
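The Google matrix mentioned above can be illustrated with a plain PageRank power iteration on a directed interaction network, G = alpha * S + (1 - alpha)/N * E. This is a minimal sketch under stated assumptions: the damping factor alpha = 0.85 is a common default rather than a value from the paper, the 4-protein network is hypothetical, and the code does not reproduce the authors' algorithms or the MetaCore data.

```python
def google_matrix_pagerank(n, edges, alpha=0.85, tol=1e-12, max_iter=1000):
    """Stationary vector of the Google matrix G = alpha*S + (1-alpha)/n * E.

    n: number of nodes (proteins), labelled 0..n-1
    edges: directed (source, target) interactions
    S is the column-stochastic matrix of the network; nodes without outgoing
    links (dangling nodes) are treated as linking to every node.
    """
    out_links = [[] for _ in range(n)]
    for src, dst in edges:
        out_links[src].append(dst)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Teleportation term (1 - alpha)/n applies to every node
        new = [(1.0 - alpha) / n] * n
        # Probability mass sitting on dangling nodes is spread uniformly
        dangling = sum(p[j] for j in range(n) if not out_links[j])
        for j in range(n):
            for i in out_links[j]:
                new[i] += alpha * p[j] / len(out_links[j])
        for i in range(n):
            new[i] += alpha * dangling / n
        delta = sum(abs(a - b) for a, b in zip(new, p))
        p = new
        if delta < tol:
            break
    return p


# Hypothetical toy network: proteins 1, 2 and 3 all point to protein 0
ranks = google_matrix_pagerank(4, [(1, 0), (2, 0), (3, 0), (0, 1)])
print([round(r, 3) for r in ranks])
```

Proteins that many others point to accumulate the most probability mass, which is the intuition behind using the Google matrix to rank key actors in a network.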


Subject(s)
Fibrosis/metabolism , Protein Interaction Maps/physiology , Algorithms , Animals , Heart Failure/metabolism , Humans , Mice , Myocardium/metabolism , Search Engine/methods , Ventricular Remodeling/physiology
3.
Front Genet ; 11: 605, 2020.
Article in English | MEDLINE | ID: mdl-32719714

ABSTRACT

BACKGROUND: Duchenne muscular dystrophy (DMD) is a rare and severe X-linked muscular dystrophy in which the standard of care is chronic off-label treatment with corticosteroids (CS), with variable outcomes partly due to differences in drug response. To search for SNP biomarkers of corticosteroid responsiveness, we genotyped variants across 205 DMD-related genes in patients with differential responses to steroid treatment. METHODS AND FINDINGS: We enrolled a total of 228 DMD patients with identified dystrophin mutations; 78 of these patients had been under corticosteroid treatment for at least 5 years. DMD patients were defined as high responders (HR) if they had maintained the ability to walk after 15 years of age and as low responders (LR) if they had lost ambulation before the age of 10 despite corticosteroid therapy. Based on interactome mapping, we prioritized 205 genes and sequenced them in 21 DMD patients (discovery cohort, DiC = 21). We identified 43 SNPs that discriminate between HR and LR. Discriminant Analysis of Principal Components (DAPC) prioritized 2 response-associated SNPs in the TNFRSF10A gene. This genotype was validated in two additional, larger cohorts: 46 DMD patients on corticosteroid therapy (validation cohort 1, VaC1) and 150 non-ambulant DMD patients never treated with corticosteroids (VaC2). SNP analysis in all validation cohorts (N = 207) showed that the CT haplotype is significantly associated with HR DMD patients, confirming the discovery results. CONCLUSION: We have shown that the TNFRSF10A CT haplotype correlates with corticosteroid response in DMD patients and propose it as an exploratory CS response biomarker.
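The abstract does not state which test was used to associate the CT haplotype with response. As a hypothetical illustration only, a two-sided Fisher's exact test on a 2x2 table of haplotype carriers versus responder class can be written with the standard library; the function name and the counts below are made up and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: carriers / non-carriers of a haplotype; columns: HR / LR.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # P(top-left cell = x) given the fixed row and column totals
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical counts: 3 of 4 carriers are HR, 1 of 4 non-carriers is HR
print(fisher_exact_two_sided(3, 1, 1, 3))
```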

4.
Proc Natl Acad Sci U S A ; 116(19): 9671-9676, 2019 05 07.
Article in English | MEDLINE | ID: mdl-31004050

ABSTRACT

Dysregulation of signaling pathways in multiple sclerosis (MS) can be analyzed by phosphoproteomics in peripheral blood mononuclear cells (PBMCs). We performed in vitro kinetic assays on PBMCs from 195 MS patients and 60 matched controls and quantified the phosphorylation of 17 kinases using xMAP assays. Phosphoprotein levels were tested for association with genetic susceptibility by typing 112 single-nucleotide polymorphisms (SNPs) associated with MS susceptibility. We found increased phosphorylation of MP2K1 in MS patients relative to the controls. Moreover, we identified one SNP located in the PHGDH gene and another in the IRF8 gene that were associated with MP2K1 phosphorylation levels, providing a first clue to how this MS risk gene may act. The analyses in patients treated with disease-modifying drugs identified the phosphorylation of each receptor's downstream kinases. Finally, using flow cytometry, we detected increased STAT1, STAT3, TF65, and HSPB1 phosphorylation in CD19+ cells from MS patients. These findings indicate the activation of cell survival and proliferation (MAPK) and proinflammatory (STAT) pathways in the immune cells of MS patients, primarily in B cells. The changes in the activation of these kinases suggest that these pathways may represent therapeutic targets for modulation by kinase inhibitors.
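One common way to relate a SNP to a quantitative trait such as a phosphoprotein level is an additive model: regress the level on the number of risk alleles (0, 1 or 2). The abstract does not specify the model used, so this is an assumed, minimal sketch; the function name and data are hypothetical, and it returns only the effect size (slope), not a p-value.

```python
def additive_snp_effect(genotypes, levels):
    """Least-squares slope of phosphoprotein level on risk-allele dosage.

    genotypes: risk-allele counts per subject (0, 1 or 2)
    levels: phosphorylation level per subject (e.g. an xMAP signal)
    Returns the estimated change in level per additional risk allele.
    """
    n = len(genotypes)
    if n != len(levels) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 subjects")
    mean_g = sum(genotypes) / n
    mean_y = sum(levels) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, levels))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("the SNP is monomorphic in this sample")
    return sxy / sxx


# Hypothetical dosages and phosphorylation signals for six subjects
print(additive_snp_effect([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]))
```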


Subject(s)
B-Lymphocytes , MAP Kinase Signaling System/genetics , Multiple Sclerosis , Phosphoproteins , Polymorphism, Single Nucleotide , Proteomics , B-Lymphocytes/metabolism , B-Lymphocytes/pathology , Cell Proliferation , Cell Survival , Female , Humans , Male , Multiple Sclerosis/genetics , Multiple Sclerosis/metabolism , Multiple Sclerosis/pathology , Phosphoproteins/genetics , Phosphoproteins/metabolism , Phosphorylation/genetics , Protein Kinases/genetics , Protein Kinases/metabolism
5.
PLoS Comput Biol ; 13(10): e1005757, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29073203

ABSTRACT

Multiple Sclerosis (MS) is an autoimmune disease driving inflammatory and degenerative processes that damage the central nervous system (CNS). However, it is not well understood how these events interact and evolve to produce such a highly dynamic and heterogeneous disease. We hypothesized that the variability in the course of MS is driven by the very same pathogenic mechanisms responsible for the disease: the autoimmune attack on the CNS that leads to chronic inflammation, neuroaxonal degeneration and remyelination. We propose that each of these processes acts more or less severely and at different times in each of the clinical subgroups. To test this hypothesis, we developed a mathematical model that was constrained by experimental data (the expanded disability status scale [EDSS] time series) obtained from a retrospective longitudinal cohort of 66 MS patients with a long-term follow-up (up to 20 years). Moreover, we validated this model in a second prospective cohort of 120 MS patients with a three-year follow-up, for which EDSS data and brain volume time series were available. The clinical heterogeneity in the datasets was reduced by grouping the EDSS time series using an unsupervised clustering analysis. We found that by adjusting certain parameters, albeit within their biological range, the mathematical model reproduced the different disease courses, supporting the dynamic CNS damage hypothesis to explain MS heterogeneity. Our analysis suggests that the irreversible axon degeneration produced in the early stages of progressive MS is mainly due to the higher rate of myelinated axon degeneration, coupled with the lower capacity for remyelination. However, in agreement with recent pathological studies, degeneration of chronically demyelinated axons is not a key feature that distinguishes this phenotype.
Moreover, the model reveals that lower rates of axon degeneration and more rapid remyelination make relapsing MS more resilient than the progressive subtype. Therefore, our results support the hypothesis of a common pathogenesis for the different MS subtypes, even in the presence of genetic and environmental heterogeneity. Hence, MS can be considered as a single disease in which specific dynamics can provoke a variety of clinical outcomes in different patient groups. These results have important implications for the design of therapeutic interventions for MS at different stages of the disease.
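The qualitative mechanism described above (more myelinated-axon degeneration plus less remyelination leads to more axon loss) can be illustrated with a deliberately simplified two-compartment toy model. This is not the authors' model: the equations, the forward-Euler integration and all parameter values are assumptions chosen only to mirror the abstract's claim, with the degeneration rate of demyelinated axons kept equal in both scenarios.

```python
def simulate_axons(k_demyel, k_remyel, k_deg_myel, k_deg_demyel,
                   years=20.0, dt=0.01):
    """Forward-Euler integration of a toy two-compartment axon model.

    M: fraction of myelinated axons, D: fraction of demyelinated axons
      dM/dt = -k_demyel*M + k_remyel*D - k_deg_myel*M
      dD/dt =  k_demyel*M - k_remyel*D - k_deg_demyel*D
    Rates are per year; returns the surviving axon fraction M + D.
    """
    m, d = 1.0, 0.0
    for _ in range(int(round(years / dt))):
        dm = -k_demyel * m + k_remyel * d - k_deg_myel * m
        dd = k_demyel * m - k_remyel * d - k_deg_demyel * d
        m += dt * dm
        d += dt * dd
    return m + d


# Hypothetical parameter sets: "relapsing-like" vs "progressive-like"
relapsing = simulate_axons(0.3, 0.5, 0.01, 0.02)
progressive = simulate_axons(0.3, 0.1, 0.05, 0.02)
print(round(relapsing, 3), round(progressive, 3))
```

Because the total axon pool only shrinks through the two degeneration terms, raising the myelinated-axon degeneration rate and lowering remyelination necessarily lowers the surviving fraction in this toy.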


Subject(s)
Brain , Computational Biology/methods , Image Processing, Computer-Assisted/methods , Multiple Sclerosis , Brain/diagnostic imaging , Brain/physiopathology , Databases, Factual , Humans , Inflammation , Magnetic Resonance Imaging , Multiple Sclerosis/classification , Multiple Sclerosis/diagnostic imaging , Multiple Sclerosis/physiopathology , Prospective Studies
6.
Neurol Neuroimmunol Neuroinflamm ; 4(2): e321, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28180139

ABSTRACT

OBJECTIVE: To identify differences in the metabolomic profile in the serum of patients with multiple sclerosis (MS) compared to controls and to identify biomarkers of disease severity. METHODS: We studied 2 cohorts of patients with MS: a retrospective longitudinal cohort of 238 patients and 74 controls and a prospective cohort of 61 patients and 41 controls with serial serum samples. Patients were stratified into active or stable disease based on 2 years of prospective assessment, accounting for the presence of clinical relapses or changes in disability measured with the Expanded Disability Status Scale (EDSS). Metabolomic profiling (lipids and amino acids) was performed on serum samples by ultra-high-performance liquid chromatography coupled to mass spectrometry. Data analysis was performed using parametric methods, principal component analysis, and partial least squares discriminant analysis to assess the differences between cases and controls and between subgroups based on disease severity. RESULTS: We identified metabolomic signatures with high accuracy for classifying patients vs controls as well as for classifying patients with medium to high disability (EDSS >3.0). Among them, sphingomyelin and lysophosphatidylethanolamine were the metabolites that showed the most robust pattern in the time series analysis for discriminating between patients and controls. Moreover, levels of hydrocortisone, glutamic acid, tryptophan, eicosapentaenoic acid, 13S-hydroxyoctadecadienoic acid, lysophosphatidylcholines, and lysophosphatidylethanolamines were associated with more severe disease (non-relapse-free or increase in EDSS). CONCLUSIONS: We identified metabolomic signatures composed of hormones, lipids, and amino acids associated with MS and with a more severe course.
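Principal component analysis, one of the methods listed above, can be sketched without external libraries by extracting the leading eigenvector of the feature covariance matrix with power iteration. This is a minimal illustration assuming a small samples-by-features matrix (for example metabolite intensities); the function name and data are hypothetical, and the partial least squares discriminant step is not shown.

```python
def first_principal_component(rows, n_iter=500):
    """Leading principal component of a samples x features matrix.

    Centres each feature, builds the sample covariance matrix and extracts
    its dominant eigenvector by power iteration.
    Returns (loadings, scores), where scores are the PC1 projections.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    v = [1.0] * p  # starting vector; assumed not orthogonal to PC1
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            break
        v = [c / norm for c in w]
    scores = [sum(xi[j] * v[j] for j in range(p)) for xi in x]
    return v, scores


# Hypothetical intensities of two metabolites in four serum samples
loadings, scores = first_principal_component(
    [[1.0, 2.0], [2.0, 3.9], [3.0, 6.1], [4.0, 8.0]])
print([round(l, 3) for l in loadings], [round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary; only the relative positions of the samples along it carry meaning.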

7.
Oncotarget ; 7(32): 52493-52516, 2016 08 09.
Article in English | MEDLINE | ID: mdl-27191992

ABSTRACT

Nowadays, the personalized approach to health care, and to cancer care in particular, is becoming more and more popular and is taking an important place in the translational medicine paradigm. In some cases, detection of patient-specific mutations that point to a targeted therapy has already become routine practice for clinical oncologists. Broader panels of genetic markers that cover a greater number of possible oncogenes, including those with lower reliability of the resulting medical conclusions, are also on the market. Given the wide availability of high-throughput technologies, it is very tempting to use complete patient-specific Next-Generation Sequencing (NGS) or other "omics" data for cancer treatment guidance. However, there are still no gold-standard methods and protocols to evaluate them. Here we discuss the clinical utility of each of the data types and describe a systems biology approach adapted for single-patient measurements. We summarize the current state of the field, focusing on clinically relevant case studies and practical aspects of data processing.


Subject(s)
Neoplasms/genetics , Precision Medicine/methods , Biomarkers, Tumor/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , High-Throughput Nucleotide Sequencing/trends , Humans , Medical Oncology/methods , Medical Oncology/standards , Medical Oncology/trends , Pharmacogenetics/methods , Pharmacogenetics/standards , Pharmacogenetics/trends , Precision Medicine/standards , Precision Medicine/trends , Systems Biology/methods , Systems Biology/standards , Systems Biology/trends
8.
J Cell Sci ; 129(8): 1671-84, 2016 Apr 15.
Article in English | MEDLINE | ID: mdl-26945058

ABSTRACT

Collagen VI myopathies are genetic disorders caused by mutations in the collagen VI genes COL6A1, COL6A2 and COL6A3, and range from the severe Ullrich congenital muscular dystrophy to the milder Bethlem myopathy, which is recapitulated by collagen-VI-null (Col6a1(-/-)) mice. Abnormalities in mitochondria and in the autophagic pathway have been proposed as pathogenic causes of collagen VI myopathies, but the link between collagen VI defects and these metabolic circuits remains unknown. To characterize the perturbation of expression profiles in muscles with collagen VI myopathies, we performed deep RNA profiling in both Col6a1(-/-) mice and patients with collagen VI pathology. The interactome map identified common pathways, suggesting a previously undetected connection between circadian genes and collagen VI pathology. Intriguingly, Bmal1(-/-) (also known as Arntl) mice, a well-characterized model displaying arrhythmic circadian rhythms, showed profound deregulation of the collagen VI pathway and of autophagy-related genes. The involvement of circadian rhythms in collagen VI myopathies is new and links autophagy and mitochondrial abnormalities. It also opens new avenues for therapies of hereditary myopathies that modulate the molecular clock or potential gene-environment interactions that might modify the pathogenesis of muscle damage.


Subject(s)
ARNTL Transcription Factors/genetics , Circadian Clocks/physiology , Collagen Type VI/genetics , Contracture/genetics , Mitochondria/physiology , Muscular Dystrophies/congenital , Mutation/genetics , Sclerosis/genetics , Animals , Autophagy/genetics , Gene Expression Profiling , Humans , Mice , Mice, Knockout , Microarray Analysis , Muscular Dystrophies/genetics , RNA/analysis
9.
PLoS One ; 10(2): e0116718, 2015.
Article in English | MEDLINE | ID: mdl-25665127

ABSTRACT

BACKGROUND: To retrieve useful information from scientific literature and electronic medical records (EMR), we developed an ontology specific to Multiple Sclerosis (MS). METHODS: The MS Ontology was created using scientific literature and expert review in the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations into different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs of 624 patients with MS using the MS Ontology dictionary to identify drug usage and comorbidities in MS. Competency-question testing and functional evaluation using F scores further validated the usefulness of the MS Ontology. RESULTS: Validation of the lexicalized ontology by means of named entity recognition-based methods showed adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of EMRs from patients with MS identified current usage of disease-modifying drugs and symptomatic therapy, as well as comorbidities, in agreement with recent reports. CONCLUSION: The MS Ontology provides a semantic framework that can automatically extract information from both scientific literature and the EMRs of patients with MS, revealing new insights into pathogenesis as well as new clinical information.
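The evaluation above reports an F score, the harmonic mean of precision and recall. The sketch below pairs a bare-bones dictionary lookup (synonyms and translations mapped to canonical concepts) with that F score. It does not reproduce SCAIView or the MS Ontology; the function names, dictionary entries and clinical note are hypothetical.

```python
import re


def tag_entities(text, dictionary):
    """Dictionary-based named entity recognition.

    dictionary maps each synonym or translation to a canonical concept;
    returns the set of concepts whose synonyms occur as whole words.
    """
    lowered = text.lower()
    found = set()
    for synonym, concept in dictionary.items():
        if re.search(r"\b" + re.escape(synonym.lower()) + r"\b", lowered):
            found.add(concept)
    return found


def f_score(predicted, gold):
    """F1 score: harmonic mean of precision and recall over concept sets."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical mini-dictionary with a brand name and a Spanish translation
dictionary = {
    "fingolimod": "Fingolimod",
    "gilenya": "Fingolimod",
    "depression": "Depression",
    "depresión": "Depression",
    "spasticity": "Spasticity",
}
note = "Paciente con depresión; inicia Gilenya."
predicted = tag_entities(note, dictionary)
gold = {"Fingolimod", "Depression", "Fatigue"}
print(sorted(predicted), round(f_score(predicted, gold), 2))
```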


Subject(s)
Biological Ontologies , Electronic Health Records , Information Storage and Retrieval , Multiple Sclerosis/classification , PubMed , Antineoplastic Agents/therapeutic use , Antirheumatic Agents/therapeutic use , Computational Biology/methods , Fingolimod Hydrochloride/therapeutic use , Humans , Immunosuppressive Agents/therapeutic use , Knowledge Discovery , Mitoxantrone/therapeutic use , Multiple Sclerosis/drug therapy , Rituximab/therapeutic use
10.
Mult Scler ; 21(2): 138-46, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25112814

ABSTRACT

The pathogenesis of multiple sclerosis (MS) involves alterations to multiple pathways and processes, which represent a significant challenge for developing more-effective therapies. Systems biology approaches that study pathway dysregulation should offer benefits by integrating molecular networks and dynamic models with current biological knowledge for understanding disease heterogeneity and response to therapy. In MS, abnormalities have been identified in several cytokine-signaling pathways, as well as those of other immune receptors. Among the downstream molecules implicated are Jak/Stat, NF-κB, ERK1/3, p38 and Jun/Fos. Together, these data suggest that MS is likely to be associated with abnormalities in apoptosis/cell death, microglia activation, blood-brain barrier functioning, immune responses, cytokine production, and/or oxidative stress, although which pathways contribute to the cascade of damage and can be modulated remains an open question. While current MS drugs target some of these pathways, others remain untouched. Here, we propose a pragmatic systems analysis approach that involves the large-scale extraction of processes and pathways relevant to MS. These data serve as a scaffold on which computational modeling can be performed to identify disease subgroups based on the contribution of different processes. Such an analysis, targeting these relevant MS-signaling pathways, offers the opportunity to accelerate the development of novel individual or combination therapies.


Subject(s)
Multiple Sclerosis/drug therapy , Multiple Sclerosis/metabolism , Signal Transduction/drug effects , Signal Transduction/physiology , Drug Discovery , Humans
11.
PLoS One ; 9(1): e84955, 2014.
Article in English | MEDLINE | ID: mdl-24416320

ABSTRACT

One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying molecular disease mechanisms so as to develop a more personalized approach to therapy. Here we propose a novel method for disease subtyping based on the analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on the Sub-Network Enrichment Analysis algorithm (SNEA), which identifies gene subnetworks with significant concordant changes in expression between two conditions. Each subnetwork consists of a central regulator and its downstream genes, connected by relations from a global literature-extracted regulation database. Regulators found in each patient separately are clustered together and assigned activity scores, which are used for the final grouping of patients. We show that our approach performs well compared to other related methods and at the same time provides researchers with a complementary understanding of the pathway-level biology behind a disease by identifying significant expression regulators. We observed a reasonable grouping of neuromuscular disorders (triggered by structural damage vs triggered by unknown mechanisms) that was not revealed by standard expression-profile clustering. In another experiment, we were able to suggest the clusters of regulators responsible for discriminating colorectal carcinoma from adenoma and to identify frequently genetically altered regulators that could be of specific importance for the individual characteristics of cancer development. The proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes to dozens of clusters of regulators. The resulting clusters of regulators make it possible to generate valuable biological hypotheses about the molecular mechanisms related to a clinical outcome for individual patients.
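The core enrichment question asked for each regulator is whether the expression changes of its downstream genes are shifted relative to the other measured genes. We assume here a one-sided Mann-Whitney U test with a normal approximation as the statistic; treat this as an illustrative choice rather than a description of SNEA's internals. The function name and values are hypothetical, and applying it per patient (changes versus a control baseline) mirrors the sample-by-sample idea above.

```python
from math import erfc, sqrt


def mann_whitney_upper(targets, background):
    """One-sided Mann-Whitney U test, normal approximation, no tie correction.

    targets: expression changes (e.g. log fold changes) of a regulator's
             downstream genes
    background: expression changes of the remaining measured genes
    Returns (U, p) for the alternative that targets tend to be larger.
    """
    values = sorted(list(targets) + list(background))
    # Map each distinct value to its average 1-based rank (handles ties)
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        rank_of[values[i]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(targets), len(background)
    u = sum(rank_of[v] for v in targets) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, 0.5 * erfc(z / sqrt(2))  # upper-tail standard normal probability


# Hypothetical log fold changes in one patient
downstream = [1.8, 2.2, 1.5, 2.9]
others = [0.1, -0.4, 0.3, 0.0, -0.2, 0.5, 0.2]
print(mann_whitney_upper(downstream, others))
```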


Subject(s)
Adenoma/genetics , Algorithms , Carcinoma/genetics , Colorectal Neoplasms/genetics , Neuromuscular Diseases/genetics , Adenoma/classification , Adenoma/diagnosis , Carcinoma/classification , Carcinoma/diagnosis , Cluster Analysis , Colorectal Neoplasms/classification , Colorectal Neoplasms/diagnosis , Diagnosis, Differential , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Humans , Multigene Family , Neuromuscular Diseases/classification , Neuromuscular Diseases/diagnosis , Oligonucleotide Array Sequence Analysis , Precision Medicine
12.
PLoS Comput Biol ; 8(2): e1002365, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22319435

ABSTRACT

Elucidation of new biomarkers and potential drug targets from high-throughput profiling data is a challenging task due to the limited number of available biological samples and the questionable reproducibility of differential changes in cross-dataset comparisons. In this paper we propose a novel computational approach for drug and biomarker discovery using comprehensive analysis of multiple expression profiling datasets. The new method relies on aggregation of individual profiling experiments combined with a leave-one-dataset-out validation approach. Aggregated datasets were studied using the Sub-Network Enrichment Analysis algorithm (SNEA) to find consistent, statistically significant key regulators within the global literature-extracted expression regulation network. These regulators were linked to the consistently differentially expressed genes. We have applied our approach to several publicly available human muscle gene expression profiling datasets related to Duchenne muscular dystrophy (DMD). To detect both enhanced and repressed processes, we considered up- and down-regulated genes separately. Applying the proposed approach to the regulator search, we discovered a disturbance in the activity of several muscle-related transcription factors (e.g. MYOG and MYOD1) and of regulators of inflammation, regeneration, and fibrosis. Almost all SNEA-derived regulators of down-regulated genes (e.g. AMPK, TORC2, PPARGC1A) correspond to a single common pathway important for the fast-to-slow twitch fiber type transition. We hypothesize that this process can affect the severity of DMD symptoms, making the corresponding regulators and downstream genes valuable candidates as potential drug targets and exploratory biomarkers.
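The leave-one-dataset-out idea can be sketched under simplifying assumptions: each dataset is reduced to per-gene log fold changes, a gene enters the signature only if it changes in the same direction in every training dataset, and replication is scored by sign agreement in the held-out dataset. The function names, the all-datasets consistency rule and the gene labels are hypothetical, and SNEA itself is not reproduced.

```python
def consistent_signature(datasets):
    """Genes changed in the same direction in every dataset.

    datasets: list of dicts mapping gene -> log fold change
    Returns a dict gene -> +1 (up-regulated) or -1 (down-regulated).
    """
    shared = set(datasets[0]).intersection(*datasets[1:])
    signature = {}
    for gene in shared:
        changes = [d[gene] for d in datasets]
        if all(c > 0 for c in changes):
            signature[gene] = 1
        elif all(c < 0 for c in changes):
            signature[gene] = -1
    return signature


def leave_one_dataset_out(datasets):
    """Replication rate of the signature in each held-out dataset."""
    rates = []
    for k, held_out in enumerate(datasets):
        training = datasets[:k] + datasets[k + 1:]
        signature = consistent_signature(training)
        tested = [g for g in signature if g in held_out]
        if not tested:
            rates.append(None)  # nothing to validate in this dataset
            continue
        # A gene replicates if its held-out change has the predicted sign
        hits = sum(1 for g in tested if held_out[g] * signature[g] > 0)
        rates.append(hits / len(tested))
    return rates


# Hypothetical log fold changes from three profiling datasets
data = [
    {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.5},
    {"GENE_A": 0.9, "GENE_B": -1.1, "GENE_C": -0.3},
    {"GENE_A": 1.5, "GENE_B": -0.4, "GENE_C": 0.2},
]
print(leave_one_dataset_out(data))
```

Keeping up- and down-regulated genes as separate signs follows the abstract's choice to consider the two directions separately.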


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Gene Expression Profiling/methods , Muscular Dystrophy, Duchenne/drug therapy , Algorithms , Biomarkers/analysis , Databases, Genetic , Humans , Male , Meta-Analysis as Topic , Muscular Dystrophy, Duchenne/genetics , Muscular Dystrophy, Duchenne/metabolism , Oligonucleotide Array Sequence Analysis
13.
J Bioinform Comput Biol ; 8(3): 593-606, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20556864

ABSTRACT

Heterogeneous high-throughput biological data are becoming readily available for various diseases. The number of data points generated by such experiments does not allow manual integration of the information to design the optimal therapy for a disease. We describe a novel computational workflow for designing therapy using Ariadne Genomics Pathway Studio software. We use publicly available microarray experiments for glioblastoma and the automatically constructed ResNet and ChemEffect databases to exemplify how to find potentially effective chemicals for glioblastoma, a disease still without effective treatment. Our first approach involved construction of the signaling pathway affected in glioblastoma using scientific literature and data available in the ResNet database. Compounds known to affect multiple proteins in this pathway were found in the ChemEffect database. Another approach involved analysis of differential expression in glioblastoma patients using Sub-Network Enrichment Analysis (SNEA). SNEA identified the angiogenesis-related protein Cyr61 as the major positive regulator upstream of genes differentially expressed in glioblastoma. Using our findings, we then identified the breast cancer drug Fulvestrant as a major inhibitor of the glioblastoma pathway as well as of Cyr61. This suggested Fulvestrant as a potential treatment against glioblastoma. We further show how to increase the efficacy of glioblastoma treatment by finding optimal combinations of Fulvestrant with other drugs.
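The step of finding compounds that affect multiple proteins in a disease pathway, and then combining them, can be sketched with set operations: rank compounds by how many pathway proteins they touch, then greedily add the compound that covers the most still-uncovered proteins. The greedy set-cover rule is an assumed illustration, not the combination procedure used in the paper, and all protein and compound identifiers are hypothetical placeholders rather than ResNet or ChemEffect content.

```python
def rank_compounds(pathway, targets_by_compound):
    """Rank compounds by how many pathway proteins they are known to affect."""
    hits = {c: len(t & pathway) for c, t in targets_by_compound.items()}
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


def greedy_combination(pathway, targets_by_compound, max_drugs=3):
    """Greedy set cover: repeatedly add the compound hitting most uncovered proteins."""
    uncovered = set(pathway)
    chosen = []
    while uncovered and len(chosen) < max_drugs:
        # Iterating in sorted order makes ties resolve alphabetically
        best = max(sorted(targets_by_compound),
                   key=lambda c: len(targets_by_compound[c] & uncovered))
        gain = targets_by_compound[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical pathway proteins and compound-target annotations
pathway = {"P1", "P2", "P3", "P4", "P5"}
targets = {
    "compound_A": {"P1", "P2", "P3"},
    "compound_B": {"P3", "P4"},
    "compound_C": {"P5", "X9"},
}
print(rank_compounds(pathway, targets))
print(greedy_combination(pathway, targets, max_drugs=2))
```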


Subject(s)
Antineoplastic Agents/administration & dosage , Combinatorial Chemistry Techniques/methods , Glioblastoma/drug therapy , Glioblastoma/metabolism , Models, Biological , Neoplasm Proteins/metabolism , Signal Transduction/drug effects , Animals , Computer Simulation , Drug Design , Humans
14.
PLoS One ; 5(2): e9256, 2010 Feb 17.
Article in English | MEDLINE | ID: mdl-20174649

ABSTRACT

Microarray-based expression profiling of living systems is a quick and inexpensive method to obtain insights into the nature of various diseases and phenotypes. A typical microarray profile can yield hundreds or even thousands of differentially expressed genes and finding biologically plausible themes or regulatory mechanisms underlying these changes is a non-trivial and daunting task. We describe a novel approach for systems-level interpretation of microarray expression data using a manually constructed "overview" pathway depicting the main cellular signaling channels (Atlas of Signaling). Currently, the developed pathway focuses on signal transduction from surface receptors to transcription factors and further transcriptional regulation of cellular "workhorse" proteins. We show how the constructed Atlas of Signaling in combination with an enrichment analysis algorithm allows quick identification and visualization of the main signaling cascades and cellular processes affected in a gene expression profiling experiment. We validate our approach using several publicly available gene expression datasets.
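The abstract does not name the enrichment algorithm. A widely used choice for asking whether a signaling channel contains more differentially expressed genes than expected by chance is the upper-tail hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched here as an assumption. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_size, n_selected, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    total_genes: genes measured on the array
    pathway_size: genes belonging to the signaling channel
    n_selected: differentially expressed genes
    overlap: differentially expressed genes inside the channel
    """
    denom = comb(total_genes, n_selected)
    upper = min(pathway_size, n_selected)
    tail = sum(comb(pathway_size, k) * comb(total_genes - pathway_size, n_selected - k)
               for k in range(overlap, upper + 1))
    return tail / denom


# Hypothetical experiment: 20,000 genes, a 150-gene channel,
# 400 differentially expressed genes, 12 of them in the channel
print(enrichment_p_value(20000, 150, 400, 12))
```

Running this for every channel of an overview pathway and sorting by p-value gives a quick ranking of the cascades most affected in an experiment.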


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Signal Transduction/genetics , Algorithms , Gene Expression Regulation , Models, Genetic , Proteome/genetics , Software
15.
Expert Opin Drug Discov ; 4(12): 1307-18, 2009 Dec.
Article in English | MEDLINE | ID: mdl-23480468

ABSTRACT

IMPORTANCE OF THE FIELD: Drug discovery and development is a very complex and costly process. Understanding the detailed molecular mechanisms of a disease and of drug actions can make it more efficient, not only for new target discovery but also for lead prioritization, drug repositioning and the development of biomarkers for drug efficacy and safety. Access to formalized knowledge about the functions of proteins and small molecules is crucial for rationalizing the drug development process, and scientific publications are the main source of this knowledge. Protein knowledge networks capturing protein functions, protein-protein relations and the organization of proteins into complex cellular sub-systems are making their way into modern drug discovery. Chemical networks representing multiple aspects of chemical functional information, integrated into a protein systems biology network, are an even more advanced and promising paradigm. AREAS COVERED IN THIS REVIEW: This review describes the use of literature-derived protein and chemical functional knowledge bases in drug development. WHAT THE READER WILL GAIN: Readers will gain an understanding of how integrated protein and chemical knowledge networks can be used for understanding and building models of cellular events, disease mechanisms, and drug actions, for finding biomarkers of drug efficacy and safety, and for interpreting high-throughput gene expression, proteomic and metabolomic experiments. TAKE HOME MESSAGE: Integrated literature-derived protein and chemical knowledge bases can rationalize many aspects of the drug development process, including drug repositioning and biomarker design.

16.
BMC Evol Biol ; 7: 125, 2007 Jul 27.
Article in English | MEDLINE | ID: mdl-17662135

ABSTRACT

BACKGROUND: Molecular evolution is usually described assuming a neutral or weakly non-neutral substitution model. Recently, new data have become available on the evolution of sequence regions under selective pressure, e.g. transcription factor binding sites. To reconstruct the evolutionary history of such sequences, one needs evolutionary models that take into account a substantial constant selective pressure. RESULTS: We present a simple evolutionary model with a single preferred (consensus) nucleotide and the neutral substitution model adopted for all other nucleotides. This evolutionary model has a rate matrix in which all substitutions that do not involve the consensus nucleotide occur with the same rate. The model has two time scales for reaching a stationary distribution; in the general case only one of the two rate parameters can be evaluated from the stationary distribution. At intermediate times, counterintuitive behavior was observed for some parameter values, with the probability of conservation of a non-consensus nucleotide greater than that of the consensus nucleotide. Such an effect can be observed only in the case of weak preference for the consensus nucleotide, when the probability of observing the consensus nucleotide in the stationary distribution is less than 1/2. If the substitution rate is represented as a product of mutation and fixation, only the fixation can be calculated from the stationary distribution. The observed conservation of non-consensus nucleotides does not take place if the elements of the mutation matrix are identical, and can be related to a reduced mutation rate between the non-consensus nucleotides. This bias can have no effect on the stationary distribution of nucleotide frequencies calculated over the ensemble of multiple alignments, e.g. transcription factor binding sites upstream of different sets of co-regulated orthologous genes.
CONCLUSION: The derived model can be used as a null model when analyzing the evolution of orthologous transcription factor binding sites. In particular, our findings show that a nucleotide preferred at some position of a multiple alignment of binding sites for some transcription factor in the same genome is not necessarily the most conserved nucleotide in an alignment of orthologous sites from different species. However, this effect can take place only in the case of a mutation matrix whose elements are not identical.
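
A minimal sketch of a single-consensus model, assuming a parameterization chosen here for illustration (not taken from the paper): each non-consensus nucleotide mutates to the consensus at rate `a`, the consensus mutates to each non-consensus nucleotide at rate `b`, and every substitution between two non-consensus nucleotides occurs at rate `c`. Under these assumptions the transition probabilities have a closed form with two relaxation rates, `a + 3b` and `a + 3c`, and the stationary consensus frequency `a / (a + 3b)` does not depend on `c`. The function name `transition_matrix` is hypothetical.

```python
import math


def transition_matrix(a: float, b: float, c: float, t: float) -> list[list[float]]:
    """Closed-form P(t) for a 4-state model; state 0 is the consensus nucleotide.

    Assumed (hypothetical) rates: non-consensus -> consensus at rate a,
    consensus -> each non-consensus at rate b,
    non-consensus -> each other non-consensus at rate c.
    """
    pi_c = a / (a + 3 * b)            # stationary frequency of the consensus
    e1 = math.exp(-(a + 3 * b) * t)   # relaxation: consensus vs non-consensus class
    e2 = math.exp(-(a + 3 * c) * t)   # relaxation: among non-consensus states
    p_cc = pi_c + (1 - pi_c) * e1     # consensus is conserved
    p_nc = pi_c * (1 - e1)            # non-consensus -> consensus
    q = 1 - p_nc                      # remains somewhere in the non-consensus class
    p_nn_same = (q + 2 * e2) / 3      # non-consensus nucleotide is conserved
    p_nn_other = (q - e2) / 3         # non-consensus -> another non-consensus
    matrix = [[p_cc] + [(1 - p_cc) / 3] * 3]
    for i in range(1, 4):
        row = [p_nc] + [p_nn_same if j == i else p_nn_other for j in range(1, 4)]
        matrix.append(row)
    return matrix
```

With `a = b = 1` (stationary consensus frequency 0.25, i.e. weak preference) and `c = 0` (no direct substitutions between non-consensus nucleotides), `transition_matrix(1, 1, 0, 1)[1][1]` is about 0.50 while `[0][0]` is about 0.26, so under these assumed rates a non-consensus nucleotide is more conserved than the consensus at intermediate times.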


Subject(s)
DNA/genetics , Evolution, Molecular , Models, Biological , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics , Base Sequence , Binding Sites , Consensus Sequence
17.
Evol Bioinform Online ; 3: 197-206, 2007 Aug 08.
Article in English | MEDLINE | ID: mdl-19461979

ABSTRACT

MOTIVATION: Although a great deal of progress is being made in the development of fast and reliable experimental techniques to extract genome-wide networks of protein-protein and protein-DNA interactions, new genomes are being sequenced at an even faster rate. There is therefore a considerable need for reliable methods of in silico prediction of protein interactions based solely on sequence similarity information and known interactions from well-studied organisms. This problem can be solved if a dependency exists between sequence similarity and the conservation of protein function. RESULTS: In this paper, we introduce a novel probabilistic method for predicting protein-protein interactions, using a new empirical probabilistic formula that describes the loss of interactions between homologous proteins during the course of evolution. This formula describes an evolutionary process quite similar to the growth of the Earth's population. In addition, our method favors predictions confirmed by several interacting pairs over predictions coming from a single interacting pair. Our approach is useful when working with "noisy" data such as those coming from high-throughput experiments. We have generated predictions for five "model" organisms (H. sapiens, D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae) and evaluated the quality of these predictions.
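
The abstract does not state the empirical formula itself, so the sketch below does not reproduce it. It only illustrates, under an assumption of our own, one way a score can favor predictions supported by several known interacting pairs: each supporting pair contributes an independent probability `p_k` (supplied by the caller, standing in for the paper's similarity-dependent formula), and the contributions are combined with a noisy-OR rule, 1 - Π(1 - p_k). The function name `combine_support` and the independence assumption are hypothetical.

```python
from math import prod


def combine_support(probabilities: list[float]) -> float:
    """Noisy-OR combination of independent supporting evidence (illustrative).

    Each entry is an assumed probability that the interaction is conserved,
    inferred from one known interacting pair of homologous proteins.
    """
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # The prediction fails only if every supporting pair fails to transfer;
    # with no supporting pairs, prod() is 1 and the score is 0.
    return 1.0 - prod(1.0 - p for p in probabilities)
```

For example, a single supporting pair with `p = 0.6` scores 0.6, while two such independent pairs score about 0.84, so multiply supported predictions rank higher.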

18.
BMC Bioinformatics ; 7: 171, 2006 Mar 24.
Article in English | MEDLINE | ID: mdl-16563163

ABSTRACT

BACKGROUND: Scientific literature is the most reliable and comprehensive source of knowledge about molecular interaction networks. Formalizing this knowledge is necessary for computational analysis and is achieved by automatic fact extraction using various text-mining algorithms. Most of these techniques suffer from high false-positive rates and redundancy of the extracted information. The extracted facts form a large network with no pathways defined. RESULTS: We describe a methodology for automatic curation of Biological Association Networks (BANs) derived by a natural language processing technology called MedScan. The curated data are used for automatic pathway reconstruction. The algorithm for the reconstruction of signaling pathways is also described and validated by comparison with manually curated pathways and tissue-specific gene expression profiles. CONCLUSION: Biological Association Networks extracted by MedScan technology contain sufficient information for constructing thousands of mammalian signaling pathways for multiple tissues. The automatically curated MedScan data are adequate for automatic generation of good-quality signaling networks. The automatically generated Regulome pathways and the manually curated pathways used for their validation are freely available in the ResNetCore database from Ariadne Genomics, Inc. [1]. The pathways can be viewed and analyzed using a free demo version of the PathwayStudio software, which can also be used to evaluate the MedScan technology.
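
MedScan's actual curation rules are not described in the abstract. As a hypothetical illustration of the two problems it names, redundancy and false positives, the sketch below merges duplicate extracted facts into single network edges and keeps only edges supported by at least `min_refs` distinct references. The `Fact` layout, the `curate` function, and the threshold are assumptions made for this example, not MedScan's method.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One extracted statement, e.g. Fact("A", "binds", "B", "ref1") (hypothetical)."""
    source: str
    relation: str
    target: str
    reference: str


def curate(facts: list[Fact], min_refs: int = 2) -> dict[tuple[str, str, str], int]:
    """Collapse redundant facts into edges and drop weakly supported ones."""
    support: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for fact in facts:
        # Facts that differ only in their reference describe the same edge;
        # a repeated (edge, reference) pair is counted once.
        support[(fact.source, fact.relation, fact.target)].add(fact.reference)
    # Keep edges with enough independent references; value = number of references.
    return {edge: len(refs) for edge, refs in support.items() if len(refs) >= min_refs}
```

With the default threshold, an edge mentioned in two different references survives, while an edge seen in only one reference is treated as a likely false positive and dropped.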


Subject(s)
Databases, Bibliographic , Natural Language Processing , Periodicals as Topic , Protein Interaction Mapping/methods , Proteins/classification , Proteins/metabolism , Signal Transduction/physiology , Information Storage and Retrieval/methods , Software
19.
Gene ; 347(2): 255-63, 2005 Mar 14.
Article in English | MEDLINE | ID: mdl-15725380

ABSTRACT

In bioinformatics, binding of transcription regulatory factors to their cognate binding sites is usually described by a sequence-specific binding energy, which is estimated from a training sample of sites. This model implies that all binding sites with binding energy above some threshold are functional, and that sequence variations within a site should be considered neutral as long as they do not reduce this energy below the threshold. To quantify this energy, either the binding profile (positional weight matrix, PWM) model or a consensus-based model is usually applied. Here we show that in many cases the available data are not sufficient to construct a relevant PWM, and a modified consensus-based model could describe binding properties more effectively. Further, using data on the binding sites of several transcription factors, we demonstrate that some non-consensus nucleotides in "orthologous sites" (that is, binding sites of the same factor upstream of orthologous genes), which have been believed to be irrelevant or even to hinder regulation, are evolutionarily very stable and specific for the regulated gene. For each pair of genomes considered, the number of substitutions between non-consensus nucleotides is far lower than the expected number of neutral substitutions. Moreover, at several positions of binding sites regulating different genes, non-consensus nucleotides are conserved in distant genomes. This implies that there is a selection pressure maintaining the stability of non-consensus nucleotides.
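
For readers comparing the two standard site models mentioned above, a minimal sketch of both follows. It builds a log-odds positional weight matrix from aligned training sites (assuming a pseudocount of 0.5 and a uniform background) and, alternatively, scores a site by its number of mismatches to the consensus. The training sites are invented toy sequences, not data from the paper, and the modified consensus-based model the authors propose is not reproduced here; all function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Log-odds PWM (bits) against an assumed uniform background of 0.25."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        total = len(sites) + 4 * pseudocount
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / 0.25)
            for base in ALPHABET
        })
    return pwm


def pwm_score(pwm: list[dict[str, float]], site: str) -> float:
    """Additive score, used as a proxy for sequence-specific binding energy."""
    return sum(column[base] for column, base in zip(pwm, site))


def consensus(sites: list[str]) -> str:
    """Most frequent base per position (ties go to the base seen first)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def mismatches(site: str, cons: str) -> int:
    """Consensus-based model: count positions that differ from the consensus."""
    return sum(x != y for x, y in zip(site, cons))


training = ["ACGTGA", "ACGTGT", "ACCTGA", "ACGTGA"]  # toy sites, invented
pwm = build_pwm(training)
cons = consensus(training)
```

Here `cons` is `"ACGTGA"`; a candidate such as `"ACCTGT"` has two mismatches to it and a lower PWM score than the consensus itself. In the threshold model described above, a site would be called functional when its score stays above a chosen cutoff (or its mismatch count stays below one).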


Subject(s)
DNA/metabolism , Evolution, Molecular , Transcription Factors/metabolism , Base Sequence , Binding Sites , Consensus Sequence , Models, Biological , Prokaryotic Cells/physiology , Transcription Factors/genetics