Results 1 - 16 of 16
1.
Metabolites ; 9(8), 2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31374904

ABSTRACT

In small molecule identification from tandem mass (MS/MS) spectra, input-output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction and high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule, and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework: firstly, we formulate the IOKRreverse model that can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is the closest to the input MS/MS spectrum. Secondly, we introduce an approach to combine several IOKR and IOKRreverse models computed from different input and output kernels, called IOKRfusion. The method is based on minimizing structured Hinge loss of the combined model using a mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement of top-k accuracy both in positive and negative ionization mode data.

2.
PLoS One ; 13(10): e0204960, 2018.
Article in English | MEDLINE | ID: mdl-30281653

ABSTRACT

The vascular endothelium is considered a key cell compartment for the response to ionizing radiation of normal tissues and tumors, and a promising target to improve the differential effect of radiotherapy in the future. Following radiation exposure, the global endothelial cell response covers a wide range of gene, miRNA, protein and metabolite expression modifications. Changes occur at the transcriptional, translational and post-translational levels and impact cell phenotype as well as the microenvironment by the production and secretion of soluble factors such as reactive oxygen species, chemokines, cytokines and growth factors. These radiation-induced dynamic modifications of molecular networks may control the endothelial cell phenotype and govern recruitment of immune cells, stressing the importance of clearly understanding the mechanisms which underlie these temporal processes. A wide variety of time series data is commonly used in bioinformatics studies, including gene expression, protein concentrations and metabolomics data. How best to cluster these data remains an open problem. Here, we introduce kernels between Gaussian processes modeling time series, and subsequently introduce a spectral clustering algorithm. We apply the methods to the study of human primary endothelial cells (HUVECs) exposed to a radiotherapy dose fraction (2 Gy). Time windows of differential expressions of 301 genes involved in key cellular processes such as angiogenesis, inflammation, apoptosis, immune response and protein kinase were determined from 12 hours to 3 weeks post-irradiation. Then, 43 temporal clusters corresponding to profiles of similar expressions, including 49 genes out of 301 initially measured, were generated according to the proposed method. Forty-seven transcription factors (TFs) responsible for the expression of clusters of genes were predicted from sequence regulatory elements using the MotifMap system. Their temporal profiles of occurrences were established and clustered. Dynamic network interactions and molecular pathways of TFs and differential genes were finally explored, revealing key node genes and putative important cellular processes involved in tissue infiltration by immune cells following exposure to a radiotherapy dose fraction.


Subject(s)
Dose Fractionation, Radiation , Endothelial Cells/metabolism , Endothelial Cells/radiation effects , Transcriptome/radiation effects , Cluster Analysis , Humans , Multigene Family , Normal Distribution , Phenotype , Time Factors , Transcription Factors/metabolism
3.
Med Image Anal ; 35: 360-374, 2017 01.
Article in English | MEDLINE | ID: mdl-27573862

ABSTRACT

Patient follow-up in oncology is generally performed through the acquisition of dynamic sequences of contrast-enhanced images. Estimating parameters of appropriate models of contrast intake diffusion through tissues should help characterize the tumour physiology. However, several models have been developed and no consensus exists on their clinical use. In this paper, we propose a unified framework to analyse models of perfusion and estimate their parameters in order to obtain reliable and relevant parametric images. After defining the biological context and the general form of perfusion models, we propose a methodological framework for model assessment in the context of parameter estimation from dynamic imaging data: global sensitivity analysis, structural and practical identifiability analysis, parameter estimation and model comparison. Then, we apply our methodology to five of the most widely used compartment models (Tofts model, extended Tofts model, two-compartment model, tissue-homogeneity model and distributed-parameters model) and illustrate the results by analysing the behaviour of these models when applied to data acquired on five patients with abdominal tumours.


Subject(s)
Abdominal Neoplasms/diagnostic imaging , Models, Biological , Perfusion , Tomography, X-Ray Computed/methods , Algorithms , Humans
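
Annotation (not part of the record): the abstract above names the Tofts model without stating it. In its standard form the tissue concentration is C_t(t) = K_trans * integral from 0 to t of C_p(s) * exp(-k_ep * (t - s)) ds, where C_p is the plasma (arterial input) concentration. The following is a minimal sketch of that forward model on a uniform time grid, using a plain Riemann sum. The function name, the constant input curve and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def tofts_tissue_curve(t, c_plasma, k_trans, k_ep):
    """Standard Tofts model evaluated by discrete convolution.

    t must be a uniform grid starting at 0; c_plasma is sampled on t.
    """
    dt = t[1] - t[0]
    residue = np.exp(-k_ep * t)                      # impulse response exp(-k_ep * t)
    # Left Riemann sum of the convolution integral, truncated to len(t) samples.
    return k_trans * np.convolve(c_plasma, residue)[: len(t)] * dt


# Hypothetical check case: a constant plasma concentration of 1.
t = np.arange(0.0, 10.0, 0.01)
c_plasma = np.ones_like(t)
c_tissue = tofts_tissue_curve(t, c_plasma, k_trans=0.25, k_ep=0.5)

# For constant input the integral has a closed form: (K_trans / k_ep) * (1 - exp(-k_ep * t)).
analytic = (0.25 / 0.5) * (1.0 - np.exp(-0.5 * t))
```

Parameter estimation, as discussed in the abstract, would fit `k_trans` and `k_ep` so that such a curve matches the measured one; that step is not shown here.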
4.
Bioinformatics ; 32(12): i28-i36, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307628

ABSTRACT

MOTIVATION: An important problem in metabolomics is identifying metabolites from tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. RESULTS: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. CONTACT: celine.brouard@aalto.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Machine Learning , Metabolomics , Molecular Structure , Tandem Mass Spectrometry , Algorithms , Databases, Chemical
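
Annotation (not part of the record): the two phases described in the abstract above can be sketched as follows, assuming the regression phase is a kernel ridge regression and that both kernels are Gaussian. With a unit-norm output kernel, ranking candidates by inner product with the predicted feature vector is the same as ranking them by distance to it. The toy arrays, kernel choices and function names are hypothetical stand-ins, not the authors' implementation or data.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def fit_regression(K_x_train, lam):
    """Phase 1: kernel ridge regression into the output feature space.

    Returns (K_x + lam * I)^-1, which is reused for every test spectrum.
    """
    n = K_x_train.shape[0]
    return np.linalg.inv(K_x_train + lam * np.eye(n))


def score_candidates(M, k_x_test, K_y_cand_train):
    """Phase 2 (pre-image): inner product between each candidate's output
    feature vector and the predicted output feature vector.

    k_x_test:       (n,)   input kernel between the test spectrum and training spectra
    K_y_cand_train: (m, n) output kernel between m candidates and n training molecules
    """
    return K_y_cand_train @ (M @ k_x_test)


# Toy stand-ins (hypothetical): 2-D "spectrum" features and 4-bit "fingerprints".
X_train = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
Y_train = np.array([[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)

M = fit_regression(rbf_kernel(X_train, X_train, gamma=1.0), lam=0.1)

x_test = np.array([[2.9, 0.1]])      # lies close to training spectrum 1
candidates = Y_train                 # candidate set: the known molecules
k_x = rbf_kernel(X_train, x_test, gamma=1.0)[:, 0]
K_y = rbf_kernel(candidates, Y_train, gamma=0.5)

scores = score_candidates(M, k_x, K_y)
best = int(np.argmax(scores))        # index of the top-ranked candidate
```

Because the matrix inverse is computed once at training time, scoring a new spectrum costs only two matrix-vector products, which is consistent with the short running times the abstract reports.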
5.
Article in English | MEDLINE | ID: mdl-26357265

ABSTRACT

Computational methods for predicting protein-protein interactions are important tools that can complement high-throughput technologies and guide biologists in designing new laboratory experiments. The proteins and the interactions between them can be described by a network which is characterized by several topological properties. Information about proteins and interactions between them, in combination with knowledge about topological properties of the network, can be used for developing computational methods that can accurately predict unknown protein-protein interactions. This paper presents a supervised learning framework based on Bayesian inference for combining two types of information: i) network topology information, and ii) information related to proteins and the interactions between them. The motivation of our model is that by combining these two types of information one can achieve a better accuracy in predicting protein-protein interactions, than by using models constructed from these two types of information independently.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Bayes Theorem , Databases, Factual , Fungal Proteins , Humans
6.
Nucleic Acids Res ; 43(10): 4833-54, 2015 May 26.
Article in English | MEDLINE | ID: mdl-25897113

ABSTRACT

In mouse embryonic cells, ligand-activated retinoic acid receptors (RARs) play a key role in inhibiting pluripotency-maintaining genes and activating some major actors of cell differentiation. To investigate the mechanism underlying this dual regulation, we performed joint RAR/RXR ChIP-seq and mRNA-seq time series during the first 48 h of the RA-induced Primitive Endoderm (PrE) differentiation process in F9 embryonal carcinoma (EC) cells. We show here that this dual regulation is associated with RAR/RXR genomic redistribution during the differentiation process. In-depth analysis of RAR/RXR binding sites occupancy dynamics and composition show that in undifferentiated cells, RAR/RXR interact with genomic regions characterized by binding of pluripotency-associated factors and high prevalence of the non-canonical DR0-containing RA response element. By contrast, in differentiated cells, RAR/RXR bound regions are enriched in functional Sox17 binding sites and are characterized with a higher frequency of the canonical DR5 motif. Our data offer an unprecedentedly detailed view on the action of RA in triggering pluripotent cell differentiation and demonstrate that RAR/RXR action is mediated via two different sets of regulatory regions tightly associated with cell differentiation status.


Subject(s)
Cell Differentiation/genetics , Gene Expression Regulation , Pluripotent Stem Cells/metabolism , Receptors, Retinoic Acid/metabolism , Response Elements , Retinoid X Receptors/metabolism , Transcription, Genetic , Animals , Binding Sites , Embryonal Carcinoma Stem Cells , Genome , Mice , Nucleotide Motifs , Transcription Factors/metabolism , Tretinoin/pharmacology
7.
Bioinformatics ; 31(5): 728-35, 2015 Mar 01.
Article in English | MEDLINE | ID: mdl-25355790

ABSTRACT

MOTIVATION: Identifying the set of genes differentially expressed along time is an important task in two-sample time course experiments. Furthermore, estimating at which time periods the differential expression is present can provide additional insight into temporal gene functions. The current differential detection methods are designed to detect difference along observation time intervals or on single measurement points, warranting dense measurements along time to characterize the full temporal differential expression patterns. RESULTS: We propose a novel Bayesian likelihood ratio test to estimate the differential expression time periods. Applying the ratio test to systems of genes provides the temporal response timings and durations of gene expression to a biological condition. We introduce a novel non-stationary Gaussian process as the underlying expression model, with major improvements on model fitness on perturbation and stress experiments. The method is robust to uneven or sparse measurements along time. We assess the performance of the method on realistically simulated dataset and compare against state-of-the-art methods. We additionally apply the method to the analysis of primary human endothelial cells under an ionizing radiation stress to study the transcriptional perturbations over 283 measured genes in an attempt to better understand the role of endothelium in both normal and cancer tissues during radiotherapy. As a result, using the cascade of differential expression periods, domain literature and gene enrichment analysis, we gain insights into the dynamic response of endothelial cells to irradiation. AVAILABILITY AND IMPLEMENTATION: R package 'nsgp' is available at www.ibisc.fr/en/logiciels_arobas.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Radiotherapy , Bayes Theorem , Cells, Cultured , Dose-Response Relationship, Radiation , Human Umbilical Vein Endothelial Cells/metabolism , Human Umbilical Vein Endothelial Cells/radiation effects , Humans , Neoplasms/radiotherapy , Normal Distribution , Time Factors
8.
Math Biosci ; 246(2): 326-34, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24176667

ABSTRACT

Reconstructing gene regulatory networks from high-throughput measurements represents a key problem in functional genomics. It also represents a canonical learning problem and thus has attracted a lot of attention in both the informatics and the statistical learning literature. Numerous approaches have been proposed, ranging from simple clustering to rather involved dynamic Bayesian network modeling, as well as hybrid ones that combine a number of modeling steps, such as employing ordinary differential equations coupled with genome annotation. These approaches are tailored to the type of data being employed. Available data sources include static steady state data and time course data obtained either for wild type phenotypes or from perturbation experiments. This review focuses on the class of autoregressive models using time course data for inferring gene regulatory networks. The central themes of sparsity, stability and causality are discussed as well as the ability to integrate prior knowledge for successful use of these models for the learning task at hand.


Subject(s)
Gene Regulatory Networks/genetics , Genomics/methods , Models, Genetic , Bayes Theorem , Cluster Analysis , Humans
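
Annotation (not part of the record): the simplest member of the autoregressive model class reviewed above is a first-order linear model, x(t+1) = A x(t) + noise, in which a non-zero coefficient A[i, j] is read as a candidate regulation of gene i by gene j. The sketch below fits A by ridge regression and thresholds it. This is a simplification chosen for brevity; the sparsity-inducing penalties the review discusses are not implemented, and the simulated network, the threshold and the function names are illustrative assumptions.

```python
import numpy as np


def fit_var1(X, lam=1e-3):
    """Ridge estimate of A in x(t+1) = A x(t) + noise from a (T, p) time course."""
    past, future = X[:-1], X[1:]
    p = X.shape[1]
    # Solve (past^T past + lam * I) B = past^T future; then A = B^T.
    B = np.linalg.solve(past.T @ past + lam * np.eye(p), past.T @ future)
    return B.T


def candidate_edges(A, threshold):
    """List (regulator, target) pairs whose off-diagonal coefficient exceeds the threshold."""
    p = A.shape[0]
    return [(j, i) for i in range(p) for j in range(p)
            if i != j and abs(A[i, j]) > threshold]


# Simulated three-gene cascade (hypothetical): gene 0 activates gene 1, gene 1 represses gene 2.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, -0.3, 0.5]])
X = np.zeros((5000, 3))
for t in range(4999):
    X[t + 1] = A_true @ X[t] + 0.1 * rng.standard_normal(3)

A_hat = fit_var1(X)
edges = candidate_edges(A_hat, threshold=0.15)
```

Real time courses are far shorter than this simulated one, which is why the sparsity and prior-knowledge themes of the review matter in practice.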
9.
BMC Bioinformatics ; 14: 273, 2013 Sep 12.
Article in English | MEDLINE | ID: mdl-24028533

ABSTRACT

BACKGROUND: Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. RESULTS: We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. CONCLUSIONS: The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge.


Subject(s)
Gene Regulatory Networks , Logic , Markov Chains , Systems Biology/methods , Computer Simulation , Databases, Genetic , Humans , Models, Statistical , ROC Curve , Support Vector Machine
10.
Bioinformatics ; 29(11): 1416-23, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23574736

ABSTRACT

MOTIVATION: Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in silico challenges that have aided in calibrating their performance. A number of approaches using either perturbation (knock-out) or wild-type time-series data have appeared in the literature addressing this problem, with the latter using linear temporal models. Nonlinear dynamical models are particularly appropriate for this inference task, given the generation mechanism of the time-series data. In this study, we introduce a novel nonlinear autoregressive model based on operator-valued kernels that simultaneously learns the model parameters, as well as the network structure. RESULTS: A flexible boosting algorithm (OKVAR-Boost) that shares features from L2-boosting and randomization-based algorithms is developed to perform the tasks of parameter learning and network inference for the proposed model. Specifically, at each boosting iteration, a regularized Operator-valued Kernel-based Vector AutoRegressive model (OKVAR) is trained on a random subnetwork. The final model consists of an ensemble of such models. The empirical estimation of the ensemble model's Jacobian matrix provides an estimation of the network structure. The performance of the proposed algorithm is first evaluated on a number of benchmark datasets from the DREAM3 challenge and then on real datasets related to the In vivo Reverse-Engineering and Modeling Assessment (IRMA) and T-cell networks. The high-quality results obtained strongly indicate that it outperforms existing approaches. AVAILABILITY: The OKVAR-Boost Matlab code is available as the archive: http://amis-group.fr/sourcecode-okvar-boost/OKVARBoost-v1.0.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Regulatory Networks , Computer Simulation , Models, Genetic , Nonlinear Dynamics , T-Lymphocytes/immunology
11.
Bioinformatics ; 25(15): 1987-8, 2009 Aug 01.
Article in English | MEDLINE | ID: mdl-19420054

ABSTRACT

SUMMARY: CycSim is a web application dedicated to in silico experiments with genome-scale metabolic models coupled to the exploration of knowledge from BioCyc and KEGG. Specifically, CycSim supports the design of knockout experiments: simulation of growth phenotypes of single or multiple gene deletions mutants on specified media, comparison of these predictions with experimental phenotypes and direct visualization of both on metabolic maps. The web interface is designed for simplicity, putting constraint-based modelling techniques within easier reach of biologists. CycSim also functions as an online repository of genome-scale metabolic models. AVAILABILITY: http://www.genoscope.cns.fr/cycsim.


Subject(s)
Genome , Genomics/methods , Metabolism/genetics , Software , Computational Biology , Databases, Genetic , Internet , User-Computer Interface
12.
BMC Proc ; 2 Suppl 4: S1, 2008 Dec 17.
Article in English | MEDLINE | ID: mdl-19091048

ABSTRACT

This supplement contains extended versions of a selected subset of papers presented at the workshop MLSB 2007, Machine Learning in Systems Biology, Evry, France, from September 24 to 25, 2007.

13.
BMC Bioinformatics ; 9: 91, 2008 Feb 08.
Article in English | MEDLINE | ID: mdl-18261218

ABSTRACT

BACKGROUND: Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian Network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and found the model that best fits data without prior knowledge. RESULTS: We proposed various evolutionary strategies suitable for the task and tested our choices using simulated data drawn from a given bio-realistic network of 35 nodes, the so-called insulin network, which has been used in the literature for benchmarking. We assessed the inferred models against this reference to obtain statistical performance results. We then compared performances of evolutionary algorithms using two kinds of recombination operators that operate at different scales in the graphs. We introduced a niching strategy that reinforces diversity through the population and avoided trapping of the algorithm in one local minimum in the early steps of learning. We show the limited effect of the mutation operator when niching is applied. Finally, we compared our best evolutionary approach with various well known learning algorithms (MCMC, K2, greedy search, TPDA, MMHC) devoted to BN structure learning. CONCLUSION: We studied the behaviour of an evolutionary approach enhanced by niching for the learning of gene regulatory networks with BN. We show that this approach outperforms classical structure learning methods in elucidating the original model. These results were obtained for the learning of a bio-realistic network and, more importantly, on various small datasets. This is a suitable approach for learning transcriptional regulatory networks from real datasets without prior knowledge.


Subject(s)
Biological Evolution , Evolution, Molecular , Gene Expression Regulation/genetics , Genetic Variation/genetics , Models, Genetic , Signal Transduction/genetics , Transcription Factors/genetics , Computer Simulation
14.
Bioinformatics ; 23(23): 3209-16, 2007 Dec 01.
Article in English | MEDLINE | ID: mdl-18042557

ABSTRACT

MOTIVATION: Statistical inference of biological networks such as gene regulatory networks, signaling pathways and metabolic networks can help build a picture of complex interactions that take place in the cell. However, biological systems considered as dynamical, non-linear and generally partially observed processes may be difficult to estimate even if the structure of interactions is given. RESULTS: Using the same approach as Sitz et al. proposed in another context, we derive non-linear state-space models from ODEs describing biological networks. In this framework, we apply Unscented Kalman Filtering (UKF) to the estimation of both parameters and hidden variables of non-linear state-space models. We instantiate the method on a transcriptional regulatory model based on Hill kinetics and a signaling pathway model based on mass action kinetics. We successfully use synthetic data and experimental data to test our approach. CONCLUSION: This approach covers a large set of biological network models and gives rise to simple and fast estimation algorithms. Moreover, the Bayesian tool used here directly provides uncertainty estimates on parameters and hidden states. Let us also emphasize that it can be coupled with structure inference methods used in Graphical Probabilistic Models. AVAILABILITY: Matlab code available on demand.


Subject(s)
Algorithms , Models, Biological , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation , Models, Statistical , Nonlinear Dynamics , Stochastic Processes , Systems Biology/methods
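
Annotation (not part of the record): the abstract above derives non-linear state-space models from ODEs, with Hill kinetics as one instance. A minimal sketch of that idea for a single gene is given below: the ODE dx/dt = v_max * u^n / (K^n + u^n) - d * x is discretized with an explicit Euler step, which gives the kind of state-transition function a filter such as the UKF would operate on. The filter itself is not implemented here, and the one-gene model, the parameter values and the function names are illustrative assumptions.

```python
def hill_activation(u, K, n):
    """Hill function: fraction of maximal transcription at regulator level u."""
    return u ** n / (K ** n + u ** n)


def step(x, u, dt, v_max, K, n, d):
    """One explicit Euler step of dx/dt = v_max * hill_activation(u) - d * x."""
    return x + dt * (v_max * hill_activation(u, K, n) - d * x)


# Simulate the target gene under a constant regulator level u = K (half-maximal activation).
x = 0.0
trajectory = [x]
for _ in range(2000):
    x = step(x, u=1.0, dt=0.01, v_max=2.0, K=1.0, n=2, d=1.0)
    trajectory.append(x)
```

With these values production is v_max / 2 = 1.0 and degradation is 1.0 * x, so the simulated level settles at a steady state of 1.0.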
15.
BMC Bioinformatics ; 8 Suppl 2: S4, 2007 May 03.
Article in English | MEDLINE | ID: mdl-17493253

ABSTRACT

BACKGROUND: Elucidating biological networks between proteins appears nowadays as one of the most important challenges in systems biology. Computational approaches to this problem are important to complement high-throughput technologies and to help biologists in designing new experiments. In this work, we focus on the completion of a biological network from various sources of experimental data. RESULTS: We propose a new machine learning approach for the supervised inference of biological networks, which is based on a kernelization of the output space of regression trees. It inherits several features of tree-based algorithms such as interpretability, robustness to irrelevant variables, and input scalability. We applied this method to the inference of two kinds of networks in the yeast S. cerevisiae: a protein-protein interaction network and an enzyme network. In both cases, we obtained results competitive with existing approaches. We also show that our method provides relevant insights on input data regarding their potential relationship with the existence of interactions. Furthermore, we confirm the biological validity of our predictions in the context of an analysis of gene expression data. CONCLUSION: Output kernel tree based methods provide an efficient tool for the inference of biological networks from experimental data. Their simplicity and interpretability should make them of great value for biologists.


Subject(s)
Algorithms , Artificial Intelligence , Gene Expression Regulation/physiology , Models, Biological , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation , Pattern Recognition, Automated/methods , Regression Analysis , Systems Biology/methods , Time Factors
16.
Bioinformatics ; 19 Suppl 2: ii138-48, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14534183

ABSTRACT

This article deals with the identification of gene regulatory networks from experimental data using a statistical machine learning approach. A stochastic model of gene interactions capable of handling missing variables is proposed. It can be described as a dynamic Bayesian network particularly well suited to tackle the stochastic nature of gene regulation and gene expression measurement. Parameters of the model are learned through a penalized likelihood maximization implemented through an extended version of EM algorithm. Our approach is tested against experimental data relative to the S.O.S. DNA Repair network of the Escherichia coli bacterium. It appears to be able to extract the main regulations between the genes involved in this network. An added missing variable is found to model the main protein of the network. Good prediction abilities on unlearned data are observed. These first results are very promising: they show the power of the learning algorithm and the ability of the model to capture gene interactions.


Subject(s)
DNA Repair/physiology , Escherichia coli Proteins/metabolism , Escherichia coli/physiology , Gene Expression Profiling/methods , Gene Expression Regulation/physiology , Models, Biological , Signal Transduction/physiology , Algorithms , Artificial Intelligence , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Pattern Recognition, Automated