ABSTRACT
Reconstruction of large-scale gene regulatory networks (GRNs) is an important step toward understanding the complex regulatory mechanisms within the cell. Many modeling approaches have been introduced to find causal relationships between genes using expression data. However, they suffer from high dimensionality (a large number of genes but a small number of samples), overfitting, heavy computation time and low interpretability. We have previously proposed an original data mining algorithm, Licorn, that infers cooperative regulation networks from expression datasets. In this work, we present an extension of Licorn to a hybrid inference method, h-Licorn, that searches in both discrete and real-valued spaces. Licorn's algorithm, which uses the discrete space to find cooperative regulation relationships fitting the target gene expression, has been shown to be powerful in identifying cooperative regulation relationships that are out of the scope of most GRN inference methods. Still, like many related GRN inference techniques, Licorn suffers from a large number of false positives. We propose here an extension of Licorn with a numerical selection step, expressed as a linear regression problem, that effectively complements the discrete search of Licorn. We evaluate a bootstrapped version of h-Licorn on the in silico Dream5 dataset and show that h-Licorn has significantly higher performance than Licorn, and is competitive with or outperforms state-of-the-art GRN inference algorithms, especially when operating on small datasets. We also applied h-Licorn to a real dataset of human bladder cancer and show that it performs better than other methods in finding candidate regulatory interactions. In particular, based solely on gene expression data, h-Licorn is able to identify experimentally validated cooperative regulator relationships involved in cancer.
Subject(s)
Algorithms, Gene Regulatory Networks, Urinary Bladder Neoplasms/genetics, Computational Biology, Gene Expression Regulation, Neoplastic, Humans
ABSTRACT
MOTIVATION: The identification of recurrent genomic alterations can provide insight into the initiation and progression of genetic diseases, such as cancer. Array-CGH can identify chromosomal regions that have been gained or lost, with a resolution of approximately 1 Mb for cutting-edge techniques. The extraction of discrete profiles from raw array-CGH data has been studied extensively, but subsequent steps in the analysis require flexible, efficient algorithms, particularly if the number of available profiles exceeds a few tens or the number of array probes exceeds a few thousand. RESULTS: We propose two algorithms for computing minimal and minimal constrained regions of gain and loss from discretized CGH profiles. The second of these algorithms can handle additional constraints describing relevant regions of copy number change. We have validated these algorithms on two public array-CGH datasets. AVAILABILITY: From the authors, upon request. CONTACT: celine@lri.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.