Results 1 - 19 of 19
1.
Genome Biol ; 25(1): 24, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38238840

ABSTRACT

BACKGROUND: Modeling of gene regulatory networks (GRNs) is limited by a lack of direct measurements of genome-wide transcription factor activity (TFA), which makes it difficult to separate covariance from regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic because it disconnects GRN inference from TFA estimation and cannot account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS: We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION: Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis that yields novel functional and regulatory insight.


Subjects
Gene Regulatory Networks, Saccharomyces cerevisiae, Saccharomyces cerevisiae/genetics, Algorithms, Mononuclear Leukocytes, Transcription Factors/genetics
2.
bioRxiv ; 2023 Sep 23.
Article in English | MEDLINE | ID: mdl-37790443

ABSTRACT

Cells respond to environmental and developmental stimuli by remodeling their transcriptomes through regulation of both mRNA transcription and mRNA decay. A central goal of biology is identifying the global set of regulatory relationships between factors that control mRNA production and degradation and their target transcripts, and constructing a predictive model of gene expression. Regulatory relationships are typically identified using transcriptome measurements and causal inference algorithms. RNA kinetic parameters are determined experimentally by employing run-on or metabolic labeling (e.g. 4-thiouracil) methods that allow transcription and decay rates to be separately measured. Here, we develop a deep learning model, trained with single-cell RNA-seq data, that both infers causal regulatory relationships and estimates RNA kinetic parameters. The resulting in silico model predicts future gene expression states and can be perturbed to simulate the effect of transcription factor changes. We acquired model training data by sequencing the transcriptomes of 175,000 individual Saccharomyces cerevisiae cells that were subject to an external perturbation and continuously sampled over a one-hour period. The rate of change for each transcript was calculated on a per-cell basis to estimate RNA velocity. We then trained a deep learning model with transcriptome and RNA velocity data to calculate time-dependent estimates of mRNA production and decay rates. By separating RNA velocity into transcription and decay rates, we show that rapamycin treatment causes existing ribosomal protein transcripts to be rapidly destabilized, while production of new transcripts gradually slows over the course of an hour. The neural network framework we present is designed to explicitly model causal regulatory relationships between transcription factors and their genes, and shows superior performance to existing models on the basis of recovery of known regulatory relationships. We validated the predictive power of the model by perturbing transcription factors in silico and comparing transcriptome-wide effects with experimental data. Our study represents the first step in constructing a complete, predictive, biophysical model of gene expression regulation.
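
The separation of RNA velocity into production and decay described in this abstract can be written as a rate equation. The form below is a common first-order sketch, not the model from the paper; the symbols $x_g$, $\alpha_g$ and $\lambda_g$ are illustrative assumptions that do not appear in the abstract.

```latex
% Assumed first-order kinetics for transcript g:
%   x_g(t)       transcript abundance
%   \alpha_g(t)  production (transcription) rate
%   \lambda_g(t) decay rate constant
\frac{dx_g}{dt} \;=\; \underbrace{\alpha_g(t)}_{\text{production}} \;-\; \underbrace{\lambda_g(t)\, x_g(t)}_{\text{decay}}
```

Under this assumed form, the left-hand side is the RNA velocity, a rapid destabilization of existing transcripts corresponds to a sharp rise in $\lambda_g$, and a gradual slowing of production corresponds to a slow decline in $\alpha_g$.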

3.
bioRxiv ; 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36778259

ABSTRACT

The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.

4.
Sci Rep ; 12(1): 16531, 2022 10 03.
Article in English | MEDLINE | ID: mdl-36192495

ABSTRACT

The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.


Subjects
Escherichia coli, Gene Regulatory Networks, Algorithms, Computational Biology/methods, Escherichia coli/genetics
5.
Bioinformatics ; 38(9): 2519-2528, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35188184

ABSTRACT

MOTIVATION: Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. RESULTS: In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. AVAILABILITY AND IMPLEMENTATION: The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Regulatory Networks, Software, Animals, Mice, Genomics, Genome, Chromatin
6.
PLoS Comput Biol ; 17(1): e1008569, 2021 01.
Article in English | MEDLINE | ID: mdl-33411784

ABSTRACT

The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using, e.g., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision (DEWÄKSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods. 
Code and example data for DEWÄKSS are available at https://gitlab.com/Xparx/dewakss/-/tree/Tjarnberg2020branch.
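
The self-supervised tuning idea in this abstract can be sketched as follows: each cell is predicted from its neighbours only, never from itself, and the neighbourhood size is chosen to minimise the error against the observed data. This is not the DEWÄKSS implementation. The function names, the synthetic data, and the use of a separate coordinate `t` to define neighbours (so that a prediction never depends on the cell's own noisy values) are all assumptions made to keep the sketch small.

```python
import numpy as np

def knn_smooth(t, Y, k):
    """Predict each row of Y as the mean of its k nearest rows in t (self excluded)."""
    d = np.abs(t[:, None] - t[None, :])   # pairwise distances along the coordinate t
    np.fill_diagonal(d, np.inf)           # a cell never contributes to its own prediction
    nbrs = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest other cells
    return Y[nbrs].mean(axis=1)

def select_k(t, Y, candidates):
    """Return the k with the lowest self-supervised mean squared error, plus all errors."""
    errors = {k: float(np.mean((knn_smooth(t, Y, k) - Y) ** 2)) for k in candidates}
    return min(errors, key=errors.get), errors

# Hypothetical data: 200 cells along a trajectory, 10 genes with shifted sine profiles.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
phases = np.linspace(0.0, np.pi, 10)
Y = np.sin(t[:, None] + phases[None, :]) + rng.normal(0.0, 0.5, size=(200, 10))

best_k, errors = select_k(t, Y, candidates=[2, 20, 199])
print(best_k, errors)
```

With very few neighbours the prediction stays noisy, and with nearly all cells as neighbours the signal is averaged away (oversmoothing), so in this toy setting the error is lowest at an intermediate `k`.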


Subjects
Algorithms, Genomics/methods, Single-Cell Analysis/methods, Supervised Machine Learning, Animals, Cell Line, Genetic Databases, Humans, Mice, RNA-Seq, Saccharomyces cerevisiae
7.
NPJ Syst Biol Appl ; 6(1): 37, 2020 11 09.
Article in English | MEDLINE | ID: mdl-33168813

ABSTRACT

The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights of a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data where ~1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR) level causing them to be too uninformative to infer accurate GRNs. We developed a gene reduction pipeline in which we eliminate uninformative genes from the system using a selection criterion based on SNR, until reaching an informative subset. The results show that our pipeline can identify an informative subset in an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized and potential novel cancer-related regulatory interactions were identified.


Subjects
Computational Biology, Gene Regulatory Networks, Neoplasms/genetics, Algorithms, Tumor Cell Line, Humans, Neoplasms/pathology
8.
Sci Rep ; 10(1): 14149, 2020 08 25.
Article in English | MEDLINE | ID: mdl-32843692

ABSTRACT

The gene regulatory network (GRN) of human cells encodes mechanisms to ensure proper functioning. However, if this GRN is dysregulated, the cell may enter into a disease state such as cancer. Understanding the GRN as a system can therefore help identify novel mechanisms underlying disease, which can lead to new therapies. To deduce regulatory interactions relevant to cancer, we applied a recent computational inference framework to data from perturbation experiments in squamous carcinoma cell line A431. GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. We developed a novel approach to assess the predictiveness of inferred GRNs against validation data, despite the lack of a gold standard. The best GRN was significantly more predictive than the null model, both in cross-validated benchmarks and for an independent dataset of the same genes under a different perturbation design. The inferred GRN captures many known regulatory interactions central to cancer-relevant processes in addition to predicting many novel interactions, some of which were experimentally validated, thus providing mechanistic insights that are useful for future cancer research.


Subjects
Carcinogenesis/genetics, Gene Regulatory Networks, Neoplasm Genes, Brain Neoplasms/genetics, Brain Neoplasms/pathology, Squamous Cell Carcinoma/genetics, Squamous Cell Carcinoma/pathology, Tumor Cell Line, Serum-Free Culture Media, Gene Knockdown Techniques, Glioma/genetics, Glioma/pathology, Humans, Monte Carlo Method, RNA Interference, Small Interfering RNA/genetics
10.
Nat Commun ; 11(1): 856, 2020 02 12.
Article in English | MEDLINE | ID: mdl-32051402

ABSTRACT

Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet the biological networks that commonly define such modules are incomplete and biased toward some well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without prior knowledge of a biological network, instead training a deep autoencoder from large transcriptional data. We hypothesize that modules could be discovered within the autoencoder representations. We find a statistically significant enrichment of genome-wide association studies (GWAS) relevant genes in the last layer, and to a successively lesser degree in the middle and first layers respectively. In contrast, we find an opposite gradient where a modular protein-protein interaction signal is strongest in the first layer but then vanishes smoothly deeper in the network. We conclude that a data-driven discovery approach is sufficient to discover groups of disease-related genes.


Subjects
Disease/genetics, Genome-Wide Association Study, Protein Interaction Maps/genetics, Algorithms, Computational Biology, Gene Expression Profiling, Gene Regulatory Networks, Genetics, Humans, Systems Biology
11.
Genome Med ; 11(1): 47, 2019 07 30.
Article in English | MEDLINE | ID: mdl-31358043

ABSTRACT

BACKGROUND: Genomic medicine has paved the way for identifying biomarkers and therapeutically actionable targets for complex diseases, but is complicated by the involvement of thousands of variably expressed genes across multiple cell types. Single-cell RNA-sequencing study (scRNA-seq) allows the characterization of such complex changes in whole organs. METHODS: The study is based on applying network tools to organize and analyze scRNA-seq data from a mouse model of arthritis and human rheumatoid arthritis, in order to find diagnostic biomarkers and therapeutic targets. Diagnostic validation studies were performed using expression profiling data and potential protein biomarkers from prospective clinical studies of 13 diseases. A candidate drug was examined by a treatment study of a mouse model of arthritis, using phenotypic, immunohistochemical, and cellular analyses as read-outs. RESULTS: We performed the first systematic analysis of pathways, potential biomarkers, and drug targets in scRNA-seq data from a complex disease, starting with inflamed joints and lymph nodes from a mouse model of arthritis. We found the involvement of hundreds of pathways, biomarkers, and drug targets that differed greatly between cell types. Analyses of scRNA-seq and GWAS data from human rheumatoid arthritis (RA) supported a similar dispersion of pathogenic mechanisms in different cell types. Thus, systems-level approaches to prioritize biomarkers and drugs are needed. Here, we present a prioritization strategy that is based on constructing network models of disease-associated cell types and interactions using scRNA-seq data from our mouse model of arthritis, as well as human RA, which we term multicellular disease models (MCDMs). We find that the network centrality of MCDM cell types correlates with the enrichment of genes harboring genetic variants associated with RA and thus could potentially be used to prioritize cell types and genes for diagnostics and therapeutics. 
We validated this hypothesis in a large-scale study of patients with 13 different autoimmune, allergic, infectious, malignant, endocrine, metabolic, and cardiovascular diseases, as well as a therapeutic study of the mouse arthritis model. CONCLUSIONS: Overall, our results support that our strategy has the potential to help prioritize diagnostic and therapeutic targets in human disease.


Subjects
Disease Susceptibility, Molecular Diagnostic Techniques, Multifactorial Inheritance, Single-Cell Analysis, Animals, Rheumatoid Arthritis/diagnosis, Rheumatoid Arthritis/etiology, Biomarkers, Computational Biology/methods, Animal Disease Models, Drug Discovery/methods, Gene Expression Profiling, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans, Mice, Neural Networks (Computer), Reproducibility of Results, Single-Cell Analysis/methods
12.
Bioinformatics ; 35(6): 1026-1032, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30169550

ABSTRACT

MOTIVATION: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights of a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied. RESULTS: To achieve this we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference, and by repeated bootstrap runs assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. An improved inference accuracy was observed in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences. AVAILABILITY AND IMPLEMENTATION: https://bitbucket.org/sonnhammergrni/genespider/src/NB/%2B Methods/NestBoot.m.
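
The two ingredients named in this abstract, bootstrap support for each link and a shuffled-data run to calibrate a false discovery rate, can be illustrated with a much-simplified sketch. It is not the NestBoot code: it uses a single bootstrap level rather than nested runs, and it swaps the inference methods listed above for a toy correlation cutoff. All function names, the cutoff and the synthetic data are assumptions.

```python
import numpy as np

def infer_links(X, cutoff=0.5):
    """Toy stand-in for a GRN method: link i-j if |Pearson correlation| > cutoff."""
    C = np.corrcoef(X, rowvar=False)
    A = np.abs(C) > cutoff
    np.fill_diagonal(A, False)
    return A

def bootstrap_support(X, n_boot, rng):
    """Fraction of bootstrap resamples (rows drawn with replacement) in which each link appears."""
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        counts += infer_links(X[idx])
    return counts / n_boot

def shuffle_columns(X, rng):
    """Permute each gene independently to destroy gene-gene dependence (null data)."""
    return np.column_stack([rng.permutation(col) for col in X.T])

# Hypothetical data: gene 1 depends on gene 0, genes 2 and 3 are independent noise.
rng = np.random.default_rng(1)
n = 200
g0 = rng.normal(size=n)
X = np.column_stack([g0, g0 + 0.3 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])

support = bootstrap_support(X, 100, rng)
null_support = bootstrap_support(shuffle_columns(X, rng), 100, rng)

# Compare link counts above a support threshold in real versus shuffled data.
threshold = 0.9
upper = np.triu(np.ones_like(support, dtype=bool), k=1)   # count each pair once
n_real = int(((support >= threshold) & upper).sum())
n_null = int(((null_support >= threshold) & upper).sum())
fdr_estimate = n_null / max(n_real, 1)
print(n_real, n_null, fdr_estimate)
```

The ratio of links passing the threshold in shuffled data to links passing it in the real data gives a rough false discovery estimate at that support level. Repeating the whole procedure to check how stable the support values are is the part this sketch leaves out.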


Subjects
Algorithms, Gene Regulatory Networks
13.
PLoS Comput Biol ; 13(6): e1005608, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28640810

ABSTRACT

Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. 
In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.


Subjects
Chromosome Mapping/methods, Genetic Models, Proteome/metabolism, Signal Transduction/physiology, Software, Th2 Cells/metabolism, Algorithms, Cell Differentiation/physiology, Cultured Cells, Computer Simulation, Developmental Gene Expression Regulation/physiology, Humans, Programming Languages
14.
Mol Biosyst ; 13(7): 1304-1312, 2017 Jun 27.
Article in English | MEDLINE | ID: mdl-28485748

ABSTRACT

A key question in network inference, that has not been properly answered, is what accuracy can be expected for a given biological dataset and inference method. We present GeneSPIDER - a Matlab package for tuning, running, and evaluating inference algorithms that allows independent control of network and data properties to enable data-driven benchmarking. GeneSPIDER is uniquely suited to address this question by first extracting salient properties from the experimental data and then generating simulated networks and data that closely match these properties. It enables data-driven algorithm selection, estimation of inference accuracy from biological data, and a more multifaceted benchmarking. Included are generic pipelines for the design of perturbation experiments, bootstrapping, analysis of linear dependence, sample selection, scaling of SNR, and performance evaluation. With GeneSPIDER we aim to move the goal of network inference benchmarks from simple performance measurement to a deeper understanding of how the accuracy of an algorithm is determined by different combinations of network and data properties.


Subjects
Algorithms, Gene Regulatory Networks, Animals, Benchmarking, Humans, Genetic Models
15.
Genome Med ; 8(1): 88, 2016 08 24.
Article in English | MEDLINE | ID: mdl-27553366

ABSTRACT

BACKGROUND: Cancer patients often show no or only modest benefit from a given therapy. This major problem in oncology is generally attributed to the lack of specific predictive biomarkers, yet a global measure of cancer cell activity may support a comprehensive mechanistic understanding of therapy efficacy. We reasoned that network analysis of omic data could help to achieve this goal. METHODS: A measure of "cancer network activity" (CNA) was implemented based on a previously defined network feature of communicability. The network nodes and edges corresponded to human proteins and experimentally identified interactions, respectively. The edges were weighted proportionally to the expression of the genes encoding for the corresponding proteins and relative to the number of direct interactors. The gene expression data corresponded to the basal conditions of 595 human cancer cell lines. Therapeutic responses corresponded to the impairment of cell viability measured by the half maximal inhibitory concentration (IC50) of 130 drugs approved or under clinical development. Gene ontology, signaling pathway, and transcription factor-binding annotations were taken from public repositories. Predicted synergies were assessed by determining the viability of four breast cancer cell lines and by applying two different analytical methods. RESULTS: The effects of drug classes were associated with CNAs formed by different cell lines. CNAs also differentiate target families and effector pathways. Proteins that occupy a central position in the network largely contribute to CNA. Known key cancer-associated biological processes, signaling pathways, and master regulators also contribute to CNA. Moreover, the major cancer drivers frequently mediate CNA and therapeutic differences. Cell-based assays centered on these differences and using uncorrelated drug effects reveals novel synergistic combinations for the treatment of breast cancer dependent on PI3K-mTOR signaling. 
CONCLUSIONS: Cancer therapeutic responses can be predicted on the basis of a systems-level analysis of molecular interactions and gene expression. Fundamental cancer processes, pathways, and drivers contribute to this feature, which can also be exploited to predict precise synergistic drug combinations.
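
The network feature of communicability on which the CNA measure is based is commonly defined through the matrix exponential of the adjacency matrix, so that entry (p, q) sums over all walks between nodes p and q with longer walks weighted less. The sketch below shows only that basic quantity for a small symmetric network. How the abstract's expression-based edge weights are combined into CNA is not specified there and is not reproduced here; the function name and example graph are assumptions.

```python
import numpy as np

def communicability(A):
    """Matrix exponential exp(A) of a symmetric adjacency matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)           # A = V diag(w) V^T for symmetric A
    return (V * np.exp(w)) @ V.T       # exp(A) = V diag(exp(w)) V^T

# Hypothetical 3-node path network: 0 - 1 - 2
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
G = communicability(A)
print(np.round(G, 3))
```

In this example nodes 0 and 2 are not directly connected, yet their communicability is positive because walks through node 1 contribute. It is smaller than the communicability between the directly linked nodes 0 and 1.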


Subjects
Antineoplastic Agents/pharmacology, Investigational Drugs/pharmacology, Neoplastic Gene Expression Regulation/drug effects, Gene Regulatory Networks/drug effects, Neoplasm Proteins/genetics, Prescription Drugs/pharmacology, Breast Neoplasms/drug therapy, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Tumor Cell Line, Cell Survival/drug effects, Drug Synergism, Female, Gene Expression Profiling, Gene Ontology, Humans, Molecular Sequence Annotation, Mutation, Neoplasm Proteins/metabolism, Phosphatidylinositol 3-Kinases/genetics, Phosphatidylinositol 3-Kinases/metabolism, Signal Transduction, TOR Serine-Threonine Kinases/genetics, TOR Serine-Threonine Kinases/metabolism
16.
Mol Biosyst ; 11(1): 287-96, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25377664

ABSTRACT

Statistical regularisation methods such as LASSO and related L1 regularised regression methods are commonly used to construct models of gene regulatory networks. Although they can theoretically infer the correct network structure, they have been shown in practice to make errors, i.e. leave out existing links and include non-existing links. We show that L1 regularisation methods typically produce a poor network model when the analysed data are ill-conditioned, i.e. the gene expression data matrix has a high condition number, even if it contains enough information for correct network inference. However, when these methods fail, the correct structure of network models can still be obtained for informative data (data with a signal-to-noise ratio high enough that existing links can be proven to exist) by using least-squares regression and setting small parameters to zero, or by using robust network inference, a recent method that takes the intersection of all non-rejectable models. Since available experimental data sets are generally ill-conditioned, we recommend checking the condition number of the data matrix to avoid this pitfall of L1 regularised inference, and also considering alternative methods.
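
The recommended check and the least-squares alternative can be sketched in a few lines. This is a generic illustration and not the authors' code: the condition number is taken as the ratio of the largest to the smallest singular value, and the regression is an ordinary linear model with a hard threshold. The function names, the tolerance and the example matrices are assumptions.

```python
import numpy as np

def condition_number(Y):
    """Ratio of the largest to the smallest singular value of the data matrix."""
    s = np.linalg.svd(Y, compute_uv=False)   # singular values, sorted descending
    return s[0] / s[-1]

def thresholded_least_squares(X, y, tol):
    """Ordinary least squares, then set coefficients with |b| < tol to zero."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    b[np.abs(b) < tol] = 0.0
    return b

# A well-conditioned matrix versus one with nearly collinear columns.
well = np.eye(3)
ill = np.array([[1.0, 1.0],
                [1.0, 1.0001]])
print(condition_number(well), condition_number(ill))

# Hypothetical informative data: two true links, one absent link, low noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(0.0, 0.01, size=50)
b = thresholded_least_squares(X, y, tol=0.1)
print(b)
```

What counts as a "high" condition number depends on the noise level of the data, so the sketch reports the value and leaves the cutoff to the reader.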


Subjects
Gene Regulatory Networks, Biological Models, Statistical Models, Algorithms, Datasets as Topic, Reproducibility of Results
17.
Bioinformatics ; 30(12): i130-8, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931976

ABSTRACT

MOTIVATION: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. RESULTS: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data.


Subjects
Gene Expression Profiling, Gene Regulatory Networks, Saccharomyces cerevisiae/genetics
18.
J Comput Biol ; 20(5): 398-408, 2013 May.
Article in English | MEDLINE | ID: mdl-23641867

ABSTRACT

Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call ζ (zeta), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular, for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of ζ. In order to avoid such poor choices, we propose a method for optimization of ζ, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the ζ value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference as well as our method for optimization of ζ. We demonstrate that our ζ optimization method for two widely used inference algorithms--Glmnet and NIR--gives accurate and informative estimates of the network structure, given that the data is informative enough.
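
The leave-one-out selection of ζ described in this abstract can be sketched for any method whose sparsity depends on a single coefficient. The example below is an assumption-laden illustration rather than the published procedure. It stands in a hard-thresholded least-squares fit for Glmnet or NIR, and the function names, candidate values and synthetic data are hypothetical.

```python
import numpy as np

def fit_sparse(X, y, zeta):
    """Stand-in sparse method: least squares, then zero coefficients with |b| < zeta."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    b[np.abs(b) < zeta] = 0.0
    return b

def loo_error(X, y, zeta):
    """Mean squared prediction error when each sample is left out in turn."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                  # train on all samples except i
        b = fit_sparse(X[keep], y[keep], zeta)
        total += float((y[i] - X[i] @ b) ** 2)    # predict the held-out sample
    return total / n

def select_zeta(X, y, candidates):
    """Return the zeta with the lowest leave-one-out error, plus all errors."""
    errors = {z: loo_error(X, y, z) for z in candidates}
    return min(errors, key=errors.get), errors

# Hypothetical data: 30 samples, 6 candidate regulators, only 2 of them active.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 6))
b_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ b_true + rng.normal(0.0, 0.5, size=30)

best_zeta, errors = select_zeta(X, y, candidates=[0.0, 0.5, 5.0])
print(best_zeta, errors)
```

Here ζ = 5.0 removes every link and yields an empty model with a large prediction error, so the procedure rejects it. Whether ζ = 0.0 or ζ = 0.5 wins depends on the particular noise draw.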


Subjects
Algorithms, Gene Expression Regulation, Biological Models
19.
Nucleic Acids Res ; 40(Database issue): D821-8, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22110034

ABSTRACT

FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.


Subjects
Genetic Databases, Gene Regulatory Networks, Protein Interaction Mapping, Animals, Chickens/genetics, Chickens/metabolism, Dogs, Humans, Protein Interaction Maps, Zebrafish/genetics, Zebrafish/metabolism