Results 1 - 6 of 6
1.
Nat Commun ; 9(1): 1681, 2018 Apr 27.
Article in English | MEDLINE | ID: mdl-29703885

ABSTRACT

Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns promote molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur at different frequencies across tissues and samples. Here, we develop BIISQ, a Bayesian nonparametric model for isoform discovery and individual-specific quantification from short-read RNA-seq data. BIISQ does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimation and demonstrate superior precision and recall on simulated data compared with state-of-the-art isoform reconstruction methods. BIISQ shows the largest gains for low-abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.


Subjects
Alternative Splicing/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Bayes Theorem , Computer Simulation , Datasets as Topic , Gene Expression Profiling , Humans , Protein Isoforms/genetics , Software , Statistics, Nonparametric
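The BIISQ abstract above relies on stochastic variational inference (SVI) for posterior estimation. BIISQ's own model is far richer than anything shown here, so the following is only a generic illustration of the SVI update, not BIISQ's code: at each step a random minibatch is drawn, the parameters that would be optimal if the whole data set looked like that minibatch are computed, and the global variational parameters move a decreasing step towards them. The toy Beta-Bernoulli model, the function name and the step-size parameters `tau` and `kappa` are assumptions chosen for illustration.

```python
import random


def svi_beta_bernoulli(data, a0=1.0, b0=1.0, batch_size=50,
                       n_iter=2000, tau=1.0, kappa=0.7, seed=0):
    """Toy SVI (illustrative assumption, not BIISQ): x_i ~ Bernoulli(theta),
    theta ~ Beta(a0, b0), variational posterior q(theta) = Beta(a, b)."""
    rng = random.Random(seed)
    n = len(data)
    a, b = a0, b0  # global variational parameters
    for t in range(n_iter):
        # Robbins-Monro step size; kappa in (0.5, 1] ensures convergence.
        rho = (t + tau) ** (-kappa)
        batch = [data[rng.randrange(n)] for _ in range(batch_size)]
        successes = sum(batch)
        scale = n / batch_size  # rescale minibatch statistics to the full data set
        # Intermediate parameters: the exact update if the whole data set
        # looked like this minibatch.
        a_hat = a0 + scale * successes
        b_hat = b0 + scale * (batch_size - successes)
        # Noisy natural-gradient step: blend old and intermediate parameters.
        a = (1 - rho) * a + rho * a_hat
        b = (1 - rho) * b + rho * b_hat
    return a, b


if __name__ == "__main__":
    data = [1] * 300 + [0] * 700
    a, b = svi_beta_bernoulli(data)
    # The exact posterior is Beta(301, 701); its mean is 301 / 1002.
    print(a, b, a / (a + b))
```

Because the toy model is conjugate, the SVI estimate can be checked against the exact posterior; in a larger model the same kind of update would be applied to whatever global parameters it shares across data points.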
2.
Bioinformatics ; 29(13): i117-25, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812975

ABSTRACT

MOTIVATION: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. RESULTS: We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity. AVAILABILITY: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.


Subjects
DNA/metabolism , Transcription Factors/metabolism , Algorithms , Binding Sites , DNA/chemistry , Genome , Humans , Linear Models , Protein Array Analysis , Protein Binding , Saccharomyces cerevisiae Proteins/metabolism , Support Vector Machine
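The regression models in the abstract above use, as features, the identity of mono-, di- and trinucleotides at specific positions within or near the binding site, and feature selection keeps only a few of them. A minimal sketch of such an encoding and of the sparse linear predictor left after feature selection could look like the following; the function names, the dense indicator layout and the example weights are hypothetical and are not taken from the authors' code.

```python
from itertools import product


def kmer_position_features(seq, max_k=3):
    """Position-specific indicator features: for every k in 1..max_k and every
    start position, one 0/1 feature per possible k-mer over ACGT."""
    seq = seq.upper()
    features = {}
    for k in range(1, max_k + 1):
        kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
        for pos in range(len(seq) - k + 1):
            window = seq[pos:pos + k]
            for kmer in kmers:
                # Windows containing non-ACGT letters (e.g. N) match no k-mer.
                features[(pos, kmer)] = 1 if window == kmer else 0
    return features


def predict_binding(seq, weights, intercept=0.0):
    """Linear model over the selected (position, k-mer) features; features
    absent from the sequence, or beyond its length, contribute nothing."""
    x = kmer_position_features(seq)
    return intercept + sum(w * x.get(f, 0) for f, w in weights.items())


if __name__ == "__main__":
    # Hypothetical weights, as if feature selection had kept two trinucleotides.
    selected = {(0, "CAC"): 1.0, (3, "GTG"): 0.5}
    print(predict_binding("CACGTG", selected, intercept=0.1))
```

Storing every indicator makes the feature count explicit (108 features for a 3-bp sequence); in practice only the handful of selected features needs to be evaluated.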
3.
Methods Mol Biol ; 939: 47-58, 2013.
Article in English | MEDLINE | ID: mdl-23192540

ABSTRACT

Elucidating the structure of gene regulatory networks (GRN), i.e., identifying which genes are under the control of which transcription factors, is an important challenge for gaining insight into a cell's working mechanisms. We present SIRENE, a method to estimate a GRN from a collection of expression data. Unlike most existing methods for GRN inference, SIRENE requires as input a list of known regulations in addition to expression data, and implements a supervised machine-learning approach based on learning from positive and unlabeled examples to account for the lack of negative examples.


Subjects
Artificial Intelligence , Computational Biology/methods , Gene Regulatory Networks , Algorithms , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Profiling/methods , Models, Biological , Oligonucleotide Array Sequence Analysis , Transcription Factors
4.
BMC Syst Biol ; 6: 145, 2012 Nov 22.
Article in English | MEDLINE | ID: mdl-23173819

ABSTRACT

BACKGROUND: Inferring the structure of gene regulatory networks (GRN) from a collection of gene expression data has many potential applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is, however, a notoriously difficult problem, on which the many existing methods achieve only limited accuracy. RESULTS: In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection, for that purpose. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (for Trustful Inference of Gene REgulation with Stability Selection), was ranked among the top GRN inference methods in the DREAM5 gene network inference challenge. In particular, TIGRESS was evaluated to be the best linear regression-based method in the challenge. We investigate in depth the influence of the various parameters of the method, and show that careful parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference, in both directed and undirected settings. CONCLUSIONS: TIGRESS reaches state-of-the-art performance on benchmark data, including both in silico and in vivo (E. coli and S. cerevisiae) networks. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available at http://cbio.ensmp.fr/tigress. Moreover, TIGRESS can be run online through the GenePattern platform (GP-DREAM, http://dream.broadinstitute.org).


Subjects
Gene Regulatory Networks , Systems Biology/methods , Transcriptome , Escherichia coli/genetics , Escherichia coli/metabolism , Regression Analysis , Transcription Factors/metabolism
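The TIGRESS abstract above combines least angle regression (LARS) with stability selection. The sketch below illustrates classical stability selection only, under stated assumptions: it swaps LARS for a simpler greedy selector (which picks the same first feature as LARS on standardized, unweighted data, but not necessarily the later ones), it uses the random feature reweighting of the randomized-lasso variant, and it reports the plain selection frequency rather than the new scoring technique the abstract introduces. Function names and defaults (`n_steps`, `n_runs`, `alpha`) are hypothetical. In the GRN setting, `y` would be the expression of one target gene across experiments and the columns of `X` the expression of candidate transcription factors.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def greedy_path(X, y, n_steps):
    """Greedy stand-in for the first n_steps of a LARS path: repeatedly pick
    the unused feature with the largest absolute covariance with the current
    residual, then subtract that feature's least-squares fit from the residual."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        cols.append([v - m for v in col])
    chosen = []
    for _ in range(min(n_steps, p)):
        cand = {j: abs(dot(cols[j], resid)) for j in range(p)
                if j not in chosen and dot(cols[j], cols[j]) > 0}
        if not cand:
            break
        best = max(cand, key=cand.get)
        col = cols[best]
        coef = dot(col, resid) / dot(col, col)
        resid = [r - coef * c for r, c in zip(resid, col)]
        chosen.append(best)
    return chosen


def stability_scores(X, y, n_steps=2, n_runs=200, alpha=0.5, seed=0):
    """Fraction of runs in which each feature enters the greedy path within
    n_steps. Each run uses a random half of the samples and random feature
    weights drawn from [alpha, 1]. Assumes X's columns are standardized."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_runs):
        rows = rng.sample(range(n), n // 2)
        weights = [rng.uniform(alpha, 1.0) for _ in range(p)]
        Xs = [[X[i][j] * weights[j] for j in range(p)] for i in rows]
        ys = [y[i] for i in rows]
        for j in greedy_path(Xs, ys, n_steps):
            counts[j] += 1
    return [c / n_runs for c in counts]
```

Ranking the candidate regulators of each target gene by these frequencies, and pooling the rankings over all targets, would give a list of candidate edges.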
5.
BMC Bioinformatics ; 12: 389, 2011 Oct 06.
Article in English | MEDLINE | ID: mdl-21977986

ABSTRACT

BACKGROUND: Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. RESULTS: We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which makes it possible to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases. CONCLUSIONS: ProDiGe implements a new machine learning paradigm for gene prioritization, which could help identify new disease genes. It is freely available at http://cbio.ensmp.fr/prodige.


Subjects
Algorithms , Disease/genetics , Genetic Predisposition to Disease , Artificial Intelligence , Computational Biology/methods , Databases, Genetic , Genome-Wide Association Study , Humans
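ProDiGe, as described above, learns from positive and unlabeled examples: known disease genes are positives and candidate genes are unlabeled. A common way to do this is bagging: treat a random subsample of the unlabeled set as negatives, fit a classifier, score the unlabeled examples left out of that subsample, and average over many rounds. The sketch below shows that generic scheme only; it is not ProDiGe's implementation, a simple nearest-centroid linear scorer stands in for the kernel classifiers and data integration a real system would use, and the function names, defaults and 2-D feature vectors are hypothetical.

```python
import random


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    d = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]


def bagging_pu_scores(positives, unlabeled, n_bags=50, bag_size=None, seed=0):
    """Score each unlabeled example by bagging: every round treats a random
    subsample of the unlabeled set as negatives, fits a nearest-centroid
    linear scorer against the positives, and scores only the unlabeled
    examples left out of that subsample (out-of-bag). Returns the mean
    out-of-bag score per example (0.0 if never out-of-bag).
    Requires at least two unlabeled examples."""
    rng = random.Random(seed)
    n_u = len(unlabeled)
    k = bag_size if bag_size is not None else min(len(positives), n_u - 1)
    pos_c = centroid(positives)
    totals, counts = [0.0] * n_u, [0] * n_u
    for _ in range(n_bags):
        bag = set(rng.sample(range(n_u), k))
        neg_c = centroid([unlabeled[i] for i in bag])
        # Direction from the negative to the positive centroid; the offset
        # places the decision boundary halfway between the two centroids.
        w = [p - q for p, q in zip(pos_c, neg_c)]
        offset = -sum(wi * (p + q) / 2 for wi, p, q in zip(w, pos_c, neg_c))
        for i in range(n_u):
            if i not in bag:
                totals[i] += sum(wi * xi for wi, xi in zip(w, unlabeled[i])) + offset
                counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    known = [[2.0, 2.1], [1.9, 2.0], [2.2, 1.8], [2.0, 2.0]]
    # Two hidden positives followed by ten genuine negatives (toy data).
    candidates = [[2.1, 1.9], [1.8, 2.2]] + [[-2.0 + 0.1 * i, -2.0 - 0.05 * i] for i in range(10)]
    scores = bagging_pu_scores(known, candidates)
    print(sorted(range(len(candidates)), key=lambda i: -scores[i]))
```

Scoring only out-of-bag examples matters here: an unlabeled example that was used as a negative in a round would otherwise be pushed towards a low score by its own label.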
6.
Bioinformatics ; 24(16): i76-82, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689844

ABSTRACT

MOTIVATION: Living cells are the product of gene expression programs that involve the regulated transcription of thousands of genes. The elucidation of transcriptional regulatory networks is thus needed to understand the cell's working mechanism and can, for example, be useful for the discovery of novel therapeutic targets. Although several methods have been proposed to infer gene regulatory networks from gene expression data, a recent comparison on a large-scale benchmark experiment revealed that most current methods only predict a limited number of known regulations at a reasonable precision level. RESULTS: We propose SIRENE (Supervised Inference of Regulatory Networks), a new method for the inference of gene regulatory networks from a compendium of expression data. The method decomposes the problem of gene regulatory network inference into a large number of local binary classification problems, which focus on separating target genes from non-targets for each transcription factor. SIRENE is thus conceptually simple and computationally efficient. We test it on a benchmark experiment aimed at predicting regulations in Escherichia coli, and show that it retrieves on the order of six times more known regulations than other state-of-the-art inference methods. AVAILABILITY: All data and programs are freely available at http://cbio.ensmp.fr/sirene.


Subjects
Artificial Intelligence , Escherichia coli Proteins/metabolism , Escherichia coli/physiology , Gene Expression Profiling/methods , Gene Expression Regulation, Bacterial/physiology , Models, Biological , Signal Transduction/physiology , Computer Simulation , Pattern Recognition, Automated/methods
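SIRENE, as described above, splits network inference into one local binary classification problem per transcription factor: its known targets versus the other genes. The sketch below mirrors only that decomposition, under stated assumptions: expression profiles are plain lists, non-targets are scored with a k-fold split so that each gene is scored by a model that did not use it as a negative (`n_folds=3` is an illustrative choice), and a correlation-based nearest-mean scorer stands in for the classifier used in the paper. All function names and the toy profiles are hypothetical, and every known target is assumed to have an expression profile.

```python
import random


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cu = [a - mu for a in u]
    cv = [b - mv for b in v]
    den = (sum(a * a for a in cu) * sum(b * b for b in cv)) ** 0.5
    return sum(a * b for a, b in zip(cu, cv)) / den if den else 0.0


def local_scores(expr, targets, n_folds=3, seed=0):
    """One local problem for a single TF: its known targets are positives and
    all other genes are provisional negatives. Each non-target is scored by a
    model built without it: mean correlation with the positives minus mean
    correlation with the negatives in the other folds."""
    rng = random.Random(seed)
    others = [g for g in expr if g not in targets]
    rng.shuffle(others)
    folds = [others[i::n_folds] for i in range(n_folds)]
    scores = {}
    for held_out in folds:
        negatives = [g for g in others if g not in held_out]
        for g in held_out:
            pos = sum(pearson(expr[g], expr[t]) for t in targets) / len(targets)
            neg = (sum(pearson(expr[g], expr[h]) for h in negatives) / len(negatives)
                   if negatives else 0.0)
            scores[g] = pos - neg
    return scores


def rank_regulations(expr, known):
    """Solve one local problem per TF and pool all (TF, gene) scores into a
    single list sorted from most to least confident."""
    ranked = []
    for tf, targets in known.items():
        if not targets:
            continue  # no positives, so no local model can be built
        for gene, score in local_scores(expr, targets).items():
            ranked.append((tf, gene, score))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked


if __name__ == "__main__":
    profiles = {
        "g1": [5, 1, 5, 1, 5, 1], "g2": [6, 1, 5, 2, 6, 1],
        "g3": [5, 2, 6, 1, 5, 1], "g4": [1, 1, 5, 5, 1, 1],
        "g5": [5, 5, 1, 1, 5, 5],
    }
    for tf, gene, score in rank_regulations(profiles, {"tfA": ["g1", "g2"]}):
        print(tf, gene, round(score, 3))
```

Because each local problem only involves one TF, the problems are independent and could be solved in parallel; pooling their scores into one list is what turns them back into a single network prediction.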