1.
BMC Bioinformatics ; 19(1): 86, 2018 03 06.
Article in English | MEDLINE | ID: mdl-29510689

ABSTRACT

BACKGROUND: Transcription factor (TF) binding site specificity is commonly represented by some form of matrix model in which the positions in the binding site are assumed to contribute independently to the site's activity. The independence assumption is known to be an approximation, often a good one but sometimes poor. Alternative approaches have been developed that use k-mers (DNA "words" of length k) to account for the non-independence, and more recently DNA structural parameters have been incorporated into the models. ChIP-seq data are often used to assess the discriminatory power of motifs and to compare different models. However, to measure the improvement due to using more complex models, one must compare to optimized matrix models. RESULTS: We describe a program "Discriminative Additive Model Optimization" (DAMO) that uses positive and negative examples, as in ChIP-seq data, and finds the additive position weight matrix (PWM) that maximizes the Area Under the Receiver Operating Characteristic Curve (AUROC). We compare to a recent study where structural parameters, serving as features in a gradient boosting classifier algorithm, are shown to improve the AUROC over JASPAR position frequency matrices (PFMs). In agreement with the previous results, we find that adding structural parameters gives the largest improvement, but most of the gain can be obtained by an optimized PWM and nearly all of the gain can be obtained with a di-nucleotide extension to the PWM. CONCLUSION: To appropriately compare different models for TF binding sites, optimized models must be used. PWMs and their extensions are good representations of binding specificity for most TFs, and more complex models, including the incorporation of DNA shape features and gradient boosting classifiers, provide only moderate improvements for a few TFs.
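
As a rough illustration of the two ingredients named in this abstract, the sketch below scores sequences with an additive PWM and computes the AUROC from positive and negative examples. It is not the DAMO program and does no optimization; the matrix `toy_pwm`, the example sequences, and the function names are hypothetical, and scanning a single strand for the best window is a simplifying assumption.

```python
BASES = "ACGT"

def score_site(pwm, site):
    """Additive score: sum of the per-position weights for the observed bases."""
    return sum(pwm[i][base] for i, base in enumerate(site))

def best_window_score(pwm, seq):
    """Score a longer sequence by its best-scoring window (one strand only, an assumption)."""
    width = len(pwm)
    return max(score_site(pwm, seq[i:i + width]) for i in range(len(seq) - width + 1))

def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical 3-position weight matrix that favours the site "ACG".
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
positives = ["TTACGTT", "ACGAAAA"]  # made-up "bound" sequences
negatives = ["TTTTTTT", "GGGACTT"]  # made-up "unbound" sequences

pos_scores = [best_window_score(toy_pwm, s) for s in positives]
neg_scores = [best_window_score(toy_pwm, s) for s in negatives]
toy_auroc = auroc(pos_scores, neg_scores)
```

Because the AUROC depends only on how positives rank relative to negatives, an optimizer in this setting would adjust the matrix weights to improve that ranking rather than to fit base frequencies.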


Subjects
Algorithms , DNA/chemistry , Models, Molecular , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices , Area Under Curve , Binding Sites , Databases, Nucleic Acid , Humans , Protein Binding
2.
PLoS Comput Biol ; 13(7): e1005638, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28686588

ABSTRACT

The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.
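
The non-linear relationship mentioned in this abstract can be made concrete with the standard two-state occupancy expression. This is a sketch only: the symbols $E_i$ (binding energy of site $S_i$), $\mu$ (chemical potential, which rises with the log of free protein concentration), and $RT$ are notation assumed here, not taken from the abstract.

```latex
\[
P(\text{bound} \mid S_i) \;=\; \frac{1}{1 + e^{(E_i - \mu)/RT}}
\]
% Low-concentration limit (\mu \ll E_i): the probability is proportional to a Boltzmann weight,
\[
P(\text{bound} \mid S_i) \;\approx\; e^{-(E_i - \mu)/RT},
\]
% which factorizes over positions when E_i is a sum of per-position terms.
% High-concentration limit (\mu \gg E_i): P(\text{bound} \mid S_i) \to 1, so the best sites saturate.
```

Under these assumptions, a product-of-positions probability model matches the occupancy only in the low-concentration regime. Once the strongest sites approach saturation, their binding probabilities stop tracking differences in affinity, even when the energies themselves are perfectly additive.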


Subjects
DNA/chemistry , DNA/metabolism , Models, Statistical , Transcription Factors/chemistry , Transcription Factors/metabolism , Computational Biology , Computer Simulation , Software
3.
Bioinformatics ; 33(15): 2288-2295, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28379348

ABSTRACT

MOTIVATION: Characterizing the binding specificities of transcription factors (TFs) is crucial to the study of gene expression regulation. Recently developed high-throughput experimental methods, including protein binding microarrays (PBM) and high-throughput SELEX (HT-SELEX), have enabled rapid measurements of the specificities for hundreds of TFs. However, few studies have developed efficient algorithms for estimating binding motifs based on HT-SELEX data. Also, the simple method of constructing a position weight matrix (PWM) by comparing the frequency of the preferred sequence with single-nucleotide variants has the risk of generating motifs with higher information content than the true binding specificity. RESULTS: We developed an algorithm called BEESEM that builds on a comprehensive biophysical model of protein-DNA interactions, which is trained using the expectation maximization method. BEESEM is capable of selecting the optimal motif length and calculating the confidence intervals of estimated parameters. By comparing BEESEM with the published motifs estimated using the same HT-SELEX data, we demonstrate that BEESEM provides significant improvements. We also evaluate several motif discovery algorithms on independent PBM and ChIP-seq data. BEESEM provides significantly better fits to in vitro data, but its performance is similar to some other methods on in vivo data under the criterion of the area under the receiver operating characteristic curve (AUROC). This highlights the limitations of the purely rank-based AUROC criterion. Using quantitative binding data to assess models, however, demonstrates that BEESEM improves on prior models. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://stormo.wustl.edu/resources.html. CONTACT: stormo@wustl.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin Immunoprecipitation/methods , DNA/metabolism , Protein Array Analysis/methods , Software , Thermodynamics , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , DNA/chemistry , Humans , Mice , Position-Specific Scoring Matrices , Protein Binding , Sequence Analysis, DNA/methods , Transcription Factors/chemistry
4.
PLoS Genet ; 10(7): e1004501, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25058586

ABSTRACT

All forms of life are confronted with environmental and genetic perturbations, making phenotypic robustness an important characteristic of life. Although development has long been viewed as a key component of phenotypic robustness, the underlying mechanism is unclear. Here we report that the determinative developmental cell lineages of two protostomes and one deuterostome are structured such that the resulting cellular compositions of the organisms are only modestly affected by cell deaths. Several features of the cell lineages, including their shallowness, topology, early ontogenic appearances of rare cells, and non-clonality of most cell types, underlie the robustness. Simple simulations of cell lineage evolution demonstrate the possibility that the observed robustness arose as an adaptation in the face of random cell deaths in development. These results reveal general organizing principles of determinative developmental cell lineages and a conceptually new mechanism of phenotypic robustness, both of which have important implications for development and evolution.


Subjects
Adaptation, Physiological/genetics , Biological Evolution , Cell Lineage/genetics , Gene-Environment Interaction , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans/growth & development , Cell Death/genetics , Genotype , Mutation , Phenotype
5.
Genetics ; 191(3): 781-90, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22505627

ABSTRACT

Identifying transcription factor (TF) binding sites is essential for understanding regulatory networks. The specificity of most TFs is currently modeled using position weight matrices (PWMs) that assume the positions within a binding site contribute independently to binding affinity for any site. Extensive, high-throughput quantitative binding assays let us examine, for the first time, the independence assumption for many TFs. We find that the specificity of most TFs is well fit with the simple PWM model, but in some cases more complex models are required. We introduce a binding energy model (BEM) that can include energy parameters for nonindependent contributions to binding affinity. We show that in most cases where a PWM is not sufficient, a BEM that includes energy parameters for adjacent dinucleotide contributions models the specificity very well. Having more accurate models of specificity greatly improves the interpretation of in vivo TF localization data, such as from chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments.
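
To show what adding adjacent dinucleotide terms to an additive model can look like, here is a minimal sketch. It is not the authors' BEM implementation; the parameter tables `mono`, `no_pairs`, and `with_pairs`, the function name `site_energy`, and the convention that lower energy means stronger binding are all assumptions made for illustration.

```python
def site_energy(mono, di, site):
    """Energy of a site: independent per-position terms plus adjacent-pair corrections.

    mono[i][base] is the single-base term at position i.
    di[i] maps the two-base string at positions (i, i+1) to a correction;
    pairs that are absent contribute 0, which recovers the plain additive model.
    """
    energy = sum(mono[i][base] for i, base in enumerate(site))
    for i in range(len(site) - 1):
        energy += di[i].get(site[i:i + 2], 0.0)
    return energy

# Hypothetical parameters for a 3-bp site (lower energy = stronger binding, by assumption).
mono = [
    {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 1.0, "G": 0.0, "T": 1.0},
]
no_pairs = [{}, {}]                # no corrections: a purely additive model
with_pairs = [{"AC": -0.5}, {}]    # one made-up correction for "AC" at positions 0-1

additive = site_energy(mono, no_pairs, "ACG")
paired = site_energy(mono, with_pairs, "ACG")
mismatch = site_energy(mono, with_pairs, "TCG")
```

Storing only the non-zero pair corrections keeps the additive model as the default case, so the extension changes a site's energy only where the independence assumption is judged to fail.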


Subjects
Computational Biology/methods , Models, Statistical , Transcription Factors/metabolism , Animals , Cell Line , Humans , Likelihood Functions , Mice , Models, Biological , Protein Array Analysis , Protein Binding , Substrate Specificity , Thermodynamics