1.
BMC Bioinformatics ; 22(1): 234, 2021 May 07.
Article in English | MEDLINE | ID: mdl-33962556

ABSTRACT

BACKGROUND: Cis-regulatory elements (CREs) are DNA sequence segments that regulate gene expression. Among CREs are promoters, enhancers, Boundary Elements (BEs) and Polycomb Response Elements (PREs), all of which are enriched in specific sequence motifs that form particular occurrence landscapes. We have recently introduced a hierarchical machine learning approach (SVM-MOCCA) in which Support Vector Machines (SVMs) are applied at the level of individual motif occurrences, modelling local sequence composition, and then combined for the prediction of whole regulatory elements. We used SVM-MOCCA to predict PREs in Drosophila and found that it was superior to other methods. However, we did not publish a polished implementation of SVM-MOCCA, which can be useful for other researchers, and we only tested SVM-MOCCA with IUPAC motifs and PREs.

RESULTS: We here present an expanded suite for modelling CRE sequences in terms of motif occurrence combinatorics: Motif Occurrence Combinatorics Classification Algorithms (MOCCA). MOCCA contains efficient implementations of several modelling methods, including SVM-MOCCA, and a new method, RF-MOCCA, a Random Forest derivative of SVM-MOCCA. We used SVM-MOCCA and RF-MOCCA to model Drosophila PREs and BEs in cross-validation experiments, making this the first study to model PREs with Random Forests and the first study that applies the hierarchical MOCCA approach to the prediction of BEs. Both models significantly improve generalization to PREs and boundary elements beyond that of previous methods (including 4-spectrum and motif occurrence frequency Support Vector Machines and Random Forests), with RF-MOCCA yielding the best results.

CONCLUSION: MOCCA is a flexible and powerful suite of tools for the motif-based modelling of CRE sequences in terms of motif composition. MOCCA can be applied to any new CRE modelling problems where motifs have been identified. MOCCA supports IUPAC and Position Weight Matrix (PWM) motifs. For ease of use, MOCCA implements generation of negative training data, and additionally a mode that requires only that the user specifies positives, motifs and a genome. MOCCA is licensed under the MIT license and is available on GitHub at https://github.com/bjornbredesen/MOCCA.
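As a rough illustration of two feature types the abstract mentions (IUPAC motif occurrences and the 4-spectrum), the following minimal Python sketch shows how such features could be computed from a DNA sequence. It is not taken from MOCCA: the function names `iupac_to_regex`, `motif_positions` and `kmer_spectrum` are hypothetical, only the forward strand is scanned, and no classifier is trained.

```python
import re
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif (hypothetical helper, not MOCCA's API)."""
    body = "".join(IUPAC[c] for c in motif.upper())
    # A lookahead makes overlapping occurrences visible to finditer.
    return re.compile("(?=(" + body + "))")


def motif_positions(motif, seq):
    """Return 0-based start positions of the motif on the forward strand."""
    return [m.start() for m in iupac_to_regex(motif).finditer(seq.upper())]


def kmer_spectrum(seq, k=4):
    """Return k-mer counts in lexicographic order over A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]
```

A motif occurrence frequency feature could then be derived as `len(motif_positions(motif, seq))` per motif, while a hierarchical approach of the kind described would instead use each position as the centre of a local window to be modelled separately.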


Subjects
Algorithms , Support Vector Machine , Base Sequence , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices
2.
Nucleic Acids Res ; 47(15): 7781-7797, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31340029

ABSTRACT

Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications have led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. Of these, 2412 are also enriched in H3K4me1, a mark of Trithorax-activated chromatin, suggesting that PREs/TREs have a common sequence code.


Subjects
Algorithms , DNA/genetics , Drosophila melanogaster/genetics , Insect Genome , Polycomb-Group Proteins/genetics , Response Elements , Animals , Binding Sites , Chromatin/chemistry , Chromatin/metabolism , Non-Histone Chromosomal Proteins/genetics , Non-Histone Chromosomal Proteins/metabolism , DNA/chemistry , DNA/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Nonmammalian Embryo , Gene Ontology , Histones/genetics , Histones/metabolism , Larva/genetics , Larva/metabolism , Molecular Sequence Annotation , Nucleotide Motifs , Polycomb-Group Proteins/metabolism , Protein Binding , Software , Transcription Factors/genetics , Transcription Factors/metabolism
...