Results 1 - 5 of 5
1.
Sci Rep ; 9(1): 7703, 2019 May 22.
Article in English | MEDLINE | ID: mdl-31118426

ABSTRACT

Identifying potential protein-ligand interactions is central to drug discovery: it facilitates the identification of novel drug leads, contributes to advancement from hits to leads, predicts potential off-target explanations for side effects of approved drugs or candidates, and de-orphans phenotypic hits. For the rapid identification of protein-ligand interactions, we present a novel chemogenomics algorithm that predicts protein-ligand interactions using a new machine learning approach and a novel class of descriptors. The algorithm applies Bayesian Additive Regression Trees (BART) to a newly proposed proteochemical space, termed the bow-pharmacological space. The space spans three distinct sub-spaces that cover the protein space, the ligand space, and the interaction space. The model thereby extends the scope of classical target prediction or chemogenomic modelling, which relies on only one or two of these sub-spaces. Our model demonstrated excellent predictive power, reaching peak accuracies of 94.5-98.4% when evaluated on four human target datasets comprising enzymes, nuclear receptors, ion channels, and G-protein-coupled receptors. BART provided a reliable probabilistic description of the likelihood of interaction between proteins and ligands, which can be used to prioritize the assays to be performed in both the discovery and vigilance phases of small molecule development.
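
The two ideas in this abstract that lend themselves to a small illustration are the three-part descriptor space and the use of predicted interaction probabilities to prioritize assays. The sketch below is a minimal illustration only, not the authors' implementation: the function names, the plain concatenation of the three descriptor blocks, and the example probabilities are all assumptions, and the model that would produce the probabilities (BART in the paper) is not shown.

```python
def build_feature_vector(protein_desc, ligand_desc, interaction_desc):
    """Concatenate the three descriptor blocks into one feature vector.

    Assumption: each block is already a list of numbers. How the paper
    actually computes each block is not stated in the abstract.
    """
    return [*protein_desc, *ligand_desc, *interaction_desc]


def prioritize_assays(pair_probabilities, top_k=None):
    """Rank (protein, ligand) pairs by predicted interaction probability.

    pair_probabilities maps a (protein, ligand) tuple to a probability
    produced by some probabilistic model (hypothetical input here).
    """
    ranked = sorted(pair_probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical descriptor values and probabilities, for illustration only.
    print(build_feature_vector([0.1, 0.2], [1, 0], [0.5]))
    print(prioritize_assays({("P1", "L1"): 0.2, ("P1", "L2"): 0.9}, top_k=1))
```

Ranking by probability rather than by a hard yes/no label is what allows a fixed assay budget to be spent on the most likely interactions first.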


Subjects
Drug Development , High-Throughput Screening Assays/methods , Ligands , Chemical Models , Proteins/drug effects , Algorithms , Bayes Theorem , Binding Sites , Humans , Hydrophobic and Hydrophilic Interactions , Machine Learning , Molecular Models , Molecular Docking Simulation , Protein Binding , Nonparametric Statistics
2.
Bioinformatics ; 34(2): 249-257, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968736

ABSTRACT

MOTIVATION: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and 'integrative' algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. RESULTS: We present NetProphet 2.0, a 'data light' algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. AVAILABILITY AND IMPLEMENTATION: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
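
The first principle above, that combining several network-mapping approaches can beat each one alone, can be sketched with a simple rank average over edge scores. This is an illustrative stand-in, not NetProphet 2.0's actual combination procedure (the abstract does not describe it); the function names and the example scores are hypothetical.

```python
def rank_scores(scores):
    """Convert raw edge scores to rank fractions in (0, 1].

    The highest-scoring edge gets 1.0. Ties are broken by edge order,
    which is a simplification.
    """
    ordered = sorted(scores, key=lambda edge: (scores[edge], edge))
    n = len(ordered)
    return {edge: (i + 1) / n for i, edge in enumerate(ordered)}


def combine_by_mean_rank(*score_maps):
    """Average the rank fractions of each (TF, gene) edge across methods.

    Only edges scored by every method are kept. Ranks are used instead of
    raw scores because different methods score on different scales.
    """
    common = set(score_maps[0])
    for scores in score_maps[1:]:
        common &= set(scores)
    ranked = [rank_scores({e: m[e] for e in common}) for m in score_maps]
    return {e: sum(r[e] for r in ranked) / len(ranked) for e in common}


if __name__ == "__main__":
    # Hypothetical scores from two expression-based methods.
    method_a = {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.1, ("TF2", "g1"): 0.5}
    method_b = {("TF1", "g1"): 2.0, ("TF1", "g2"): 3.0, ("TF2", "g1"): 1.0}
    combined = combine_by_mean_rank(method_a, method_b)
    for edge, score in sorted(combined.items(), key=lambda kv: -kv[1]):
        print(edge, round(score, 3))
```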

4.
Mol Cell ; 59(4): 685-97, 2015 Aug 20.
Article in English | MEDLINE | ID: mdl-26257285

ABSTRACT

We developed Split DamID (SpDamID), a protein complementation version of DamID, to mark genomic DNA bound in vivo by interacting or juxtapositioned transcription factors. Inactive halves of DAM (DNA adenine methyltransferase) were fused to protein pairs to be queried. Either direct interaction between proteins or proximity enabled DAM reconstitution and methylation of adenine in GATC. Inducible SpDamID was used to analyze Notch-mediated transcriptional activation. We demonstrate that Notch complexes label RBP sites broadly across the genome and show that a subset of these complexes that recruit MAML and p300 undergo changes in chromatin accessibility in response to Notch signaling. SpDamID differentiates between monomeric and dimeric binding, thereby allowing for identification of half-site motifs used by Notch dimers. Motif enrichment of Notch enhancers coupled with SpDamID reveals co-targeting of regulatory sequences by Notch and Runx1. SpDamID represents a sensitive and powerful tool that enables dynamic analysis of combinatorial protein-DNA transactions at a genome-wide level.


Subjects
DNA/genetics , Molecular Probe Techniques , Notch Receptors/physiology , Animals , Base Sequence , Binding Sites , Cell Line , DNA/metabolism , Genetic Enhancer Elements , Transgenic Mice , Molecular Sequence Data , Protein Binding
5.
Interdiscip Sci ; 7(1): 65-77, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25792441

ABSTRACT

Single nucleotide polymorphisms (SNPs) are the most common form of mutation in the human cytochrome P450 enzyme family and can lead to different drug responses or specific diseases in individual patients. Here, using machine learning, we aim to identify an effective set of sequence-based features for improving the prediction of SNPs with support vector machine algorithms. The features are derived from the target residues and flanking protein sequences, and include amino acid types, sequence composition, physicochemical properties, position-specific scoring matrix, phylogenetic entropy and the number of possible codons of target residues. To deal with the imbalanced data, with a majority of non-SNPs and a minority of SNPs, a preprocessing strategy based on fuzzy set theory was applied to the datasets. Our final model achieves 93.8% sensitivity, 88.8% specificity, 91.3% accuracy and an AUC value of 0.971, which is significantly higher than previous DNA sequence-based or protein sequence-based methods. Furthermore, our study also suggests the roles of individual features in the prediction of SNPs. The most important features are the amino acid type, the number of available codons, position-specific scoring matrix and phylogenetic entropy. The improved model will be a promising tool for SNP prediction and will assist research on genome mutation and personalized prescriptions.
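
Two of the sequence-based features named above, the number of possible codons of the target residue and the flanking protein sequence, are simple enough to sketch. The example below is a hedged illustration and not the authors' feature pipeline: the function names, the window size, and the `X` padding character are assumptions. The codon counts come from the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid ('*' marks stop codons).
CODON_COUNT = Counter(CODON_TABLE.values())


def flanking_window(sequence, index, flank=3, pad="X"):
    """Return the residue at `index` with `flank` residues on each side.

    Positions beyond either end of the protein are filled with `pad`
    (an assumed convention, not taken from the paper).
    """
    padded = pad * flank + sequence + pad * flank
    return padded[index:index + 2 * flank + 1]


def residue_features(sequence, index, flank=3):
    """Collect a few illustrative features for one target residue."""
    residue = sequence[index]
    return {
        "residue": residue,
        "codon_count": CODON_COUNT[residue],
        "window": flanking_window(sequence, index, flank),
    }


if __name__ == "__main__":
    # Hypothetical protein fragment, for illustration only.
    print(residue_features("MKTAYIAK", 3, flank=2))
```

A full feature vector in the spirit of the abstract would also need the position-specific scoring matrix and phylogenetic entropy, both of which require an alignment and are not shown here.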


Subjects
Amino Acid Sequence , Amino Acids , Cytochrome P-450 Enzyme System/genetics , Molecular Models , Mutation , Single Nucleotide Polymorphism , Support Vector Machine , Area Under Curve , Artificial Intelligence , Base Sequence , Codon , Computational Biology , Cytochrome P-450 Enzyme System/chemistry , Fuzzy Logic , Humans , Phylogeny
...