Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 5 de 5
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Sci Rep ; 8(1): 14841, 2018 10 04.
Artigo em Inglês | MEDLINE | ID: mdl-30287904

RESUMO

Residue-residue close contact (R2R-C) data procured from three-dimensional protein-protein interaction (PPI) experiments is currently used for predicting residue-residue interaction (R2R-I) in PPI. However, due to complex physiochemical environments, R2R-I incidences, facilitated by multiple factors, are usually entangled in the source environment and masked in the acquired data. Here we present a novel method, P2K (Pattern to Knowledge), to disentangle R2R-I patterns and render much succinct discriminative information expressed in different specific R2R-I statistical/functional spaces. Since such knowledge is not visible in the data acquired, we refer to it as deep knowledge. Leveraging the deep knowledge discovered to construct machine learning models for sequence-based R2R-I prediction, without trial-and-error combination of the features over external knowledge of sequences, our R2R-I predictor was validated for its effectiveness under stringent leave-one-complex-out-alone cross-validation in a benchmark dataset, and was surprisingly demonstrated to perform better than an existing sequence-based R2R-I predictor by 28% (p: 1.9E-08). P2K is accessible via our web server on https://p2k.uwaterloo.ca .

2.
Artigo em Inglês | MEDLINE | ID: mdl-26336137

RESUMO

Understanding binding cores is of fundamental importance in deciphering Protein-DNA (TF-TFBS) binding and gene regulation. Limited by expensive experiments, it is promising to discover them with variations directly from sequence data. Although existing computational methods have produced satisfactory results, they are one-to-one mappings with no site-specific information on residue/nucleotide variations, where these variations in binding cores may impact binding specificity. This study presents a new representation for modeling binding cores by incorporating variations and an algorithm to discover them from only sequence data. Our algorithm takes protein and DNA sequences from TRANSFAC (a Protein-DNA Binding Database) as input; discovers from both sets of sequences conserved regions in Aligned Pattern Clusters (APCs); associates them as Protein-DNA Co-Occurring APCs; ranks the Protein-DNA Co-Occurring APCs according to their co-occurrence, and among the top ones, finds three-dimensional structures to support each binding core candidate. If successful, candidates are verified as binding cores. Otherwise, homology modeling is applied to their close matches in PDB to attain new chemically feasible binding cores. Our algorithm obtains binding cores with higher precision and much faster runtime ( ≥ 1,600x) than that of its contemporaries, discovering candidates that do not co-occur as one-to-one associated patterns in the raw data. AVAILABILITY: http://www.pami.uwaterloo.ca/~ealee/files/tcbbPnDna2015/Release.zip.


Assuntos
Análise por Conglomerados , Biologia Computacional/métodos , Proteínas de Ligação a DNA/química , DNA/química , Alinhamento de Sequência/métodos , Algoritmos , DNA/análise , DNA/genética , DNA/metabolismo , Proteínas de Ligação a DNA/genética , Proteínas de Ligação a DNA/metabolismo , Mineração de Dados , Ligação Proteica , Análise de Sequência de DNA , Análise de Sequência de Proteína
3.
Artigo em Inglês | MEDLINE | ID: mdl-26357085

RESUMO

Understanding binding cores is of fundamental importance in deciphering Protein-DNA (TF-TFBS) binding and for the deep understanding of gene regulation. Traditionally, binding cores are identified in resolved high-resolution 3D structures. However, it is expensive, labor-intensive and time-consuming to obtain these structures. Hence, it is promising to discover binding cores computationally on a large scale. Previous studies successfully applied association rule mining to discover binding cores from TF-TFBS binding sequence data only. Despite the successful results, there are limitations such as the use of tight support and confidence thresholds, the distortion by statistical bias in counting pattern occurrences, and the lack of a unified scheme to rank TF-TFBS associated patterns. In this study, we proposed an association rule mining algorithm incorporating statistical measures and ranking to address these limitations. Experimental results demonstrated that, even when the threshold on support was lowered to one-tenth of the value used in previous studies, a satisfactory verification ratio was consistently observed under different confidence levels. Moreover, we proposed a novel ranking scheme for TF-TFBS associated patterns based on p-values and co-support values. By comparing with other discovery approaches, the effectiveness of our algorithm was demonstrated. Eighty-four binding cores with PDB support are uniquely identified.


Assuntos
Sítios de Ligação , Biologia Computacional/métodos , Proteínas de Ligação a DNA/química , DNA/química , Modelos Estatísticos , Algoritmos , DNA/metabolismo , Proteínas de Ligação a DNA/metabolismo , Mineração de Dados , Ligação Proteica
4.
BMC Bioinformatics ; 15 Suppl 12: S2, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-25474736

RESUMO

BACKGROUND: The large influx of biological sequences poses the importance of identifying and correlating conserved regions in homologous sequences to acquire valuable biological knowledge. These conserved regions contain statistically significant residue associations as sequence patterns. Thus, patterns from two conserved regions co-occurring frequently on the same sequences are inferred to have joint functionality. A method for finding conserved regions in protein families with frequent co-occurrence patterns is proposed. The biological significance of the discovered clusters of conserved regions with co-occurrences patterns can be validated by their three-dimensional closeness of amino acids and the biological functionality found in those regions as supported by published work. METHODS: Using existing algorithms, we discovered statistically significant amino acid associations as sequence patterns. We then aligned and clustered them into Aligned Pattern Clusters (APCs) corresponding to conserved regions with amino acid conservation and variation. When one APC frequently co-occurred with another APC, the two APCs have high co-occurrence. We then clustered APCs with high co-occurrence into what we refer to as Co-occurrence APC Clusters (Co-occurrence Clusters). RESULTS: Our results show that for Co-occurrence Clusters, the three-dimensional distance between their amino acids is closer than average amino acid distances. For the Co-occurrence Clusters of the ubiquitin and the cytochrome c families, we observed biological significance among the residing amino acids of the APCs within the same cluster. In ubiquitin, the residues are responsible for ubiquitination as well as conventional and unconventional ubiquitin-bindings. In cytochrome c, amino acids in the first co-occurrence cluster contribute to binding of other proteins in the electron transport chain, and amino acids in the second co-occurrence cluster contribute to the stability of the axial heme ligand. CONCLUSIONS: Thus, our co-occurrence clustering algorithm can efficiently find and rank conserved regions that contain patterns that frequently co-occurring on the same proteins. Co-occurring patterns are biologically significant due to their three-dimensional closeness and other evidences reported in literature. These results play an important role in drug discovery as biologists can quickly identify the target for drugs to conduct detailed preclinical studies.


Assuntos
Algoritmos , Análise de Sequência de Proteína/métodos , Homologia de Sequência de Aminoácidos , Aminoácidos/química , Análise por Conglomerados , Citocromos c/química , Conformação Proteica , Proteínas/química , Proteínas/classificação , Alinhamento de Sequência , Ubiquitina/química
5.
Artigo em Inglês | MEDLINE | ID: mdl-24091402

RESUMO

Understanding protein-DNA interactions, specifically transcription factor (TF) and transcription factor binding site (TFBS) bindings, is crucial in deciphering gene regulation. The recent associated TF-TFBS pattern discovery combines one-sided motif discovery on both the TF and the TFBS sides. Using sequences only, it identifies the short protein-DNA binding cores available only in high-resolution 3D structures. The discovered patterns lead to promising subtype and disease analysis applications. While the related studies use either association rule mining or existing TFBS annotations, none has proposed any formal unified (both-sided) model to prioritize the top verifiable associated patterns. We propose the unified scores and develop an effective pipeline for associated TF-TFBS pattern discovery. Our stringent instance-level evaluations show that the patterns with the top unified scores match with the binding cores in 3D structures considerably better than the previous works, where up to 90 percent of the top 20 scored patterns are verified. We also introduce extended verification from literature surveys, where the high unified scores correspond to even higher verification percentage. The top scored patterns are confirmed to match the known WRKY binding cores with no available 3D structures and agree well with the top binding affinities of in vivo experiments.


Assuntos
Sítios de Ligação , Biologia Computacional/métodos , DNA/química , Fatores de Transcrição/química , Algoritmos , DNA/metabolismo , Bases de Dados de Proteínas , Modelos Moleculares , Ligação Proteica , Fatores de Transcrição/metabolismo
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...