
Discovering Protein-DNA Binding Cores by Aligned Pattern Clustering.

Lee, En-Shiun Annie; Sze-To, Ho-Yin Antonio; Wong, Man-Hon; Leung, Kwong-Sak; Lau, Terrence Chi-Kong; Wong, Andrew K C.

IEEE/ACM Trans Comput Biol Bioinform ; 14(2): 254-263, 2017.

Article in English | MEDLINE | ID: mdl-26336137

ABSTRACT

Understanding binding cores is of fundamental importance in deciphering Protein-DNA (TF-TFBS) binding and gene regulation. Because experiments are expensive, it is promising to discover binding cores, together with their variations, directly from sequence data. Although existing computational methods have produced satisfactory results, they yield one-to-one mappings with no site-specific information on residue/nucleotide variations, even though such variations in binding cores may affect binding specificity. This study presents a new representation for modeling binding cores by incorporating variations and an algorithm to discover them from only sequence data. Our algorithm takes protein and DNA sequences from TRANSFAC (a Protein-DNA Binding Database) as input; discovers from both sets of sequences conserved regions in Aligned Pattern Clusters (APCs); associates them as Protein-DNA Co-Occurring APCs; ranks the Protein-DNA Co-Occurring APCs according to their co-occurrence; and, among the top ones, finds three-dimensional structures to support each binding core candidate. If such structures are found, the candidates are verified as binding cores. Otherwise, homology modeling is applied to their close matches in PDB to attain new chemically feasible binding cores. Our algorithm obtains binding cores with higher precision and a much faster runtime (≥ 1,600x) than its contemporaries, discovering candidates that do not co-occur as one-to-one associated patterns in the raw data. AVAILABILITY: http://www.pami.uwaterloo.ca/~ealee/files/tcbbPnDna2015/Release.zip.
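
The co-occurrence ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clusters are assumed to be given already (the APC discovery step is not shown), matching is plain substring search, and the function name `rank_cooccurring_clusters`, the cluster labels, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product


def rank_cooccurring_clusters(records, protein_clusters, dna_clusters):
    """Rank (protein cluster, DNA cluster) pairs by co-occurrence count.

    records: list of (protein_sequence, dna_sequence) pairs.
    *_clusters: dict mapping a cluster label to a set of pattern variants.
    A cluster "occurs" in a sequence if any of its variants is a substring.
    """
    counts = Counter()
    for protein_seq, dna_seq in records:
        hit_p = [name for name, variants in protein_clusters.items()
                 if any(v in protein_seq for v in variants)]
        hit_d = [name for name, variants in dna_clusters.items()
                 if any(v in dna_seq for v in variants)]
        # Every protein cluster seen in this record co-occurs with
        # every DNA cluster seen in the same record.
        for pair in product(hit_p, hit_d):
            counts[pair] += 1
    return counts.most_common()


# Toy data (made up for illustration, not taken from TRANSFAC).
records = [
    ("MKRGRVAL", "TTGACGTCA"),
    ("AAKRGKLL", "GGTGACGT"),
    ("MMSTPLQQ", "CCCATATGG"),
    ("QQRRGRPP", "ATGACGCC"),
]
protein_clusters = {"P1": {"KRGR", "KRGK"}, "P2": {"STPL"}}
dna_clusters = {"D1": {"TGACG"}, "D2": {"CATATG"}}

ranking = rank_cooccurring_clusters(records, protein_clusters, dna_clusters)
print(ranking)
```

In this toy input the pair (P1, D1) is counted twice only because the cluster P1 groups two variants, `KRGR` and `KRGK`; counting each variant as a separate one-to-one pattern would split that evidence.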

Subject(s)

Cluster Analysis , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA/chemistry , Sequence Alignment/methods , Algorithms , DNA/analysis , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Data Mining , Protein Binding , Sequence Analysis, DNA , Sequence Analysis, Protein

Discovering Binding Cores in Protein-DNA Binding Using Association Rule Mining with Statistical Measures.

Wong, Man-Hon; Sze-To, Ho-Yin Antonio; Lo, Leung-Yau Peter; Chan, Tak-Ming Cyrus; Leung, Kwong-Sak.

IEEE/ACM Trans Comput Biol Bioinform ; 12(1): 142-154, 2015.

Article in English | MEDLINE | ID: mdl-26357085

ABSTRACT

Understanding binding cores is of fundamental importance in deciphering Protein-DNA (TF-TFBS) binding and in understanding gene regulation in depth. Traditionally, binding cores are identified in resolved high-resolution 3D structures. However, it is expensive, labor-intensive and time-consuming to obtain these structures. Hence, it is promising to discover binding cores computationally on a large scale. Previous studies successfully applied association rule mining to discover binding cores from TF-TFBS binding sequence data only. Despite the successful results, there are limitations such as the use of tight support and confidence thresholds, the distortion by statistical bias in counting pattern occurrences, and the lack of a unified scheme to rank TF-TFBS associated patterns. In this study, we proposed an association rule mining algorithm incorporating statistical measures and ranking to address these limitations. Experimental results demonstrated that, even when the threshold on support was lowered to one-tenth of the value used in previous studies, a satisfactory verification ratio was consistently observed under different confidence levels. Moreover, we proposed a novel ranking scheme for TF-TFBS associated patterns based on p-values and co-support values. Comparison with other discovery approaches demonstrated the effectiveness of our algorithm. Eighty-four binding cores with PDB support were uniquely identified.
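
For readers unfamiliar with the quantities named in the abstract, the following sketch computes support, confidence, and a p-value for a single TF-TFBS pattern pair. The abstract does not say which statistical test the authors use; the one-sided hypergeometric test here is an assumption chosen for illustration, and the function name `rule_statistics` and the toy records are hypothetical.

```python
from math import comb


def rule_statistics(records, tf_pattern, tfbs_pattern):
    """Return (support, confidence, p_value) for the rule tf_pattern -> tfbs_pattern.

    records: list of (protein_sequence, dna_sequence) pairs.
    A pattern occurs in a sequence if it is a substring of it.
    """
    n = len(records)
    n_tf = sum(tf_pattern in p for p, _ in records)
    n_tfbs = sum(tfbs_pattern in d for _, d in records)
    n_both = sum(tf_pattern in p and tfbs_pattern in d for p, d in records)

    support = n_both / n
    confidence = n_both / n_tf if n_tf else 0.0

    # Assumed test: hypergeometric upper tail, i.e. the probability of
    # seeing at least n_both co-occurrences if the n_tf records carrying
    # the TF pattern were drawn at random from all n records, n_tfbs of
    # which carry the TFBS pattern.
    tail = sum(comb(n_tfbs, k) * comb(n - n_tfbs, n_tf - k)
               for k in range(n_both, min(n_tf, n_tfbs) + 1))
    p_value = tail / comb(n, n_tf)
    return support, confidence, p_value


# Toy data (made up for illustration).
records = [
    ("AAKRGRAA", "CCTGACGCC"),
    ("KRGRLLLL", "TGACGAAA"),
    ("MMKRGRMM", "GGGTGACG"),
    ("AAAAAAAA", "CCCCCCCC"),
    ("LLLLLLLL", "GGGGGGGG"),
    ("MMMMMMMM", "ATATATAT"),
]

support, confidence, p_value = rule_statistics(records, "KRGR", "TGACG")
print(support, confidence, p_value)
```

A p-value of this kind lets a pair with low support still rank well when its co-occurrence is unlikely by chance, which is one way to tolerate the lower support thresholds the abstract mentions.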

Subject(s)

Binding Sites , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA/chemistry , Models, Statistical , Algorithms , DNA/metabolism , DNA-Binding Proteins/metabolism , Data Mining , Protein Binding
