1.
BMC Med Inform Decis Mak ; 22(1): 103, 2022 04 15.
Article in English | MEDLINE | ID: mdl-35428291

ABSTRACT

BACKGROUND: Clinical data repositories (CDR) including electronic health record (EHR) data have great potential for outcome prediction and risk modeling. We built a prediction tool integrated with CDR based on pattern discovery and demonstrated a case study on contrast related acute kidney injury (AKI). METHODS: Patients undergoing cardiac catheterization from January 2015 to April 2017 were included. AKI was identified based on Acute Kidney Injury Network definition. Predictive model including 16 variables covered in existing AKI models was built. A visual analytics tool based on pattern discovery was trained on 70% data up to August 2016 with three interactive knowledge incorporation modes to develop 3 models: (1) pure data-driven, (2) domain knowledge, and (3) clinician-interactive, which were tested and compared on 30% consecutive cases dated afterwards. RESULTS: Among 2560 patients in the final dataset, 189 (7.3%) had AKI. We measured 4 existing models, whose areas under curves (AUCs) of receiver operating characteristics curve for the test dataset were 0.70 (Mehran's), 0.72 (Chen's), 0.67 (Gao's) and 0.62 (AGEF), respectively. A pure data-driven machine learning method achieves AUC of 0.72 (Easy Ensemble). The AUCs of our 3 models are 0.77, 0.80, 0.82, respectively, with the last being top where physician knowledge is incorporated. CONCLUSIONS: We developed a novel pattern-discovery-based outcome prediction tool integrated with CDR and purely using EHR data. On the case of predicting contrast related AKI, the tool showed user-friendliness by physicians, and demonstrated a competitive performance in comparison with the state-of-the-art models.
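The AUC figures quoted in this abstract can be read as the probability that a randomly chosen AKI case receives a higher risk score than a randomly chosen non-AKI case. The sketch below computes that rank-based AUC in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the study or its tool.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = AKI, 0 = no AKI; scores are predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a few thousand patients; a sort-based variant would be preferable for larger cohorts.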


Subject(s)
Acute Kidney Injury , Acute Kidney Injury/chemically induced , Acute Kidney Injury/diagnosis , Area Under Curve , Female , Humans , Machine Learning , Male , Prognosis , ROC Curve , Retrospective Studies , Risk Factors
2.
Article in English | MEDLINE | ID: mdl-33488072

ABSTRACT

BACKGROUND: The Manchester Respiratory Activities of Daily Living Questionnaire (MRADLQ) is a valid and reliable tool measuring the functional level of patients with COPD in multidimensional aspects. However, a local validation of the questionnaire is lacking in Hong Kong. OBJECTIVE: To develop a Chinese version of MRADLQ with pictorial enhancement (C-MRADLQ) and study its reliability and validity. PATIENTS AND METHODS: A total of 238 patients suffering from COPD were recruited from nine public hospitals and five Nurse and Allied Health Respiratory Clinics by convenience sampling. A total of 64 patients with normal spirometry results and no previous clinical diagnosis of COPD were invited to complete the C-MRADLQ for comparison and examination of its validity. Ten out of 302 patients were re-assessed with the C-MRADLQ after one week by the same rater for test-retest reliability. The C-MRADLQ was correlated with spirometry result, COPD classifications and groups by Global Initiative for Chronic Obstructive Lung Disease (GOLD), the modified Medical Research Council Dyspnea Scale (mMRC Dyspnea Scale), COPD Assessment Test (CAT), Chinese Version of the Shortness of Breath Questionnaire (C-SOBQ), number of admissions and the ADO index. RESULTS: The C-MRADLQ shows good test-retest reliability as indicated by an intra-class correlation coefficient value of 0.975. It is significantly correlated with COPD stage, COPD group, SOBQ score, CAT score, mMRC, ADO index, spirometry results, and number of admissions. The SOBQ score, number of admissions, FEV1/FVC, and COPD group could significantly predict the total C-MRADLQ score. A total of 67.9% of participants' mMRC levels were correctly classified by using the C-MRADLQ total score. The agreement of the original and new versions of questions 20 and 21 of C-MRADLQ was 97.3% and 90.1%, respectively. CONCLUSION: The pictorial version of the C-MRADLQ is a validated and reliable functional assessment tool to measure functional status among patients with COPD in the Chinese population.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Activities of Daily Living , China , Dyspnea/diagnosis , Hong Kong , Humans , Pulmonary Disease, Chronic Obstructive/diagnosis , Reproducibility of Results , Severity of Illness Index , Surveys and Questionnaires
3.
J Healthc Eng ; 2017: 6493016, 2017.
Article in English | MEDLINE | ID: mdl-29065631

ABSTRACT

Electronic Health Record (EHR) systems enable clinical decision support. In this study, a set of 112 abdominal computed tomography imaging examination reports, consisting of 59 cases of hepatocellular carcinoma (HCC) or liver metastases (so-called HCC group for simplicity) and 53 cases with no abnormality detected (NAD group), were collected from four hospitals in Hong Kong. We extracted terms related to liver cancer from the reports and mapped them to ontological features using Systematized Nomenclature of Medicine (SNOMED) Clinical Terms (CT). The primary predictor panel was formed by these ontological features. Association levels between every two features in the HCC and NAD groups were quantified using Pearson's correlation coefficient. The HCC group reveals a distinct association pattern that signifies liver cancer and provides clinical decision support for suspected cases, motivating the inclusion of new features to form the augmented predictor panel. Logistic regression analysis with stepwise forward procedure was applied to the primary and augmented predictor sets, respectively. The obtained model with the new features attained 84.7% sensitivity and 88.4% overall accuracy in distinguishing HCC from NAD cases, which were significantly improved when compared with that without the new features.
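The pairwise association step described in this abstract uses Pearson's correlation coefficient between two features. A minimal sketch of that calculation follows, assuming each feature is encoded as a 0/1 indicator per report; the function name `pearson` and the two indicator vectors are hypothetical and not drawn from the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical indicators: whether two ontological features appear in
# each of six reports (1 = present, 0 = absent).
feature_a = [1, 1, 0, 0, 1, 0]
feature_b = [1, 1, 0, 0, 0, 0]
print(round(pearson(feature_a, feature_b), 4))  # 0.7071
```

For 0/1 indicators this value coincides with the phi coefficient, so it can be read as the strength of co-occurrence between the two features.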


Subject(s)
Carcinoma, Hepatocellular/physiopathology , Decision Support Systems, Clinical , Electronic Health Records , Liver Neoplasms/physiopathology , Algorithms , Hong Kong , Humans , Systematized Nomenclature of Medicine , Tomography, X-Ray Computed
4.
BMC Med Inform Decis Mak ; 17(1): 47, 2017 Apr 20.
Article in English | MEDLINE | ID: mdl-28427384

ABSTRACT

BACKGROUND: Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets for a formal study before it is conducted. We specifically target interpretability for domain users where the model can be conveniently explained and applied in clinical practice. METHODS: We propose an interpretable pattern model which is noise (missing) tolerant for practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths less than a few percent, the geometric mean of sensitivity and specificity (G-mean) optimization criterion is employed, with which a simple but effective heuristic algorithm is developed. RESULTS: We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. They contain 14.9% deaths in 1 year in the thoracic dataset and 9.1% deaths in the cardiac dataset, respectively. In spite of the imbalance challenge shown on other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significant (p-values < 0.01, Wilcoxon signed rank test) favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing of data and tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance. CONCLUSIONS: Pattern discovery has been demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for potential formal studies.
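The two metrics named in this abstract, G-mean (geometric mean of sensitivity and specificity) and F1-score (harmonic mean of precision and sensitivity), can both be computed from a confusion matrix. The sketch below shows the arithmetic; the function name `g_mean_and_f1` and the counts are illustrative assumptions, not results from the paper.

```python
import math


def g_mean_and_f1(tp, fp, tn, fn):
    """Return (G-mean, F1) from confusion-matrix counts.

    G-mean = sqrt(sensitivity * specificity)
    F1     = harmonic mean of precision and sensitivity
    Ratios with a zero denominator are treated as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return g_mean, f1


# Hypothetical imbalanced test set: 10 deaths among 100 patients.
print(g_mean_and_f1(tp=8, fp=9, tn=81, fn=2))

# A classifier that always predicts "survives" is 90% accurate on the
# same data, yet both metrics drop to zero.
print(g_mean_and_f1(tp=0, fp=0, tn=90, fn=10))  # (0.0, 0.0)
```

The second call illustrates why G-mean suits imbalanced targets: a model that ignores the minority class cannot score well on it, however high its accuracy.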


Subject(s)
Computer Simulation , Data Mining/methods , Databases as Topic/organization & administration , Pattern Recognition, Automated/methods , Algorithms , Computer Heuristics , Forecasting , Health Information Systems/organization & administration
5.
J Biomed Inform ; 66: 161-170, 2017 02.
Article in English | MEDLINE | ID: mdl-28065840

ABSTRACT

OBJECTIVES: Major adverse cardiac events (MACE) of acute coronary syndrome (ACS) often occur suddenly resulting in high mortality and morbidity. Recently, the rapid development of electronic medical records (EMR) provides the opportunity to utilize the potential of EMR to improve the performance of MACE prediction. In this study, we present a novel data-mining based approach specialized for MACE prediction from a large volume of EMR data. METHODS: The proposed approach presents a new classification algorithm by applying both over-sampling and under-sampling on minority-class and majority-class samples, respectively, and integrating the resampling strategy into a boosting framework so that it can effectively handle imbalance of MACE of ACS patients analogous to domain practice. The method learns a new and stronger MACE prediction model each iteration from a more difficult subset of EMR data with wrongly predicted MACEs of ACS patients by a previous weak model. RESULTS: We verify the effectiveness of the proposed approach on a clinical dataset containing 2930 ACS patient samples with 268 feature types. While the imbalanced ratio does not seem extreme (25.7%), MACE prediction targets pose great challenge to traditional methods. As these methods degenerate dramatically with increasing imbalanced ratios, the performance of our approach for predicting MACE remains robust and reaches 0.672 in terms of AUC. On average, the proposed approach improves the performance of MACE prediction by 4.8%, 4.5%, 8.6% and 4.8% over the standard SVM, Adaboost, SMOTE, and the conventional GRACE risk scoring system for MACE prediction, respectively. CONCLUSIONS: We consider that the proposed iterative boosting approach has demonstrated great potential to meet the challenge of MACE prediction for ACS patients using a large volume of EMR.
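The resampling idea in the methods above (over-sampling the minority class and under-sampling the majority class) can be sketched on its own. The code below is only an assumed, simplified resampling step: it does not reproduce the authors' algorithm and omits the boosting loop entirely. The function name `rebalance`, the label convention (1 = minority, 0 = majority), and the `target_per_class` parameter are all hypothetical.

```python
import random


def rebalance(samples, labels, target_per_class, seed=0):
    """Return a class-balanced resample of (samples, labels).

    Minority samples (label 1) are drawn with replacement (over-sampling);
    majority samples (label 0) are drawn without replacement
    (under-sampling). Assumes binary labels with 1 as the minority class.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == 1]
    majority = [s for s, y in zip(samples, labels) if y == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    over = [rng.choice(minority) for _ in range(target_per_class)]
    under = rng.sample(majority, min(target_per_class, len(majority)))
    combined = [(s, 1) for s in over] + [(s, 0) for s in under]
    rng.shuffle(combined)
    return [s for s, _ in combined], [y for _, y in combined]


# Hypothetical toy data: 2 events among 10 patients.
patients = list(range(10))
events = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
xs, ys = rebalance(patients, events, target_per_class=4, seed=1)
print(ys.count(1), ys.count(0))  # 4 4
```

In a boosting setting, a step like this would be repeated in each round, with the sampling weights shifted toward cases the previous weak model misclassified.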


Subject(s)
Acute Coronary Syndrome/diagnosis , Algorithms , Electronic Health Records , Data Mining , Databases, Factual , Humans
6.
Stud Health Technol Inform ; 245: 398-402, 2017.
Article in English | MEDLINE | ID: mdl-29295124

ABSTRACT

Clinical risk prediction of acute coronary syndrome (ACS) plays a critical role for clinical decision support, treatment management and quality of care assessment in ACS patients. Admission records contain a wealth of patient information in the early stages of hospitalization, which offers the opportunity to support the ACS risk prediction in a proactive manner. However, ACS patient risks aren't recorded in hospital admission records, thus impeding the construction of supervised risk prediction models. In our study, we propose a novel approach for ACS risk prediction, which employs a well-known ACS risk prediction model (GRACE) as the benchmark method to stratify patient risks, and then utilizes a state-of-the-art supervised machine learning algorithm to establish our risk prediction models. The experiment was conducted with a collection of 3,643 ACS patient samples from a Chinese hospital. Our best model achieved 0.616 accuracy for risk prediction, which indicates our learned model can achieve a better performance than the benchmark GRACE model and can obtain significant improvement by mixing in patient samples with manually labeled risks.


Subject(s)
Acute Coronary Syndrome , Algorithms , Risk Assessment , Hospitalization , Humans , Prognosis , Risk Factors
7.
Article in English | MEDLINE | ID: mdl-27649220

ABSTRACT

BACKGROUND: Clinical major adverse cardiovascular event (MACE) prediction of acute coronary syndrome (ACS) is important for a number of applications including physician decision support, quality of care assessment, and efficient healthcare service delivery on ACS patients. Admission records, as typical media to contain clinical information of patients at the early stage of their hospitalizations, provide significant potential to be explored for MACE prediction in a proactive manner. METHODS: We propose a hybrid approach for MACE prediction by utilizing a large volume of admission records. Firstly, both a rule-based medical language processing method and a machine learning method (i.e., Conditional Random Fields (CRFs)) are developed to extract essential patient features from unstructured admission records. After that, state-of-the-art supervised machine learning algorithms are applied to construct MACE prediction models from data. RESULTS: We comparatively evaluate the performance of the proposed approach on a real clinical dataset consisting of 2930 ACS patient samples collected from a Chinese hospital. Our best model achieved 72% AUC in MACE prediction. In comparison of the performance between our models and two well-known ACS risk score tools, i.e., GRACE and TIMI, our learned models obtain better performances with a significant margin. CONCLUSIONS: Experimental results reveal that our approach can obtain competitive performance in MACE prediction. The comparison of classifiers indicates the proposed approach has a competitive generality with datasets extracted by different feature extraction methods. Furthermore, our MACE prediction model obtained a significant improvement by comparison with both GRACE and TIMI. It indicates that using admission records can effectively provide MACE prediction service for ACS patients at the early stage of their hospitalizations.


Subject(s)
Acute Coronary Syndrome/epidemiology , Acute Coronary Syndrome/physiopathology , Algorithms , Health Status Indicators , Patient Admission/statistics & numerical data , Aged , Female , Humans , Male , Middle Aged , Prognosis , Risk Assessment
8.
Genome Res ; 26(4): 440-50, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26888265

ABSTRACT

Identification of functional genetic variants and elucidation of their regulatory mechanisms represent significant challenges of the post-genomic era. A poorly understood topic is the involvement of genetic variants in mediating post-transcriptional RNA processing, including alternative splicing. Thus far, little is known about the genomic, evolutionary, and regulatory features of genetically modulated alternative splicing (GMAS). Here, we systematically identified intronic tag variants for genetic modulation of alternative splicing using RNA-seq data specific to cellular compartments. Combined with our previous method that identifies exonic tags for GMAS, this study yielded 622 GMAS exons. We observed that GMAS events are highly cell type independent, indicating that splicing-altering genetic variants could have widespread function across cell types. Interestingly, GMAS genes, exons, and single-nucleotide variants (SNVs) all demonstrated positive selection or accelerated evolution in primates. We predicted that GMAS SNVs often alter binding of splicing factors, with SRSF1 affecting the most GMAS events and demonstrating global allelic binding bias. However, in contrast to their GMAS targets, the predicted splicing factors are more conserved than expected, suggesting that cis-regulatory variation is the major driving force of splicing evolution. Moreover, GMAS-related splicing factors had stronger consensus motifs than expected, consistent with their susceptibility to SNV disruption. Intriguingly, GMAS SNVs in general do not alter the strongest consensus position of the splicing factor motif, except the more than 100 GMAS SNVs in linkage disequilibrium with polymorphisms reported by genome-wide association studies. Our study reports many GMAS events and enables a better understanding of the evolutionary and regulatory features of this phenomenon.


Subject(s)
Alternative Splicing , Evolution, Molecular , Genetic Variation , Proteins/genetics , Animals , Binding Sites , Cell Line , Computational Biology/methods , Conserved Sequence , Exons , Gene Expression Regulation , Genome-Wide Association Study , Humans , Introns , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Primates/genetics , Protein Binding , Proteins/chemistry , RNA/chemistry , RNA/genetics , Regulatory Sequences, Nucleic Acid , Reproducibility of Results
9.
Mol Endocrinol ; 30(2): 254-71, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26745669

ABSTRACT

Male vertebrate social displays vary from physically simple to complex, with the latter involving exquisite motor command of the body and appendages. Studies of these displays have, in turn, provided substantial insight into neuromotor mechanisms. The neotropical golden-collared manakin (Manacus vitellinus) has been used previously as a model to investigate intricate motor skills because adult males of this species perform an acrobatic and androgen-dependent courtship display. To support this behavior, these birds express elevated levels of androgen receptors (AR) in their skeletal muscles. Here we use RNA sequencing to explore how testosterone (T) modulates the muscular transcriptome to support male manakin courtship displays. In addition, we explore how androgens influence gene expression in the muscles of the zebra finch (Taeniopygia guttata), a model passerine bird with a limited courtship display and minimal muscle AR. We identify androgen-dependent, muscle-specific gene regulation in both species. In addition, we identify manakin-specific effects that are linked to muscle use during the manakin display, including androgenic regulation of genes associated with muscle fiber contractility, cellular homeostasis, and energetic efficiency. Overall, our results point to numerous genes and gene networks impacted by androgens in male birds, including some that underlie optimal muscle function necessary for performing acrobatic display routines. Manakins are excellent models to explore gene regulation promoting athletic ability.


Subject(s)
Androgens/pharmacology , Athletes , Biomedical Research , Birds/genetics , Muscle, Skeletal/metabolism , Transcriptome/drug effects , Animals , Courtship , Gene Expression Profiling , Gene Expression Regulation/drug effects , Gene Ontology , Gene Regulatory Networks/drug effects , Humans , Male , Molecular Sequence Annotation , Muscle, Skeletal/drug effects , Principal Component Analysis , RNA, Messenger/genetics , RNA, Messenger/metabolism , Receptors, Androgen/genetics , Receptors, Androgen/metabolism , Sequence Analysis, RNA , Transcriptome/genetics
10.
Article in English | MEDLINE | ID: mdl-26357085

ABSTRACT

Understanding binding cores is of fundamental importance in deciphering Protein-DNA (TF-TFBS) binding and for the deep understanding of gene regulation. Traditionally, binding cores are identified in resolved high-resolution 3D structures. However, it is expensive, labor-intensive and time-consuming to obtain these structures. Hence, it is promising to discover binding cores computationally on a large scale. Previous studies successfully applied association rule mining to discover binding cores from TF-TFBS binding sequence data only. Despite the successful results, there are limitations such as the use of tight support and confidence thresholds, the distortion by statistical bias in counting pattern occurrences, and the lack of a unified scheme to rank TF-TFBS associated patterns. In this study, we proposed an association rule mining algorithm incorporating statistical measures and ranking to address these limitations. Experimental results demonstrated that, even when the threshold on support was lowered to one-tenth of the value used in previous studies, a satisfactory verification ratio was consistently observed under different confidence levels. Moreover, we proposed a novel ranking scheme for TF-TFBS associated patterns based on p-values and co-support values. By comparing with other discovery approaches, the effectiveness of our algorithm was demonstrated. Eighty-four binding cores with PDB support are uniquely identified.
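The support and confidence thresholds mentioned in this abstract are the standard quantities of association rule mining. The sketch below computes both for a single rule over a list of item sets. It is a generic illustration, not the authors' algorithm, and it omits the statistical measures and ranking the abstract describes. The function name and the toy items (prefixed `TF:` and `DNA:`) are hypothetical.

```python
def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = fraction of transactions containing both sides
    confidence = fraction of antecedent-containing transactions that
                 also contain the consequent (0.0 if none contain it)
    """
    if not transactions:
        raise ValueError("no transactions given")
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Hypothetical binding records: each set lists short patterns seen on the
# protein side ("TF:...") and on the DNA side ("DNA:...").
records = [
    {"TF:KRRR", "DNA:TGAC"},
    {"TF:KRRR", "DNA:TGAC"},
    {"TF:KRRR", "DNA:CACG"},
    {"TF:ERKR", "DNA:TGAC"},
]
print(support_confidence(records, {"TF:KRRR"}, {"DNA:TGAC"}))
```

Here the rule holds in 2 of 4 records (support 0.5) and in 2 of the 3 records containing the antecedent (confidence about 0.67). Lowering the support threshold admits rarer rules, which is why additional statistical filtering becomes important.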


Subject(s)
Binding Sites , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA/chemistry , Models, Statistical , Algorithms , DNA/metabolism , DNA-Binding Proteins/metabolism , Data Mining , Protein Binding
11.
Clin Chem ; 61(1): 221-30, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25376581

ABSTRACT

BACKGROUND: Extracellular RNAs (exRNAs) in human body fluids are emerging as effective biomarkers for detection of diseases. Saliva, as the most accessible and noninvasive body fluid, has been shown to harbor exRNA biomarkers for several human diseases. However, the entire spectrum of exRNA from saliva has not been fully characterized. METHODS: Using high-throughput RNA sequencing (RNA-Seq), we conducted an in-depth bioinformatic analysis of noncoding RNAs (ncRNAs) in human cell-free saliva (CFS) from healthy individuals, with a focus on microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and circular RNAs (circRNAs). RESULTS: Our data demonstrated robust reproducibility of miRNA and piRNA profiles across individuals. Furthermore, individual variability of these salivary RNA species was highly similar to those in other body fluids or cellular samples, despite the direct exposure of saliva to environmental impacts. By comparative analysis of >90 RNA-Seq data sets of different origins, we observed that piRNAs were surprisingly abundant in CFS compared with other body fluid or intracellular samples, with expression levels in CFS comparable to those found in embryonic stem cells and skin cells. Conversely, miRNA expression profiles in CFS were highly similar to those in serum and cerebrospinal fluid. Using a customized bioinformatics method, we identified >400 circRNAs in CFS. These data represent the first global characterization and experimental validation of circRNAs in any type of extracellular body fluid. CONCLUSIONS: Our study provides a comprehensive landscape of ncRNA species in human saliva that will facilitate further biomarker discoveries and lay a foundation for future studies related to ncRNAs in human saliva.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , MicroRNAs/analysis , RNA, Small Interfering/analysis , RNA/analysis , Saliva/chemistry , Sequence Analysis, RNA/methods , Base Sequence , Biomarkers/analysis , Humans , MicroRNAs/blood , MicroRNAs/cerebrospinal fluid , MicroRNAs/genetics , Molecular Sequence Data , RNA/blood , RNA/cerebrospinal fluid , RNA/genetics , RNA, Circular , RNA, Small Interfering/blood , RNA, Small Interfering/cerebrospinal fluid , RNA, Small Interfering/genetics , Reproducibility of Results , Sensitivity and Specificity
12.
Article in English | MEDLINE | ID: mdl-24091402

ABSTRACT

Understanding protein-DNA interactions, specifically transcription factor (TF) and transcription factor binding site (TFBS) bindings, is crucial in deciphering gene regulation. The recent associated TF-TFBS pattern discovery combines one-sided motif discovery on both the TF and the TFBS sides. Using sequences only, it identifies the short protein-DNA binding cores available only in high-resolution 3D structures. The discovered patterns lead to promising subtype and disease analysis applications. While the related studies use either association rule mining or existing TFBS annotations, none has proposed any formal unified (both-sided) model to prioritize the top verifiable associated patterns. We propose the unified scores and develop an effective pipeline for associated TF-TFBS pattern discovery. Our stringent instance-level evaluations show that the patterns with the top unified scores match with the binding cores in 3D structures considerably better than the previous works, where up to 90 percent of the top 20 scored patterns are verified. We also introduce extended verification from literature surveys, where the high unified scores correspond to even higher verification percentage. The top scored patterns are confirmed to match the known WRKY binding cores with no available 3D structures and agree well with the top binding affinities of in vivo experiments.


Subject(s)
Binding Sites , Computational Biology/methods , DNA/chemistry , Transcription Factors/chemistry , Algorithms , DNA/metabolism , Databases, Protein , Models, Molecular , Protein Binding , Transcription Factors/metabolism
13.
Nucleic Acids Res ; 41(16): e153, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23814189

ABSTRACT

Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k=8∼10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM.
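The preprocessing step described above, reducing probe-level intensities to a median intensity per k-mer, can be sketched as follows. This is an assumed simplification and not the kmerHMM code: it counts each k-mer once per probe and does not merge reverse complements, which a real PBM pipeline would likely handle. The function name and the toy probes are hypothetical.

```python
from collections import defaultdict
from statistics import median


def kmer_median_intensities(probes, k):
    """Map each k-mer to the median intensity of the probes containing it.

    probes: iterable of (sequence, intensity) pairs.
    A k-mer occurring several times in one probe is counted once for it.
    """
    by_kmer = defaultdict(list)
    for seq, intensity in probes:
        kmers_in_probe = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in kmers_in_probe:
            by_kmer[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in by_kmer.items()}


# Hypothetical toy probes with made-up signal intensities.
toy_probes = [("ACGT", 10.0), ("CGTA", 20.0), ("ACGA", 30.0)]
medians = kmer_median_intensities(toy_probes, k=3)

# Rank k-mers from strongest to weakest median signal.
for kmer, value in sorted(medians.items(), key=lambda kv: -kv[1]):
    print(kmer, value)
```

The median is used rather than the mean because it is less sensitive to a few outlier probes, which matters when each k-mer is covered by only a handful of probes.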


Subject(s)
DNA-Binding Proteins/metabolism , DNA/chemistry , Protein Array Analysis , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , DNA/metabolism , Markov Chains , Mice , Nucleotide Motifs
14.
BMC Bioinformatics ; 14: 198, 2013 Jun 19.
Article in English | MEDLINE | ID: mdl-23777239

ABSTRACT

BACKGROUND: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve the prediction accuracy. The regularization approach is one of the effective methods for gene selection in microarray data, which generally contain a large number of genes and have a small number of samples. In recent years, various approaches have been developed for gene selection of microarray data. Generally, they are divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique and perform both continuous shrinkage and automatic gene selection simultaneously. Recently, there is growing interest in applying the regularization techniques in gene selection. The popular regularization technique is Lasso (L1), and many L1 type regularization terms have been proposed in the recent years. Theoretically, the Lq type regularization with the lower value of q would lead to better solutions with more sparsity. Moreover, the L1/2 regularization can be taken as a representative of Lq (0

Subject(s)
Gene Expression Regulation , Logistic Models , Neoplasms/classification , Neoplasms/genetics , Algorithms , Genetic Markers , Humans , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods
15.
Nucleic Acids Res ; 40(19): 9392-403, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22904079

ABSTRACT

In protein-DNA interactions, particularly transcription factor (TF) and transcription factor binding site (TFBS) bindings, associated residue variations form patterns denoted as subtypes. Subtypes may lead to changed binding preferences, distinguish conserved from flexible binding residues and reveal novel binding mechanisms. However, subtypes must be studied in the context of core bindings. While solving 3D structures would require huge experimental efforts, recent sequence-based associated TF-TFBS pattern discovery has shown to be promising, upon which a large-scale subtype study is possible and desirable. In this article, we investigate residue-varying subtypes based on associated TF-TFBS patterns. By re-categorizing the patterns with respect to varying TF amino acids, statistically significant (P values ≤ 0.005) subtypes leading to varying TFBS patterns are discovered without using TF family or domain annotations. Resultant subtypes have various biological meanings. The subtypes reflect familial and functional properties and exhibit changed binding preferences supported by 3D structures. Conserved residues critical for maintaining TF-TFBS bindings are revealed by analyzing the subtypes. In-depth analysis on the subtype pair PKVVIL-CACGTG versus PKVEIL-CAGCTG shows the V/E variation is indicative for distinguishing Myc from MRF families. Discovered from sequences only, the TF-TFBS subtypes are informative and promising for more biological findings, complementing and extending recent one-sided subtype and familial studies with comprehensive evidence.


Subject(s)
DNA/chemistry , Transcription Factors/chemistry , Transcription Factors/classification , Binding Sites , Chromatin Immunoprecipitation , DNA/metabolism , Databases, Protein , Models, Molecular , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , Sequence Analysis, DNA , Transcription Factors/metabolism
16.
J Proteomics ; 75(15): 4833-43, 2012 Aug 03.
Article in English | MEDLINE | ID: mdl-22677112

ABSTRACT

Hepatocellular carcinoma (HCC) is a global public health problem which causes approximately 500,000 deaths annually. Considering the limited therapeutic options for HCC, novel therapeutic targets and drugs are urgently needed. In this study, we discovered that 1,3,5-trihydroxy-13,13-dimethyl-2H-pyran [7,6-b] xanthone (TDP), isolated from the traditional Chinese medicinal herb, Garcinia oblongifolia, effectively inhibited cell growth and induced the caspase-dependent mitochondrial apoptosis in HCC. A two-dimensional gel electrophoresis and mass spectrometry-based comparative proteomics were performed to find the molecular targets of TDP in HCC cells. Eighteen proteins were identified as differently expressed, with Hsp27 protein being one of the most significantly down-regulated proteins induced by TDP. In addition, the following gain- and loss-of-function studies indicated that Hsp27 mediates mitochondrial apoptosis induced by TDP. Furthermore, a nude mice model also demonstrated the suppressive effect of TDP on HCC. Our study suggests that TDP plays apoptosis-inducing roles by strongly suppressing the Hsp27 expression that is specifically associated with the mitochondrial death of the caspase-dependent pathway. In conclusion, TDP may be a potential anti-cancer drug candidate, especially for cancers with an abnormally high expression of Hsp27.


Subject(s)
Antineoplastic Agents/pharmacology , Apoptosis/drug effects , Gene Expression Regulation, Neoplastic/drug effects , HSP27 Heat-Shock Proteins/biosynthesis , Liver Neoplasms/metabolism , Mitochondria, Liver/metabolism , Neoplasm Proteins/biosynthesis , Xanthones/pharmacology , Animals , Antineoplastic Agents/chemistry , Caspases/metabolism , Female , Garcinia/chemistry , Heat-Shock Proteins , Hep G2 Cells , Humans , Liver Neoplasms/drug therapy , Liver Neoplasms/pathology , Male , Mice , Mice, Nude , Mitochondria, Liver/pathology , Molecular Chaperones , Neoplasm Transplantation , Proteomics/methods , Transplantation, Heterologous , Xanthones/chemistry , Xenograft Model Antitumor Assays/methods
17.
Apoptosis ; 17(8): 842-51, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22610480

ABSTRACT

Gamboge is a traditional Chinese medicine, and our previous study showed that gambogic acid and gambogenic acid suppress the proliferation of HCC cells. In the present study, another active component, 1,3,6,7-tetrahydroxyxanthone (TTA), was found to effectively suppress HCC cell growth. In addition, our Hoechst-PI staining and flow cytometry analyses indicated that TTA induced apoptosis in HCC cells. To identify the targets of TTA in HCC cells, two-dimensional gel electrophoresis was performed, and differentially expressed proteins were identified by MALDI-TOF MS and MS/MS analyses. In total, eighteen differentially expressed proteins were identified, of which twelve were up-regulated and six were down-regulated. Among them, the four most distinctively expressed proteins were further studied and validated by western blotting. β-tubulin and translationally controlled tumor protein were decreased, while 14-3-3σ and P16 protein expression was up-regulated. In addition, TTA suppressed tumorigenesis partially through P16-pRb signaling. Silencing 14-3-3σ reversed the TTA-induced growth suppression and apoptosis. In conclusion, TTA effectively suppressed cell growth, at least partially through up-regulation of P16 and 14-3-3σ.


Subject(s)
Antineoplastic Agents, Phytogenic/pharmacology , Apoptosis/drug effects , Carcinoma, Hepatocellular/drug therapy , Drugs, Chinese Herbal/pharmacology , Liver Neoplasms/drug therapy , Proteome/metabolism , Xanthones/pharmacology , 14-3-3 Proteins/genetics , 14-3-3 Proteins/metabolism , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Carcinoma, Hepatocellular/metabolism , Carcinoma, Hepatocellular/pathology , Cell Line, Tumor , Cell Proliferation/drug effects , Cell Survival/drug effects , Cyclin-Dependent Kinase Inhibitor p16/genetics , Cyclin-Dependent Kinase Inhibitor p16/metabolism , Exonucleases/genetics , Exonucleases/metabolism , Exoribonucleases , Garcinia/chemistry , Gene Expression/drug effects , Gene Knockdown Techniques , Humans , Liver Neoplasms/metabolism , Liver Neoplasms/pathology , Proteome/genetics , Proteomics , RNA Interference , Signal Transduction
18.
Bioinformatics ; 27(4): 471-8, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21193520

ABSTRACT

MOTIVATION: The bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental protein-DNA interactions in transcriptional regulation. Extensive efforts have been made to better understand these protein-DNA interactions. Recent mining of exact TF-TFBS-associated sequence patterns (rules) has shown great potential and achieved very promising results. However, exact rules cannot handle variations in real data, which limits the number of informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which is essential for accommodating biological variation. RESULTS: A progressive approach is proposed to address the approximation and alleviate the computational requirements. First, similar TFBSs are grouped from the available TF-TFBS data (TRANSFAC database). Second, approximate and highly conserved binding cores are discovered from the TF sequences corresponding to each TFBS group. A customized algorithm is developed for this specific objective. We discover the approximate TF-TFBS rules by associating the grouped TFBS consensuses with the TF cores. The rules discovered are evaluated by matching them against (verifying them with) the actual protein-DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results exhibit many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64-79%) are shown to be statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules accurately reveal both the flexible and the specific protein-DNA interactions. The approximate TF-TFBS rules discovered show a strong, generalized capability for exploring more informative binding rules.


Subject(s)
Algorithms , DNA-Binding Proteins/genetics , DNA/genetics , Transcription Factors/genetics , Base Sequence , Binding Sites , Computational Biology/methods , DNA/metabolism , DNA-Binding Proteins/metabolism , Gene Expression Regulation , Protein Binding , Protein Structure, Tertiary , Transcription Factors/metabolism
19.
Article in English | MEDLINE | ID: mdl-21030733

ABSTRACT

Finding transcription factor binding sites, i.e., motif discovery, is crucial for understanding gene regulatory relationships. Motifs are weakly conserved, and motif discovery is an NP-hard problem. We propose a new approach called Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model that allows a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find complete and good starting candidate motifs from the DNA sequences. CRMD then employs an effective greedy refinement to search for optimal motifs among the candidate motifs. The refinement is fast, and it changes the number of motif instances based on adaptive thresholds. The performance of CRMD is further enhanced if the problem has one occurrence of a motif instance per sequence. Using an appropriate similarity test of motifs, CRMD is also able to find multiple motifs. CRMD has been tested extensively on synthetic and real data sets. The experimental results verify that CRMD usually outperforms four other state-of-the-art algorithms in terms of solution quality, with competitive computing time. It strikes a good balance between finding true motif instances and screening out false motif instances, and it is robust on problems of various levels of difficulty.
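
As a rough illustration of the kind of entropy measure that an entropy-based motif model can build on, the sketch below computes the Shannon entropy of each column in a set of aligned candidate motif instances. It is a generic example, not the CRMD clustering or refinement procedure, whose details are not given in the abstract. The function name `column_entropies` and the toy sequences are assumptions made for illustration only.

```python
import math
from collections import Counter


def column_entropies(instances):
    """Shannon entropy (in bits) of each column of equal-length aligned DNA strings."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    entropies = []
    for column in zip(*instances):
        counts = Counter(column)
        total = len(column)
        # Sum -p * log2(p) over the nucleotides observed in this column.
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical aligned instances (not taken from any real data set).
toy_instances = ["TGACA", "TGACT", "TGATC", "TGAGG"]
entropies = column_entropies(toy_instances)
```

Columns with entropy near zero are strongly conserved, while columns approaching 2 bits are close to uniform over the four nucleotides. A low total entropy is one simple signal that a group of instances may form a good starting candidate.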


Subject(s)
Algorithms , Computational Biology/methods , Regulatory Elements, Transcriptional , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Binding Sites , Cluster Analysis , Entropy
20.
Nucleic Acids Res ; 38(19): 6324-37, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20529874

ABSTRACT

Protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles of protein-DNA binding. However, it is generally considered that there are no simple one-to-one rules between amino acids and nucleotides. Many methods impose complicated features beyond sequence patterns. Protein-DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. Therefore, it is desirable to investigate associated sequence patterns between TFs and TFBSs. With increasing computational power, the availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF-TFBS binding sequence patterns from TRANSFAC in the most explicit and interpretable form. The framework is based on association rule mining with the Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verification from the literature, the Protein Data Bank and homology modeling, there is strong evidence that the patterns discovered reveal real TF-TFBS bindings across different TFs and TFBSs, which can drive further knowledge toward a better understanding of TF-TFBS bindings.
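
The Apriori step mentioned above can be sketched as level-wise frequent-itemset mining. The example below is a minimal, generic version under stated assumptions: each transaction is taken to be the set of short sequence patterns observed in one TF-TFBS pair, and the `TF:`/`DNA:` item labels, the toy transactions and the support threshold are all hypothetical. It is not the authors' framework and uses no TRANSFAC data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of the surviving candidates.
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: one set of observed patterns per TF-TFBS pair.
toy_transactions = [
    {"TF:RKR", "DNA:TGAC"},
    {"TF:RKR", "DNA:TGAC", "TF:CPE"},
    {"TF:RKR", "DNA:TGAC"},
    {"TF:CPE", "DNA:GGAA"},
]
result = apriori(toy_transactions, min_support=3)
```

In this toy run, the only frequent pair links the amino acid pattern `TF:RKR` with the nucleotide pattern `DNA:TGAC`, which is the shape an associated TF-TFBS pattern would take. A fuller pipeline would go on to derive rules with confidence measures from the frequent itemsets.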


Subject(s)
DNA-Binding Proteins/chemistry , DNA/chemistry , Data Mining/methods , Regulatory Elements, Transcriptional , Sequence Analysis, DNA , Transcription Factors/chemistry , Algorithms , Binding Sites , DNA/metabolism , DNA-Binding Proteins/metabolism , Databases, Genetic , Structural Homology, Protein , Transcription Factors/metabolism