Results 1 - 20 of 23
1.
Artif Intell Med ; 113: 102033, 2021 03.
Article in English | MEDLINE | ID: mdl-33685589

ABSTRACT

Sentiments associated with assessments and observations recorded in a clinical narrative can often indicate a patient's health status. To perform sentiment analysis on clinical narratives, domain-specific knowledge concerning meanings of medical terms is required. In this study, semantic types in the Unified Medical Language System (UMLS) are exploited to improve lexicon-based sentiment classification methods. For sentiment classification using SentiWordNet, the overall accuracy is improved from 0.582 to 0.710 by using logistic regression to determine appropriate polarity scores for UMLS 'Disorders' semantic types. For sentiment classification using a trained lexicon, when disorder terms in a training set are replaced with their semantic types, classification accuracies are improved on some data segments containing specific semantic types. To select an appropriate classification method for a given data segment, classifier combination is proposed. Using classifier combination, classification accuracies are improved on most data segments, with the overall accuracy of 0.882 being obtained.


Subject(s)
Semantics , Unified Medical Language System , Humans
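
As a rough illustration of the lexicon-based approach described in the abstract above, the following Python sketch replaces disorder terms with a semantic type label and sums polarity scores. It is not the authors' implementation: the term-to-type mapping, the score values, and the function names `normalize` and `classify` are all hypothetical, and the abstract states that the scores for 'Disorders' semantic types are fitted by logistic regression rather than set by hand.

```python
# Toy lexicon-based sentiment scorer with semantic-type substitution.
# Every term, label, and score here is a hypothetical illustration.

# Hypothetical mapping from disorder terms to a semantic type label.
SEMANTIC_TYPES = {
    "pneumonia": "DISEASE_OR_SYNDROME",
    "fracture": "INJURY_OR_POISONING",
}

# Hypothetical polarity scores (hand-set here purely for illustration).
POLARITY = {
    "DISEASE_OR_SYNDROME": -0.8,
    "INJURY_OR_POISONING": -0.6,
    "improved": 0.7,
    "stable": 0.4,
    "worsening": -0.7,
}


def normalize(tokens):
    """Replace known disorder terms with their semantic type label."""
    return [SEMANTIC_TYPES.get(t, t) for t in tokens]


def classify(sentence):
    """Sum polarity scores over tokens; unknown tokens contribute 0."""
    tokens = normalize(sentence.lower().split())
    score = sum(POLARITY.get(t, 0.0) for t in tokens)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score
```

Because every disorder term shares the score of its semantic type, a term that never appeared in training data can still be scored, which is one plausible reading of why the substitution helps on some data segments.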
2.
Curr Pharm Des ; 25(10): 1134-1143, 2019.
Article in English | MEDLINE | ID: mdl-31038058

ABSTRACT

BACKGROUND: Post-marketing pharmaceutical surveillance, also known as pragmatic clinical trials (PCT), plays a vital role in preventing accidents in practical treatment. The most important and difficult task in PCT is to assess, from clinical texts, which drug causes adverse drug reactions (ADRs). Confounding (i.e., factors that confuse causality assessment) arises from polypharmacy (i.e., the use of multiple drugs), which makes most existing methods perform poorly at detecting the drugs capable of causing the observed ADRs. OBJECTIVE: We aim to improve the performance of detecting drug-ADR causal relations from clinical texts. To this end, a mechanism for reducing the impact of confounding on the detection process is needed. METHODS: We propose a novel model, called analogy-based active voting (AAV), for improving the ability to detect causal drug-ADR pairs when multiple drugs are prescribed to treat comorbidity. The model is inspired by the analogy principle proposed by Bradford Hill. RESULTS: The experimental results show improved recognition of causal relations between drugs and ADRs that are confirmed by SIDER. In addition, the proposed model shows promise for detecting infrequently observed causal drug-ADR pairs when the drug is not commonly used. CONCLUSION: The proposed model demonstrates its ability to control polypharmacy-induced confounding and thereby improve the quality of causality assessment of ADRs. It also shows that the analogy principle is applicable to this assessment.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Polypharmacy , Causality , Comorbidity , Humans , Product Surveillance, Postmarketing
3.
J Healthc Inform Res ; 3(2): 220-244, 2019 Jun.
Article in English | MEDLINE | ID: mdl-35415423

ABSTRACT

The existence of a massive quantity of clinical text in electronic medical records (EMRs) has created significant demand for clinical text processing and information extraction in health care and medical research. Detailed clinical observations of patients are typically recorded chronologically. Temporal information in such clinical texts consists of three elements: temporal expressions, temporal events, and temporal relations. Owing to the implicit expression of temporal information, the poor writing quality, and the domain-specific nature of clinical text, extracting temporal information is much more complex than for newswire texts. In spite of these difficulties, a few research works have reported rule-based, machine-learning, and hybrid methods that extract temporal information using annotated corpora. However, creating annotated corpora is expensive, time-consuming, and demands significant human effort, and the processing quality is inevitably affected by the small size of the corpora. Motivated by this issue, we present a novel method to effectively extract temporal information from EMR clinical texts. The essential idea of this method is first to build a feature set appropriate for clinical expressions, then to develop a semi-supervised framework for temporal event extraction, and finally to detect temporal relations among events with a newly formulated hypothesis. Comparative experimental evaluation on the I2B2 data set has clearly shown improved performance of the proposed methods. Specifically, temporal event and relation extraction achieve F-measures of 89.98% and 67.1%, respectively.

4.
J Med Internet Res ; 18(12): e323, 2016 12 16.
Article in English | MEDLINE | ID: mdl-27986644

ABSTRACT

BACKGROUND: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. OBJECTIVE: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. METHODS: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. RESULTS: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. CONCLUSIONS: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community.


Subject(s)
Biomedical Research/methods , Data Interpretation, Statistical , Machine Learning , Biomedical Research/standards , Humans , Interdisciplinary Studies , Models, Biological
5.
Curr Pharm Des ; 22(23): 3498-526, 2016.
Article in English | MEDLINE | ID: mdl-27157416

ABSTRACT

BACKGROUND: Many factors directly or indirectly cause adverse drug reactions (ADRs), ranging from pharmacological, immunological and genetic factors to ethnic, age, gender and social factors, as well as drug- and disease-related ones. At the same time, advanced methods of statistics, machine learning and data mining allow users to analyze data more effectively for descriptive and predictive purposes. The rapid changes in this field make it difficult to follow the research progress and context of ADR detection and prediction. METHODS: A large number of articles on ADRs from the last twenty years were collected. These articles are grouped by the recent data types used to study ADRs: omics, social media and electronic medical records (EMRs), and reviewed in terms of the problem addressed, the datasets used and the methods. RESULTS: Three corresponding tables are established, providing brief information on the research on ADR detection and prediction. CONCLUSION: The data-driven approach has been shown to be powerful in ADR detection and prediction. The review helps researchers and pharmacists gain a quick overview of the current status of ADR detection and prediction.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Data Mining , Electronic Health Records , Female , Humans , Male
6.
BMC Bioinformatics ; 16: 80, 2015 Mar 13.
Article in English | MEDLINE | ID: mdl-25888201

ABSTRACT

BACKGROUND: Short interfering RNAs (siRNAs) can knock down target genes and thus have an immense impact on biology and pharmacy research. The key question of which siRNAs have high knockdown ability remains challenging, as currently known results are still far from expectations. RESULTS: This work aims to develop a generic framework to enhance siRNA knockdown efficacy prediction. The key idea is first to enrich siRNA sequences by incorporating rules found for designing effective siRNAs and representing them as enriched matrices, and then to employ bilinear tensor regression to predict the knockdown efficacy of those matrices. Experiments show that the proposed method achieves better results than existing models in most cases. CONCLUSIONS: Our model not only provides a suitable siRNA representation but also predicts siRNA efficacy more accurately and stably than most state-of-the-art models. Source code is freely available on the web at: http://www.jaist.ac.jp/~bao/BiLTR/ .


Subject(s)
Algorithms , RNA, Small Interfering/genetics , Regression Analysis , Sequence Analysis, RNA/methods , Computer Simulation , Humans , RNA Interference
7.
J Chem Phys ; 140(22): 225101, 2014 Jun 14.
Article in English | MEDLINE | ID: mdl-24929413

ABSTRACT

Antarctic bacterium antifreeze proteins (AFPs) protect and support the survival of cold-adapted organisms by binding to ice crystals and inhibiting their growth. The mechanism by which Antarctic bacterium AFPs prevent freezing in a low-temperature water environment remains unclear. In this research, we study the effects of Antarctic bacterium AFPs by coarse-grained simulations in solution over a temperature range from 262 to 273 K. The results indicate that Antarctic bacterium AFPs are fully active at temperatures greater than 265 K. Additionally, the specific temperature ranges at which the water molecules become completely frozen, partially frozen, and not frozen were identified.


Subject(s)
Antifreeze Proteins/chemistry , Crystallization , Ice , Bacteria/chemistry , Bacterial Physiological Phenomena , Freezing , Water/chemistry
8.
J Chem Phys ; 140(4): 044101, 2014 Jan 28.
Article in English | MEDLINE | ID: mdl-25669499

ABSTRACT

We develop a method that combines data mining and first-principles calculation to guide the design of distorted cubane Mn(4+)Mn3(3+) single-molecule magnets. The essential idea of the method is a process consisting of sparse regressions and cross-validation for analyzing calculated data of the materials. The method allows us to demonstrate that the exchange coupling between Mn(4+) and Mn(3+) ions can be predicted with high accuracy from the electronegativities of the constituent ligands and the structural features of the molecule by a linear regression model. The relations between the structural features and magnetic properties of the materials are quantitatively and consistently evaluated and presented as a graph. We also discuss the properties of the materials and offer guidance for material design based on the obtained results.

9.
Article in English | MEDLINE | ID: mdl-22848139

ABSTRACT

UNLABELLED: Eukaryotic gene transcription is a complex process, which requires the orchestrated recruitment of a large number of proteins, such as sequence-specific DNA binding factors, chromatin remodelers and modifiers, and general transcription machinery, to regulatory regions. Previous works have shown that these regulatory proteins favor specific organizational themes along promoters. Details about how they cooperatively regulate the transcriptional process, however, remain unclear. We developed a method to reconstruct a Bayesian network (BN) model representing functional relationships among various transcriptional components. The positive/negative influence between these components was measured from protein binding and nucleosome occupancy data and embedded into the model. Application to S. cerevisiae ChIP-Chip data showed that the proposed method can recover confirmed relationships, such as Isw1-Pol II, TFIIH-Pol II, TFIIB-TBP, Pol II-H3K36Me3, H3K4Me3-H3K14Ac, etc. Moreover, it can distinguish colocating components from functionally related ones. Novel relationships, e.g., ones between Mediator and chromatin remodeling complexes (CRCs), and the combinatorial regulation of Pol II recruitment and activity by CRCs and general transcription factors (GTFs), were also suggested. CONCLUSION: Protein binding events during transcription positively influence each other. Among contributing components, GTFs and CRCs play pivotal roles in transcriptional regulation. These findings provide insights into the regulatory mechanism.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Models, Genetic , Transcription Factors/genetics , Transcription, Genetic , Bayes Theorem , Chromatin Assembly and Disassembly/genetics , Chromatin Immunoprecipitation , Histones/chemistry , Histones/genetics , Histones/metabolism , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
10.
Artif Intell Med ; 54(1): 63-71, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22000346

ABSTRACT

OBJECTIVE: Predicting or prioritizing the human genes that cause disease, or "disease genes", is one of the emerging tasks in biomedical informatics. Research on the network-based approach to this problem rests on the key assumption that "the network-neighbour of a disease gene is likely to cause the same or a similar disease", and mostly employs data on well-known disease genes with supervised learning methods. This work aims to find an effective method to exploit the disease gene neighbourhood and the integration of several useful omics data sources, which can potentially enhance disease gene prediction. METHODS: We present a novel method to effectively predict disease genes by exploiting, in a semi-supervised learning (SSL) scheme, data on both disease genes and disease gene neighbours via a protein-protein interaction network. Multiple proteomic and genomic data were integrated from six biological databases, including Universal Protein Resource, Interologous Interaction Database, Reactome, Gene Ontology, Pfam, and InterDom, and a gene expression dataset. RESULTS: In 10 repetitions of stratified 10-fold cross-validation, the SSL method performs better than the k-nearest neighbour method and the support vector machine method, with a sensitivity of 85%, specificity of 79%, precision of 81%, accuracy of 82%, and a balanced F-function of 83%. Further comparative experimental evaluations demonstrate the advantages of the proposed method given a small amount of labeled data, with an accuracy of 78%. We applied the proposed method to detect 572 putative disease genes, which are biologically validated in some indirect ways. CONCLUSION: Semi-supervised learning improved the ability to study disease genes, especially for a specific disease, where the known disease genes (as labeled data) are very often limited. In addition to the computational improvement, the analysis of predicted disease proteins indicates that the findings are beneficial in deciphering the pathogenic mechanisms.


Subject(s)
Artificial Intelligence , Disease/genetics , Genetic Predisposition to Disease/genetics , Probability Learning , Protein Interaction Maps/genetics , Algorithms , Databases, Nucleic Acid , Databases, Protein , Genomics , Humans , Protein Interaction Mapping/methods , Proteomics , Sensitivity and Specificity
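
The performance figures quoted in the abstract above can be related to one another with the standard metric definitions. The short Python sketch below recomputes the balanced F-measure from the reported precision and sensitivity, and the accuracy under the assumption (not stated in the abstract) that positive and negative examples are equally frequent. The function names are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(sensitivity, specificity):
    """Accuracy when both classes are equally frequent (an assumption here)."""
    return (sensitivity + specificity) / 2


# Values reported in the abstract above.
sensitivity, specificity, precision = 0.85, 0.79, 0.81

f = f_measure(precision, sensitivity)  # recall is another name for sensitivity
acc = balanced_accuracy(sensitivity, specificity)
print(round(f, 2), round(acc, 2))  # prints: 0.83 0.82
```

Both recomputed values agree with the reported 83% F-function and 82% accuracy, which suggests, but does not prove, that the evaluation set was roughly class-balanced.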
11.
BMC Genomics ; 11 Suppl 4: S3, 2010 Dec 02.
Article in English | MEDLINE | ID: mdl-21143812

ABSTRACT

BACKGROUND: The nucleosome, the fundamental unit of chromatin, is formed by wrapping nearly 147 bp of DNA around an octamer of histone proteins. This histone core has many variants that differ from each other in their biochemical compositions as well as their biological functions. Although the deposition of histone variants onto chromatin has been implicated in many important biological processes, such as transcription and replication, the mechanisms by which they are deposited on target sites are still obscure. RESULTS: By analyzing genomic sequences of human nucleosomes bearing different histone variants, including H2A.Z, H3.3 and both (H3.3/H2A.Z, so-called double variant histones), we found that genomic sequence contributes in part to determining target sites for different histone variants. Moreover, the dinucleotides CA/TG are remarkably important in distinguishing target sites of H2A.Z-only nucleosomes from those of H3.3-containing (both H3.3-only and double variant) nucleosomes. CONCLUSIONS: There exists a DNA-related mechanism regulating the deposition of different histone variants onto chromatin and the biological outcomes thereof. This provides additional insights into the epigenetic regulatory mechanisms of many important cellular processes.


Subject(s)
Histones/chemistry , Histones/metabolism , Amino Acid Motifs/genetics , Base Sequence , Chromatin/chemistry , Chromatin/metabolism , Computational Biology , DNA Replication , Epigenomics , Genetic Variation , Histones/genetics , Humans , Nucleosomes/chemistry , Nucleosomes/genetics , Nucleosomes/metabolism , Protein Structure, Tertiary/genetics , Reproducibility of Results
12.
Bioinformation ; 4(8): 371-7, 2010 Feb 28.
Article in English | MEDLINE | ID: mdl-20975901

ABSTRACT

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression at the post-transcriptional level. They play an important role in several biological processes such as cell development and differentiation. Similar to transcription factors (TFs), miRNAs regulate gene expression in a combinatorial fashion, i.e., an individual miRNA can regulate multiple genes, and an individual gene can be regulated by multiple miRNAs. The functions of TFs in biological regulatory networks have been well explored. And, recently, a few studies have explored miRNA functions in the context of gene regulation networks. However, how TFs and miRNAs function together in the gene regulatory network has not yet been examined. In this paper, we propose a new computational method to discover the gene regulatory modules that consist of miRNAs, TFs, and genes regulated by them. We analyzed the regulatory associations among the sets of predicted miRNAs and sets of TFs on the sets of genes regulated by them in the human genome. We found 182 gene regulatory modules of combinatorial regulation by miRNAs and TFs (miR-TF modules). By validating these modules with the Gene Ontology (GO) and the literature, it was found that our method allows us to detect functionally-correlated gene regulatory modules involved in specific biological processes. Moreover, our miR-TF modules provide a global view of coordinated regulation of target genes by miRNAs and TFs.

13.
BMC Genomics ; 10 Suppl 3: S27, 2009 Dec 03.
Article in English | MEDLINE | ID: mdl-19958491

ABSTRACT

BACKGROUND: Eukaryotic genomes are packaged into chromatin, a compact structure containing fundamental repeating units, the nucleosomes. The mobility of nucleosomes plays important roles in many DNA-related processes by regulating the accessibility of regulatory elements to biological machineries. Although it is known that various factors, such as DNA sequences, histone modifications, and chromatin remodelling complexes, can affect nucleosome stability, the mechanisms by which they regulate this stability are still unclear. RESULTS: In this paper, we propose a novel computational method based on rule induction learning to characterize nucleosome dynamics using both genomic and histone modification information. When applied to S. cerevisiae data, our method produced a total of 98 rules characterizing nucleosome dynamics on chromosome III and promoter regions. Analyzing these rules, we discovered that some DNA motifs and post-translational modifications of histone proteins play significant roles in regulating nucleosome stability. Notably, these DNA motifs are strong determinants of nucleosome forming and inhibiting potential, and these histone modifications are strongly related to transcriptional activities, i.e. activation and repression. We also found some new patterns which may reflect the cooperation between these two factors in regulating the stability of nucleosomes. CONCLUSION: DNA motifs and histone modifications can individually and, in some cases, cooperatively regulate nucleosome stability. This suggests additional insights into mechanisms by which cells control important biological processes, such as transcription, replication, and DNA repair.


Subject(s)
Epigenesis, Genetic , Genome , Genomics/methods , Nucleosomes/chemistry , Base Sequence , Histones/metabolism , Nucleosomes/genetics , Nucleosomes/metabolism , Protein Processing, Post-Translational , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
14.
BMC Bioinformatics ; 9 Suppl 12: S5, 2008 Dec 12.
Article in English | MEDLINE | ID: mdl-19091028

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a class of small non-coding RNA molecules (20-24 nt), which are believed to participate in repression of gene expression. They play important roles in several biological processes (e.g. cell death and cell growth). Both experimental and computational approaches have been used to determine the function of miRNAs in cellular processes. Most efforts have concentrated on identification of miRNAs and their target genes. However, understanding the regulatory mechanism of miRNAs in the gene regulatory network is also essential to the discovery of functions of miRNAs in complex cellular systems. To understand the regulatory mechanism of miRNAs in complex cellular systems, we need to identify the functional modules involved in complex interactions between miRNAs and their target genes. RESULTS: We propose a rule-based learning method to identify groups of miRNAs and target genes that are believed to participate cooperatively in post-transcriptional gene regulation, so-called miRNA regulatory modules (MRMs). Applying our method to human genes and miRNAs, we found 79 MRMs. The MRMs are produced from multiple information sources, including miRNA-target binding information, gene expression and miRNA expression profiles. Analysis of the first two MRMs shows that these MRMs consist of highly-related miRNAs and their target genes with respect to biological processes. CONCLUSION: The MRMs found by our method have high correlation in expression patterns of miRNAs as well as mRNAs. The mRNAs included in the same module share similar biological functions, indicating the ability of our method to detect functionality-related genes. Moreover, review of the literature reveals that miRNAs in a module are involved in several types of human cancer.


Subject(s)
Computational Biology/methods , Gene Expression Regulation, Neoplastic , Gene Expression Regulation , Gene Regulatory Networks , MicroRNAs , Neoplasms/metabolism , Databases, Factual , Genome, Human , Humans , Models, Biological , Models, Genetic , Models, Statistical , Neoplasms/genetics , RNA Processing, Post-Transcriptional , Transcription, Genetic
15.
J Bioinform Comput Biol ; 6(6): 1115-32, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19090020

ABSTRACT

Protein-protein interactions (PPIs) are intrinsic to almost all cellular processes. Different computational methods offer new chances to study PPIs. To predict PPIs, while the integrative methods use multiple data sources instead of a single source, the domain-based methods often use only protein domain features. Integration of both protein domain features and genomic/proteomic features from multiple databases can more effectively predict PPIs. Moreover, it allows discovering the reciprocal relationships between PPIs and biological features of their interacting partners. We developed a novel integrative domain-based method for predicting PPIs using inductive logic programming (ILP). Two principal domain features used were domain fusions and domain-domain interactions (DDIs). Various relevant features of proteins were exploited from five popular genomic and proteomic databases. By integrating these features, we constructed biologically significant ILP background knowledge of more than 278,000 ground facts. The experimental results through multiple 10-fold cross-validations demonstrated that our method predicts PPIs better than other computational methods in terms of typical performance measures. The proposed ILP framework can be applied to predict DDIs with high sensitivity and specificity. The induced ILP rules gave us many interesting, biologically reciprocal relationships among PPIs, protein domains, and PPI-related genomic/proteomic features. Supplementary material is available at (http://www.jaist.ac.jp/~s0560205/PPIandDDI/).


Subject(s)
Protein Interaction Domains and Motifs , Protein Interaction Mapping/statistics & numerical data , Algorithms , Computational Biology , Databases, Genetic , Databases, Protein , Genomics/statistics & numerical data , Proteomics/statistics & numerical data
16.
Stud Health Technol Inform ; 129(Pt 2): 1304-8, 2007.
Article in English | MEDLINE | ID: mdl-17911925

ABSTRACT

User-centered universal tools for analyzing laboratory data by data mining have not been available in medicine. We analyzed 1,565,877 laboratory data items from 771 patients with viral hepatitis in order to find differences in the temporal changes of laboratory test data between Hepatitis B and Hepatitis C, using a combination of temporal abstraction and data mining. The data for a patient cover a period of more than 5 years. After preprocessing, the data were converted to abstract patterns and then selected into sets of data combinations and rules to identify Hepatitis B or C by D2MS and LUPC, which we developed ourselves. Not only data patterns but also temporal relations were considered as part of the rules. When domain experts evaluated the results, no particularly remarkable hypotheses emerged, but visualization tools made it easier for them to understand the relations among the complicated rules.


Subject(s)
Data Display , Hepatitis B/diagnosis , Hepatitis C/diagnosis , Information Storage and Retrieval/methods , Liver Function Tests , Humans , Time Factors
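
The abstract above does not spell out how laboratory values were converted to abstract patterns. As a minimal sketch of one common form of temporal abstraction (state abstraction followed by merging of consecutive identical states), the Python below maps each numeric value to Low/Normal/High and collapses runs into episodes. The reference range, the state labels, and the function names are assumptions for illustration, not details of D2MS or LUPC.

```python
def state_abstraction(value, low, high):
    """Map a numeric lab value to a qualitative state (L, N, or H)."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


def abstract_series(values, low, high):
    """Collapse consecutive identical states into (state, run_length) episodes."""
    episodes = []
    for v in values:
        s = state_abstraction(v, low, high)
        if episodes and episodes[-1][0] == s:
            # Extend the current episode by one measurement.
            episodes[-1] = (s, episodes[-1][1] + 1)
        else:
            # A state change starts a new episode.
            episodes.append((s, 1))
    return episodes
```

For example, with a hypothetical reference range of 10 to 40, the series `[20, 35, 90, 120, 40]` becomes two normal readings, two high readings, and one normal reading, a compact pattern that a rule learner can compare across patients.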
17.
Genome Inform ; 17(2): 35-45, 2006.
Article in English | MEDLINE | ID: mdl-17503377

ABSTRACT

The objective of this paper is twofold. One objective is to present a method of predicting signaling domain-domain interactions (signaling DDI) using inductive logic programming (ILP), and the other is to present a method of discovering signal transduction networks (STN) using signaling DDI. Research on computational methods for discovering STN has received much attention because of the importance of STN in transmitting inter- and intra-cellular signals. Unlike previous STN works functioning at the protein/gene levels, our STN method functions at the protein domain level, on signaling domain interactions, which allows discovering more reliable and stable STN. We can mostly reconstruct the STN of yeast MAPK pathways from the inferred signaling domain interactions, with a coverage of 85%. For the problem of predicting signaling DDI, we have successfully constructed a database of more than twenty-four thousand ground facts from five popular genomic and proteomic databases. We also showed the advantage of ILP in signaling DDI prediction from the constructed database, with high sensitivity (88%) and accuracy (83%). Studying yeast MAPK STN, we found some new signaling domain interactions that do not exist in the well-known InterDom database. Supplementary materials are now available from http://www.jaist.ac.jp/s0560205/STP_DDI/.


Subject(s)
Protein Interaction Mapping/methods , Protein Structure, Tertiary , Signal Transduction , Algorithms , Artificial Intelligence , Computational Biology/methods , Databases, Genetic , Databases, Protein , False Positive Reactions , MAP Kinase Signaling System/physiology , Mitogen-Activated Protein Kinases/metabolism , Models, Biological , Proteome/chemistry , Proteome/genetics , Proteome/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Sensitivity and Specificity
18.
Bioinformatics ; 21 Suppl 2: ii101-7, 2005 Sep 01.
Article in English | MEDLINE | ID: mdl-16204087

ABSTRACT

MOTIVATION: Even in a simple organism like yeast Saccharomyces cerevisiae, transcription is an extremely complex process. The expression of sets of genes can be turned on or off by the binding of specific transcription factors to the promoter regions of genes. Experimental and computational approaches have been proposed to establish mappings of DNA-binding locations of transcription factors. However, although location data obtained from experimental methods are noisy owing to imperfections in the measuring methods, computational approaches suffer from over-prediction problems owing to the short length of the sequence motifs bound by the transcription factors. Also, these interactions are usually environment-dependent: many regulators only bind to the promoter region of genes under specific environmental conditions. Even more, the presence of regulators at a promoter region indicates binding but not necessarily function: the regulator may act positively, negatively or not act at all. Therefore, identifying true and functional interactions between transcription factors and genes in specific environment conditions and describing the relationship between them are still open problems. RESULTS: We developed a method that combines expression data with genomic location information to discover (1) relevant transcription factors from the set of potential transcription factors of a target gene; and (2) the relationship between the expression behavior of a target gene and that of its relevant transcription factors. Our method is based on rule induction, a machine learning technique that can efficiently deal with noisy domains. When applied to genomic location data with a confidence criterion relaxed to P-value = 0.005, and three different expression datasets of yeast S.cerevisiae, we obtained a set of regulatory rules describing the relationship between the expression behavior of a specific target gene and that of its relevant transcription factors. 
The resulting rules provide strong evidence of true positive gene-regulator interactions, as well as of protein-protein interactions that could serve to identify transcription complexes. AVAILABILITY: Supplementary files are available from http://www.jaist.ac.jp/~h-pham/regulatory-rules


Subject(s)
Chromosome Mapping/methods , Gene Expression Profiling/methods , Gene Expression Regulation/genetics , Regulatory Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Transcription, Genetic/genetics , Base Sequence , Binding Sites , Molecular Sequence Data , Promoter Regions, Genetic/genetics , Protein Binding
19.
J Bioinform Comput Biol ; 3(2): 343-58, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15852509

ABSTRACT

Tight turns have long been recognized as one of the three important features of proteins, together with the alpha-helix and the beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns, and most of the rest are gamma-turns. Analysis and prediction of beta-turns and gamma-turns are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper we investigate two aspects of applying the support vector machine (SVM), a promising machine learning method for bioinformatics, to the prediction and analysis of beta-turns and gamma-turns. First, we developed two SVM-based methods, called BTSVM and GTSVM, which predict beta-turns and gamma-turns in a protein from its sequence. When compared with other methods, BTSVM has superior performance and GTSVM is competitive. Second, we used SVMs with a linear kernel to estimate the support of amino acids for the formation of beta-turns and gamma-turns depending on their position in a protein. Our analysis results are more comprehensive and easier to use than previous results in designing turns in proteins.


Subject(s)
Algorithms , Amino Acids/chemistry , Artificial Intelligence , Models, Molecular , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Cluster Analysis , Computer Simulation , Models, Chemical , Molecular Sequence Data , Protein Structure, Secondary , Proteins/analysis , Proteins/classification , Sequence Alignment/methods
20.
Genome Inform ; 16(2): 3-11, 2005.
Article in English | MEDLINE | ID: mdl-16901084

ABSTRACT

Eukaryotic genomes are packaged by the wrapping of DNA around histone octamers to form nucleosomes. Nucleosome occupancy, acetylation, and methylation, which have a major impact on all nuclear processes involving DNA, have been recently mapped across the yeast genome using chromatin immunoprecipitation and DNA microarrays. However, this experimental protocol is laborious and expensive. Moreover, experimental methods often produce noisy results. In this paper, we introduce a computational approach to the qualitative prediction of nucleosome occupancy, acetylation, and methylation areas in DNA sequences. Our method uses support vector machines to discriminate between DNA areas with high and low relative occupancy, acetylation, or methylation, and rank k-gram features based on their support for these DNA modifications. Experimental results on the yeast genome reveal genetic area preferences of nucleosome occupancy, acetylation, and methylation that are consistent with previous studies. Supplementary files are available from http://www.jaist.ac.jp/~tran/nucleosome/.


Subject(s)
DNA Methylation , DNA/chemistry , Sequence Analysis, DNA , Acetylation , Computational Biology/statistics & numerical data , DNA/metabolism , Histones/metabolism , Nucleosomes/chemistry , Nucleosomes/metabolism , Predictive Value of Tests , Sequence Analysis, DNA/statistics & numerical data
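
The abstract above mentions k-gram features of DNA sequences as input to a support vector machine. The Python sketch below illustrates only the feature-extraction step, counting overlapping k-grams and normalizing them to frequencies. The function names are hypothetical, and the classifier and the feature ranking described in the abstract are not reproduced here.

```python
from collections import Counter


def kgram_counts(sequence, k):
    """Count overlapping k-grams (substrings of length k) in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kgram_frequencies(sequence, k):
    """Normalize counts so that sequences of different lengths are comparable."""
    counts = kgram_counts(sequence, k)
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}
```

For the sequence `ACGTAC` with `k = 2`, the overlapping 2-grams are AC, CG, GT, TA, and AC, so `AC` has a count of 2 and a frequency of 0.4. A fixed-length feature vector could then be built by reading these frequencies in a fixed order over all 4^k possible k-grams.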