Results 1 - 18 of 18
1.
Mol Inform ; 43(6): e202300312, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38850133

ABSTRACT

Pregnant women may take medications to manage health problems that develop during pregnancy or that existed before it. However, medication use during pregnancy poses a potential risk to the fetus. Assessing the fetotoxicity of drugs is essential to ensure safe treatment, but the current process is constrained by ethical issues, time, and cost. Therefore, in silico models that can efficiently assess the fetotoxicity of drugs are needed. Previous studies have proposed successful machine learning models for fetotoxicity prediction and have even suggested molecular substructures that may be associated with fetotoxicity risk or protective effects. However, the models' decisions for individual drugs are still insufficiently interpreted. This study constructed machine learning-based models that predict the fetotoxicity of drugs while explaining their decisions. To this end, permutation feature importance was used to identify the features that were generally important to the models' fetotoxicity predictions. In addition, the features associated with fetotoxicity for each drug were analyzed using an attention mechanism. All constructed models achieved high predictive performance (AUROC: 0.854-0.974; AUPR: 0.890-0.975). Furthermore, literature reviews on the predicted important features showed that they were highly associated with fetotoxicity. We expect that our models will benefit fetotoxicity research by evaluating the fetotoxicity risk of drugs or drug candidates and providing an interpretation of each prediction.
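The permutation feature importance step mentioned above can be illustrated with a minimal, generic sketch. It assumes a fitted classifier exposed as a plain Python function and uses accuracy as the score; the abstract does not state which metric, data layout, or number of repeats the authors used, so `toy_model`, the toy data, and `n_repeats` are hypothetical. A feature's importance is the average drop in score after its column is shuffled.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
y = [1, 1, 0, 0] * 5

def toy_model(row):
    # Stand-in for a trained classifier; it only looks at feature 0.
    return row[0]

print(permutation_importance(toy_model, X, y))
```

Because `toy_model` ignores feature 1, shuffling that column never changes a prediction and its importance is exactly 0. The attention-based, per-drug explanations are a separate step not covered by this sketch.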


Subject(s)
Machine Learning , Humans , Pregnancy , Female , Drug-Related Side Effects and Adverse Reactions , Fetus/drug effects , Computer Simulation
2.
Article in English | MEDLINE | ID: mdl-38381638

ABSTRACT

The emergence of the novel coronavirus, designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has posed a significant threat to public health worldwide. Although progress has been made in reducing hospitalizations and deaths due to SARS-CoV-2, challenges remain because of emerging SARS-CoV-2 variants, which exhibit high transmission rates, increased disease severity, and the ability to evade humoral immunity. Epitope-specific T-cell receptor (TCR) recognition is key to determining T-cell immunogenicity for SARS-CoV-2 epitopes. Although several data-driven methods for predicting epitope-specific TCR recognition have been proposed, the task remains challenging because of the enormous diversity of TCRs and the lack of available training data. Self-supervised transfer learning has recently proven useful for extracting information from unlabeled protein sequences, improving the predictive performance of fine-tuned models even when relatively little training data is available. This study presents a deep-learning model generated by fine-tuning protein embeddings pre-trained on a large corpus of protein sequences. The fine-tuned model showed markedly high predictive performance and outperformed a recent Gaussian process-based prediction model. The attention weights output by the deep-learning model suggested critical amino acid positions in SARS-CoV-2 epitope-specific TCRβ sequences that are highly associated with viral escape from the T-cell immune response.


Subject(s)
COVID-19 , Computational Biology , Epitopes, T-Lymphocyte , Receptors, Antigen, T-Cell , SARS-CoV-2 , SARS-CoV-2/immunology , Humans , Epitopes, T-Lymphocyte/immunology , Epitopes, T-Lymphocyte/chemistry , Receptors, Antigen, T-Cell/immunology , Receptors, Antigen, T-Cell/chemistry , Receptors, Antigen, T-Cell/genetics , COVID-19/immunology , COVID-19/virology , Computational Biology/methods
3.
J Cheminform ; 16(1): 1, 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38173043

ABSTRACT

Safety is one of the most important factors constraining the market distribution of clinical drugs. Drug-induced liver injury (DILI) is the leading cause of safety problems arising from drug side effects. Therefore, the DILI risk of approved drugs and potential drug candidates should be assessed. Currently, in vivo and in vitro methods are used to test DILI risk, but both are labor-intensive, time-consuming, and expensive. To overcome these problems, many in silico methods for DILI prediction have been suggested. Previous studies have shown that DILI prediction models can serve as prescreening tools with good performance. However, interpreting their prediction results remains limited. Therefore, this study focused on interpreting model predictions to analyze which features could potentially cause DILI. To this end, five publicly available datasets were collected to train and test the models. Various machine learning methods were then applied, using substructure and physicochemical descriptors as inputs and the DILI label as the output. Feature importance was interpreted following a general-to-specific pattern: (i) identifying generally important features across all DILI predictions, and (ii) highlighting specific molecular substructures that were highly related to the DILI prediction for each compound. The results indicated that the models not only captured properties previously known to be related to DILI but also proposed new substructures and physicochemical properties potentially associated with DILI. The DILI prediction models achieved an area under the receiver operating characteristic curve (AUROC) of 0.88-0.97 and an area under the precision-recall curve (AUPRC) of 0.81-0.95. We hope the proposed models can help identify the potential DILI risk of drug candidates at an early stage and offer valuable insights for drug development.
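The two reported metrics can be computed from held-out prediction scores as sketched below. AUROC is computed through its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. For AUPRC, the sketch uses average precision, which is one common estimator; the abstract does not say which estimator the authors used, and the toy scores are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Average precision: mean of precision@rank over the ranks of the positives.

    Assumes no tied scores; ties would need a consistent tie-breaking rule.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical held-out predictions (1 = DILI-positive).
scores = [0.9, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
print(round(auroc(scores, labels), 3), round(average_precision(scores, labels), 3))  # → 0.667 0.806
```

The pairwise AUROC loop is quadratic in dataset size, which is fine for illustration; for large test sets a sort-based rank computation gives the same value more cheaply.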

4.
Genes (Basel) ; 14(9)2023 09 20.
Article in English | MEDLINE | ID: mdl-37761960

ABSTRACT

Cancer metastasis accounts for approximately 90% of cancer deaths, and identifying metastasis markers is a first step toward its prevention. To characterize metastasis marker genes (MGs) of breast cancer, XGBoost models that classify metastasis status were trained on gene expression profiles from TCGA. A metastasis score (MS) was then assigned to each gene by calculating the inner product between the gene's feature importance and the AUC performance of the models. As a result, 54, 202, and 357 genes with the highest MS were characterized as MGs at empirical p-value cutoffs of 0.001, 0.005, and 0.01, respectively. The three sets of MGs were compared with those from existing metastasis marker databases, yielding significant results in most comparisons (p-value < 0.05). They were also significantly enriched in biological processes associated with breast cancer metastasis. Three MGs, SPPL2C, KRT23, and RGS7, showed highly significant results (p-value < 0.01) in survival analysis. MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1), as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40), were verified through the literature. Additionally, we examined how close the MGs were to each other in protein-protein interaction networks. We expect that the characterized markers will help in understanding and preventing breast cancer metastasis.
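The metastasis score described above, an inner product between a gene's feature importance values and the models' AUCs, can be written as MS(g) = sum over models m of importance(g, m) × AUC(m). The sketch below assumes one importance value per gene per trained model; the gene names, importances, and AUCs are hypothetical, and the empirical p-value step is not reproduced.

```python
def metastasis_scores(importance, aucs):
    """MS(g) = sum over models m of importance[g][m] * aucs[m] (an inner product)."""
    return {
        gene: sum(imp * auc for imp, auc in zip(values, aucs))
        for gene, values in importance.items()
    }

# Hypothetical AUCs of three trained models and per-model importances per gene.
aucs = [0.9, 0.7, 0.8]
importance = {
    "GENE_A": [0.5, 0.1, 0.4],
    "GENE_B": [0.0, 0.6, 0.1],
    "GENE_C": [0.2, 0.2, 0.2],
}
scores = metastasis_scores(importance, aucs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # → ['GENE_A', 'GENE_B', 'GENE_C']
```

Weighting by AUC means that importance assigned by a poorly performing model contributes less to a gene's score than the same importance from an accurate model.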


Subject(s)
Breast Neoplasms , Neoplasms, Second Primary , RGS Proteins , Humans , Female , Breast Neoplasms/pathology , Transcriptome , Protein Interaction Maps , Machine Learning , Membrane Proteins/genetics , RGS Proteins/genetics , Melanoma, Cutaneous Malignant
5.
BMC Pediatr ; 23(1): 487, 2023 09 26.
Article in English | MEDLINE | ID: mdl-37752492

ABSTRACT

BACKGROUND: Children with physical or brain disabilities experience several functional impairments and worsening health complications that must be considered for adequate medical support. This study investigated current medical service utilization by children with physical or brain disabilities in South Korea by analyzing medical visits, expenses, and comorbidities. METHODS: We used a database linked to the National Rehabilitation Center of South Korea to extract information on medical services used by children with physical or brain disabilities, including the number of children with a disability, medical visits per child, medical expenses per visit, total medical treatment cost, and copayments, by age group, condition severity, and disability type. RESULTS: Brain disorder comorbidities differed significantly between children with mild and severe disabilities. Visits per child, total medical treatment cost, and copayments were higher in children with severe physical disabilities; however, their medical expenses per visit were lower than those of children with mild disabilities. These parameters were higher in children with severe brain disabilities than in those with mild brain disabilities. Among children with physical disorders, total medical expenses were highest for those aged 0-3 years because of more visits per child, whereas medical expenses per visit were highest for children aged 13-18 years. CONCLUSION: Medical service utilization varied by age, condition severity, and disability type. Severe cases and older children with potentially fatal comorbidities required additional economic support. Therefore, a healthcare delivery system for children with disabilities should be established to keep medical costs affordable and provide comprehensive medical services based on disability type and severity.


Subject(s)
Brain Diseases , Brain , Infant, Newborn , Child , Humans , Adolescent , Physical Examination , Republic of Korea , Brain Diseases/therapy , Health Care Costs
6.
Nutrients ; 14(23)2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36500992

ABSTRACT

Cataracts are a prevalent ophthalmic disease worldwide, and research on the risk factors for cataract occurrence is actively being conducted. This study aimed to investigate the relationship between nutrient intake and cataracts in the older adult population in Korea. Using the Korean National Health and Nutrition Examination Survey, we analyzed data from Korean adults aged over 60 years (cataract: 2,137; non-cataract: 3,497). We performed simple (univariate) and multiple logistic regressions, adjusting for socio-demographic, medical history, and lifestyle factors, to identify associations between nutrient intake and cataracts. In men, a higher intake of vitamin B1 was associated with a lower incidence of cataracts. In women, a lower intake of polyunsaturated fatty acids and vitamin A and a higher intake of vitamin B2 were associated with a higher incidence of cataracts. Our study showed that polyunsaturated fatty acids, vitamin A, and vitamin B2 could affect the incidence of cataracts differently by sex. These findings could be used to guide nutrient intake for cataract prevention.


Subject(s)
Cataract , Vitamin A , Male , Humans , Female , Aged , Middle Aged , Nutrition Surveys , Cross-Sectional Studies , Cataract/epidemiology , Cataract/etiology , Cataract/prevention & control , Riboflavin , Republic of Korea/epidemiology
7.
Front Biosci (Landmark Ed) ; 27(3): 80, 2022 03 04.
Article in English | MEDLINE | ID: mdl-35345312

ABSTRACT

BACKGROUND: Atrial fibrillation (AF) is a well-known risk factor for stroke. Predicting this risk is important for preventing first and recurrent cerebrovascular events through early treatment. This study aimed to predict ischemic stroke in AF patients from large and complex Korean National Health Insurance Service (KNHIS) data using a machine learning approach. METHODS: We extracted 65-dimensional features, including demographics, health examination, and medical history information, for 754,949 patients with AF from the KNHIS. Logistic regression was used to determine whether the extracted features had a statistically significant association with ischemic stroke occurrence. We then constructed an ischemic stroke prediction model using an attention-based deep neural network, trained with the extracted features as input and the occurrence of ischemic stroke after AF diagnosis as output. RESULTS: Regression analysis identified 48 features significantly associated with ischemic stroke occurrence (p-value < 0.001). When applied to 150,989 AF patients, the proposed deep learning model predicted ischemic stroke occurrence with a higher AUROC (0.727 ± 0.003) than the CHA2DS2-VASc score (AUROC = 0.651 ± 0.007) and other machine learning methods. CONCLUSIONS: As part of preventive medicine, this study could help AF patients prepare for ischemic stroke prevention based on the predicted stroke-associated features and risk scores.
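For reference, the CHA2DS2-VASc baseline that the model is compared against is a simple additive score: 1 point each for congestive heart failure, hypertension, diabetes, vascular disease, age 65-74, and female sex, and 2 points each for age 75 or older and prior stroke, TIA, or thromboembolism. The sketch below encodes that standard rule; the `Patient` field names are hypothetical and do not reflect how KNHIS variables are coded.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    # Hypothetical field names for illustration only.
    age: int
    female: bool
    chf: bool               # congestive heart failure
    hypertension: bool
    diabetes: bool
    prior_stroke_tia: bool  # prior stroke, TIA, or thromboembolism
    vascular_disease: bool  # prior MI, peripheral artery disease, or aortic plaque

def cha2ds2_vasc(p: Patient) -> int:
    """Standard CHA2DS2-VASc score (range 0-9)."""
    score = 0
    score += 1 if p.chf else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:
        score += 2          # A2: age 75 or older
    elif p.age >= 65:
        score += 1          # A: age 65-74
    score += 1 if p.diabetes else 0
    score += 2 if p.prior_stroke_tia else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0  # Sc: sex category
    return score

example = Patient(age=78, female=True, chf=False, hypertension=True,
                  diabetes=False, prior_stroke_tia=False, vascular_disease=False)
print(cha2ds2_vasc(example))  # → 4
```

A fixed point scheme like this uses a handful of binary risk factors, whereas the model in the abstract draws on 65 features per patient, which is one plausible reason for the AUROC gap reported above.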


Subject(s)
Atrial Fibrillation , Ischemic Stroke , Stroke , Atrial Fibrillation/complications , Atrial Fibrillation/diagnosis , Humans , Machine Learning , Risk Assessment/methods , Risk Factors , Stroke/diagnosis , Stroke/epidemiology , Stroke/etiology
8.
Int J Mol Sci ; 22(20)2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34681774

ABSTRACT

Genetic interactions (GIs), such as synthetic lethal interactions, are promising therapeutic targets in precision medicine. However, despite extensive efforts to characterize GIs by large-scale perturbation screening, multiple studies have reported a considerable number of false positives. We propose a new computational approach that improves precision in GI identification by applying constraints that reflect actual biological phenomena. In this study, GIs were characterized by assessing mutation, loss-of-function, and expression profiles in the DEPMAP database. The expression profiles were used to exclude loss-of-function data for nonexpressed genes from GI characterization. More importantly, the characterized GIs were refined based on Kyoto Encyclopedia of Genes and Genomes (KEGG) or protein-protein interaction (PPI) networks, under the assumption that genes genetically interacting with a given mutated gene are adjacent to it in the networks. As a result, the initial GIs characterized with CRISPR and RNAi screens were refined to 65 and 23 GIs, respectively, based on KEGG networks and to 183 and 142 GIs based on PPI networks. The refined GIs showed improved precision with respect to known synthetic lethal interactions. The refining process also yielded a synthetic partner network (SPN) for each mutated gene, which provides insight into therapeutic strategies for that gene; specifically, exploring the SPN of mutated BRAF revealed ELAVL1 as a potential target for treating BRAF-mutated cancer, a finding supported by previous research. We expect that this work will advance cancer therapeutic research.
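The two filtering constraints described above can be sketched as follows: a candidate pair (mutated gene, partner) is kept only if the partner is expressed and is a direct neighbor of the mutated gene in the network, and the surviving partners form that gene's synthetic partner network. This is a simplified illustration that assumes one-hop adjacency and a single global set of expressed genes (rather than per-cell-line expression); the gene identifiers and edges are hypothetical placeholders, not DEPMAP, KEGG, or PPI data.

```python
from collections import defaultdict

def refine_gis(candidates, edges, expressed):
    """Keep (mutated_gene, partner) pairs whose partner is expressed and is a
    direct network neighbor of the mutated gene; return one synthetic partner
    network (set of partners) per mutated gene."""
    neighbors = defaultdict(set)
    for a, b in edges:  # treat the network as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    spn = defaultdict(set)
    for mutated, partner in candidates:
        if partner in expressed and partner in neighbors[mutated]:
            spn[mutated].add(partner)
    return dict(spn)

# Hypothetical placeholders, not real screening or network data.
edges = [("MUT1", "G1"), ("MUT1", "G2"), ("G2", "G3")]
candidates = [("MUT1", "G1"), ("MUT1", "G2"), ("MUT1", "G3")]
expressed = {"G1", "G3"}
print(refine_gis(candidates, edges, expressed))  # → {'MUT1': {'G1'}}
```

Here G2 is adjacent but not expressed and G3 is expressed but not adjacent, so only G1 survives both constraints.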


Subject(s)
Gene Regulatory Networks/physiology , Neoplasms/genetics , Protein Interaction Maps/genetics , Cell Line, Tumor , Computational Biology/methods , Epistasis, Genetic/physiology , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genes, Neoplasm/genetics , Humans , Loss of Function Mutation , Mutation , Transcriptome
9.
Nutrients ; 13(4)2021 Apr 19.
Article in English | MEDLINE | ID: mdl-33921610

ABSTRACT

While several studies have explored nutrient intake and dietary habits associated with depression, few have reflected recent trends and demographic factors. Therefore, we examined how nutrient intake and eating habits are associated with depression according to gender and age. We performed simple and multiple regressions using a nationally representative sample of 10,106 subjects from the Korea National Health and Nutrition Examination Survey. The results indicated that cholesterol, dietary fiber, and sodium intake and the frequency of breakfast, lunch, dinner, and eating out were significantly associated with depression (p-value < 0.05). Moreover, the associations differed by gender and age group: sugar intake and breakfast, lunch, and eating-out frequency among young women; sodium intake and lunch frequency among middle-aged men; dietary fiber intake and breakfast and eating-out frequency among middle-aged women; energy, water, and carbohydrate intake and lunch and dinner frequency among late-middle-aged men; breakfast and lunch frequency among late-middle-aged women; vitamin A and carotene intake and lunch and eating-out frequency among older men; and fat, saturated fatty acid, omega-3 fatty acid, and omega-6 fatty acid intake and eating-out frequency among older women (p-value < 0.05). These results can help establish dietary strategies for depression prevention that consider gender and age.


Subject(s)
Depression/epidemiology , Diet/statistics & numerical data , Eating/psychology , Feeding Behavior/psychology , Adult , Age Factors , Aged , Cluster Analysis , Depression/etiology , Diet/psychology , Female , Humans , Male , Middle Aged , Nutrition Surveys , Regression Analysis , Republic of Korea/epidemiology , Sex Factors , Young Adult
10.
Epidemiol Health ; 43: e2021010, 2021.
Article in English | MEDLINE | ID: mdl-33494129

ABSTRACT

Researchers have been interested in probing how environmental factors associated with allergic diseases affect the use of medical services. To meet this demand, we constructed a database, named the Allergic Disease Database, based on the National Health Insurance Database (NHID). The NHID contains demographic and medical service utilization information for approximately 99% of the Korean population. This study targeted three major allergic diseases: allergic rhinitis, atopic dermatitis, and asthma. For these diseases, our database provides daily medical service information, including the number of daily visits from 2013 to 2017, categorized by patient characteristics such as address, sex, age, and duration of residence. We also provide additional information, including the yearly population, the number of patients, and averaged geocoding coordinates by eup, myeon, and dong district codes (the smallest administrative units in Korea). This information enables researchers to analyze how daily changes in environmental factors for allergic diseases (e.g., particulate matter, sulfur dioxide, and ozone) in certain regions influence patients' patterns of medical service utilization. Moreover, researchers can analyze long-term trends in allergic diseases and the health effects of environmental factors such as daily climate and pollution data. The advantages of this database are easy data access, additional levels of geographic detail, time-efficient data refining and processing, and a de-identification process that minimizes the exposure of identifiable personal information. All datasets in the Allergic Disease Database can be downloaded from the National Health Insurance Service data sharing webpage (https://nhiss.nhis.or.kr).


Subject(s)
Asthma/epidemiology , Databases, Factual , Dermatitis, Atopic/epidemiology , National Health Programs , Rhinitis, Allergic/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Republic of Korea/epidemiology , Young Adult
11.
PLoS One ; 15(9): e0238290, 2020.
Article in English | MEDLINE | ID: mdl-32946464

ABSTRACT

A well-defined protocol for a clinical trial guarantees a successful outcome report. When designing a protocol, most researchers refer to electronic databases and extract protocol elements using keyword searches. However, state-of-the-art database systems offer only text-based searches for user-entered keywords. In this study, we present a database system with a context-dependent protocol-element-selection function for successfully designing a clinical trial protocol. To do this, we first introduce a database for a protocol retrieval system constructed from individual protocol data extracted from 184,634 clinical trials and 13,210 frame structures of clinical trial protocols. The database contains a variety of semantic information that allows protocols to be filtered during the search operation. Based on this database, we developed a web application called the clinical trial protocol database system (CLIPS; available at https://corus.kaist.edu/clips), which enables interactive searches using protocol elements. To support interactive searches for combinations of protocol elements, CLIPS offers candidate next elements according to the previously selected element, in the form of a connected tree. The validation results show that our method outperforms existing databases in predicting phenotypic features.


Subject(s)
Clinical Trial Protocols as Topic , Clinical Trials as Topic/standards , Computational Biology/methods , Databases, Factual , Information Storage and Retrieval , Software , Humans , User-Computer Interface
12.
Front Pharmacol ; 11: 584875, 2020.
Article in English | MEDLINE | ID: mdl-33519445

ABSTRACT

Medicinal plants and their extracts have been used as important sources for drug discovery. In particular, plant-derived natural compounds, including phytochemicals, antioxidants, vitamins, and minerals, are gaining attention because they promote health and prevent disease. Although several in vitro methods have been developed to confirm the biological activities of natural compounds, there is still considerable room to reduce time and cost. To overcome these limitations, several in silico methods have been proposed for large-scale analysis, but they remain limited in handling incomplete and heterogeneous natural compound data. Here, we propose a deep learning-based approach to identify the medicinal uses of natural compounds by exploiting massive and heterogeneous drug and natural compound data. The rationale behind this approach is that deep learning can effectively use heterogeneous features to compensate for incomplete information. Based on latent knowledge, molecular interactions, and chemical property features, we generated 686-dimensional features for 4,507 natural compounds and 2,882 approved and investigational drugs. The deep learning model was trained using the generated features and verified drug indication information. When the features of natural compounds were given as input to the trained model, their potential efficacies were successfully predicted with high accuracy, sensitivity, and specificity.

13.
Nutrients ; 10(8)2018 Aug 08.
Article in English | MEDLINE | ID: mdl-30096807

ABSTRACT

Identifying the health benefits of phytochemicals is an essential step in drug and functional food development. While many in vitro screening methods have been developed to identify the health effects of phytochemicals, there is still room for improvement because of their high cost and low productivity. Therefore, researchers have proposed in silico methods as an alternative, primarily based on three types of approaches: those using molecular, chemical, or ethnopharmacological information. Although each approach has its own strengths in analyzing the characteristics of phytochemicals, previous studies have not considered them together. Here, we apply an integrated in silico analysis to identify the potential health benefits of phytochemicals based on molecular analysis and chemical properties as well as ethnopharmacological evidence. From the molecular analysis, we found an average of 415.6 health effects for each of 591 phytochemicals. We further investigated ethnopharmacological evidence for the phytochemicals and found that, on average, 129.1 (31%) of the predicted health effects had ethnopharmacological evidence. Lastly, we investigated chemical properties to confirm whether the phytochemicals are orally bioavailable, drug-like, or effective on certain tissues. The evaluation results indicate that health effects can be predicted more accurately by jointly considering molecular analysis, chemical properties, and ethnopharmacological evidence.


Subject(s)
Algorithms , Data Mining/methods , Ethnopharmacology/methods , Phytochemicals/therapeutic use , Plant Preparations/therapeutic use , Systems Integration , Databases, Factual , Humans , Molecular Structure , Phytochemicals/adverse effects , Phytochemicals/chemistry , Plant Preparations/adverse effects , Plant Preparations/chemistry , Structure-Activity Relationship
14.
Sci Rep ; 8(1): 11667, 2018 08 03.
Article in English | MEDLINE | ID: mdl-30076354

ABSTRACT

Although natural compounds have provided a wealth of leads and clues in drug development, the process of identifying their pharmacological effects is still a challenging task. Over the last decade, many in vitro screening methods have been developed to identify the pharmacological effects of natural compounds, but they are still costly processes with low productivity. Therefore, in silico methods, primarily based on molecular information, have been proposed. However, large-scale analysis is rarely considered, since many natural compounds do not have molecular structure and target protein information. Empirical knowledge of medicinal plants can be used as a key resource to solve the problem, but this information is not fully exploited and is used only as a preliminary tool for selecting plants for specific diseases. Here, we introduce a novel method to identify pharmacological effects of natural compounds from herbal medicine based on phenotype-oriented network analysis. In this study, medicinal plants with similar efficacy were clustered by investigating hierarchical relationships between the known efficacy of plants and 5,021 phenotypes in the phenotypic network. We then discovered significantly enriched natural compounds in each plant cluster and mapped the averaged pharmacological effects of the plant cluster to the natural compounds. This approach allows us to predict unexpected effects of natural compounds that have not been found by molecular analysis. When applied to verified medicinal compounds, our method successfully identified their pharmacological effects with high specificity and sensitivity.
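The abstract does not name the statistical test used to find significantly enriched natural compounds in each plant cluster. A hypergeometric (one-sided Fisher) test is one common choice for this kind of over-representation question, and the sketch below computes its upper-tail p-value with the standard library only. Treat both the choice of test and the counts in the example as assumptions.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws).

    In the enrichment setting: N plants in total, K plants containing the
    compound, n plants in the cluster, k cluster plants containing the compound.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 200 plants, 10 contain the compound,
# and 4 of the 12 plants in one cluster contain it.
print(hypergeom_upper_tail(4, 200, 10, 12))
```

Exact integer arithmetic via `math.comb` avoids overflow for moderate counts; when many compounds are tested per cluster, the resulting p-values would normally also need a multiple-testing correction.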


Subject(s)
Biological Products/pharmacology , Algorithms , Area Under Curve , Phenotype , Plants, Medicinal , ROC Curve , Reproducibility of Results
15.
BMC Bioinformatics ; 19(Suppl 8): 205, 2018 06 13.
Article in English | MEDLINE | ID: mdl-29897322

ABSTRACT

BACKGROUND: Natural products have been widely investigated in the drug development field. Their traditional use as medicinal agents and their resemblance to endogenous human compounds suggest potential for new drug development. Many researchers have focused on identifying the therapeutic effects of natural products, yet the resemblance between natural products and human metabolites has rarely been examined. METHODS: We propose a novel method that predicts the therapeutic effects of natural products based on their similarity to human metabolites. In this study, we compare the structure, target, and phenotype similarities between natural products and human metabolites to capture the molecular and phenotypic properties of both compound types. With the generated similarity features, we train a support vector machine model to identify similar natural product-human metabolite pairs. The known functions of the human metabolites are then mapped to the paired natural products to predict their therapeutic effects. RESULTS: Using the three selected feature sets (structure, target, and phenotype similarities), our trained model successfully paired similar natural products and human metabolites. When applied to natural product-derived drugs, our method identified their indications with high specificity and sensitivity. We further validated the identified therapeutic effects of natural products with literature evidence. CONCLUSIONS: These results suggest that our model can match natural products to similar human metabolites and indicate possible therapeutic effects of natural products. By using information on similar human metabolites, we expect to find new indications for natural products that could not be covered by previous in silico methods.
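The abstract does not specify how the structure, target, and phenotype similarities are computed. A minimal sketch, assuming each compound is represented by three sets (fingerprint on-bits, target proteins, and associated phenotypes) and that all three similarities are Tanimoto (Jaccard) coefficients, shows how a three-value feature vector per natural product-metabolite pair could be produced as SVM input. All identifiers are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

FEATURE_KEYS = ("fingerprint", "targets", "phenotypes")

def pair_features(natural_product, metabolite):
    """Three similarity features for one natural product-metabolite pair."""
    return [tanimoto(natural_product[k], metabolite[k]) for k in FEATURE_KEYS]

# Hypothetical compounds: fingerprint on-bit indices, target IDs, phenotype IDs.
natural_product = {"fingerprint": {1, 2, 3, 4}, "targets": {"T1", "T2"}, "phenotypes": {"P1"}}
metabolite = {"fingerprint": {2, 3, 4, 5}, "targets": {"T2"}, "phenotypes": {"P2"}}
print(pair_features(natural_product, metabolite))  # → [0.6, 0.5, 0.0]
```

Each pair thus becomes a point in a three-dimensional similarity space, which is the kind of input a binary SVM classifier of "similar" versus "dissimilar" pairs would consume.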


Subject(s)
Biological Products/pharmacology , Classification/methods , Metabolome/drug effects , Area Under Curve , Biological Products/chemistry , Computer Simulation , Humans , Phenotype , ROC Curve , Reproducibility of Results
16.
Sci Rep ; 8(1): 1612, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29371651

ABSTRACT

Identifying unexpected drug interactions is an essential step in drug development. Most studies focus on predicting whether a drug pair interacts or is effective against a certain disease without considering the mechanism of action (MoA). Here, we introduce a novel method that infers the effects and interactions of drug pairs, together with their MoA, by profiling the systemic effects of drugs. By investigating drug effects propagated through molecular and phenotypic networks, we constructed profiles of 5,441 approved and investigational drugs for 3,833 phenotypes. Our analysis indicates that phenotypes highly connected between drug profiles represent the potential effects of drug pairs, and that drug pairs with strong potential effects are more likely to interact. When applied to drug interactions with verified effects, our method identified both therapeutic and adverse effects with high specificity and sensitivity. Finally, tracing drug interactions in molecular and phenotypic networks allows us to understand their MoA.
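Network propagation of drug effects is often implemented as a random walk with restart (RWR); the abstract does not say which propagation scheme was used, so RWR here is an assumed stand-in. The sketch propagates from a drug's seed nodes (for example, its targets) over an undirected, connected network given as an adjacency dict, and the resulting steady-state scores could serve as one row of a drug-by-phenotype profile. The restart probability of 0.3 and the toy graph are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`.

    `adj` maps each node to its neighbors; the graph is assumed undirected and
    connected, so every node has at least one neighbor and no mass is lost.
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Spread the non-restart share of n's probability evenly to its neighbors.
            share = (1 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p

# Hypothetical toy network: drug target T, intermediate gene G, phenotype P.
adj = {"T": ["G"], "G": ["T", "P"], "P": ["G"]}
scores = random_walk_with_restart(adj, {"T"})
print({n: round(v, 3) for n, v in scores.items()})  # → {'T': 0.444, 'G': 0.412, 'P': 0.144}
```

The restart term keeps the scores anchored near the drug's seeds, so nodes farther from the targets receive less propagated effect; profiles for two drugs could then be compared phenotype by phenotype.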


Subject(s)
Computational Biology/methods , Drug Interactions , Drug-Related Side Effects and Adverse Reactions , Pharmaceutical Preparations , Pharmacology , Humans , Sensitivity and Specificity
17.
BMC Bioinformatics ; 17 Suppl 6: 219, 2016 Jul 28.
Article in English | MEDLINE | ID: mdl-27490208

ABSTRACT

BACKGROUND: Verifying the proteins targeted by compounds from natural herbs will help in selecting natural herb-based drug candidates. However, clarifying these interactions through in vitro or in vivo experiments requires a great deal of effort. In this light, in silico prediction of the interactions between compounds and target proteins can help reduce this effort. RESULTS: In this study, we performed in silico target identification for herbal compounds. First, data on compounds, target proteins, and the interactions between them were taken from the DrugBank database. We then characterized six classes of compound-target interactions in humans: G-protein-coupled receptors (GPCRs), ion channels, enzymes, receptors, transporters, and other proteins. Classification-prediction models that predict the interactions between compounds and target proteins were then constructed from these matrices using a machine learning method. The AUC values for the six classes were 0.94, 0.93, 0.90, 0.89, 0.91, and 0.76, respectively. Finally, the interactions of compounds from natural products were predicted using the constructed classification models. From the predicted results, we confirmed that several important disease-related proteins were predicted as targets of natural herbal compounds. CONCLUSIONS: We constructed classification-prediction models that predict the interactions between compounds and target proteins. The constructed models showed good prediction performance, and a number of potential target proteins of natural compounds were predicted.


Subject(s)
Biological Products/analysis , Computer Simulation , Drug Discovery , Plants, Medicinal/chemistry , Models, Chemical , Protein Binding , Support Vector Machine
18.
Interact J Med Res ; 1(2): e14, 2012 Nov 13.
Article in English | MEDLINE | ID: mdl-23612074

ABSTRACT

Electronic Health Records (EHRs) enable the sharing of patients' medical data. Because EHRs include patients' private data, access by researchers is restricted. Therefore, k-anonymity is necessary to keep patients' private data safe without damaging useful medical information. However, k-anonymity cannot prevent sensitive attribute disclosure. An alternative, l-diversity, has been proposed as a solution to this problem and is defined as follows: each Q-block (ie, each set of rows corresponding to the same value for identifiers) contains at least l well-represented values for each sensitive attribute. While l-diversity protects against sensitive attribute disclosure, it is limited in that it focuses only on diversifying sensitive attributes. The aim of this study is to develop a k-anonymity method that not only minimizes information loss but also achieves diversity of the sensitive attribute. This paper proposes a new privacy protection method that uses conditional entropy and mutual information and considers both information loss and the diversity of sensitive attributes. Conditional entropy measures the information lost through generalization, and mutual information is used to achieve diversity of sensitive attributes. This method can offer appropriate Q-blocks for generalization. Using the Adult database from the UCI Machine Learning Repository, we found that the proposed method can greatly reduce information loss compared with a recent l-diversity study. It also achieves diversity of sensitive attributes, as measured by counting the number of Q-blocks with diversity leaks. This study provides a privacy protection method that can improve data utility and protect against sensitive attribute disclosure. The method is viable and should be of interest for further privacy protection in EHR applications.
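The measures named above can be illustrated on a toy table. This sketch is not the authors' Q-block selection algorithm; it only shows one plausible reading of the abstract: information loss as the conditional entropy H(original value | generalized value), and sensitive-attribute leakage as the mutual information between the Q-block and the sensitive value, alongside plain k-anonymity and distinct l-diversity checks. The decade-based age generalization and the records are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def conditional_entropy(pairs):
    """H(X | Y) for a list of (x, y) pairs."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[y].append(x)
    n = len(pairs)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def mutual_information(pairs):
    """I(X; Y) = H(X) - H(X | Y)."""
    return entropy([x for x, _ in pairs]) - conditional_entropy(pairs)

def generalize_age(age):
    """Hypothetical generalization rule: map an exact age to its decade."""
    low = age // 10 * 10
    return f"{low}-{low + 9}"

def q_blocks(records):
    """Group sensitive values by generalized quasi-identifier (one Q-block per key)."""
    blocks = defaultdict(list)
    for age, disease in records:
        blocks[generalize_age(age)].append(disease)
    return blocks

def is_k_anonymous(blocks, k):
    return all(len(v) >= k for v in blocks.values())

def is_distinct_l_diverse(blocks, l):
    return all(len(set(v)) >= l for v in blocks.values())

records = [(23, "flu"), (27, "cold"), (21, "flu"), (35, "flu"), (38, "flu"), (34, "flu")]
blocks = q_blocks(records)
info_loss = conditional_entropy([(age, generalize_age(age)) for age, _ in records])
leakage = mutual_information([(d, generalize_age(a)) for a, d in records])
print(is_k_anonymous(blocks, 3), is_distinct_l_diverse(blocks, 2))  # → True False
print(round(info_loss, 3), round(leakage, 3))
```

The example reproduces the problem the abstract describes: the table is 3-anonymous, yet the "30-39" block contains only "flu", so anyone who knows a person falls in that block learns the diagnosis. A method that also tracks mutual information would penalize such a generalization.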
