Search | VHL Regional Portal

Predicting chemical structure using reinforcement learning with a stack-augmented conditional variational autoencoder.

Kim, Hwanhee; Ko, Soohyun; Kim, Byung Ju; Ryu, Sung Jin; Ahn, Jaegyoon.

J Cheminform ; 14(1): 83, 2022 Dec 09.

Article in English | MEDLINE | ID: mdl-36494855

ABSTRACT

In this paper, we propose a reinforcement learning model that maximizes the predicted binding affinity between a generated molecule and target proteins. Molecules are generated by the Stack-augmented Conditional Variational AutoEncoder (Stack-CVAE), which acts as the agent in reinforcement learning so that the resulting chemical formulas have the desired chemical properties and show high binding affinity with specific target proteins. We generated 1000 chemical formulas using the chemical properties of sorafenib and the three target kinases of sorafenib. We then confirmed that Stack-CVAE generates more valid and unique chemical compounds with the desired chemical properties and predicted binding affinity than other generative models. A more detailed analysis of 100 of the top-scoring molecules shows that they are novel and not found in existing chemical databases. Moreover, they show significantly higher predicted binding affinity scores for Raf kinases than for other kinases. Furthermore, they are highly druggable and synthesizable.
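
The abstract compares generators by how many valid, unique, and novel compounds they produce but does not define these measures. The sketch below uses one commonly used set of definitions, which is an assumption here and may differ from the paper's. The function `generation_metrics`, the toy `balanced` validity check, and the sample strings are hypothetical; a real pipeline would use a cheminformatics toolkit to validate structures.

```python
from typing import Callable, Iterable, Set


def generation_metrics(
    generated: Iterable[str],
    known: Set[str],
    is_valid: Callable[[str], bool],
) -> dict:
    """Return validity, uniqueness, and novelty fractions (assumed definitions)."""
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]  # strings passing the check
    unique = set(valid)                            # distinct valid strings
    novel = unique - known                         # not in the reference set
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def balanced(s: str) -> bool:
    """Toy stand-in for a structure validator: non-empty, balanced parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(s) and depth == 0


# Hypothetical generated strings and reference set, for illustration only.
sample = ["CCO", "CCO", "CC(=O)O", "C(C", "c1ccccc1"]
metrics = generation_metrics(sample, known={"CCO"}, is_valid=balanced)
print(metrics)
```

In this toy run, four of the five strings pass the check, three of those four are distinct, and two of the three are absent from the reference set.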

Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN.

Kwon, ChangHyuk; Park, Sangjin; Ko, Soohyun; Ahn, Jaegyoon.

PLoS One ; 16(4): e0250458, 2021.

Article in English | MEDLINE | ID: mdl-33905431

ABSTRACT

Accurate prediction of cancer stage is important because it enables more appropriate treatment for patients with cancer. Many measures and methods have been proposed for more accurate prediction of cancer stage, but recently machine learning methods, especially deep learning-based ones, have been receiving increasing attention, mostly owing to their good prediction accuracy in many applications. Machine learning methods can be applied to high-throughput DNA mutation or RNA expression data to predict cancer stage. However, because the number of genes or markers generally exceeds 10,000, a considerable number of data samples is required to guarantee high prediction accuracy. To solve this problem of a small number of clinical samples, we used Generative Adversarial Networks (GANs) to augment the samples. Because GANs are not effective with whole genes, we first selected significant genes using DNA mutation data and random forest feature ranking. Next, RNA expression data for the selected genes were expanded using GANs. We compared the classification accuracies obtained with the original dataset and with expanded datasets generated by the proposed and existing methods, using random forest, Deep Neural Networks (DNNs), and 1-Dimensional Convolutional Neural Networks (1DCNN). When using the 1DCNN, the F1 score of GAN5 (a 5-fold increase in data) was improved by 39% in relation to the original data. Moreover, the results using only 30% of the data were better than those using all of the data. Our attempt is the first to use GAN for augmentation using numeric data for both DNA and RNA. The augmented datasets obtained using the proposed method demonstrated significantly increased classification accuracy for most cases. By using GAN and 1DCNN in the prediction of cancer stage, we confirmed that good results can be obtained even with a small number of samples, and it is expected that a great deal of the cost and time required to obtain clinical samples will be reduced. The proposed sample augmentation method could also be applied for other purposes, such as prognostic prediction or cancer classification.
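
The two-stage procedure described above (rank genes, keep the significant ones, then expand the expression data for those genes) can be sketched roughly as follows. This is only an outline under stated assumptions: the importance scores are made-up numbers standing in for a random forest ranking, and the `augment` function adds Gaussian jitter to real rows as a placeholder where the paper trains a GAN. All names and values are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def top_k_features(importance: Dict[str, float], k: int) -> List[str]:
    """Keep the k features with the highest importance score."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


def augment(
    rows: Sequence[Sequence[float]],
    fold: int,
    noise: float,
    rng: random.Random,
) -> List[List[float]]:
    """Expand `rows` to `fold` times its original size.

    Placeholder generator: each synthetic row is a randomly chosen real row
    plus Gaussian jitter. This stands in for a trained GAN and is NOT a GAN.
    """
    out = [list(r) for r in rows]  # keep the original samples first
    while len(out) < fold * len(rows):
        base = rng.choice(rows)
        out.append([v + rng.gauss(0.0, noise) for v in base])
    return out


# Hypothetical importance scores (stand-in for a random forest ranking).
importance = {"GENE_A": 0.30, "GENE_B": 0.25, "GENE_C": 0.05, "GENE_D": 0.01}
# Hypothetical expression values: gene -> one value per sample.
expression = {
    "GENE_A": [2.1, 1.8, 3.0],
    "GENE_B": [0.5, 0.9, 0.4],
    "GENE_C": [1.0, 1.1, 0.9],
    "GENE_D": [4.2, 4.0, 4.1],
}

selected = top_k_features(importance, k=2)
rows = [[expression[g][i] for g in selected] for i in range(3)]
expanded = augment(rows, fold=5, noise=0.1, rng=random.Random(0))
print(selected, len(expanded))
```

With `fold=5` the three original samples grow to fifteen rows, which mirrors the "5-fold increase in data" that the abstract labels GAN5. A fuller version would generate synthetic rows separately for each stage label so that the augmented set keeps its labels.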

Subject(s)

Machine Learning , Neoplasms/diagnosis , Prognosis , Humans , Image Processing, Computer-Assisted , Mutation/genetics , Neoplasm Staging , Neoplasms/classification , Neoplasms/pathology , Neural Networks, Computer , Principal Component Analysis

GVES: machine learning model for identification of prognostic genes with a small dataset.

Ko, Soohyun; Choi, Jonghwan; Ahn, Jaegyoon.

Sci Rep ; 11(1): 439, 2021 Jan 11.

Article in English | MEDLINE | ID: mdl-33431999

ABSTRACT

Machine learning may be a powerful approach to more accurate identification of genes that may serve as prognosticators of cancer outcomes using various types of omics data. However, to date, machine learning approaches have shown limited prediction accuracy for cancer outcomes, primarily owing to small sample numbers and a relatively large number of features. In this paper, we describe GVES (Gene Vector for Each Sample), a proposed machine learning model that can be efficiently leveraged even with a small sample size to increase the accuracy of identification of genes with prognostic value. GVES, an adaptation of the continuous bag of words (CBOW) model, generates vector representations of all genes for all samples by leveraging gene expression and biological network data. GVES clusters samples using their gene vectors and identifies genes that divide samples into good and poor outcome groups for the prediction of cancer outcomes. Because GVES generates gene vectors for each sample, the sample size effect is reduced. We applied GVES to six cancer types and demonstrated that GVES outperformed existing machine learning methods, particularly for cancer datasets with a small number of samples. Moreover, the genes identified as prognosticators were shown to reside within a number of significant prognostic genetic pathways associated with pancreatic cancer.
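
The abstract says GVES adapts CBOW but does not say how. For orientation, a plain CBOW prediction step averages the input vectors of the context items, scores that average against each candidate's output vector, and applies a softmax. The sketch below shows only this generic step. Treating a gene's network neighbours as its context is an assumption made for illustration, and the gene names and vectors are hypothetical.

```python
import math
from typing import Dict, List


def cbow_probabilities(
    context: List[str],
    in_vecs: Dict[str, List[float]],
    out_vecs: Dict[str, List[float]],
) -> Dict[str, float]:
    """Generic CBOW step: average context vectors, then softmax over targets."""
    dim = len(next(iter(in_vecs.values())))
    # Hidden representation: mean of the context items' input vectors.
    hidden = [sum(in_vecs[g][d] for g in context) / len(context) for d in range(dim)]
    # Score each candidate target by a dot product with its output vector.
    scores = {g: sum(h * w for h, w in zip(hidden, v)) for g, v in out_vecs.items()}
    # Numerically stable softmax.
    top = max(scores.values())
    exps = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}


# Hypothetical network: GENE_A is linked to GENE_B and GENE_C (assumed context).
neighbours = {"GENE_A": ["GENE_B", "GENE_C"]}
in_vecs = {"GENE_A": [0.0, 1.0], "GENE_B": [1.0, 0.0], "GENE_C": [1.0, 0.0]}
out_vecs = {"GENE_A": [2.0, 0.0], "GENE_B": [0.0, 1.0], "GENE_C": [0.0, 0.0]}

probs = cbow_probabilities(neighbours["GENE_A"], in_vecs, out_vecs)
predicted = max(probs, key=probs.get)
print(predicted, round(sum(probs.values()), 6))
```

Training would adjust both sets of vectors so that the true target receives a high probability; that update step is omitted here.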

Subject(s)

Biomarkers, Tumor/genetics , Computer Simulation , Machine Learning , Neoplasms/diagnosis , Algorithms , Biomarkers, Tumor/isolation & purification , Computational Biology , Datasets as Topic , Genes, Neoplasm , Humans , Neoplasms/genetics , Prognosis
