1.
Article in English | MEDLINE | ID: mdl-38082648

ABSTRACT

Drug-target affinity (DTA) prediction is crucial to speeding up drug development. Advances in deep learning allow accurate DTA prediction. However, most deep learning methods treat a protein as a 1D string, which is less informative to models than a graph representation. In this paper, we present a deep-learning-based DTA prediction method called N-gram Graph DTA (NG-DTA) that takes molecular graphs of drugs and n-gram molecular sub-graphs of proteins as inputs, which are then processed by graph neural networks (GNNs). Without using any prediction tool for protein structure, NG-DTA performs better than other methods on two datasets in terms of concordance index (CI) and mean squared error (MSE) (CI: 0.905, MSE: 0.196 for the Davis dataset; CI: 0.904, MSE: 0.120 for the KIBA dataset). Our results show that using n-gram molecular sub-graphs of proteins as input improves deep learning models' performance in DTA prediction.
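
The two metrics reported above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, and the handling of ties (pairs with equal true affinity are skipped, tied predictions count as half) is an assumption based on the usual definition of the concordance index.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Assumption: pairs with equal true values are not comparable and are
    skipped; pairs with tied predictions contribute 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # equal true affinities: pair is not comparable
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

A CI of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ordering, so higher CI and lower MSE are better.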


Subject(s)
Drug Development , Neural Networks, Computer , Records
2.
Article in English | MEDLINE | ID: mdl-38083007

ABSTRACT

An accurate prediction of breast cancer is essential to help physicians make appropriate treatment recommendations, reducing the chance of excessive treatment and avoiding unnecessary anxiety for patients. Cancer prognosis is highly related to patients' genomic features, which are high-dimensional in nature. In this study, we utilize a systems biology feature selector for dimension reduction to select 20 prognostic biomarkers that are considered closely related to breast cancer prognosis from the high-dimensional RNA Sequencing (RNA-Seq) data. Furthermore, we establish a graph neural network (GNN) and a multi-layer perceptron (MLP) graph-level readout method to better extract the underlying gene interactions from the corresponding gene interaction network (GIN). With the help of GINs, the model performs the best among all baseline models, especially in the area under the precision-recall curve (AUPRC), where it leads by as much as 23%. The results demonstrate that our approach using GNNs can successfully extract high-dimensional and complicated interactions within genomic data.
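
As a rough illustration of the AUPRC metric mentioned above, the sketch below computes average precision, one common estimator of the area under the precision-recall curve. It is not taken from the paper: the function name is hypothetical, and ties in the scores are simply kept in input order, which is a simplifying assumption.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision values at each positive hit.

    Assumptions: labels are 0/1, higher score means more likely positive,
    and tied scores keep their input order (stable sort).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos
```

Unlike AUROC, this quantity depends on the positive-class prevalence, which is why it is often preferred for imbalanced outcomes.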


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Breast , Genomics , Anxiety , Neural Networks, Computer
3.
Article in English | MEDLINE | ID: mdl-38083168

ABSTRACT

Data imbalance is a practical and crucial issue in deep learning. Moreover, real-world datasets, such as electronic health records (EHR), often suffer from high missing rates. Both issues can be understood as noise in the data that may lead to poor generalization for standard deep-learning algorithms. This paper introduces a novel meta-learning approach to deal with these noise issues in an EHR dataset for a binary classification task. This meta-learning approach leverages the information from a selected subset of balanced, low-missing-rate data to automatically assign a proper weight to each sample. Such weights enhance the informative samples and suppress the uninformative ones during training. Furthermore, the meta-learning approach is model-agnostic for deep-learning-based architectures and simultaneously handles the high imbalance ratio and high missing rate problems. Through experiments, we demonstrate that this meta-learning approach is better in extreme cases. In the most extreme one, with an imbalance ratio of 172 and a 74.6% missing rate, our method outperforms the original model without meta-learning by as much as 10.3% in the area under the receiver-operating characteristic curve (AUROC) and 3.2% in the area under the precision-recall curve (AUPRC). Our results mark the first step towards training a robust model for extremely noisy EHR datasets.
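
To make the role of per-sample weights concrete, here is a minimal sketch of a weighted binary cross-entropy. It does not reproduce the meta-learning procedure, which learns the weights: here the weights are simply given as input. The function names, the normalization by the sum of weights, and the definition of the imbalance ratio as majority count over minority count are all assumptions.

```python
import math


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Weighted mean of per-sample binary cross-entropy.

    Assumption: the loss is normalized by the sum of the weights, so a
    weight of 0 removes a sample from training entirely.
    """
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += w * loss
    return total / weight_sum


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    pos = sum(labels)
    neg = len(labels) - pos
    if min(pos, neg) == 0:
        raise ValueError("both classes must be present")
    return max(pos, neg) / min(pos, neg)
```

With uniform weights this reduces to the ordinary cross-entropy, so the weighting only changes training when some samples are judged more informative than others.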


Subject(s)
Electronic Health Records , Machine Learning , Algorithms
4.
Bioinform Adv ; 3(1): vbac100, 2023.
Article in English | MEDLINE | ID: mdl-36698767

ABSTRACT

Motivation: Cancer is one of the world's leading causes of mortality, and its prognosis is hard to predict due to complicated biological interactions among heterogeneous data types. Numerous challenges, such as censorship, high dimensionality and small sample size, prevent researchers from using deep learning models for precise prediction. Results: We propose a robust Semi-supervised Cancer prognosis classifier with bAyesian variational autoeNcoder (SCAN) as a structured machine-learning framework for cancer prognosis prediction. SCAN incorporates semi-supervised learning for predicting 5-year disease-specific survival and overall survival in breast and non-small cell lung cancer (NSCLC) patients, respectively. SCAN achieved significantly better AUROC scores than all existing benchmarks (81.73% for breast cancer; 80.46% for NSCLC), including our previously proposed bimodal neural network classifiers (77.71% for breast cancer; 78.67% for NSCLC). Independent validation results showed that SCAN still achieved better AUROC scores (74.74% for breast; 72.80% for NSCLC) than the bimodal neural network classifiers (64.13% for breast; 67.07% for NSCLC). SCAN is general and can potentially be trained on more patient data. This lays the foundation for personalized medicine for early cancer risk screening. Availability and implementation: The source code reproducing the main results is available on GitHub: https://gitfront.io/r/user-4316673/36e8714573f3fbfa0b24690af5d1a9d5ca159cf4/scan/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2030-2033, 2021 11.
Article in English | MEDLINE | ID: mdl-34891686

ABSTRACT

Fast and accurate cancer prognosis stratification models are essential for treatment designs. Large labeled patient data can power advanced deep learning models to obtain precise predictions. However, since fully labeled patient data are hard to acquire in practical scenarios, deep models are prone to make non-robust predictions biased toward data partition and model hyper-parameter selection. Given a small training set, we applied the systems biology feature selector in our previous study to avoid over-fitting and select 18 prognostic biomarkers. Combined with three other clinical features, we trained Bayesian binary classifiers to predict the 5-year overall survival (OS) of colon cancer patients in this study. Results showed that Bayesian models could provide better and more robust predictions compared to their non-Bayesian counterparts. Specifically, in terms of the area under the receiver operating characteristic curve (AUC), macro F1-score (maF1), and concordance index (CI), we found that the Bayesian bimodal neural network (late fusion) classifier (B-Bimodal) achieved the best results (AUC: 0.8083 ± 0.0736; maF1: 0.7300 ± 0.0659; CI: 0.7238 ± 0.0440). The single modal Bayesian neural network classifier (B-Concat) fed with concatenated patient data (early fusion) achieved slightly worse but more robust performance in terms of AUC and CI (AUC: 0.7105 ± 0.0692; maF1: 0.7156 ± 0.0690; CI: 0.6627 ± 0.0558). Such robustness is essential to training learning models with small medical data.
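
The macro F1-score (maF1) reported above can be sketched as follows. This is an illustrative implementation under the standard definition (unweighted mean of per-class F1), not the authors' code; the function name and the convention of scoring a class with no predictions and no true members as 0 are assumptions.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because each class contributes equally, the macro average penalizes a classifier that ignores the minority outcome, which matters for survival labels that are rarely balanced.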


Subject(s)
Colonic Neoplasms , Neural Networks, Computer , Bayes Theorem , Humans , ROC Curve , Systems Biology
6.
Sci Rep ; 11(1): 14914, 2021 07 21.
Article in English | MEDLINE | ID: mdl-34290286

ABSTRACT

Breast cancer is a heterogeneous disease. To guide proper treatment decisions for each patient, robust prognostic biomarkers, which allow reliable prognosis prediction, are necessary. Gene feature selection based on microarray data is an approach to discover potential biomarkers systematically. However, standard pure-statistical feature selection approaches often fail to incorporate prior biological knowledge and select genes that lack biological insights. Besides, due to the high dimensionality and low sample size properties of microarray data, selecting robust gene features is an intrinsically challenging problem. We hence combined systems biology feature selection with ensemble learning in this study, aiming to select genes with biological insights and robust prognostic predictive power. Moreover, to capture breast cancer's complex molecular processes, we adopted a multi-gene approach to predict the prognosis status using deep learning classifiers. We found that all ensemble approaches could improve feature selection robustness, wherein the hybrid ensemble approach led to the most robust result. Among all prognosis prediction models, the bimodal deep neural network (DNN) achieved the highest test performance, further verified by survival analysis. In summary, this study demonstrated the potential of combining ensemble learning and bimodal DNN in guiding precision medicine.


Subject(s)
Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Neural Networks, Computer , Systems Biology/methods , Biomarkers, Tumor/genetics , Breast Neoplasms/mortality , Deep Learning , Female , Forecasting , Humans , Prognosis , Support Vector Machine , Survival Analysis
7.
Oncogene ; 40(4): 791-805, 2021 01.
Article in English | MEDLINE | ID: mdl-33262462

ABSTRACT

Epithelial-mesenchymal transition (EMT)/mesenchymal-epithelial transition (MET) processes are proposed to be a driving force of cancer metastasis. By studying metastasis in bone marrow-derived mesenchymal stem cell (BM-MSC)-driven lung cancer models, microarray time-series data analysis by systems biology approaches revealed BM-MSC-induced signaling triggers early dissemination of CD133+/CD83+ cancer stem cells (CSCs) from primary sites shortly after STAT3 activation but promotes proliferation towards secondary sites. The switch from migration to proliferation was regulated by BM-MSC-secreted LIF and activated LIFR/p-ERK/pS727-STAT3 signaling to promote early disseminated cancer cells MET and premetastatic niche formation. Then, tumor-tropic BM-MSCs circulated to primary sites and triggered CD151+/CD38+ cells acquiring EMT-associated CSC properties through IL6R/pY705-STAT3 signaling to promote tumor initiation and were also attracted by and migrated towards the premetastatic niche. In summary, STAT3 phosphorylation at tyrosine 705 and serine 727 differentially regulates the EMT-MET switch within the distinct molecular subtypes of CSCs to complete the metastatic process.


Subject(s)
Epithelial-Mesenchymal Transition , Neoplasm Metastasis , STAT3 Transcription Factor/metabolism , Cell Line, Tumor , Humans , Leukemia Inhibitory Factor Receptor alpha Subunit/physiology , Lung Neoplasms/pathology , MAP Kinase Signaling System/physiology , Mesenchymal Stem Cells/physiology , Neoplastic Stem Cells/physiology , Phosphorylation , Protein Interaction Maps , Receptors, Interleukin-6/physiology , Serine , Tyrosine
8.
Cancers (Basel) ; 12(12), 2020 Dec 11.
Article in English | MEDLINE | ID: mdl-33322441

ABSTRACT

Hepatocellular carcinoma (HCC) is one of the leading causes of cancer mortality. Cancer stem cells (CSCs) are responsible for the maintenance, metastasis, and relapse of various tumors. However, the effects of CSCs on the tumorigenesis of HCC are still not fully understood. We have recently established two new rat HCC cell lines, HTC and TW-1, which we isolated from diethylnitrosamine-induced rat liver cancer. Results showed that TW-1 expressed the genetic markers of CSCs, including CD133, GSTP1, CD44, CD90, and EpCAM. Moreover, TW-1 showed higher tolerance to sorafenib than HTC did. In addition, tumorigenesis and metastasis were observed in nude mice and wild-type rats with TW-1 xenografts. Finally, we combined highly expressed genes in TW-1/HTC with well-known biomarkers from recent HCC studies to predict HCC-related biomarkers and were able to identify HCC with AUCs > 0.9 after machine learning. These results indicated that TW-1 was a novel rat CSC line, and that the mouse or rat models we established with TW-1 have great potential for future HCC studies.

9.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 5669-5672, 2020 07.
Article in English | MEDLINE | ID: mdl-33019263

ABSTRACT

Accurate cancer patient prognosis stratification is essential for oncologists to recommend proper treatment plans. Deep learning models are capable of providing good prediction power for such stratification. The main challenge is that only a limited number of labeled patients are available for cancer prognosis. To overcome this, we proposed Wasserstein Generative Adversarial Network-based Deep Adversarial Data Augmentation (wDADA) that leverages generative adversarial networks to perform data augmentation and assist in model training. We used the proposed framework to train our model for predicting disease-specific survival (DSS) of breast cancer patients from the METABRIC dataset. We found that wDADA achieved 0.6726 ± 0.0278, 0.7538 ± 0.0328, and 0.6507 ± 0.0248 in terms of accuracy, AUC, and concordance index in predicting 5-year DSS, respectively, which is comparable to our previously proposed Bimodal model (accuracy: 0.6889 ± 0.0159; AUC: 0.7546 ± 0.0183; concordance index: 0.6542 ± 0.0120), which needs careful calibration and extensive search on pre-trained network architectures. The flexibility of the proposed wDADA allows us to incorporate it with ensemble learning and semi-supervised learning to further improve performance. Our results indicate that it is possible to utilize generative adversarial networks to train deep models in medical applications, wherein only limited data are available.


Subject(s)
Breast Neoplasms , Humans , Prognosis , Supervised Machine Learning
10.
Sci Rep ; 10(1): 4679, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32170141

ABSTRACT

Non-small cell lung cancer (NSCLC) is one of the most common lung cancers worldwide. Accurate prognostic stratification of NSCLC can become an important clinical reference when designing therapeutic strategies for cancer patients. With this clinical application in mind, we developed a deep neural network (DNN) combining heterogeneous data sources of gene expression and clinical data to accurately predict the overall survival of NSCLC patients. Based on microarray data from a cohort set (614 patients), seven well-known NSCLC biomarkers were used to group patients into biomarker- and biomarker+ subgroups. Then, using a systems biology approach, prognosis relevance values (PRV) were calculated to select eight additional novel prognostic gene biomarkers. Finally, the combined 15 biomarkers, along with clinical data, were used to develop an integrative DNN via bimodal learning to predict the 5-year survival status of NSCLC patients with high accuracy (AUC: 0.8163, accuracy: 75.44%). Using the capability of deep learning, we believe that our prediction can be a promising index that helps oncologists and physicians develop personalized therapy and build the foundation of precision medicine in the future.
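
For readers unfamiliar with the two figures quoted above, the sketch below shows one way to compute AUC and accuracy for a binary survival label. It is an illustration only, not the study's evaluation pipeline: the function names and the 0.5 decision threshold are assumptions, and the AUC is computed through its pairwise-ranking interpretation.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples classified correctly at an assumed threshold."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return correct / len(labels)
```

AUC is threshold-free, whereas accuracy depends on the chosen cut-off, so the two numbers answer slightly different questions about the same classifier.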


Subject(s)
Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/mortality , Computational Biology , Deep Learning , Lung Neoplasms/diagnosis , Lung Neoplasms/mortality , Area Under Curve , Biomarkers, Tumor , Carcinoma, Non-Small-Cell Lung/etiology , Computational Biology/methods , Humans , Kaplan-Meier Estimate , Lung Neoplasms/etiology , Microarray Analysis/methods , Neoplasm Grading , Neoplasm Metastasis , Neoplasm Staging , Reproducibility of Results , Support Vector Machine , Workflow
11.
BMC Syst Biol ; 12(Suppl 2): 29, 2018 03 19.
Article in English | MEDLINE | ID: mdl-29560825

ABSTRACT

BACKGROUND: Regeneration is an important biological process for the restoration of organ mass, structure, and function after damage, and involves complex bio-physiological mechanisms including cell differentiation and immune responses. We constructed four regenerative protein-protein interaction (PPI) networks using dynamic models and AIC (Akaike's Information Criterion), based on time-course microarray data from the regeneration of four zebrafish organs: heart, cerebellum, fin, and retina. We extracted core and organ-specific proteins, and proposed a recalled-blastema-like formation model to uncover regeneration strategies in zebrafish. RESULTS: It was observed that the core proteins were involved in TGF-β signaling for each step in the recalled-blastema-like formation model and TGF-β signaling may be vital for regeneration. Integrins, FGF, and PDGF accelerate hemostasis during heart injury, while Bdnf shields retinal neurons from secondary damage and augments survival during the injury response. Wnt signaling mediates the growth and differentiation of cerebellum and fin neural stem cells, potentially providing a signal to trigger differentiation. CONCLUSION: Through our analysis of all four zebrafish regenerative PPI networks, we provide insights that uncover the underlying strategies of zebrafish organ regeneration.
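
The abstract does not specify how AIC was applied to the dynamic models, so the following is only a generic sketch of AIC-based model selection. It assumes a least-squares fit with Gaussian errors, for which AIC equals n·ln(RSS/n) + 2k up to an additive constant; the function names and the candidate tuples are hypothetical.

```python
import math


def aic_least_squares(rss, n_samples, n_params):
    """AIC for a least-squares fit under Gaussian errors (constant dropped).

    rss: residual sum of squares; n_params: number of fitted parameters,
    for example the number of interaction terms kept for one protein.
    """
    if rss <= 0 or n_samples <= 0:
        raise ValueError("rss and n_samples must be positive")
    return n_samples * math.log(rss / n_samples) + 2 * n_params


def select_by_aic(candidates, n_samples):
    """Return the name of the candidate (name, rss, n_params) with the lowest AIC."""
    best = min(candidates, key=lambda c: aic_least_squares(c[1], n_samples, c[2]))
    return best[0]
```

The 2k term penalizes extra parameters, so a denser interaction model is kept only when it reduces the residual error enough to justify its additional edges.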


Subject(s)
Animal Fins/physiology , Cerebellum/physiology , Heart/physiology , Regeneration , Retina/physiology , Systems Biology , Zebrafish/physiology , Animals , Protein Interaction Mapping , Zebrafish/metabolism