Application value of machine learning algorithms and COX nomogram in the survival prediction of hepatocellular carcinoma after resection / 中华消化外科杂志
Chinese Journal of Digestive Surgery ; (12): 166-178, 2020.
Article in Chinese | WPRIM | ID: wpr-865030
ABSTRACT

Objective:

To investigate the application value of machine learning algorithms and COX nomogram in the survival prediction of hepatocellular carcinoma (HCC) after resection.

Methods:

A retrospective descriptive study was conducted. The clinicopathological data of 375 patients with HCC who underwent radical resection at the Cancer Hospital of Chinese Academy of Medical Sciences and Peking Union Medical College from January 2012 to January 2017 were collected. There were 304 males and 71 females, aged 21 to 79 years, with a median age of 57 years. Using computer-generated random numbers, the 375 patients were divided at a ratio of 8∶2 into a training dataset of 300 patients and a validation dataset of 75 patients. Machine learning algorithms including logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and artificial neural network (ANN) were used to construct survival prediction models for HCC after resection, so as to identify the optimal machine learning algorithm prediction model. A COX nomogram prediction model for predicting postoperative survival in patients with HCC was also constructed. The performance in predicting postoperative survival of HCC patients was compared between the optimal machine learning algorithm prediction model and the COX nomogram prediction model. Observation indicators: (1) analysis of clinicopathological data of patients in the training dataset and validation dataset; (2) follow-up and survival of patients in the training dataset and validation dataset; (3) construction and evaluation of machine learning algorithm prediction models; (4) construction and evaluation of COX nomogram prediction model; (5) evaluation of prediction performance between RF machine learning algorithm prediction model and COX nomogram prediction model. Follow-up was performed by outpatient examination or telephone interview to determine patient survival up to December 2019 or death. Measurement data with normal distribution were expressed as mean±SD, and comparison between groups was analyzed by the paired t test.
Measurement data with skewed distribution were expressed as M (P25, P75) or M (range), and comparison between groups was analyzed by the Mann-Whitney U test. Count data were represented as absolute numbers. Comparison between groups was performed using the chi-square test when Tmin ≥5 and N ≥40, the continuity-corrected chi-square test when 1≤ Tmin <5 and N ≥40, and the Fisher exact probability test when Tmin <1 or N <40. The Kaplan-Meier method was used to calculate survival rates and plot survival curves. The COX proportional hazard model was used for univariate analysis, and variables with P<0.2 were entered into the Lasso regression analysis. Based on the lambda value, variables affecting prognosis were selected for multivariate analysis with the COX proportional hazard model.
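
The test-selection rule for count data can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the boundary case Tmin = 5 is assumed to fall under the uncorrected chi-square test, and the example table uses the liver cirrhosis counts reported in the Results (training dataset 105 without and 195 with cirrhosis; validation dataset 37 and 38).

```python
def select_count_test(table):
    """Return (test_name, t_min, n) for a 2x2 table of counts.

    t_min is the smallest expected cell count under independence;
    n is the total sample size. Names are illustrative only.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    t_min = min(r * k / n for r in row_totals for k in col_totals)
    if n < 40 or t_min < 1:
        name = "fisher_exact"
    elif t_min >= 5:
        name = "chi_square"
    else:  # 1 <= t_min < 5 and n >= 40 (boundary handling is an assumption)
        name = "continuity_corrected_chi_square"
    return name, t_min, n


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square statistic for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: training, validation; columns: without cirrhosis, with cirrhosis.
cirrhosis = [[105, 195], [37, 38]]
test_name, t_min, n = select_count_test(cirrhosis)
chi2 = pearson_chi_square(cirrhosis)
print(test_name, round(t_min, 1), round(chi2, 3))  # chi_square 28.4 5.239
```

For this table the smallest expected count is 28.4 with N = 375, so the uncorrected chi-square test applies, and the statistic agrees with the reported χ2=5.239.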

Results:

(1) Analysis of clinicopathological data of patients in the training dataset and validation dataset: cases without microvascular invasion, with microvascular invasion, without liver cirrhosis, and with liver cirrhosis numbered 292, 8, 105, and 195 in the training dataset, versus 69, 6, 37, and 38 in the validation dataset, showing significant differences between the two groups (χ2=4.749, 5.239, P<0.05). (2) Follow-up and survival of patients in the training dataset and validation dataset: all 375 patients received follow-up. The 300 patients in the training dataset were followed up for 1.1-85.5 months, with a median follow-up time of 50.3 months. The 75 patients in the validation dataset were followed up for 1.0-85.7 months, with a median follow-up time of 46.7 months. The postoperative 1- and 3-year overall survival rates of the 375 patients were 91.7% and 79.5%. The postoperative 1- and 3-year overall survival rates were 92.0% and 79.7% in the training dataset, versus 90.7% and 81.9% in the validation dataset, showing no significant difference in postoperative survival between the two groups (χ2=0.113, P>0.05). (3) Construction and evaluation of machine learning algorithm prediction models. ① Selection of the optimal machine learning algorithm prediction model: according to the information divergence of variables for predicting 3-year postoperative survival of HCC, five machine learning algorithms (LR, SVM, DT, RF, and ANN) were used to comprehensively rank the clinicopathological variables of HCC. The main predictive factors screened out were hepatitis B e antigen (HBeAg), surgical procedure, maximum tumor diameter, perioperative blood transfusion, liver capsule invasion, and liver segment Ⅳ invasion. The top-ranked 3, 6, 9, 12, 15, 18, 21, 24, 27, and 29 variables were introduced into the 5 machine learning algorithms in turn.
The results showed that the area under curve (AUC) of the receiver operating characteristic curve of the LR, SVM, DT, and RF machine learning algorithm prediction models tended to be stable when 9 variables were introduced. When more than 12 variables were introduced, the AUC of the ANN machine learning algorithm prediction model fluctuated significantly, the stability of the AUC of the LR and SVM machine learning algorithm prediction models continued to improve, and the AUC of the RF machine learning algorithm prediction model was nearly 0.990, suggesting the RF machine learning algorithm prediction model as the optimal machine learning algorithm prediction model. ② Optimization and evaluation of RF machine learning algorithm prediction model: the 29 variables of predictive factors were sequentially introduced into the RF machine learning algorithm to construct the optimal RF machine learning algorithm prediction model in the training dataset. When 10 variables were introduced, the grid search method identified 4 as the optimal number of nodes in each DT and 1000 as the optimal number of DTs. When no fewer than 10 variables were introduced, the AUC of the RF machine learning algorithm prediction model was about 0.990. When 10 variables were introduced, the RF machine learning algorithm prediction model had an AUC of 0.992 for 3-year postoperative overall survival, a sensitivity of 0.629, and a specificity of 0.996 in the training dataset, versus an AUC of 0.723 for 3-year postoperative overall survival, a sensitivity of 0.177, and a specificity of 0.948 in the validation dataset. (4) Construction and evaluation of COX nomogram prediction model. ① Analysis of postoperative survival factors of HCC patients in the training dataset.
Results of univariate analysis showed that HBeAg, alpha fetoprotein (AFP), preoperative blood transfusion, maximum tumor diameter, liver capsule invasion, and degree of tumor differentiation were related factors for postoperative survival of HCC patients [hazard ratio (HR)=1.958, 1.878, 2.170, 1.188, 2.052, 0.222, 95% confidence interval (CI) 1.185-3.235, 1.147-3.076, 1.389-3.393, 1.092-1.291, 1.240-3.395, 0.070-0.703, P<0.05]. Clinicopathological data with P<0.2 were included in the Lasso regression analysis, which showed that age, HBeAg, AFP, surgical procedure, perioperative blood transfusion, maximum tumor diameter, tumor location in liver segment Ⅴ or Ⅷ, liver capsule invasion, and degree of tumor differentiation (high, moderate-high, moderate, or moderate-low differentiation) were related factors for postoperative survival of HCC patients. These factors were entered into multivariate COX analysis, which showed that HBeAg, surgical procedure, and maximum tumor diameter were independent factors affecting postoperative survival of HCC patients (HR=1.770, 8.799, 1.142, 95% CI 1.049-2.987, 1.203-64.342, 1.051-1.242, P<0.05). ② Construction and evaluation of COX nomogram prediction model: the clinicopathological factors with P≤0.1 in the COX multivariate analysis were entered into RStudio software with the rms package to construct the COX nomogram prediction model in the training dataset. The COX nomogram prediction model for predicting postoperative overall survival had a concordance index of 0.723 (se=0.028), an AUC of 0.760 for 3-year postoperative overall survival in the training dataset, and an AUC of 0.795 for 3-year postoperative overall survival in the validation dataset. Verification with the calibration plot in the training dataset showed that the COX nomogram prediction model had good prediction performance for postoperative survival.
COX nomogram score=0.62706×HBeAg (normal=0, abnormal=1)+0.13434×maximum tumor diameter (cm)+2.10758×surgical procedure (laparoscopy=0, laparotomy=1)+0.54558×perioperative blood transfusion (without blood transfusion=0, with blood transfusion=1)-1.42133×high differentiation (non-high differentiation=0, high differentiation=1). The COX nomogram risk scores of all patients were calculated. X-tile software was used to find the optimal threshold of COX nomogram risk scores. Patients with risk scores ≥2.9 were assigned to the high risk group, and patients with risk scores <2.9 were assigned to the low risk group. Kaplan-Meier overall survival curves showed a significant difference in postoperative overall survival between the low risk group and the high risk group of the training dataset (χ2=33.065, P<0.05). There was also a significant difference in postoperative overall survival between the low risk group and the high risk group of the validation dataset (χ2=6.585, P<0.05). Further decision curve analysis showed that the COX nomogram prediction model based on the combination of HBeAg, surgical procedure, perioperative blood transfusion, maximum tumor diameter, and degree of tumor differentiation was superior to any of these individual factors in prediction performance. (5) Evaluation of prediction performance between RF machine learning algorithm prediction model and COX nomogram prediction model: the prediction difference between the two models was investigated by analyzing maximum tumor diameter (the important variable shared by both models) and by comparing the prediction error curves of both models.
The results showed that the postoperative 3-year survival rates predicted by the RF machine learning algorithm prediction model and the COX nomogram prediction model were 77.17% and 74.77%, respectively, for tumors with a maximum diameter of 2.2 cm (χ2=0.182, P>0.05), 57.51% and 61.65% for tumors with a maximum diameter of 6.3 cm (χ2=0.394, P>0.05), and 51.03% and 27.52% for tumors with a maximum diameter of 14.2 cm (χ2=12.762, P<0.05). As the maximum tumor diameter increased, the difference between the survival rates predicted by the two models grew larger. In the validation dataset, the AUC for 3-year postoperative overall survival was 0.723 for the RF machine learning algorithm prediction model and 0.795 for the COX nomogram prediction model, showing a significant difference between the two models (t=3.353, P<0.05). Results of Bootstrap cross-validation for prediction error showed that the integrated Brier scores of the RF machine learning algorithm prediction model and the COX nomogram prediction model for predicting 3-year survival were 0.139 and 0.134, respectively. The prediction error of the COX nomogram prediction model was lower than that of the RF machine learning algorithm prediction model.
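
The COX nomogram score and the 2.9 risk threshold reported in (4) can be worked through with a short Python sketch. The coefficients and the threshold are taken from the abstract; the dictionary keys, function names, and the two example patients are hypothetical and serve only to illustrate the arithmetic, not to reproduce the authors' software.

```python
# Coefficients as reported for the COX nomogram score.
COEFFICIENTS = {
    "hbeag_abnormal": 0.62706,             # normal=0, abnormal=1
    "max_tumor_diameter_cm": 0.13434,      # in cm
    "laparotomy": 2.10758,                 # laparoscopy=0, laparotomy=1
    "perioperative_transfusion": 0.54558,  # without=0, with=1
    "high_differentiation": -1.42133,      # non-high=0, high=1
}
RISK_THRESHOLD = 2.9  # reported cut-off: score >= 2.9 is high risk


def nomogram_score(patient):
    """Weighted sum of the five predictors for one patient."""
    return sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)


def risk_group(score):
    """Assign the risk group using the reported threshold."""
    return "high" if score >= RISK_THRESHOLD else "low"


# Hypothetical patients: laparotomy, normal HBeAg, no transfusion,
# non-high differentiation; only the tumor diameter differs.
base = {
    "hbeag_abnormal": 0,
    "laparotomy": 1,
    "perioperative_transfusion": 0,
    "high_differentiation": 0,
}
small_tumor = {**base, "max_tumor_diameter_cm": 2.2}
large_tumor = {**base, "max_tumor_diameter_cm": 6.3}

for patient in (small_tumor, large_tumor):
    score = nomogram_score(patient)
    print(round(score, 3), risk_group(score))
```

Under these assumed covariates, the 2.2 cm tumor scores about 2.403 (low risk) and the 6.3 cm tumor scores about 2.954 (high risk), which shows how tumor diameter alone can move a patient across the threshold.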

Conclusion:

Compared with machine learning algorithm prediction models, the COX nomogram prediction model performs better in predicting 3-year postoperative survival of HCC and uses fewer variables, which makes it easy to use clinically.
Full text: Available Index: WPRIM (Western Pacific) Type of study: Prognostic study Language: Chinese Journal: Chinese Journal of Digestive Surgery Year: 2020 Type: Article
