ABSTRACT
Objective Only 10%-30% of locally advanced rectal cancers (LARC) respond pathologically to neoadjuvant chemoradiotherapy (NCRT). This study aimed to identify a feasible gene signature for predicting the pathological response to NCRT in LARC. Methods Four datasets (GSE35452, GSE46862, GSE68204 and GSE53781) containing mRNA expression matrices and tumor regression grades of LARC after NCRT were obtained from the Gene Expression Omnibus. After batch effect removal, the first three datasets were merged and divided into a training set (n = 121) and an internal validation set (n = 53); the fourth dataset served as the external validation set (n = 26). Genes related to pathological response were identified in the training set by univariate logistic regression and t-test (crude P < 0.05) and ranked by P-value. All genes with P < 0.05 were entered into the least absolute shrinkage and selection operator (LASSO), and the top 50 ranked genes into the support vector machine (SVM) algorithm, to build prediction models, which were then verified in the corresponding validation set. Random sampling was repeated 500 times to assess the stability of the selected gene signatures and models. Using the 21 most important genes revealed by LASSO as candidates for model construction, the sensitivity index for NCRT was calculated as the sum of the gene expression values weighted by their logistic regression coefficients, in both the merged dataset and the external validation set. Differentially expressed genes between the response and non-response groups were identified in the merged dataset (n = 174) and subjected to regulatory network analysis. Results A total of 12 803 genes from the GSE35452, GSE46862 and GSE68204 datasets were included in the analysis. 
The accuracy, specificity and sensitivity of LASSO for predicting the pathological response in the internal validation set were 0.523 (95% CI: 0.396-0.642), 0.578 (95% CI: 0.373-0.762) and 0.464 (95% CI: 0.258-0.700), while those of SVM were 0.504 (95% CI: 0.377-0.623), 0.596 (95% CI: 0.393-0.830) and 0.405 (95% CI: 0.182-0.650), respectively. The area under the ROC curve (AUC) for pathological response prediction was 0.863 (95% CI: 0.811-0.912) in the merged dataset (n = 174) and 0.925 (95% CI: 0.817-1.000) in the external validation set. Conclusion A model for predicting the response to NCRT that is built on the expression of candidate genes identified in one specific set of patients has a disappointingly low predictive capacity in an independent set, probably because of high tumor heterogeneity among individuals. Regulatory network analysis indicates that radiotherapy resistance in rectal cancer may be mediated by the mechanisms underlying invasion, metastasis and malignant transformation.
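The sensitivity index described in the Methods can be illustrated with a minimal sketch. It assumes the index is the sum, over the signature genes, of each logistic regression coefficient multiplied by that gene's expression value, and that discrimination is summarized by a rank-based AUC. The gene names, coefficients, expression values and response labels below are hypothetical toy values, not the 21-gene signature or any data from the study.

```python
# Minimal sketch of a coefficient-weighted sensitivity index and its AUC.
# All gene names, coefficients, expression values and labels are
# hypothetical toy values, NOT taken from the study.

def sensitivity_index(expression, coefficients):
    """Sum of coefficient * expression over the signature genes of one sample."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def auc(scores, labels):
    """Rank-based AUC: the fraction of (responder, non-responder) pairs in
    which the responder has the higher score; ties count as half a pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients for a three-gene signature.
toy_coefficients = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}

# Hypothetical expression profiles (one dict per sample).
toy_samples = [
    {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.0},
    {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.0},
    {"GENE_A": 1.5, "GENE_B": 0.0, "GENE_C": 2.0},
    {"GENE_A": 0.5, "GENE_B": 1.0, "GENE_C": 3.0},
]

# Hypothetical labels: 1 = pathological response, 0 = non-response.
toy_labels = [0, 0, 1, 1]

toy_scores = [sensitivity_index(s, toy_coefficients) for s in toy_samples]
toy_auc = auc(toy_scores, toy_labels)
print(toy_scores, toy_auc)
```

The direction of the index (a higher value meaning a more likely response) is also an assumption of this sketch; with the opposite convention the AUC would be one minus the value computed here.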