ABSTRACT
BACKGROUND: Diffuse large B-cell lymphoma (DLBCL) is remarkably heterogeneous, yet subpopulations of DLBCL that can predict prognosis and guide clinical treatment remain poorly defined. METHODS: Molecular subgroups were identified in gene expression data from GSE10846 using a consensus clustering algorithm. Gene set enrichment analysis, immune infiltration analysis, and the proposed cell cycle algorithm were then applied to explore the biological functions of the different subtypes. Univariate and multivariate Cox regression analyses were used to identify independent prognostic factors for DLBCL. Finally, key genes were screened by Lasso regression, the Random Forest algorithm, and point-biserial correlation; a prognostic model based on these genes was built with the best-performing of seven machine learning classifiers and validated in three external datasets (GSE34171, GSE87371, GSE31312). RESULTS: Comprehensive genomic analysis of 1,143 DLBCL samples identified two molecularly distinct, prognostically relevant subtypes: immune-enriched (IME) and cell-cycle-enriched (CCE). A new predictive model based on seven key genes (SERPING1, TIMP2, NME1, DCTPP1, RFC4, POLE2, and SNRPD1) was then developed with the Support Vector Machine (SVM) algorithm in 414 patients from GSE10846, achieving an accuracy of 88.6% and an AUC of 0.973. Predictive performance was similar in the three external test sets (HR > 1.400, p < 0.05). CONCLUSION: The model predicted survival independently of other clinical risk factors and showed strong predictive power relative to them. Our study constructed a reliable model for predicting two new subtypes of DLBCL, which could help guide individualized treatment.
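Point-biserial correlation is listed above as one of the gene-screening steps. The following minimal sketch shows how that statistic can score each gene's expression against a binary subtype label, using r_pb = (M1 - M0) / s_n * sqrt(p * q), where M1 and M0 are the group means, s_n is the population standard deviation of all values, and p and q are the group proportions. It assumes the two subtypes are coded 0 and 1 (for example IME = 0 and CCE = 1, a hypothetical coding), and the `screen_genes` helper and its `threshold` cutoff are hypothetical illustrations. This is not the authors' pipeline, and it leaves out the Lasso and Random Forest steps.

```python
import math
from typing import Sequence


def point_biserial(values: Sequence[float], groups: Sequence[int]) -> float:
    """Point-biserial correlation between a continuous variable and a 0/1 label.

    Computes r_pb = (M1 - M0) / s_n * sqrt(p * q), with s_n the population
    standard deviation of all values. This equals Pearson's r when the label
    is coded as 0/1.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    n = len(values)
    group1 = [v for v, g in zip(values, groups) if g == 1]
    group0 = [v for v, g in zip(values, groups) if g == 0]
    if len(group1) + len(group0) != n:
        raise ValueError("groups must be coded as 0 or 1")
    if not group1 or not group0:
        raise ValueError("both groups must be non-empty")

    mean_all = sum(values) / n
    s_n = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if s_n == 0:
        raise ValueError("values have zero variance")

    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    p = len(group1) / n
    q = 1 - p
    return (m1 - m0) / s_n * math.sqrt(p * q)


def screen_genes(
    expression: dict[str, Sequence[float]],
    subtype: Sequence[int],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the study
) -> list[tuple[str, float]]:
    """Hypothetical helper: keep genes with |r_pb| >= threshold, strongest first."""
    scored = [(gene, point_biserial(vals, subtype)) for gene, vals in expression.items()]
    kept = [(gene, r) for gene, r in scored if abs(r) >= threshold]
    # Sort by absolute correlation; the sort is stable, so ties keep input order.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)
```

For a real expression matrix, each dictionary value would hold one gene's expression across the same samples as the subtype vector. A gene that is higher in the group coded 1 gets a positive r_pb, and one that is lower gets a negative r_pb, so the absolute value is what the cutoff compares.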