ABSTRACT
BACKGROUND: Highly parallel analysis of gene expression has recently been used to identify gene sets or 'signatures' to improve patient diagnosis and risk stratification. Once a signature is generated, traditional statistical testing is used to evaluate its prognostic performance. However, due to the dimensionality of microarrays, this can lead to false interpretation of these signatures. PRINCIPAL FINDINGS: A method was developed to test batches of a user-specified number of randomly chosen signatures in patient microarray datasets. The percentage of randomly generated signatures yielding prognostic value was assessed using ROC analysis by calculating the area under the curve (AUC) in six publicly available cancer patient microarray datasets. We found that a signature consisting of randomly selected genes has, on average, a 10% chance of reaching significance when assessed in a single dataset, but this chance can range from 1% to ~40% depending on the dataset in question. Increasing the number of validation datasets markedly reduces this chance. CONCLUSIONS: We have shown that the use of an arbitrary cut-off value for evaluation of signature significance is not suitable for this type of research; instead, the cut-off should be defined for each dataset separately. Our method can be used to establish and evaluate the performance of any derived gene signature in a dataset by comparing its performance to thousands of randomly generated signatures. It will be of most interest for cases where few data are available and testing in multiple datasets is limited.
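The comparison against randomly generated signatures described above can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the mean-expression signature score, the direction-agnostic AUC, the 0.65 cut-off, and all function names are assumptions chosen for illustration only.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def signature_auc(expr, labels, genes):
    """Score each patient by mean expression over the signature genes.

    Assumption: the AUC is made direction-agnostic, since a signature
    could be associated with either outcome class.
    """
    scores = [sum(row[g] for g in genes) / len(genes) for row in expr]
    a = auc(scores, labels)
    return max(a, 1.0 - a)

def random_signature_aucs(expr, labels, size, n_random, rng):
    """AUCs of n_random signatures, each made of `size` randomly chosen genes."""
    n_genes = len(expr[0])
    return [signature_auc(expr, labels, rng.sample(range(n_genes), size))
            for _ in range(n_random)]

def empirical_p(observed, null_aucs):
    """Share of random signatures doing at least as well as the observed one."""
    hits = sum(1 for a in null_aucs if a >= observed)
    return (hits + 1) / (len(null_aucs) + 1)

# Synthetic stand-in for a patient dataset (hypothetical, not real data):
# 60 patients, 200 genes, binary outcome; genes 0-9 carry a true signal.
rng = random.Random(0)
n_patients, n_genes = 60, 200
labels = [i % 2 for i in range(n_patients)]
expr = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_patients)]
for row, y in zip(expr, labels):
    for g in range(10):
        row[g] += 1.5 * y

null_aucs = random_signature_aucs(expr, labels, size=10, n_random=500, rng=rng)

# Fraction of random signatures that pass an arbitrary fixed cut-off ...
fixed_cutoff = 0.65
frac_passing = sum(1 for a in null_aucs if a >= fixed_cutoff) / len(null_aucs)

# ... versus a dataset-specific cut-off taken from the random distribution.
dataset_cutoff = sorted(null_aucs)[int(0.95 * len(null_aucs))]

candidate = signature_auc(expr, labels, list(range(10)))
p_value = empirical_p(candidate, null_aucs)
print(frac_passing, dataset_cutoff, candidate, p_value)
```

In this sketch the dataset-specific cut-off is simply the 95th percentile of the random-signature AUCs, so how strict it is depends on the dataset rather than on a fixed convention.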
Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Neoplasms/metabolism , Area Under Curve , Breast Neoplasms/metabolism , Data Interpretation, Statistical , Humans , Kidney Neoplasms/metabolism , Lung Neoplasms/metabolism , Models, Statistical , Oligonucleotide Array Sequence Analysis , Prognosis , ROC Curve , Reproducibility of Results
ABSTRACT
PURPOSE: The current tumor, node, metastasis system needs refinement to improve its ability to predict survival of patients with non-small-cell lung cancer (NSCLC) treated with (chemo)radiation. In this study, we investigated the prognostic value of tumor volume and N status, assessed by using fluorodeoxyglucose-positron emission tomography (PET). PATIENTS AND METHODS: Clinical data from 270 consecutive patients with inoperable NSCLC Stages I-IIIB treated radically with (chemo)radiation were collected retrospectively. Diagnostic imaging was performed using either integrated PET-computed tomography or computed tomography and PET separately. The Kaplan-Meier method, as well as Cox regression, was used to analyze data. RESULTS: Univariate survival analysis showed that number of positive lymph node stations (PLNSs), as well as N stage on PET, was associated significantly with survival. The final multivariate Cox model consisted of number of PLNSs, gross tumor volume (i.e., volume of the primary tumor plus lymph nodes), sex, World Health Organization performance status, and equivalent radiation dose corrected for time; N stage was no longer significant. CONCLUSIONS: Number of PLNSs, assessed by means of fluorodeoxyglucose-PET, was a significant factor for survival of patients with inoperable NSCLC treated with (chemo)radiation. Risk stratification for this group of patients should be based on gross tumor volume, number of PLNSs, sex, World Health Organization performance status, and equivalent radiation dose corrected for time.