Results 1 - 3 of 3
1.
Metabolomics ; 17(4): 37, 2021 03 27.
Article in English | MEDLINE | ID: mdl-33772663

ABSTRACT

INTRODUCTION: The identification of metabolomic biomarkers predictive of cancer patient response to therapy and of disease stage has been pursued as a "holy grail" of modern oncology, relying on the metabolic dysfunction that characterizes cancer progression. Despite the evaluation of many candidate biomarkers, however, determining a consistent set with practical clinical utility has proven elusive.

OBJECTIVE: In this study, we systematically examine the combined role of data pre-treatment and imputation methods on the performance of multivariate data analysis methods and their identification of potential biomarkers.

METHODS: Uniquely, we are able to systematically evaluate both unsupervised and supervised methods with a metabolomic data set obtained from patient-derived lung cancer core biopsies with true missing values. Eight pre-treatment methods, ten imputation methods, and two data analysis methods were applied in combination.

RESULTS: The combined choice of pre-treatment and imputation methods is critical in the definition of candidate biomarkers, with deficient or inappropriate selection of these methods leading to inconsistent results, and with important biomarkers either being overlooked or reported as false positives. The log transformation appeared to normalize the original tumor data most effectively, but the performance of the imputation applied after the transformation was highly dependent on the characteristics of the data set.

CONCLUSION: The combined choice of pre-treatment and imputation methods may need careful evaluation prior to metabolomic data analysis of human tumors, in order to enable consistent identification of potential biomarkers predictive of response to therapy and of disease stage.
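The interaction between pre-treatment and imputation can be illustrated with a minimal sketch. It is not the study's pipeline: the intensity values are hypothetical, and mean imputation is assumed here only as a simple stand-in for the ten imputation methods the abstract mentions without naming. The helper names `log_transform` and `impute_mean` are likewise illustrative.

```python
import math


def log_transform(column):
    """Natural-log transform, leaving missing values (None) untouched."""
    return [None if v is None else math.log(v) for v in column]


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


# Hypothetical intensities for one metabolite across five samples;
# None marks a truly missing measurement.
intensities = [10.0, 1000.0, None, 100.0, None]

# Order 1: transform first, then impute on the log scale.
log_then_impute = impute_mean(log_transform(intensities))

# Order 2: impute on the raw scale, then transform.
impute_then_log = log_transform(impute_mean(intensities))

# The filled-in value differs between the two orders.
print(log_then_impute[2], impute_then_log[2])
```

Imputing after the log transform fills the gap with the log of the geometric mean of the observed values, whereas imputing first uses the arithmetic mean, which a single large intensity can dominate. In this toy case the two orders disagree on the imputed value, which is one way the choice of combination could change downstream results.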


Subjects
Biomarkers; Lung Neoplasms/metabolism; Metabolomics/methods; Data Analysis; Humans; Lung Neoplasms/therapy; Principal Component Analysis
2.
Int J Med Inform ; 108: 1-8, 2017 12.
Article in English | MEDLINE | ID: mdl-29132615

ABSTRACT

Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. For lung cancer in particular, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power between the various methods. The prediction is treated as a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble, with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as they had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM, with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique.
We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time, with the ultimate goal of informing patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods.
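Since the models above are compared by RMSE, a short sketch of that metric may help. All numbers below are hypothetical, the two "models" are just fixed lists of predictions, and the ensemble is assumed to be a plain average of them; the abstract does not describe how the custom ensemble was actually built.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not actual:
        raise ValueError("sequences must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical survival times and predictions (units are arbitrary here).
actual = [6.0, 12.0, 24.0, 60.0]
model_a = [8.0, 10.0, 20.0, 40.0]
model_b = [4.0, 16.0, 30.0, 50.0]

# Assumed ensemble: the unweighted mean of the two models' predictions.
ensemble = [(a + b) / 2 for a, b in zip(model_a, model_b)]

for name, predictions in [("model_a", model_a),
                          ("model_b", model_b),
                          ("ensemble", ensemble)]:
    print(name, round(rmse(actual, predictions), 3))
```

Because the errors are squared before averaging, a single badly predicted long survival time contributes far more to RMSE than several small misses, which fits the abstract's remark that agreement is best for the low to moderate survival times that make up most of the data.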


Subjects
Lung Neoplasms/mortality; Machine Learning; Support Vector Machine; Databases, Factual; Humans; Survival Rate
3.
PLoS One ; 12(9): e0184370, 2017.
Article in English | MEDLINE | ID: mdl-28910336

ABSTRACT

This study applies unsupervised machine learning techniques for classification and clustering to a collection of descriptive variables from 10,442 lung cancer patient records in the Surveillance, Epidemiology, and End Results (SEER) program database. The goal is to automatically classify lung cancer patients into groups based on clinically measurable disease-specific variables in order to estimate survival. Variables selected as inputs for machine learning include Number of Primaries, Age, Grade, Tumor Size, Stage, and TNM, which are numeric or can readily be converted to numeric type. Minimal up-front processing of the data enables exploring the out-of-the-box capabilities of established unsupervised learning techniques, with little human intervention through the entire process. The output of the techniques is used to predict survival time, with the efficacy of the prediction representing a proxy for the usefulness of the classification. A basic single-variable linear regression against each unsupervised output is applied, and the associated Root Mean Squared Error (RMSE) value is calculated as a metric for comparing the outputs. The results show that self-ordering maps exhibit the best performance, while k-Means performs the best of the simpler classification techniques. When predicting against the full data set, their respective RMSE values (15.591 for self-ordering maps and 16.193 for k-Means) are comparable to those of supervised regression techniques, such as Gradient Boosting Machine (RMSE of 15.048). We conclude that unsupervised data analysis techniques may be of use to classify patients by defining the classes as effective proxies for survival prediction.
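The evaluation step described above, a single-variable linear regression of survival on an unsupervised output followed by an RMSE calculation, can be sketched as follows. The cluster labels and survival times are hypothetical, the labels are assumed to have been produced already by some clustering method such as k-Means, and `fit_line` is an illustrative ordinary least squares helper, not code from the study.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical cluster label per patient (the assumed unsupervised output)
# and hypothetical survival times for the same patients.
cluster_labels = [0, 0, 1, 1, 2, 2]
survival = [30.0, 34.0, 20.0, 18.0, 5.0, 7.0]

# Regress survival on the cluster label alone, then score the fit.
slope, intercept = fit_line(cluster_labels, survival)
predicted = [slope * c + intercept for c in cluster_labels]
error = rmse(survival, predicted)

print(slope, intercept, error)
```

One caveat of this sketch is that it treats the cluster label as an ordered number, so the fit depends on how the clusters happen to be numbered. A lower RMSE then suggests that the grouping carries more information about survival, which is the sense in which the abstract uses prediction quality as a proxy for the usefulness of the classification.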


Subjects
Lung Neoplasms/classification; Lung Neoplasms/mortality; Adult; Aged; Aged, 80 and over; Cluster Analysis; Databases, Factual; Female; Humans; Male; Middle Aged; Neoplasm Grading; Neoplasm Staging; SEER Program; Survival Analysis; Tumor Burden; Unsupervised Machine Learning; Young Adult