
Stratification by Tumor Grade Groups in a Holistic Evaluation of Machine Learning for Brain Tumor Segmentation.

Prabhudesai, Snehal; Wang, Nicholas Chandler; Ahluwalia, Vinayak; Huan, Xun; Bapuraj, Jayapalli Rajiv; Banovic, Nikola; Rao, Arvind.

Front Neurosci ; 15: 740353, 2021.

Article in English | MEDLINE | ID: mdl-34690680

ABSTRACT

Accurate and consistent segmentation plays an important role in the diagnosis, treatment planning, and monitoring of both High Grade Glioma (HGG), including Glioblastoma Multiforme (GBM), and Low Grade Glioma (LGG). Accuracy of segmentation can be affected by the imaging presentation of glioma, which varies greatly between the two tumor grade groups. In recent years, researchers have used Machine Learning (ML) to segment tumors more rapidly and consistently than manual segmentation allows. However, existing ML validation relies heavily on computing summary statistics and rarely tests the generalizability of an algorithm on clinically heterogeneous data. In this work, our goal is to investigate how to holistically evaluate the performance of ML algorithms on a brain tumor segmentation task. We address the need for rigorous evaluation of ML algorithms and present four axes of model evaluation: diagnostic performance, model confidence, robustness, and data quality. We perform a comprehensive evaluation of a glioma segmentation ML algorithm by stratifying data by specific tumor grade groups (GBM and LGG) and evaluating the algorithm on each of the four axes. The main takeaways of our work are: (1) ML algorithms need to be evaluated on out-of-distribution data that reflects tumor heterogeneity in order to assess generalizability. (2) Segmentation metrics alone are of limited use for evaluating the errors made by ML algorithms and describing their consequences. (3) Adopting tools from other domains, such as robustness (adversarial attacks) and model uncertainty (prediction intervals), leads to a more comprehensive performance evaluation. Such a holistic evaluation framework could shed light on an algorithm's clinical utility and help it evolve into a more clinically valuable tool.
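
As an illustration of the stratified evaluation described in this abstract, the following is a minimal sketch, not the authors' code. It assumes binary segmentation masks and uses the Dice coefficient as the segmentation metric; the function names `dice` and `dice_by_group` and the `(group, prediction, ground truth)` input layout are hypothetical and chosen only for this example.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def dice_by_group(cases):
    """Mean Dice per tumor grade group.

    `cases` is an iterable of (group, pred_mask, truth_mask) tuples,
    where `group` is a label such as "GBM" or "LGG" (assumed layout).
    """
    scores = {}
    for group, pred, truth in cases:
        scores.setdefault(group, []).append(dice(pred, truth))
    return {group: float(np.mean(vals)) for group, vals in scores.items()}


# Toy masks, for illustration only (not data from the study).
cases = [
    ("GBM", [1, 1, 0, 0], [1, 1, 0, 0]),
    ("LGG", [1, 1, 0, 0], [1, 0, 0, 0]),
]
per_group = dice_by_group(cases)
```

Reporting one score per group, rather than a single pooled mean, makes a performance gap between GBM and LGG cases visible instead of averaging it away.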

Stress Testing Pathology Models with Generated Artifacts.

Wang, Nicholas Chandler; Kaplan, Jeremy; Lee, Joonsang; Hodgin, Jeffrey; Udager, Aaron; Rao, Arvind.

J Pathol Inform ; 12: 54, 2021.

Article in English | MEDLINE | ID: mdl-35070483

ABSTRACT

BACKGROUND: Machine learning models provide significant opportunities for improvement in health care, but their "black-box" nature poses many risks. METHODS: We built a custom Python module as part of a framework for generating artifacts that are meant to be tunable and describable to allow for future testing needs. We analyzed a previously published digital pathology classification model and an internally developed kidney tissue segmentation model, applying a variety of generated artifacts and testing their effects. The artifacts simulated were bubbles, tissue folds, uneven illumination, marker lines, uneven sectioning, altered staining, and tissue tears. RESULTS: We found that there is some performance degradation on the tiles with artifacts, particularly with altered stains but also with marker lines, tissue folds, and uneven sectioning. We also found that the response of deep learning models to artifacts could be nonlinear. CONCLUSIONS: Generated artifacts can provide a useful tool for testing and building trust in machine learning models by understanding where these models might fail.
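
To make the idea of a tunable generated artifact concrete, here is a minimal sketch of one of the artifact types listed above, uneven illumination. It is not the authors' module: the function name `add_uneven_illumination`, the `strength` parameter, and the choice of a linear left-to-right brightness ramp are all assumptions made for this example, and tiles are assumed to be 8-bit arrays of shape (height, width) or (height, width, channels).

```python
import numpy as np


def add_uneven_illumination(tile, strength=0.5):
    """Darken a tile with a linear left-to-right brightness ramp.

    `strength` in [0, 1] is the tunable parameter (hypothetical):
    0 leaves the tile unchanged; 1 makes the left edge fully dark.
    """
    tile = np.asarray(tile, dtype=np.float64)
    width = tile.shape[1]
    # Gain rises from (1 - strength) at the left edge to 1.0 at the right.
    ramp = np.linspace(1.0 - strength, 1.0, width)
    # Reshape so the ramp broadcasts across rows (and channels, if any).
    gain = ramp.reshape(1, width, *([1] * (tile.ndim - 2)))
    return np.clip(tile * gain, 0, 255).astype(np.uint8)


# Synthetic uniform RGB tile, for illustration only.
tile = np.full((4, 4, 3), 200, dtype=np.uint8)
shaded = add_uneven_illumination(tile, strength=0.5)
```

Sweeping `strength` over a range of values and recording model output at each step is one way to probe whether a model's response to an artifact is gradual or abrupt.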
