Results 1 - 3 of 3
1.
Front Artif Intell ; 7: 1330919, 2024.
Article in English | MEDLINE | ID: mdl-38469161

ABSTRACT

Convolutional Neural Networks (CNNs) are frequently and successfully used in medical prediction tasks. They are often used in combination with transfer learning, leading to improved performance when training data for the task are scarce. The resulting models are highly complex and typically do not provide any insight into their predictive mechanisms, motivating the field of "explainable" artificial intelligence (XAI). However, previous studies have rarely quantitatively evaluated the "explanation performance" of XAI methods against ground-truth data, and the influence of transfer learning on objective measures of explanation performance has not been investigated. Here, we propose a benchmark dataset that allows for quantifying explanation performance in a realistic magnetic resonance imaging (MRI) classification task. We employ this benchmark to understand the influence of transfer learning on the quality of explanations. Experimental results show that popular XAI methods applied to the same underlying model differ vastly in performance, even when considering only correctly classified examples. We further observe that explanation performance strongly depends on the task used for pre-training and the number of CNN layers pre-trained. These results hold after correcting for a substantial correlation between explanation and classification performance.
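For readers who want a concrete picture of the transfer-learning setup described above, the following is a minimal Python sketch: an ImageNet-pretrained ResNet-18 in which a configurable number of early stages is frozen before fine-tuning. The backbone choice, the stage-wise freezing scheme, and all names are illustrative assumptions, not the paper's exact configuration.

# Sketch: fine-tune an ImageNet-pretrained CNN, freezing the first few stages.
# ResNet-18 and the stage-wise freezing scheme are illustrative assumptions,
# not the exact architecture or protocol used in the paper.
import torch
import torch.nn as nn
from torchvision import models

def build_transfer_model(num_classes: int = 2, stages_to_freeze: int = 2) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Group the network into its stem and four residual stages.
    stages = [nn.Sequential(model.conv1, model.bn1),
              model.layer1, model.layer2, model.layer3, model.layer4]

    # Freeze the stem plus the first `stages_to_freeze` residual stages.
    for stage in stages[:stages_to_freeze + 1]:
        for p in stage.parameters():
            p.requires_grad = False

    # Replace the classification head for the downstream (e.g., binary MRI) task.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_transfer_model(num_classes=2, stages_to_freeze=2)
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)

Varying stages_to_freeze corresponds, roughly, to the "number of CNN layers pre-trained" dimension that the abstract reports as influencing explanation performance.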

2.
Alzheimers Res Ther ; 15(1): 84, 2023 04 20.
Article in English | MEDLINE | ID: mdl-37081528

ABSTRACT

INTRODUCTION: Although machine learning classifiers have been frequently used to detect Alzheimer's disease (AD) based on structural brain MRI data, potential bias with respect to sex and age has not yet been addressed. Here, we examine a state-of-the-art AD classifier for potential sex and age bias even in the case of balanced training data. METHODS: Based on an age- and sex-balanced cohort of 432 subjects (306 healthy controls, 126 subjects with AD) extracted from the ADNI database, we trained a convolutional neural network to detect AD in MRI brain scans and performed ten different random training-validation-test splits to increase the robustness of the results. Classifier decisions for single subjects were explained using layer-wise relevance propagation. RESULTS: The classifier performed significantly better for women (balanced accuracy [Formula: see text]) than for men ([Formula: see text]). No significant differences were found in clinical AD scores, ruling out a disparity in disease severity as a cause for the performance difference. Analysis of the explanations revealed a larger variance in regional brain areas for male subjects compared to female subjects. DISCUSSION: The identified sex differences cannot be attributed to an imbalanced training dataset and therefore point to the importance of examining and reporting classifier performance across population subgroups to increase transparency and algorithmic fairness. Collecting more data, especially among underrepresented subgroups, and balancing the dataset are important but do not always guarantee a fair outcome.


Subjects
Alzheimer Disease, Cognitive Dysfunction, Humans, Male, Female, Alzheimer Disease/diagnostic imaging, Cognitive Dysfunction/diagnosis, Magnetic Resonance Imaging/methods, Neuroimaging, Machine Learning
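The subgroup evaluation reported in the abstract above can be sketched in a few lines: balanced accuracy computed separately for female and male test subjects with scikit-learn. The array names, the toy labels, and the grouping variable are illustrative assumptions, not data from the study.

# Sketch: evaluate a trained classifier's balanced accuracy per sex subgroup.
# y_true, y_pred, and sex are placeholders for test-set labels, model predictions,
# and per-subject sex annotations; the toy values are purely illustrative.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def subgroup_balanced_accuracy(y_true, y_pred, groups):
    """Balanced accuracy computed within each subgroup (e.g., 'F' and 'M')."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        g: balanced_accuracy_score(y_true[groups == g], y_pred[groups == g])
        for g in np.unique(groups)
    }

# Toy example (labels: 1 = AD, 0 = healthy control).
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
sex    = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(subgroup_balanced_accuracy(y_true, y_pred, sex))  # {'F': 1.0, 'M': 0.5}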
3.
Mach Learn ; 111(5): 1903-1923, 2022.
Article in English | MEDLINE | ID: mdl-35611184

ABSTRACT

Machine learning (ML) is increasingly used to inform high-stakes decisions. As complex ML models (e.g., deep neural networks) are often considered black boxes, a wealth of procedures has been developed to shed light on their inner workings and the ways in which their predictions come about, defining the field of 'explainable AI' (XAI). Saliency methods rank input features according to some measure of 'importance'. Such methods are difficult to validate since a formal definition of feature importance is, thus far, lacking. It has been demonstrated that some saliency methods can highlight features that have no statistical association with the prediction target (suppressor variables). To avoid misinterpretations due to such behavior, we propose the actual presence of such an association as a necessary condition for, and an objective preliminary definition of, feature importance. We carefully crafted a ground-truth dataset in which all statistical dependencies are well-defined and linear, serving as a benchmark to study the problem of suppressor variables. We evaluate common explanation methods including LRP, DTD, PatternNet, PatternAttribution, LIME, Anchors, SHAP, and permutation-based methods with respect to our objective definition. We show that most of these methods are unable to distinguish important features from suppressors in this setting. Supplementary Information: The online version contains supplementary material available at 10.1007/s10994-022-06167-y.
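The suppressor-variable effect described above can be reproduced with a minimal Python sketch: feature x2 has no statistical association with the target y, yet a fitted linear model assigns it a large coefficient because it cancels the distractor noise contaminating x1. The data-generating process and parameter values below are illustrative assumptions, not the paper's benchmark dataset.

# Sketch: a suppressor variable in a linear model.
# x2 is uncorrelated with the target y, yet the fitted model gives it a
# non-zero coefficient because it removes the distractor noise in x1.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)    # true signal
d = rng.normal(size=n)    # distractor noise, independent of z
y = z                     # target depends only on the signal
x1 = z + d                # measured feature: signal contaminated by the distractor
x2 = d                    # suppressor: pure distractor, no association with y

X = np.column_stack([x1, x2])
model = LinearRegression().fit(X, y)

print("corr(x2, y):", np.corrcoef(x2, y)[0, 1])   # ~0: no statistical association
print("coefficients:", model.coef_)               # ~[1, -1]: x2 gets a large weight anyway

An importance measure that reads off the magnitude of the second coefficient would flag x2 as important even though it carries no information about y on its own, which is exactly the behavior the paper's objective definition is designed to rule out.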
