1.
J Imaging Inform Med; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38710971

ABSTRACT

Saliency maps are widely used to "explain" decisions made by modern machine learning models, including deep convolutional neural networks (DCNNs). While the resulting heatmaps purportedly indicate important image features, their "trustworthiness," i.e., utility and robustness, has not been evaluated for musculoskeletal imaging. The purpose of this study was to systematically evaluate the trustworthiness of saliency maps used in disease diagnosis on upper extremity X-ray images. The underlying DCNNs were trained using the Stanford MURA dataset. We studied four trustworthiness criteria: (1) localization accuracy of abnormalities, (2) repeatability, (3) reproducibility, and (4) sensitivity to underlying DCNN weights, across six gradient-based saliency methods: Grad-CAM (GCAM), gradient explanation (GRAD), integrated gradients (IG), SmoothGrad (SG), smooth IG (SIG), and XRAI. Ground truth was defined by the consensus of three fellowship-trained musculoskeletal radiologists, who each placed bounding boxes around abnormalities on a holdout saliency test set. Compared to radiologists, all saliency methods showed inferior localization (AUPRCs: 0.438 (SG) to 0.590 (XRAI); average radiologist AUPRC: 0.816), repeatability (IoUs: 0.427 (SG) to 0.551 (IG); average radiologist IoU: 0.613), and reproducibility (IoUs: 0.250 (SG) to 0.502 (XRAI); average radiologist IoU: 0.613) on abnormalities such as fractures, orthopedic hardware insertions, and arthritis. Five methods (GCAM, GRAD, IG, SG, XRAI) passed the sensitivity test. Ultimately, no saliency method met all four trustworthiness criteria; therefore, we recommend caution and rigorous evaluation of saliency maps prior to their clinical use.
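To make two of these criteria concrete, the sketch below computes a pixel-level AUPRC for localization (saliency values scored against a binary radiologist bounding-box mask) and an IoU for repeatability (overlap between binarized maps from two runs of the same method). The top-fraction thresholding scheme and array shapes are assumptions for illustration, not the authors' exact pipeline.

```python
# Illustrative sketch (not the authors' pipeline): pixel-level AUPRC for
# localization and IoU for repeatability of a saliency heatmap.
import numpy as np
from sklearn.metrics import average_precision_score

def localization_auprc(saliency: np.ndarray, gt_mask: np.ndarray) -> float:
    """AUPRC of raw saliency values against a binary ground-truth box mask."""
    return average_precision_score(gt_mask.ravel().astype(int),
                                   saliency.ravel())

def binarize_top_fraction(saliency: np.ndarray, frac: float = 0.1) -> np.ndarray:
    """Keep the top `frac` most salient pixels (assumed thresholding scheme)."""
    cutoff = np.quantile(saliency, 1.0 - frac)
    return saliency >= cutoff

def repeatability_iou(sal_a: np.ndarray, sal_b: np.ndarray) -> float:
    """IoU between binarized maps from two runs of the same saliency method."""
    a, b = binarize_top_fraction(sal_a), binarize_top_fraction(sal_b)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

# Toy example: random maps stand in for real model output.
rng = np.random.default_rng(0)
sal1, sal2 = rng.random((224, 224)), rng.random((224, 224))
gt = np.zeros((224, 224), dtype=bool)
gt[60:120, 80:160] = True  # radiologist bounding box as a mask
print(localization_auprc(sal1, gt), repeatability_iou(sal1, sal2))
```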

2.
Nat Comput Sci ; 4(2): 110-118, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38374361

ABSTRACT

To automate the discovery of new scientific and engineering principles, artificial intelligence must distill explicit rules from experimental data. This has proven difficult because existing methods typically search through the enormous space of possible functions. Here we introduce deep distilling, a machine learning method that does not perform searches but instead learns from data using symbolic essence neural networks and then losslessly condenses the network parameters into a concise algorithm written in computer code. This distilled code, which can contain loops and nested logic, is equivalent to the neural network but is human-comprehensible and orders of magnitude more compact. On arithmetic, vision, and optimization tasks, the distilled code is capable of out-of-distribution systematic generalization, solving cases orders of magnitude larger and more complex than the training data. The distilled algorithms can sometimes outperform human-designed algorithms, demonstrating that deep distilling is able to discover generalizable principles complementary to human expertise.
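The abstract gives no implementation detail, but the flavor of the output can be illustrated with a hand-written stand-in: a distilled program for a parity task, expressed with an explicit loop, generalizes to inputs far longer than any fixed training distribution. This is a conceptual illustration only, not the deep-distilling system itself.

```python
# Conceptual illustration only (not the deep-distilling system): a concise,
# loop-based program equivalent in behavior to a trained parity network,
# readable by humans and valid for inputs of any length.
def distilled_parity(bits: list[int]) -> int:
    """Human-readable rule: running XOR over the input bits."""
    acc = 0
    for b in bits:  # explicit loop, the kind distilled code can contain
        acc ^= b
    return acc

# Out-of-distribution check: tiny "training-sized" case vs. a far larger one.
assert distilled_parity([1, 0, 1]) == 0
assert distilled_parity([1] * 10_001) == 1  # far beyond any training length
```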

3.
J Diabetes Sci Technol ; 18(2): 302-308, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37798955

ABSTRACT

OBJECTIVE: In the pivotal clinical trial that led to Food and Drug Administration De Novo "approval" of the first fully autonomous artificial intelligence (AI) diabetic retinal disease diagnostic system, a reflexive dilation protocol was used. Using real-world deployment data collected before implementation of reflexive dilation, we identified factors associated with nondiagnostic results. These factors enable a novel predictive dilation workflow, in which patients most likely to benefit from pharmacologic dilation are dilated a priori, maximizing efficiency and patient satisfaction.

METHODS: Retrospective review of patients assessed with autonomous AI at Johns Hopkins Medicine (August 2020 to May 2021). We constructed a multivariable logistic regression model for nondiagnostic results to compare characteristics of patients with and without diagnostic results, using adjusted odds ratios (aORs). P < .05 was considered statistically significant.

RESULTS: Of 241 patients (59% female; median age, 59 years), 123 (51%) had nondiagnostic results. In multivariable analysis, type 1 diabetes (T1D; aOR = 5.82, 95% confidence interval [CI]: 1.45-23.40, P = .01), smoking (aOR = 2.86, 95% CI: 1.36-5.99, P = .005), and age (per 10-year increase, aOR = 2.12, 95% CI: 1.62-2.77, P < .001) were associated with nondiagnostic results. Following feature elimination, a predictive model was created using T1D, smoking, age, race, sex, and hypertension as inputs. The model showed an area under the receiver operating characteristic curve of 0.76 in five-fold cross-validation.

CONCLUSIONS: We used factors associated with nondiagnostic results to design a novel predictive dilation workflow, in which patients most likely to benefit from pharmacologic dilation are dilated a priori. This workflow has the potential to be more efficient than reflexive dilation, maximizing the number of at-risk patients receiving their diabetic retinal examinations.


Subject(s): Delivery of Health Care, Integrated; Diabetes Mellitus, Type 1; Diabetic Retinopathy; Female; Humans; Male; Middle Aged; Artificial Intelligence; Diabetic Retinopathy/diagnostic imaging; Dilatation; Risk Factors; United States; Workflow; Retrospective Studies; Clinical Trials as Topic
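A minimal sketch of the statistical approach described above, assuming a DataFrame with the named predictor columns (the patient-level data are not public, so a synthetic frame stands in; the abstract's race variable is omitted here for brevity): fit the multivariable logistic regression, report adjusted odds ratios with 95% CIs, and estimate AUC by five-fold cross-validation.

```python
# Minimal sketch, assuming a DataFrame `df` with these columns; synthetic
# data stand in for the real (non-public) patient records.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 241
df = pd.DataFrame({
    "nondiagnostic": rng.integers(0, 2, n),
    "t1d": rng.integers(0, 2, n),
    "smoking": rng.integers(0, 2, n),
    "age_per_decade": rng.normal(5.9, 1.2, n),  # age/10, matching the 10-year aOR
    "female": rng.integers(0, 2, n),
    "hypertension": rng.integers(0, 2, n),
})

# Multivariable logistic regression: adjusted odds ratios with 95% CIs.
X = sm.add_constant(df.drop(columns="nondiagnostic"))
fit = sm.Logit(df["nondiagnostic"], X).fit(disp=0)
ci = fit.conf_int()
aor = pd.DataFrame({"aOR": np.exp(fit.params),
                    "ci_low": np.exp(ci[0]),
                    "ci_high": np.exp(ci[1])})
print(aor.drop(index="const"))

# Five-fold cross-validated AUC for the predictive model.
auc = cross_val_score(LogisticRegression(max_iter=1000),
                      df.drop(columns="nondiagnostic"), df["nondiagnostic"],
                      cv=5, scoring="roc_auc")
print("CV AUC:", auc.mean())
```
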
4.
Radiol Artif Intell ; 4(6): e220012, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36523640

ABSTRACT

Purpose: To compare performance, sample efficiency, and hidden stratification of vision transformer (ViT) and convolutional neural network (CNN) architectures for diagnosis of disease on chest radiographs and extremity radiographs using transfer learning.

Materials and Methods: In this HIPAA-compliant retrospective study, the authors fine-tuned data-efficient image transformer (DeiT) ViT and CNN classification models pretrained on ImageNet using the National Institutes of Health ChestX-ray14 dataset (112 120 images) and the MURA dataset (14 656 images) for thoracic disease and extremity abnormalities, respectively. Performance was assessed on internal test sets and on 75 000 external chest radiographs (three datasets). The primary comparison was DeiT-B ViT vs DenseNet121 CNN; secondary comparisons included DeiT-Ti (Tiny), ResNet152, and EfficientNetB7. Sample efficiency was evaluated by training models on varying dataset sizes. Hidden stratification was evaluated by comparing the prevalence of chest tubes in pneumothorax false-positive and false-negative predictions, and of specific abnormalities in MURA false-negative predictions.

Results: DeiT-B weighted area under the receiver operating characteristic curve (wAUC) was slightly lower than that of DenseNet121 on the chest radiograph (0.78 vs 0.79; P < .001) and extremity (0.887 vs 0.893; P < .001) internal test sets and on the chest radiograph external test sets (P < .001 for each). DeiT-B and DeiT-Ti both performed slightly worse than all CNNs for the chest radiograph and extremity tasks. DeiT-B and DenseNet121 showed similar sample efficiency. DeiT-B had lower chest tube prevalence in false-positive predictions than DenseNet121 (43.1% [324 of 5088] vs 47.9% [2290 of 4782]).

Conclusion: Although DeiT models had lower wAUCs than CNNs for the chest radiograph and extremity domains, the differences may be negligible in clinical practice. DeiT-B had sample efficiency similar to that of DenseNet121 and may be less susceptible to certain types of hidden stratification.

Keywords: Computer-aided Diagnosis, Informatics, Neural Networks, Thorax, Skeletal-Appendicular, Convolutional Neural Network (CNN), Feature Detection, Supervised Learning, Machine Learning, Deep Learning

Supplemental material is available for this article. © RSNA, 2022.
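The abstract does not give the training recipe; the sketch below shows one plausible transfer-learning setup using the timm library (the model names are real timm identifiers, but the loss, batch, and head configuration are assumptions, not the authors' exact protocol), along with the prevalence-weighted AUC used as the comparison metric.

```python
# Minimal transfer-learning sketch (timm model names are real; the training
# configuration is assumed, not the authors' exact recipe).
import timm
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

NUM_CLASSES = 14  # ChestX-ray14 thoracic disease labels

def build(name: str) -> nn.Module:
    """ImageNet-pretrained backbone with a fresh multilabel head."""
    return timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)

deit_b = build("deit_base_patch16_224")   # DeiT-B ViT
densenet = build("densenet121")           # DenseNet121 CNN

criterion = nn.BCEWithLogitsLoss()  # multilabel thoracic findings

def weighted_auc(y_true, y_score) -> float:
    """Label-prevalence-weighted AUC (wAUC), the metric compared in the study."""
    return roc_auc_score(y_true, y_score, average="weighted")

# One illustrative forward/backward step on a dummy batch.
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, 2, (4, NUM_CLASSES)).float()
for model in (deit_b, densenet):
    loss = criterion(model(x), y)
    loss.backward()
```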

5.
Radiol Artif Intell ; 4(5): e220081, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36204536

ABSTRACT

Purpose: To evaluate code and data sharing practices in original artificial intelligence (AI) scientific manuscripts published in the Radiological Society of North America (RSNA) journals suite from 2017 through 2021.

Materials and Methods: A retrospective meta-research study was conducted of articles published in the RSNA journals suite from January 1, 2017, through December 31, 2021. A total of 218 articles were included and evaluated for code sharing practices, reproducibility of shared code, and data sharing practices. Categorical comparisons were conducted using Fisher exact tests with respect to year and journal of publication, author affiliation(s), and type of algorithm used.

Results: Of the 218 included articles, 73 (34%) shared code, with 24 (33% of code-sharing articles and 11% of all articles) sharing reproducible code. Radiology and Radiology: Artificial Intelligence published the most code-sharing articles (48 [66%] and 21 [29%], respectively). Twenty-nine articles (13%) shared data, and 12 of these articles (41% of data-sharing articles) shared complete experimental data by using only public domain datasets. Four of the 218 articles (2%) shared both code and complete experimental data. Code sharing rates were statistically higher in 2020 and 2021 than in earlier years (P < .01) and were higher in Radiology and Radiology: Artificial Intelligence than in other journals (P < .01).

Conclusion: Original AI scientific articles in the RSNA journals suite had low rates of code and data sharing, emphasizing the need for open-source code and data to achieve transparent and reproducible science.

Keywords: Meta-Analysis, AI in Education, Machine Learning

Supplemental material is available for this article. © RSNA, 2022.
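The Fisher exact comparisons reported above take only a few lines to run. The per-period counts in this sketch are hypothetical placeholders (the abstract reports only the aggregate 73 of 218 code-sharing articles, which the placeholder rows are chosen to sum to), so the printed values are illustrative, not the study's results.

```python
# Fisher exact test sketch; the 2x2 split is a hypothetical placeholder
# (rows sum to the abstract's aggregate: 73 of 218 articles shared code).
from scipy.stats import fisher_exact

#            shared  not shared
table = [[50, 60],   # hypothetical: articles from 2020-2021
         [23, 85]]   # hypothetical: articles from 2017-2019
odds_ratio, p_value = fisher_exact(table)
print(f"OR={odds_ratio:.2f}, P={p_value:.4f}")
```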
