Search | VHL Regional Portal (Portal Regional da BVS)

1.

Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study.

Saha, Anindo; Bosma, Joeran S; Twilt, Jasper J; van Ginneken, Bram; Bjartell, Anders; Padhani, Anwar R; Bonekamp, David; Villeirs, Geert; Salomon, Georg; Giannarini, Gianluca; Kalpathy-Cramer, Jayashree; Barentsz, Jelle; Maier-Hein, Klaus H; Rusu, Mirabela; Rouvière, Olivier; van den Bergh, Roderick; Panebianco, Valeria; Kasivisvanathan, Veeru; Obuchowski, Nancy A; Yakar, Derya; Elschot, Mattijs; Veltman, Jeroen; Fütterer, Jurgen J; de Rooij, Maarten; Huisman, Henkjan.

Lancet Oncol ; 25(7): 879-887, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
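The non-inferiority and superiority decisions described in the statistical analysis plan can be sketched as below. This is a minimal illustration, not the study's analysis code. The function name `wald_noninferiority`, the normal-quantile Wald interval, and the standard error are assumptions; only the margin of 0.05 and the two AUROC values come from the abstract.

```python
from statistics import NormalDist


def wald_noninferiority(diff, se, margin=0.05, alpha=0.05):
    """Two-sided Wald CI for a difference (AI minus comparator) and the
    resulting non-inferiority and superiority decisions.

    diff   : observed difference in the metric (for example AUROC)
    se     : standard error of that difference (estimated elsewhere)
    margin : non-inferiority margin (0.05 in the abstract)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% CI
    lower, upper = diff - z * se, diff + z * se
    non_inferior = lower > -margin  # whole CI lies above -margin
    return {
        "ci": (lower, upper),
        "non_inferior": non_inferior,
        # Superiority is only considered once non-inferiority holds.
        "superior": non_inferior and lower > 0.0,
    }


# Reported AUROCs: 0.91 (AI system) vs 0.86 (pool of readers).
# The standard error is HYPOTHETICAL; the abstract does not report it.
result = wald_noninferiority(diff=0.91 - 0.86, se=0.015)
print(result)
```

With a lower CI bound above zero the sketch reports both non-inferiority and superiority. A lower bound between -0.05 and 0 would give non-inferiority only.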

Subjects

Artificial Intelligence; Magnetic Resonance Imaging; Prostatic Neoplasms; Radiologists; Humans; Male; Prostatic Neoplasms/diagnostic imaging; Prostatic Neoplasms/pathology; Aged; Retrospective Studies; Middle Aged; Neoplasm Grading; Netherlands; ROC Curve

2.

Using deep learning to optimize the prostate MRI protocol by assessing the diagnostic efficacy of MRI sequences.

Fransen, Stefan J; Roest, Christian; Van Lohuizen, Quintin Y; Bosma, Joeran S; Simonis, Frank F J; Kwee, Thomas C; Yakar, Derya; Huisman, Henkjan.

Eur J Radiol ; 175: 111470, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38640822

ABSTRACT

PURPOSE: To explore diagnostic deep learning for optimizing the prostate MRI protocol by assessing the diagnostic efficacy of MRI sequences. METHOD: This retrospective study included 840 patients with a biparametric prostate MRI scan. The MRI protocol included a T2-weighted image, three DWI sequences (b50, b400, and b800 s/mm²), a calculated ADC map, and a calculated b1400 sequence. Two accelerated MRI protocols were simulated, using only two acquired b-values to calculate the ADC and b1400. Deep learning models were trained to detect prostate cancer lesions on accelerated and full protocols. The diagnostic performances of the protocols were compared at the patient level with the area under the receiver operating characteristic curve (AUROC), using DeLong's test, and at the lesion level with the partial area under the free-response operating characteristic curve (pAUFROC), using a permutation test. Validation of the results was performed among expert radiologists. RESULTS: No significant differences in diagnostic performance were found between the accelerated protocols and the full bpMRI baseline. Omitting b800 reduced DWI scan time by 53%, with a performance difference of +0.01 AUROC (p = 0.20) and -0.03 pAUFROC (p = 0.45). Omitting b400 reduced DWI scan time by 32%, with a performance difference of -0.01 AUROC (p = 0.65) and +0.01 pAUFROC (p = 0.73). Multiple expert radiologists underlined the findings. CONCLUSIONS: This study shows that deep learning can assess the diagnostic efficacy of MRI sequences by comparing prostate MRI protocols on diagnostic accuracy. Omitting either the b400 or the b800 DWI sequence can optimize the prostate MRI protocol by reducing scan time without compromising diagnostic quality.
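Calculating the ADC map and the b1400 image from only two acquired b-values can be illustrated with the standard mono-exponential diffusion model, S(b) = S0 · exp(-b · ADC). This is a sketch under that assumption. The abstract does not state which fitting method the authors used, and the function names and signal values below are hypothetical.

```python
import math


def adc_from_two_b_values(s_low, s_high, b_low, b_high):
    """Per-voxel ADC (mm^2/s) from two DWI signals, assuming mono-exponential decay."""
    if s_low <= 0 or s_high <= 0:
        return 0.0  # no usable signal; a real pipeline would mask such voxels
    return math.log(s_low / s_high) / (b_high - b_low)


def extrapolate_signal(s_ref, b_ref, adc, b_target=1400):
    """Calculated high-b signal, extrapolated from one acquired reference image."""
    return s_ref * math.exp(-(b_target - b_ref) * adc)


# Hypothetical voxel: S0 = 1000, true ADC = 1.0e-3 mm^2/s, acquired at b50 and b800
# (the accelerated protocol that omits b400).
s50 = 1000 * math.exp(-50 * 1.0e-3)
s800 = 1000 * math.exp(-800 * 1.0e-3)

adc = adc_from_two_b_values(s50, s800, b_low=50, b_high=800)
b1400 = extrapolate_signal(s50, b_ref=50, adc=adc)
print(f"ADC = {adc:.6f} mm^2/s, calculated b1400 signal = {b1400:.1f}")
```

With noise-free signals two b-values recover the ADC exactly. On real data the choice of which b-value to drop changes the noise behaviour of the estimate, which is the trade-off the study evaluates.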

Subjects

Deep Learning; Magnetic Resonance Imaging; Prostatic Neoplasms; Humans; Male; Prostatic Neoplasms/diagnostic imaging; Retrospective Studies; Magnetic Resonance Imaging/methods; Middle Aged; Aged; Image Interpretation, Computer-Assisted/methods; Reproducibility of Results; Sensitivity and Specificity

3.

Complexities of deep learning-based undersampled MR image reconstruction.

Noordman, Constant Richard; Yakar, Derya; Bosma, Joeran; Simonis, Frank Frederikus Jacobus; Huisman, Henkjan.

Eur Radiol Exp ; 7(1): 58, 2023 Oct 4.

Article in English | MEDLINE | ID: mdl-37789241

ABSTRACT

Artificial intelligence has opened a new path of innovation in magnetic resonance (MR) image reconstruction of undersampled k-space acquisitions. This review offers readers an analysis of the current deep learning-based MR image reconstruction methods. The literature in this field shows exponential growth, both in volume and complexity, as the capabilities of machine learning in solving inverse problems such as image reconstruction are explored. We review the latest developments, aiming to assist researchers and radiologists who are developing new methods or seeking to provide valuable feedback. We shed light on key concepts by exploring the technical intricacies of MR image reconstruction, highlighting the importance of raw datasets and the difficulty of evaluating diagnostic value using standard metrics.

Relevance statement: Increasingly complex algorithms output reconstructed images that are difficult to assess for robustness and diagnostic quality, necessitating high-quality datasets and collaboration with radiologists.

Key points:
• Deep learning-based image reconstruction algorithms are increasing both in complexity and performance.
• The evaluation of reconstructed images may mistake perceived image quality for diagnostic value.
• Collaboration with radiologists is crucial for advancing deep learning technology.

Subjects

Artificial Intelligence; Deep Learning; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Algorithms

4.

Erratum for: Prediction Variability to Identify Reduced AI Performance in Cancer Diagnosis at MRI and CT.

Alves, Natália; Bosma, Joeran S; Venkadesh, Kiran V; Jacobs, Colin; Saghir, Zaigham; de Rooij, Maarten; Hermans, John; Huisman, Henkjan.

Radiology ; 309(1): e239023, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37906017

5.

Semisupervised Learning with Report-guided Pseudo Labels for Deep Learning-based Prostate Cancer Detection Using Biparametric MRI.

Bosma, Joeran S; Saha, Anindo; Hosseinzadeh, Matin; Slootweg, Ivan; de Rooij, Maarten; Huisman, Henkjan.

Radiol Artif Intell ; 5(5): e230031, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37795142

ABSTRACT

Purpose: To evaluate a novel method of semisupervised learning (SSL) guided by automated sparse information from diagnostic reports to leverage additional data for deep learning-based malignancy detection in patients with clinically significant prostate cancer. Materials and Methods: This retrospective study included 7756 prostate MRI examinations (6380 patients) performed between January 2014 and December 2020 for model development. An SSL method, report-guided SSL (RG-SSL), was developed for detection of clinically significant prostate cancer using biparametric MRI. RG-SSL, supervised learning (SL), and state-of-the-art SSL methods were trained using 100, 300, 1000, or 3050 manually annotated examinations. Performance on detection of clinically significant prostate cancer by RG-SSL, SL, and SSL was compared on 300 unseen examinations from an external center with a histopathologically confirmed reference standard. Performance was evaluated using receiver operating characteristic (ROC) and free-response ROC analysis. P values for performance differences were generated with a permutation test. Results: At 100 manually annotated examinations, mean examination-based diagnostic area under the ROC curve (AUC) values for RG-SSL, SL, and the best SSL were 0.86 ± 0.01 (SD), 0.78 ± 0.03, and 0.81 ± 0.02, respectively. Lesion-based detection partial AUCs were 0.62 ± 0.02, 0.44 ± 0.04, and 0.48 ± 0.09, respectively. Examination-based performance of SL with 3050 examinations was matched by RG-SSL with 169 manually annotated examinations, thus requiring 14 times fewer annotations. Lesion-based performance was matched with 431 manually annotated examinations, requiring six times fewer annotations. 
Conclusion: RG-SSL outperformed SSL in clinically significant prostate cancer detection and achieved performance similar to SL even at very low annotation budgets. Keywords: Annotation Efficiency, Computer-aided Detection and Diagnosis, MRI, Prostate Cancer, Semisupervised Deep Learning. Supplemental material is available for this article. Published under a CC BY 4.0 license.
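A permutation test of the kind mentioned for comparing performance differences can be sketched as follows. This is an assumed, generic two-sample Monte Carlo variant on a difference in means. The abstract does not specify the exact test statistic or pairing, and the AUC values below are hypothetical rather than taken from the study.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group membership
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        stat = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if stat >= observed - 1e-12:  # small tolerance for float ties
            hits += 1
    # Add-one correction keeps the estimated p value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# HYPOTHETICAL AUCs from repeated training runs of two methods.
method_a_runs = [0.86, 0.85, 0.87, 0.86, 0.88]
method_b_runs = [0.78, 0.75, 0.80, 0.79, 0.77]
p_value = permutation_test(method_a_runs, method_b_runs)
print(f"p = {p_value:.4f}")
```

The test makes no distributional assumption. It only asks how often a random relabelling of the runs produces a difference at least as large as the observed one.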

6.

Prediction Variability to Identify Reduced AI Performance in Cancer Diagnosis at MRI and CT.

Alves, Natália; Bosma, Joeran S; Venkadesh, Kiran V; Jacobs, Colin; Saghir, Zaigham; de Rooij, Maarten; Hermans, John; Huisman, Henkjan.

Radiology ; 308(3): e230275, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37724961

ABSTRACT

Background A priori identification of patients at risk of artificial intelligence (AI) failure in diagnosing cancer would contribute to the safer clinical integration of diagnostic algorithms. Purpose To evaluate AI prediction variability as an uncertainty quantification (UQ) metric for identifying cases at risk of AI failure in diagnosing cancer at MRI and CT across different cancer types, data sets, and algorithms. Materials and Methods Multicenter data sets and publicly available AI algorithms from three previous studies that evaluated detection of pancreatic cancer on contrast-enhanced CT images, detection of prostate cancer on MRI scans, and prediction of pulmonary nodule malignancy on low-dose CT images were analyzed retrospectively. Each task's algorithm was extended to generate an uncertainty score based on ensemble prediction variability. AI accuracy percentage and partial area under the receiver operating characteristic curve (pAUC) were compared between certain and uncertain patient groups in a range of percentile thresholds (10%-90%) for the uncertainty score using permutation tests for statistical significance. The pulmonary nodule malignancy prediction algorithm was compared with 11 clinical readers for the certain group (CG) and uncertain group (UG). Results In total, 18 022 images were used for training and 838 images were used for testing. AI diagnostic accuracy was higher for the cases in the CG across all tasks (P < .001). At an 80% threshold of certain predictions, accuracy in the CG was 21%-29% higher than in the UG and 4%-6% higher than in the overall test data sets. The lesion-level pAUC in the CG was 0.25-0.39 higher than in the UG and 0.05-0.08 higher than in the overall test data sets (P < .001). 
For pulmonary nodule malignancy prediction, accuracy of AI was on par with clinicians for cases in the CG (AI results vs clinician results, 80% [95% CI: 76, 85] vs 78% [95% CI: 70, 87]; P = .07) but worse for cases in the UG (AI results vs clinician results, 50% [95% CI: 37, 64] vs 68% [95% CI: 60, 76]; P < .001). Conclusion An AI-prediction UQ metric consistently identified reduced performance of AI in cancer diagnosis. © RSNA, 2023. Supplemental material is available for this article. See also the editorial by Babyn in this issue.
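An uncertainty score based on ensemble prediction variability, followed by a percentile split into certain and uncertain groups, might look like the sketch below. It assumes the score is the standard deviation across ensemble members. The abstract does not give the exact definition, and the function names and predictions are hypothetical.

```python
from statistics import pstdev


def uncertainty_scores(ensemble_predictions):
    """One score per case: standard deviation across ensemble members."""
    return [pstdev(case) for case in ensemble_predictions]


def split_by_certainty(scores, certain_fraction=0.8):
    """Indices of the certain group (lowest scores) and the uncertain group."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_certain = round(certain_fraction * len(scores))
    return sorted(order[:n_certain]), sorted(order[n_certain:])


# HYPOTHETICAL likelihoods from a 5-member ensemble for 5 cases.
predictions = [
    [0.91, 0.93, 0.90, 0.92, 0.94],  # members agree: low variability
    [0.10, 0.12, 0.08, 0.11, 0.09],
    [0.20, 0.85, 0.40, 0.70, 0.55],  # members disagree: high variability
    [0.05, 0.04, 0.06, 0.05, 0.05],
    [0.60, 0.62, 0.58, 0.61, 0.59],
]
scores = uncertainty_scores(predictions)
certain, uncertain = split_by_certainty(scores, certain_fraction=0.8)
print("certain cases:", certain, "uncertain cases:", uncertain)
```

Accuracy or pAUC can then be computed separately for the two index lists, which mirrors the comparison between the CG and the UG reported above.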

Subjects

Lung Neoplasms; Mental Disorders; Male; Humans; Artificial Intelligence; Retrospective Studies; Magnetic Resonance Imaging; Lung Neoplasms/diagnostic imaging; Tomography, X-Ray Computed

7.

Fully Automatic Deep Learning Framework for Pancreatic Ductal Adenocarcinoma Detection on Computed Tomography.

Alves, Natália; Schuurmans, Megan; Litjens, Geke; Bosma, Joeran S; Hermans, John; Huisman, Henkjan.

Cancers (Basel) ; 14(2), 2022 Jan 13.

Article in English | MEDLINE | ID: mdl-35053538

ABSTRACT

Early detection improves prognosis in pancreatic ductal adenocarcinoma (PDAC), but is challenging as lesions are often small and poorly defined on contrast-enhanced computed tomography scans (CE-CT). Deep learning can facilitate PDAC diagnosis; however, current models still fail to identify small (<2 cm) lesions. In this study, state-of-the-art deep learning models were used to develop an automatic framework for PDAC detection, focusing on small lesions. Additionally, the impact of integrating the surrounding anatomy was investigated. CE-CT scans from a cohort of 119 pathology-proven PDAC patients and a cohort of 123 patients without PDAC were used to train a nnUnet for automatic lesion detection and segmentation (nnUnet_T). Two additional nnUnets were trained to investigate the impact of anatomy integration: (1) segmenting the pancreas and tumor (nnUnet_TP), and (2) segmenting the pancreas, tumor, and multiple surrounding anatomical structures (nnUnet_MS). An external, publicly available test set was used to compare the performance of the three networks. The nnUnet_MS achieved the best performance, with an area under the receiver operating characteristic curve of 0.91 for the whole test set and 0.88 for tumors <2 cm, showing that state-of-the-art deep learning can detect small PDAC and benefits from anatomy information.
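The area under the receiver operating characteristic curve used to compare the networks can be computed directly from its rank interpretation, as in this minimal sketch. The labels and scores are hypothetical, and the function is a generic illustration rather than the evaluation code used in the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# HYPOTHETICAL patient-level labels (1 = PDAC) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1]
print(f"AUROC = {auroc(labels, scores):.3f}")
```

Restricting `labels` and `scores` to a subgroup, for example patients with tumors smaller than 2 cm plus the controls, gives a subgroup AUROC in the same way.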
