Results 1 - 20 of 72
1.
Lancet Oncol ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases in which the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was confirmed, as the AI system showed marginally lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care.
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
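The non-inferiority logic in this abstract (declare non-inferiority when the lower bound of the two-sided 95% Wald CI for the difference exceeds the -0·05 margin) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the study's prespecified analysis code: the study used a paired design, whereas an unpaired Wald interval is shown here for simplicity, and the case counts below are hypothetical.

```python
import math

def wald_ci_diff(p1, n1, p2, n2, z=1.96):
    """Two-sided 95% Wald CI for the difference in proportions (p1 - p2)."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

def non_inferior(ci_lower, margin=-0.05):
    """Non-inferiority holds when the CI lower bound lies above the margin."""
    return ci_lower > margin

# Hypothetical example: specificity 68.9% vs 69.0% on 760 benign exams each
diff, lo, hi = wald_ci_diff(0.689, 760, 0.690, 760)
print(round(diff, 3), non_inferior(lo))
```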

2.
Med Image Anal ; 95: 103206, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38776844

ABSTRACT

The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown to be capable of accurately predicting breast density; however, owing to differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to others. Although federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, the University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants submitted Docker containers capable of implementing FL across three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.
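The linear (weighted) kappa used to score submissions can be computed from a confusion matrix over ordered categories. Below is a minimal pure-Python sketch of the metric; the four-category setting is assumed from the BI-RADS density scale, and libraries such as scikit-learn provide the same computation via `cohen_kappa_score(..., weights='linear')`.

```python
def linear_kappa(y_true, y_pred, n_classes=4):
    """Linearly weighted Cohen's kappa for ordinal class labels 0..n_classes-1."""
    n = len(y_true)
    conf = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        conf[t][p] += 1
    hist_t = [sum(row) for row in conf]                       # true-label marginals
    hist_p = [sum(conf[i][j] for i in range(n_classes))       # predicted marginals
              for j in range(n_classes)]
    observed = expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) / (n_classes - 1)                  # linear disagreement weight
            observed += w * conf[i][j]
            expected += w * hist_t[i] * hist_p[j] / n
    return 1.0 - observed / expected
```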


Subject(s)
Algorithms , Breast Density , Breast Neoplasms , Mammography , Humans , Female , Mammography/methods , Breast Neoplasms/diagnostic imaging , Machine Learning
3.
Lancet Oncol ; 25(3): 400-410, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38423052

ABSTRACT

BACKGROUND: The extended acquisition times required for MRI limit its availability in resource-constrained settings. Consequently, accelerating MRI by undersampling k-space data, which is necessary to reconstruct an image, has been a long-standing but important challenge. We aimed to develop a deep convolutional neural network (dCNN) optimisation method for MRI reconstruction that reduces scan times, and to evaluate its effect on image quality and on the accuracy of oncological imaging biomarkers. METHODS: In this multicentre, retrospective, cohort study, MRI data from patients with glioblastoma treated at Heidelberg University Hospital (775 patients and 775 examinations) and from the phase 2 CORE trial (260 patients, 1083 examinations, and 58 institutions) and the phase 3 CENTRIC trial (505 patients, 3147 examinations, and 139 institutions) were used to develop, train, and test a dCNN for reconstructing MRI from highly undersampled single-coil k-space data with various acceleration rates (R=2, 4, 6, 8, 10, and 15). Independent testing was performed with MRIs from the phase 2/3 EORTC-26101 trial (528 patients with glioblastoma, 1974 examinations, and 32 institutions). The similarity between undersampled dCNN-reconstructed and original MRIs was quantified with various image quality metrics, including the structural similarity index measure (SSIM), and the accuracy of undersampled dCNN-reconstructed MRI on downstream radiological assessment of imaging biomarkers in oncology (automated artificial intelligence-based quantification of tumour burden and treatment response) was assessed in the EORTC-26101 test dataset. The public NYU Langone Health fastMRI brain test dataset (558 patients and 558 examinations) was used to validate the generalisability and robustness of the dCNN for reconstructing MRIs from available multi-coil (parallel imaging) k-space data.
FINDINGS: In the EORTC-26101 test dataset, the median SSIM of undersampled dCNN-reconstructed MRI ranged from 0·88 to 0·99 across different acceleration rates, with 0·92 (95% CI 0·92-0·93) for 10-times acceleration (R=10). The 10-times undersampled dCNN-reconstructed MRI yielded excellent agreement with original MRI when assessing volumes of contrast-enhancing tumour (median DICE for spatial agreement of 0·89 [95% CI 0·88 to 0·89]; median volume difference of 0·01 cm3 [95% CI 0·00 to 0·03] equalling 0·21%; p=0·0036 for equivalence) or non-enhancing tumour or oedema (median DICE of 0·94 [95% CI 0·94 to 0·95]; median volume difference of -0·79 cm3 [95% CI -0·87 to -0·72] equalling -1·77%; p=0·023 for equivalence) in the EORTC-26101 test dataset. Automated volumetric tumour response assessment in the EORTC-26101 test dataset yielded an identical median time to progression of 4·27 months (95% CI 4·14 to 4·57) when using 10-times-undersampled dCNN-reconstructed or original MRI (log-rank p=0·80) and agreement in the time to progression in 374 (95·2%) of 393 patients with data. The dCNN generalised well to the fastMRI brain dataset, with significant improvements in the median SSIM when using multi-coil compared with single-coil k-space data (p<0·0001). INTERPRETATION: Deep-learning-based reconstruction of undersampled MRI allows for a substantial reduction of scan times, with a 10-times acceleration demonstrating excellent image quality while preserving the accuracy of derived imaging biomarkers for the assessment of oncological treatment response. Our developments are available as open source software and hold considerable promise for increasing the accessibility to MRI, pending further prospective validation. FUNDING: Deutsche Forschungsgemeinschaft (German Research Foundation) and an Else Kröner Clinician Scientist Endowed Professorship by the Else Kröner Fresenius Foundation.
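The SSIM reported above combines luminance, contrast, and structure terms. As a rough illustration, the single-window ("global") form of the index over two flattened images is sketched below; production implementations such as scikit-image's `structural_similarity` instead average the statistic over local windows, so values differ from this simplified version.

```python
import statistics

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two flattened images of equal length."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.pvariance(x), statistics.pvariance(y)
    cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```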


Subject(s)
Deep Learning , Glioblastoma , Humans , Artificial Intelligence , Biomarkers , Cohort Studies , Glioblastoma/diagnostic imaging , Magnetic Resonance Imaging , Retrospective Studies
4.
J Magn Reson Imaging ; 59(4): 1409-1422, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37504495

ABSTRACT

BACKGROUND: Weakly supervised learning promises reduced annotation effort while maintaining performance. PURPOSE: To compare weakly supervised training with fully slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC). STUDY TYPE: Retrospective. SUBJECTS: One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion for PC (65 ± 8 years) between January 2015 and November 2020 were split into training (N = 794, enriched with 204 PROSTATEx examinations) and test set (N = 695). FIELD STRENGTH/SEQUENCE: 1.5 and 3 T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. ASSESSMENT: Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivities of 0.97 [254/262], emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262], emulating PI-RADS ≥ 4 decisions. STATISTICAL TESTS: Receiver operating characteristic (ROC) area under the curve (AUC) was compared using the DeLong and Obuchowski tests. Sensitivity and specificity were compared using the McNemar test. The statistical significance threshold was P = 0.05. RESULTS: Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved a lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at the same training set sizes; however, the ROC-AUC difference decreased significantly from 200 to 998 training exams.
Emulating PI-RADS ≥ 3 decisions, difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity at 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). DATA CONCLUSION: Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. EVIDENCE LEVEL: 3 TECHNICAL EFFICACY: Stage 2.
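Comparing models at a fixed sensitivity, as done above to emulate PI-RADS ≥ 3 and ≥ 4 decisions, amounts to choosing the score threshold that just reaches the target sensitivity and reading off the specificity there. A minimal sketch (data and threshold convention are illustrative, not the study's analysis code):

```python
import math

def specificity_at_sensitivity(scores, labels, target_sens):
    """Pick the highest threshold whose sensitivity meets the target; return specificity."""
    pos = sorted((s for s, l in zip(scores, labels) if l == 1), reverse=True)
    k = math.ceil(target_sens * len(pos))   # positives that must score >= threshold
    threshold = pos[k - 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    return sum(1 for s in neg if s < threshold) / len(neg)
```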


Subject(s)
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies , Polyesters
5.
Schizophr Res ; 263: 160-168, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37236889

ABSTRACT

The number of magnetic resonance imaging (MRI) studies on neuronal correlates of catatonia has dramatically increased in the last 10 years, but conclusive findings on white matter (WM) tract alterations underlying catatonic symptoms are still lacking. Therefore, we are conducting an interdisciplinary longitudinal MRI study (whiteCAT) with two main objectives. First, we aim to enroll 100 psychiatric patients with and 50 psychiatric patients without catatonia according to ICD-11, who will undergo a deep phenotyping approach with an extensive battery of demographic, psychopathological, psychometric, neuropsychological, instrumental and diffusion MRI assessments at baseline and 12-week follow-up. So far, 28 catatonia patients and 40 patients with schizophrenia or other primary psychotic disorders or mood disorders without catatonia have been studied cross-sectionally, and 49 of 68 patients have completed the longitudinal assessment. Second, we seek to develop and implement a new method for semi-automatic fiber tract delineation using active learning. By training supportive machine learning algorithms on the fly that are custom-tailored to the respective analysis pipeline used to obtain the tractogram as well as to the WM tract of interest, we plan to streamline and speed up this tedious and error-prone task while at the same time increasing the reproducibility and robustness of the extraction process. The goal is to develop robust neuroimaging biomarkers of symptom severity and therapy outcome based on the WM tracts underlying catatonia. If our MRI study is successful, it will be the largest longitudinal study to date to investigate WM tracts in catatonia patients.


Subject(s)
Catatonia , White Matter , Humans , Catatonia/diagnosis , White Matter/diagnostic imaging , White Matter/pathology , Longitudinal Studies , Reproducibility of Results , Biomarkers
6.
Sci Rep ; 13(1): 19805, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37957250

ABSTRACT

Prostate cancer (PCa) diagnosis on multi-parametric magnetic resonance images (MRI) requires radiologists with a high level of expertise. Misalignments between the MRI sequences can be caused by patient movement, elastic soft-tissue deformations, and imaging artifacts, and they further increase the complexity of the interpretation task for radiologists. Recently, computer-aided diagnosis (CAD) tools have demonstrated potential for PCa diagnosis, typically relying on complex co-registration of the input modalities. However, there is no consensus among research groups on whether CAD systems profit from using registration, and alternative strategies to handle multi-modal misalignments have not been explored so far. Our study introduces and compares different strategies to cope with image misalignments and evaluates them with regard to their direct effect on the diagnostic accuracy of PCa. In addition to established registration algorithms, we propose 'misalignment augmentation' as a concept to increase CAD robustness. As the results demonstrate, misalignment augmentation can not only compensate for a complete lack of registration but, used in conjunction with registration, can also improve overall performance on an independent test set.
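The misalignment-augmentation idea can be approximated by applying small random spatial shifts to the non-reference sequences during training, so the network learns to tolerate residual misregistration. The sketch below uses integer-pixel translations only and hypothetical parameter choices; a real pipeline would plausibly also include sub-pixel shifts and elastic deformations.

```python
import random

def translate(image, dx, dy, fill=0):
    """Shift a 2D image (list of rows) by (dx, dy) pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out

def misalignment_augment(modalities, max_shift=2, rng=random):
    """Randomly shift each modality except the first, which serves as the reference."""
    out = [modalities[0]]
    for img in modalities[1:]:
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        out.append(translate(img, dx, dy))
    return out
```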


Subject(s)
Prostate , Prostatic Neoplasms , Male , Humans , Prostate/diagnostic imaging , Prostate/pathology , Magnetic Resonance Imaging/methods , Diagnosis, Computer-Assisted/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Computers
7.
Eur Radiol ; 33(11): 7463-7476, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37507610

ABSTRACT

OBJECTIVES: To evaluate a fully automatic deep learning system to detect and segment clinically significant prostate cancer (csPCa) on same-vendor prostate MRI from two different institutions not contributing to training of the system. MATERIALS AND METHODS: In this retrospective study, a previously bi-institutionally validated deep learning system (UNETM) was applied to bi-parametric prostate MRI data from one external institution (A), a PI-RADS distribution-matched internal cohort (B), and a csPCa-stratified subset of single-institution external public challenge data (C). csPCa was defined as ISUP Grade Group ≥ 2 determined from combined targeted and extended systematic MRI/transrectal US-fusion biopsy. Performance of UNETM was evaluated by comparing ROC AUC and specificity at typical PI-RADS sensitivity levels. Lesion-level analysis between UNETM segmentations and radiologist-delineated segmentations was performed using the Dice coefficient, free-response operating characteristic (FROC), and weighted alternative FROC (waFROC). The influence of using different diffusion sequences was analyzed in cohort A. RESULTS: In 250/250/140 exams in cohorts A/B/C, differences in ROC AUC were insignificant with 0.80 (95% CI: 0.74-0.85)/0.87 (95% CI: 0.83-0.92)/0.82 (95% CI: 0.75-0.89). At sensitivities of 95% and 90%, UNETM achieved specificities of 30%/50% in A, 44%/71% in B, and 43%/49% in C, respectively. The Dice coefficient of UNETM and radiologist-delineated lesions was 0.36 in A and 0.49 in B. The waFROC AUC was 0.67 (95% CI: 0.60-0.83) in A and 0.70 (95% CI: 0.64-0.78) in B. UNETM performed marginally better on readout-segmented than on single-shot echo-planar imaging. CONCLUSION: For same-vendor examinations, deep learning provided comparable discrimination of csPCa and non-csPCa lesions and examinations between local and two independent external data sets, demonstrating the applicability of the system to institutions not participating in model training.
CLINICAL RELEVANCE STATEMENT: A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets, indicating the potential of deploying AI models without retraining or fine-tuning, and corroborating evidence that AI models extract a substantial amount of transferable domain knowledge about MRI-based prostate cancer assessment. KEY POINTS: • A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets. • Lesion detection performance and segmentation congruence was similar on the institutional and an external data set, as measured by the weighted alternative FROC AUC and Dice coefficient. • Although the system generalized to two external institutions without re-training, achieving expected sensitivity and specificity levels using the deep learning system requires probability thresholds to be adjusted, underlining the importance of institution-specific calibration and quality control.
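The Dice coefficient used for the lesion-level comparison measures the overlap between a model segmentation and the radiologist's delineation. A minimal version over flattened binary masks (illustrative; the study's evaluation operates on per-lesion 3D masks):

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two flat binary masks of equal length."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(mask_a) + sum(mask_b)
    # Convention: two empty masks agree perfectly
    return 2.0 * inter / total if total else 1.0
```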


Subject(s)
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging , Prostate/diagnostic imaging , Prostate/pathology , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies
8.
Nat Methods ; 20(7): 1010-1020, 2023 07.
Article in English | MEDLINE | ID: mdl-37202537

ABSTRACT

The Cell Tracking Challenge is an ongoing benchmarking initiative that has become a reference in cell segmentation and tracking algorithm development. Here, we present a significant number of improvements introduced in the challenge since our 2017 report. These include the creation of a new segmentation-only benchmark, the enrichment of the dataset repository with new datasets that increase its diversity and complexity, and the creation of a silver standard reference corpus based on the most competitive results, which will be of particular interest for data-hungry deep learning-based strategies. Furthermore, we present the up-to-date cell segmentation and tracking leaderboards, an in-depth analysis of the relationship between the performance of the state-of-the-art methods and the properties of the datasets and annotations, and two novel, insightful studies about the generalizability and the reusability of top-performing methods. These studies provide critical practical conclusions for both developers and users of traditional and machine learning-based cell segmentation and tracking algorithms.


Subject(s)
Benchmarking , Cell Tracking , Cell Tracking/methods , Machine Learning , Algorithms
9.
Sci Adv ; 9(19): eadd0433, 2023 05 12.
Article in English | MEDLINE | ID: mdl-37172093

ABSTRACT

This research addresses the assessment of adipose tissue (AT) and the spatial distribution of visceral (VAT) and subcutaneous (SAT) fat in the trunk from standardized magnetic resonance imaging at 3 T, thereby demonstrating the feasibility of deep learning (DL)-based image segmentation in a large population-based cohort in Germany (five sites). Volume and distribution of AT play an essential role in the pathogenesis of insulin resistance, a risk factor for developing metabolic/cardiovascular diseases. Cross-validated training of the DL segmentation model led to a mean Dice similarity coefficient of >0.94, corresponding to a mean absolute volume deviation of about 22 ml. SAT is significantly increased in women compared to men, whereas VAT is increased in men. Spatial distribution shows age- and body mass index-related displacements. DL-based image segmentation provides robust and fast quantification of AT (≈15 s per dataset versus 3 to 4 hours for manual processing) and assessment of its spatial distribution from magnetic resonance images in large cohort studies.


Subject(s)
Adipose Tissue , Insulin Resistance , Male , Humans , Female , Adipose Tissue/diagnostic imaging , Risk Factors , Cohort Studies , Magnetic Resonance Imaging/methods
10.
Neurooncol Adv ; 4(1): vdac138, 2022.
Article in English | MEDLINE | ID: mdl-36105388

ABSTRACT

Background: Reliable detection and precise volumetric quantification of brain metastases (BM) on MRI are essential for guiding treatment decisions. Here we evaluate the potential of artificial neural networks (ANN) for automated detection and quantification of BM. Methods: A consecutive series of 308 patients with BM was used for developing an ANN (with a 4:1 split for training/testing) for automated volumetric assessment of contrast-enhancing tumors (CE) and non-enhancing FLAIR signal abnormality including edema (NEE). An independent consecutive series of 30 patients was used for external testing. Performance was assessed case-wise for CE and NEE and lesion-wise for CE using the case-wise/lesion-wise DICE-coefficient (C/L-DICE), positive predictive value (L-PPV) and sensitivity (C/L-Sensitivity). Results: The performance of detecting CE lesions on the validation dataset was not significantly affected when evaluating different volumetric thresholds (0.001-0.2 cm3; P = .2028). The median L-DICE and median C-DICE for CE lesions were 0.78 (IQR = 0.6-0.91) and 0.90 (IQR = 0.85-0.94) in the institutional as well as 0.79 (IQR = 0.67-0.82) and 0.84 (IQR = 0.76-0.89) in the external test dataset. The corresponding median L-Sensitivity and median L-PPV were 0.81 (IQR = 0.63-0.92) and 0.79 (IQR = 0.63-0.93) in the institutional test dataset, as compared to 0.85 (IQR = 0.76-0.94) and 0.76 (IQR = 0.68-0.88) in the external test dataset. The median C-DICE for NEE was 0.96 (IQR = 0.92-0.97) in the institutional test dataset as compared to 0.85 (IQR = 0.72-0.91) in the external test dataset. Conclusion: The developed ANN-based algorithm (publicly available at www.github.com/NeuroAI-HD/HD-BM) allows reliable detection and precise volumetric quantification of CE and NEE compartments in patients with BM.

11.
Med Image Anal ; 82: 102605, 2022 11.
Article in English | MEDLINE | ID: mdl-36156419

ABSTRACT

Artificial intelligence (AI) methods for the automatic detection and quantification of COVID-19 lesions in chest computed tomography (CT) might play an important role in the monitoring and management of the disease. We organized an international challenge and competition for the development and comparison of AI algorithms for this task, which we supported with public data and state-of-the-art benchmark methods. Board-certified radiologists annotated 295 public images from two sources (A and B) for algorithm training (n=199, source A), validation (n=50, source A) and testing (n=23, source A; n=23, source B). There were 1,096 registered teams, of which 225 and 98 completed the validation and testing phases, respectively. The challenge showed that AI models could be rapidly designed by diverse teams with the potential to measure disease or facilitate timely and patient-specific interventions. This paper provides an overview and the major outcomes of the COVID-19 Lung CT Lesion Segmentation Challenge - 2020.


Subject(s)
COVID-19 , Pandemics , Humans , COVID-19/diagnostic imaging , Artificial Intelligence , Tomography, X-Ray Computed/methods , Lung/diagnostic imaging
12.
Biomed Opt Express ; 13(3): 1224-1242, 2022 Mar 01.
Article in English | MEDLINE | ID: mdl-35414995

ABSTRACT

Multispectral imaging provides valuable information on tissue composition, such as hemoglobin oxygen saturation. However, the real-time application of this technique in interventional medicine can be challenging due to the long acquisition times needed for large amounts of hyperspectral data with hundreds of bands. While this challenge can partially be addressed by choosing a discriminative subset of bands, the band selection methods proposed to date are mainly restricted by the availability of often hard-to-obtain reference measurements. We address this bottleneck with a new approach to band selection that leverages highly accurate Monte Carlo (MC) simulations. We hypothesize that a small subset of bands chosen in this way can reproduce or even improve upon the results of a quasi-continuous spectral measurement. We further investigate whether novel domain adaptation techniques can address the inevitable domain shift stemming from the use of simulations. Initial results based on in silico and in vivo experiments suggest that 10-20 bands are sufficient to closely reproduce results from spectral measurements with 101 bands in the 500-700 nm range. The investigated domain adaptation technique, which only requires unlabeled in vivo measurements, yielded better results than the pure in silico band selection method. Overall, our method could guide the development of fast multispectral imaging systems suited for interventional use without relying on complex hardware setups or manually labeled data.

13.
Lancet Digit Health ; 3(12): e784-e794, 2021 12.
Article in English | MEDLINE | ID: mdl-34688602

ABSTRACT

BACKGROUND: Gadolinium-based contrast agents (GBCAs) are widely used to enhance tissue contrast during MRI scans and play a crucial role in the management of patients with cancer. However, studies have shown gadolinium deposition in the brain after repeated GBCA administration with yet unknown clinical significance. We aimed to assess the feasibility and diagnostic value of synthetic post-contrast T1-weighted MRI generated from pre-contrast MRI sequences through deep convolutional neural networks (dCNN) for tumour response assessment in neuro-oncology. METHODS: In this multicentre, retrospective cohort study, we used MRI examinations to train and validate a dCNN for synthesising post-contrast T1-weighted sequences from pre-contrast T1-weighted, T2-weighted, and fluid-attenuated inversion recovery sequences. We used MRI scans with availability of these sequences from 775 patients with glioblastoma treated at Heidelberg University Hospital, Heidelberg, Germany (775 MRI examinations); 260 patients who participated in the phase 2 CORE trial (1083 MRI examinations, 59 institutions); and 505 patients who participated in the phase 3 CENTRIC trial (3147 MRI examinations, 149 institutions). Separate training runs to rank the importance of individual sequences and (for a subset) diffusion-weighted imaging were conducted. Independent testing was performed on MRI data from the phase 2 and phase 3 EORTC-26101 trial (521 patients, 1924 MRI examinations, 32 institutions). The similarity between synthetic and true contrast enhancement on post-contrast T1-weighted MRI was quantified using the structural similarity index measure (SSIM). Automated tumour segmentation and volumetric tumour response assessment based on synthetic versus true post-contrast T1-weighted sequences was performed in the EORTC-26101 trial and agreement was assessed with Kaplan-Meier plots. 
FINDINGS: The median SSIM score for predicting contrast enhancement on synthetic post-contrast T1-weighted sequences in the EORTC-26101 test set was 0·818 (95% CI 0·817-0·820). Segmentation of the contrast-enhancing tumour from synthetic post-contrast T1-weighted sequences yielded a median tumour volume of 6·31 cm3 (5·60 to 7·14), thereby underestimating the true tumour volume by a median of -0·48 cm3 (-0·37 to -0·76) with the concordance correlation coefficient suggesting a strong linear association between tumour volumes derived from synthetic versus true post-contrast T1-weighted sequences (0·782, 0·751-0·807, p<0·0001). Volumetric tumour response assessment in the EORTC-26101 trial showed a median time to progression of 4·2 months (95% CI 4·1-5·2) with synthetic post-contrast T1-weighted and 4·3 months (4·1-5·5) with true post-contrast T1-weighted sequences (p=0·33). The strength of the association between the time to progression as a surrogate endpoint for predicting the patients' overall survival in the EORTC-26101 cohort was similar when derived from synthetic post-contrast T1-weighted sequences (hazard ratio of 1·749, 95% CI 1·282-2·387, p=0·0004) and model C-index (0·667, 0·622-0·708) versus true post-contrast T1-weighted MRI (1·799, 95% CI 1·314-2·464, p=0·0003) and model C-index (0·673, 95% CI 0·626-0·711). INTERPRETATION: Generating synthetic post-contrast T1-weighted MRI from pre-contrast MRI using dCNN is feasible and quantification of the contrast-enhancing tumour burden from synthetic post-contrast T1-weighted MRI allows assessment of the patient's response to treatment with no significant difference by comparison with true post-contrast T1-weighted sequences with administration of GBCAs. This finding could guide the application of dCNN in radiology to potentially reduce the necessity of GBCA administration. FUNDING: Deutsche Forschungsgemeinschaft.
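The concordance correlation coefficient used above to relate tumour volumes from synthetic and true post-contrast sequences is Lin's CCC, which penalises both poor correlation and systematic offset between the two series. A pure-Python sketch (illustrative, not the study's analysis code):

```python
import statistics

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two paired series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.pvariance(x), statistics.pvariance(y)
    cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike plain Pearson correlation, a constant bias between the series (e.g. systematic volume underestimation) lowers the CCC through the `(mx - my) ** 2` term.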


Subject(s)
Brain Neoplasms/diagnosis , Brain/pathology , Contrast Media/administration & dosage , Deep Learning , Gadolinium/administration & dosage , Magnetic Resonance Imaging/methods , Neural Networks, Computer , Algorithms , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Diffusion Magnetic Resonance Imaging , Disease Progression , Feasibility Studies , Germany , Glioblastoma/diagnosis , Glioblastoma/diagnostic imaging , Humans , Middle Aged , Neoplasms , Prognosis , Radiology/methods , Retrospective Studies , Tumor Burden
14.
Res Sq ; 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-34100010

ABSTRACT

Artificial intelligence (AI) methods for the automatic detection and quantification of COVID-19 lesions in chest computed tomography (CT) might play an important role in the monitoring and management of the disease. We organized an international challenge and competition for the development and comparison of AI algorithms for this task, which we supported with public data and state-of-the-art benchmark methods. Board-certified radiologists annotated 295 public images from two sources (A and B) for algorithm training (n=199, source A), validation (n=50, source A) and testing (n=23, source A; n=23, source B). There were 1,096 registered teams, of which 225 and 98 completed the validation and testing phases, respectively. The challenge showed that AI models could be rapidly designed by diverse teams with the potential to measure disease or facilitate timely and patient-specific interventions. This paper provides an overview and the major outcomes of the COVID-19 Lung CT Lesion Segmentation Challenge - 2020.

15.
Neuroimage ; 238: 118216, 2021 09.
Article in English | MEDLINE | ID: mdl-34052465

ABSTRACT

Accurate detection and quantification of unruptured intracranial aneurysms (UIAs) is important for rupture risk assessment and to allow an informed treatment decision to be made. Currently, 2D manual measures used to assess UIAs on Time-of-Flight magnetic resonance angiographies (TOF-MRAs) lack 3D information, and there is substantial inter-observer variability for both aneurysm detection and assessment of aneurysm size and growth. 3D measures could help improve aneurysm detection and quantification but are time-consuming and would therefore benefit from a reliable automatic UIA detection and segmentation method. The Aneurysm Detection and segMentation (ADAM) challenge was organised, in which methods for automatic UIA detection and segmentation were developed and submitted for evaluation on a diverse clinical TOF-MRA dataset. A training set (113 cases with a total of 129 UIAs) was released, each case including a TOF-MRA, a structural MR image (T1, T2 or FLAIR), annotation of any present UIA(s) and the centre voxel of the UIA(s). A test set of 141 cases (with 153 UIAs) was used for evaluation. Two tasks were proposed: (1) detection and (2) segmentation of UIAs on TOF-MRAs. Teams developed and submitted containerised methods to be evaluated on the test set. Task 1 was evaluated using sensitivity and false positive count. Task 2 was evaluated using the Dice similarity coefficient, the modified Hausdorff distance (95th percentile) and volumetric similarity. For each task, a ranking was made based on the average of the metrics. In total, eleven teams participated in task 1 and nine of those teams participated in task 2. Task 1 was won by a method specifically designed for the detection task (i.e. not participating in task 2). Based on segmentation metrics, the top two methods for task 2 performed statistically significantly better than all other methods.
The detection performance of the top-ranking methods was comparable to visual inspection for larger aneurysms. Segmentation performance of the top-ranking method, after selection of true UIAs, was similar to inter-observer performance. The ADAM challenge remains open for new and improved submissions, with a live leaderboard providing benchmarking for method development at https://adam.isi.uu.nl/.
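Two of the Task 2 metrics, the Dice similarity coefficient and volumetric similarity, reduce to simple binary-mask arithmetic. A minimal sketch under the standard definitions (function names are ours):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0

def volumetric_similarity(a, b):
    """1 - |V_a - V_b| / (V_a + V_b): volume agreement, ignoring overlap."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    va, vb = int(a.sum()), int(b.sum())
    return 1.0 - abs(va - vb) / (va + vb) if (va + vb) else 1.0
```

Note that volumetric similarity can be perfect for non-overlapping masks of equal size, which is why the challenge combines it with overlap- and boundary-based metrics.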


Subject(s)
Cerebral Angiography/methods , Intracranial Aneurysm/diagnostic imaging , Magnetic Resonance Angiography/methods , Datasets as Topic , Educational Measurement , Humans , Magnetic Resonance Imaging , Random Allocation , Risk Assessment
16.
Invest Radiol ; 56(12): 799-808, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34049336

ABSTRACT

BACKGROUND: The potential of deep learning to support radiologist prostate magnetic resonance imaging (MRI) interpretation has been demonstrated. PURPOSE: The aim of this study was to evaluate the effects of increased and diversified training data (TD) on deep learning performance for detection and segmentation of clinically significant prostate cancer-suspicious lesions. MATERIALS AND METHODS: In this retrospective study, biparametric (T2-weighted and diffusion-weighted) prostate MRI acquired with multiple 1.5-T and 3.0-T MRI scanners in consecutive men was used for training and testing of prostate segmentation and lesion detection networks. Ground truth was the combination of targeted and extended systematic MRI-transrectal ultrasound fusion biopsies, with significant prostate cancer defined as International Society of Urological Pathology grade group greater than or equal to 2. U-Nets were internally validated on full, reduced, and PROSTATEx-enhanced training sets and subsequently externally validated on the institutional test set and the PROSTATEx test set. U-Net segmentation was calibrated to clinically desired levels in cross-validation, and test performance was subsequently compared using sensitivities, specificities, predictive values, and Dice coefficient. RESULTS: One thousand four hundred eighty-eight institutional examinations (median age, 64 years; interquartile range, 58-70 years) were temporally split into training (2014-2017, 806 examinations, supplemented by 204 PROSTATEx examinations) and test (2018-2020, 682 examinations) sets. In the test set, Prostate Imaging-Reporting and Data System (PI-RADS) cutoffs greater than or equal to 3 and greater than or equal to 4 on a per-patient basis had sensitivity of 97% (241/249) and 90% (223/249) at specificity of 19% (82/433) and 56% (242/433), respectively. 
The full U-Net had corresponding sensitivity of 97% (241/249) and 88% (219/249) with specificity of 20% (86/433) and 59% (254/433), not statistically different from PI-RADS (P > 0.3 for all comparisons). U-Net trained using a reduced set of 171 consecutive examinations achieved inferior performance (P < 0.001). PROSTATEx training enhancement did not improve performance. Dice coefficients were 0.90 for prostate and 0.42/0.53 for MRI lesion segmentation at PI-RADS category 3/4 equivalents. CONCLUSIONS: In a large institutional test set, U-Net confirms similar performance to clinical PI-RADS assessment and benefits from more TD, with neither institutional nor PROSTATEx performance improved by adding multiscanner or bi-institutional TD.
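The per-patient sensitivities and specificities above follow directly from confusion counts. A minimal sketch reproducing the PI-RADS ≥3 operating point (counts taken from the abstract; the function name is ours):

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# PI-RADS >= 3 per-patient operating point from the abstract:
# 241 of 249 significant cancers detected, 82 of 433 negatives ruled out.
sens, spec = sens_spec(tp=241, fn=8, tn=82, fp=351)
```

This makes the trade-off in the abstract explicit: the lenient cutoff buys near-complete sensitivity at the cost of very low specificity.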


Subject(s)
Deep Learning , Prostatic Neoplasms , Humans , Magnetic Resonance Imaging , Magnetic Resonance Spectroscopy , Male , Middle Aged , Prostate/diagnostic imaging , Prostatic Neoplasms/diagnostic imaging , Retrospective Studies
17.
Eur Neuropsychopharmacol ; 50: 64-74, 2021 09.
Article in English | MEDLINE | ID: mdl-33984810

ABSTRACT

The specific role of white matter (WM) microstructure in parkinsonism among patients with schizophrenia spectrum disorders (SSD) is largely unknown. To determine whether topographical alterations of WM microstructure contribute to parkinsonism in SSD patients, we examined healthy controls (HC, n=16) and SSD patients with and without parkinsonism, as defined by a Simpson-Angus Scale total score of ≥4 (SSD-P, n=33) or <4 (SSD-nonP, n=62). We used whole brain tract-based spatial statistics (TBSS), tractometry (along-tract statistics using TractSeg) and graph analytics (clustering coefficient [CCO], local betweenness centrality [BC]) to provide a framework of specific WM microstructural changes underlying parkinsonism in SSD. Using these methods, post hoc analyses showed (a) decreased fractional anisotropy (FA), as measured via tractometry, in the corpus callosum, corticospinal tract and striato-fronto-orbital tract, and (b) increased CCO, as derived by graph analytics, in the left orbitofrontal cortex (OFC) and left superior frontal gyrus (SFG), in SSD-P patients when compared to SSD-nonP patients. Increased CCO in the left OFC and SFG was associated with SAS scores. These findings indicate the prominence of OFC alterations and aberrant connectivity with fronto-parietal regions and striatum in the pathogenesis of parkinsonism in SSD. This study further supports the notion of altered "bottom-up modulation" between basal ganglia and fronto-parietal regions in the pathobiology of parkinsonism, which may reflect an interaction between movement disorder intrinsic to SSD and antipsychotic drug-induced sensorimotor dysfunction.
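The local clustering coefficient used in the graph analytics above measures, for each region, how interconnected its neighbours are. A minimal pure-Python sketch on a toy adjacency structure (region names and edges are illustrative assumptions, not the study's connectome):

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

# Toy unweighted adjacency of brain regions (illustrative only).
adj = {
    "OFC": {"SFG", "Striatum"},
    "SFG": {"OFC", "Striatum", "Parietal"},
    "Striatum": {"OFC", "SFG"},
    "Parietal": {"SFG"},
}
```

A higher value for a region such as the OFC means its neighbours form a tighter local cluster, which is the kind of increased CCO the study reports in SSD-P patients.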


Subject(s)
Parkinsonian Disorders , Schizophrenia , White Matter , Anisotropy , Brain , Gray Matter/pathology , Humans , Parkinsonian Disorders/complications , Parkinsonian Disorders/diagnostic imaging , Parkinsonian Disorders/pathology , Schizophrenia/complications , White Matter/diagnostic imaging , White Matter/pathology
18.
Respiration ; 100(7): 580-587, 2021.
Article in English | MEDLINE | ID: mdl-33857945

ABSTRACT

OBJECTIVE: Evaluation of software tools for segmentation, quantification, and characterization of fibrotic pulmonary parenchyma changes will strengthen the role of CT as a biomarker of disease extent, evolution, and response to therapy in idiopathic pulmonary fibrosis (IPF) patients. METHODS: 418 nonenhanced thin-section MDCTs of 127 IPF patients and 78 MDCTs of 78 healthy individuals were analyzed with 3 fully automated, completely different software tools: YACTA, LUFIT, and IMBIO. The agreement between YACTA and LUFIT on segmented lung volume and on the 80th (reflecting fibrosis) and 40th (reflecting ground-glass opacity) percentiles of the lung density histogram was analyzed using Bland-Altman plots. The fibrosis and ground-glass opacity segmented by IMBIO (a lung texture analysis software tool) were included in specific regression analyses. RESULTS: In the IPF group, LUFIT outperformed YACTA by segmenting more lung volume (mean difference 242 mL, 95% limits of agreement -54 to 539 mL), as well as quantifying higher 80th (76 HU, -6 to 158 HU) and 40th percentiles (9 HU, -73 to 90 HU). No relevant differences were revealed in the control group. The 80th/40th percentile as quantified by LUFIT correlated positively with the percentage of fibrosis/ground-glass opacity calculated by IMBIO (r = 0.78/r = 0.92). CONCLUSIONS: In terms of segmentation of pulmonary fibrosis, LUFIT, a shape-model-based segmentation tool, is superior to the threshold-based tool YACTA, since the density of (severe) fibrosis is similar to that of the surrounding soft tissues. Shape modeling as used in LUFIT may therefore serve as a valid tool in the quantification of IPF, which mainly affects the subpleural space.
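The Bland-Altman agreement analysis used to compare YACTA and LUFIT reduces to the mean difference and 95% limits of agreement of paired measurements. A minimal sketch (function name and toy values are ours):

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement between two paired measurement series."""
    d = np.asarray(a, float) - np.asarray(b, float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

The abstract's "mean difference 242 mL, 95% limits of agreement -54 to 539 mL" is exactly this triple computed over the per-scan LUFIT-minus-YACTA volume differences.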


Subject(s)
Algorithms , Idiopathic Pulmonary Fibrosis/pathology , Lung/pathology , Software , Aged , Case-Control Studies , Diagnosis, Computer-Assisted , Female , Humans , Idiopathic Pulmonary Fibrosis/diagnostic imaging , Linear Models , Lung/diagnostic imaging , Lung Volume Measurements , Male , Middle Aged , Models, Biological , Tomography, X-Ray Computed
19.
Med Image Anal ; 70: 101920, 2021 05.
Article in English | MEDLINE | ID: mdl-33676097

ABSTRACT

Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer- and robot-assisted interventions. While numerous methods for detecting, segmenting and tracking medical instruments based on endoscopic video images have been proposed in the literature, key limitations remain to be addressed. Firstly, robustness: state-of-the-art methods should perform reliably on challenging images (e.g. in the presence of blood, smoke or motion artifacts). Secondly, generalization: algorithms trained for a specific intervention in a specific hospital should generalize to other interventions or institutions. In an effort to promote solutions for these limitations, we organized the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge as an international benchmarking competition with a specific focus on the robustness and generalization capabilities of algorithms. For the first time in the field of endoscopic image processing, our challenge included a task on binary segmentation and also addressed multi-instance detection and segmentation. The challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures from three different types of surgery. The validation of the competing methods for the three tasks (binary segmentation, multi-instance detection and multi-instance segmentation) was performed in three different stages with an increasing domain gap between the training and the test data. The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap. While the average detection and segmentation quality of the best-performing algorithms is high, future research should concentrate on the detection and segmentation of small, crossing, moving and transparent instruments and instrument parts.
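Multi-instance detection, as evaluated in the challenge, requires matching predicted instances to ground-truth instances before any per-instance score can be computed. A minimal greedy IoU-based matching sketch (function names, the 0.3 threshold and the greedy strategy are illustrative assumptions, not the challenge's official protocol):

```python
def iou(a, b):
    """Intersection over union of two masks given as sets of pixel indices."""
    return len(a & b) / len(a | b) if a | b else 1.0

def match_instances(preds, gts, thr=0.3):
    """Greedily match each predicted instance to an unused ground-truth one."""
    matches, used = [], set()
    for i, p in enumerate(preds):
        best, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j in used:
                continue
            score = iou(p, g)
            if score > best:
                best, best_j = score, j
        if best_j is not None and best >= thr:
            matches.append((i, best_j, best))
            used.add(best_j)
    return matches
```

Unmatched predictions then count as false positives and unmatched ground-truth instances as false negatives, from which detection metrics follow.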


Subject(s)
Image Processing, Computer-Assisted , Laparoscopy , Algorithms , Artifacts
20.
JMIR Med Inform ; 9(2): e22795, 2021 Feb 03.
Article in English | MEDLINE | ID: mdl-33533728

ABSTRACT

BACKGROUND: Natural Language Understanding enables automatic extraction of relevant information from clinical text data, which are acquired every day in hospitals. In 2018, the language model Bidirectional Encoder Representations from Transformers (BERT) was introduced, generating new state-of-the-art results on several downstream tasks. The National NLP Clinical Challenges (n2c2) is an initiative that strives to tackle such downstream tasks on domain-specific clinical data. In this paper, we present the results of our participation in the 2019 n2c2 and related work completed thereafter. OBJECTIVE: The objective of this study was to optimally leverage BERT for the task of assessing the semantic textual similarity of clinical text data. METHODS: We used BERT as an initial baseline and analyzed the results, which we used as a starting point to develop 3 different approaches where we (1) added additional, handcrafted sentence similarity features to the classifier token of BERT and combined the results with more features in multiple regression estimators, (2) incorporated a built-in ensembling method, M-Heads, into BERT by duplicating the regression head and applying an adapted training strategy to facilitate the focus of the heads on different input patterns of the medical sentences, and (3) developed a graph-based similarity approach for medications, which allows extrapolating similarities across known entities from the training set. The approaches were evaluated with the Pearson correlation coefficient between the predicted scores and ground truth of the official training and test dataset. RESULTS: We improved the performance of BERT on the test dataset from a Pearson correlation coefficient of 0.859 to 0.883 using a combination of the M-Heads method and the graph-based similarity approach. We also show differences between the test and training dataset and how the two datasets influenced the results. 
CONCLUSIONS: We found that a graph-based similarity approach has the potential to extrapolate domain-specific knowledge to unseen sentences. We also observed that deceptively good results are easy to obtain on the test dataset, especially when the distribution of the data samples differs between the training and test datasets.
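The Pearson correlation coefficient used as the evaluation metric above can be computed directly from predicted and gold similarity scores. A minimal sketch (function name is ours):

```python
import math

def pearson(x, y):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Values near 1 indicate that the system ranks sentence pairs almost exactly as the annotators did, which is how the reported improvement from 0.859 to 0.883 should be read.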
