Results 1 - 17 of 17
1.
Radiol Artif Intell ; 5(5): e230024, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37795137

ABSTRACT

Purpose: To present a deep learning segmentation model that can automatically and robustly segment all major anatomic structures on body CT images. Materials and Methods: In this retrospective study, 1204 CT examinations (from 2012, 2016, and 2020) were used to segment 104 anatomic structures (27 organs, 59 bones, 10 muscles, and eight vessels) relevant for use cases such as organ volumetry, disease characterization, and surgical or radiation therapy planning. The CT images were randomly sampled from routine clinical studies and thus represent a real-world dataset (different ages, abnormalities, scanners, body parts, sequences, and sites). The authors trained an nnU-Net segmentation algorithm on this dataset and calculated Dice similarity coefficients to evaluate the model's performance. The trained algorithm was applied to a second dataset of 4004 whole-body CT examinations to investigate age-dependent volume and attenuation changes. Results: The proposed model showed a high Dice score (0.943) on the test set, which included a wide range of clinical data with major abnormalities. The model significantly outperformed another publicly available segmentation model on a separate dataset (Dice score, 0.932 vs 0.871; P < .001). The aging study demonstrated significant correlations between age and volume and mean attenuation for a variety of organ groups (eg, age and aortic volume [rs = 0.64; P < .001]; age and mean attenuation of the autochthonous dorsal musculature [rs = -0.74; P < .001]). Conclusion: The developed model enables robust and accurate segmentation of 104 anatomic structures. The annotated dataset (https://doi.org/10.5281/zenodo.6802613) and toolkit (https://www.github.com/wasserth/TotalSegmentator) are publicly available. Keywords: CT, Segmentation, Neural Networks. Supplemental material is available for this article. © RSNA, 2023. See also commentary by Sebro and Mongan in this issue.
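The Dice similarity coefficient used above to evaluate segmentation overlap can be computed from two binary masks as follows. This is a minimal pure-Python sketch of the metric itself, not the nnU-Net evaluation code:

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary segmentation masks.

    pred, truth: iterables of 0/1 voxel labels (flattened masks).
    Returns 1.0 for two empty masks by convention.
    """
    pred = list(pred)
    truth = list(truth)
    # voxels labelled foreground in both masks
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

# toy example: 3 overlapping voxels, 4 predicted and 4 true foreground voxels
print(dice_coefficient([1, 1, 1, 1, 0], [0, 1, 1, 1, 1]))  # 0.75
```

A Dice score of 0.943, as reported for the test set, thus means that on average almost 95% of predicted and reference foreground voxels coincide.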

2.
Eur J Radiol ; 168: 111093, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37716024

ABSTRACT

PURPOSE/OBJECTIVE: Reliable detection of thoracic aortic dilatation (TAD) is mandatory in clinical routine. For ECG-gated CT angiography, automated deep learning (DL) algorithms are established for diameter measurements according to current guidelines. For non-ECG-gated CT (contrast enhanced (CE) and non-CE), however, only a few reports are available. In these reports, classification as TAD is frequently unreliable, with variable result quality depending on anatomic location, the aortic root presenting the worst results. Therefore, this study aimed to explore the impact of re-training on a previously evaluated DL tool for aortic measurements in a cohort of non-ECG-gated exams. METHODS & MATERIALS: A cohort of 995 patients (68 ± 12 years) with CE (n = 392) and non-CE (n = 603) chest CT exams, all classified as TAD by the initial DL tool, was selected. The re-trained version featured improved robustness of centerline fitting and cross-sectional plane placement. All cases were processed by the re-trained DL tool version. DL results were evaluated by a radiologist regarding plane placement and diameter measurements. Measurements at each location were classified as correct, whereas false measurements consisted of over- or underestimation of diameters. RESULTS: We evaluated 8948 measurements in 995 exams. The re-trained version performed 8539/8948 (95.5%) of diameter measurements correctly. 3765/8948 (42.1%) of measurements were correct in both versions, initial and re-trained DL tool (best: distal arch, 655/995 (66%); worst: aortic sinus (AS), 221/995 (22%)). In contrast, 4456/8948 (49.8%) measurements were correctly measured only by the re-trained version, in particular at the aortic root (AS: 564/995 (57%); sinotubular junction: 697/995 (70%)). In addition, the re-trained version performed 318 (3.6%) measurements which were not available previously. A total of 228 (2.5%) cases showed false measurements because of tilted planes and 181 (2.0%) over-/under-segmentations, with a focus at the AS (n = 137 (14%) and n = 73 (7%), respectively). CONCLUSION: Re-training of the DL tool improved diameter assessment, resulting in a total of 95.5% correct measurements. Our data suggest that the re-trained DL tool can be applied even in non-ECG-gated chest CT, including both CE and non-CE exams.


Subject(s)
Deep Learning , Humans , Cross-Sectional Studies , Tomography, X-Ray Computed/methods , Aorta , Algorithms
3.
Eur Heart J Cardiovasc Imaging ; 24(8): 1062-1071, 2023 07 24.
Article in English | MEDLINE | ID: mdl-36662127

ABSTRACT

AIMS: Pulmonary transit time (PTT) is the time blood takes to pass from the right ventricle to the left ventricle via the pulmonary circulation. We aimed to quantify PTT in routine cardiovascular magnetic resonance imaging perfusion sequences. PTT may help in the diagnostic assessment and characterization of patients with unclear dyspnoea or heart failure (HF). METHODS AND RESULTS: We evaluated routine stress perfusion cardiovascular magnetic resonance scans in 352 patients, including an assessment of PTT. Eighty-six of these patients also had simultaneous quantification of N-terminal pro-brain natriuretic peptide (NT-proBNP). NT-proBNP is an established blood biomarker for quantifying ventricular filling pressure in patients with presumed HF. Manually assessed PTT demonstrated low inter-rater variability, with a correlation between raters >0.98. PTT was obtained automatically and correctly in 266 patients using artificial intelligence. The median PTT of 182 patients with both left and right ventricular ejection fraction >50% amounted to 6.8 s (5.9-7.9 s). PTT was significantly higher in patients with reduced left ventricular ejection fraction (<40%; P < 0.001) and right ventricular ejection fraction (<40%; P < 0.0001). The area under the receiver operating characteristic curve (AUC) of PTT for exclusion of HF (NT-proBNP <125 ng/L) was 0.73 (P < 0.001), with a specificity of 77% and sensitivity of 70%. The AUC of PTT for the inclusion of HF (NT-proBNP >600 ng/L) was 0.70 (P < 0.001), with a specificity of 78% and sensitivity of 61%. CONCLUSION: PTT, an easily and even automatically obtainable, robust, non-invasive biomarker of haemodynamics, might help in the evaluation of patients with dyspnoea and HF.
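AUC values like those quoted for PTT can be computed directly from raw biomarker values via the rank interpretation of the AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal sketch; the PTT values below are purely illustrative, not study data:

```python
def roc_auc(positives, negatives):
    """AUC as P(score_pos > score_neg), ties counted as 0.5.

    Equivalent to the Mann-Whitney U statistic normalised by
    the number of positive-negative pairs.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# hypothetical PTT values (seconds) for patients with vs without heart failure
hf = [8.1, 9.4, 7.2, 10.0]
no_hf = [6.5, 7.0, 6.1, 7.5]
print(roc_auc(hf, no_hf))  # 0.9375
```

An AUC of 0.5 corresponds to chance-level discrimination, 1.0 to perfect separation of the two groups.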


Subject(s)
Artificial Intelligence , Heart Failure , Humans , Stroke Volume , Ventricular Function, Left , Ventricular Function, Right , Natriuretic Peptide, Brain , Biomarkers , Hemodynamics , Dyspnea , Peptide Fragments , Magnetic Resonance Spectroscopy
4.
Acad Radiol ; 30(4): 727-736, 2023 04.
Article in English | MEDLINE | ID: mdl-35691879

ABSTRACT

RATIONALE AND OBJECTIVES: To assess the effects of a change from free text reporting to structured reporting on resident reports, the proofreading workload and report turnaround times in the neuroradiology daily routine. MATERIALS AND METHODS: Our neuroradiology section introduced structured reporting templates in July 2019. Reports dictated by residents during dayshifts from January 2019 to March 2020 were retrospectively assessed using quantitative parameters from report comparison. Through automatic analysis of text-string differences between report states (i.e. draft, preliminary and final report), Jaccard similarities and edit distances of reports following read-out sessions as well as after report sign-off were calculated. Furthermore, turnaround times until preliminary and final report availability to clinicians were investigated. Parameters were visualized as trending line graphs and statistically compared between reporting standards. RESULTS: Three thousand five hundred thirty-eight reports were included into analysis. Mean Jaccard similarity of resident drafts and staff-reviewed final reports increased from 0.53 ± 0.37 to 0.79 ± 0.22 after the introduction of structured reporting (p < .001). Both mean overall edits on draft reports by residents following read-out sessions (0.30 ± 0.45 vs. 0.09 ± 0.29; p < .001) and by staff radiologists during report sign-off (0.17 ± 0.28 vs. 0.12 ± 0.23, p < .001) decreased. With structured reporting, mean turnaround time until preliminary report availability to clinicians decreased by 20.7 minutes (246.9 ± 207.0 vs. 226.2 ± 224.9; p < .001). Similarly, final reports were available 35.0 minutes faster on average (558.05 ± 15.1 vs. 523.0 ± 497.3; p = .002). CONCLUSION: Structured reporting is beneficial in the neuroradiology daily routine, as resident drafts require fewer edits in the report review process. This reduction in proofreading workload is likely responsible for lower report turnaround times.
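The Jaccard similarity used here to quantify proofreading edits can be sketched on word sets as follows. Note that the study compared report text strings; this word-set version is a simplification for illustration:

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the word sets of two report versions.

    1.0 means identical vocabulary (no edits), 0.0 no overlap at all.
    """
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

draft = "no acute intracranial hemorrhage"
final = "no acute intracranial hemorrhage or mass effect"
print(round(jaccard_similarity(draft, final), 2))  # 0.57
```

On this scale, the jump from 0.53 to 0.79 after introducing structured reporting means final reports shared far more of their wording with the resident drafts.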


Subject(s)
Radiology Information Systems , Workload , Humans , Retrospective Studies
5.
Front Cardiovasc Med ; 9: 972512, 2022.
Article in English | MEDLINE | ID: mdl-36072871

ABSTRACT

Purpose: Thoracic aortic (TA) dilatation (TAD) is a risk factor for acute aortic syndrome and must therefore be reported in every CT report. However, the complex anatomy of the thoracic aorta impedes TAD detection. We investigated the performance of a deep learning (DL) prototype as a secondary reading tool built to measure TA diameters in a large-scale cohort. Material and methods: Consecutive contrast-enhanced (CE) and non-CE chest CT exams with "normal" TA diameters according to their radiology reports were included. The DL-prototype (AIRad, Siemens Healthineers, Germany) measured the TA at nine locations according to AHA guidelines. Dilatation was defined as >45 mm at aortic sinus, sinotubular junction (STJ), ascending aorta (AA) and proximal arch and >40 mm from mid arch to abdominal aorta. A cardiovascular radiologist reviewed all cases with TAD according to AIRad. Multivariable logistic regression (MLR) was used to identify factors (demographics and scan parameters) associated with TAD classification by AIRad. Results: 18,243 CT scans (45.7% female) were successfully analyzed by AIRad. Mean age was 62.3 ± 15.9 years and 12,092 (66.3%) were CE scans. AIRad confirmed normal diameters in 17,239 exams (94.5%) and reported TAD in 1,004/18,243 exams (5.5%). Review confirmed TAD classification in 452/1,004 exams (45.0%, 2.5% total), 552 cases were false-positive but identification was easily possible using visual outputs by AIRad. MLR revealed that the following factors were significantly associated with correct TAD classification by AIRad: TAD reported at AA [odds ratio (OR): 1.12, p < 0.001] and STJ (OR: 1.09, p = 0.002), TAD found at >1 location (OR: 1.42, p = 0.008), in CE exams (OR: 2.1-3.1, p < 0.05), men (OR: 2.4, p = 0.003) and patients presenting with higher BMI (OR: 1.05, p = 0.01). Overall, 17,691/18,243 (97.0%) exams were correctly classified. 
Conclusions: AIRad correctly assessed the presence or absence of TAD in 17,691 exams (97%), including 452 cases with previously missed TAD independent from contrast protocol. These findings suggest its usefulness as a secondary reading tool by improving report quality and efficiency.
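The dilatation rule applied in this study (>45 mm for the proximal segments, >40 mm from the mid arch to the abdominal aorta) is simple enough to express directly. The location labels below are illustrative strings, not the AIRad prototype's API:

```python
# AHA measurement locations treated as "proximal" in the study's rule
PROXIMAL = {"aortic sinus", "sinotubular junction", "ascending aorta",
            "proximal arch"}

def is_dilated(location, diameter_mm):
    """Dilatation per the study's definition: >45 mm at proximal
    locations, >40 mm from the mid arch onward."""
    limit = 45.0 if location in PROXIMAL else 40.0
    return diameter_mm > limit

print(is_dilated("ascending aorta", 46.0))  # True
print(is_dilated("mid arch", 42.0))         # True
print(is_dilated("aortic sinus", 44.0))     # False
```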

6.
J Cardiovasc Magn Reson ; 23(1): 133, 2021 11 11.
Article in English | MEDLINE | ID: mdl-34758821

ABSTRACT

BACKGROUND: Artificial intelligence can assist in cardiac image interpretation. Here, we achieved a substantial reduction in the time required to read a cardiovascular magnetic resonance (CMR) study to estimate left atrial (LA) volume without compromising accuracy or reliability. Rather than deploying a fully automatic black box, we propose to incorporate the automated LA volumetry into a human-centric interactive image-analysis process. METHODS AND RESULTS: Atri-U, an automated data analysis pipeline for long-axis cardiac cine images, computes the atrial volume by: (i) detecting the end-systolic frame, (ii) outlining the endocardial borders of the LA, (iii) localizing the mitral annular hinge points and constructing the longitudinal atrial diameters, equivalent to the usual workup done by clinicians. In every step, human interaction is possible, such that the results provided by the algorithm can be accepted, corrected, or re-done from scratch. Atri-U was trained and evaluated retrospectively on a sample of 300 patients and then applied to a consecutive clinical sample of 150 patients with various heart conditions. The agreement of the indexed LA volume between Atri-U and two experts was similar to the inter-rater agreement between clinicians (average overestimation of 0.8 mL/m2, with upper and lower limits of agreement of -7.5 and 5.8 mL/m2, respectively). An expert cardiologist blinded to the origin of the annotations rated the outputs produced by Atri-U as acceptable in 97% of cases for step (i), 94% for step (ii) and 95% for step (iii), which was slightly lower than the acceptance rate of the outputs produced by a human expert radiologist in the same cases (92%, 100% and 100%, respectively). The assistance of Atri-U led to an expected reduction in reading time of 66% (from 105 to 34 s) in our in-house clinical setting. CONCLUSIONS: Our proposal enables automated calculation of the maximum LA volume approaching human accuracy and precision. The optional user interaction is possible at each processing step. As such, the assisted process sped up the routine CMR workflow by providing accurate, precise, and validated measurement results.


Subject(s)
Artificial Intelligence , Magnetic Resonance Imaging, Cine , Heart Atria/diagnostic imaging , Humans , Image Interpretation, Computer-Assisted , Magnetic Resonance Spectroscopy , Predictive Value of Tests , Reproducibility of Results , Retrospective Studies
7.
Quant Imaging Med Surg ; 11(10): 4245-4257, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34603980

ABSTRACT

BACKGROUND: Manually performed diameter measurements on ECG-gated CT-angiography (CTA) represent the gold standard for diagnosis of thoracic aortic dilatation. However, they are time-consuming and show high inter-reader variability. Therefore, we aimed to evaluate the accuracy of measurements of a deep learning (DL) algorithm in comparison to those of radiologists and evaluated measurement times (MT). METHODS: We retrospectively analyzed 405 ECG-gated CTA exams of 371 consecutive patients with suspected aortic dilatation between May 2010 and June 2019. The DL-algorithm prototype detected aortic landmarks (deep reinforcement learning) and segmented the lumen of the thoracic aorta (multi-layer convolutional neural network). It performed measurements according to AHA guidelines and created visual outputs. Manual measurements were performed by radiologists using the centerline technique. Human performance variability (HPV), MT and DL performance were analyzed in a research setting using a linear mixed model based on 21 randomly selected, repeatedly measured cases. DL-algorithm results were then evaluated in a clinical setting using matched differences. If the differences were within 5 mm for all locations, the case was regarded as coherent; if there was a discrepancy >5 mm in at least one location (incl. missing values), the case was completely reviewed. RESULTS: HPV ranged up to ±3.4 mm in repeated measurements under research conditions. In the clinical setting, 2,778/3,192 (87.0%) of the DL-algorithm's measurements were coherent. Mean differences of paired measurements between the DL-algorithm and radiologists at the aortic sinus and ascending aorta were -0.45 ± 5.52 and -0.02 ± 3.36 mm. Detailed analysis revealed that measurements at the aortic root were over-/underestimated due to a tilted measurement plane. In total, the calculated time saved by the DL-algorithm was 3:10 minutes/case. CONCLUSIONS: The DL-algorithm provided results coherent with the radiologists' at almost 90% of measurement locations, while the majority of discrepant cases were located at the aortic root. In summary, the DL-algorithm assisted radiologists in performing AHA-compliant measurements while saving 50% of the time per case.
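The coherence rule from the methods (all paired differences within 5 mm, with a missing value at any location counting as a discrepancy) can be sketched as follows; the dictionaries of per-location diameters are an illustrative data shape, not the prototype's output format:

```python
def case_is_coherent(dl_mm, reader_mm, tolerance=5.0):
    """A case is coherent only if DL and manual diameters agree within
    `tolerance` mm at every location; a value missing on either side
    counts as a discrepancy (per the study's review rule)."""
    for loc in set(dl_mm) | set(reader_mm):
        a, b = dl_mm.get(loc), reader_mm.get(loc)
        if a is None or b is None or abs(a - b) > tolerance:
            return False
    return True

dl = {"aortic sinus": 36.2, "ascending aorta": 41.0}
manual = {"aortic sinus": 39.0, "ascending aorta": 40.1}
print(case_is_coherent(dl, manual))  # True
```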

8.
Eur Radiol ; 31(9): 6816-6824, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33742228

ABSTRACT

OBJECTIVES: To evaluate the performance of a deep convolutional neural network (DCNN) in detecting and classifying distal radius fractures, metal, and cast on radiographs using labels based on radiology reports. The secondary aim was to evaluate the effect of the training set size on the algorithm's performance. METHODS: A total of 15,775 frontal and lateral radiographs, corresponding radiology reports, and a ResNet18 DCNN were used. Fracture detection and classification models were developed per view and merged. Incrementally sized subsets served to evaluate effects of the training set size. Two musculoskeletal radiologists set the standard of reference on radiographs (test set A). A subset (B) was rated by three radiology residents. For a per-study-based comparison with the radiology residents, the results of the best models were merged. Statistics used were ROC and AUC, Youden's J statistic (J), and Spearman's correlation coefficient (ρ). RESULTS: The models' AUC/J on (A) for metal and cast were 0.99/0.98 and 1.0/1.0. The models' and residents' AUC/J on (B) were similar on fracture (0.98/0.91; 0.98/0.92) and multiple fragments (0.85/0.58; 0.91/0.70). Training set size and AUC correlated on metal (ρ = 0.740), cast (ρ = 0.722), fracture (frontal ρ = 0.947, lateral ρ = 0.946), multiple fragments (frontal ρ = 0.856), and fragment displacement (frontal ρ = 0.595). CONCLUSIONS: The models trained on a DCNN with report-based labels to detect distal radius fractures on radiographs are suitable to aid as a secondary reading tool; models for fracture classification are not ready for clinical use. Bigger training sets lead to better models in all categories except joint affection. KEY POINTS: • Detection of metal and cast on radiographs is excellent using AI and labels extracted from radiology reports. • Automatic detection of distal radius fractures on radiographs is feasible and the performance approximates radiology residents. 
• Automatic classification of the type of distal radius fracture varies in accuracy and is inferior for joint involvement and fragment displacement.
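Youden's J statistic (J) used above condenses sensitivity and specificity into a single index. A minimal sketch with illustrative confusion counts, not the study's data:

```python
def youden_j(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1.

    J ranges from 0 (no better than chance) to 1 (perfect)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

# illustrative counts: 95/100 positives and 96/100 negatives correct
print(round(youden_j(tp=95, fn=5, tn=96, fp=4), 2))  # 0.91
```

A J of 1.0 for cast detection, as in test set A, means both sensitivity and specificity were perfect.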


Subject(s)
Radiology , Radius Fractures , Humans , Neural Networks, Computer , Radiography , Radiologists , Radius Fractures/diagnostic imaging
9.
Radiology ; 298(3): 632-639, 2021 03.
Article in English | MEDLINE | ID: mdl-33497316

ABSTRACT

Background Workloads in radiology departments have constantly increased over the past decades. The resulting radiologist fatigue is considered a rising problem that affects diagnostic accuracy. Purpose To investigate whether data mining of quantitative parameters from the report proofreading process can reveal daytime and shift-dependent trends in report similarity as a surrogate marker for resident fatigue. Materials and Methods Data from 117 402 radiology reports written by residents between September 2017 and March 2020 were extracted from a report comparison tool and retrospectively analyzed. Through calculation of the Jaccard similarity coefficient between residents' preliminary and staff-reviewed final reports, the amount of edits performed by staff radiologists during proofreading was quantified on a scale of 0 to 1 (1: perfect similarity, no edits). Following aggregation per weekday and shift, data were statistically analyzed by using simple linear regression or one-way analysis of variance (significance level, P < .05) to determine relationships between report similarity and time of day and/or weekday reports were dictated. Results Decreasing report similarity with increasing work hours was observed for day shifts (r = -0.93 [95% CI: -0.73, -0.98]; P < .001) and weekend shifts (r = -0.72 [95% CI: -0.31, -0.91]; P = .004). For day shifts, negative linear correlation was strongest on Fridays (r = -0.95 [95% CI: -0.80, -0.99]; P < .001), with a 16% lower mean report similarity at the end of shifts (0.85 ± 0.24 at 8 am vs 0.69 ± 0.32 at 5 pm). Furthermore, mean similarity of reports dictated on Fridays (0.79 ± 0.35) was lower than that on all other weekdays (range, 0.84 ± 0.30 to 0.86 ± 0.27; P < .001). For late shifts, report similarity showed a negative correlation with the course of workweeks, showing a continuous decrease from Monday to Friday (r = -0.98 [95% CI: -0.70, -0.99]; P = .007). 
Temporary increases in report similarity were observed after lunch breaks (day and weekend shifts) and with the arrival of a rested resident during overlapping on-call shifts. Conclusion Decreases in report similarity over the course of workdays and workweeks suggest aggravating effects of fatigue on residents' report writing performances. Periodic breaks within shifts potentially foster recovery. © RSNA, 2021.
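The negative correlations between work hours and report similarity are ordinary correlation coefficients over aggregated time series. A self-contained sketch of the Pearson coefficient on made-up hourly values (not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical mean report similarity per hour over a day shift
hours = [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
similarity = [0.85, 0.84, 0.82, 0.80, 0.78, 0.80, 0.76, 0.74, 0.71, 0.69]
print(round(pearson_r(hours, similarity), 2))
```

A strongly negative r, as printed here, mirrors the reported pattern: similarity falls as the shift progresses, with a small bump after the midday break.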


Subject(s)
Fatigue/epidemiology , Internship and Residency , Radiology/education , Workload , Adult , Data Mining , Female , Humans , Male
10.
Eur Radiol ; 31(4): 2115-2125, 2021 Apr.
Article in English | MEDLINE | ID: mdl-32997178

ABSTRACT

OBJECTIVES: To investigate the most common errors in residents' preliminary reports, if structured reporting impacts error types and frequencies, and to identify possible implications for resident education and patient safety. MATERIAL AND METHODS: Changes in report content were tracked by a report comparison tool on a word level and extracted for 78,625 radiology reports dictated from September 2017 to December 2018 in our department. Following data aggregation according to word stems and stratification by subspecialty (e.g., neuroradiology) and imaging modality, frequencies of additions/deletions were analyzed for findings and impression report section separately and compared between subgroups. RESULTS: Overall modifications per report averaged 4.1 words, with demonstrably higher amounts of changes for cross-sectional imaging (CT: 6.4; MRI: 6.7) than non-cross-sectional imaging (radiographs: 0.2; ultrasound: 2.8). The four most frequently changed words (right, left, one, and none) remained almost similar among all subgroups (range: 0.072-0.117 per report; once every 9-14 reports). Albeit representing only 0.02% of analyzed words, they accounted for up to 9.7% of all observed changes. Subspecialties solely using structured reporting had substantially lower change ratios in the findings report section (mean: 0.2 per report) compared with prose-style reporting subspecialties (mean: 2.0). Relative frequencies of the most changed words remained unchanged. CONCLUSION: Residents' most common reporting errors in all subspecialties and modalities are laterality discriminator confusions (left/right) and unnoticed descriptor misregistration by speech recognition (one/none). Structured reporting reduces overall error rates, but does not affect occurrence of the most common errors. Increased error awareness and measures improving report correctness and ensuring patient safety are required. 
KEY POINTS: • The two most common reporting errors in residents' preliminary reports are laterality discriminator confusions (left/right) and unnoticed descriptor misregistration by speech recognition (one/none). • Structured reporting reduces the overall error frequency in the findings report section by a factor of 10 (structured reporting: mean 0.2 per report; prose-style reporting: 2.0) but does not affect the occurrence of the two major errors. • Staff radiologist review behavior noticeably differs between radiology subspecialties.
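A word-level change tracker of the kind described (counting words added and deleted between report states) can be approximated with the standard library; the study's own report comparison tool is not public, so this is only a sketch of the idea:

```python
import difflib

def word_changes(draft, final):
    """Word-level additions and deletions between two report versions."""
    diff = list(difflib.ndiff(draft.split(), final.split()))
    added = [tok[2:] for tok in diff if tok.startswith("+ ")]
    removed = [tok[2:] for tok in diff if tok.startswith("- ")]
    return added, removed

# the classic laterality confusion described in the abstract
added, removed = word_changes("no fracture of the left radius",
                              "no fracture of the right radius")
print(added, removed)  # ['right'] ['left']
```

Aggregating such additions/deletions by word stem across thousands of reports is what surfaces high-frequency error words like left/right and one/none.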


Subject(s)
Radiology Information Systems , Radiology , Data Mining , Humans , Radiography , Research Report
11.
Eur J Radiol ; 131: 109233, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32927416

ABSTRACT

PURPOSE: During the emerging COVID-19 pandemic, radiology departments faced a substantial increase in chest CT admissions coupled with the novel demand for quantification of pulmonary opacities. This article describes how our clinic implemented an automated software solution for this purpose into an established software platform in 10 days. The underlying hypothesis was that modern academic centers in radiology are capable of developing and implementing such tools by their own efforts and fast enough to meet the rapidly increasing clinical needs in the wake of a pandemic. METHOD: Deep convolutional neural network algorithms for lung segmentation and opacity quantification on chest CTs were trained using semi-automatically and manually created ground-truth (Ntotal = 172). The performance of the in-house method was compared to an externally developed algorithm on a separate test subset (N = 66). RESULTS: The final algorithm was available at day 10 and achieved human-like performance (Dice coefficient = 0.97). For opacity quantification, a slight underestimation was seen both for the in-house (1.8 %) and for the external algorithm (0.9 %). In contrast to the external reference, the underestimation for the in-house algorithm showed no dependency on total opacity load, making it more suitable for follow-up. CONCLUSIONS: The combination of machine learning and a clinically embedded software development platform enabled time-efficient development, instant deployment, and rapid adoption in clinical routine. The algorithm for fully automated lung segmentation and opacity quantification that we developed in the midst of the COVID-19 pandemic was ready for clinical use within just 10 days and achieved human-level performance even in complex cases.


Subject(s)
Betacoronavirus , Coronavirus Infections/diagnostic imaging , Machine Learning , Pneumonia, Viral/diagnostic imaging , Software , COVID-19 , Humans , Neural Networks, Computer , Pandemics , SARS-CoV-2 , Tomography, X-Ray Computed/methods
12.
Korean J Radiol ; 21(7): 891-899, 2020 07.
Article in English | MEDLINE | ID: mdl-32524789

ABSTRACT

OBJECTIVE: To assess the diagnostic performance of a deep learning-based algorithm for automated detection of acute and chronic rib fractures on whole-body trauma CT. MATERIALS AND METHODS: We retrospectively identified all whole-body trauma CT scans referred from the emergency department of our hospital from January to December 2018 (n = 511). Scans were categorized as positive (n = 159) or negative (n = 352) for rib fractures according to the clinically approved written CT reports, which served as the index test. The bone kernel series (1.5-mm slice thickness) served as an input for a detection prototype algorithm trained to detect both acute and chronic rib fractures based on a deep convolutional neural network. It had previously been trained on an independent sample from eight other institutions (n = 11455). RESULTS: All CTs except one were successfully processed (510/511). The algorithm achieved a sensitivity of 87.4% and specificity of 91.5% on a per-examination level [per CT scan: rib fracture(s): yes/no]. There were 0.16 false-positives per examination (= 81/510). On a per-finding level, there were 587 true-positive findings (sensitivity: 65.7%) and 307 false-negatives. Furthermore, 97 true rib fractures were detected that were not mentioned in the written CT reports. A major factor associated with correct detection was displacement. CONCLUSION: We found good performance of a deep learning-based prototype algorithm detecting rib fractures on trauma CT on a per-examination level at a low rate of false-positives per case. A potential area for clinical application is its use as a screening tool to avoid false-negative radiology reports.
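The per-examination figures above follow directly from confusion-matrix counts. The split below is hypothetical, chosen only to illustrate the arithmetic behind rates of this kind (the paper reports rates, not this exact table):

```python
def per_exam_metrics(tp, fn, tn, fp):
    """Per-examination sensitivity, specificity and the
    false-positive rate per processed examination."""
    n_exams = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fp_per_exam": fp / n_exams,
    }

# hypothetical split over 510 processed scans (159 positive)
m = per_exam_metrics(tp=139, fn=20, tn=321, fp=30)
print(round(m["sensitivity"], 3), round(m["specificity"], 3))  # 0.874 0.915
```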


Subject(s)
Deep Learning , Rib Fractures/diagnosis , Tomography, X-Ray Computed/methods , Wounds and Injuries/diagnostic imaging , Adult , Aged , Female , Humans , Image Interpretation, Computer-Assisted , Male , Middle Aged , Retrospective Studies , Whole Body Imaging
13.
Eur J Radiol ; 125: 108862, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32135443

ABSTRACT

PURPOSE: To design and evaluate a self-trainable natural language processing (NLP)-based procedure to classify unstructured radiology reports. The method enabling the generation of curated datasets is exemplified on CT pulmonary angiogram (CTPA) reports. METHOD: We extracted the impressions of CTPA reports created at our institution from 2016 to 2018 (n = 4397; language: German). The status (pulmonary embolism: yes/no) was manually labelled for all exams. Data from 2016/2017 (n = 2801) served as a ground truth to train three NLP architectures that only require a subset of reference datasets for training to be operative. The three architectures were as follows: a convolutional neural network (CNN), a support vector machine (SVM) and a random forest (RF) classifier. Impressions of 2018 (n = 1377) were kept aside and used for general performance measurements. Furthermore, we investigated the dependence of classification performance on the amount of training data with multiple simulations. RESULTS: The classification performance of all three models was excellent (accuracies: 97 %-99 %; F1 scores 0.88-0.97; AUCs: 0.993-0.997). Highest accuracy was reached by the CNN with 99.1 % (95 % CI 98.5-99.6 %). Training with 470 labelled impressions was sufficient to reach an accuracy of > 93 % with all three NLP architectures. CONCLUSION: Our NLP-based approaches allow for an automated and highly accurate retrospective classification of CTPA reports with manageable effort solely using unstructured impression sections. We demonstrated that this approach is useful for the classification of radiology reports not written in English. Moreover, excellent classification performance is achieved at relatively small training set sizes.
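The F1 scores used to benchmark the three classifiers are the harmonic mean of precision and recall. A minimal sketch with toy confusion counts (not the paper's numbers):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# toy counts for a PE-positive vs PE-negative report classifier
print(round(f1_score(tp=90, fp=5, fn=10), 3))  # 0.923
```

Unlike plain accuracy, F1 stays informative when the two report classes are imbalanced, which is why the paper reports both.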


Subject(s)
Image Interpretation, Computer-Assisted/methods , Natural Language Processing , Pulmonary Embolism/diagnostic imaging , Tomography, X-Ray Computed/methods , Aged , Area Under Curve , Datasets as Topic , Female , Humans , Male , Neural Networks, Computer , Pulmonary Artery/diagnostic imaging , Retrospective Studies , Support Vector Machine
14.
Abdom Radiol (NY) ; 45(6): 1922-1928, 2020 06.
Article in English | MEDLINE | ID: mdl-31451887

ABSTRACT

PURPOSE: To establish thresholds for contrast enhancement-based attenuation (CM) and iodine concentration (IOD) for the quantitative evaluation of enhancement in renal lesions on single-phase split-filter dual-energy CT (tbDECT) and combine measurements in a machine learning algorithm to potentially improve performance. MATERIAL: 126 patients with incidental renal cysts (both hypo- and hyperdense cysts) or high suspicion for renal cell carcinoma (312 total lesions) undergoing abdominal, portal venous phase tbDECT were initially included in this retrospective study. Gold standard was pathological confirmation or follow-up imaging (MRI or multiphasic CT). CM, IOD, and ROI size were recorded. Thresholds for CM and IOD were identified using Youden-Index of the empirical ROC curves. Decision tree (DTC) and random forest classifier (RFC) were trained. Sensitivities, specificities, and AUCs were compared using McNemar and DeLong test. RESULTS: The final study cohort comprised 40 enhancing and 113 non-enhancing renal lesions. Optimal thresholds for quantitative iodine measurements and contrast enhancement-based attenuation were 1.0 ± 0.0 mg/ml and 23.6 ± 0.3 HU, respectively. Single DECT parameters (IOD, CM) showed similar overall performance with an AUC of 0.894 and 0.858 (p = 0.541) (sensitivity 90 and 80%, specificity 88 and 92%, respectively). While overall performance for the DTC (AUC 0.944) was higher than RFC (AUC 0.886), this difference (p = 0.409) and comparison to CM (p = 0.243) and IOD (p = 0.353) was not statistically significant. CONCLUSIONS: Enhancement in incidental renal lesions on single-phase tbDECT can be classified with up to 87.5% sensitivity and 94.6% specificity. Algorithms combining DECT parameters did not increase overall performance.


Subject(s)
Kidney Neoplasms , Tomography, X-Ray Computed , Algorithms , Contrast Media , Humans , Kidney Neoplasms/diagnostic imaging , Machine Learning , Radiographic Image Interpretation, Computer-Assisted , Retrospective Studies , Sensitivity and Specificity
15.
Invest Radiol ; 55(1): 1-7, 2020 01.
Article in English | MEDLINE | ID: mdl-31503083

ABSTRACT

Artificial intelligence (AI) is a powerful tool for image analysis that is increasingly being evaluated by radiology professionals. However, because these methods were developed for the analysis of nonmedical image data, and because the data structure in radiology departments is not "AI-ready", implementing AI in radiology is not straightforward. The purpose of this review is to guide the reader through the pipeline of an AI project for automated image analysis in radiology and thereby encourage its implementation in radiology departments. At the same time, this review aims to enable readers to critically appraise articles on AI-based software in radiology.


Subject(s)
Artificial Intelligence , Image Processing, Computer-Assisted/methods , Radiology/methods , Humans
16.
Eur J Radiol ; 121: 108719, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31706232

ABSTRACT

PURPOSE: To share experience from a large, ongoing expert-reading teleradiology program in Europe and Asia that supports referring centers in interpreting high-resolution computed tomography (HRCT) with respect to the presence of a usual interstitial pneumonia (UIP) pattern in patients with suspected idiopathic pulmonary fibrosis (IPF). METHOD: We analyzed data from 01/2014 to 05/2019, comprising HRCT examinations from 239 medical centers in 12 European and Asian countries that were transmitted to our Picture Archiving and Communication System (PACS) via a secured internet connection. Structured reports were generated in consensus by a radiologist with over 20 years of experience in thoracic imaging and a pulmonologist with specific expertise in interstitial lung disease, according to current guidelines on IPF, and were sent to the referring physicians. We evaluated patient characteristics, technical issues, report turnaround times, and the frequency of diagnoses, and conducted a survey to collect feedback from referring physicians. RESULTS: HRCT image data from 703 patients were transmitted (53.5% male; mean age 63.7 years, SD 17). In 35.1% of all cases the diagnosis was "UIP"/"typical UIP". The mean report turnaround time was 1.7 days (SD 2.9). Data transmission errors occurred in 7.1% of cases. Overall satisfaction among referring physicians was high (8.4 of 10; SD 3.2). CONCLUSIONS: This Eurasian teleradiology program demonstrates the feasibility of cross-border teleradiology for the provision of state-of-the-art reporting despite the heterogeneity of referring medical centers and challenges such as data transmission errors and language barriers. We also point out important success factors, such as the use of structured reporting templates.


Subject(s)
Idiopathic Pulmonary Fibrosis/diagnostic imaging , Teleradiology/methods , Tomography, X-Ray Computed/methods , Asia , Europe , Female , Humans , Image Interpretation, Computer-Assisted/methods , Lung/diagnostic imaging , Male , Middle Aged
17.
Semin Musculoskelet Radiol ; 23(3): 304-311, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31163504

ABSTRACT

Artificial intelligence (AI) has gained major attention, with a rapid increase in the number of published articles, particularly in recent years. This review provides a general understanding of how AI can or will be useful to the musculoskeletal radiologist. After a brief technical background on AI, machine learning, and deep learning, we illustrate, through examples from the musculoskeletal literature, potential AI applications in the various steps of the radiologist's workflow, from managing the request to communicating the results. The implementation of AI solutions is not without challenges and limitations; these are also discussed, along with current trends and perspectives.


Subject(s)
Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Musculoskeletal Diseases/diagnostic imaging , Radiology/methods , Humans , Musculoskeletal System/diagnostic imaging