Results 1 - 7 of 7

1.
Neuroradiology ; 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38995393

ABSTRACT

PURPOSE: This study aimed to investigate the efficacy of fine-tuned large language models (LLMs) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases.

METHODS: This retrospective study included 759, 284, and 164 brain MRI reports in the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers (BERT) Japanese model was fine-tuned on the training dataset and evaluated on the validation dataset. The model that demonstrated the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists classified the reports in the test dataset into the three groups, and the model's performance on the test dataset was compared with that of these two radiologists.

RESULTS: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for groups 1/2/3 was 1.000/0.864/0.978, and its specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in accuracy, sensitivity, or specificity between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20-26-fold faster than the radiologists. The area under the receiver operating characteristic curve was 0.994 (95% CI: 0.982-1.000) for discriminating groups 2 and 3 from group 1, and 0.992 (95% CI: 0.982-1.000) for discriminating group 3 from groups 1 and 2.

CONCLUSION: The fine-tuned LLM demonstrated performance comparable to that of radiologists in classifying brain MRI reports while requiring substantially less time.
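
The workflow described above corresponds to a standard text-classification fine-tuning loop. The sketch below is a minimal illustration using the Hugging Face Transformers library; the checkpoint name, data format, and hyperparameters are assumptions for illustration, not the authors' settings.

```python
# Minimal sketch (not the authors' code): fine-tuning a pretrained Japanese BERT
# for three-class classification of brain MRI reports.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "cl-tohoku/bert-base-japanese"  # assumed public checkpoint; needs fugashi + ipadic

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

def tokenize(batch):
    # Pad/truncate reports to a fixed length so the default collator can batch them.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

# Placeholder records; in practice these come from the institutional report archive.
# label: 0 = nontumor, 1 = posttreatment tumor, 2 = pretreatment tumor (assumed mapping).
train_records = [{"text": "報告書の本文", "label": 0}]
val_records = [{"text": "報告書の本文", "label": 1}]

train_ds = Dataset.from_list(train_records).map(tokenize, batched=True)
val_ds = Dataset.from_list(val_records).map(tokenize, batched=True)

args = TrainingArguments(output_dir="mri_report_classifier",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
print(trainer.evaluate())  # validation loss, used here to compare fine-tuning runs
```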

2.
Jpn J Radiol ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38954192

ABSTRACT

PURPOSE: Large language models (LLMs) are rapidly advancing and demonstrate high performance in understanding textual information, suggesting potential applications in interpreting patient histories and documented imaging findings. As LLMs continue to improve, their diagnostic abilities are expected to be enhanced further. However, comprehensive comparisons between LLMs from different developers are lacking. In this study, we aimed to test the diagnostic performance of the three latest major LLMs (GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro) using Radiology Diagnosis Please cases, a monthly diagnostic quiz series for radiology experts.

MATERIALS AND METHODS: Clinical histories and imaging findings, provided textually by the case submitters, were extracted from 324 quiz questions originating from Radiology Diagnosis Please cases published between 1998 and 2023. The top three differential diagnoses were generated by GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro using their respective application programming interfaces. Diagnostic performance among the three LLMs was compared using Cochran's Q and post hoc McNemar tests.

RESULTS: The diagnostic accuracies of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro for the primary diagnosis were 41.0%, 54.0%, and 33.9%, respectively, which improved to 49.4%, 62.0%, and 41.0% when any of the top three differential diagnoses was accepted as correct. Significant differences in diagnostic performance were observed among all pairs of models.

CONCLUSION: Claude 3 Opus outperformed GPT-4o and Gemini 1.5 Pro in solving radiology quiz cases. These models appear capable of assisting radiologists when supplied with accurate evaluations and clearly worded descriptions of imaging findings.
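
The comparison above queries the three models through their public APIs. The following sketch shows one way such a query loop could look; the model identifier strings, prompt wording, and token limit are assumptions and not the study's protocol.

```python
# Minimal sketch (not the study's code): requesting the top three differential
# diagnoses from GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro via their public APIs.
import os

import anthropic
import google.generativeai as genai
from openai import OpenAI

PROMPT = ("Based on the clinical history and imaging findings below, "
          "list the three most likely diagnoses.\n\n{case_text}")

def ask_gpt4o(case_text: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT.format(case_text=case_text)}])
    return resp.choices[0].message.content

def ask_claude_opus(case_text: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    resp = client.messages.create(
        model="claude-3-opus-20240229", max_tokens=512,
        messages=[{"role": "user", "content": PROMPT.format(case_text=case_text)}])
    return resp.content[0].text

def ask_gemini_pro(case_text: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")
    return model.generate_content(PROMPT.format(case_text=case_text)).text
```

Per-case correctness flags for the three models could then be compared with Cochran's Q test followed by pairwise McNemar tests, for example via statsmodels' contingency-table utilities.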

3.
J Imaging Inform Med ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38955964

ABSTRACT

This study aimed to investigate the performance of a fine-tuned large language model (LLM) in identifying pretreatment lung cancer patients from a picture archiving and communication system (PACS) and to compare it with that of radiologists. Patients whose radiological reports contained the term "lung cancer" (3111 for training, 124 for validation, and 288 for testing) were included in this retrospective study. Based on the clinical indication and diagnosis sections of the radiological report (used as input data), they were classified into four groups (used as reference data): group 0 (no lung cancer), group 1 (pretreatment lung cancer present), group 2 (after treatment for lung cancer), and group 3 (planning radiation therapy). Using the training and validation datasets, fine-tuning of the pretrained LLM was conducted ten times. Because of group imbalance, group 2 data were undersampled during training. The model that performed best on the validation dataset was assessed on the independent test dataset. For comparison, two other radiologists (readers 1 and 2) also classified the radiological reports in the test dataset. The overall accuracy of the fine-tuned LLM, reader 1, and reader 2 was 0.983, 0.969, and 0.969, respectively. The sensitivity for differentiating groups 0/1/2/3 by the LLM, reader 1, and reader 2 was 1.000/0.948/0.991/1.000, 0.750/0.879/0.996/1.000, and 1.000/0.931/0.978/1.000, respectively. The time required for classification by the LLM, reader 1, and reader 2 was 46 s, 2539 s, and 1538 s, respectively. The fine-tuned LLM effectively identified pretreatment lung cancer patients from PACS, with performance comparable to that of radiologists in a substantially shorter time.
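
The undersampling of the overrepresented group mentioned above can be done with a simple random subsample of the majority class before fine-tuning. The sketch below illustrates this under assumed column names and an assumed target count; neither is taken from the study.

```python
# Minimal sketch (assumed details): undersampling the overrepresented class
# (group 2) in the training set before fine-tuning.
import pandas as pd

def undersample(df: pd.DataFrame, label_col: str = "group",
                majority_label: int = 2, n_keep: int = 500,
                seed: int = 42) -> pd.DataFrame:
    """Randomly keep at most n_keep rows of the majority class; keep all others."""
    majority = df[df[label_col] == majority_label]
    others = df[df[label_col] != majority_label]
    kept = majority.sample(n=min(n_keep, len(majority)), random_state=seed)
    # Concatenate and shuffle so the classes are interleaved during training.
    return pd.concat([others, kept]).sample(frac=1.0, random_state=seed)

# train_df: DataFrame with columns ["report_text", "group"], group in {0, 1, 2, 3}
# balanced_train_df = undersample(train_df)
```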

4.
Radiol Phys Technol ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38837119

ABSTRACT

Changing the window width (WW) alters the appearance of noise and the contrast of CT images. The aim of this study was to investigate the impact of an adjusted WW on the detection of hepatocellular carcinomas (HCCs) on CT images reconstructed with deep learning reconstruction (DLR). This retrospective study included thirty-five patients who underwent abdominal dynamic contrast-enhanced CT. DLR was used to reconstruct arterial, portal, and delayed phase images. Two blinded readers investigated the optimal WW; five other blinded readers then independently read the image sets for HCC detection and image quality evaluation with the optimal or conventional liver WW. The optimal WW for HCC detection was 119 Hounsfield units (HU) (rounded to 120 in the subsequent analyses), which was the average of the adjusted WWs in the arterial, portal, and delayed phases. The average figures of merit for the readers in the jackknife alternative free-response receiver operating characteristic analysis for HCC detection were 0.809 (readers 1/2/3/4/5: 0.765/0.798/0.892/0.764/0.827) with the optimal WW (120 HU) and 0.765 (readers 1/2/3/4/5: 0.707/0.769/0.838/0.720/0.791) with the conventional WW (150 HU), a statistically significant difference (p < 0.001). Image quality with the optimal WW was superior to that with the conventional WW, with significant differences for some readers (p < 0.041). The optimal WW for HCC detection was narrower than the conventional WW on dynamic contrast-enhanced CT with DLR, and compared with the conventional liver WW, the optimal liver WW significantly improved HCC detection performance.
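
Display windowing of CT data is a linear remapping of Hounsfield units around a window level. The sketch below shows the general operation; the window level of 60 HU is an assumption for illustration, while the widths of 120 HU (optimal) and 150 HU (conventional) follow the abstract above.

```python
# Minimal sketch: applying a liver display window to CT data in Hounsfield units.
# The window level (60 HU) is assumed for illustration.
import numpy as np

def apply_window(hu_image: np.ndarray, level: float = 60.0,
                 width: float = 120.0) -> np.ndarray:
    """Map HU values inside [level - width/2, level + width/2] to 0-255 for display."""
    lower, upper = level - width / 2.0, level + width / 2.0
    clipped = np.clip(hu_image, lower, upper)
    return ((clipped - lower) / (upper - lower) * 255.0).astype(np.uint8)

# A narrower window stretches display contrast over a smaller HU range,
# which is why the optimal (120 HU) window can make subtle lesions more conspicuous.
# display = apply_window(ct_slice_hu, level=60.0, width=120.0)
```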

5.
Acad Radiol ; 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38897913

ABSTRACT

RATIONALE AND OBJECTIVES: To determine whether super-resolution deep learning reconstruction (SR-DLR) improves the depiction of cranial nerves and interobserver agreement when assessing neurovascular conflict on 3D fast asymmetric spin echo (3D FASE) brain MR images, as compared to deep learning reconstruction (DLR).

MATERIALS AND METHODS: This retrospective study involved reconstructing 3D FASE MR images of the brain for 37 patients using SR-DLR and DLR. Three blinded readers conducted qualitative image analyses, evaluating the degree of neurovascular conflict, structure depiction, sharpness, noise, and diagnostic acceptability. Quantitative analyses included measuring the edge rise distance (ERD), edge rise slope (ERS), and full width at half maximum (FWHM) using the signal intensity profile along a linear region of interest across the center of the basilar artery.

RESULTS: Interobserver agreement on the degree of neurovascular conflict of the facial nerve was generally higher with SR-DLR (0.429-0.923) than with DLR (0.175-0.689). SR-DLR exhibited increased subjective image noise compared to DLR (p ≥ 0.008). However, all three readers found SR-DLR significantly superior in terms of sharpness (p < 0.001); cranial nerve depiction, particularly of the facial and acoustic nerves as well as the osseous spiral lamina (p < 0.001); and diagnostic acceptability (p ≤ 0.002). The FWHM (mm)/ERD (mm)/ERS (mm⁻¹) for SR-DLR and DLR was 3.1-4.3/0.9-1.1/8795.5-10703.5 and 3.3-4.8/1.4-2.1/5157.9-7705.8, respectively, with SR-DLR's image sharpness being significantly superior (p ≤ 0.001).

CONCLUSION: SR-DLR enhances image sharpness, leading to improved cranial nerve depiction and a tendency toward greater interobserver agreement regarding facial nerve neurovascular conflict.
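
The quantitative sharpness metrics above (FWHM, ERD, ERS) are derived from a 1D signal-intensity profile across the basilar artery. The sketch below shows one plausible way to compute them; the 10%-90% definition of the edge rise and the simple thresholding are assumptions, not the authors' exact method.

```python
# Minimal sketch (assumed definitions): FWHM, edge rise distance (ERD), and
# edge rise slope (ERS) from a 1D signal-intensity profile across a vessel.
import numpy as np

def fwhm_mm(profile: np.ndarray, pixel_mm: float) -> float:
    """Width of the profile at half of its peak-to-background amplitude."""
    background = profile.min()
    half = background + (profile.max() - background) / 2.0
    above = np.where(profile >= half)[0]
    return (above[-1] - above[0]) * pixel_mm

def erd_mm(profile: np.ndarray, pixel_mm: float) -> float:
    """Distance over which the leading edge rises from 10% to 90% of its amplitude."""
    peak_idx = int(np.argmax(profile))
    edge = profile[: peak_idx + 1]
    amplitude = edge.max() - edge.min()
    low = int(np.argmax(edge >= edge.min() + 0.1 * amplitude))
    high = int(np.argmax(edge >= edge.min() + 0.9 * amplitude))
    return (high - low) * pixel_mm

def ers_per_mm(profile: np.ndarray, pixel_mm: float) -> float:
    """Edge rise slope approximated as edge amplitude divided by ERD."""
    amplitude = profile.max() - profile.min()
    return amplitude / erd_mm(profile, pixel_mm)
```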

6.
Neuroradiology ; 66(1): 63-71, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37991522

ABSTRACT

PURPOSE: This study aimed to investigate the impact of deep learning reconstruction (DLR) on acute infarct depiction compared with hybrid iterative reconstruction (Hybrid IR).

METHODS: This retrospective study included 29 patients with acute infarction (75.8 ± 13.2 years, 20 males) and 26 patients without (64.4 ± 12.4 years, 18 males). Unenhanced head CT images were reconstructed with DLR and Hybrid IR. In the qualitative analyses, three readers evaluated the conspicuity of lesions in five regions and the image quality. In the quantitative analyses, a radiologist placed regions of interest on the lateral ventricle, putamen, and white matter, and the standard deviation of CT attenuation (i.e., quantitative image noise) was recorded.

RESULTS: Conspicuity of acute infarcts with DLR was superior to that with Hybrid IR, and the difference was statistically significant for two readers (p ≤ 0.038). For infarcts imaged within 24 h of onset, conspicuity with DLR was significantly improved compared with Hybrid IR for all readers (p ≤ 0.020). Image noise with DLR was significantly reduced compared with Hybrid IR in both the qualitative and quantitative analyses (p < 0.001 for all).

CONCLUSION: DLR improved acute infarct depiction on head CT, especially for infarcts imaged within 24 h of onset.


Subject(s)
Deep Learning , Male , Humans , Retrospective Studies , Brain Infarction , Brain , Tomography, X-Ray Computed , Radiographic Image Interpretation, Computer-Assisted , Radiation Dosage , Algorithms
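
The quantitative noise metric in item 6 above is the standard deviation of CT attenuation within a region of interest. A minimal sketch of that measurement, assuming a circular ROI on a single axial slice in Hounsfield units, is shown below.

```python
# Minimal sketch (assumed implementation, not the study's): quantitative image
# noise as the standard deviation of HU values inside a circular ROI.
import numpy as np

def roi_noise(ct_slice_hu: np.ndarray, center_rc: tuple, radius_px: int) -> float:
    """Standard deviation of HU values within a circular region of interest."""
    rows, cols = np.ogrid[:ct_slice_hu.shape[0], :ct_slice_hu.shape[1]]
    mask = (rows - center_rc[0]) ** 2 + (cols - center_rc[1]) ** 2 <= radius_px ** 2
    return float(ct_slice_hu[mask].std())

# Example with synthetic data: a uniform white-matter-like region with added noise.
rng = np.random.default_rng(0)
phantom = 30.0 + rng.normal(scale=4.0, size=(512, 512))
print(roi_noise(phantom, center_rc=(256, 256), radius_px=20))  # ≈ 4.0
```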
7.
Radiol Case Rep ; 18(6): 2307-2310, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37153480

ABSTRACT

True thymic hyperplasia is defined as an increase in both the size and weight of the gland with preservation of its normal microscopic architecture. Massive true thymic hyperplasia is a rare form of hyperplasia that compresses adjacent structures and causes various symptoms. Few reports have addressed the imaging findings of massive true thymic hyperplasia. Herein, we report a case of massive true thymic hyperplasia in a 3-year-old girl with no remarkable medical history. Contrast-enhanced CT revealed an anterior mediastinal mass with a bilobed configuration containing punctate and linear calcifications in curvilinear septa, which corresponded to lamellar bone deposits in the interlobular septa. To our knowledge, this is the first report of massive true thymic hyperplasia with osseous metaplasia. We also discuss the imaging features and etiology of massive true thymic hyperplasia with osseous metaplasia.
