Results 1 - 12 of 12
1.
Radiol Artif Intell ; : e230601, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38900043

ABSTRACT

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To evaluate the performance of an automated deep learning method in detecting ascites and subsequently quantifying its volume in patients with liver cirrhosis and ovarian cancer. Materials and Methods This retrospective study included contrast-enhanced and noncontrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer from two institutions, National Institutes of Health (NIH) and University of Wisconsin (UofW). The model, trained on The Cancer Genome Atlas Ovarian Cancer dataset (mean age, 60 years ± 11 [SD]; 143 female), was tested on two internal (NIH-LC and NIH-OV) and one external dataset (UofW-LC). Its performance was measured by the Dice coefficient, standard deviations, and 95% confidence intervals, focusing on ascites volume in the peritoneal cavity. Results On NIH-LC (25 patients; mean age, 59 years ± 14; 14 male) and NIH-OV (166 patients; mean age, 65 years ± 9; all female), the model achieved Dice scores of 85.5% ± 6.1% (CI: 83.1%-87.8%) and 82.6% ± 15.3% (CI: 76.4%-88.7%), with median volume estimation errors of 19.6% (IQR: 13.2%-29.0%) and 5.3% (IQR: 2.4%- 9.7%), respectively. On UofW-LC (124 patients; mean age, 46 years ± 12; 73 female), the model had a Dice score of 83.0% ± 10.7% (CI: 79.8%-86.3%) and median volume estimation error of 9.7% (IQR: 4.5%-15.1%). The model showed strong agreement with expert assessments, with r2 values of 0.79, 0.98, and 0.97 across the test sets. Conclusion The proposed deep learning method performed well in segmenting and quantifying the volume of ascites in concordance with expert radiologist assessments. ©RSNA, 2024.

2.
ArXiv ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38903743

ABSTRACT

BACKGROUND: Segmentation of organs and structures in abdominal MRI is useful for many clinical applications, such as disease diagnosis and radiotherapy. Current approaches have focused on delineating a limited set of abdominal structures (13 types). To date, there is no publicly available abdominal MRI dataset with voxel-level annotations of multiple organs and structures, and consequently no segmentation tool for multi-structure segmentation. METHODS: We curated a T1-weighted abdominal MRI dataset of 195 patients who underwent imaging at the National Institutes of Health (NIH) Clinical Center. The dataset comprises axial pre-contrast T1, arterial, venous, and delayed phases for each patient, amounting to a total of 780 series (69,248 2D slices). Each series contains voxel-level annotations of 62 abdominal organs and structures. A 3D nnUNet model, dubbed MRISegmentator-Abdomen (MRISegmentator in short), was trained on this dataset, and evaluation was conducted on an internal test set and two large external datasets: AMOS22 and Duke Liver. The predicted segmentations were compared against the ground truth using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD). FINDINGS: MRISegmentator achieved an average DSC of 0.861 ± 0.170 and an NSD of 0.924 ± 0.163 on the internal test set. On the AMOS22 dataset, MRISegmentator attained an average DSC of 0.829 ± 0.133 and an NSD of 0.908 ± 0.067. On the Duke Liver dataset, an average DSC of 0.933 ± 0.015 and an NSD of 0.929 ± 0.021 were obtained. INTERPRETATION: The proposed MRISegmentator provides automatic, accurate, and robust segmentation of 62 organs and structures in T1-weighted abdominal MRI sequences. The tool has the potential to accelerate research on various clinical topics, such as abnormality detection, radiotherapy, and disease classification, among others.
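For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of DSC and NSD on binary 3D masks, using SciPy distance transforms. The surface definition and the 1 mm tolerance are common conventions assumed here, not taken from the paper:

```python
import numpy as np
from scipy import ndimage

def surface(m: np.ndarray) -> np.ndarray:
    """Boundary voxels: foreground voxels removed by one binary erosion."""
    return m & ~ndimage.binary_erosion(m)

def dsc(pred: np.ndarray, truth: np.ndarray) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum())

def nsd(pred: np.ndarray, truth: np.ndarray, spacing, tol_mm: float = 1.0) -> float:
    """Fraction of surface voxels lying within tol_mm of the other surface."""
    sp, st = surface(pred.astype(bool)), surface(truth.astype(bool))
    # Distance (mm) from every voxel to the nearest surface voxel of each mask.
    d_to_truth = ndimage.distance_transform_edt(~st, sampling=spacing)
    d_to_pred = ndimage.distance_transform_edt(~sp, sampling=spacing)
    close = (d_to_truth[sp] <= tol_mm).sum() + (d_to_pred[st] <= tol_mm).sum()
    return close / (sp.sum() + st.sum())
```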

3.
ArXiv ; 2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38903745

ABSTRACT

In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as conventional Natural Language Generation (NLG) and Clinical Efficacy (CE) metrics, often fall short in capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarity. To overcome these issues, our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs) such as GPT-3.5 and GPT-4. Utilizing In-Context Instruction Learning (ICIL) and Chain of Thought (CoT) reasoning, our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human- and AI-generated reports. This is further enhanced by a regression model that aggregates sentence-level evaluation scores. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a score of 0.48, outperforming the METEOR metric by 0.19, while our "Regressed GPT-4" model shows even greater alignment with expert evaluations, exceeding the best existing metric by a 0.35 margin. Moreover, the robustness of our explanations has been validated through a thorough iterative strategy. We plan to publicly release annotations from radiology experts, setting a new standard for accuracy in future assessments. This underscores the potential of our approach in enhancing the quality assessment of AI-driven medical reports.
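The aggregation step can be pictured as follows: per-sentence LLM judgments are summarized into report-level features and regressed onto expert ratings. This is a hedged sketch; the feature set and all data below are hypothetical, as the abstract does not specify the paper's actual regression design:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def report_features(sentence_scores):
    """Summarize a report's per-sentence LLM scores into a fixed-length vector."""
    s = np.asarray(sentence_scores, dtype=float)
    return [s.mean(), s.min(), s.max(), s.std()]

# Hypothetical per-report sentence scores and matching expert ratings.
llm_scores = [[4, 5, 3], [2, 2, 3, 1], [5, 5, 4], [3, 2, 4, 4]]
expert_ratings = [4.0, 2.0, 4.5, 3.5]

X = np.array([report_features(s) for s in llm_scores])
model = LinearRegression().fit(X, expert_ratings)
print(model.predict(X))  # regressed report-level quality scores
```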

4.
Article in English | MEDLINE | ID: mdl-38758290

ABSTRACT

PURPOSE: Body composition measurements from routine abdominal CT can yield personalized risk assessments for asymptomatic and diseased patients. In particular, attenuation and volume measures of muscle and fat are associated with important clinical outcomes, such as cardiovascular events, fractures, and death. This study evaluates the reliability of an Internal tool for the segmentation of muscle and fat (subcutaneous and visceral) as compared to the well-established public TotalSegmentator tool. METHODS: We assessed the tools across 900 CT series from the publicly available SAROS dataset, focusing on muscle, subcutaneous fat, and visceral fat. The Dice score was employed to assess accuracy in subcutaneous fat and muscle segmentation. Due to the lack of ground truth segmentations for visceral fat, Cohen's Kappa was utilized to assess segmentation agreement between the tools. RESULTS: Our Internal tool achieved a 3% higher Dice score (83.8 vs. 80.8) for subcutaneous fat and a 5% improvement (87.6 vs. 83.2) for muscle segmentation. A Wilcoxon signed-rank test revealed that the differences were statistically significant (p < 0.01). For visceral fat, a Cohen's Kappa score of 0.856 indicated near-perfect agreement between the two tools. Our Internal tool also showed very strong correlations for muscle volume (R2 = 0.99), muscle attenuation (R2 = 0.93), and subcutaneous fat volume (R2 = 0.99), with a moderate correlation for subcutaneous fat attenuation (R2 = 0.45). CONCLUSION: Our findings indicated that our Internal tool outperformed TotalSegmentator in measuring subcutaneous fat and muscle. The high Cohen's Kappa score for visceral fat suggests a reliable level of agreement between the two tools. These results demonstrate the potential of our tool in advancing the accuracy of body composition analysis.
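The two statistical comparisons used here can be reproduced in outline with SciPy and scikit-learn; the data below are synthetic stand-ins shaped like the reported summary statistics, not the study's measurements:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Paired per-series Dice scores from the two tools (synthetic).
dice_internal = rng.normal(0.838, 0.05, 900)
dice_totalseg = rng.normal(0.808, 0.05, 900)
stat, p = wilcoxon(dice_internal, dice_totalseg)
print(f"Wilcoxon signed-rank p-value: {p:.2e}")

# Voxel-wise visceral fat labels from both tools (synthetic, ~95% agreement).
tool_a = rng.integers(0, 2, 10_000)
tool_b = np.where(rng.random(10_000) < 0.95, tool_a, 1 - tool_a)
print(f"Cohen's Kappa: {cohen_kappa_score(tool_a, tool_b):.3f}")
```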

5.
ArXiv ; 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38410656

ABSTRACT

Purpose: Body composition measurements from routine abdominal CT can yield personalized risk assessments for asymptomatic and diseased patients. In particular, attenuation and volume measures of muscle and fat are associated with important clinical outcomes, such as cardiovascular events, fractures, and death. This study evaluates the reliability of an Internal tool for the segmentation of muscle and fat (subcutaneous and visceral) as compared to the well-established public TotalSegmentator tool. Methods: We assessed the tools across 900 CT series from the publicly available SAROS dataset, focusing on muscle, subcutaneous fat, and visceral fat. The Dice score was employed to assess accuracy in subcutaneous fat and muscle segmentation. Due to the lack of ground truth segmentations for visceral fat, Cohen's Kappa was utilized to assess segmentation agreement between the tools. Results: Our Internal tool achieved a 3% higher Dice score (83.8 vs. 80.8) for subcutaneous fat and a 5% improvement (87.6 vs. 83.2) for muscle segmentation. A Wilcoxon signed-rank test revealed that the differences were statistically significant (p < 0.01). For visceral fat, a Cohen's Kappa score of 0.856 indicated near-perfect agreement between the two tools. Our Internal tool also showed very strong correlations for muscle volume (R2 = 0.99), muscle attenuation (R2 = 0.93), and subcutaneous fat volume (R2 = 0.99), with a moderate correlation for subcutaneous fat attenuation (R2 = 0.45). Conclusion: Our findings indicated that our Internal tool outperformed TotalSegmentator in measuring subcutaneous fat and muscle. The high Cohen's Kappa score for visceral fat suggests a reliable level of agreement between the two tools. These results demonstrate the potential of our tool in advancing the accuracy of body composition analysis.

6.
Radiology ; 309(1): e231147, 2023 10.
Article in English | MEDLINE | ID: mdl-37815442

ABSTRACT

Background Large language models (LLMs) such as ChatGPT, though proficient in many text-based tasks, are not suitable for use with radiology reports due to patient privacy constraints. Purpose To test the feasibility of using an alternative LLM (Vicuna-13B) that can be run locally for labeling radiography reports. Materials and Methods Chest radiography reports from the MIMIC-CXR and National Institutes of Health (NIH) data sets were included in this retrospective study. Reports were examined for 13 findings. Outputs reporting the presence or absence of the 13 findings were generated by Vicuna by using a single-step or multistep prompting strategy (prompts 1 and 2, respectively). Agreement between Vicuna outputs and the CheXpert and CheXbert labelers was assessed using Fleiss κ. Agreement between Vicuna outputs from three runs under a hyperparameter setting that introduced some randomness (temperature, 0.7) was also assessed. The performance of Vicuna and the labelers was assessed in a subset of 100 NIH reports annotated by a radiologist with use of area under the receiver operating characteristic curve (AUC). Results A total of 3269 reports from the MIMIC-CXR data set (median patient age, 68 years [IQR, 59-79 years]; 161 male patients) and 25 596 reports from the NIH data set (median patient age, 47 years [IQR, 32-58 years]; 1557 male patients) were included. Vicuna outputs with prompt 2 showed, on average, moderate to substantial agreement with the labelers on the MIMIC-CXR (κ median, 0.57 [IQR, 0.45-0.66] with CheXpert and 0.64 [IQR, 0.45-0.68] with CheXbert) and NIH (κ median, 0.52 [IQR, 0.41-0.65] with CheXpert and 0.55 [IQR, 0.41-0.74] with CheXbert) data sets. Vicuna with prompt 2 performed on par (median AUC, 0.84 [IQR, 0.74-0.93]) with both labelers on nine of 11 findings. Conclusion In this proof-of-concept study, outputs of the LLM Vicuna reporting the presence or absence of 13 findings on chest radiography reports showed moderate to substantial agreement with existing labelers. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Cai in this issue.
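The agreement statistic used here, Fleiss κ, extends Cohen's κ to more than two raters. A small sketch with statsmodels, using hypothetical presence/absence labels for one finding, with Vicuna, CheXpert, and CheXbert treated as three raters:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = reports, columns = raters (Vicuna, CheXpert, CheXbert); hypothetical labels.
labels = np.array([
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
])
table, _ = aggregate_raters(labels)  # reports x categories count table
print(f"Fleiss kappa: {fleiss_kappa(table):.2f}")
```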


Subject(s)
Camelids, New World , Radiology , United States , Humans , Male , Animals , Aged , Middle Aged , Privacy , Feasibility Studies , Retrospective Studies , Language
7.
Biomed Opt Express ; 14(2): 533-549, 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36874499

ABSTRACT

Retinal fundus imaging for diagnosing diabetic retinopathy (DR) is an efficient and patient-friendly modality, as many high-resolution images can easily be obtained for accurate diagnosis. With the advancement of deep learning, data-driven models may facilitate high-throughput diagnosis, especially in areas with limited availability of certified human experts. Many DR datasets already exist for training learning-based models; however, most are unbalanced, lack a sufficient sample count, or both. This paper proposes a two-stage pipeline for generating photo-realistic retinal fundus images based on either artificially generated or free-hand drawn semantic lesion maps. The first stage uses a conditional StyleGAN to generate synthetic lesion maps based on a DR severity grade. The second stage then uses GauGAN to convert the synthetic lesion maps into high-resolution fundus images. We evaluate the photo-realism of the generated images using the Fréchet inception distance (FID) and show the efficacy of our pipeline through downstream tasks, such as dataset augmentation for automatic DR grading and lesion segmentation.
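The FID metric mentioned above compares Gaussian fits to Inception feature distributions of real and generated images. A self-contained sketch of the formula, assuming feature matrices (rows = images) have already been extracted with an Inception network:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(feat_real: np.ndarray, feat_fake: np.ndarray) -> float:
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*(C1 C2)^(1/2))."""
    mu1, mu2 = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    c1 = np.cov(feat_real, rowvar=False)
    c2 = np.cov(feat_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts from numerics
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))
```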

8.
IEEE Trans Med Imaging ; 41(10): 2728-2738, 2022 10.
Article in English | MEDLINE | ID: mdl-35468060

ABSTRACT

Detecting Out-of-Distribution (OoD) data is one of the greatest challenges in the safe and robust deployment of machine learning algorithms in medicine. When algorithms encounter cases that deviate from the distribution of the training data, they often produce incorrect and over-confident predictions. OoD detection algorithms aim to catch erroneous predictions in advance by analysing the data distribution and detecting potential instances of failure. Moreover, flagging OoD cases may support human readers in identifying incidental findings. Due to the increased interest in OoD algorithms, benchmarks for different domains have recently been established; in the medical imaging domain, for which reliable predictions are often essential, an open benchmark has been missing. We introduce the Medical-Out-Of-Distribution-Analysis-Challenge (MOOD) as an open, fair, and unbiased benchmark for OoD methods in the medical imaging domain. The analysis of the submitted algorithms shows that performance has a strong positive correlation with perceived difficulty, and that all algorithms show high variance across different anomalies, making it difficult as yet to recommend them for clinical practice. We also see a strong correlation between challenge ranking and performance on a simple toy test set, indicating that such a set might be a valuable addition as a proxy dataset during anomaly detection algorithm development.
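One common baseline in this family, shown here only to illustrate what an OoD score is (the challenge hosts many different methods), thresholds the reconstruction error of a model trained solely on in-distribution data. The autoencoder is an assumed callable, not part of any MOOD benchmark API:

```python
import numpy as np

def ood_scores(batch: np.ndarray, autoencoder) -> np.ndarray:
    """Per-sample anomaly score: mean squared reconstruction error.
    `autoencoder` is assumed to map a batch to reconstructions of the
    same shape and to have been trained only on in-distribution images."""
    recon = autoencoder(batch)
    return np.mean((batch - recon) ** 2, axis=tuple(range(1, batch.ndim)))

def flag_ood(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Binary OoD decision; threshold tuned on held-out in-distribution data."""
    return scores > threshold
```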


Subject(s)
Benchmarking , Machine Learning , Algorithms , Humans
9.
IEEE Trans Med Imaging ; 38(12): 2755-2767, 2019 12.
Article in English | MEDLINE | ID: mdl-31021795

ABSTRACT

Detecting acoustic shadows in ultrasound images is important in many clinical and engineering applications. Real-time feedback on acoustic shadows can guide sonographers to a standardized diagnostic viewing plane with minimal artifacts and can provide additional information for other automatic image analysis algorithms. However, automatically detecting shadow regions using learning-based algorithms is challenging because pixel-wise ground truth annotation of acoustic shadows is subjective and time consuming. In this paper, we propose a weakly supervised method for automatic confidence estimation of acoustic shadow regions that generates a dense shadow-focused confidence map. In our method, a shadow-seg module is built to learn general shadow features for shadow segmentation, based on global image-level annotations as well as a small number of coarse pixel-wise shadow annotations. A transfer function is introduced to extend the obtained binary shadow segmentation to a reference confidence map. In addition, a confidence estimation network is proposed to learn the mapping between input images and the reference confidence maps; this network predicts shadow confidence maps directly from input images during inference. We use evaluation metrics such as DICE and inter-class correlation to verify the effectiveness of our method. Our method is more consistent than human annotation and outperforms the state of the art quantitatively in shadow segmentation and qualitatively in confidence estimation of shadow regions. Furthermore, we demonstrate the applicability of our method by integrating shadow confidence maps into tasks such as ultrasound image classification, multi-view image fusion, and automated biometric measurements.
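One plausible form of such a transfer function (an assumption for illustration; the abstract does not give the paper's exact formulation) assigns full confidence inside the binary shadow mask and decays smoothly with distance outside it:

```python
import numpy as np
from scipy import ndimage

def reference_confidence_map(shadow_mask: np.ndarray, decay_mm: float = 5.0,
                             spacing=(1.0, 1.0)) -> np.ndarray:
    """Confidence 1 inside the shadow mask, exponentially decaying outside,
    based on the Euclidean distance (in mm) to the nearest shadow pixel."""
    dist = ndimage.distance_transform_edt(~shadow_mask.astype(bool), sampling=spacing)
    return np.exp(-dist / decay_mm)
```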


Subject(s)
Image Processing, Computer-Assisted/methods , Supervised Machine Learning , Ultrasonography, Prenatal/methods , Algorithms , Deep Learning , Female , Fetus/diagnostic imaging , Humans , Pregnancy
10.
Med Image Anal ; 53: 156-164, 2019 04.
Article in English | MEDLINE | ID: mdl-30784956

ABSTRACT

Automatic detection of anatomical landmarks is an important step for a wide range of applications in medical image analysis. Manual annotation of landmarks is a tedious task and prone to observer errors. In this paper, we evaluate novel deep reinforcement learning (RL) strategies to train agents that can precisely and robustly localize target landmarks in medical scans. An artificial RL agent learns to identify the optimal path to the landmark by interacting with an environment, in our case 3D images. Furthermore, we investigate the use of fixed- and multi-scale search strategies with novel hierarchical action steps in a coarse-to-fine manner. Several deep Q-network (DQN) architectures are evaluated for detecting multiple landmarks using three different medical imaging datasets: fetal head ultrasound (US), and adult brain and cardiac magnetic resonance imaging (MRI). The performance of our agents surpasses state-of-the-art supervised and RL methods. Our experiments also show that multi-scale search strategies perform significantly better than fixed-scale agents in images with a large field of view and noisy backgrounds, such as cardiac MRI. Moreover, the novel hierarchical steps can significantly speed up the search process, by a factor of 4-5.
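The coarse-to-fine idea can be sketched as a greedy search that moves by the current step size and refines whenever the agent starts to oscillate. The oscillation test and `q_values_fn` (a trained DQN returning six Q-values for axis-aligned moves) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

ACTIONS = np.array([  # six axis-aligned moves in a 3D volume (voxels)
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
])

def hierarchical_search(q_values_fn, start, step=8, min_step=1, max_iters=200):
    """Move greedily at the current scale; halve the step on revisits."""
    pos, visited = np.array(start), set()
    for _ in range(max_iters):
        key = (tuple(pos), step)
        if key in visited:        # oscillation at this scale -> refine
            if step <= min_step:
                break
            step //= 2
            continue
        visited.add(key)
        action = int(np.argmax(q_values_fn(pos, step)))
        pos = pos + step * ACTIONS[action]
    return pos
```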


Subject(s)
Anatomic Landmarks , Brain/diagnostic imaging , Deep Learning , Head/diagnostic imaging , Heart/diagnostic imaging , Imaging, Three-Dimensional/methods , Magnetic Resonance Imaging/methods , Adult , Female , Head/embryology , Humans , Pregnancy
11.
IEEE Trans Med Imaging ; 37(8): 1737-1750, 2018 08.
Article in English | MEDLINE | ID: mdl-29994453

ABSTRACT

The limited capture range of optimization-based 2-D/3-D image registration methods, and their need for high-quality initialization, can significantly degrade the performance of 3-D image reconstruction and motion compensation pipelines. Challenging clinical imaging scenarios that contain significant subject motion, such as fetal in-utero imaging, complicate the 3-D image and volume reconstruction process. In this paper, we present a learning-based image registration method capable of predicting 3-D rigid transformations of arbitrarily oriented 2-D image slices with respect to a learned canonical atlas coordinate system. Only image slice intensity information is used to perform registration and canonical alignment; no spatial transform initialization is required. To find image transformations, we utilize a convolutional neural network architecture to learn the regression function capable of mapping 2-D image slices to a 3-D canonical atlas space. We extensively evaluate the effectiveness of our approach quantitatively on simulated magnetic resonance imaging (MRI) of fetal brains with synthetic motion and further demonstrate qualitative results on real fetal MRI data, where our method is integrated into a full reconstruction and motion compensation pipeline. Our learning-based registration achieves an average spatial prediction error of 7 mm on simulated data and produces qualitatively improved reconstructions for heavily moving fetuses with gestational ages of approximately 20 weeks. Our model provides a general and computationally efficient solution to the 2-D/3-D registration initialization problem and is suitable for real-time scenarios.
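A rigid 3-D transformation has six degrees of freedom, so the regression target can be encoded as a 6-vector. Below is a sketch of turning such a prediction into a 4x4 matrix; the Euler-angle parameterization is an assumption, as the abstract does not state the paper's exact output encoding:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def params_to_rigid(params) -> np.ndarray:
    """Build a 4x4 rigid transform from [rx, ry, rz, tx, ty, tz]:
    Euler angles in degrees plus a translation in mm (assumed encoding)."""
    rx, ry, rz, tx, ty, tz = params
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler("xyz", [rx, ry, rz], degrees=True).as_matrix()
    T[:3, 3] = [tx, ty, tz]
    return T
```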


Subject(s)
Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Magnetic Resonance Imaging/methods , Algorithms , Brain/diagnostic imaging , Female , Fetus/diagnostic imaging , Humans , Machine Learning , Movement , Pregnancy
12.
Annu Int Conf IEEE Eng Med Biol Soc ; 2017: 189-192, 2017 Jul.
Article in English | MEDLINE | ID: mdl-29059842

ABSTRACT

This paper describes the development of an array of individually addressable pH-sensitive microneedles fabricated by injection moulding and their integration within a portable device for real-time wireless recording of pH distributions in biological samples. The fabricated microneedles undergo gold patterning followed by electrodeposition of iridium oxide, giving them a sensitivity of 0.07 pH units. Miniaturised electronics suitable for sensor readout, analog-to-digital conversion, and wireless transmission of the potentiometric data are embodied within the device, enabling it to measure the real-time pH of soft biological samples such as muscle. Real-time recording of the cardiac pH distribution during ischemia followed by reperfusion cycles is demonstrated using the microneedle array in cardiac muscle of male Wistar rats.
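The readout chain described above amounts to converting each electrode's potential to pH through a linear calibration. A hedged sketch of that conversion; every constant here (reference voltage, ADC width, intercept, slope) is hypothetical and would come from per-electrode buffer calibration in practice:

```python
def adc_to_ph(adc_counts: int, v_ref: float = 3.3, adc_bits: int = 12,
              e0_v: float = 0.62, slope_v_per_ph: float = -0.059) -> float:
    """Convert a raw ADC reading from an iridium-oxide electrode to pH
    via a linear (Nernstian-style) calibration: pH = (E - E0) / slope."""
    voltage = adc_counts * v_ref / (2 ** adc_bits - 1)
    return (voltage - e0_v) / slope_v_per_ph
```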


Subject(s)
Needles , Animals , Electroplating , Hydrogen-Ion Concentration , Injections , Male , Potentiometry , Rats , Rats, Wistar , Time Factors , Wireless Technology