Search | VHL Regional Portal

1.

Physics-informed motion registration of lung parenchyma across static CT images.

Neelakantan, Sunder; Mukherjee, Tanmay; Myers, Kyle J; Rizi, Rahim; Avazmohammadi, Reza.

ArXiv ; 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-39010873

ABSTRACT

Lung injuries, such as ventilator-induced lung injury and radiation-induced lung injury, can lead to heterogeneous alterations in the biomechanical behavior of the lungs. While imaging methods, e.g., X-ray and static computed tomography (CT), can point to regional alterations in lung structure between healthy and diseased tissue, they fall short of delineating timewise kinematic variations between the former and the latter. Image registration has gained recent interest as a tool to estimate the displacement experienced by the lungs during respiration via regional deformation metrics such as volumetric expansion and distortion. However, successful image registration commonly relies on a temporal series of image stacks with small displacements in the lungs across succeeding image stacks, which remains limited in static imaging. In this study, we have presented a finite element (FE) method to estimate strains from static images acquired at the end-expiration (EE) and end-inspiration (EI) timepoints, i.e., images with a large deformation between the two distant timepoints. Physiologically realistic loads were applied to the geometry obtained at EE to deform this geometry to match the geometry obtained at EI. The results indicated that the simulation could minimize the error between the two geometries. Using four-dimensional (4D) dynamic CT in a rat, the strain at an isolated transverse plane estimated by our method showed sufficient agreement with that estimated through non-rigid image registration that used all the timepoints. Through the proposed method, we can estimate the lung deformation at any timepoint between EE and EI. The proposed method offers a tool to estimate timewise regional deformation in the lungs using only static images acquired at EE and EI.

2.

Complete spatiotemporal quantification of cardiac motion in mice through enhanced acquisition and super-resolution reconstruction.

Mukherjee, Tanmay; Keshavarzian, Maziyar; Fugate, Elizabeth M; Naeini, Vahid; Darwish, Amr; Ohayon, Jacques; Myers, Kyle J; Shah, Dipan J; Lindquist, Diana; Sadayappan, Sakthivel; Pettigrew, Roderic I; Avazmohammadi, Reza.

bioRxiv ; 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38895261

ABSTRACT

The quantification of cardiac motion using cardiac magnetic resonance imaging (CMR) has shown promise as an early-stage marker for cardiovascular diseases. Despite the growing popularity of CMR-based myocardial strain calculations, measures of complete spatiotemporal strains (i.e., three-dimensional strains over the cardiac cycle) remain elusive. Complete spatiotemporal strain calculations are primarily hampered by poor spatial resolution, with the rapid motion of the cardiac wall also challenging the reproducibility of such strains. We hypothesize that a super-resolution reconstruction (SRR) framework that leverages combined image acquisitions at multiple orientations will enhance the reproducibility of complete spatiotemporal strain estimation. Two sets of CMR acquisitions were obtained for five wild-type mice, combining short-axis scans with radial and orthogonal long-axis scans. Super-resolution reconstruction, integrated with tissue classification, was performed to generate full four-dimensional (4D) images. The resulting enhanced and full 4D images enabled complete quantification of the motion in terms of 4D myocardial strains. Additionally, the effects of SRR in improving accurate strain measurements were evaluated using an in-silico heart phantom. The SRR framework revealed near isotropic spatial resolution, high structural similarity, and minimal loss of contrast, which led to overall improvements in strain accuracy. In essence, a comprehensive methodology was generated to quantify complete and reproducible myocardial deformation, aiding in the much-needed standardization of complete spatiotemporal strain calculations.

3.

Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics.

Deshpande, Rucha; Kelkar, Varun A; Gotsis, Dimitrios; Kc, Prabhat; Zeng, Rongping; Myers, Kyle J; Brooks, Frank J; Anastasio, Mark A.

ArXiv ; 2024 May 03.

Article in English | MEDLINE | ID: mdl-38745699

ABSTRACT

Background: The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. Purpose: The goal of this challenge was to promote the development of deep generative models for medical imaging and to emphasize the need for their domain-relevant assessments via the analysis of relevant image statistics. Methods: As part of this Grand Challenge, a common training dataset and an evaluation procedure was developed for benchmarking deep generative models for medical image synthesis. To create the training dataset, an established 3D virtual breast phantom was adapted. The resulting dataset comprised about 108,000 images of size 512×512. For the evaluation of submissions to the Challenge, an ensemble of 10,000 DGM-generated images from each submission was employed. The evaluation procedure consisted of two stages. In the first stage, a preliminary check for memorization and image quality (via the Fréchet Inception Distance (FID)) was performed. Submissions that passed the first stage were then evaluated for the reproducibility of image statistics corresponding to several feature families including texture, morphology, image moments, fractal statistics and skeleton statistics. A summary measure in this feature space was employed to rank the submissions. Additional analyses of submissions was performed to assess DGM performance specific to individual feature families, the four classes in the training data, and also to identify various artifacts. Results: Fifty-eight submissions from 12 unique users were received for this Challenge. Out of these 12 submissions, 9 submissions passed the first stage of evaluation and were eligible for ranking. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network, followed by another network for image superresolution. In general, we observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. Conclusions: This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.

4.

MIDRC-MetricTree: a decision tree-based tool for recommending performance metrics in artificial intelligence-assisted medical image analysis.

Drukker, Karen; Sahiner, Berkman; Hu, Tingting; Kim, Grace Hyun; Whitney, Heather M; Baughan, Natalie; Myers, Kyle J; Giger, Maryellen L; McNitt-Gray, Michael.

J Med Imaging (Bellingham) ; 11(2): 024504, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38576536

ABSTRACT

Purpose: The Medical Imaging and Data Resource Center (MIDRC) was created to facilitate medical imaging machine learning (ML) research for tasks including early detection, diagnosis, prognosis, and assessment of treatment response related to the coronavirus disease 2019 pandemic and beyond. The purpose of this work was to create a publicly available metrology resource to assist researchers in evaluating the performance of their medical image analysis ML algorithms. Approach: An interactive decision tree, called MIDRC-MetricTree, has been developed, organized by the type of task that the ML algorithm was trained to perform. The criteria for this decision tree were that (1) users can select information such as the type of task, the nature of the reference standard, and the type of the algorithm output and (2) based on the user input, recommendations are provided regarding appropriate performance evaluation approaches and metrics, including literature references and, when possible, links to publicly available software/code as well as short tutorial videos. Results: Five types of tasks were identified for the decision tree: (a) classification, (b) detection/localization, (c) segmentation, (d) time-to-event (TTE) analysis, and (e) estimation. As an example, the classification branch of the decision tree includes two-class (binary) and multiclass classification tasks and provides suggestions for methods, metrics, software/code recommendations, and literature references for situations where the algorithm produces either binary or non-binary (e.g., continuous) output and for reference standards with negligible or non-negligible variability and unreliability. Conclusions: The publicly available decision tree is a resource to assist researchers in conducting task-specific performance evaluations, including classification, detection/localization, segmentation, TTE, and estimation tasks.

5.

Sequestration of imaging studies in MIDRC: stratified sampling to balance demographic characteristics of patients in a multi-institutional data commons.

Baughan, Natalie; Whitney, Heather M; Drukker, Karen; Sahiner, Berkman; Hu, Tingting; Kim, Grace Hyun; McNitt-Gray, Michael; Myers, Kyle J; Giger, Maryellen L.

J Med Imaging (Bellingham) ; 10(6): 064501, 2023 Nov.

Article in English | MEDLINE | ID: mdl-38074627

ABSTRACT

Purpose: The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional effort to accelerate medical imaging machine intelligence research and create a publicly available image repository/commons as well as a sequestered commons for performance evaluation and benchmarking of algorithms. After de-identification, approximately 80% of the medical images and associated metadata become part of the open commons and 20% are sequestered from the open commons. To ensure that both commons are representative of the population available, we introduced a stratified sampling method to balance the demographic characteristics across the two datasets. Approach: Our method uses multi-dimensional stratified sampling where several demographic variables of interest are sequentially used to separate the data into individual strata, each representing a unique combination of variables. Within each resulting stratum, patients are assigned to the open or sequestered commons. This algorithm was used on an example dataset containing 5000 patients using the variables of race, age, sex at birth, ethnicity, COVID-19 status, and image modality and compared resulting demographic distributions to naïve random sampling of the dataset over 2000 independent trials. Results: Resulting prevalence of each demographic variable matched the prevalence from the input dataset within one standard deviation. Mann-Whitney U test results supported the hypothesis that sequestration by stratified sampling provided more balanced subsets than naïve randomization, except for demographic subcategories with very low prevalence. Conclusions: The developed multi-dimensional stratified sampling algorithm can partition a large dataset while maintaining balance across several variables, superior to the balance achieved from naïve randomization.

6.

Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center open data commons.

Whitney, Heather M; Baughan, Natalie; Myers, Kyle J; Drukker, Karen; Gichoya, Judy; Bower, Brad; Chen, Weijie; Gruszauskas, Nicholas; Kalpathy-Cramer, Jayashree; Koyejo, Sanmi; Sá, Rui C; Sahiner, Berkman; Zhang, Zi; Giger, Maryellen L.

J Med Imaging (Bellingham) ; 10(6): 61105, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37469387

ABSTRACT

Purpose: The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify longitudinal representativeness of the demographic characteristics of the primary MIDRC dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC). Approach: The Jensen-Shannon distance (JSD), a measure of similarity of two distributions, was used to longitudinally measure the representativeness of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC. The distributions were evaluated in the demographic categories of age at index, sex, race, ethnicity, and the combination of race and ethnicity. Results: Representativeness of the MIDRC data by ethnicity and the combination of race and ethnicity was impacted by the percentage of CDC case counts for which this was not reported. The distributions by sex and race have retained their level of representativeness over time. Conclusion: The representativeness of the open medical imaging datasets in the curated public data commons at MIDRC has evolved over time as the number of contributing institutions and overall number of subjects have grown. The use of metrics, such as the JSD support measurement of representativeness, is one step needed for fair and generalizable AI algorithm development.

7.

Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: results, limitations, and potential.

Lyu, Qing; Tan, Josh; Zapadka, Michael E; Ponnatapura, Janardhana; Niu, Chuang; Myers, Kyle J; Wang, Ge; Whitlow, Christopher T.

Vis Comput Ind Biomed Art ; 6(1): 9, 2023 May 18.

Article in English | MEDLINE | ID: mdl-37198498

ABSTRACT

The large language model called ChatGPT has drawn extensively attention because of its human-like expression and reasoning abilities. In this study, we investigate the feasibility of using ChatGPT in experiments on translating radiology reports into plain language for patients and healthcare providers so that they are educated for improved healthcare. Radiology reports from 62 low-dose chest computed tomography lung cancer screening scans and 76 brain magnetic resonance imaging metastases screening scans were collected in the first half of February for this study. According to the evaluation by radiologists, ChatGPT can successfully translate radiology reports into plain language with an average score of 4.27 in the five-point system with 0.08 places of information missing and 0.07 places of misinformation. In terms of the suggestions provided by ChatGPT, they are generally relevant such as keeping following-up with doctors and closely monitoring any symptoms, and for about 37% of 138 cases in total ChatGPT offers specific suggestions based on findings in the report. ChatGPT also presents some randomness in its responses with occasionally over-simplified or neglected information, which can be mitigated using a more detailed prompt. Furthermore, ChatGPT results are compared with a newly released large model GPT-4, showing that GPT-4 can significantly improve the quality of translated reports. Our results show that it is feasible to utilize large language models in clinical education, and further efforts are needed to address limitations and maximize their potential.

8.

Assessing the Ability of Generative Adversarial Networks to Learn Canonical Medical Image Statistics.

Kelkar, Varun A; Gotsis, Dimitrios S; Brooks, Frank J; Kc, Prabhat; Myers, Kyle J; Zeng, Rongping; Anastasio, Mark A.

IEEE Trans Med Imaging ; 42(6): 1799-1808, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37022374

ABSTRACT

In recent years, generative adversarial networks (GANs) have gained tremendous popularity for potential applications in medical imaging, such as medical image synthesis, restoration, reconstruction, translation, as well as objective image quality assessment. Despite the impressive progress in generating high-resolution, perceptually realistic images, it is not clear if modern GANs reliably learn the statistics that are meaningful to a downstream medical imaging application. In this work, the ability of a state-of-the-art GAN to learn the statistics of canonical stochastic image models (SIMs) that are relevant to objective assessment of image quality is investigated. It is shown that although the employed GAN successfully learned several basic first- and second-order statistics of the specific medical SIMs under consideration and generated images with high perceptual quality, it failed to correctly learn several per-image statistics pertinent to the these SIMs, highlighting the urgent need to assess medical image GANs in terms of objective measures of image quality.

9.

Discrimination tasks in simulated low-dose CT noise.

Abbey, Craig K; Samuelson, Frank W; Zeng, Rongping; Boone, John M; Myers, Kyle J; Eckstein, Miguel P.

Med Phys ; 50(7): 4151-4172, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37057360

ABSTRACT

BACKGROUND: This study reports the results of a set of discrimination experiments using simulated images that represent the appearance of subtle lesions in low-dose computed tomography (CT) of the lungs. Noise in these images has a characteristic ramp-spectrum before apodization by noise control filters. We consider three specific diagnostic features that determine whether a lesion is considered malignant or benign, two system-resolution levels, and four apodization levels for a total of 24 experimental conditions. PURPOSE: The goal of the investigation is to better understand how well human observers perform subtle discrimination tasks like these, and the mechanisms of that performance. We use a forced-choice psychophysical paradigm to estimate observer efficiency and classification images. These measures quantify how effectively subjects can read the images, and how they use images to perform discrimination tasks across the different imaging conditions. MATERIALS AND METHODS: The simulated CT images used as stimuli in the psychophysical experiments are generated from high-resolution objects passed through a modulation transfer function (MTF) before down-sampling to the image-pixel grid. Acquisition noise is then added with a ramp noise-power spectrum (NPS), with subsequent smoothing through apodization filters. The features considered are lesion size, indistinct lesion boundary, and a nonuniform lesion interior. System resolution is implemented by an MTF with resolution (10% max.) of 0.47 or 0.58 cyc/mm. Apodization is implemented by a Shepp-Logan filter (Sinc profile) with various cutoffs. Six medically naïve subjects participated in the psychophysical studies, entailing training and testing components for each condition. Training consisted of staircase procedures to find the 80% correct threshold for each subject, and testing involved 2000 psychophysical trials at the threshold value for each subject. Human-observer performance is compared to the Ideal Observer to generate estimates of task efficiency. The significance of imaging factors is assessed using ANOVA. Classification images are used to estimate the linear template weights used by subjects to perform these tasks. Classification-image spectra are used to analyze subject weights in the spatial-frequency domain. RESULTS: Overall, average observer efficiency is relatively low in these experiments (10%-40%) relative to detection and localization studies reported previously. We find significant effects for feature type and apodization level on observer efficiency. Somewhat surprisingly, system resolution is not a significant factor. Efficiency effects of the different features appear to be well explained by the profile of the linear templates in the classification images. Increasingly strong apodization is found to both increase the classification-image weights and to increase the mean-frequency of the classification-image spectra. A secondary analysis of "Unapodized" classification images shows that this is largely due to observers undoing (inverting) the effects of apodization filters. CONCLUSIONS: These studies demonstrate that human observers can be relatively inefficient at feature-discrimination tasks in ramp-spectrum noise. Observers appear to be adapting to frequency suppression implemented in apodization filters, but there are residual effects that are not explained by spatial weighting patterns. The studies also suggest that the mechanisms for improving performance through the application of noise-control filters may require further investigation.

Subject(s)

Image Processing, Computer-Assisted , Tomography, X-Ray Computed , Humans , Image Processing, Computer-Assisted/methods , Phantoms, Imaging , Algorithms

10.

Special Issue Editorial: The SPIE Medical Imaging Symposium Celebrates 50 Years.

Myers, Kyle J; Giger, Maryellen L.

J Med Imaging (Bellingham) ; 9(Suppl 1): S12200, 2022 Feb.

Article in English | MEDLINE | ID: mdl-36247334

ABSTRACT

The article introduces the JMI Special Issue Celebrating 50 Years of SPIE Medical Imaging.

11.

Fifty years of SPIE Medical Imaging proceedings papers.

Nishikawa, Robert M; Deserno, Thomas M; Madabhushi, Anant; Krupinski, Elizabeth A; Summers, Ronald M; Hoeschen, Christoph; Mello-Thoms, Claudia; Myers, Kyle J; Kupinski, Mathew A; Siewerdsen, Jeffrey H.

J Med Imaging (Bellingham) ; 9(Suppl 1): 012207, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35761820

ABSTRACT

Purpose: To commemorate the 50th anniversary of the first SPIE Medical Imaging meeting, we highlight some of the important publications published in the conference proceedings. Approach: We determined the top cited and downloaded papers. We also asked members of the editorial board of the Journal of Medical Imaging to select their favorite papers. Results: There was very little overlap between the three methods of highlighting papers. The downloads were mostly recent papers, whereas the favorite papers were mostly older papers. Conclusions: The three different methods combined provide an overview of the highlights of the papers published in the SPIE Medical Imaging conference proceedings over the last 50 years.

12.

Development of metaverse for intelligent healthcare.

Wang, Ge; Badal, Andreu; Jia, Xun; Maltz, Jonathan S; Mueller, Klaus; Myers, Kyle J; Niu, Chuang; Vannier, Michael; Yan, Pingkun; Yu, Zhou; Zeng, Rongping.

Nat Mach Intell ; 4(11): 922-929, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36935774

ABSTRACT

The metaverse integrates physical and virtual realities, enabling humans and their avatars to interact in an environment supported by technologies such as high-speed internet, virtual reality, augmented reality, mixed and extended reality, blockchain, digital twins and artificial intelligence (AI), all enriched by effectively unlimited data. The metaverse recently emerged as social media and entertainment platforms, but extension to healthcare could have a profound impact on clinical practice and human health. As a group of academic, industrial, clinical and regulatory researchers, we identify unique opportunities for metaverse approaches in the healthcare domain. A metaverse of 'medical technology and AI' (MeTAI) can facilitate the development, prototyping, evaluation, regulation, translation and refinement of AI-based medical practice, especially medical imaging-guided diagnosis and therapy. Here, we present metaverse use cases, including virtual comparative scanning, raw data sharing, augmented regulatory science and metaversed medical intervention. We discuss relevant issues on the ecosystem of the MeTAI metaverse including privacy, security and disparity. We also identify specific action items for coordinated efforts to build the MeTAI metaverse for improved healthcare quality, accessibility, cost-effectiveness and patient satisfaction.

13.

Performance of a deep learning-based CT image denoising method: Generalizability over dose, reconstruction kernel, and slice thickness.

Zeng, Rongping; Lin, Claire Yilin; Li, Qin; Jiang, Lu; Skopec, Marlene; Fessler, Jeffrey A; Myers, Kyle J.

Med Phys ; 49(2): 836-853, 2022 Feb.

Article in English | MEDLINE | ID: mdl-34954845

ABSTRACT

PURPOSE: Deep learning (DL) is rapidly finding applications in low-dose CT image denoising. While having the potential to improve the image quality (IQ) over the filtered back projection method (FBP) and produce images quickly, performance generalizability of the data-driven DL methods is not fully understood yet. The main purpose of this work is to investigate the performance generalizability of a low-dose CT image denoising neural network in data acquired under different scan conditions, particularly relating to these three parameters: reconstruction kernel, slice thickness, and dose (noise) level. A secondary goal is to identify any underlying data property associated with the CT scan settings that might help predict the generalizability of the denoising network. METHODS: We select the residual encoder-decoder convolutional neural network (REDCNN) as an example of a low-dose CT image denoising technique in this work. To study how the network generalizes on the three imaging parameters, we grouped the CT volumes in the Low-Dose Grand Challenge (LDGC) data into three pairs of training datasets according to their imaging parameters, changing only one parameter in each pair. We trained REDCNN with them to obtain six denoising models. We test each denoising model on datasets of matching and mismatching parameters with respect to its training sets regarding dose, reconstruction kernel, and slice thickness, respectively, to evaluate the denoising performance changes. Denoising performances are evaluated on patient scans, simulated phantom scans, and physical phantom scans using IQ metrics including mean-squared error (MSE), contrast-dependent modulation transfer function (MTF), pixel-level noise power spectrum (pNPS), and low-contrast lesion detectability (LCD). RESULTS: REDCNN had larger MSE when the testing data were different from the training data in reconstruction kernel, but no significant MSE difference when varying slice thickness in the testing data. REDCNN trained with quarter-dose data had slightly worse MSE in denoising higher-dose images than that trained with mixed-dose data (17%-80%). The MTF tests showed that REDCNN trained with the two reconstruction kernels and slice thicknesses yielded images of similar image resolution. However, REDCNN trained with mixed-dose data preserved the low-contrast resolution better compared to REDCNN trained with quarter-dose data. In the pNPS test, it was found that REDCNN trained with smooth-kernel data could not remove high-frequency noise in the test data of sharp kernel, possibly because the lack of high-frequency noise in the smooth-kernel data limited the ability of the trained model in removing high-frequency noise. Finally, in the LCD test, REDCNN improved the lesion detectability over the original FBP images regardless of whether the training and testing data had matching reconstruction kernels. CONCLUSIONS: REDCNN is observed to be poorly generalizable between reconstruction kernels, more robust in denoising data of arbitrary dose levels when trained with mixed-dose data, and not highly sensitive to slice thickness. It is known that reconstruction kernel affects the in-plane pNPS shape of a CT image, whereas slice thickness and dose level do not, so it is possible that the generalizability performance of this CT image denoising network highly correlates to the pNPS similarity between the testing and training data.

Subject(s)

Deep Learning , Algorithms , Humans , Image Processing, Computer-Assisted , Neural Networks, Computer , Phantoms, Imaging , Radiation Dosage , Signal-To-Noise Ratio , Tomography, X-Ray Computed

14.

Objective Task-Based Evaluation of Artificial Intelligence-Based Medical Imaging Methods:: Framework, Strategies, and Role of the Physician.

Jha, Abhinav K; Myers, Kyle J; Obuchowski, Nancy A; Liu, Ziping; Rahman, Md Ashequr; Saboury, Babak; Rahmim, Arman; Siegel, Barry A.

PET Clin ; 16(4): 493-511, 2021 Oct.

Article in English | MEDLINE | ID: mdl-34537127

ABSTRACT

Artificial intelligence-based methods are showing promise in medical imaging applications. There is substantial interest in clinical translation of these methods, requiring that they be evaluated rigorously. We lay out a framework for objective task-based evaluation of artificial intelligence methods. We provide a list of available tools to conduct this evaluation. We outline the important role of physicians in conducting these evaluation studies. The examples in this article are proposed in the context of PET scans with a focus on evaluating neural network-based methods. However, the framework is also applicable to evaluate other medical imaging modalities and other types of artificial intelligence methods.

Subject(s)

Artificial Intelligence , Physicians , Humans , Positron-Emission Tomography

15.

Special Section Guest Editorial: Evaluation Methodologies for Clinical AI.

Astley, Susan M; Chen, Weijie; Myers, Kyle J; Nishikawa, Robert M.

J Med Imaging (Bellingham) ; 7(1): 012701, 2020 Jan.

Article in English | MEDLINE | ID: mdl-32206681

ABSTRACT

The editorial introduces the Special Section on Evaluation Methodologies for Clinical AI.

16.

Computational reader design and statistical performance evaluation of an in-silico imaging clinical trial comparing digital breast tomosynthesis with full-field digital mammography.

Zeng, Rongping; Samuelson, Frank W; Sharma, Diksha; Badal, Andreu; Christian, Graff G; Glick, Stephen J; Myers, Kyle J; Badano, Aldo.

J Med Imaging (Bellingham) ; 7(4): 042802, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32118094

ABSTRACT

A recent study reported on an in-silico imaging trial that evaluated the performance of digital breast tomosynthesis (DBT) as a replacement for full-field digital mammography (FFDM) for breast cancer screening. In this in-silico trial, the whole imaging chain was simulated, including the breast phantom generation, the x-ray transport process, and computational readers for image interpretation. We focus on the design and performance characteristics of the computational reader in the above-mentioned trial. Location-known lesion (spiculated mass and clustered microcalcifications) detection tasks were used to evaluate the imaging system performance. The computational readers were designed based on the mechanism of a channelized Hotelling observer (CHO), and the reader models were selected to trend human performance. Parameters were tuned to ensure stable lesion detectability. A convolutional CHO that can adapt a round channel function to irregular lesion shapes was compared with the original CHO and was found to be suitable for detecting clustered microcalcifications but was less optimal in detecting spiculated masses. A three-dimensional CHO that operated on the multiple slices was compared with a two-dimensional (2-D) CHO that operated on three versions of 2-D slabs converted from the multiple slices and was found to be optimal in detecting lesions in DBT. Multireader multicase reader output analysis was used to analyze the performance difference between FFDM and DBT for various breast and lesion types. The results showed that DBT was more beneficial in detecting masses than detecting clustered microcalcifications compared with FFDM, consistent with the finding in a clinical imaging trial. Statistical uncertainty smaller than 0.01 standard error for the estimated performance differences was achieved with a dataset containing approximately 3000 breast phantoms. The computational reader design methodology presented provides evidence that model observers can be useful in-silico tools for supporting the performance comparison of breast imaging systems.

17.

Human observer templates for lesion discrimination tasks.

Abbey, Craig K; Samuelson, Frank W; Zeng, Rongping; Boone, John M; Eckstein, Miguel P; Myers, Kyle J.

Proc SPIE Int Soc Opt Eng ; 113162020 Feb.

Article in English | MEDLINE | ID: mdl-33384465

ABSTRACT

We investigate a series of two-alternative forced-choice (2AFC) discrimination tasks based on malignant features of abnormalities in low-dose lung CT scans. A total of 3 tasks are evaluated, and these consist of a size-discrimination task, a boundary-sharpness task, and an irregular-interior task. Target and alternative signal profiles for these tasks are modulated by one of two system transfer functions and embedded in ramp-spectrum noise that has been apodized for noise control in one of 4 different ways. This gives the resulting images statistical properties that are related to weak ground-glass lesions in axial slices of low-dose lung CT images. We investigate observer performance in these tasks using a combination of statistical efficiency and classification images. We report results of 24 2AFC experiments involving the three tasks. A staircase procedure is used to find the approximate 80% correct discrimination threshold in each task, with a subsequent set of 2,000 trials at this threshold. These data are used to estimate statistical efficiency with respect to the ideal observer for each task, and to estimate the observer template using the classification-image methodology. We find efficiency varies between the different tasks with lowest efficiency in the boundary-sharpness task, and highest efficiency in the non-uniform interior task. All three tasks produce clearly visible patterns of positive and negative weighting in the classification images. The spatial frequency plots of classification images show how apodization results in larger weights at higher spatial frequencies.

18.

Performance evaluation of computed tomography systems: Summary of AAPM Task Group 233.

Samei, Ehsan; Bakalyar, Donovan; Boedeker, Kirsten L; Brady, Samuel; Fan, Jiahua; Leng, Shuai; Myers, Kyle J; Popescu, Lucretiu M; Ramirez Giraldo, Juan Carlos; Ranallo, Frank; Solomon, Justin; Vaishnav, Jay; Wang, Jia.

Med Phys ; 46(11): e735-e756, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31408540

ABSTRACT

BACKGROUND: The rapid development and complexity of new x-ray computed tomography (CT) technologies and the need for evidence-based optimization of image quality with respect to radiation and contrast media dose call for an updated approach towards CT performance evaluation. AIMS: This report offers updated testing guidelines for testing CT systems with an enhanced focus on the operational performance including iterative reconstructions and automatic exposure control (AEC) techniques. MATERIALS AND METHODS: The report was developed based on a comprehensive review of best methods and practices in the scientific literature. The detailed methods include the assessment of 1) CT noise (magnitude, texture, nonuniformity, inhomogeneity), 2) resolution (task transfer function under varying conditions and its scalar reflections), 3) task-based performance (detectability, estimability), and 4) AEC performance (spatial, noise, and mA concordance of attenuation and exposure modulation). The methods include varying reconstruction and tube current modulation conditions, standardized testing protocols, and standardized quantities and metrology to facilitate tracking, benchmarking, and quantitative comparisons. RESULTS: The methods, implemented in cited publications, are robust to provide a representative reflection of CT system performance as used operationally in a clinical facility. The methods include recommendations for phantoms and phantom image analysis. DISCUSSION: In line with the current professional trajectory of the field toward quantitation and operational engagement, the stated methods offer quantitation that is more predictive of clinical performance than specification-based approaches. They can pave the way to approach performance testing of new CT systems not only in terms of acceptance testing (i.e., verifying a device meets predefined specifications), but also system commissioning (i.e., determining how the system can be used most effectively in clinical practice). CONCLUSION: We offer a set of common testing procedures that can be utilized towards the optimal clinical utilization of CT imaging devices, benchmarking across varying systems and times, and a basis to develop future performance-based criteria for CT imaging.

Subject(s)

Societies, Medical , Tomography, X-Ray Computed/methods , Contrast Media , Guidelines as Topic , Image Processing, Computer-Assisted , Quality Control , Radiation Dosage , Tomography, X-Ray Computed/instrumentation , Tomography, X-Ray Computed/standards

19.

Impact of prevalence and case distribution in lab-based diagnostic imaging studies.

Gallas, Brandon D; Chen, Weijie; Cole, Elodia; Ochs, Robert; Petrick, Nicholas; Pisano, Etta D; Sahiner, Berkman; Samuelson, Frank W; Myers, Kyle J.

J Med Imaging (Bellingham) ; 6(1): 015501, 2019 Jan.

Article in English | MEDLINE | ID: mdl-30713851

ABSTRACT

We investigated effects of prevalence and case distribution on radiologist diagnostic performance as measured by area under the receiver operating characteristic curve (AUC) and sensitivity-specificity in lab-based reader studies evaluating imaging devices. Our retrospective reader studies compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with dense breasts. Mammograms were acquired from the prospective Digital Mammographic Imaging Screening Trial. We performed five reader studies that differed in terms of cancer prevalence and the distribution of noncancers. Twenty radiologists participated in each reader study. Using split-plot study designs, we collected recall decisions and multilevel scores from the radiologists for calculating sensitivity, specificity, and AUC. Differences in reader-averaged AUCs slightly favored SFM over FFDM (biggest AUC difference: 0.047, SE = 0.023 , p = 0.047 ), where standard error accounts for reader and case variability. The differences were not significant at a level of 0.01 (0.05/5 reader studies). The differences in sensitivities and specificities were also indeterminate. Prevalence had little effect on AUC (largest difference: 0.02), whereas sensitivity increased and specificity decreased as prevalence increased. We found that AUC is robust to changes in prevalence, while radiologists were more aggressive with recall decisions as prevalence increased.

20.

A data-efficient method for local noise power spectrum (NPS) estimation in FDK-reconstructed 3D cone-beam CT.

Zeng, Rongping; Torkaman, Mahsa; Ning, Holly; Zhuge, Ying; Miller, Robert; Myers, Kyle J.

Med Phys ; 46(4): 1634-1647, 2019 Apr.

Article in English | MEDLINE | ID: mdl-30723944

ABSTRACT

PURPOSE: For computed tomography (CT) systems in which noise is nonstationary, a local noise power spectrum (NPS) is often needed to characterize its noise property. We have previously developed a data-efficient radial NPS method to estimate the two-dimensional (2D) local NPS for filtered back projection (FBP)-reconstructed fan-beam CT utilizing the polar separability of CT NPS. In this work, we extend this method to estimate three-dimensional (3D) local NPS for feldkamp-davis-kress (FDK)-reconstructed cone-beam CT (CBCT) volumes. METHODS: Starting from the 2D polar separability, we analyze the CBCT geometry and FDK image reconstruction process to derive the 3D expression of the polar separability for CBCT local NPS. With the polar separability, the 3D local NPS of CBCT can be decomposed into a 2D radial NPS shape function and a one-dimensional (1D) angular amplitude function with certain geometrical transforms. The 2D radial NPS shape function is a global function characterizing the noise correlation structure, while the 1D angular amplitude function is a local function reflecting the varying local noise amplitudes. The 3D radial local NPS method is constructed from the polar separability. We evaluate the accuracy of the 3D radial local NPS method using simulated and real CBCT data by comparing the radial local NPS estimates to a reference local NPS in terms of normalized mean squared error (NMSE) and a task-based performance metric (lesion detectability). RESULTS: In both simulated and physical CBCT examples, a very small NMSE (<5%) was achieved by the radial local NPS method from as few as two scans, while for the traditional local NPS method, about 20 scans were needed to reach this accuracy. The results also showed that the detectability-based system performances computed using the local NPS estimated with the NPS method developed in this work from two scans closely reflected the actual system performance. CONCLUSIONS: The polar separability greatly reduces the data dimensionality of the 3D CBCT local NPS. The radial local NPS method developed based on this property is shown to be capable of estimating the 3D local NPS from only two CBCT scans with acceptable accuracy. The minimum data requirement indicates the potential utility of local NPS in CBCT applications even for clinical situations.

Subject(s)

Algorithms , Cone-Beam Computed Tomography/methods , Four-Dimensional Computed Tomography/methods , Image Processing, Computer-Assisted/methods , Lung Neoplasms/diagnostic imaging , Phantoms, Imaging , Humans , Signal-To-Noise Ratio

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL