Results 1 - 20 of 89
1.
ArXiv ; 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38699170

ABSTRACT

Importance: The efficacy of lung cancer screening can be significantly impacted by the imaging modality used. This Virtual Lung Screening Trial (VLST) addresses the critical need for precision in lung cancer diagnostics and the potential for reducing unnecessary radiation exposure in clinical settings. Objectives: To establish a virtual imaging trial (VIT) platform that accurately simulates real-world lung screening trials (LSTs) to assess the diagnostic accuracy of CT and CXR modalities. Design, Setting, and Participants: Utilizing computational models and machine learning algorithms, we created a diverse virtual patient population. The cohort, designed to mirror real-world demographics, was assessed using virtual imaging techniques that reflect historical imaging technologies. Main Outcomes and Measures: The primary outcome was the difference in the Area Under the Curve (AUC) for CT and CXR modalities across lesion types and sizes. Results: The study analyzed 298 CT and 313 CXR simulated images from 313 virtual patients, with a lesion-level AUC of 0.81 (95% CI: 0.78-0.84) for CT and 0.55 (95% CI: 0.53-0.56) for CXR. At the patient level, CT demonstrated an AUC of 0.85 (95% CI: 0.80-0.89), compared to 0.53 (95% CI: 0.47-0.60) for CXR. Subgroup analyses indicated CT's superior performance in detecting homogeneous lesions (AUC of 0.97 for lesion-level) and heterogeneous lesions (AUC of 0.71 for lesion-level) as well as in identifying larger nodules (AUC of 0.98 for nodules > 8 mm). Conclusion and Relevance: The VIT platform validated the superior diagnostic accuracy of CT over CXR, especially for smaller nodules, underscoring its potential to replicate real clinical imaging trials. These findings advocate for the integration of virtual trials in the evaluation and improvement of imaging-based diagnostic tools.
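The primary outcome above is a difference in AUC between two modalities. As a rough illustration of what that quantity measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count one half). The function name `auc` and all scores are hypothetical; they are not data or code from the trial.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: P(positive score > negative score), ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical reader scores for lesion-present and lesion-absent cases.
ct_auc = auc([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2, 0.1])
cxr_auc = auc([0.6, 0.4, 0.5, 0.2], [0.5, 0.3, 0.6, 0.1])
print("AUC difference (CT - CXR):", ct_auc - cxr_auc)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a few hundred simulated images; confidence intervals such as those quoted in the abstract would need a separate procedure (for example bootstrapping), which is not shown here.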

2.
Radiology ; 311(2): e232286, 2024 May.
Article in English | MEDLINE | ID: mdl-38771177

ABSTRACT

Background Artificial intelligence (AI) is increasingly used to manage radiologists' workloads. The impact of patient characteristics on AI performance has not been well studied. Purpose To understand the impact of patient characteristics (race and ethnicity, age, and breast density) on the performance of an AI algorithm interpreting negative screening digital breast tomosynthesis (DBT) examinations. Materials and Methods This retrospective cohort study identified negative screening DBT examinations from an academic institution from January 1, 2016, to December 31, 2019. All examinations had 2 years of follow-up without a diagnosis of atypia or breast malignancy and were therefore considered true negatives. A subset of unique patients was randomly selected to provide a broad distribution of race and ethnicity. DBT studies in this final cohort were interpreted by a U.S. Food and Drug Administration-approved AI algorithm, which generated case scores (malignancy certainty) and risk scores (1-year subsequent malignancy risk) for each mammogram. Positive examinations were classified based on vendor-provided thresholds for both scores. Multivariable logistic regression was used to understand relationships between the scores and patient characteristics. Results A total of 4855 patients (median age, 54 years [IQR, 46-63 years]) were included: 27% (1316 of 4855) White, 26% (1261 of 4855) Black, 28% (1351 of 4855) Asian, and 19% (927 of 4855) Hispanic patients. False-positive case scores were significantly more likely in Black patients (odds ratio [OR] = 1.5 [95% CI: 1.2, 1.8]) and less likely in Asian patients (OR = 0.7 [95% CI: 0.5, 0.9]) compared with White patients, and more likely in older patients (71-80 years; OR = 1.9 [95% CI: 1.5, 2.5]) and less likely in younger patients (41-50 years; OR = 0.6 [95% CI: 0.5, 0.7]) compared with patients aged 51-60 years. 
False-positive risk scores were more likely in Black patients (OR = 1.5 [95% CI: 1.0, 2.0]), patients aged 61-70 years (OR = 3.5 [95% CI: 2.4, 5.1]), and patients with extremely dense breasts (OR = 2.8 [95% CI: 1.3, 5.8]) compared with White patients, patients aged 51-60 years, and patients with fatty density breasts, respectively. Conclusion Patient characteristics influenced the case and risk scores of a Food and Drug Administration-approved AI algorithm analyzing negative screening DBT examinations. © RSNA, 2024.


Subject(s)
Algorithms , Artificial Intelligence , Breast Neoplasms , Mammography , Humans , Female , Middle Aged , Retrospective Studies , Mammography/methods , Breast Neoplasms/diagnostic imaging , Breast/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Aged , Adult , Breast Density
3.
J Imaging Inform Med ; 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38587767

ABSTRACT

De-identification of DICOM images is an essential component of medical image research. While many established methods exist for the safe removal of protected health information (PHI) in DICOM metadata, approaches for the removal of PHI "burned-in" to image pixel data are typically manual, and automated high-throughput approaches are not well validated. Emerging optical character recognition (OCR) models can potentially detect and remove PHI-bearing text from medical images but are very time-consuming to run on the high volume of images found in typical research studies. We present a data processing method that performs metadata de-identification for all images combined with a targeted approach to only apply OCR to images with a high likelihood of burned-in text. The method was validated on a dataset of 415,182 images across ten modalities representative of the de-identification requests submitted at our institution over a 20-year span. Of the 12,578 images in this dataset with burned-in text of any kind, only 10 passed undetected with the method. OCR was only required for 6050 images (1.5% of the dataset).
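The percentages in this abstract follow directly from the reported counts. The short check below reproduces them; the variable names are illustrative, and the detection rate is a derived figure that the abstract does not state explicitly.

```python
# Counts as reported in the abstract.
total_images = 415_182      # images in the validation dataset
images_with_text = 12_578   # images containing burned-in text of any kind
missed_images = 10          # images with burned-in text that passed undetected
ocr_images = 6_050          # images that actually required OCR

# Share of the dataset that needed the expensive OCR step.
ocr_fraction = ocr_images / total_images

# Derived figure (not stated in the abstract): share of text-bearing images caught.
detection_rate = (images_with_text - missed_images) / images_with_text

print(f"OCR applied to {ocr_fraction:.1%} of images")
print(f"Burned-in text detected in {detection_rate:.2%} of text-bearing images")
```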

4.
Cancer Imaging ; 24(1): 48, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38576031

ABSTRACT

BACKGROUND: Ductal Carcinoma In Situ (DCIS) can progress to invasive breast cancer, but most DCIS lesions never will. Therefore, four clinical trials (COMET, LORIS, LORETTA, and LORD) test whether active surveillance for women with low-risk DCIS is safe (E. S. Hwang et al., BMJ Open, 9: e026797, 2019; A. Francis et al., Eur J Cancer, 51: 2296-2303, 2015; Chizuko Kanbayashi et al., The international collaboration of active surveillance trials for low-risk DCIS (LORIS, LORD, COMET, LORETTA); L. E. Elshof et al., Eur J Cancer, 51: 1497-510, 2015). Low-risk is defined as grade I or II DCIS. Because DCIS grade is a major eligibility criterion in these trials, it would be very helpful to assess DCIS grade on mammography, informed by grade assessed on DCIS histopathology in pre-surgery biopsies, since surgery will not be performed on a significant number of patients participating in these trials. OBJECTIVE: To assess the performance and clinical utility of a convolutional neural network (CNN) in discriminating high-risk (grade III) DCIS and/or Invasive Breast Cancer (IBC) from low-risk (grade I/II) DCIS based on mammographic features. We explored whether the CNN could be used as a decision support tool for excluding high-risk patients from active surveillance. METHODS: In this single-centre retrospective study, 464 patients diagnosed with DCIS based on pre-surgery biopsy between 2000 and 2014 were included. The collection of mammography images was partitioned at the patient level into two subsets, one for training containing 80% of cases (371 cases, 681 images) and one for testing containing 20% (93 cases, 173 images). A deep learning model based on the U-Net CNN was trained and validated on 681 two-dimensional mammograms. Classification performance was assessed on the test set with the area under the receiver operating characteristic curve (AUC) and predictive values for distinguishing high-risk DCIS, and high-risk DCIS and/or IBC, from low-risk DCIS.
RESULTS: When classifying DCIS as high-risk, the deep learning network achieved a Positive Predictive Value (PPV) of 0.40, a Negative Predictive Value (NPV) of 0.91 and an AUC of 0.72 on the test dataset. For distinguishing high-risk and/or upstaged DCIS (occult invasive breast cancer) from low-risk DCIS, a PPV of 0.80, an NPV of 0.84 and an AUC of 0.76 were achieved. CONCLUSION: For both scenarios (DCIS grade I/II vs. III, DCIS grade I/II vs. III and/or IBC) AUCs were high (0.72 and 0.76, respectively); we conclude that our convolutional neural network can discriminate low-grade from high-grade DCIS.


Subject(s)
Breast Neoplasms , Carcinoma, Ductal, Breast , Carcinoma, Intraductal, Noninfiltrating , Deep Learning , Humans , Female , Carcinoma, Intraductal, Noninfiltrating/diagnostic imaging , Carcinoma, Intraductal, Noninfiltrating/pathology , Retrospective Studies , Patient Participation , Watchful Waiting , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology , Mammography , Carcinoma, Ductal, Breast/diagnosis , Carcinoma, Ductal, Breast/pathology , Carcinoma, Ductal, Breast/surgery
5.
PLoS One ; 19(2): e0282402, 2024.
Article in English | MEDLINE | ID: mdl-38324545

ABSTRACT

OBJECTIVES: To assess the performance bias caused by sampling data into training and test sets in a mammography radiomics study. METHODS: Mammograms from 700 women were used to study upstaging of ductal carcinoma in situ. The dataset was repeatedly shuffled and split into training (n = 400) and test cases (n = 300) forty times. For each split, cross-validation was used for training, followed by an assessment of the test set. Logistic regression with regularization and support vector machine were used as the machine learning classifiers. For each split and classifier type, multiple models were created based on radiomics and/or clinical features. RESULTS: Area under the curve (AUC) performances varied considerably across the different data splits (e.g., radiomics regression model: train 0.58-0.70, test 0.59-0.73). Performances for regression models showed a tradeoff where better training led to worse testing and vice versa. Cross-validation over all cases reduced this variability, but required samples of 500+ cases to yield representative estimates of performance. CONCLUSIONS: In medical imaging, clinical datasets are often limited to relatively small size. Models built from different training sets may not be representative of the whole dataset. Depending on the selected data split and model, performance bias could lead to inappropriate conclusions that might influence the clinical significance of the findings. ADVANCES IN KNOWLEDGE: Performance bias can result from model testing when using limited datasets. Optimal strategies for test set selection should be developed to ensure study conclusions are appropriate.
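The resampling scheme described in the methods (shuffle 700 cases, split into 400 training and 300 test cases, repeat forty times) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `repeated_splits`, the integer case identifiers, and the seed are all hypothetical, and model fitting is left out entirely.

```python
import random


def repeated_splits(case_ids, n_train, n_test, n_repeats, seed=0):
    """Yield (train, test) lists from independent shuffles of the same cases."""
    case_ids = list(case_ids)
    if n_train + n_test > len(case_ids):
        raise ValueError("n_train + n_test exceeds the number of cases")
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for _ in range(n_repeats):
        shuffled = case_ids[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:n_train + n_test]


# Sizes taken from the abstract; case identifiers are placeholders.
splits = list(repeated_splits(range(700), n_train=400, n_test=300, n_repeats=40))
print(len(splits), "splits; first test set starts with", splits[0][1][:5])
```

In a study like the one summarized here, a model would be trained and evaluated once per split, and the spread of the resulting test AUCs (0.59-0.73 for the radiomics regression model in the abstract) is what exposes the dependence on the chosen split.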


Subject(s)
Machine Learning , Mammography , Humans , Female , Retrospective Studies
6.
IEEE Trans Med Imaging ; 42(10): 3080-3090, 2023 10.
Article in English | MEDLINE | ID: mdl-37227903

ABSTRACT

Computer-aided detection (CAD) frameworks for breast cancer screening have been researched for several decades. Early adoption of deep-learning models in CAD frameworks has shown greatly improved detection performance compared to traditional CAD on single-view images. Recently, studies have improved performance by merging information from multiple views within each screening exam. Clinically, the integration of lesion correspondence during screening is a complicated decision process that depends on the correct execution of several referencing steps. However, most multi-view CAD frameworks are deep-learning-based black-box techniques. Fully end-to-end designs make it very difficult to analyze model behaviors and fine-tune performance. More importantly, the black-box nature of the techniques discourages clinical adoption due to the lack of explicit reasoning for each multi-view referencing step. Therefore, there is a need for a multi-view detection framework that can not only detect cancers accurately but also provide step-by-step, multi-view reasoning. In this work, we present Ipsilateral-Matching-Refinement Networks (IMR-Net) for digital breast tomosynthesis (DBT) lesion detection across multiple views. Our proposed framework adaptively refines the single-view detection scores based on explicit ipsilateral lesion matching. IMR-Net is built on a robust, single-view detection CAD pipeline with a commercial development DBT dataset of 24675 DBT volumetric views from 8034 exams. Performance is measured using location-based, case-level receiver operating characteristic (ROC) and case-level free-response ROC (FROC) analysis.


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology , Mammography/methods , ROC Curve , Early Detection of Cancer , Radiographic Image Interpretation, Computer-Assisted/methods
7.
medRxiv ; 2023 Feb 23.
Article in English | MEDLINE | ID: mdl-36865183

ABSTRACT

Objectives: To assess the performance bias caused by sampling data into training and test sets in a mammography radiomics study. Methods: Mammograms from 700 women were used to study upstaging of ductal carcinoma in situ. The dataset was repeatedly shuffled and split into training (n=400) and test cases (n=300) forty times. For each split, cross-validation was used for training, followed by an assessment of the test set. Logistic regression with regularization and support vector machine were used as the machine learning classifiers. For each split and classifier type, multiple models were created based on radiomics and/or clinical features. Results: Area under the curve (AUC) performances varied considerably across the different data splits (e.g., radiomics regression model: train 0.58-0.70, test 0.59-0.73). Performances for regression models showed a tradeoff where better training led to worse testing and vice versa. Cross-validation over all cases reduced this variability, but required samples of 500+ cases to yield representative estimates of performance. Conclusions: In medical imaging, clinical datasets are often limited to relatively small size. Models built from different training sets may not be representative of the whole dataset. Depending on the selected data split and model, performance bias could lead to inappropriate conclusions that might influence the clinical significance of the findings. Optimal strategies for test set selection should be developed to ensure study conclusions are appropriate.

8.
Acad Radiol ; 30(6): 1141-1147, 2023 06.
Article in English | MEDLINE | ID: mdl-35909050

ABSTRACT

RATIONALE AND OBJECTIVES: Adoption of the Prostate Imaging Reporting & Data System (PI-RADS) has been shown to increase detection of clinically significant prostate cancer on prostate mpMRI. We propose that a rule-based algorithm based on Regular Expression (RegEx) matching can be used to automatically categorize prostate mpMRI reports into categories as a means by which to assess for opportunities for quality improvement. MATERIALS AND METHODS: All prostate mpMRIs performed in the Duke University Health System from January 2, 2015, to January 29, 2021, were analyzed. Exclusion criteria were applied, for a total of 5343 male patients and 6264 prostate mpMRI reports. These reports were then analyzed by our RegEx algorithm to be categorized as PI-RADS 1 through PI-RADS 5, Recurrent Disease, or "No Information Available." A stratified, random sample of 502 mpMRI reports was reviewed by a blinded clinical team to assess performance of the RegEx algorithm. RESULTS: Compared to manual review, the RegEx algorithm achieved overall accuracy of 92.6%, average precision of 88.8%, average recall of 85.6%, and F1 score of 0.871. The clinical team also reviewed 344 cases that were classified as "No Information Available," and found that in 150 instances, no numerical PI-RADS score for any lesion was included in the impression section of the mpMRI report. CONCLUSION: Rule-based processing is an accurate method for the large-scale, automated extraction of PI-RADS scores from the text of radiology reports. These natural language processing approaches can be used for future initiatives in quality improvement in prostate mpMRI reporting with PI-RADS.
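A rule-based categorizer of the kind described above can be sketched with a single regular expression. The pattern, the function name `classify_report`, and the choice to report the highest score found are assumptions made for illustration; the abstract does not give the authors' actual rules, and the "Recurrent Disease" category is not handled here.

```python
import re

# Hypothetical pattern: "PI-RADS", an optional version tag such as "v2.1",
# an optional label word, then a single score digit from 1 to 5.
PIRADS_PATTERN = re.compile(
    r"PI[-\s]?RADS"                        # PI-RADS, PI RADS, or PIRADS
    r"(?:\s*v(?:ersion)?\s*\d(?:\.\d)?)?"  # optional version, e.g. v2 or v2.1
    r"\s*(?:category|score|assessment)?"   # optional label word
    r"\s*[:=]?\s*"                         # optional separator
    r"([1-5])(?!\.?\d)",                   # score digit, not part of a longer number
    re.IGNORECASE,
)


def classify_report(text):
    """Return 'PI-RADS n' for the highest score found, else a fallback label."""
    scores = [int(match.group(1)) for match in PIRADS_PATTERN.finditer(text)]
    if not scores:
        return "No Information Available"
    return f"PI-RADS {max(scores)}"


print(classify_report("Impression: PI-RADS 4 lesion in the left peripheral zone."))
print(classify_report("No suspicious lesion identified."))
```

Taking the maximum over all matches is one plausible way to reduce a multi-lesion report to a single category. A production rule set would also need to handle negation, ranges such as "PI-RADS 3-4", and free-text variants, which is consistent with the abstract's finding that many unclassified reports contained no numerical score in the impression section.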


Subject(s)
Multiparametric Magnetic Resonance Imaging , Prostatic Neoplasms , Humans , Male , Prostate/pathology , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Magnetic Resonance Imaging/methods , Algorithms , Retrospective Studies , Image-Guided Biopsy/methods
9.
BMC Med Inform Decis Mak ; 22(1): 102, 2022 04 15.
Article in English | MEDLINE | ID: mdl-35428335

ABSTRACT

BACKGROUND: There is progress to be made in building artificially intelligent systems to detect abnormalities that are not only accurate but can handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states thereby mitigating the need for human annotation. METHODS: We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. Alternative effects on disease classification performance were evaluated using random initialization or pre-trained embedding as well as different sizes of training datasets. The RBA was tested on a subset of 2158 manually labeled reports and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by RBA and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method. RESULTS: Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. 
As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems. CONCLUSIONS: Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers.


Subject(s)
Deep Learning , Abdomen , Humans , Neural Networks, Computer , Pelvis/diagnostic imaging , Tomography, X-Ray Computed
10.
Radiol Artif Intell ; 4(1): e210026, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35146433

ABSTRACT

PURPOSE: To design multidisease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports. MATERIALS AND METHODS: This retrospective study included a total of 12 092 patients (mean age, 57 years ± 18 [standard deviation]; 6172 women) for model development and testing. Rule-based algorithms were used to extract 19 225 disease labels from 13 667 body CT scans performed between 2012 and 2017. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura, liver and gallbladder, and kidneys and ureters. For each organ system, a three-dimensional convolutional neural network classified each as no apparent disease or for the presence of four common diseases, for a total of 15 different labels across all three models. Testing was performed on a subset of 2158 CT volumes relative to 2875 manually derived reference labels from 2133 patients (mean age, 58 years ± 18; 1079 women). Performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method. RESULTS: Manual validation of the extracted labels confirmed 91%-99% accuracy across the 15 different labels. AUCs for lungs and pleura labels were as follows: atelectasis, 0.77 (95% CI: 0.74, 0.81); nodule, 0.65 (95% CI: 0.61, 0.69); emphysema, 0.89 (95% CI: 0.86, 0.92); effusion, 0.97 (95% CI: 0.96, 0.98); and no apparent disease, 0.89 (95% CI: 0.87, 0.91). AUCs for liver and gallbladder were as follows: hepatobiliary calcification, 0.62 (95% CI: 0.56, 0.67); lesion, 0.73 (95% CI: 0.69, 0.77); dilation, 0.87 (95% CI: 0.84, 0.90); fatty, 0.89 (95% CI: 0.86, 0.92); and no apparent disease, 0.82 (95% CI: 0.78, 0.85). AUCs for kidneys and ureters were as follows: stone, 0.83 (95% CI: 0.79, 0.87); atrophy, 0.92 (95% CI: 0.89, 0.94); lesion, 0.68 (95% CI: 0.64, 0.72); cyst, 0.70 (95% CI: 0.66, 0.73); and no apparent disease, 0.79 (95% CI: 0.75, 0.83). 
CONCLUSION: Weakly supervised deep learning models were able to classify diverse diseases in multiple organ systems from CT scans. Keywords: CT, Diagnosis/Classification/Application Domain, Semisupervised Learning, Whole-Body Imaging. © RSNA, 2022.

11.
Med Phys ; 49(4): 2582-2589, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35191035

ABSTRACT

PURPOSE: The purpose of this work was to characterize and improve the ability of fused filament fabrication to create anthropomorphic physical phantoms for CT research. Specifically, we sought to develop the ability to create multiple levels of X-ray attenuation with a single material. METHODS: CT images of 3D printed cylinders with different infill angles and printing patterns were assessed by comparing their 2D noise power spectra to determine the conditions that produced minimal and uniform noise. A backfilling approach in which additional polymer was extruded into an existing 3D printed background layer was developed to create multiple levels of image contrast. RESULTS: A print with nine infill angles and a rectilinear infill pattern was found to have the best uniformity, but the printed objects were not as uniform as a commercial phantom. An HU dynamic range of 600 was achieved by changing the infill percentage from 40% to 100%. The backfilling technique enabled control of up to eight levels of contrast within one object across a range of 200 HU, similar to the range of soft tissue. A contrast detail phantom with six levels of contrast and an anthropomorphic liver phantom with four levels of contrast were printed with a single material. CONCLUSION: This work improves the uniformity and levels of contrast that can be achieved with fused filament fabrication, thereby enabling researchers to easily create more detailed physical phantoms, including realistic, anthropomorphic textures.


Subject(s)
Printing, Three-Dimensional , Tomography, X-Ray Computed , Abdomen , Phantoms, Imaging , Tomography, X-Ray Computed/methods
12.
IEEE J Biomed Health Inform ; 26(1): 478, 2022 01.
Article in English | MEDLINE | ID: mdl-35038291

ABSTRACT

In [1], the dose estimation accuracy using the alternative baseline method under modulated tube current was not correctly calculated due to an unintentional simulation error.

13.
Radiology ; 303(1): 54-62, 2022 04.
Article in English | MEDLINE | ID: mdl-34981975

ABSTRACT

Background Improving diagnosis of ductal carcinoma in situ (DCIS) before surgery is important in choosing optimal patient management strategies. However, patients may harbor occult invasive disease not detected until definitive surgery. Purpose To assess the performance and clinical utility of mammographic radiomic features in the prediction of occult invasive cancer among women diagnosed with DCIS on the basis of core biopsy findings. Materials and Methods In this Health Insurance Portability and Accountability Act-compliant retrospective study, digital magnification mammographic images were collected from women who underwent breast core-needle biopsy for calcifications that was performed at a single institution between September 2008 and April 2017 and yielded a diagnosis of DCIS. The database query was directed at asymptomatic women with calcifications without a mass, architectural distortion, asymmetric density, or palpable disease. Logistic regression with regularization was used. Differences across training and internal test set by upstaging rate, age, lesion size, and estrogen and progesterone receptor status were assessed by using the Kruskal-Wallis or χ2 test. Results The study consisted of 700 women with DCIS (age range, 40-89 years; mean age, 59 years ± 10 [standard deviation]), including 114 with lesions (16.3%) upstaged to invasive cancer at subsequent surgery. The sample was split randomly into 400 women for the training set and 300 for the testing set (mean ages: training set, 59 years ± 10; test set, 59 years ± 10; P = .85). A total of 109 radiomic and four clinical features were extracted. The best model on the test set by using all radiomic and clinical features helped predict upstaging with an area under the receiver operating characteristic curve of 0.71 (95% CI: 0.62, 0.79). For a fixed high sensitivity (90%), the model yielded a specificity of 22%, a negative predictive value of 92%, and an odds ratio of 2.4 (95% CI: 1.8, 3.2). 
High specificity (90%) corresponded to a sensitivity of 37%, positive predictive value of 41%, and odds ratio of 5.0 (95% CI: 2.8, 9.0). Conclusion Machine learning models that use radiomic features applied to mammographic calcifications may help predict upstaging of ductal carcinoma in situ, which can refine clinical decision making and treatment planning. © RSNA, 2022.


Subject(s)
Breast Neoplasms , Calcinosis , Carcinoma in Situ , Carcinoma, Ductal, Breast , Carcinoma, Intraductal, Noninfiltrating , Adult , Aged , Aged, 80 and over , Breast Neoplasms/diagnostic imaging , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/diagnostic imaging , Carcinoma, Intraductal, Noninfiltrating/pathology , Female , Humans , Male , Mammography , Middle Aged , Retrospective Studies
14.
IEEE Trans Biomed Eng ; 69(5): 1639-1650, 2022 05.
Article in English | MEDLINE | ID: mdl-34788216

ABSTRACT

In mammography, calcifications are one of the most common signs of breast cancer. Detection of such lesions is an active area of research for computer-aided diagnosis and machine learning algorithms. Due to limited numbers of positive cases, many supervised detection models suffer from overfitting and fail to generalize. We present a one-class, semi-supervised framework using a deep convolutional autoencoder trained with over 50,000 images from 11,000 negative-only cases. Since the model learned from only normal breast parenchymal features, calcifications produced large signals when comparing the residuals between input and reconstruction output images. As a key advancement, a structural dissimilarity index was used to suppress non-structural noises. Our selected model achieved pixel-based AUROC of 0.959 and AUPRC of 0.676 during validation, where calcification masks were defined in a semi-automated process. Although not trained directly on any cancers, detection performance of calcification lesions on 1,883 testing images (645 malignant and 1238 negative) achieved 75% sensitivity at 2.5 false positives per image. Performance plateaued early when trained with only a fraction of the cases, and greater model complexity or a larger dataset did not improve performance. This study demonstrates the potential of this anomaly detection approach to detect mammographic calcifications in a semi-supervised manner with efficient use of a small number of labeled images, and may facilitate new clinical applications such as computer-aided triage and quality improvement.


Subject(s)
Breast Neoplasms , Calcinosis , Breast Neoplasms/diagnostic imaging , Calcinosis/diagnostic imaging , Diagnosis, Computer-Assisted , Female , Humans , Machine Learning , Mammography/methods
15.
JAMA Netw Open ; 4(8): e2119100, 2021 08 02.
Article in English | MEDLINE | ID: mdl-34398205

ABSTRACT

Importance: Breast cancer screening is among the most common radiological tasks, with more than 39 million examinations performed each year. While it has been among the most studied medical imaging applications of artificial intelligence, the development and evaluation of algorithms are hindered by the lack of well-annotated, large-scale publicly available data sets. Objectives: To curate, annotate, and make publicly available a large-scale data set of digital breast tomosynthesis (DBT) images to facilitate the development and evaluation of artificial intelligence algorithms for breast cancer screening; to develop a baseline deep learning model for breast cancer detection; and to test this model using the data set to serve as a baseline for future research. Design, Setting, and Participants: In this diagnostic study, 16 802 DBT examinations with at least 1 reconstruction view available, performed between August 26, 2014, and January 29, 2018, were obtained from Duke Health System and analyzed. From the initial cohort, examinations were divided into 4 groups and split into training and test sets for the development and evaluation of a deep learning model. Images with foreign objects or spot compression views were excluded. Data analysis was conducted from January 2018 to October 2020. Exposures: Screening DBT. Main Outcomes and Measures: The detection algorithm was evaluated with breast-based free-response receiver operating characteristic curve and sensitivity at 2 false positives per volume. Results: The curated data set contained 22 032 reconstructed DBT volumes that belonged to 5610 studies from 5060 patients with a mean (SD) age of 55 (11) years and 5059 (100.0%) women. This included 4 groups of studies: (1) 5129 (91.4%) normal studies; (2) 280 (5.0%) actionable studies, for which additional imaging was needed but no biopsy was performed; (3) 112 (2.0%) benign biopsied studies; and (4) 89 studies (1.6%) with cancer. 
Our data set included masses and architectural distortions that were annotated by 2 experienced radiologists. Our deep learning model reached breast-based sensitivity of 65% (39 of 60; 95% CI, 56%-74%) at 2 false positives per DBT volume on a test set of 460 examinations from 418 patients. Conclusions and Relevance: The large, diverse, and curated data set presented in this study could facilitate the development and evaluation of artificial intelligence algorithms for breast cancer screening by providing data for training as well as a common set of cases for model validation. The performance of the model developed in this study showed that the task remains challenging; its performance could serve as a baseline for future model development.


Subject(s)
Breast Neoplasms/diagnosis , Datasets as Topic , Deep Learning , Early Detection of Cancer/methods , Mammography , Aged , Breast/diagnostic imaging , False Positive Reactions , Female , Humans , Middle Aged , ROC Curve , Reproducibility of Results
16.
Life (Basel) ; 11(8), 2021 Jul 26.
Article in English | MEDLINE | ID: mdl-34440490

ABSTRACT

BACKGROUND: Handling the large breast deformations caused by differences in patient posture between imaging examinations plays a vital role in multimodal medical image registration for artificial intelligence (AI) initiatives. How to build a breast biomechanical model simulating the large-scale deformation of soft tissue remains a challenge but is highly desirable. METHODS: This study proposed a hybrid individual-specific registration model of the breast combining finite element analysis, property optimization, and affine transformation to register breast images. During the registration process, the mechanical properties of the breast tissues were individually assigned using an optimization process, which allowed the model to become patient specific. EVALUATION AND RESULTS: The proposed method has been extensively tested on two datasets collected from two independent institutions, one from America and another from Hong Kong. CONCLUSIONS: Our method can accurately predict the deformation of breasts from the supine to prone position for both the Hong Kong and American samples, with a small target registration error of lesions.

17.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34117742

ABSTRACT

Most tissue collections of neoplasms are composed of formalin-fixed and paraffin-embedded (FFPE) excised tumor samples used for routine diagnostics. DNA sequencing is becoming increasingly important in cancer research and clinical management; however, it is difficult to accurately sequence DNA from FFPE samples. We developed and validated a new bioinformatic pipeline to use existing variant-calling strategies to robustly identify somatic single nucleotide variants (SNVs) from whole exome sequencing using small amounts of DNA extracted from archival FFPE samples of breast cancers. We optimized this strategy using 28 pairs of technical replicates. After optimization, the mean similarity between replicates increased 5-fold, reaching 88% (range 0-100%), with a mean of 21.4 SNVs (range 1-68) per sample, representing a markedly superior performance to existing tools. We found that the SNV-identification accuracy declined when there was less than 40 ng of DNA available and that insertion-deletion variant calls are less reliable than single base substitutions. As the first application of the new algorithm, we compared samples of ductal carcinoma in situ of the breast to their adjacent invasive ductal carcinoma samples. We observed an increased number of mutations (paired-samples sign test, P < 0.05), and a higher genetic divergence in the invasive samples (paired-samples sign test, P < 0.01). Our method provides a significant improvement in detecting SNVs in FFPE samples over previous approaches.


Subject(s)
Biomarkers, Tumor , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Computational Biology/methods , Polymorphism, Single Nucleotide , DNA, Neoplasm , Female , Genetic Heterogeneity , Genetic Testing/methods , Genetic Testing/standards , High-Throughput Nucleotide Sequencing , Humans , Mutation , Workflow
18.
IEEE J Biomed Health Inform ; 25(8): 3061-3072, 2021 08.
Article in English | MEDLINE | ID: mdl-33651703

ABSTRACT

OBJECTIVE: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or "digital-twins (DT)" using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. METHOD: Given a volume of patient CT images, iPhantom segments selected anchor organs and structures (e.g., liver, bones, pancreas) using a learning-based model developed for multi-organ CT segmentation. Organs which are challenging to segment (e.g., intestines) are incorporated from a matched phantom template, using a diffeomorphic registration model developed for multi-organ phantom-voxels. The resulting digital-twin phantoms are used to assess organ doses during routine CT exams. RESULT: iPhantom was validated on both a set of XCAT digital phantoms (n = 50) and an independent clinical dataset (n = 10) with similar accuracy. iPhantom precisely predicted all organ locations, yielding Dice Similarity Coefficients (DSC) of 0.6-1 for anchor organs and 0.3-0.9 for all other organs. iPhantom showed <10% errors in estimated radiation dose for the majority of organs, which was notably superior to the state-of-the-art baseline method (20-35% dose errors). CONCLUSION: iPhantom enables automated and accurate creation of patient-specific phantoms and, for the first time, provides sufficient and automated patient-specific dose estimates for CT dosimetry. SIGNIFICANCE: The new framework brings the creation and application of CHPs (computational human phantoms) to the level of individual CHPs through automation, achieving wide and precise organ localization, paving the way for clinical monitoring, personalized optimization, and large-scale research.
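The Dice Similarity Coefficient cited in the results is twice the overlap of two regions divided by the sum of their sizes. A minimal sketch, assuming each organ segmentation is represented as a set of voxel indices; the function name and the toy voxel sets are hypothetical and do not come from iPhantom.

```python
def dice_coefficient(predicted_voxels, reference_voxels):
    """Return 2 * |A ∩ B| / (|A| + |B|) for two collections of voxel indices."""
    a, b = set(predicted_voxels), set(reference_voxels)
    if not a and not b:
        # Two empty segmentations agree perfectly by convention.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel indices for one organ, invented for illustration only.
predicted_organ = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
reference_organ = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}

print(dice_coefficient(predicted_organ, reference_organ))
```

A value of 1 means the two regions coincide exactly and 0 means they do not overlap at all, which is why the reported ranges are bounded by those values.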


Subject(s)
Tomography, X-Ray Computed , Humans , Phantoms, Imaging
19.
AJR Am J Roentgenol ; 216(4): 903-911, 2021 04.
Article in English | MEDLINE | ID: mdl-32783550

ABSTRACT

BACKGROUND. The incidence of ductal carcinoma in situ (DCIS) has steadily increased, as have concerns regarding overtreatment. Active surveillance is a novel treatment strategy that avoids surgical excision, but identifying patients with occult invasive disease who should be excluded from active surveillance is challenging. Radiologists are not typically expected to predict the upstaging of DCIS to invasive disease, though they might be trained to perform this task. OBJECTIVE. The purpose of this study was to determine whether a mixed-methods two-stage observer study can improve radiologists' ability to predict upstaging of DCIS to invasive disease on mammography. METHODS. All cases of DCIS calcifications that underwent stereotactic biopsy between 2010 and 2015 were identified. Two cohorts were randomly generated, each containing 150 cases (120 pure DCIS cases and 30 DCIS cases upstaged to invasive disease at surgery). Nine breast radiologists reviewed the mammograms in the first cohort in a blinded fashion and scored the probability of upstaging to invasive disease. The radiologists then reviewed the cases and results collectively in a focus group to develop consensus criteria that could improve their ability to predict upstaging. The radiologists reviewed the mammograms from the second cohort in a blinded fashion and again scored the probability of upstaging. Statistical analysis compared the performances between rounds 1 and 2. RESULTS. The mean AUC for reader performance in predicting upstaging in round 1 was 0.623 (range, 0.514-0.684). In the focus group, radiologists agreed that upstaging was better predicted when an associated mass, asymmetry, or architectural distortion was present; when densely packed calcifications extended over a larger area; and when the most suspicious features were focused on rather than the most common features. Additionally, radiologists agreed that BI-RADS descriptors do not adequately characterize risk of invasion, and that microinvasive disease and smaller areas of DCIS will have poor prediction estimates. Reader performance significantly improved in round 2 (mean AUC, 0.765; range, 0.617-0.852; p = .045). CONCLUSION. A mixed-methods two-stage observer study identified factors that helped radiologists significantly improve their ability to predict upstaging of DCIS to invasive disease. CLINICAL IMPACT. Breast radiologists can be trained to better predict upstaging of DCIS to invasive disease, which may facilitate discussions with patients and referring providers.
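The reader AUCs above summarize how well each radiologist's probability scores separate upstaged cases from pure DCIS cases. A minimal sketch, assuming the standard pairwise (Mann-Whitney) estimate of AUC; the abstract does not state which estimator the authors used, and the scores below are hypothetical.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical reader scores (probability of upstaging), invented for illustration.
upstaged_scores = [0.8, 0.6, 0.4]
pure_dcis_scores = [0.7, 0.5, 0.3, 0.2]

print(auc_from_scores(upstaged_scores, pure_dcis_scores))
```

Under this estimate, an AUC of 0.5 corresponds to chance-level ranking, so a round 1 mean of 0.623 reflects only modest separation between the two groups.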


Subject(s)
Breast Neoplasms/diagnostic imaging , Carcinoma, Intraductal, Noninfiltrating/diagnostic imaging , Mammography , Aged , Biopsy , Breast/diagnostic imaging , Breast/pathology , Breast Density , Breast Neoplasms/diagnosis , Breast Neoplasms/pathology , Carcinoma, Intraductal, Noninfiltrating/diagnosis , Carcinoma, Intraductal, Noninfiltrating/pathology , Clinical Decision Rules , Female , Focus Groups , Humans , Middle Aged , Retrospective Studies
20.
Med Phys ; 48(3): 1026-1038, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33128288

ABSTRACT

PURPOSE: Digital breast tomosynthesis (DBT) is a limited-angle tomographic breast imaging modality that can be used for breast cancer screening in conjunction with full-field digital mammography (FFDM) or synthetic mammography (SM). Currently, there are five commercial DBT systems that have been approved by the U.S. FDA for breast cancer screening, all varying greatly in design and imaging protocol. Because the systems differ in technical specifications, there is a need for a quantitative approach for assessing them. In this study, the DBT systems are assessed using a novel methodology with an inkjet-printed anthropomorphic phantom and a four-alternative forced-choice (4AFC) study scheme. METHOD: A breast phantom was fabricated using inkjet printing and parchment paper. The phantom contained 5-mm spiculated masses fabricated with potassium iodide (KI)-doped ink and microcalcifications (MCs) made with calcium hydroxyapatite. Images of the phantom were acquired on all five systems with DBT, FFDM, and SM modalities where available using beam settings under automatic exposure control. A 4AFC study was conducted to assess reader performance with each signal under each modality. Statistical analysis was performed on the data to determine proportion correct (PC), standard deviations, and levels of significance. RESULTS: For masses, overall detection was highest with DBT. The difference in PC was statistically significant between DBT and SM for most systems. A relationship was observed between increasing PC and greater gantry span. For MCs, performance was highest with DBT and FFDM compared to SM. The difference between PC of DBT and PC of SM was statistically significant for all manufacturers. CONCLUSIONS: This methodology represents a novel approach for evaluating systems. This study is the first of its kind to use an inkjet-printed anthropomorphic phantom with realistic signals to assess performance of clinical DBT imaging systems.
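In a 4AFC trial the reader picks the signal location from four candidates, so guessing yields a proportion correct of 0.25. A minimal sketch of computing PC and a binomial standard error from trial outcomes; the abstract does not describe the authors' exact statistical procedure, and the trial data and function name here are hypothetical.

```python
import math

CHANCE_LEVEL_4AFC = 1 / 4  # one correct location out of four alternatives


def proportion_correct(outcomes):
    """Return (PC, binomial standard error) for a list of True/False trial outcomes."""
    if not outcomes:
        raise ValueError("need at least one trial")
    n = len(outcomes)
    pc = sum(1 for correct in outcomes if correct) / n
    standard_error = math.sqrt(pc * (1 - pc) / n)
    return pc, standard_error


# Hypothetical session: 30 correct responses out of 40 trials.
trial_outcomes = [True] * 30 + [False] * 10

pc, se = proportion_correct(trial_outcomes)
print(pc, se, pc - CHANCE_LEVEL_4AFC)
```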


Subject(s)
Breast Diseases , Breast Neoplasms , Mammography , Breast/diagnostic imaging , Breast Neoplasms/diagnostic imaging , Humans , Phantoms, Imaging , Radiographic Image Enhancement