Results 1 - 20 of 137
1.
Radiother Oncol ; 197: 110368, 2024 Jun 02.
Article in English | MEDLINE | ID: mdl-38834153

ABSTRACT

BACKGROUND AND PURPOSE: To optimize our previously proposed TransRP, a model integrating CNN (convolutional neural network) and ViT (Vision Transformer) designed for recurrence-free survival prediction in oropharyngeal cancer, and to extend its application to the prediction of multiple clinical outcomes, including locoregional control (LRC), distant metastasis-free survival (DMFS) and overall survival (OS). MATERIALS AND METHODS: Data were collected from 400 patients (300 for training and 100 for testing) diagnosed with oropharyngeal squamous cell carcinoma (OPSCC) who underwent (chemo)radiotherapy at University Medical Center Groningen. Each patient's data comprised pre-treatment PET/CT scans, clinical parameters, and the clinical outcome endpoints, namely LRC, DMFS and OS. The prediction performance of TransRP was compared with that of CNNs when inputting image data only. Additionally, three distinct methods (m1-3) of incorporating clinical predictors into TransRP training and one method (m4) that uses the TransRP prediction as one parameter in a clinical Cox model were compared. RESULTS: TransRP achieved higher test C-index values of 0.61, 0.84 and 0.70 than CNNs for LRC, DMFS and OS, respectively. Furthermore, when incorporating TransRP's prediction into a clinical Cox model (m4), a higher C-index of 0.77 for OS was obtained. Compared with a clinical routine risk stratification model of OS, our model, using clinical variables, radiomics and the TransRP prediction as predictors, achieved larger separations of survival curves between low-, intermediate- and high-risk groups. CONCLUSION: TransRP outperformed CNN models for all endpoints. Combining clinical data and the TransRP prediction in a Cox model achieved better OS prediction.
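
As a concrete illustration of the m4 approach, the hedged sketch below (not the authors' code) feeds a DL risk score into a clinical Cox model as one covariate and evaluates the fit with the C-index; all column names and data are invented toy values.

```python
# Hypothetical sketch: a DL prediction ("transrp_score") enters a clinical
# Cox model as one covariate; discrimination is summarized with the C-index.
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

df = pd.DataFrame({
    "os_months": [12.0, 34.5, 7.2, 48.1, 22.3, 60.0],
    "os_event": [1, 0, 1, 0, 1, 0],          # 1 = death observed
    "age": [61, 57, 70, 49, 66, 53],
    "stage": [2, 1, 3, 1, 2, 1],
    "transrp_score": [0.81, 0.22, 0.93, 0.10, 0.64, 0.18],  # invented DL output
})

# A small ridge penalty keeps the fit stable on this tiny toy sample.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="os_months", event_col="os_event")

# concordance_index expects scores where higher means longer survival,
# hence the negated partial hazard.
c_index = concordance_index(df["os_months"],
                            -cph.predict_partial_hazard(df),
                            df["os_event"])
print(f"C-index: {c_index:.2f}")
```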

2.
Comput Biol Med ; 177: 108675, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38820779

ABSTRACT

BACKGROUND: The varying tumor appearance of head and neck cancer across imaging modalities, scanners, and acquisition parameters accounts for the highly subjective nature of the manual tumor segmentation task. The variability of the manual contours is one cause of the limited generalizability and suboptimal performance of deep learning (DL) based tumor auto-segmentation models. Therefore, a DL-based method was developed that outputs predicted tumor probabilities for each PET-CT voxel in the form of a probability map instead of one fixed contour. The aim of this study was to show that DL-generated probability maps for tumor segmentation are clinically relevant, intuitive, and a more suitable solution to assist radiation oncologists in gross tumor volume segmentation on PET-CT images of head and neck cancer patients. METHOD: A graphical user interface (GUI) was designed, and a prototype was developed to allow the user to interact with tumor probability maps. Furthermore, a user study was conducted in which nine experts in tumor delineation interacted with the interface prototype and its functionality. The participants' experience was assessed qualitatively and quantitatively. RESULTS: The interviews with radiation oncologists revealed their preference for using a rainbow colormap to visualize tumor probability maps during contouring, which they found intuitive. They also appreciated the slider feature, which facilitated interaction by allowing the selection of threshold values to create single contours for editing and use as a starting point. Feedback on the prototype highlighted its excellent usability and positive integration into clinical workflows. CONCLUSIONS: This study shows that DL-generated tumor probability maps are explainable, transparent, intuitive and a better alternative to the single output of tumor segmentation models.
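
The threshold-slider interaction described above is easy to prototype. The sketch below is a minimal stand-in for the study's GUI, assuming only matplotlib: a slider picks the probability threshold that turns a synthetic probability map, shown with a rainbow colormap, into a single contour.

```python
# Minimal slider prototype over a synthetic tumor probability map.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

yy, xx = np.mgrid[0:128, 0:128]
prob_map = np.exp(-((xx - 64) ** 2 + (yy - 64) ** 2) / (2 * 18.0 ** 2))

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)

def draw(threshold):
    ax.clear()
    ax.imshow(prob_map, cmap="jet")   # rainbow colormap, as the readers preferred
    ax.contour(prob_map, levels=[threshold], colors="white")
    ax.set_title(f"threshold = {threshold:.2f}")
    fig.canvas.draw_idle()

slider_ax = fig.add_axes([0.2, 0.05, 0.6, 0.04])
slider = Slider(slider_ax, "threshold", 0.05, 0.95, valinit=0.5)
slider.on_changed(draw)
draw(0.5)
plt.show()
```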


Subject(s)
Deep Learning , Head and Neck Neoplasms , Humans , Head and Neck Neoplasms/diagnostic imaging , User-Computer Interface , Positron Emission Tomography Computed Tomography/methods
3.
Eur Radiol Exp ; 8(1): 63, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38764066

ABSTRACT

BACKGROUND: Emphysema influences the appearance of lung tissue in computed tomography (CT). We evaluated whether this affects lung nodule detection by artificial intelligence (AI) and human readers (HR). METHODS: Individuals were selected from the "Lifelines" cohort who had undergone low-dose chest CT. Nodules in individuals without emphysema were matched to similar-sized nodules in individuals with at least moderate emphysema. AI results for nodular findings of 30-100 mm3 and 101-300 mm3 were compared to those of HR; two expert radiologists blindly reviewed discrepancies. Sensitivity and false positives (FPs)/scan were compared for the emphysema and non-emphysema groups. RESULTS: Thirty-nine participants with and 82 without emphysema were included (n = 121, aged 61 ± 8 years (mean ± standard deviation), 58/121 males (47.9%)). AI and HR detected 196 and 206 nodular findings, respectively, yielding 109 concordant nodules and 184 discrepancies, including 118 true nodules. For AI, sensitivity was 0.68 (95% confidence interval 0.57-0.77) in emphysema versus 0.71 (0.62-0.78) in non-emphysema, with FPs/scan of 0.51 and 0.22, respectively (p = 0.028). For HR, sensitivity was 0.76 (0.65-0.84) and 0.80 (0.72-0.86), with FPs/scan of 0.15 and 0.27 (p = 0.230). Overall sensitivity was slightly higher for HR than for AI, but this difference disappeared after the exclusion of benign lymph nodes. FPs/scan were higher for AI in emphysema than in non-emphysema (p = 0.028), while FPs/scan for HR were higher than for AI for 30-100 mm3 nodules in non-emphysema (p = 0.009). CONCLUSIONS: AI resulted in more FPs/scan in emphysema than in non-emphysema, a difference not observed for HR. RELEVANCE STATEMENT: In the creation of a benchmark dataset to validate AI software for lung nodule detection, the inclusion of emphysema cases is important because of the additional FPs. KEY POINTS: • The sensitivity of nodule detection by AI was similar in emphysema and non-emphysema. • AI had more FPs/scan in emphysema than in non-emphysema. • Sensitivity and FPs/scan by the human reader were comparable for emphysema and non-emphysema. • Representation of both emphysema and non-emphysema cases in benchmark datasets is important for validating AI.
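
For readers who want to reproduce the two headline metrics, the toy calculation below uses invented counts and a Wilson interval (the abstract does not state which CI method the authors used).

```python
# Sensitivity with a 95% CI, plus false positives per scan (invented counts).
from statsmodels.stats.proportion import proportion_confint

detected, total_nodules = 41, 60      # true nodules found by one reader/AI
false_positives, n_scans = 20, 39     # FP findings over all scans

sensitivity = detected / total_nodules
lo, hi = proportion_confint(detected, total_nodules, alpha=0.05, method="wilson")
print(f"sensitivity {sensitivity:.2f} (95% CI {lo:.2f}-{hi:.2f}), "
      f"FPs/scan {false_positives / n_scans:.2f}")
```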


Subject(s)
Artificial Intelligence , Pulmonary Emphysema , Tomography, X-Ray Computed , Humans , Male , Middle Aged , Female , Tomography, X-Ray Computed/methods , Pulmonary Emphysema/diagnostic imaging , Software , Sensitivity and Specificity , Lung Neoplasms/diagnostic imaging , Aged , Radiation Dosage , Solitary Pulmonary Nodule/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods
4.
Insights Imaging ; 15(1): 54, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38411750

ABSTRACT

OBJECTIVE: To systematically review radiomic feature reproducibility and model validation strategies in recent studies dealing with CT and MRI radiomics of bone and soft-tissue sarcomas, thus updating a previous version of this review, which included studies published up to 2020. METHODS: A literature search was conducted on the EMBASE and PubMed databases for papers published between January 2021 and March 2023. Data regarding radiomic feature reproducibility and model validation strategies were extracted and analyzed. RESULTS: Out of 201 identified papers, 55 were included. They dealt with radiomics of bone (n = 23) or soft-tissue (n = 32) tumors. Thirty-two (out of 54 employing manual or semiautomatic segmentation, 59%) studies included a feature reproducibility analysis. Reproducibility was assessed based on intra/interobserver segmentation variability in 30 (55%) studies and on geometrical transformations of the region of interest in 2 (4%). At least one machine learning validation technique was used for model development in 34 (62%) papers, with K-fold cross-validation employed most frequently. A clinical validation of the model was reported in 38 (69%) papers. It was performed using a separate dataset from the primary institution (internal test) in 22 (40%), an independent dataset from another institution (external test) in 14 (25%) and both in 2 (4%) studies. CONCLUSIONS: Compared to papers published up to 2020, a clear improvement was noted, with almost twice as many publications reporting methodological aspects related to reproducibility and validation. Larger multicenter investigations including external clinical validation and the publication of databases in open-access repositories could further improve methodology and bring radiomics from a research area to the clinical stage. CRITICAL RELEVANCE STATEMENT: An improvement in feature reproducibility and model validation strategies has been shown in this updated systematic review on radiomics of bone and soft-tissue sarcomas, highlighting efforts to enhance methodology and bring radiomics from a research area to the clinical stage. KEY POINTS: • 2021-2023 radiomic studies on CT and MRI of musculoskeletal sarcomas were reviewed. • Feature reproducibility was assessed in more than half (59%) of the studies. • Model clinical validation was performed in 69% of the studies. • Internal (44%) and/or external (29%) test datasets were employed for clinical validation.
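
For orientation, the most frequently reported validation technique looks like the generic sketch below, with random stand-in features rather than data from any reviewed study.

```python
# Generic K-fold cross-validation of a radiomics-style classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))    # 100 lesions x 20 radiomic features (random)
y = rng.integers(0, 2, size=100)  # benign/malignant labels (random)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                       cv=cv, scoring="roc_auc")
print("AUC per fold:", aucs.round(2))
```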

5.
Insights Imaging ; 15(1): 15, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38228800

ABSTRACT

OBJECTIVES: To present a framework to develop and implement a fast-track artificial intelligence (AI) curriculum into an existing radiology residency program, with the potential to prepare a new generation of AI-conscious radiologists. METHODS: The AI-curriculum framework comprises five sequential steps: (1) forming a team of AI experts, (2) assessing the residents' knowledge level and needs, (3) defining learning objectives, (4) matching these objectives with effective teaching strategies, and finally (5) implementing and evaluating the pilot. Following these steps, a multidisciplinary team of AI engineers, radiologists, and radiology residents designed a 3-day program, including didactic lectures, hands-on laboratory sessions, and group discussions with experts to enhance AI understanding. Pre- and post-curriculum surveys were conducted to assess participants' expectations and progress and were analyzed using a Wilcoxon rank-sum test. RESULTS: There was a 100% response rate to the pre- and post-curriculum surveys (17 and 12 respondents, respectively). Participants' confidence in their knowledge and understanding of AI in radiology significantly increased after completing the program (pre-curriculum mean 3.25 ± 1.48 (SD), post-curriculum mean 6.5 ± 0.90 (SD), p = 0.002). A total of 75% confirmed that the course addressed topics that were applicable to their work in radiology. Lectures on the fundamentals of AI and group discussions with experts were deemed most useful. CONCLUSION: Designing an AI curriculum for radiology residents and implementing it into a radiology residency program is feasible using the framework presented. The 3-day AI curriculum effectively increased participants' perception of knowledge and skills about AI in radiology and can serve as a starting point for further customization. CRITICAL RELEVANCE STATEMENT: The framework provides guidance for developing and implementing an AI curriculum in radiology residency programs, educating residents on the application of AI in radiology and ultimately contributing to future high-quality, safe, and effective patient care. KEY POINTS: • AI education is necessary to prepare a new generation of AI-conscious radiologists. • The AI curriculum increased participants' perception of AI knowledge and skills in radiology. • This five-step framework can assist in integrating AI education into radiology residency programs.
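
The reported comparison can be outlined in a few lines; the ratings below are invented, and the unpaired rank-sum variant fits because the two survey rounds had different respondents (17 vs. 12).

```python
# Wilcoxon rank-sum test on pre- vs. post-curriculum confidence ratings.
from scipy.stats import ranksums

pre = [2, 3, 3, 4, 1, 5, 3, 4, 2, 3, 5, 4, 3, 2, 4, 3, 4]   # 17 respondents
post = [6, 7, 5, 7, 6, 8, 6, 7, 5, 7, 6, 8]                 # 12 respondents

stat, p = ranksums(pre, post)
print(f"rank-sum statistic {stat:.2f}, p = {p:.4f}")
```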

6.
Eur Radiol ; 34(3): 2084-2092, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37658141

ABSTRACT

OBJECTIVES: To develop a deep learning-based method for contrast-enhanced breast lesion detection in ultrafast screening MRI. MATERIALS AND METHODS: A total of 837 breast MRI exams of 488 consecutive patients were included. Lesion locations were independently annotated in the maximum intensity projection (MIP) image of the last time-resolved angiography with stochastic trajectories (TWIST) sequence for each individual breast, resulting in 265 lesions (190 benign, 75 malignant) in 163 breasts (133 women). YOLOv5 models were fine-tuned using training sets containing the same number of MIP images with and without lesions. A long short-term memory (LSTM) network was employed to help reduce false positive predictions. The integrated system was then evaluated on test sets containing enriched uninvolved breasts during cross-validation to mimic the performance in a screening scenario. RESULTS: In five-fold cross-validation, the YOLOv5x model showed a sensitivity of 0.95, 0.97, 0.98, and 0.99, with 0.125, 0.25, 0.5, and 1 false positive per breast, respectively. The LSTM network removed 15.5% of the false positive predictions of the YOLO model, and the positive predictive value increased from 0.22 to 0.25. CONCLUSIONS: A fine-tuned YOLOv5x model can detect breast lesions on ultrafast MRI with high sensitivity in a screening population, and the output of the model can be further refined by an LSTM network to reduce the number of false positive predictions. CLINICAL RELEVANCE STATEMENT: The proposed integrated system would make the ultrafast MRI screening process more effective by assisting radiologists in prioritizing suspicious examinations and supporting the diagnostic workup. KEY POINTS: • Deep convolutional neural networks can be utilized to automatically pinpoint breast lesions in screening MRI with high sensitivity. • False positive predictions increased significantly when the detection models were tested on highly unbalanced test sets with more normal scans. • Dynamic enhancement patterns of breast lesions during contrast inflow, learned by the long short-term memory network, helped to reduce false positive predictions.
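
The second-stage idea, classifying each candidate from its signal over the ultrafast time points, can be sketched as below. This is an illustrative PyTorch toy with invented shapes, not the authors' architecture.

```python
# An LSTM scores candidate lesions from their temporal enhancement features.
import torch
import torch.nn as nn

class FPReducer(nn.Module):
    def __init__(self, n_features=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # P(true lesion)

    def forward(self, x):                       # x: (batch, time_points, features)
        _, (h, _) = self.lstm(x)
        return torch.sigmoid(self.head(h[-1]))

model = FPReducer()
candidates = torch.randn(8, 20, 32)             # 8 candidates, 20 TWIST time points
print(model(candidates).shape)                  # torch.Size([8, 1])
```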


Subject(s)
Breast Neoplasms , Contrast Media , Female , Humans , Contrast Media/pharmacology , Breast/pathology , Magnetic Resonance Imaging/methods , Neural Networks, Computer , Time , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology
7.
IEEE Trans Med Imaging ; 43(1): 216-228, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37428657

ABSTRACT

Karyotyping is important for detecting chromosomal aberrations in human disease. However, chromosomes easily appear curved in microscopic images, which hampers cytogeneticists' analysis of chromosome types. To address this issue, we propose a framework for chromosome straightening, which comprises a preliminary processing algorithm and a generative model called the masked conditional variational autoencoder (MC-VAE). The processing method uses patch rearrangement to address the difficulty of erasing low degrees of curvature, providing reasonable preliminary results for the MC-VAE. The MC-VAE further straightens these results by leveraging chromosome patches conditioned on their curvatures to learn the mapping between banding patterns and conditions. During model training, we apply a masking strategy with a high masking ratio to train the MC-VAE while eliminating redundancy. This yields a non-trivial reconstruction task, allowing the model to effectively preserve chromosome banding patterns and structural details in the reconstructed results. Extensive experiments on three public datasets with two stain styles show that our framework surpasses the performance of state-of-the-art methods in retaining banding patterns and structural details. Compared to using real-world bent chromosomes, the use of high-quality straightened chromosomes generated by our proposed method can improve the performance of various deep learning models for chromosome classification by a large margin. Such a straightening approach has the potential to be combined with other karyotyping systems to assist cytogeneticists in chromosome analysis.
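
To make the masking strategy concrete, the toy function below hides a high ratio of image patches before reconstruction; the patch size and ratio are placeholders, and the MC-VAE itself is not reproduced.

```python
# Zero out a random fraction of non-overlapping square patches.
import numpy as np

def mask_patches(img, patch=16, ratio=0.75, seed=0):
    rng = np.random.default_rng(seed)
    out = img.copy()
    ph, pw = img.shape[0] // patch, img.shape[1] // patch
    masked = rng.choice(ph * pw, size=int(ph * pw * ratio), replace=False)
    for i in masked:
        r, c = divmod(i, pw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return out

masked_img = mask_patches(np.random.rand(128, 128))
print(f"masked fraction: {(masked_img == 0).mean():.2f}")   # ~0.75
```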


Subject(s)
Algorithms , Chromosomes , Humans , Karyotyping , Chromosome Banding
8.
Eur Radiol ; 34(4): 2791-2804, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37733025

ABSTRACT

OBJECTIVES: To investigate the intra- and inter-rater reliability of the total radiomics quality score (RQS) and the reproducibility of individual RQS items' scores in a large multireader study. METHODS: Nine raters with different backgrounds were randomly assigned to three groups based on their proficiency with RQS utilization: groups 1 and 2 represented the inter-rater reliability groups with or without prior training in RQS, respectively; group 3 represented the intra-rater reliability group. Thirty-three original research papers on radiomics were evaluated by the raters of groups 1 and 2. Of the 33 papers, 17 were evaluated twice with an interval of 1 month by the raters of group 3. The intraclass correlation coefficient (ICC) for continuous variables, and Fleiss' and Cohen's kappa (κ) statistics for categorical variables were used. RESULTS: The inter-rater reliability was poor to moderate for the total RQS (ICC 0.30-0.55, p < 0.001) and very low to good for the items' reproducibility (κ -0.12 to 0.75) within groups 1 and 2, for both inexperienced and experienced raters. The intra-rater reliability for the total RQS was moderate for the less experienced rater (ICC 0.522, p = 0.009), whereas experienced raters showed excellent intra-rater reliability (ICC 0.91-0.99, p < 0.001) between the first and second reads. Intra-rater reliability of the RQS items' scores was higher, and most items had moderate to good intra-rater reliability (κ -0.40 to 1). CONCLUSIONS: Reproducibility of the total RQS and of individual RQS items' scores is low. There is a need for a robust and reproducible assessment method to assess the quality of radiomics research. CLINICAL RELEVANCE STATEMENT: There is a need for reproducible scoring systems to improve the quality of radiomics research and consequently close the translational gap between research and clinical implementation. KEY POINTS: • The radiomics quality score has been widely used for the evaluation of radiomics studies. • Although the intra-rater reliability was moderate to excellent, the inter-rater reliability of the total score and of individual item scores was low. • A robust, easy-to-use scoring system is needed for the evaluation of radiomics research.
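
The two agreement statistics are straightforward to compute; the sketch below uses invented ratings, with pingouin for the ICC and scikit-learn for Cohen's kappa.

```python
# ICC for the continuous total RQS, kappa for one categorical item.
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

# Total RQS assigned by 3 raters to 4 papers (long format for pingouin).
long = pd.DataFrame({
    "paper": [1, 2, 3, 4] * 3,
    "rater": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "rqs": [6, 12, 3, 9, 5, 14, 2, 11, 8, 10, 4, 7],
})
icc = pg.intraclass_corr(data=long, targets="paper", raters="rater", ratings="rqs")
print(icc[["Type", "ICC"]])

# Item-level agreement between two raters on one binary RQS item.
print("kappa:", cohen_kappa_score([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1]))
```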


Subject(s)
Radiomics , Reading , Humans , Observer Variation , Reproducibility of Results
9.
Comput Methods Programs Biomed ; 244: 107939, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38008678

ABSTRACT

BACKGROUND AND OBJECTIVE: Recently, deep learning (DL) algorithms have shown promise in predicting outcomes such as distant metastasis-free survival (DMFS) and overall survival (OS) from pre-treatment imaging in head and neck cancer. The segmentation of the Gross Tumor Volume of the primary tumor (GTVp) is used as an additional input channel to DL algorithms to improve model performance. However, the binary segmentation mask of the GTVp directs the focus of the network to the defined tumor region only and uniformly. DL models trained for tumor segmentation have also been used to generate predicted tumor probability maps (TPMs), in which each pixel value corresponds to the degree of certainty with which that pixel is classified as tumor. The aim of this study was to explore the effect of using the TPM as an extra input channel of CT- and PET-based DL prediction models for oropharyngeal cancer (OPC) patients in terms of local control (LC), regional control (RC), DMFS and OS. METHODS: We included 399 OPC patients from our institute who were treated with definitive (chemo)radiation. For each patient, CT and PET scans and GTVp contours, used for radiotherapy treatment planning, were collected. We first trained a previously developed 2.5D DL framework for tumor probability prediction by 5-fold cross-validation using 131 patients. Then, a 3D ResNet18 was trained for outcome prediction using the 3D TPM as one of the possible inputs. The endpoints were LC, RC, DMFS, and OS. We performed 3-fold cross-validation on 168 patients for each endpoint using different combinations of image modalities as input. The final prediction in the test set (n = 100) was obtained by averaging the predictions of the 3-fold models. The C-index was used to evaluate the discriminative performance of the models. RESULTS: The models trained with the TPM in place of the GTVp contours achieved the highest C-indexes for LC (0.74) and RC (0.60) prediction. For OS, using the TPM or the GTVp as an additional image modality resulted in comparable C-indexes (0.72 and 0.74). CONCLUSIONS: Adding predicted TPMs instead of GTVp contours as an additional input channel for DL-based outcome prediction models improved model performance for LC and RC.
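
Mechanically, using the TPM as an extra input channel amounts to stacking it with the image volumes before the 3D CNN. A minimal sketch is shown below, with torchvision's r3d_18 standing in for "a 3D ResNet18" and placeholder volume sizes.

```python
# Stack CT, PET and the TPM as channels of one tensor for a 3D CNN.
import torch
from torchvision.models.video import r3d_18   # stand-in for a 3D ResNet18

ct = torch.randn(1, 96, 96, 96)    # (D, H, W) after resampling/cropping
pet = torch.randn(1, 96, 96, 96)
tpm = torch.rand(1, 96, 96, 96)    # voxel-wise tumor probabilities in [0, 1]

x = torch.cat([ct, pet, tpm], dim=0).unsqueeze(0)   # (1, 3, 96, 96, 96)
model = r3d_18(num_classes=1)                       # one risk score per patient
print(model(x).shape)                               # torch.Size([1, 1])
```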


Subject(s)
Deep Learning , Head and Neck Neoplasms , Oropharyngeal Neoplasms , Humans , Positron Emission Tomography Computed Tomography/methods , Oropharyngeal Neoplasms/diagnostic imaging , Prognosis
10.
Comput Biol Med ; 169: 107871, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38154157

ABSTRACT

BACKGROUND: During lung cancer screening, indeterminate pulmonary nodules (IPNs) are a frequent finding. We aim to predict whether IPNs are resolving or non-resolving, in order to reduce follow-up examinations, using machine learning (ML) models. We incorporated dedicated techniques to enhance prediction explainability. METHODS: In total, 724 IPNs (size 50-500 mm3, 575 participants) from the Dutch-Belgian Randomized Lung Cancer Screening Trial were used. We implemented six ML models with 14 factors to predict nodule disappearance. Random search was applied to determine the optimal hyperparameters on the training set (579 nodules). The ML models were trained using 5-fold cross-validation and tested on the test set (145 nodules). Model predictions were evaluated using recall, precision, the F1 score, and the area under the receiver operating characteristic curve (AUC). The best-performing model was used with three feature importance techniques: mean decrease in impurity (MDI), permutation feature importance (PFI), and SHapley Additive exPlanations (SHAP). RESULTS: The random forest model outperformed the other ML models with an AUC of 0.865. This model achieved a recall of 0.646, a precision of 0.816, and an F1 score of 0.721. The evaluation of feature importance yielded consistent rankings across all three methods for the most important factors. The MDI, PFI, and SHAP methods all highlighted volume, maximum diameter, and minimum diameter as the top three factors. However, the remaining factors showed discrepant rankings across methods. CONCLUSION: ML models effectively predict IPN disappearance using participant demographics and nodule characteristics. Explainable techniques can assist clinicians in developing understandable preliminary assessments.
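
The three explainability techniques can be reproduced in outline with scikit-learn and the shap package; the features and labels below are random stand-ins for the nodule characteristics and demographics.

```python
# Train a random forest, then rank features by MDI, PFI and SHAP.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
import shap   # assumption: the shap package is installed

rng = np.random.default_rng(0)
names = ["volume", "max_diameter", "min_diameter", "age"]   # illustrative factors
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

rf = RandomForestClassifier(random_state=0).fit(X, y)

mdi = rf.feature_importances_                    # mean decrease in impurity
pfi = permutation_importance(rf, X, y, n_repeats=10,
                             random_state=0).importances_mean
shap_values = shap.TreeExplainer(rf).shap_values(X)   # per-sample attributions

for name, m, p in zip(names, mdi, pfi):
    print(f"{name:13s} MDI={m:.3f} PFI={p:.3f}")
```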


Subject(s)
Lung Neoplasms , Humans , Early Detection of Cancer , Machine Learning , ROC Curve , Randomized Controlled Trials as Topic
11.
BJR Open ; 5(1): 20230033, 2023.
Article in English | MEDLINE | ID: mdl-37953871

ABSTRACT

Artificial intelligence (AI) has transitioned from the lab to the bedside, and it is increasingly being used in healthcare. Radiology and radiography are on the frontline of AI implementation because of their use of big data for medical imaging and diagnosis across different patient groups. Safe and effective AI implementation requires that responsible and ethical practices are upheld by all key stakeholders, that there is harmonious collaboration between different professional groups, and that customised educational provisions are made for all involved. This paper outlines key principles of ethical and responsible AI, highlights recent educational initiatives for clinical practitioners and discusses the synergies between all medical imaging professionals as they prepare for the digital future in Europe. Responsible and ethical AI is vital to enhance a culture of safety and trust for healthcare professionals and patients alike. Educational and training provisions on AI for medical imaging professionals are central to the understanding of basic AI principles and applications, and there are many offerings currently in Europe. Education can facilitate the transparency of AI tools, but more formalised, university-led training is needed to ensure academic scrutiny, appropriate pedagogy, multidisciplinarity and customisation to the learners' unique needs. As radiographers and radiologists work together and with other professionals to understand and harness the benefits of AI in medical imaging, it becomes clear that they are faced with the same challenges and have the same needs. The digital future belongs to multidisciplinary teams that work seamlessly together, learn together, manage risk collectively and collaborate for the benefit of the patients they serve.

12.
Phys Imaging Radiat Oncol ; 28: 100502, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38026084

ABSTRACT

Background and purpose: To compare the prediction performance of image features of computed tomography (CT) images extracted by radiomics, self-supervised learning and end-to-end deep learning for local control (LC), regional control (RC), locoregional control (LRC), distant metastasis-free survival (DMFS), tumor-specific survival (TSS), overall survival (OS) and disease-free survival (DFS) of oropharyngeal squamous cell carcinoma (OPSCC) patients after (chemo)radiotherapy. Methods and materials: The OPC-Radiomics dataset was used for model development and independent internal testing, and the UMCG-OPC set for external testing. Image features were extracted from the Gross Tumor Volume contours of the primary tumor (GTVt) regions in CT scans when using radiomics or a self-supervised learning-based method (autoencoder). Clinical models (clinical features only) and combined models (clinical plus radiomics, autoencoder or end-to-end image features) were built using multivariable Cox proportional-hazards analysis for LC, RC, LRC, DMFS, TSS, OS and DFS prediction. Results: In the internal test set, the combined autoencoder models performed better than the clinical models and the combined radiomics models for LC, RC, LRC, DMFS, TSS and DFS prediction (largest improvements in C-index: 0.91 vs. 0.76 for RC and 0.74 vs. 0.60 for DMFS). In the external test set, the combined radiomics models performed better than the clinical and combined autoencoder models for all endpoints (largest improvement for LC: 0.82 vs. 0.71). Furthermore, the combined models performed better in risk stratification than the clinical models and showed good calibration for most endpoints. Conclusions: Image features extracted using self-supervised learning showed the best internal prediction performance, while radiomics features had better external generalizability.
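
The self-supervised route can be sketched as a toy autoencoder whose bottleneck code provides the image features later fed to the Cox models; the architecture below is illustrative, not the one from the study.

```python
# Autoencoder on GTVt patches; the bottleneck code becomes the image features.
import torch
import torch.nn as nn

class PatchAE(nn.Module):
    def __init__(self, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 256),
                                 nn.ReLU(), nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, 32 * 32))

    def forward(self, x):                          # x: (batch, 1, 32, 32)
        z = self.enc(x)                            # bottleneck image features
        return self.dec(z).view(-1, 1, 32, 32), z

ae = PatchAE()
patches = torch.randn(4, 1, 32, 32)                # toy CT patches around the GTVt
recon, features = ae(patches)
loss = nn.functional.mse_loss(recon, patches)      # self-supervised objective
print(features.shape)                              # (4, 64): Cox model inputs
```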

13.
J Magn Reson Imaging ; 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37846440

ABSTRACT

BACKGROUND: Accurate breast density evaluation allows for more precise risk estimation but suffers from high inter-observer variability. PURPOSE: To evaluate the feasibility of reducing inter-observer variability of breast density assessment through artificial intelligence (AI) assisted interpretation. STUDY TYPE: Retrospective. POPULATION: Six hundred and twenty-one patients without breast prostheses or reconstructions were randomly divided into training (N = 377), validation (N = 98), and independent test (N = 146) datasets. FIELD STRENGTH/SEQUENCE: 1.5 T and 3.0 T; T1-weighted spectral attenuated inversion recovery. ASSESSMENT: Five radiologists independently assessed each scan in the independent test set to establish the inter-observer variability baseline and to reach a reference standard. Deep learning and three radiomics models were developed for three classification tasks: (i) four Breast Imaging-Reporting and Data System (BI-RADS) breast composition categories (A-D), (ii) dense (categories C, D) vs. non-dense (categories A, B), and (iii) extremely dense (category D) vs. moderately dense (categories A-C). The models were tested against the reference standard on the independent test set. AI-assisted interpretation was performed by majority voting between the models and each radiologist's assessment. STATISTICAL TESTS: Inter-observer variability was assessed using linear-weighted kappa (κ) statistics. Kappa statistics, accuracy, and the area under the receiver operating characteristic curve (AUC) were used to assess the models against the reference standard. RESULTS: In the independent test set, the five readers showed overall substantial agreement on tasks (i) and (ii), but moderate agreement on task (iii). The best-performing model showed substantial agreement with the reference standard for tasks (i) and (ii), but moderate agreement for task (iii). With the assistance of the AI models, almost perfect inter-observer agreement was obtained for tasks (i) (mean κ = 0.86), (ii) (mean κ = 0.94), and (iii) (mean κ = 0.94). DATA CONCLUSION: Deep learning and radiomics models have the potential to help reduce the inter-observer variability of breast density assessment. LEVEL OF EVIDENCE: 3. TECHNICAL EFFICACY: Stage 1.
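
The assisted reading reduces to a majority vote, illustrated below with invented labels (BI-RADS categories A-D coded 0-3) and linear-weighted kappa against the reference standard.

```python
# Majority vote between a radiologist and two models, then weighted kappa.
from statistics import mode
from sklearn.metrics import cohen_kappa_score

reference   = [0, 1, 2, 3, 2, 1, 0, 3]
radiologist = [0, 2, 2, 3, 1, 1, 0, 3]
model_a     = [0, 1, 2, 3, 2, 1, 1, 3]
model_b     = [1, 1, 2, 3, 2, 1, 0, 3]

assisted = [mode(votes) for votes in zip(radiologist, model_a, model_b)]
print("unassisted:", cohen_kappa_score(reference, radiologist, weights="linear"))
print("assisted:  ", cohen_kappa_score(reference, assisted, weights="linear"))
```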

14.
Heliyon ; 9(6): e17104, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37484314

ABSTRACT

BACKGROUND: Deep learning is an important means to realize the automatic detection, segmentation, and classification of pulmonary nodules in computed tomography (CT) images. An entire CT scan cannot be used directly by deep learning models due to image size, image format, image dimensionality, and other factors. Between the acquisition of the CT scan and feeding the data into the deep learning model, there are several steps, including data use permission, data access and download, data annotation, and data preprocessing. This paper aims to provide a complete and detailed guide for researchers who want to engage in interdisciplinary research on lung nodules combining CT imaging and Artificial Intelligence (AI) engineering. METHODS: The data preparation pipeline used the following four popular large-scale datasets: LIDC-IDRI (Lung Image Database Consortium image collection), LUNA16 (Lung Nodule Analysis 2016), NLST (National Lung Screening Trial) and NELSON (The Dutch-Belgian Randomized Lung Cancer Screening Trial). The dataset preparation steps are presented in chronological order. FINDINGS: The different data preparation steps before deep learning were identified. These include both generic steps and steps dedicated to lung nodule research. For each of these steps, the required process, its necessity, and example code or tools for actual implementation are provided. DISCUSSION AND CONCLUSION: Depending on the specific research question, researchers should be aware of the various preparation steps required and carefully select datasets, data annotation methods, and image preprocessing methods. Moreover, it is vital to acknowledge that each auxiliary tool or code has its specific scope of use and limitations. This paper proposes a standardized data preparation process while clearly demonstrating the principles and sequence of the different steps. A data preparation pipeline can be quickly realized by following the proposed steps and implementing the suggested example code and tools.
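
As a taste of two generic preprocessing steps such a pipeline covers, the sketch below resamples a CT scan to isotropic voxel spacing and clips intensities to a lung window using SimpleITK, one common tool choice; the file path is a placeholder.

```python
# Resample a CT volume to 1 mm isotropic spacing and clip to a lung window.
import numpy as np
import SimpleITK as sitk

img = sitk.ReadImage("scan.mhd")                   # placeholder path

new_spacing = (1.0, 1.0, 1.0)                      # mm, isotropic
new_size = [int(round(sz * sp / ns)) for sz, sp, ns
            in zip(img.GetSize(), img.GetSpacing(), new_spacing)]
resampled = sitk.Resample(img, new_size, sitk.Transform(), sitk.sitkLinear,
                          img.GetOrigin(), new_spacing, img.GetDirection(),
                          -1000, img.GetPixelID())  # pad with air (-1000 HU)

arr = sitk.GetArrayFromImage(resampled)             # (z, y, x) array in HU
arr = np.clip(arr, -1000, 400)                      # lung window clip
print(arr.shape, arr.min(), arr.max())
```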

15.
Med Phys ; 50(10): 6190-6200, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37219816

ABSTRACT

BACKGROUND: Personalized treatment is increasingly required for oropharyngeal squamous cell carcinoma (OPSCC) patients due to emerging new cancer subtypes and treatment options. Outcome prediction models can help identify low- or high-risk patients who may be suitable for de-escalated or intensified treatment approaches. PURPOSE: To develop a deep learning (DL)-based model for predicting multiple and associated efficacy endpoints in OPSCC patients based on computed tomography (CT). METHODS: Two patient cohorts were used in this study: a development cohort consisting of 524 OPSCC patients (70% for training and 30% for independent testing) and an external test cohort of 396 patients. Pre-treatment CT scans with the gross primary tumor volume contours (GTVt) and clinical parameters were available to predict endpoints, including 2-year local control (LC), regional control (RC), locoregional control (LRC), distant metastasis-free survival (DMFS), disease-specific survival (DSS), overall survival (OS), and disease-free survival (DFS). We proposed DL outcome prediction models with a multi-label learning (MLL) strategy that integrates the associations of the different endpoints based on clinical factors and CT scans. RESULTS: The multi-label learning models outperformed the models developed for a single endpoint on all endpoints, with high AUCs (≥ 0.80) for 2-year RC, DMFS, DSS, OS, and DFS in the internal independent test set and for all endpoints except 2-year LRC in the external test set. Furthermore, with the developed models, patients could be stratified into high- and low-risk groups that differed significantly for all endpoints in the internal test set and for all endpoints except DMFS in the external test set. CONCLUSION: MLL models demonstrated better discriminative ability for all 2-year efficacy endpoints than single-outcome models in the internal test set and for all endpoints except LRC in the external test set.
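
The MLL idea can be sketched as one shared backbone with a multi-endpoint head trained with a per-endpoint binary cross-entropy; the toy network below is a stand-in, not the authors' model.

```python
# One backbone, seven endpoint logits, binary cross-entropy per endpoint.
import torch
import torch.nn as nn

ENDPOINTS = ["LC", "RC", "LRC", "DMFS", "DSS", "OS", "DFS"]

backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())
head = nn.Linear(128, len(ENDPOINTS))            # one logit per endpoint

x = torch.randn(4, 1, 64, 64)                    # toy CT patches
labels = torch.randint(0, 2, (4, len(ENDPOINTS))).float()

logits = head(backbone(x))
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()                                  # endpoints share the backbone
print(f"loss: {loss.item():.3f}")
```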


Subject(s)
Carcinoma, Squamous Cell , Head and Neck Neoplasms , Oropharyngeal Neoplasms , Humans , Squamous Cell Carcinoma of Head and Neck , Carcinoma, Squamous Cell/diagnostic imaging , Carcinoma, Squamous Cell/therapy , Tomography, X-Ray Computed , Disease-Free Survival , Oropharyngeal Neoplasms/diagnostic imaging , Oropharyngeal Neoplasms/therapy , Retrospective Studies
16.
J Digit Imaging ; 36(4): 1460-1479, 2023 08.
Article in English | MEDLINE | ID: mdl-37145248

ABSTRACT

An automated diagnosis system is crucial for helping radiologists identify brain abnormalities efficiently. The convolutional neural network (CNN) algorithm of deep learning has the advantage of automated feature extraction, which is beneficial for an automated diagnosis system. However, several challenges of CNN-based classifiers of medical images, such as a lack of labeled data and class imbalance problems, can significantly hinder performance. Meanwhile, the expertise of multiple clinicians may be required to achieve accurate diagnoses, which can be reflected in the use of multiple algorithms. In this paper, we present Deep-Stacked CNN, a deep heterogeneous model based on stacked generalization that harnesses the advantages of different CNN-based classifiers. The model aims to improve robustness in the task of multi-class brain disease classification when single CNNs cannot be trained on sufficient data. We propose two levels of learning processes to obtain the desired model. At the first level, different pre-trained CNNs fine-tuned via transfer learning are selected as the base classifiers through several procedures. Each base classifier has a unique expert-like character, which provides diversity to the diagnosis outcomes. At the second level, the base classifiers are stacked together through a neural network, representing the meta-learner that best combines their outputs and generates the final prediction. The proposed Deep-Stacked CNN obtained an accuracy of 99.14% when evaluated on the untouched dataset. This model shows its superiority over existing methods in the same domain. It also requires fewer parameters and computations while maintaining outstanding performance.
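
Two-level stacking can be illustrated with scikit-learn stand-ins for the fine-tuned CNN base classifiers: base models emit class probabilities, and a small neural network meta-learner combines them.

```python
# Stacked generalization with a neural-network meta-learner (toy data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("svc", SVC(probability=True, random_state=0))],
    final_estimator=MLPClassifier(max_iter=1000, random_state=0),  # meta-learner
    stack_method="predict_proba", cv=5)

print("training accuracy:", stack.fit(X, y).score(X, y))
```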


Subject(s)
Brain Diseases , Neural Networks, Computer , Humans , Magnetic Resonance Imaging/methods , Algorithms , Brain Diseases/diagnostic imaging , Brain/diagnostic imaging
17.
Phys Med Biol ; 68(5)2023 02 23.
Article in English | MEDLINE | ID: mdl-36749988

ABSTRACT

Objective. Tumor segmentation is a fundamental step in radiotherapy treatment planning. To define an accurate segmentation of the primary tumor (GTVp) of oropharyngeal cancer (OPC) patients, each image volume is explored slice-by-slice from different orientations on different image modalities. However, the manually fixed segmentation boundary neglects the spatial uncertainty known to occur in tumor delineation. This study proposes a novel deep learning-based method that generates probability maps capturing the model uncertainty in the segmentation task. Approach. We included 138 OPC patients treated with (chemo)radiation in our institute. Sequences of 3 consecutive 2D slices of concatenated FDG-PET/CT images and GTVp contours were used as input. Our framework exploits inter- and intra-slice context using attention mechanisms and bi-directional long short-term memory (Bi-LSTM). Each slice resulted in three predictions that were averaged. A 3-fold cross-validation was performed on sequences extracted from the axial, sagittal, and coronal planes. 3D volumes were reconstructed, and single- and multi-view ensembling were performed to obtain the final results. The output is a tumor probability map determined by averaging multiple predictions. Main Results. Model performance was assessed on 25 patients at different probability thresholds. Predictions were closest to the GTVp at a threshold of 0.9 (mean surface DSC of 0.81, median HD95 of 3.906 mm). Significance. The promising results of the proposed method show that it is possible to offer the probability maps to radiation oncologists to guide them in a slice-by-slice adaptive GTVp segmentation.
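
The multi-view ensembling and thresholding steps reduce to a few lines; the volumes below are random stand-ins for the reconstructed axial, sagittal, and coronal predictions.

```python
# Average per-orientation probability volumes, then threshold at 0.9.
import numpy as np

axial, sagittal, coronal = (np.random.rand(64, 64, 64) for _ in range(3))
tpm = np.mean([axial, sagittal, coronal], axis=0)   # multi-view ensemble

mask = tpm >= 0.9      # the threshold that best matched the GTVp in the study
print(mask.sum(), "voxels above 0.9")
```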


Subject(s)
Deep Learning , Head and Neck Neoplasms , Oropharyngeal Neoplasms , Humans , Fluorodeoxyglucose F18 , Positron Emission Tomography Computed Tomography , Tomography, X-Ray Computed/methods , Probability , Image Processing, Computer-Assisted/methods
18.
Radiother Oncol ; 180: 109483, 2023 03.
Article in English | MEDLINE | ID: mdl-36690302

ABSTRACT

BACKGROUND AND PURPOSE: The aim of this study was to develop and evaluate a prediction model for 2-year overall survival (OS) in stage I-IIIA non-small cell lung cancer (NSCLC) patients who received definitive radiotherapy, considering clinical variables and image features from pre-treatment CT scans. MATERIALS AND METHODS: NSCLC patients who received stereotactic radiotherapy were prospectively collected at the UMCG and split into a training set and a hold-out test set including 189 and 81 patients, respectively. External validation was performed on 228 NSCLC patients who were treated with radiation or concurrent chemoradiation at the Maastro clinic (Lung1 dataset). A hybrid model that integrated both image and clinical features was implemented using deep learning. Image features were learned from cubic patches containing lung tumours extracted from pre-treatment CT scans. Relevant clinical variables were selected by univariable and multivariable analyses. RESULTS: Multivariable analysis showed that age and clinical stage were significant prognostic clinical factors for 2-year OS. Using these two clinical variables in combination with image features from pre-treatment CT scans, the hybrid model achieved a median AUC of 0.76 [95% CI: 0.65-0.86] and 0.64 [95% CI: 0.58-0.70] on the complete UMCG and Maastro test sets, respectively. The Kaplan-Meier survival curves showed significant separation between the low and high mortality risk groups on these two test sets (log-rank test: p < 0.001 and p = 0.012, respectively). CONCLUSION: We demonstrated that a hybrid model can achieve reasonable performance by utilizing both clinical and image features for 2-year OS prediction. Such a model has the potential to identify patients with high mortality risk and guide clinical decision making.
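
A hybrid design of this kind can be sketched as image features from a small 3D CNN concatenated with the clinical variables (age, stage) before the final 2-year OS output; the dimensions and layers below are illustrative, not the authors' network.

```python
# Fuse CNN image features with clinical covariates for a survival logit.
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    def __init__(self, n_clinical=2):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv3d(1, 8, 3, stride=2, padding=1),
                                 nn.ReLU(), nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.fc = nn.Linear(8 + n_clinical, 1)

    def forward(self, patch, clinical):
        feats = self.cnn(patch)                    # learned image features
        return self.fc(torch.cat([feats, clinical], dim=1))

net = HybridNet()
patch = torch.randn(4, 1, 48, 48, 48)              # cubic tumour patches
clinical = torch.randn(4, 2)                       # standardized age, stage
print(torch.sigmoid(net(patch, clinical)).shape)   # (4, 1) 2-year OS probabilities
```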


Subject(s)
Carcinoma, Non-Small-Cell Lung , Deep Learning , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/therapy , Carcinoma, Non-Small-Cell Lung/drug therapy , Lung Neoplasms/pathology , Neoplasm Staging , Tomography, X-Ray Computed/methods , Retrospective Studies
19.
Eur Radiol ; 33(3): 2239-2247, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36303093

ABSTRACT

OBJECTIVE: To evaluate the methodological rigor of radiomics-based studies using noninvasive imaging in the ovarian setting. METHODS: Multiple medical literature archives (PubMed, Web of Science, and Scopus) were searched to retrieve original studies focused on computed tomography (CT), magnetic resonance imaging (MRI), ultrasound (US), or positron emission tomography (PET) radiomics for the assessment of ovarian disorders. Two researchers in consensus evaluated each investigation using the radiomics quality score (RQS). Subgroup analyses were performed to assess whether the total RQS varied according to first author category, study aim and topic, imaging modality, and journal quartile. RESULTS: From a total of 531 items, 63 investigations were finally included in the analysis. The studies were predominantly focused (94%) on the field of oncology, with CT representing the most used imaging technique (41%). Overall, the papers achieved a median total RQS of 6 (IQR, -0.5 to 11), corresponding to 16.7% of the maximum score (IQR, 0-30.6%). The scores were low especially due to the lack of prospective design and formal validation of the results. At subgroup analysis, the 4 studies not focused on an oncological topic showed significantly lower quality scores than the others. CONCLUSIONS: The overall methodological rigor of radiomics studies in the ovarian field is still not ideal, limiting the reproducibility of results and potential translation to the clinical setting. More efforts towards a standardized methodology in the workflow are needed to allow radiomics to become a viable tool for clinical decision-making. KEY POINTS: • The 63 included studies using noninvasive imaging for ovarian applications were mostly focused on oncologic topics (94%). • The included investigations achieved a median total RQS of 6 (IQR, -0.5 to 11), indicating poor methodological rigor. • The RQS was low especially due to the lack of prospective design and formal validation of the results.


Subject(s)
Magnetic Resonance Imaging , Tomography, X-Ray Computed , Humans , Reproducibility of Results , Tomography, X-Ray Computed/methods , Magnetic Resonance Imaging/methods , Positron-Emission Tomography , Ultrasonography
20.
Health Informatics J ; 28(4): 14604582221131198, 2022.
Article in English | MEDLINE | ID: mdl-36227062

ABSTRACT

BACKGROUND: Radiology requests and reports contain valuable information about diagnostic findings and indications, and transformer-based language models are promising for more accurate text classification. METHODS: In a retrospective study, 2256 radiologist-annotated radiology requests (8 classes) and reports (10 classes) were divided into training and testing datasets (90% and 10%, respectively) and used to train 32 models. Performance metrics were compared by model type (LSTM, Bertje, RobBERT, BERT-clinical, BERT-multilingual, BERT-base), text length, data prevalence, and training strategy. The best models were then used to predict the categories of the remaining 40,873 requests and reports. RESULTS: The RobBERT model performed best after 4000 training iterations, resulting in AUC values ranging from 0.808 [95% CI (0.757-0.859)] to 0.976 [95% CI (0.956-0.996)] for the requests and from 0.746 [95% CI (0.689-0.802)] to 1.0 [95% CI (1.0-1.0)] for the reports. The AUC for the classification of normal reports was 0.95 [95% CI (0.922-0.979)]. The predicted data demonstrated variability in diagnostic yield across request classes, as well as request patterns related to COVID-19 hospital admission data. CONCLUSION: Transformer-based natural language processing is feasible for the multilabel classification of chest imaging request and report items. Diagnostic yield varies with the information in the requests.
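
The model setup for such multilabel classification is compact with the transformers library. The sketch below loads a public RobBERT checkpoint (an assumption; the paper does not name the exact variant) with a 10-label head and runs one forward pass on a toy Dutch request.

```python
# Multilabel head on RobBERT; sigmoid gives one probability per class.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "pdelobelle/robbert-v2-dutch-base"     # assumed public checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=10, problem_type="multi_label_classification")

batch = tok(["X-thorax: vraagstelling pneumonie?"], return_tensors="pt")
probs = torch.sigmoid(model(**batch).logits)  # untrained head: ~0.5 everywhere
print(probs.round(decimals=2))
```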


Subject(s)
COVID-19 , Radiology , COVID-19/diagnostic imaging , Humans , Natural Language Processing , Research Report , Retrospective Studies