Búsqueda | Portal Regional de la BVS

1.

End-to-end reproducible AI pipelines in radiology using the cloud.

Bontempi, Dennis; Nuernberg, Leonard; Pai, Suraj; Krishnaswamy, Deepa; Thiriveedhi, Vamsi; Hosny, Ahmed; Mak, Raymond H; Farahani, Keyvan; Kikinis, Ron; Fedorov, Andrey; Aerts, Hugo J W L.

Nat Commun ; 15(1): 6931, 2024 Aug 13.

Artículo en Inglés | MEDLINE | ID: mdl-39138215

RESUMEN

Artificial intelligence (AI) algorithms hold the potential to revolutionize radiology. However, a significant portion of the published literature lacks transparency and reproducibility, which hampers sustained progress toward clinical translation. Although several reporting guidelines have been proposed, identifying practical means to address these issues remains challenging. Here, we show the potential of cloud-based infrastructure for implementing and sharing transparent and reproducible AI-based radiology pipelines. We demonstrate end-to-end reproducibility from retrieving cloud-hosted data, through data pre-processing, deep learning inference, and post-processing, to the analysis and reporting of the final results. We successfully implement two distinct use cases, starting from recent literature on AI-based biomarkers for cancer imaging. Using cloud-hosted data and computing, we confirm the findings of these studies and extend the validation to previously unseen data for one of the use cases. Furthermore, we provide the community with transparent and easy-to-extend examples of pipelines impactful for the broader oncology field. Our approach demonstrates the potential of cloud resources for implementing, sharing, and using reproducible and transparent AI pipelines, which can accelerate the translation into clinical solutions.

Asunto(s)

Inteligencia Artificial , Nube Computacional , Humanos , Reproducibilidad de los Resultados , Aprendizaje Profundo , Radiología/métodos , Radiología/normas , Algoritmos , Neoplasias/diagnóstico por imagen , Procesamiento de Imagen Asistido por Computador/métodos

2.

Longitudinal risk prediction for pediatric glioma with temporal deep learning.

Tak, Divyanshu; Garomsa, Biniam A; Zapaishchykova, Anna; Ye, Zezhong; Vajapeyam, Sri; Mahootiha, Maryam; Climent Pardo, Juan Carlos; Smith, Ceilidh; Familiar, Ariana M; Chaunzwa, Tafadzwa; Liu, Kevin X; Prabhu, Sanjay; Bandopadhayay, Pratiti; Nabavizadeh, Ali; Mueller, Sabine; Aerts, Hugo Jwl; Haas-Kogan, Daphne; Poussaint, Tina Y; Kann, Benjamin H.

medRxiv ; 2024 Jun 28.

Artículo en Inglés | MEDLINE | ID: mdl-38978642

RESUMEN

Pediatric glioma recurrence can cause morbidity and mortality; however, recurrence pattern and severity are heterogeneous and challenging to predict with established clinical and genomic markers. Resultingly, almost all children undergo frequent, long-term, magnetic resonance (MR) brain surveillance regardless of individual recurrence risk. Deep learning analysis of longitudinal MR may be an effective approach for improving individualized recurrence prediction in gliomas and other cancers but has thus far been infeasible with current frameworks. Here, we propose a self-supervised, deep learning approach to longitudinal medical imaging analysis, temporal learning, that models the spatiotemporal information from a patient's current and prior brain MRs to predict future recurrence. We apply temporal learning to pediatric glioma surveillance imaging for 715 patients (3,994 scans) from four distinct clinical settings. We find that longitudinal imaging analysis with temporal learning improves recurrence prediction performance by up to 41% compared to traditional approaches, with improvements in performance in both low- and high-grade glioma. We find that recurrence prediction accuracy increases incrementally with the number of historical scans available per patient. Temporal deep learning may enable point-of-care decision-support for pediatric brain tumors and be adaptable more broadly to patients with other cancers and chronic diseases undergoing surveillance imaging.

3.

Open Access Data and Deep Learning for Cardiac Device Identification on Standard DICOM and Smartphone-based Chest Radiographs.

Busch, Felix; Bressem, Keno K; Suwalski, Phillip; Hoffmann, Lena; Niehues, Stefan M; Poddubnyy, Denis; Makowski, Marcus R; Aerts, Hugo J W L; Zhukov, Andrei; Adams, Lisa C.

Radiol Artif Intell ; 6(5): e230502, 2024 Sep.

Artículo en Inglés | MEDLINE | ID: mdl-39017033

RESUMEN

Purpose To develop and evaluate a publicly available deep learning model for segmenting and classifying cardiac implantable electronic devices (CIEDs) on Digital Imaging and Communications in Medicine (DICOM) and smartphone-based chest radiographs. Materials and Methods This institutional review board-approved retrospective study included patients with implantable pacemakers, cardioverter defibrillators, cardiac resynchronization therapy devices, and cardiac monitors who underwent chest radiography between January 2012 and January 2022. A U-Net model with a ResNet-50 backbone was created to classify CIEDs on DICOM and smartphone images. Using 2321 chest radiographs in 897 patients (median age, 76 years [range, 18-96 years]; 625 male, 272 female), CIEDs were categorized into four manufacturers, 27 models, and one "other" category. Five smartphones were used to acquire 11 072 images. Performance was reported using the Dice coefficient on the validation set for segmentation or balanced accuracy on the test set for manufacturer and model classification, respectively. Results The segmentation tool achieved a mean Dice coefficient of 0.936 (IQR: 0.890-0.958). The model had an accuracy of 94.36% (95% CI: 90.93%, 96.84%; 251 of 266) for CIED manufacturer classification and 84.21% (95% CI: 79.31%, 88.30%; 224 of 266) for CIED model classification. Conclusion The proposed deep learning model, trained on both traditional DICOM and smartphone images, showed high accuracy for segmentation and classification of CIEDs on chest radiographs. Keywords: Conventional Radiography, Segmentation Supplemental material is available for this article. © RSNA, 2024 See also the commentary by Júdice de Mattos Farina and Celi in this issue.

Asunto(s)

Aprendizaje Profundo , Desfibriladores Implantables , Radiografía Torácica , Teléfono Inteligente , Humanos , Anciano , Femenino , Masculino , Adolescente , Radiografía Torácica/normas , Persona de Mediana Edad , Anciano de 80 o más Años , Estudios Retrospectivos , Adulto , Adulto Joven , Marcapaso Artificial

4.

Stepwise Transfer Learning for Expert-level Pediatric Brain Tumor MRI Segmentation in a Limited Data Scenario.

Boyd, Aidan; Ye, Zezhong; Prabhu, Sanjay P; Tjong, Michael C; Zha, Yining; Zapaishchykova, Anna; Vajapeyam, Sridhar; Catalano, Paul J; Hayat, Hasaan; Chopra, Rishi; Liu, Kevin X; Nabavizadeh, Ali; Resnick, Adam C; Mueller, Sabine; Haas-Kogan, Daphne A; Aerts, Hugo J W L; Poussaint, Tina Y; Kann, Benjamin H.

Radiol Artif Intell ; 6(4): e230254, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38984985

RESUMEN

Purpose To develop, externally test, and evaluate clinical acceptability of a deep learning pediatric brain tumor segmentation model using stepwise transfer learning. Materials and Methods In this retrospective study, the authors leveraged two T2-weighted MRI datasets (May 2001 through December 2015) from a national brain tumor consortium (n = 184; median age, 7 years [range, 1-23 years]; 94 male patients) and a pediatric cancer center (n = 100; median age, 8 years [range, 1-19 years]; 47 male patients) to develop and evaluate deep learning neural networks for pediatric low-grade glioma segmentation using a stepwise transfer learning approach to maximize performance in a limited data scenario. The best model was externally tested on an independent test set and subjected to randomized blinded evaluation by three clinicians, wherein they assessed clinical acceptability of expert- and artificial intelligence (AI)-generated segmentations via 10-point Likert scales and Turing tests. Results The best AI model used in-domain stepwise transfer learning (median Dice score coefficient, 0.88 [IQR, 0.72-0.91] vs 0.812 [IQR, 0.56-0.89] for baseline model; P = .049). With external testing, the AI model yielded excellent accuracy using reference standards from three clinical experts (median Dice similarity coefficients: expert 1, 0.83 [IQR, 0.75-0.90]; expert 2, 0.81 [IQR, 0.70-0.89]; expert 3, 0.81 [IQR, 0.68-0.88]; mean accuracy, 0.82). For clinical benchmarking (n = 100 scans), experts rated AI-based segmentations higher on average compared with other experts (median Likert score, 9 [IQR, 7-9] vs 7 [IQR 7-9]) and rated more AI segmentations as clinically acceptable (80.2% vs 65.4%). Experts correctly predicted the origin of AI segmentations in an average of 26.0% of cases. Conclusion Stepwise transfer learning enabled expert-level automated pediatric brain tumor autosegmentation and volumetric measurement with a high level of clinical acceptability. Keywords: Stepwise Transfer Learning, Pediatric Brain Tumors, MRI Segmentation, Deep Learning Supplemental material is available for this article. © RSNA, 2024.

Asunto(s)

Neoplasias Encefálicas , Aprendizaje Profundo , Imagen por Resonancia Magnética , Humanos , Niño , Neoplasias Encefálicas/diagnóstico por imagen , Neoplasias Encefálicas/patología , Imagen por Resonancia Magnética/métodos , Masculino , Adolescente , Preescolar , Estudios Retrospectivos , Femenino , Lactante , Adulto Joven , Glioma/diagnóstico por imagen , Glioma/patología , Interpretación de Imagen Asistida por Computador/métodos

5.

Body Composition in Advanced Non-Small Cell Lung Cancer Treated With Immunotherapy.

Chaunzwa, Tafadzwa L; Qian, Jack M; Li, Qin; Ricciuti, Biagio; Nuernberg, Leonard; Johnson, Justin W; Weiss, Jakob; Zhang, Zhongyi; MacKay, Jamie; Kagiampakis, Ioannis; Bikiel, Damian; Di Federico, Alessandro; Alessi, Joao V; Mak, Raymond H; Jacob, Etai; Awad, Mark M; Aerts, Hugo J W L.

JAMA Oncol ; 10(6): 773-783, 2024 Jun 01.

Artículo en Inglés | MEDLINE | ID: mdl-38780929

RESUMEN

Importance: The association between body composition (BC) and cancer outcomes is complex and incompletely understood. Previous research in non-small-cell lung cancer (NSCLC) has been limited to small, single-institution studies and yielded promising, albeit heterogeneous, results. Objectives: To evaluate the association of BC with oncologic outcomes in patients receiving immunotherapy for advanced or metastatic NSCLC. Design, Setting, and Participants: This comprehensive multicohort analysis included clinical data from cohorts receiving treatment at the Dana-Farber Brigham Cancer Center (DFBCC) who received immunotherapy given alone or in combination with chemotherapy and prospectively collected data from the phase 1/2 Study 1108 and the chemotherapy arm of the phase 3 MYSTIC trial. Baseline and follow-up computed tomography (CT) scans were collected and analyzed using deep neural networks for automatic L3 slice selection and body compartment segmentation (skeletal muscle [SM], subcutaneous adipose tissue [SAT], and visceral adipose tissue). Outcomes were compared based on baseline BC measures or their change at the first follow-up scan. The data were analyzed between July 2022 and April 2023. Main Outcomes and Measures: Hazard ratios (HRs) for the association of BC measurements with overall survival (OS) and progression-free survival (PFS). Results: A total of 1791 patients (878 women [49%]) with NSCLC were analyzed, of whom 487 (27.2%) received chemoimmunotherapy at DFBCC (DFBCC-CIO), 825 (46.1%) received ICI monotherapy at DFBCC (DFBCC-IO), 222 (12.4%) were treated with durvalumab monotherapy on Study 1108, and 257 (14.3%) were treated with chemotherapy on MYSTIC; median (IQR) ages were 65 (58-74), 66 (57-71), 65 (26-87), and 63 (30-84) years, respectively. A loss in SM mass, as indicated by a change in the L3 SM area, was associated with worse oncologic outcome across patient groups (HR, 0.59 [95% CI, 0.43-0.81] and 0.61 [95% CI, 0.47-0.79] for OS and PFS, respectively, in DFBCC-CIO; HR, 0.74 [95% CI, 0.60-0.91] for OS in DFBCC-IO; HR, 0.46 [95% CI, 0.33-0.64] and 0.47 [95% CI, 0.34-0.64] for OS and PFS, respectively, in Study 1108; HR, 0.76 [95% CI, 0.61-0.96] for PFS in the MYSTIC trial). This association was most prominent among male patients, with a nonsignificant association among female patients in the MYSTIC trial and DFBCC-CIO cohorts on Kaplan-Meier analysis. An increase of more than 5% in SAT density, as quantified by the average CT attenuation in Hounsfield units of the SAT compartment, was associated with poorer OS in 3 patient cohorts (HR, 0.61 [95% CI, 0.43-0.86] for DFBCC-CIO; HR, 0.62 [95% CI, 0.49-0.79] for DFBCC-IO; and HR, 0.56 [95% CI, 0.40-0.77] for Study 1108). The change in SAT density was also associated with PFS for DFBCC-CIO (HR, 0.73; 95% CI, 0.54-0.97). This was primarily observed in female patients on Kaplan-Meier analysis. Conclusions and Relevance: The results of this multicohort study suggest that loss in SM mass during systemic therapy for NSCLC is a marker of poor outcomes, especially in male patients. SAT density changes are also associated with prognosis, particularly in female patients. Automated CT-derived BC measurements should be considered in determining NSCLC prognosis.

Asunto(s)

Composición Corporal , Carcinoma de Pulmón de Células no Pequeñas , Inmunoterapia , Neoplasias Pulmonares , Humanos , Carcinoma de Pulmón de Células no Pequeñas/tratamiento farmacológico , Carcinoma de Pulmón de Células no Pequeñas/terapia , Carcinoma de Pulmón de Células no Pequeñas/patología , Neoplasias Pulmonares/tratamiento farmacológico , Neoplasias Pulmonares/terapia , Neoplasias Pulmonares/patología , Neoplasias Pulmonares/mortalidad , Femenino , Masculino , Inmunoterapia/métodos , Persona de Mediana Edad , Anciano , Supervivencia sin Progresión , Adulto

6.

The effect of using a large language model to respond to patient messages.

Chen, Shan; Guevara, Marco; Moningi, Shalini; Hoebers, Frank; Elhalawani, Hesham; Kann, Benjamin H; Chipidza, Fallon E; Leeman, Jonathan; Aerts, Hugo J W L; Miller, Timothy; Savova, Guergana K; Gallifant, Jack; Celi, Leo A; Mak, Raymond H; Lustberg, Maryam; Afshar, Majid; Bitterman, Danielle S.

Lancet Digit Health ; 6(6): e379-e381, 2024 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-38664108

7.

Deep learning analysis of epicardial adipose tissue to predict cardiovascular risk in heavy smokers.

Foldyna, Borek; Hadzic, Ibrahim; Zeleznik, Roman; Langenbach, Marcel C; Raghu, Vineet K; Mayrhofer, Thomas; Lu, Michael T; Aerts, Hugo J W L.

Commun Med (Lond) ; 4(1): 44, 2024 Mar 13.

Artículo en Inglés | MEDLINE | ID: mdl-38480863

RESUMEN

BACKGROUND: Heavy smokers are at increased risk for cardiovascular disease and may benefit from individualized risk quantification using routine lung cancer screening chest computed tomography. We investigated the prognostic value of deep learning-based automated epicardial adipose tissue quantification and compared it to established cardiovascular risk factors and coronary artery calcium. METHODS: We investigated the prognostic value of automated epicardial adipose tissue quantification in heavy smokers enrolled in the National Lung Screening Trial and followed for 12.3 (11.9-12.8) years. The epicardial adipose tissue was segmented and quantified on non-ECG-synchronized, non-contrast low-dose chest computed tomography scans using a validated deep-learning algorithm. Multivariable survival regression analyses were then utilized to determine the associations of epicardial adipose tissue volume and density with all-cause and cardiovascular mortality (myocardial infarction and stroke). RESULTS: Here we show in 24,090 adult heavy smokers (59% men; 61 ± 5 years) that epicardial adipose tissue volume and density are independently associated with all-cause (adjusted hazard ratios: 1.10 and 1.38; P < 0.001) and cardiovascular mortality (adjusted hazard ratios: 1.14 and 1.78; P < 0.001) beyond demographics, clinical risk factors, body habitus, level of education, and coronary artery calcium score. CONCLUSIONS: Our findings suggest that automated assessment of epicardial adipose tissue from low-dose lung cancer screening images offers prognostic value in heavy smokers, with potential implications for cardiovascular risk stratification in this high-risk population.

Heavy smokers are at increased risk of poor health outcomes, particularly outcomes related to cardiovascular disease. We explore how fat surrounding the heart, known as epicardial adipose tissue, may be an indicator of the health of heavy smokers. We use an artificial intelligence system to measure the heart fat on chest scans of heavy smokers taken during a lung cancer screening trial and following their health for 12 years. We find that higher amounts and denser epicardial adipose tissue are linked to an increased risk of death from any cause, specifically from heart-related issues, even when considering other health factors. This suggests that measuring epicardial adipose tissue during lung cancer screenings could be a valuable tool for identifying heavy smokers at greater risk of heart problems and death, possibly helping to guide their medical management and improve their cardiovascular health.

8.

Noninvasive Molecular Subtyping of Pediatric Low-Grade Glioma with Self-Supervised Transfer Learning.

Tak, Divyanshu; Ye, Zezhong; Zapaischykova, Anna; Zha, Yining; Boyd, Aidan; Vajapeyam, Sridhar; Chopra, Rishi; Hayat, Hasaan; Prabhu, Sanjay P; Liu, Kevin X; Elhalawani, Hesham; Nabavizadeh, Ali; Familiar, Ariana; Resnick, Adam C; Mueller, Sabine; Aerts, Hugo J W L; Bandopadhayay, Pratiti; Ligon, Keith L; Haas-Kogan, Daphne A; Poussaint, Tina Y; Kann, Benjamin H.

Radiol Artif Intell ; 6(3): e230333, 2024 May.

Artículo en Inglés | MEDLINE | ID: mdl-38446044

RESUMEN

Purpose To develop and externally test a scan-to-prediction deep learning pipeline for noninvasive, MRI-based BRAF mutational status classification for pediatric low-grade glioma. Materials and Methods This retrospective study included two pediatric low-grade glioma datasets with linked genomic and diagnostic T2-weighted MRI data of patients: Dana-Farber/Boston Children's Hospital (development dataset, n = 214 [113 (52.8%) male; 104 (48.6%) BRAF wild type, 60 (28.0%) BRAF fusion, and 50 (23.4%) BRAF V600E]) and the Children's Brain Tumor Network (external testing, n = 112 [55 (49.1%) male; 35 (31.2%) BRAF wild type, 60 (53.6%) BRAF fusion, and 17 (15.2%) BRAF V600E]). A deep learning pipeline was developed to classify BRAF mutational status (BRAF wild type vs BRAF fusion vs BRAF V600E) via a two-stage process: (a) three-dimensional tumor segmentation and extraction of axial tumor images and (b) section-wise, deep learning-based classification of mutational status. Knowledge-transfer and self-supervised approaches were investigated to prevent model overfitting, with a primary end point of the area under the receiver operating characteristic curve (AUC). To enhance model interpretability, a novel metric, center of mass distance, was developed to quantify the model attention around the tumor. Results A combination of transfer learning from a pretrained medical imaging-specific network and self-supervised label cross-training (TransferX) coupled with consensus logic yielded the highest classification performance with an AUC of 0.82 (95% CI: 0.72, 0.91), 0.87 (95% CI: 0.61, 0.97), and 0.85 (95% CI: 0.66, 0.95) for BRAF wild type, BRAF fusion, and BRAF V600E, respectively, on internal testing. On external testing, the pipeline yielded an AUC of 0.72 (95% CI: 0.64, 0.86), 0.78 (95% CI: 0.61, 0.89), and 0.72 (95% CI: 0.64, 0.88) for BRAF wild type, BRAF fusion, and BRAF V600E, respectively. Conclusion Transfer learning and self-supervised cross-training improved classification performance and generalizability for noninvasive pediatric low-grade glioma mutational status prediction in a limited data scenario. Keywords: Pediatrics, MRI, CNS, Brain/Brain Stem, Oncology, Feature Detection, Diagnosis, Supervised Learning, Transfer Learning, Convolutional Neural Network (CNN) Supplemental material is available for this article. © RSNA, 2024.

Asunto(s)

Neoplasias Encefálicas , Glioma , Humanos , Niño , Masculino , Femenino , Neoplasias Encefálicas/diagnóstico por imagen , Estudios Retrospectivos , Proteínas Proto-Oncogénicas B-raf/genética , Glioma/diagnóstico , Aprendizaje Automático

9.

Foundation model for cancer imaging biomarkers.

Pai, Suraj; Bontempi, Dennis; Hadzic, Ibrahim; Prudente, Vasco; Sokac, Mateo; Chaunzwa, Tafadzwa L; Bernatz, Simon; Hosny, Ahmed; Mak, Raymond H; Birkbak, Nicolai J; Aerts, Hugo J W L.

Nat Mach Intell ; 6(3): 354-367, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-38523679

RESUMEN

Foundation models in deep learning are characterized by a single large-scale model trained on vast amounts of data serving as the foundation for various downstream tasks. Foundation models are generally trained using self-supervised learning and excel in reducing the demand for training samples in downstream applications. This is especially important in medicine, where large labelled datasets are often scarce. Here, we developed a foundation model for cancer imaging biomarker discovery by training a convolutional encoder through self-supervised learning using a comprehensive dataset of 11,467 radiographic lesions. The foundation model was evaluated in distinct and clinically relevant applications of cancer imaging-based biomarkers. We found that it facilitated better and more efficient learning of imaging biomarkers and yielded task-specific models that significantly outperformed conventional supervised and other state-of-the-art pretrained implementations on downstream tasks, especially when training dataset sizes were very limited. Furthermore, the foundation model was more stable to input variations and showed strong associations with underlying biology. Our results demonstrate the tremendous potential of foundation models in discovering new imaging biomarkers that may extend to other clinical use cases and can accelerate the widespread translation of imaging biomarkers into clinical settings.

10.

Deep Learning to Estimate Cardiovascular Risk From Chest Radiographs : A Risk Prediction Study.

Weiss, Jakob; Raghu, Vineet K; Paruchuri, Kaavya; Zinzuwadia, Aniket; Natarajan, Pradeep; Aerts, Hugo J W L; Lu, Michael T.

Ann Intern Med ; 177(4): 409-417, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-38527287

RESUMEN

BACKGROUND: Guidelines for primary prevention of atherosclerotic cardiovascular disease (ASCVD) recommend a risk calculator (ASCVD risk score) to estimate 10-year risk for major adverse cardiovascular events (MACE). Because the necessary inputs are often missing, complementary approaches for opportunistic risk assessment are desirable. OBJECTIVE: To develop and test a deep-learning model (CXR CVD-Risk) that estimates 10-year risk for MACE from a routine chest radiograph (CXR) and compare its performance with that of the traditional ASCVD risk score for implications for statin eligibility. DESIGN: Risk prediction study. SETTING: Outpatients potentially eligible for primary cardiovascular prevention. PARTICIPANTS: The CXR CVD-Risk model was developed using data from a cancer screening trial. It was externally validated in 8869 outpatients with unknown ASCVD risk because of missing inputs to calculate the ASCVD risk score and in 2132 outpatients with known risk whose ASCVD risk score could be calculated. MEASUREMENTS: 10-year MACE predicted by CXR CVD-Risk versus the ASCVD risk score. RESULTS: Among 8869 outpatients with unknown ASCVD risk, those with a risk of 7.5% or higher as predicted by CXR CVD-Risk had higher 10-year risk for MACE after adjustment for risk factors (adjusted hazard ratio [HR], 1.73 [95% CI, 1.47 to 2.03]). In the additional 2132 outpatients with known ASCVD risk, CXR CVD-Risk predicted MACE beyond the traditional ASCVD risk score (adjusted HR, 1.88 [CI, 1.24 to 2.85]). LIMITATION: Retrospective study design using electronic medical records. CONCLUSION: On the basis of a single CXR, CXR CVD-Risk predicts 10-year MACE beyond the clinical standard and may help identify individuals at high risk whose ASCVD risk score cannot be calculated because of missing data. PRIMARY FUNDING SOURCE: None.

Asunto(s)

Aterosclerosis , Enfermedades Cardiovasculares , Aprendizaje Profundo , Humanos , Factores de Riesgo , Enfermedades Cardiovasculares/diagnóstico por imagen , Enfermedades Cardiovasculares/epidemiología , Estudios Retrospectivos , Medición de Riesgo , Factores de Riesgo de Enfermedad Cardiaca

11.

Robustness and reproducibility for AI learning in biomedical sciences: RENOIR.

Barberis, Alessandro; Aerts, Hugo J W L; Buffa, Francesca M.

Sci Rep ; 14(1): 1933, 2024 01 22.

Artículo en Inglés | MEDLINE | ID: mdl-38253545

RESUMEN

Artificial intelligence (AI) techniques are increasingly applied across various domains, favoured by the growing acquisition and public availability of large, complex datasets. Despite this trend, AI publications often suffer from lack of reproducibility and poor generalisation of findings, undermining scientific value and contributing to global research waste. To address these issues and focusing on the learning aspect of the AI field, we present RENOIR (REpeated random sampliNg fOr machIne leaRning), a modular open-source platform for robust and reproducible machine learning (ML) analysis. RENOIR adopts standardised pipelines for model training and testing, introducing elements of novelty, such as the dependence of the performance of the algorithm on the sample size. Additionally, RENOIR offers automated generation of transparent and usable reports, aiming to enhance the quality and reproducibility of AI studies. To demonstrate the versatility of our tool, we applied it to benchmark datasets from health, computer science, and STEM (Science, Technology, Engineering, and Mathematics) domains. Furthermore, we showcase RENOIR's successful application in recently published studies, where it identified classifiers for SET2D and TP53 mutation status in cancer. Finally, we present a use case where RENOIR was employed to address a significant pharmacological challenge-predicting drug efficacy. RENOIR is freely available at https://github.com/alebarberis/renoir .

Asunto(s)

Algoritmos , Inteligencia Artificial , Reproducibilidad de los Resultados , Aprendizaje Automático , Benchmarking

12.

Evaluating the ChatGPT family of models for biomedical reasoning and classification.

Chen, Shan; Li, Yingya; Lu, Sheng; Van, Hoang; Aerts, Hugo J W L; Savova, Guergana K; Bitterman, Danielle S.

J Am Med Inform Assoc ; 31(4): 940-948, 2024 04 03.

Artículo en Inglés | MEDLINE | ID: mdl-38261400

RESUMEN

OBJECTIVE: Large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates ChatGPT family of models (GPT-3.5, GPT-4) in biomedical tasks beyond question-answering. MATERIALS AND METHODS: We evaluated model performance with 11 122 samples for two fundamental tasks in the biomedical domain-classification (n = 8676) and reasoning (n = 2446). The first task involves classifying health advice in scientific literature, while the second task is detecting causal relations in biomedical literature. We used 20% of the dataset for prompt development, including zero- and few-shot settings with and without chain-of-thought (CoT). We then evaluated the best prompts from each setting on the remaining dataset, comparing them to models using simple features (BoW with logistic regression) and fine-tuned BioBERT models. RESULTS: Fine-tuning BioBERT produced the best classification (F1: 0.800-0.902) and reasoning (F1: 0.851) results. Among LLM approaches, few-shot CoT achieved the best classification (F1: 0.671-0.770) and reasoning (F1: 0.682) results, comparable to the BoW model (F1: 0.602-0.753 and 0.675 for classification and reasoning, respectively). It took 78 h to obtain the best LLM results, compared to 0.078 and 0.008 h for the top-performing BioBERT and BoW models, respectively. DISCUSSION: The simple BoW model performed similarly to the most complex LLM prompting. Prompt engineering required significant investment. CONCLUSION: Despite the excitement around viral ChatGPT, fine-tuning for two fundamental biomedical natural language processing tasks remained the best strategy.

Asunto(s)

Lenguaje , Procesamiento de Lenguaje Natural

13.

Enrichment of lung cancer computed tomography collections with AI-derived annotations.

Krishnaswamy, Deepa; Bontempi, Dennis; Thiriveedhi, Vamsi Krishna; Punzo, Davide; Clunie, David; Bridge, Christopher P; Aerts, Hugo J W L; Kikinis, Ron; Fedorov, Andrey.

Sci Data ; 11(1): 25, 2024 Jan 04.

Artículo en Inglés | MEDLINE | ID: mdl-38177130

RESUMEN

Public imaging datasets are critical for the development and evaluation of automated tools in cancer imaging. Unfortunately, many do not include annotations or image-derived features, complicating downstream analysis. Artificial intelligence-based annotation tools have been shown to achieve acceptable performance and can be used to automatically annotate large datasets. As part of the effort to enrich public data available within NCI Imaging Data Commons (IDC), here we introduce AI-generated annotations for two collections containing computed tomography images of the chest, NSCLC-Radiomics, and a subset of the National Lung Screening Trial. Using publicly available AI algorithms, we derived volumetric annotations of thoracic organs-at-risk, their corresponding radiomics features, and slice-level annotations of anatomical landmarks and regions. The resulting annotations are publicly available within IDC, where the DICOM format is used to harmonize the data and achieve FAIR (Findable, Accessible, Interoperable, Reusable) data principles. The annotations are accompanied by cloud-enabled notebooks demonstrating their use. This study reinforces the need for large, publicly accessible curated datasets and demonstrates how AI can aid in cancer imaging.

Asunto(s)

Carcinoma de Pulmón de Células no Pequeñas , Neoplasias Pulmonares , Humanos , Inteligencia Artificial , Carcinoma de Pulmón de Células no Pequeñas/diagnóstico por imagen , Pulmón/diagnóstico por imagen , Neoplasias Pulmonares/diagnóstico por imagen , Tomografía Computarizada por Rayos X

14.

Large language models to identify social determinants of health in electronic health records.

Guevara, Marco; Chen, Shan; Thomas, Spencer; Chaunzwa, Tafadzwa L; Franco, Idalid; Kann, Benjamin H; Moningi, Shalini; Qian, Jack M; Goldstein, Madeleine; Harper, Susan; Aerts, Hugo J W L; Catalano, Paul J; Savova, Guergana K; Mak, Raymond H; Bitterman, Danielle S.

NPJ Digit Med ; 7(1): 6, 2024 Jan 11.

Artículo en Inglés | MEDLINE | ID: mdl-38200151

RESUMEN

Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71), and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). Adding LLM-generated synthetic data to training varied across models and architecture, but improved the performance of smaller Flan-T5 models (delta F1 + 0.12 to +0.23). Our best-fine-tuned models outperformed zero- and few-shot performance of ChatGPT-family models in the zero- and few-shot setting, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.

15.

Edge roughness quantifies impact of physician variation on training and performance of deep learning auto-segmentation models for the esophagus.

Yan, Yujie; Kehayias, Christopher; He, John; Aerts, Hugo J W L; Fitzgerald, Kelly J; Kann, Benjamin H; Kozono, David E; Guthier, Christian V; Mak, Raymond H.

Sci Rep ; 14(1): 2536, 2024 01 30.

Artículo en Inglés | MEDLINE | ID: mdl-38291051

RESUMEN

Manual segmentation of tumors and organs-at-risk (OAR) in 3D imaging for radiation-therapy planning is time-consuming and subject to variation between different observers. Artificial intelligence (AI) can assist with segmentation, but challenges exist in ensuring high-quality segmentation, especially for small, variable structures, such as the esophagus. We investigated the effect of variation in segmentation quality and style of physicians for training deep-learning models for esophagus segmentation and proposed a new metric, edge roughness, for evaluating/quantifying slice-to-slice inconsistency. This study includes a real-world cohort of 394 patients who each received radiation therapy (mainly for lung cancer). Segmentation of the esophagus was performed by 8 physicians as part of routine clinical care. We evaluated manual segmentation by comparing the length and edge roughness of segmentations among physicians to analyze inconsistencies. We trained eight multiple- and individual-physician segmentation models in total, based on U-Net architectures and residual backbones. We used the volumetric Dice coefficient to measure the performance for each model. We proposed a metric, edge roughness, to quantify the shift of segmentation among adjacent slices by calculating the curvature of edges of the 2D sagittal- and coronal-view projections. The auto-segmentation model trained on multiple physicians (MD1-7) achieved the highest mean Dice of 73.7 ± 14.8%. The individual-physician model (MD7) with the highest edge roughness (mean ± SD: 0.106 ± 0.016) demonstrated significantly lower volumetric Dice for test cases compared with other individual models (MD7: 58.5 ± 15.8%, MD6: 67.1 ± 16.8%, p < 0.001). A multiple-physician model trained after removing the MD7 data resulted in fewer outliers (e.g., Dice ≤ 40%: 4 cases for MD1-6, 7 cases for MD1-7, Ntotal = 394). While we initially detected this pattern in a single clinician, we validated the edge roughness metric across the entire dataset. The model trained with the lowest-quantile edge roughness (MDER-Q1, Ntrain = 62) achieved significantly higher Dice (Ntest = 270) than the model trained with the highest-quantile ones (MDER-Q4, Ntrain = 62) (MDER-Q1: 67.8 ± 14.8%, MDER-Q4: 62.8 ± 15.7%, p < 0.001). This study demonstrates that there is significant variation in style and quality in manual segmentations in clinical care, and that training AI auto-segmentation algorithms from real-world, clinical datasets may result in unexpectedly under-performing algorithms with the inclusion of outliers. Importantly, this study provides a novel evaluation metric, edge roughness, to quantify physician variation in segmentation which will allow developers to filter clinical training data to optimize model performance.

Asunto(s)

Aprendizaje Profundo , Humanos , Inteligencia Artificial , Tórax , Algoritmos , Tomografía Computarizada por Rayos X , Procesamiento de Imagen Asistido por Computador/métodos

16.

National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence.

Fedorov, Andrey; Longabaugh, William J R; Pot, David; Clunie, David A; Pieper, Steven D; Gibbs, David L; Bridge, Christopher; Herrmann, Markus D; Homeyer, André; Lewis, Rob; Aerts, Hugo J W L; Krishnaswamy, Deepa; Thiriveedhi, Vamsi Krishna; Ciausu, Cosmin; Schacherer, Daniela P; Bontempi, Dennis; Pihl, Todd; Wagner, Ulrike; Farahani, Keyvan; Kim, Erika; Kikinis, Ron.

Radiographics ; 43(12): e230180, 2023 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-37999984

RESUMEN

The remarkable advances of artificial intelligence (AI) technology are revolutionizing established approaches to the acquisition, interpretation, and analysis of biomedical imaging data. Development, validation, and continuous refinement of AI tools requires easy access to large high-quality annotated datasets, which are both representative and diverse. The National Cancer Institute (NCI) Imaging Data Commons (IDC) hosts large and diverse publicly available cancer image data collections. By harmonizing all data based on industry standards and colocalizing it with analysis and exploration resources, the IDC aims to facilitate the development, validation, and clinical translation of AI tools and address the well-documented challenges of establishing reproducible and transparent AI processing pipelines. Balanced use of established commercial products with open-source solutions, interconnected by standard interfaces, provides value and performance, while preserving sufficient agility to address the evolving needs of the research community. Emphasis on the development of tools, use cases to demonstrate the utility of uniform data representation, and cloud-based analysis aim to ease adoption and help define best practices. Integration with other data in the broader NCI Cancer Research Data Commons infrastructure opens opportunities for multiomics studies incorporating imaging data to further empower the research community to accelerate breakthroughs in cancer detection, diagnosis, and treatment. Published under a CC BY 4.0 license.

Asunto(s)

Inteligencia Artificial , Neoplasias , Estados Unidos , Humanos , National Cancer Institute (U.S.) , Reproducibilidad de los Resultados , Diagnóstico por Imagen , Multiómica , Neoplasias/diagnóstico por imagen

17.

Automated temporalis muscle quantification and growth charts for children through adulthood.

Zapaishchykova, Anna; Liu, Kevin X; Saraf, Anurag; Ye, Zezhong; Catalano, Paul J; Benitez, Viviana; Ravipati, Yashwanth; Jain, Arnav; Huang, Julia; Hayat, Hasaan; Likitlersuang, Jirapat; Vajapeyam, Sridhar; Chopra, Rishi B; Familiar, Ariana M; Nabavidazeh, Ali; Mak, Raymond H; Resnick, Adam C; Mueller, Sabine; Cooney, Tabitha M; Haas-Kogan, Daphne A; Poussaint, Tina Y; Aerts, Hugo J W L; Kann, Benjamin H.

Nat Commun ; 14(1): 6863, 2023 11 09.

Artículo en Inglés | MEDLINE | ID: mdl-37945573

RESUMEN

Lean muscle mass (LMM) is an important aspect of human health. Temporalis muscle thickness is a promising LMM marker but has had limited utility due to its unknown normal growth trajectory and reference ranges and lack of standardized measurement. Here, we develop an automated deep learning pipeline to accurately measure temporalis muscle thickness (iTMT) from routine brain magnetic resonance imaging (MRI). We apply iTMT to 23,876 MRIs of healthy subjects, ages 4 through 35, and generate sex-specific iTMT normal growth charts with percentiles. We find that iTMT was associated with specific physiologic traits, including caloric intake, physical activity, sex hormone levels, and presence of malignancy. We validate iTMT across multiple demographic groups and in children with brain tumors and demonstrate feasibility for individualized longitudinal monitoring. The iTMT pipeline provides unprecedented insights into temporalis muscle growth during human development and enables the use of LMM tracking to inform clinical decision-making.

Asunto(s)

Gráficos de Crecimiento , Músculo Temporal , Masculino , Femenino , Humanos , Niño , Músculo Temporal/diagnóstico por imagen , Músculo Temporal/patología

18.

Image based prognosis in head and neck cancer using convolutional neural networks: a case study in reproducibility and optimization.

Mateus, Pedro; Volmer, Leroy; Wee, Leonard; Aerts, Hugo J W L; Hoebers, Frank; Dekker, Andre; Bermejo, Inigo.

Sci Rep ; 13(1): 18176, 2023 10 24.

Artículo en Inglés | MEDLINE | ID: mdl-37875663

RESUMEN

In the past decade, there has been a sharp increase in publications describing applications of convolutional neural networks (CNNs) in medical image analysis. However, recent reviews have warned of the lack of reproducibility of most such studies, which has impeded closer examination of the models and, in turn, their implementation in healthcare. On the other hand, the performance of these models is highly dependent on decisions on architecture and image pre-processing. In this work, we assess the reproducibility of three studies that use CNNs for head and neck cancer outcome prediction by attempting to reproduce the published results. In addition, we propose a new network structure and assess the impact of image pre-processing and model selection criteria on performance. We used two publicly available datasets: one with 298 patients for training and validation and another with 137 patients from a different institute for testing. All three studies failed to report elements required to reproduce their results thoroughly, mainly the image pre-processing steps and the random seed. Our model either outperforms or achieves similar performance to the existing models with considerably fewer parameters. We also observed that the pre-processing efforts significantly impact the model's performance and that some model selection criteria may lead to suboptimal models. Although there have been improvements in the reproducibility of deep learning models, our work suggests that wider implementation of reporting standards is required to avoid a reproducibility crisis.

Asunto(s)

Neoplasias de Cabeza y Cuello , Redes Neurales de la Computación , Humanos , Reproducibilidad de los Resultados , Neoplasias de Cabeza y Cuello/diagnóstico por imagen , Procesamiento de Imagen Asistido por Computador/métodos , Pronóstico

19.

Decoding biological age from face photographs using deep learning.

Zalay, Osbert; Bontempi, Dennis; Bitterman, Danielle S; Birkbak, Nicolai; Shyr, Derek; Haugg, Fridolin; Qian, Jack M; Roberts, Hannah; Perni, Subha; Prudente, Vasco; Pai, Suraj; Dekker, Andre; Haibe-Kains, Benjamin; Guthier, Christian; Balboni, Tracy; Warren, Laura; Krishan, Monica; Kann, Benjamin H; Swanton, Charles; Ruysscher, Dirk De; Mak, Raymond H; Aerts, Hugo Jwl.

medRxiv ; 2023 Sep 12.

Artículo en Inglés | MEDLINE | ID: mdl-37745558

RESUMEN

Because humans age at different rates, a person's physical appearance may yield insights into their biological age and physiological health more reliably than their chronological age. In medicine, however, appearance is incorporated into medical judgments in a subjective and non-standardized fashion. In this study, we developed and validated FaceAge, a deep learning system to estimate biological age from easily obtainable and low-cost face photographs. FaceAge was trained on data from 58,851 healthy individuals, and clinical utility was evaluated on data from 6,196 patients with cancer diagnoses from two institutions in the United States and The Netherlands. To assess the prognostic relevance of FaceAge estimation, we performed Kaplan Meier survival analysis. To test a relevant clinical application of FaceAge, we assessed the performance of FaceAge in end-of-life patients with metastatic cancer who received palliative treatment by incorporating FaceAge into clinical prediction models. We found that, on average, cancer patients look older than their chronological age, and looking older is correlated with worse overall survival. FaceAge demonstrated significant independent prognostic performance in a range of cancer types and stages. We found that FaceAge can improve physicians' survival predictions in incurable patients receiving palliative treatments, highlighting the clinical utility of the algorithm to support end-of-life decision-making. FaceAge was also significantly associated with molecular mechanisms of senescence through gene analysis, while age was not. These findings may extend to diseases beyond cancer, motivating using deep learning algorithms to translate a patient's visual appearance into objective, quantitative, and clinically useful measures.

20.

Foundation Models for Quantitative Biomarker Discovery in Cancer Imaging.

Pai, Suraj; Bontempi, Dennis; Prudente, Vasco; Hadzic, Ibrahim; Sokac, Mateo; Chaunzwa, Tafadzwa L; Bernatz, Simon; Hosny, Ahmed; Mak, Raymond H; Birkbak, Nicolai J; Aerts, Hugo Jwl.

medRxiv ; 2023 Sep 05.

Artículo en Inglés | MEDLINE | ID: mdl-37732237

RESUMEN

Foundation models represent a recent paradigm shift in deep learning, where a single large-scale model trained on vast amounts of data can serve as the foundation for various downstream tasks. Foundation models are generally trained using self-supervised learning and excel in reducing the demand for training samples in downstream applications. This is especially important in medicine, where large labeled datasets are often scarce. Here, we developed a foundation model for imaging biomarker discovery by training a convolutional encoder through self-supervised learning using a comprehensive dataset of 11,467 radiographic lesions. The foundation model was evaluated in distinct and clinically relevant applications of imaging-based biomarkers. We found that they facilitated better and more efficient learning of imaging biomarkers and yielded task-specific models that significantly outperformed their conventional supervised counterparts on downstream tasks. The performance gain was most prominent when training dataset sizes were very limited. Furthermore, foundation models were more stable to input and inter-reader variations and showed stronger associations with underlying biology. Our results demonstrate the tremendous potential of foundation models in discovering novel imaging biomarkers that may extend to other clinical use cases and can accelerate the widespread translation of imaging biomarkers into clinical settings.

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA