Results 1 - 20 of 169
1.
JAMA Oncol ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38780929

ABSTRACT

Importance: The association between body composition (BC) and cancer outcomes is complex and incompletely understood. Previous research in non-small-cell lung cancer (NSCLC) has been limited to small, single-institution studies and yielded promising, albeit heterogeneous, results. Objectives: To evaluate the association of BC with oncologic outcomes in patients receiving immunotherapy for advanced or metastatic NSCLC. Design, Setting, and Participants: This comprehensive multicohort analysis included clinical data from cohorts treated at the Dana-Farber Brigham Cancer Center (DFBCC) with immunotherapy, alone or in combination with chemotherapy, and prospectively collected data from the phase 1/2 Study 1108 and the chemotherapy arm of the phase 3 MYSTIC trial. Baseline and follow-up computed tomography (CT) scans were collected and analyzed using deep neural networks for automatic L3 slice selection and body compartment segmentation (skeletal muscle [SM], subcutaneous adipose tissue [SAT], and visceral adipose tissue). Outcomes were compared based on baseline BC measures or their change at the first follow-up scan. The data were analyzed between July 2022 and April 2023. Main Outcomes and Measures: Hazard ratios (HRs) for the association of BC measurements with overall survival (OS) and progression-free survival (PFS). Results: A total of 1791 patients (878 women [49%]) with NSCLC were analyzed, of whom 487 (27.2%) received chemoimmunotherapy at DFBCC (DFBCC-CIO), 825 (46.1%) received immune checkpoint inhibitor (ICI) monotherapy at DFBCC (DFBCC-IO), 222 (12.4%) were treated with durvalumab monotherapy on Study 1108, and 257 (14.3%) were treated with chemotherapy on MYSTIC; median (IQR) ages were 65 (58-74), 66 (57-71), 65 (26-87), and 63 (30-84) years, respectively.
A loss in SM mass, as indicated by a change in the L3 SM area, was associated with worse oncologic outcome across patient groups (HR, 0.59 [95% CI, 0.43-0.81] and 0.61 [95% CI, 0.47-0.79] for OS and PFS, respectively, in DFBCC-CIO; HR, 0.74 [95% CI, 0.60-0.91] for OS in DFBCC-IO; HR, 0.46 [95% CI, 0.33-0.64] and 0.47 [95% CI, 0.34-0.64] for OS and PFS, respectively, in Study 1108; HR, 0.76 [95% CI, 0.61-0.96] for PFS in the MYSTIC trial). This association was most prominent among male patients, with a nonsignificant association among female patients in the MYSTIC trial and DFBCC-CIO cohorts on Kaplan-Meier analysis. An increase of more than 5% in SAT density, as quantified by the average CT attenuation in Hounsfield units of the SAT compartment, was associated with poorer OS in 3 patient cohorts (HR, 0.61 [95% CI, 0.43-0.86] for DFBCC-CIO; HR, 0.62 [95% CI, 0.49-0.79] for DFBCC-IO; and HR, 0.56 [95% CI, 0.40-0.77] for Study 1108). The change in SAT density was also associated with PFS for DFBCC-CIO (HR, 0.73; 95% CI, 0.54-0.97). This was primarily observed in female patients on Kaplan-Meier analysis. Conclusions and Relevance: The results of this multicohort study suggest that loss in SM mass during systemic therapy for NSCLC is a marker of poor outcomes, especially in male patients. SAT density changes are also associated with prognosis, particularly in female patients. Automated CT-derived BC measurements should be considered in determining NSCLC prognosis.
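The SAT density measure described above (mean CT attenuation in Hounsfield units within the segmented compartment, with a >5% increase flagged at follow-up) can be sketched in a few lines. The function names and toy arrays below are illustrative assumptions, not the study's pipeline:

```python
import numpy as np

def mean_hu(ct_slice, mask):
    """Mean CT attenuation (HU) within a segmented compartment."""
    return float(ct_slice[mask.astype(bool)].mean())

def density_change_flag(baseline_hu, followup_hu, threshold=0.05):
    """Flag a relative density increase above the threshold (5% in the study).

    Division by abs(baseline_hu) matters because SAT attenuation is negative:
    an 'increase in density' means the HU value becomes less negative.
    """
    change = (followup_hu - baseline_hu) / abs(baseline_hu)
    return change > threshold

# Toy example: a 4x4 slice of subcutaneous fat around -100 HU
ct = np.full((4, 4), -100.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
baseline = mean_hu(ct, mask)          # -100.0
followup = mean_hu(ct + 10.0, mask)   # -90.0, a 10% density increase
print(density_change_flag(baseline, followup))  # True
```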

3.
Nat Mach Intell ; 6(3): 354-367, 2024.
Article in English | MEDLINE | ID: mdl-38523679

ABSTRACT

Foundation models in deep learning are characterized by a single large-scale model trained on vast amounts of data serving as the foundation for various downstream tasks. Foundation models are generally trained using self-supervised learning and excel in reducing the demand for training samples in downstream applications. This is especially important in medicine, where large labelled datasets are often scarce. Here, we developed a foundation model for cancer imaging biomarker discovery by training a convolutional encoder through self-supervised learning using a comprehensive dataset of 11,467 radiographic lesions. The foundation model was evaluated in distinct and clinically relevant applications of cancer imaging-based biomarkers. We found that it facilitated better and more efficient learning of imaging biomarkers and yielded task-specific models that significantly outperformed conventional supervised and other state-of-the-art pretrained implementations on downstream tasks, especially when training dataset sizes were very limited. Furthermore, the foundation model was more stable to input variations and showed strong associations with underlying biology. Our results demonstrate the tremendous potential of foundation models in discovering new imaging biomarkers that may extend to other clinical use cases and can accelerate the widespread translation of imaging biomarkers into clinical settings.

4.
Ann Intern Med ; 177(4): 409-417, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38527287

ABSTRACT

BACKGROUND: Guidelines for primary prevention of atherosclerotic cardiovascular disease (ASCVD) recommend a risk calculator (ASCVD risk score) to estimate 10-year risk for major adverse cardiovascular events (MACE). Because the necessary inputs are often missing, complementary approaches for opportunistic risk assessment are desirable. OBJECTIVE: To develop and test a deep-learning model (CXR CVD-Risk) that estimates 10-year risk for MACE from a routine chest radiograph (CXR) and compare its performance with that of the traditional ASCVD risk score for implications for statin eligibility. DESIGN: Risk prediction study. SETTING: Outpatients potentially eligible for primary cardiovascular prevention. PARTICIPANTS: The CXR CVD-Risk model was developed using data from a cancer screening trial. It was externally validated in 8869 outpatients with unknown ASCVD risk because of missing inputs to calculate the ASCVD risk score and in 2132 outpatients with known risk whose ASCVD risk score could be calculated. MEASUREMENTS: 10-year MACE predicted by CXR CVD-Risk versus the ASCVD risk score. RESULTS: Among 8869 outpatients with unknown ASCVD risk, those with a risk of 7.5% or higher as predicted by CXR CVD-Risk had higher 10-year risk for MACE after adjustment for risk factors (adjusted hazard ratio [HR], 1.73 [95% CI, 1.47 to 2.03]). In the additional 2132 outpatients with known ASCVD risk, CXR CVD-Risk predicted MACE beyond the traditional ASCVD risk score (adjusted HR, 1.88 [CI, 1.24 to 2.85]). LIMITATION: Retrospective study design using electronic medical records. CONCLUSION: On the basis of a single CXR, CXR CVD-Risk predicts 10-year MACE beyond the clinical standard and may help identify individuals at high risk whose ASCVD risk score cannot be calculated because of missing data. PRIMARY FUNDING SOURCE: None.


Subject(s)
Atherosclerosis , Cardiovascular Diseases , Deep Learning , Humans , Risk Factors , Cardiovascular Diseases/diagnostic imaging , Cardiovascular Diseases/epidemiology , Retrospective Studies , Risk Assessment , Heart Disease Risk Factors
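The study's use case, falling back to an image-derived risk estimate when the ASCVD score's inputs are missing, suggests a simple decision flow. Everything below (function names, the placeholder `ascvd_score`, the returned fields) is an illustrative assumption, not the authors' software, and the 7.5% cutoff simply mirrors the guideline threshold quoted above; this is not clinical advice:

```python
def assess_statin_consideration(ascvd_inputs, cxr_risk, threshold=0.075):
    """Opportunistic screening sketch: use the ASCVD score when its inputs
    exist, otherwise fall back to the CXR-based 10-year MACE estimate."""
    if ascvd_inputs is not None:
        risk, source = ascvd_score(ascvd_inputs), "ascvd"
    else:
        risk, source = cxr_risk, "cxr"
    return {"risk": risk, "source": source, "consider_statin": risk >= threshold}

def ascvd_score(inputs):
    # Placeholder: the real pooled cohort equations need age, sex,
    # lipids, blood pressure, smoking and diabetes status, etc.
    return inputs["precomputed_risk"]

print(assess_statin_consideration(None, cxr_risk=0.09))
# {'risk': 0.09, 'source': 'cxr', 'consider_statin': True}
```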
5.
Radiol Artif Intell ; 6(3): e230333, 2024 May.
Article in English | MEDLINE | ID: mdl-38446044

ABSTRACT

Purpose To develop and externally test a scan-to-prediction deep learning pipeline for noninvasive, MRI-based BRAF mutational status classification for pediatric low-grade glioma. Materials and Methods This retrospective study included two pediatric low-grade glioma datasets with linked genomic and diagnostic T2-weighted MRI data of patients: Dana-Farber/Boston Children's Hospital (development dataset, n = 214 [113 (52.8%) male; 104 (48.6%) BRAF wild type, 60 (28.0%) BRAF fusion, and 50 (23.4%) BRAF V600E]) and the Children's Brain Tumor Network (external testing, n = 112 [55 (49.1%) male; 35 (31.2%) BRAF wild type, 60 (53.6%) BRAF fusion, and 17 (15.2%) BRAF V600E]). A deep learning pipeline was developed to classify BRAF mutational status (BRAF wild type vs BRAF fusion vs BRAF V600E) via a two-stage process: (a) three-dimensional tumor segmentation and extraction of axial tumor images and (b) section-wise, deep learning-based classification of mutational status. Knowledge-transfer and self-supervised approaches were investigated to prevent model overfitting, with a primary end point of the area under the receiver operating characteristic curve (AUC). To enhance model interpretability, a novel metric, center of mass distance, was developed to quantify the model attention around the tumor. Results A combination of transfer learning from a pretrained medical imaging-specific network and self-supervised label cross-training (TransferX) coupled with consensus logic yielded the highest classification performance with an AUC of 0.82 (95% CI: 0.72, 0.91), 0.87 (95% CI: 0.61, 0.97), and 0.85 (95% CI: 0.66, 0.95) for BRAF wild type, BRAF fusion, and BRAF V600E, respectively, on internal testing. On external testing, the pipeline yielded an AUC of 0.72 (95% CI: 0.64, 0.86), 0.78 (95% CI: 0.61, 0.89), and 0.72 (95% CI: 0.64, 0.88) for BRAF wild type, BRAF fusion, and BRAF V600E, respectively. 
Conclusion Transfer learning and self-supervised cross-training improved classification performance and generalizability for noninvasive pediatric low-grade glioma mutational status prediction in a limited data scenario. Keywords: Pediatrics, MRI, CNS, Brain/Brain Stem, Oncology, Feature Detection, Diagnosis, Supervised Learning, Transfer Learning, Convolutional Neural Network (CNN) Supplemental material is available for this article. © RSNA, 2024.


Subject(s)
Brain Neoplasms , Glioma , Humans , Child , Male , Female , Brain Neoplasms/diagnostic imaging , Retrospective Studies , Proto-Oncogene Proteins B-raf/genetics , Glioma/diagnosis , Machine Learning
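The center of mass distance metric introduced above is not fully specified in the abstract, but its core idea, measuring how far the model's attention lies from the tumor, can be sketched from an attention map and a tumor mask. The implementation below is a generic reading of that idea, not the authors' code:

```python
import numpy as np

def center_of_mass(weights):
    """Center of mass of a nonnegative map (e.g., a saliency map or mask)."""
    total = weights.sum()
    idx = np.indices(weights.shape)
    return np.array([(idx[d] * weights).sum() / total
                     for d in range(weights.ndim)])

def com_distance(attention_map, tumor_mask):
    """Euclidean distance between the attention and tumor centers of mass.
    Smaller values suggest the model attends to the tumor region."""
    return float(np.linalg.norm(center_of_mass(attention_map)
                                - center_of_mass(tumor_mask.astype(float))))

# Toy example: attention centered exactly on the tumor gives distance 0
mask = np.zeros((8, 8))
mask[3:5, 3:5] = 1
print(com_distance(mask, mask))  # 0.0
```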
6.
Commun Med (Lond) ; 4(1): 44, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38480863

ABSTRACT

BACKGROUND: Heavy smokers are at increased risk for cardiovascular disease and may benefit from individualized risk quantification using routine lung cancer screening chest computed tomography. We investigated the prognostic value of deep learning-based automated epicardial adipose tissue quantification and compared it to established cardiovascular risk factors and coronary artery calcium. METHODS: We investigated the prognostic value of automated epicardial adipose tissue quantification in heavy smokers enrolled in the National Lung Screening Trial and followed for a median of 12.3 (11.9-12.8) years. The epicardial adipose tissue was segmented and quantified on non-ECG-synchronized, non-contrast low-dose chest computed tomography scans using a validated deep-learning algorithm. Multivariable survival regression analyses were then utilized to determine the associations of epicardial adipose tissue volume and density with all-cause and cardiovascular mortality (myocardial infarction and stroke). RESULTS: Here we show in 24,090 adult heavy smokers (59% men; 61 ± 5 years) that epicardial adipose tissue volume and density are independently associated with all-cause (adjusted hazard ratios: 1.10 and 1.38; P < 0.001) and cardiovascular mortality (adjusted hazard ratios: 1.14 and 1.78; P < 0.001) beyond demographics, clinical risk factors, body habitus, level of education, and coronary artery calcium score. CONCLUSIONS: Our findings suggest that automated assessment of epicardial adipose tissue from low-dose lung cancer screening images offers prognostic value in heavy smokers, with potential implications for cardiovascular risk stratification in this high-risk population.


Heavy smokers are at increased risk of poor health outcomes, particularly outcomes related to cardiovascular disease. We explore how fat surrounding the heart, known as epicardial adipose tissue, may be an indicator of the health of heavy smokers. We use an artificial intelligence system to measure this heart fat on chest scans of heavy smokers taken during a lung cancer screening trial, and we follow their health for 12 years. We find that higher amounts and denser epicardial adipose tissue are linked to an increased risk of death from any cause and, specifically, from heart-related issues, even when considering other health factors. This suggests that measuring epicardial adipose tissue during lung cancer screenings could be a valuable tool for identifying heavy smokers at greater risk of heart problems and death, possibly helping to guide their medical management and improve their cardiovascular health.
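Once a deep-learning model has segmented the epicardial fat, the volume quantification itself reduces to counting voxels and scaling by voxel size. A minimal sketch (the spacing values and array shapes are made up for illustration):

```python
import numpy as np

def tissue_volume_ml(mask, spacing_mm):
    """Volume of a segmented tissue in millilitres: voxel count times the
    per-voxel volume derived from the scan's voxel spacing in mm."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return mask.sum() * voxel_mm3 / 1000.0  # mm^3 -> mL

# Toy mask: a 4x4x4 block of 64 voxels at 1.0 x 1.0 x 2.5 mm spacing
mask = np.zeros((10, 10, 10), dtype=bool)
mask[2:6, 2:6, 2:6] = True
print(tissue_volume_ml(mask, (1.0, 1.0, 2.5)))  # 64 * 2.5 mm^3 = 0.16 mL
```

The mean attenuation (density) of the same compartment would be the average HU of the CT values under the mask, as in the body-composition entry above.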

7.
J Am Med Inform Assoc ; 31(4): 940-948, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38261400

ABSTRACT

OBJECTIVE: Large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates the ChatGPT family of models (GPT-3.5, GPT-4) in biomedical tasks beyond question-answering. MATERIALS AND METHODS: We evaluated model performance with 11 122 samples for two fundamental tasks in the biomedical domain: classification (n = 8676) and reasoning (n = 2446). The first task involves classifying health advice in scientific literature, while the second task is detecting causal relations in biomedical literature. We used 20% of the dataset for prompt development, including zero- and few-shot settings with and without chain-of-thought (CoT). We then evaluated the best prompts from each setting on the remaining dataset, comparing them to models using simple features (bag-of-words [BoW] with logistic regression) and fine-tuned BioBERT models. RESULTS: Fine-tuning BioBERT produced the best classification (F1: 0.800-0.902) and reasoning (F1: 0.851) results. Among LLM approaches, few-shot CoT achieved the best classification (F1: 0.671-0.770) and reasoning (F1: 0.682) results, comparable to the BoW model (F1: 0.602-0.753 and 0.675 for classification and reasoning, respectively). It took 78 h to obtain the best LLM results, compared to 0.078 and 0.008 h for the top-performing BioBERT and BoW models, respectively. DISCUSSION: The simple BoW model performed similarly to the most complex LLM prompting, while prompt engineering required significant investment. CONCLUSION: Despite the excitement around ChatGPT, fine-tuning remained the best strategy for these two fundamental biomedical natural language processing tasks.


Subject(s)
Language , Natural Language Processing
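The F1 comparisons quoted above can be reproduced with a small macro-F1 helper. This is a generic implementation of the standard metric, not the authors' evaluation code, and the labels below are made up:

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score for one class: harmonic mean of precision and recall."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the true label set."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, l) for l in labels) / len(labels)

y_true = ["advice", "advice", "none", "none"]
y_pred = ["advice", "none", "none", "none"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```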
8.
NPJ Digit Med ; 7(1): 6, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38200151

ABSTRACT

Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71) and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The benefit of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (delta F1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot settings, except GPT-4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their predictions when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and in identifying patients who could benefit from resource support.
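The bias probe described above, checking whether predictions change when demographic descriptors are added to otherwise identical text, reduces to a paired flip-rate comparison. A minimal sketch with hypothetical labels (the statistical test the authors applied on top of this is not shown):

```python
def flip_rate(preds_original, preds_modified):
    """Fraction of examples whose prediction changes when demographic
    descriptors are injected into the text: a simple algorithmic-bias probe.
    Lower is better; 0.0 means the model ignored the descriptors."""
    flips = sum(a != b for a, b in zip(preds_original, preds_modified))
    return flips / len(preds_original)

# One of three hypothetical predictions flips after injecting descriptors
print(flip_rate(["housing", "none", "none"],
                ["housing", "employment", "none"]))
```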

9.
Sci Rep ; 14(1): 1933, 2024 01 22.
Article in English | MEDLINE | ID: mdl-38253545

ABSTRACT

Artificial intelligence (AI) techniques are increasingly applied across various domains, favoured by the growing acquisition and public availability of large, complex datasets. Despite this trend, AI publications often suffer from lack of reproducibility and poor generalisation of findings, undermining scientific value and contributing to global research waste. To address these issues, and focusing on the learning aspect of the AI field, we present RENOIR (REpeated random sampliNg fOr machIne leaRning), a modular open-source platform for robust and reproducible machine learning (ML) analysis. RENOIR adopts standardised pipelines for model training and testing, introducing novel elements such as assessing how an algorithm's performance depends on the sample size. Additionally, RENOIR offers automated generation of transparent and usable reports, aiming to enhance the quality and reproducibility of AI studies. To demonstrate the versatility of our tool, we applied it to benchmark datasets from health, computer science, and STEM (Science, Technology, Engineering, and Mathematics) domains. Furthermore, we showcase RENOIR's successful application in recently published studies, where it identified classifiers for SETD2 and TP53 mutation status in cancer. Finally, we present a use case where RENOIR was employed to address a significant pharmacological challenge: predicting drug efficacy. RENOIR is freely available at https://github.com/alebarberis/renoir .


Subject(s)
Algorithms , Artificial Intelligence , Reproducibility of Results , Machine Learning , Benchmarking
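RENOIR's namesake idea, repeated random sampling to characterize performance as a function of training-set size, can be sketched generically. The train/eval callables below are toy stand-ins, not RENOIR's actual R API:

```python
import random

def repeated_random_sampling(data, train_fn, eval_fn, sizes, repeats=10, seed=0):
    """Learning-curve estimate: for each training-set size, repeatedly draw
    a random subset, fit a model, and average the evaluation score."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    curve = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            subset = rng.sample(data, n)
            model = train_fn(subset)
            scores.append(eval_fn(model))
        curve[n] = sum(scores) / len(scores)
    return curve

# Toy model: "training" returns the subset mean; "evaluation" rewards
# estimates close to the true population mean of 0.5
data = [i / 99 for i in range(100)]
curve = repeated_random_sampling(
    data,
    train_fn=lambda s: sum(s) / len(s),
    eval_fn=lambda m: 1.0 - abs(m - 0.5),
    sizes=[5, 50],
)
print(sorted(curve))  # [5, 50]
```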
10.
Sci Data ; 11(1): 25, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38177130

ABSTRACT

Public imaging datasets are critical for the development and evaluation of automated tools in cancer imaging. Unfortunately, many do not include annotations or image-derived features, complicating downstream analysis. Artificial intelligence-based annotation tools have been shown to achieve acceptable performance and can be used to automatically annotate large datasets. As part of the effort to enrich public data available within the NCI Imaging Data Commons (IDC), here we introduce AI-generated annotations for two collections containing computed tomography images of the chest: NSCLC-Radiomics and a subset of the National Lung Screening Trial. Using publicly available AI algorithms, we derived volumetric annotations of thoracic organs-at-risk, their corresponding radiomics features, and slice-level annotations of anatomical landmarks and regions. The resulting annotations are publicly available within IDC, where the DICOM format is used to harmonize the data and achieve FAIR (Findable, Accessible, Interoperable, Reusable) data principles. The annotations are accompanied by cloud-enabled notebooks demonstrating their use. This study reinforces the need for large, publicly accessible curated datasets and demonstrates how AI can aid in cancer imaging.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Artificial Intelligence , Carcinoma, Non-Small-Cell Lung/diagnostic imaging , Lung/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed
11.
Sci Rep ; 14(1): 2536, 2024 01 30.
Article in English | MEDLINE | ID: mdl-38291051

ABSTRACT

Manual segmentation of tumors and organs-at-risk (OAR) in 3D imaging for radiation-therapy planning is time-consuming and subject to variation between different observers. Artificial intelligence (AI) can assist with segmentation, but challenges exist in ensuring high-quality segmentation, especially for small, variable structures, such as the esophagus. We investigated the effect of variation in segmentation quality and style of physicians for training deep-learning models for esophagus segmentation and proposed a new metric, edge roughness, for quantifying slice-to-slice inconsistency. This study included a real-world cohort of 394 patients who each received radiation therapy (mainly for lung cancer). Segmentation of the esophagus was performed by 8 physicians as part of routine clinical care. We evaluated manual segmentation by comparing the length and edge roughness of segmentations among physicians to analyze inconsistencies. We trained eight multiple- and individual-physician segmentation models in total, based on U-Net architectures and residual backbones. We used the volumetric Dice coefficient to measure the performance for each model. The edge roughness metric quantifies the shift of segmentation among adjacent slices by calculating the curvature of the edges of the 2D sagittal- and coronal-view projections. The auto-segmentation model trained on multiple physicians (MD1-7) achieved the highest mean Dice of 73.7 ± 14.8%. The individual-physician model (MD7) with the highest edge roughness (mean ± SD: 0.106 ± 0.016) demonstrated significantly lower volumetric Dice for test cases compared with other individual models (MD7: 58.5 ± 15.8%, MD6: 67.1 ± 16.8%, p < 0.001). A multiple-physician model trained after removing the MD7 data resulted in fewer outliers (e.g., Dice ≤ 40%: 4 cases for MD1-6, 7 cases for MD1-7, Ntotal = 394).
While we initially detected this pattern in a single clinician, we validated the edge roughness metric across the entire dataset. The model trained with the lowest-quantile edge roughness (MDER-Q1, Ntrain = 62) achieved significantly higher Dice (Ntest = 270) than the model trained with the highest-quantile ones (MDER-Q4, Ntrain = 62) (MDER-Q1: 67.8 ± 14.8%, MDER-Q4: 62.8 ± 15.7%, p < 0.001). This study demonstrates that there is significant variation in style and quality in manual segmentations in clinical care, and that training AI auto-segmentation algorithms from real-world, clinical datasets may result in unexpectedly under-performing algorithms with the inclusion of outliers. Importantly, this study provides a novel evaluation metric, edge roughness, to quantify physician variation in segmentation which will allow developers to filter clinical training data to optimize model performance.


Subject(s)
Deep Learning , Humans , Artificial Intelligence , Thorax , Algorithms , Tomography, X-Ray Computed , Image Processing, Computer-Assisted/methods
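The published edge-roughness metric is defined via edge curvature on sagittal and coronal projections and is not reproduced here. As an illustration of the same concern, the sketch below pairs the volumetric Dice coefficient used above with a much cruder slice-to-slice disagreement proxy; the proxy is an assumption for demonstration, not the paper's metric:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks (1.0 = identical)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def slice_inconsistency(mask_3d):
    """Crude proxy for slice-to-slice inconsistency: mean Dice disagreement
    between adjacent axial slices. A perfectly smooth contour scores 0."""
    pairs = zip(mask_3d[:-1], mask_3d[1:])
    return float(np.mean([1.0 - dice(a, b) for a, b in pairs]))

# A perfectly consistent stack of identical slices
smooth = np.ones((4, 5, 5), dtype=bool)
print(dice(smooth[0], smooth[1]))   # 1.0
print(slice_inconsistency(smooth))  # 0.0
```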
12.
Nat Commun ; 14(1): 6863, 2023 11 09.
Article in English | MEDLINE | ID: mdl-37945573

ABSTRACT

Lean muscle mass (LMM) is an important aspect of human health. Temporalis muscle thickness is a promising LMM marker but has had limited utility due to its unknown normal growth trajectory and reference ranges and lack of standardized measurement. Here, we develop an automated deep learning pipeline to accurately measure temporalis muscle thickness (iTMT) from routine brain magnetic resonance imaging (MRI). We apply iTMT to 23,876 MRIs of healthy subjects, ages 4 through 35, and generate sex-specific iTMT normal growth charts with percentiles. We find that iTMT was associated with specific physiologic traits, including caloric intake, physical activity, sex hormone levels, and presence of malignancy. We validate iTMT across multiple demographic groups and in children with brain tumors and demonstrate feasibility for individualized longitudinal monitoring. The iTMT pipeline provides unprecedented insights into temporalis muscle growth during human development and enables the use of LMM tracking to inform clinical decision-making.


Subject(s)
Growth Charts , Temporal Muscle , Male , Female , Humans , Child , Temporal Muscle/diagnostic imaging , Temporal Muscle/pathology
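The basic construction behind a normal growth chart with percentiles, as described above, is to bin measurements by age (and sex) and compute percentile curves per bin. A minimal stdlib sketch with synthetic data; the real iTMT charts use far more sophisticated modeling:

```python
import statistics

def growth_percentiles(measurements, percentiles=(5, 50, 95)):
    """Per-age-bin percentile values from (age, value) pairs: the basic
    construction behind a normal growth chart."""
    by_age = {}
    for age, value in measurements:
        by_age.setdefault(age, []).append(value)
    chart = {}
    for age, values in sorted(by_age.items()):
        # 99 cut points split the data into 100 equal-probability bins
        qs = statistics.quantiles(values, n=100, method="inclusive")
        chart[age] = {p: qs[p - 1] for p in percentiles}
    return chart

# Synthetic thickness values 1..100 at age 10
data = [(10, v) for v in range(1, 101)]
chart = growth_percentiles(data)
print(chart[10][50])  # 50.5, the median of 1..100
```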
13.
Radiographics ; 43(12): e230180, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37999984

ABSTRACT

The remarkable advances of artificial intelligence (AI) technology are revolutionizing established approaches to the acquisition, interpretation, and analysis of biomedical imaging data. Development, validation, and continuous refinement of AI tools requires easy access to large high-quality annotated datasets, which are both representative and diverse. The National Cancer Institute (NCI) Imaging Data Commons (IDC) hosts large and diverse publicly available cancer image data collections. By harmonizing all data based on industry standards and colocalizing it with analysis and exploration resources, the IDC aims to facilitate the development, validation, and clinical translation of AI tools and address the well-documented challenges of establishing reproducible and transparent AI processing pipelines. Balanced use of established commercial products with open-source solutions, interconnected by standard interfaces, provides value and performance, while preserving sufficient agility to address the evolving needs of the research community. Emphasis on the development of tools, use cases to demonstrate the utility of uniform data representation, and cloud-based analysis aim to ease adoption and help define best practices. Integration with other data in the broader NCI Cancer Research Data Commons infrastructure opens opportunities for multiomics studies incorporating imaging data to further empower the research community to accelerate breakthroughs in cancer detection, diagnosis, and treatment. Published under a CC BY 4.0 license.


Subject(s)
Artificial Intelligence , Neoplasms , United States , Humans , National Cancer Institute (U.S.) , Reproducibility of Results , Diagnostic Imaging , Multiomics , Neoplasms/diagnostic imaging
14.
Sci Rep ; 13(1): 18176, 2023 10 24.
Article in English | MEDLINE | ID: mdl-37875663

ABSTRACT

In the past decade, there has been a sharp increase in publications describing applications of convolutional neural networks (CNNs) in medical image analysis. However, recent reviews have warned of the lack of reproducibility of most such studies, which has impeded closer examination of the models and, in turn, their implementation in healthcare. On the other hand, the performance of these models is highly dependent on decisions on architecture and image pre-processing. In this work, we assess the reproducibility of three studies that use CNNs for head and neck cancer outcome prediction by attempting to reproduce the published results. In addition, we propose a new network structure and assess the impact of image pre-processing and model selection criteria on performance. We used two publicly available datasets: one with 298 patients for training and validation and another with 137 patients from a different institute for testing. All three studies failed to report elements required to reproduce their results thoroughly, mainly the image pre-processing steps and the random seed. Our model either outperforms or achieves similar performance to the existing models with considerably fewer parameters. We also observed that the pre-processing efforts significantly impact the model's performance and that some model selection criteria may lead to suboptimal models. Although there have been improvements in the reproducibility of deep learning models, our work suggests that wider implementation of reporting standards is required to avoid a reproducibility crisis.


Subject(s)
Head and Neck Neoplasms , Neural Networks, Computer , Humans , Reproducibility of Results , Head and Neck Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted/methods , Prognosis
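One of the missing reproducibility elements called out above is the random seed. A typical, generic way to pin down the seeds a Python training run depends on (deep-learning frameworks add their own seed calls on top of this):

```python
import os
import random

import numpy as np

def set_global_seed(seed):
    """Fix the stdlib and NumPy random seeds. Reporting this value, along
    with exact pre-processing steps, is the kind of detail the review
    found missing from published studies."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

set_global_seed(42)
a = np.random.rand(3)
set_global_seed(42)
b = np.random.rand(3)
print(np.allclose(a, b))  # True: identical draws after reseeding
```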
15.
medRxiv ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-37609311

ABSTRACT

Purpose: To develop and externally validate a scan-to-prediction deep-learning pipeline for noninvasive, MRI-based BRAF mutational status classification for pediatric low-grade glioma (pLGG). Materials and Methods: We conducted a retrospective study of two pLGG datasets with linked genomic and diagnostic T2-weighted MRI of patients: BCH (development dataset, n=214 [60 (28%) BRAF fusion, 50 (23%) BRAF V600E, 104 (49%) wild-type]) and the Children's Brain Tumor Network (CBTN) (external validation, n=112 [60 (53%) BRAF fusion, 17 (15%) BRAF V600E, 35 (32%) wild-type]). We developed a deep learning pipeline to classify BRAF mutational status (V600E vs. fusion vs. wild-type) via a two-stage process: 1) 3D tumor segmentation and extraction of axial tumor images, and 2) slice-wise, deep learning-based classification of mutational status. We investigated knowledge-transfer and self-supervised approaches to prevent model overfitting with a primary endpoint of the area under the receiver operating characteristic curve (AUC). To enhance model interpretability, we developed a novel metric, COMDist, that quantifies the accuracy of model attention around the tumor. Results: A combination of transfer learning from a pretrained medical imaging-specific network and self-supervised label cross-training (TransferX) coupled with consensus logic yielded the highest macro-average AUC (0.82 [95% CI: 0.70-0.90]) and accuracy (77%) on internal validation, with an AUC improvement of +17.7% and a COMDist improvement of +6.4% versus training from scratch. On external validation, the TransferX model yielded an AUC of 0.73 (95% CI: 0.68-0.88) and an accuracy of 75%. Conclusion: Transfer learning and self-supervised cross-training improved classification performance and generalizability for noninvasive pLGG mutational status prediction in a limited data scenario.

16.
JAMA Oncol ; 9(10): 1459-1462, 2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37615976

ABSTRACT

This survey study examines the performance of a large language model chatbot in providing cancer treatment recommendations that are concordant with National Comprehensive Cancer Network guidelines.


Subject(s)
Artificial Intelligence , Neoplasms , Humans , Neoplasms/therapy
17.
JAMA Netw Open ; 6(8): e2328280, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37561460

ABSTRACT

Importance: Sarcopenia is an established prognostic factor in patients with head and neck squamous cell carcinoma (HNSCC); the quantification of sarcopenia assessed by imaging is typically achieved through the skeletal muscle index (SMI), which can be derived from cervical skeletal muscle segmentation and cross-sectional area. However, manual muscle segmentation is labor intensive, prone to interobserver variability, and impractical for large-scale clinical use. Objective: To develop and externally validate a fully automated image-based deep learning platform for cervical vertebral muscle segmentation and SMI calculation and evaluate associations with survival and treatment toxicity outcomes. Design, Setting, and Participants: For this prognostic study, a model development data set was curated from publicly available and deidentified data from patients with HNSCC treated at MD Anderson Cancer Center between January 1, 2003, and December 31, 2013. A total of 899 patients undergoing primary radiation for HNSCC with abdominal computed tomography scans and complete clinical information were selected. An external validation data set was retrospectively collected from patients undergoing primary radiation therapy between January 1, 1996, and December 31, 2013, at Brigham and Women's Hospital. The data analysis was performed between May 1, 2022, and March 31, 2023. Exposure: C3 vertebral skeletal muscle segmentation during radiation therapy for HNSCC. Main Outcomes and Measures: Overall survival and treatment toxicity outcomes of HNSCC. Results: The total patient cohort comprised 899 patients with HNSCC (median [range] age, 58 [24-90] years; 140 female [15.6%] and 755 male [84.0%]). Dice similarity coefficients for the validation set (n = 96) and internal test set (n = 48) were 0.90 (95% CI, 0.90-0.91) and 0.90 (95% CI, 0.89-0.91), respectively, with a mean 96.2% acceptable rate between 2 reviewers on external clinical testing (n = 377). 
Estimated cross-sectional area and SMI values correlated strongly with manually annotated values (Pearson r = 0.99; P < .001) across data sets. On multivariable Cox proportional hazards regression, SMI-derived sarcopenia was associated with worse overall survival (hazard ratio, 2.05; 95% CI, 1.04-4.04; P = .04) and longer feeding tube duration (median [range], 162 [6-1477] vs 134 [15-1255] days; hazard ratio, 0.66; 95% CI, 0.48-0.89; P = .006) than no sarcopenia. Conclusions and Relevance: This prognostic study's findings show external validation of a fully automated deep learning pipeline that accurately measures sarcopenia in HNSCC and an association with important disease outcomes. The pipeline could enable the integration of sarcopenia assessment into clinical decision making for individuals with HNSCC.


Subject(s)
Deep Learning , Head and Neck Neoplasms , Sarcopenia , Humans , Male , Female , Middle Aged , Squamous Cell Carcinoma of Head and Neck/diagnostic imaging , Retrospective Studies , Sarcopenia/diagnostic imaging , Sarcopenia/complications , Head and Neck Neoplasms/complications , Head and Neck Neoplasms/diagnostic imaging
18.
medRxiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37425854

ABSTRACT

Purpose: Artificial intelligence (AI)-automated tumor delineation for pediatric gliomas would enable real-time volumetric evaluation to support diagnosis, treatment response assessment, and clinical decision-making. Auto-segmentation algorithms for pediatric tumors are rare because of limited data availability, and existing algorithms have yet to demonstrate clinical translation. Methods: We leveraged two datasets from a national brain tumor consortium (n=184) and a pediatric cancer center (n=100) to develop, externally validate, and clinically benchmark deep learning neural networks for pediatric low-grade glioma (pLGG) segmentation using a novel in-domain, stepwise transfer learning approach. The best model [via Dice similarity coefficient (DSC)] was externally validated and subject to randomized, blinded evaluation by three expert clinicians, who assessed the clinical acceptability of expert- and AI-generated segmentations via 10-point Likert scales and Turing tests. Results: The best AI model used in-domain, stepwise transfer learning and outperformed the baseline model (median DSC: 0.877 [IQR 0.715-0.914] vs. 0.812 [IQR 0.559-0.888]; p<0.05). On external testing (n=60), the AI model yielded accuracy comparable to inter-expert agreement (median DSC: 0.834 [IQR 0.726-0.901] vs. 0.861 [IQR 0.795-0.905], p=0.13). On clinical benchmarking (n=100 scans, 300 segmentations from 3 experts), the experts rated the AI model higher on average than the other experts (median Likert rating: 9 [IQR 7-9] vs. 7 [IQR 7-9]; p<0.05 for each). Additionally, the AI segmentations had significantly higher (p<0.05) overall acceptability than the experts on average (80.2% vs. 65.4%). Experts correctly predicted the origins of AI segmentations in an average of 26.0% of cases. Conclusions: Stepwise transfer learning enabled expert-level, automated pediatric brain tumor auto-segmentation and volumetric measurement with a high level of clinical acceptability. 
This approach may enable development and translation of AI imaging segmentation algorithms in limited data scenarios.
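The "median DSC [IQR]" figures reported above summarize a distribution of per-scan Dice scores. A minimal sketch of that summary, using hypothetical scores (the real per-scan values are not given in the abstract):

```python
# Summarize per-scan Dice scores as "median [IQR]", the form used in
# reports such as "median DSC: 0.877 [IQR 0.715-0.914]". Scores below
# are hypothetical placeholders.
import statistics

def summarize_dsc(scores):
    # "inclusive" quartiles interpolate within the observed data range
    q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return med, (q1, q3)

scores = [0.62, 0.71, 0.80, 0.85, 0.88, 0.90, 0.93]
med, (q1, q3) = summarize_dsc(scores)
print(f"median DSC: {med:.2f} [IQR {q1:.2f}-{q3:.2f}]")
```

Comparisons between two such score distributions (e.g., transfer-learned vs. baseline model) are typically made with a nonparametric test, since DSC values are bounded and often skewed.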

19.
Cancer Res Commun ; 3(6): 1140-1151, 2023 06.
Article in English | MEDLINE | ID: mdl-37397861

ABSTRACT

Artificial intelligence (AI) and machine learning (ML) are becoming critical in developing and deploying personalized medicine and targeted clinical trials. Recent advances in ML have enabled the integration of wider ranges of data, including both medical records and imaging (radiomics). However, developing prognostic models is complex: no modeling strategy is universally superior, and validation requires large and diverse datasets to demonstrate that a model developed from one dataset (regardless of method) generalizes to other datasets, both internally and externally. Using a retrospective dataset of 2,552 patients from a single institution and a strict evaluation framework that included external validation on three external patient cohorts (873 patients), we crowdsourced the development of ML models to predict overall survival in head and neck cancer (HNC) using electronic medical records (EMR) and pretreatment radiological images. To assess the relative contributions of radiomics in predicting HNC prognosis, we compared 12 different models using imaging and/or EMR data. The model with the highest accuracy used multitask learning on clinical data and tumor volume, achieving high prognostic accuracy for 2-year and lifetime survival prediction and outperforming models relying on clinical data only, engineered radiomics, or complex deep neural network architectures. However, when we attempted to extend the best performing models from this large training dataset to other institutions, we observed significant reductions in model performance on those datasets, highlighting the importance of detailed population-based reporting for AI/ML model utility and stronger validation frameworks. 
We have developed highly prognostic models for overall survival in HNC using EMRs and pretreatment radiological images based on a large, retrospective dataset of 2,552 patients from our institution. Diverse ML approaches were used by independent investigators. The model with the highest accuracy used multitask learning on clinical data and tumor volume. External validation of the top three performing models on three datasets (873 patients) with significant differences in the distributions of clinical and demographic variables demonstrated significant decreases in model performance. Significance: ML combined with simple prognostic factors outperformed multiple advanced CT radiomics and deep learning methods. ML models provided diverse solutions for prognosis of patients with HNC, but their prognostic value is affected by differences in patient populations, and they require extensive validation.
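A standard way to quantify the prognostic accuracy of survival models like those compared above is Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the model assigns higher risk to the patient who dies earlier. A self-contained sketch with hypothetical data (not the study's evaluation code), ignoring tied event times for simplicity:

```python
# Harrell's concordance index for a survival model. risk[i] is the
# predicted risk score, time[i] the observed time, event[i] whether the
# event (death) was observed (1) or the patient was censored (0).
def concordance_index(time, event, risk):
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # pair (i, j) is comparable if i's event occurred before
            # j's observed (event or censoring) time
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count half
    return concordant / comparable

time  = [5, 10, 15, 20]       # months
event = [1, 1, 0, 1]          # patient 3 is censored
risk  = [0.9, 0.6, 0.4, 0.2]  # perfectly ranked risks
print(concordance_index(time, event, risk))  # 1.0
```

A C-index of 0.5 corresponds to random ranking; the drop in external performance described above would appear as a C-index closer to 0.5 on the outside cohorts.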


Subject(s)
Deep Learning , Head and Neck Neoplasms , Humans , Prognosis , Retrospective Studies , Artificial Intelligence , Head and Neck Neoplasms/diagnostic imaging
20.
JCO Clin Cancer Inform ; 7: e2300048, 2023 07.
Article in English | MEDLINE | ID: mdl-37506330

ABSTRACT

PURPOSE: Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often documented only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. METHODS: Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients from whom toxicity grades had been collected as structured data during clinical care. We fine-tuned statistical and pretrained Bidirectional Encoder Representations from Transformers-based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. RESULTS: Fine-tuning of PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Silver-labeled data improved the macro-F1 by ≥3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively, without additional fine-tuning. CONCLUSION: To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This provides proof of concept for NLP-based automated, detailed toxicity monitoring in expanded domains.
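Macro-F1, the metric used to score these classifiers, averages per-class F1 scores with equal weight per class, so rare severity grades count as much as common ones. A minimal sketch (not the study's evaluation code), with hypothetical labels:

```python
# Macro-F1: compute F1 per class, then average with equal class weight.
def macro_f1(y_true, y_pred, labels):
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec  = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical task-3-style labels: 0 = none, 1 = grade 1, 2 = grade 2-3
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(round(macro_f1(y_true, y_pred, labels=[0, 1, 2]), 3))  # 0.822
```

Micro-F1, by contrast, pools all decisions and so is dominated by the majority class; for graded toxicity with imbalanced classes, macro-F1 is the more demanding summary.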


Subject(s)
Esophageal Neoplasms , Esophagitis , Humans , Natural Language Processing , Quality of Life , Silver , Esophagitis/diagnosis , Esophagitis/etiology