Search | BVS Regional Portal (test)

Use of Natural Language Processing to Infer Sites of Metastatic Disease From Radiology Reports at Scale.

Tay, See Boon; Low, Guat Hwa; Wong, Gillian Jing En; Tey, Han Jieh; Leong, Fun Loon; Li, Constance; Chua, Melvin Lee Kiang; Tan, Daniel Shao Weng; Thng, Choon Hua; Tan, Iain Bee Huat; Tan, Ryan Shea Ying Cong.

JCO Clin Cancer Inform ; 8: e2300122, 2024 05.

Article in English | MEDLINE | ID: mdl-38788166

ABSTRACT

PURPOSE: To evaluate natural language processing (NLP) methods to infer metastatic sites from radiology reports. METHODS: A set of 4,522 computed tomography (CT) reports of 550 patients with 14 types of cancer was used to fine-tune four clinical large language models (LLMs) for multilabel classification of metastatic sites. We also developed an NLP information extraction (IE) system (on the basis of named entity recognition, assertion status detection, and relation extraction) for comparison. Model performance was measured by F1 score on the test set and three external validation sets. The best model was used to facilitate analysis of metastatic frequencies in a cohort study of 6,555 patients with 53,838 CT reports. RESULTS: The RadBERT, BioBERT, GatorTron-base, and GatorTron-medium LLMs achieved F1 scores of 0.84, 0.87, 0.89, and 0.91, respectively, on the test set. The IE system performed best, achieving an F1 score of 0.93. F1 scores of the IE system by individual cancer type ranged from 0.89 to 0.96. The IE system attained F1 scores of 0.89, 0.83, and 0.81 on external validation sets comprising additional cancer types, positron emission tomography-CT scans, and magnetic resonance imaging scans, respectively. In our cohort study, we found that for colorectal cancer, liver-only metastases were more frequent in de novo stage IV than in recurrent patients (29.7% v 12.2%; P < .001). Conversely, lung-only metastases were more frequent in recurrent than in de novo stage IV patients (17.2% v 7.3%; P < .001). CONCLUSION: We developed an IE system that accurately infers metastatic sites in multiple primary cancers from radiology reports. It has explainable methods and performs better than some clinical LLMs. The inferred metastatic phenotypes could enhance cancer research databases and clinical trial matching, and identify potential patients for oligometastatic interventions.

Subjects

Natural Language Processing; Neoplasm Metastasis; Tomography, X-Ray Computed; Humans; Tomography, X-Ray Computed/methods; Neoplasms/pathology; Neoplasms/diagnostic imaging; Female; Algorithms; Data Mining/methods; Electronic Health Records; Male
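
The abstract above reports F1 scores for multilabel classification of metastatic sites but does not say how the scores were averaged. The following is a minimal sketch that assumes micro-averaging over per-report label sets; the function name `multilabel_f1` and the example site labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def multilabel_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall, and F1 over per-report label sets.

    Assumption: micro-averaging. The abstract does not state which
    averaging scheme the authors used.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of reports")

    tp = fp = fn = 0
    for gold_sites, pred_sites in zip(gold, pred):
        tp += len(gold_sites & pred_sites)   # sites correctly predicted
        fp += len(pred_sites - gold_sites)   # sites predicted but not annotated
        fn += len(gold_sites - pred_sites)   # annotated sites that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical annotations for three reports (illustrative only).
gold = [{"liver", "lung"}, {"bone"}, set()]
pred = [{"liver"}, {"bone", "lung"}, set()]
scores = multilabel_f1(gold, pred)
print(scores)
```

Representing each report as a set of sites keeps reports with no metastases (the empty set) in the evaluation without adding a separate negative class.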

Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting.

Tan, Ryan Shea Ying Cong; Lin, Qian; Low, Guat Hwa; Lin, Ruixi; Goh, Tzer Chew; Chang, Christopher Chu En; Lee, Fung Fung; Chan, Wei Yin; Tan, Wei Chong; Tey, Han Jieh; Leong, Fun Loon; Tan, Hong Qi; Nei, Wen Long; Chay, Wen Yee; Tai, David Wai Meng; Lai, Gillianne Geet Yi; Cheng, Lionel Tim-Ee; Wong, Fuh Yong; Chua, Matthew Chin Heng; Chua, Melvin Lee Kiang; Tan, Daniel Shao Weng; Thng, Choon Hua; Tan, Iain Bee Huat; Ng, Hwee Tou.

J Am Med Inform Assoc ; 30(10): 1657-1664, 2023 09 25.

Article in English | MEDLINE | ID: mdl-37451682

ABSTRACT

OBJECTIVE: To assess large language models on their ability to accurately infer cancer disease response from free-text radiology reports. MATERIALS AND METHODS: We assembled 10,602 computed tomography reports from cancer patients seen at a single institution. All reports were classified into: no evidence of disease, partial response, stable disease, or progressive disease. We applied transformer models, a bidirectional long short-term memory model, a convolutional neural network model, and conventional machine learning methods to this task. Data augmentation using sentence permutation with consistency loss as well as prompt-based fine-tuning were used on the best-performing models. Models were validated on a hold-out test set and an external validation set based on Response Evaluation Criteria in Solid Tumors (RECIST) classifications. RESULTS: The best-performing model was the GatorTron transformer, which achieved an accuracy of 0.8916 on the test set and 0.8919 on the RECIST validation set. Data augmentation further improved the accuracy to 0.8976. Prompt-based fine-tuning did not further improve accuracy but was able to reduce the number of training reports to 500 while still achieving good performance. DISCUSSION: These models could be used by researchers to derive progression-free survival in large datasets. They may also serve as a decision support tool by providing clinicians with an automated second opinion of disease response. CONCLUSIONS: Large clinical language models demonstrate potential to infer cancer disease response from radiology reports at scale. Data augmentation techniques are useful to further improve performance. Prompt-based fine-tuning can significantly reduce the size of the training dataset.

Subjects

Neoplasms; Radiology; Humans; Machine Learning; Neural Networks, Computer; Neoplasms/diagnostic imaging; Research Report; Natural Language Processing
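
The abstract above mentions data augmentation using sentence permutation with consistency loss, without giving details. The sketch below illustrates one plausible reading: shuffle the sentences of a report to create an augmented view, then penalise disagreement between the class probabilities predicted for the original and the permuted report. The naive sentence splitter, the symmetric KL divergence as the consistency term, and all names and example values here are assumptions, not the authors' implementation.

```python
import math
import random
import re
from typing import List, Sequence


def split_sentences(report: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def permute_sentences(report: str, rng: random.Random) -> str:
    # Create an augmented view by shuffling the order of the sentences.
    sentences = split_sentences(report)
    rng.shuffle(sentences)
    return " ".join(sentences)


def symmetric_kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    # Consistency term (assumption): symmetric KL divergence between the class
    # probabilities predicted for the original and the permuted report.
    kl_pq = sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))
    kl_qp = sum(b * math.log((b + eps) / (a + eps)) for a, b in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


# Hypothetical report text (illustrative only).
report = ("The liver lesion has decreased in size. "
          "No new lesions are seen. Lungs are clear.")
augmented = permute_sentences(report, random.Random(0))

# Hypothetical model outputs over the four response classes named in the
# abstract: no evidence of disease, partial response, stable disease,
# progressive disease.
p_original = [0.05, 0.80, 0.10, 0.05]
p_permuted = [0.10, 0.70, 0.15, 0.05]
consistency = symmetric_kl(p_original, p_permuted)
print(augmented)
print(consistency)
```

In training, a term like `consistency` would typically be added to the supervised classification loss with a weighting factor, so the model is encouraged to give the same answer regardless of sentence order.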
