
Improving Information Extraction from Pathology Reports using Named Entity Recognition.

Zeng, Ken G; Dutt, Tarun; Witowski, Jan; Kranthi Kiran, G V; Yeung, Frank; Kim, Michelle; Kim, Jesi; Pleasure, Mitchell; Moczulski, Christopher; Lopez, L Julian Lechuga; Zhang, Hao; Harbi, Mariam Al; Shamout, Farah E; Major, Vincent J; Heacock, Laura; Moy, Linda; Schnabel, Freya; Pak, Linda M; Shen, Yiqiu; Geras, Krzysztof J.

Res Sq; 2023 Jul 03.

Article in English | MEDLINE | ID: mdl-37461545

ABSTRACT

Pathology reports are considered the gold standard in medical research due to their comprehensive and accurate diagnostic information. Natural language processing (NLP) techniques have been developed to automate information extraction from pathology reports. However, existing studies suffer from two significant limitations. First, they typically frame their tasks as report classification, which restricts the granularity of extracted information. Second, they often fail to generalize to unseen reports due to variations in language, negation, and human error. To overcome these challenges, we propose a BERT (bidirectional encoder representations from transformers) named entity recognition (NER) system to extract key diagnostic elements from pathology reports. We also introduce four data augmentation methods to improve the robustness of our model. Trained and evaluated on 1438 annotated breast pathology reports, acquired from a large medical center in the United States, our BERT model trained with data augmentation achieves an entity F1-score of 0.916 on an internal test set, surpassing the BERT baseline (0.843). We further assessed the model's generalizability using an external validation dataset from the United Arab Emirates, where our model maintained satisfactory performance (F1-score 0.860). Our findings demonstrate that our NER systems can effectively extract fine-grained information from widely diverse medical reports, offering the potential for large-scale information extraction in a wide range of medical and AI research. We publish our code at https://github.com/nyukat/pathology_extraction.
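The entity F1-score quoted in the abstract above can be illustrated with a minimal sketch. It assumes BIO-style token tags and exact-match scoring (an entity counts as correct only if its label and both boundaries match); the abstract specifies neither, and the label names `DX` and `SITE` are hypothetical. It is not taken from the authors' repository.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            spans.add((label, start, i))
            label = None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag with no matching B- is treated as starting an entity.
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 over a list of tagged reports."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds one of two gold entities.
gold = [["B-DX", "I-DX", "O", "B-SITE"]]
pred = [["B-DX", "I-DX", "O", "O"]]
score = entity_f1(gold, pred)
```

Scoring at the entity level rather than the token level penalizes partially recovered spans, which is one common way to evaluate NER output.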

Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams.

Shen, Yiqiu; Shamout, Farah E; Oliver, Jamie R; Witowski, Jan; Kannan, Kawshik; Park, Jungkyu; Wu, Nan; Huddleston, Connor; Wolfson, Stacey; Millet, Alexandra; Ehrenpreis, Robin; Awal, Divya; Tyma, Cathy; Samreen, Naziya; Gao, Yiming; Chhor, Chloe; Gandhi, Stacey; Lee, Cindy; Kumari-Subaiya, Sheila; Leonard, Cindy; Mohammed, Reyhan; Moczulski, Christopher; Altabet, Jaime; Babb, James; Lewin, Alana; Reig, Beatriu; Moy, Linda; Heacock, Laura; Geras, Krzysztof J.

Nat Commun; 12(1): 5645, 2021 09 24.

Article in English | MEDLINE | ID: mdl-34561440

ABSTRACT

Though consistently shown to detect mammographically occult cancers, breast ultrasound has been noted to have high false-positive rates. In this work, we present an AI system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. Developed on 288,767 exams, consisting of 5,442,907 B-mode and Color Doppler images, the AI achieves an area under the receiver operating characteristic curve (AUROC) of 0.976 on a test set consisting of 44,755 exams. In a retrospective reader study, the AI achieves a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924 ± 0.02 radiologists). With the help of the AI, radiologists decrease their false positive rates by 37.3% and reduce requested biopsies by 27.8%, while maintaining the same level of sensitivity. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis.

Subject(s)

Algorithms, Artificial Intelligence, Breast Neoplasms/diagnostic imaging, Breast/diagnostic imaging, Early Detection of Cancer, Ultrasonography/methods, Adult, Aged, Breast Neoplasms/diagnosis, Female, Humans, Mammography/methods, Middle Aged, ROC Curve, Radiologists/statistics & numerical data, Reproducibility of Results, Retrospective Studies
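
The AUROC figures reported in the abstract above can be read as the probability that a randomly chosen malignant exam receives a higher score than a randomly chosen non-malignant one. The sketch below computes that quantity directly by pairwise comparison. The labels and scores are invented for illustration, and the function is a generic textbook formulation, not the evaluation code used in the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant, 0 = not malignant.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auroc(labels, scores)
```

The pairwise form is quadratic in the number of exams, so it suits small illustrations; a rank-based computation would be the usual choice at the scale of tens of thousands of exams.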
