
The h-ANN Model: Comprehensive Colonoscopy Concept Compilation Using Combined Contextual Embeddings.

Syed, Shorabuddin; Angel, Adam Jackson; Syeda, Hafsa Bareen; Jennings, Carole France; VanScoy, Joseph; Syed, Mahanazuddin; Greer, Melody; Bhattacharyya, Sudeepa; Zozus, Meredith; Tharian, Benjamin; Prior, Fred.

Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap ; 5: 189-200, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35373222

ABSTRACT

Colonoscopy is a screening and diagnostic procedure for detecting colorectal carcinomas, with specific quality metrics that monitor and improve adenoma detection rates. These quality metrics are stored in disparate documents, i.e., colonoscopy, pathology, and radiology reports. The lack of integrated, standardized documentation is impeding colorectal cancer research. Clinical concept extraction using Natural Language Processing (NLP) and Machine Learning (ML) techniques is an alternative to manual data abstraction. Contextual word embedding models such as BERT (Bidirectional Encoder Representations from Transformers) and FLAIR have enhanced the performance of NLP tasks. Combining multiple clinically trained embeddings can improve word representations and boost the performance of clinical NLP systems. The objective of this study is to extract comprehensive clinical concepts from the consolidated colonoscopy documents using concatenated clinical embeddings. We built high-quality annotated corpora for three report types. BERT and FLAIR embeddings were trained on unlabeled colonoscopy-related documents. We built a hybrid Artificial Neural Network (h-ANN) to concatenate and fine-tune BERT and FLAIR embeddings. To extract concepts of interest from the three report types, three models were initialized from the h-ANN and fine-tuned using the annotated corpora. The models achieved best F1-scores of 91.76%, 92.25%, and 88.55% for colonoscopy, pathology, and radiology reports, respectively.
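The concatenation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' h-ANN: the two embedders below are hypothetical toy functions standing in for BERT and FLAIR, and the function name `concat_embeddings` is an assumption made for illustration. The sketch only shows the general idea that each token's vectors from several embedding models are joined end to end, so the combined dimension is the sum of the individual dimensions.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Embedder = Callable[[Sequence[str]], List[Vector]]


def toy_embedder_a(tokens: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for one contextual model (2 dimensions per token)."""
    return [[float(len(t)), float(t[0].isupper())] for t in tokens]


def toy_embedder_b(tokens: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for a second model (3 dimensions per token)."""
    return [
        [float(sum(c in "aeiou" for c in t.lower())), float(t.isdigit()), float(i)]
        for i, t in enumerate(tokens)
    ]


def concat_embeddings(
    tokens: Sequence[str], embedders: Sequence[Embedder]
) -> List[Vector]:
    """Concatenate, token by token, the vectors produced by each embedder."""
    per_model = [embed(tokens) for embed in embedders]
    for vectors in per_model:
        if len(vectors) != len(tokens):
            raise ValueError("each embedder must return one vector per token")
    # For token i, join the vectors from all models into one longer vector.
    return [
        [value for vectors in per_model for value in vectors[i]]
        for i in range(len(tokens))
    ]


tokens = ["Sessile", "polyp", "5", "mm"]
combined = concat_embeddings(tokens, [toy_embedder_a, toy_embedder_b])
print(len(combined), len(combined[0]))  # number of tokens, combined dimension
```

In a real system, the concatenated vectors would feed a sequence-labeling layer that is fine-tuned on the annotated corpus; that part is omitted here because the abstract does not describe it in detail.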

TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation.

Syed, Shorabuddin; Angel, Adam Jackson; Syeda, Hafsa Bareen; Jennings, Carole Franc; VanScoy, Joseph; Syed, Mahanazuddin; Greer, Melody; Bhattacharyya, Sudeepa; Al-Shukri, Shaymaa; Zozus, Meredith; Prior, Fred; Tharian, Benjamin.

Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap ; 2022: 162-169, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35300321

ABSTRACT

Colonoscopy plays a critical role in screening for colorectal carcinomas (CC). Unfortunately, the data related to this procedure are stored in disparate documents: colonoscopy, pathology, and radiology reports. The lack of integrated, standardized documentation is impeding accurate reporting of quality metrics as well as clinical and translational research. Natural language processing (NLP) has been used as an alternative to manual data abstraction. The performance of Machine Learning (ML) based NLP solutions depends heavily on the accuracy of annotated corpora. The availability of large-volume annotated corpora is limited by data privacy laws and by the cost and effort required. In addition, the manual annotation process is error-prone, making the lack of quality annotated corpora the largest bottleneck in deploying ML solutions. The objective of this study is to identify clinical entities critical to colonoscopy quality and to build a high-quality annotated corpus using domain-specific taxonomies and following standardized annotation guidelines. The annotated corpus can be used to train ML models for a variety of downstream tasks.
