Search | VHL Regional Portal

AMMU: A survey of transformer-based biomedical pretrained language models.

Kalyan, Katikapalli Subramanyam; Rajasekharan, Ajit; Sangeetha, Sivanesan.

J Biomed Inform ; 126: 103982, 2022 02.

Article in English | MEDLINE | ID: mdl-34974190

ABSTRACT

Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs, from BioBERT to the latest BioELECTRA and BioALBERT models. We strongly believe there is a need for a survey paper that provides a comprehensive review of transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts such as self-supervised learning, the embedding layer, and transformer encoder layers. We discuss core concepts of transformer-based PLMs, including pretraining methods, pretraining tasks, fine-tuning methods, and various embedding types specific to the biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some of the open issues that will drive the research community to further improve transformer-based BPLMs. The list of all publicly available transformer-based BPLMs, along with their links, is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.

Subject(s)

Biomedical Research, Natural Language Processing, Language

BertMCN: Mapping colloquial phrases to standard medical concepts using BERT and highway network.

Kalyan, Katikapalli Subramanyam; Sangeetha, Sivanesan.

Artif Intell Med ; 112: 102008, 2021 02.

Article in English | MEDLINE | ID: mdl-33581833

ABSTRACT

In the last few years, people have started to share a great deal of health-related information in the form of tweets, reviews and blog posts. All this user-generated clinical text can be mined to generate useful insights. However, automatic analysis of clinical text requires the identification of standard medical concepts. Most existing deep learning based medical concept normalization systems are based on CNNs or RNNs. The performance of these models is limited because they have to be trained from scratch (apart from the embeddings). In this work, we propose a medical concept normalization system based on BERT and a highway layer. BERT, a pretrained, context-sensitive deep language representation model, has advanced state-of-the-art performance on many NLP tasks, and the gating mechanism in the highway layer helps the model retain only the important information. Experimental results show that our model outperformed all existing methods on two standard datasets. Further, we conduct a series of experiments to study the impact of different learning rates and batch sizes, noise, and freezing encoder layers on our model.

Subject(s)

Language, Natural Language Processing, Humans
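The BertMCN abstract credits the gating mechanism in the highway layer with helping the model retain only the important information. As a rough illustration of what such a gate does, here is a minimal pure-Python sketch of a generic highway layer, y = T(x) * H(x) + (1 - T(x)) * x, with a ReLU transform H and a sigmoid gate T. It is not the authors' implementation: the function names, the ReLU choice and the toy weights are assumptions for illustration, and in a BERT-based model the input x would be an encoder output vector rather than a hand-written list.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W @ x + b for a weight matrix W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def highway(x: Vector, w_h: Matrix, b_h: Vector, w_t: Matrix, b_t: Vector) -> Vector:
    """Generic highway layer (illustrative sketch, not BertMCN's code)."""
    # Candidate transformation H(x), here an affine map followed by ReLU.
    h = [max(0.0, z) for z in affine(w_h, x, b_h)]
    # Transform gate T(x): one value in (0, 1) per dimension.
    t = [sigmoid(z) for z in affine(w_t, x, b_t)]
    # Per dimension, the gate mixes the transformed value with the carried input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, -2.0]
carried = highway(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0])
transformed = highway(x, identity, [0.0, 0.0], zeros, [50.0, 50.0])
```

With a large negative gate bias the gate closes and the layer passes its input through essentially unchanged (`carried` is approximately `[1.0, -2.0]`); with a large positive bias it outputs the ReLU-transformed vector instead (`transformed` is approximately `[1.0, 0.0]`). A trained model learns gate values between these extremes for each dimension.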

SECNLP: A survey of embeddings in clinical natural language processing.

Kalyan, Katikapalli Subramanyam; Sangeetha, S.

J Biomed Inform ; 101: 103323, 2020 01.

Article in English | MEDLINE | ID: mdl-31711972

ABSTRACT

Distributed vector representations, or embeddings, map variable-length text to dense fixed-length vectors and capture prior knowledge that can be transferred to downstream tasks. Even though embeddings have become the de facto standard for text representation in deep learning based NLP tasks in both the general and clinical domains, there is no survey paper that presents a detailed review of embeddings in clinical natural language processing. In this survey paper, we discuss various medical corpora and their characteristics as well as medical codes, and present a brief overview and comparison of popular embedding models. We classify clinical embeddings and discuss each embedding type in detail. We discuss various evaluation methods, followed by possible solutions to various challenges in clinical embeddings. Finally, we conclude with some future directions that will advance research in clinical embeddings.

Subject(s)

Knowledge, Natural Language Processing, Surveys and Questionnaires
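The SECNLP abstract describes embeddings as mapping variable-length text to dense fixed-length vectors. One simple way to see how that works is to average per-token vectors (mean pooling), sketched below. The toy vocabulary and its 3-dimensional vectors are hypothetical, and mean pooling is only one of many composition strategies; it is not presented as the method of any particular model covered in the survey.

```python
from typing import Dict, List

# Hypothetical 3-dimensional vectors for illustration only; real clinical
# embeddings are learned from a corpus and have far more dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "chest": [0.2, 0.1, 0.0],
    "pain": [0.6, -0.3, 0.9],
    "fever": [-0.4, 0.8, 0.1],
}


def embed_text(text: str, table: Dict[str, List[float]] = TOY_EMBEDDINGS) -> List[float]:
    """Map variable-length text to one fixed-length vector by mean pooling."""
    dim = len(next(iter(table.values())))
    # Lowercased whitespace tokenization; unknown tokens fall back to a zero vector.
    vectors = [table.get(tok, [0.0] * dim) for tok in text.lower().split()]
    if not vectors:
        return [0.0] * dim
    # Average each dimension across tokens, so the output length is always `dim`.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


short = embed_text("chest pain")
longer = embed_text("Patient reports chest pain and high fever")
```

Whatever the number of tokens, the output has the same length as the vectors in the table. Mapping unknown tokens to zero vectors pulls the average toward the origin, which is one reason learned sentence encoders usually outperform this baseline.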
