1.
J Healthc Inform Res ; 8(2): 353-369, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38681752

ABSTRACT

Medical entity linking (MEL) is a common task in clinical natural language processing: mentions are first detected and then linked to entities in a knowledge base. One reason MEL remains unsolved is linguistic ambiguity, where the same text can resolve to several different named entities; this problem is exacerbated in the text found in electronic health records. Recent work has shown that transformer-based deep learning models outperform previous linking methods. We introduce NeighBERT, a custom pre-training technique that extends BERT (Devlin et al [1]) by encoding how entities are related within a knowledge graph. This technique adds relational context that is missing from the original BERT, helping resolve the ambiguity found in clinical text. In our experiments, NeighBERT improves the precision, recall, and F1-score of the state of the art by 1-3 points for named entity recognition and 10-15 points for MEL on two widely known clinical datasets. Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-023-00136-3.
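The abstract gives no implementation details, but its core idea, letting the encoder see an entity's knowledge-graph neighbors as extra context, can be illustrated. The sketch below is a toy stand-in, not NeighBERT itself: the kb_neighbors graph, the mean-pooled bert-base-uncased encoder, and the similarity-based linker are all assumptions made for the example.

```python
# Hypothetical sketch: enriching entity representations with knowledge-graph
# neighbors before linking. Illustrative only; not the NeighBERT method.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Toy knowledge graph: entity -> related entities (assumed data).
kb_neighbors = {
    "myocardial infarction": ["chest pain", "troponin", "coronary artery"],
    "migraine": ["headache", "aura", "photophobia"],
}

def encode(text: str) -> torch.Tensor:
    """Mean-pooled BERT embedding of a text span."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

def entity_embedding(entity: str) -> torch.Tensor:
    # Concatenate the entity name with its graph neighbors so the encoder
    # sees relational context, the ingredient the abstract highlights.
    context = entity + " [SEP] " + " , ".join(kb_neighbors.get(entity, []))
    return encode(context)

def link(mention_in_context: str) -> str:
    """Return the KB entity whose neighbor-enriched embedding is closest."""
    m = encode(mention_in_context)
    scores = {
        e: torch.cosine_similarity(m, entity_embedding(e), dim=0).item()
        for e in kb_neighbors
    }
    return max(scores, key=scores.get)

print(link("Patient presented with crushing chest pain and elevated troponin."))
```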

2.
Front Artif Intell ; 5: 995667, 2022.
Article in English | MEDLINE | ID: mdl-36530357

ABSTRACT

Little attention has been paid to the development of human language technology for truly low-resource languages, i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas. We conduct experiments with pretrained models, exploring zero-shot learning in combination with model adaptation. Furthermore, as AmericasNLI is a multiway parallel dataset, we use it to benchmark the performance of different machine translation models for those languages. Finally, using a standard transformer model, we explore translation-based approaches for natural language inference. We find that the zero-shot performance of pretrained models without adaptation is poor for all languages in AmericasNLI, but model adaptation via continued pretraining results in improvements. All machine translation models are rather weak, but, surprisingly, translation-based approaches to natural language inference outperform all other models on that task.
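As a companion to the abstract, the following sketch shows what zero-shot crosslingual NLI with an off-the-shelf multilingual model looks like in practice. It is an illustration only: the model name, label order, and example sentences are assumptions, and it performs no AmericasNLI evaluation or continued-pretraining adaptation.

```python
# Hypothetical sketch: zero-shot NLI with a multilingual pretrained model,
# the kind of crosslingual transfer the abstract evaluates. Assumed model
# and examples; not the AmericasNLI experimental setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "joeddav/xlm-roberta-large-xnli"  # an off-the-shelf XNLI model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def nli(premise: str, hypothesis: str) -> str:
    """Classify a premise/hypothesis pair as contradiction/neutral/entailment."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label order assumed from this model's config.
    labels = ["contradiction", "neutral", "entailment"]
    return labels[int(logits.argmax())]

# For a language unseen during pretraining, the call is identical; accuracy
# then reflects pure zero-shot transfer, which the abstract reports is poor
# without adaptation such as continued pretraining on the target language.
print(nli("A dog is running in the park.", "An animal is outdoors."))
```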

3.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1264-1277, 2022 03.
Article in English | MEDLINE | ID: mdl-32877333

ABSTRACT

Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TUs) made up of source- and target-language segment pairs. To translate a new source segment s', these tools search the TM and retrieve the TUs (s,t) whose source segments are most similar to s'. The translator then chooses a TU and edits the target segment t to turn it into an adequate translation of s'. Fuzzy-match repair (FMR) techniques can automatically modify the parts of t that need to be edited. We describe a language-independent FMR method that, given s' and (s,t), first uses machine translation to generate a set of candidate fuzzy-match-repaired segments, and then chooses the best one by estimating their quality. An evaluation on three different language pairs shows that the selected candidate is a good approximation to the best (oracle) candidate produced and is closer to reference translations than both machine-translated segments and unrepaired fuzzy matches (t). In addition, a single quality-estimation model trained on a mix of data from all the languages performs well on any of the languages used.
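The repair-then-select loop the abstract describes can be sketched as follows. This is a minimal illustration under strong assumptions: mt() is a toy dictionary stand-in for a real MT system, quality() is a length-ratio heuristic standing in for a trained quality-estimation model, and no real source-target alignment is performed.

```python
# Hypothetical sketch of fuzzy-match repair: generate candidate repaired
# segments via MT, then pick one by a quality estimate. Stand-in components
# throughout; not the paper's actual method.
import difflib

def mt(source: str) -> str:
    """Stand-in for a real MT system (e.g., a seq2seq transformer)."""
    toy_lexicon = {"red": "rojo", "blue": "azul", "car": "coche"}
    return " ".join(toy_lexicon.get(w, w) for w in source.split())

def candidates(s_new: str, s: str, t: str) -> list[str]:
    """Patch t by machine-translating the spans where s_new differs from s."""
    out = []
    matcher = difflib.SequenceMatcher(a=s.split(), b=s_new.split())
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            # Translate old and new source spans and swap them inside t; a
            # real system would use word alignment to locate the span in t.
            old_span = mt(" ".join(s.split()[i1:i2]))
            new_span = mt(" ".join(s_new.split()[j1:j2]))
            out.append(t.replace(old_span, new_span))
    out.append(mt(s_new))  # full machine translation as a fallback candidate
    return out

def quality(candidate: str, s_new: str) -> float:
    """Stand-in QE: a length-ratio heuristic, not a trained model."""
    ratio = len(candidate.split()) / max(len(s_new.split()), 1)
    return -abs(1.0 - ratio)

s, t = "the red car", "el coche rojo"   # translation unit (s, t) from the TM
s_new = "the blue car"                  # new source segment s'
best = max(candidates(s_new, s, t), key=lambda c: quality(c, s_new))
print(best)  # ideally "el coche azul"
```

In the paper's setting, the quality estimator plays the role of this heuristic, ranking the candidate repaired segments so the one closest to the (oracle) best candidate is selected.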


Subject(s)
Algorithms, Translating, Language, Translations