NILINKER: Attention-based approach to NIL Entity Linking.

Ruas, Pedro; Couto, Francisco M.

J Biomed Inform ; 132: 104137, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35811025

ABSTRACT

The existence of unlinkable (NIL) entities is a major hurdle affecting the performance of Named Entity Linking approaches and, consequently, the performance of downstream models that depend on them. Existing approaches to deal with NIL entities focus mainly on clustering and prediction and are limited to general entities. However, other domains, such as the biomedical sciences, are also prone to the existence of NIL entities, given the growing nature of scientific literature. We propose NILINKER, a model that includes a candidate retrieval module for biomedical NIL entities and a neural network that leverages the attention mechanism to find the top-k relevant concepts from target Knowledge Bases (MEDIC, CTD-Chemicals, ChEBI, HP, CTD-Anatomy and Gene Ontology-Biological Process) that may partially represent a given NIL entity. We also make available a new evaluation dataset named EvaNIL, suitable for training and evaluating models focusing on the NIL entity linking task. This dataset contains 846,165 documents (abstracts and full-text biomedical articles), including 1,071,776 annotations, distributed across six partitions: EvaNIL-MEDIC, EvaNIL-CTD-Chemicals, EvaNIL-ChEBI, EvaNIL-HP, EvaNIL-CTD-Anatomy and EvaNIL-Gene Ontology-Biological Process. NILINKER was integrated into a graph-based Named Entity Linking model (REEL), and the results of the experiments show that this approach increases the performance of the Named Entity Linking model.

Subject(s)

Data Mining; Neural Networks, Computer; Cluster Analysis; Data Mining/methods; Gene Ontology; Knowledge Bases
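
The NILINKER abstract above mentions an attention mechanism that finds the top-k relevant concepts for a NIL entity. The following is a minimal sketch of that general idea only, using scaled dot-product attention between one entity vector and a set of concept vectors. It is not the NILINKER architecture: the function names, the concept labels and the three-dimensional vectors are all hypothetical and chosen purely for illustration.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention of one query vector over several key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Softmax with the maximum subtracted for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_concepts(entity_vector, concept_vectors, k=2):
    """Return the k (concept, weight) pairs with the highest attention weight."""
    names = list(concept_vectors)
    weights = attention_weights(entity_vector, [concept_vectors[n] for n in names])
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical embeddings; a real system would learn these from data.
concepts = {
    "concept_a": [1.0, 0.0, 1.0],
    "concept_b": [0.0, 1.0, 0.0],
    "concept_c": [1.0, 0.0, 0.0],
}
nil_entity = [1.0, 0.0, 1.0]

for name, weight in top_k_concepts(nil_entity, concepts, k=2):
    print(f"{name}: {weight:.3f}")
```

The weights form a probability distribution over the candidate concepts, so the top-k entries can be read as the concepts that best partially represent the entity under this toy similarity.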

COVID-19 recommender system based on an annotated multilingual corpus.

Barros, Márcia; Ruas, Pedro; Sousa, Diana; Bangash, Ali Haider; Couto, Francisco M.

Genomics Inform ; 19(3): e24, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34638171

ABSTRACT

Tracking the most recent advances in Coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the publication pace speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information regarding this disease. A solution to this problem requires the development of text mining pipelines, the efficiency of which strongly depends on the availability of curated corpora. However, there is a lack of COVID-19-related corpora, even more so for languages other than English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) regarding relevant entities, their relations, and recommendation, providing this resource to the community to improve text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature.

Ruas, Pedro; Lamurias, Andre; Couto, Francisco M.

J Cheminform ; 12(1): 57, 2020 Sep 21.

Article in English | MEDLINE | ID: mdl-33430995

ABSTRACT

BACKGROUND: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is getting increasingly costly and inefficient due to information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but these have low performance when the disambiguation graphs are sparse. FINDINGS: This work proposes a Named Entity Linking framework named Relation Extraction for Entity Linking (REEL) that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph, where the nodes are the ontology candidates for the entities and the edges are added according to the relations established in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose the candidate for each entity that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). The F1-Score achieved by REEL was 85.8%, 80.9% and 90.3% in these gold standards, respectively, outperforming baseline approaches. CONCLUSIONS: We demonstrated that RE tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from Knowledge Bases and using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as there is an ontology or other Knowledge Base available.
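
The REEL abstract above describes running Personalized PageRank over a disambiguation graph whose nodes are candidate concepts and whose edges come from relations found in the text. The sketch below illustrates that general technique with a plain power iteration. It is an assumption-laden toy, not the REEL implementation: the function names, the candidate identifiers and the single edge are hypothetical, and the information-content weighting mentioned in the abstract is left out.

```python
from collections import defaultdict


def personalized_pagerank(edges, personalization, damping=0.85, iterations=100):
    """Power-iteration Personalized PageRank on an undirected graph.

    edges: iterable of (u, v) node pairs.
    personalization: dict mapping node -> restart weight (normalised here).
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = set(neighbours) | set(personalization)
    total = sum(personalization.values())
    restart = {n: personalization.get(n, 0.0) / total for n in nodes}

    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if neighbours[n]:
                # Spread this node's rank evenly over its neighbours.
                share = damping * rank[n] / len(neighbours[n])
                for m in neighbours[n]:
                    new_rank[m] += share
            else:
                # Isolated node: return its rank to the restart distribution.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * restart[m]
        rank = new_rank
    return rank


def link_entities(candidates, edges):
    """Pick, for each mention, the candidate with the highest PPR score."""
    personalization = {c: 1.0 for cands in candidates.values() for c in cands}
    rank = personalized_pagerank(edges, personalization)
    return {
        mention: max(cands, key=lambda c: rank[c])
        for mention, cands in candidates.items()
    }


# Hypothetical mentions, candidates and one extracted relation.
candidates = {
    "cold": ["cand:common_cold", "cand:cold_temperature"],
    "viral infection": ["cand:viral_infection"],
}
edges = [("cand:common_cold", "cand:viral_infection")]

print(link_entities(candidates, edges))
```

In this toy graph the candidate connected to another mention's candidate accumulates more rank than the isolated one, which is the coherence effect the abstract attributes to adding extracted relations as edges.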

Cutaneous Leishmaniasis: The Complexity of Host's Effective Immune Response against a Polymorphic Parasitic Disease.

Gabriel, Áurea; Valério-Bolas, Ana; Palma-Marques, Joana; Mourata-Gonçalves, Patrícia; Ruas, Pedro; Dias-Guerreiro, Tatiana; Santos-Gomes, Gabriela.

J Immunol Res ; 2019: 2603730, 2019.

Article in English | MEDLINE | ID: mdl-31871953

ABSTRACT

This review is aimed at providing a comprehensive outline of the immune response displayed against cutaneous leishmaniasis (CL), the more common zoonotic infection caused by protozoan parasites of the genus Leishmania. Although of polymorphic clinical presentation, classically CL is characterized by leishmaniotic lesions on the face and extremities of the patients, which can be ulcerative, and even after healing can lead to permanent injuries and disfigurement, significantly affecting their psychological, social, and economic well-being. According to a report released by the World Health Organization, the disability-adjusted life years (DALYs) lost due to leishmaniasis are close to 2.4 million, annually there are 1.0-1.5 million new cases of CL, and a large population is at risk in the endemic areas. Despite its increasing worldwide incidence, it is one of the so-called neglected tropical diseases. Furthermore, this review provides an overview of the existing knowledge of the host innate and acquired immune response to cutaneous species of Leishmania. The use of animal models and of in vitro studies has improved the understanding of parasite-host interplay and the complexity of immune mechanisms involved. The importance of diagnostic accuracy associated with effective patient management in CL reduction is highlighted. However, the multiple factors involved in CL epizoology, associated with the unavailability of vaccines or drugs to prevent infection, make it difficult to formulate an effective strategy for CL control.

Subject(s)

Host-Pathogen Interactions/immunology; Leishmania/immunology; Leishmaniasis, Cutaneous/immunology; Leishmaniasis, Cutaneous/parasitology; Disease Management; Disease Susceptibility/immunology; Geography, Medical; Global Health; Humans; Immunity; Leishmaniasis, Cutaneous/diagnosis; Leishmaniasis, Cutaneous/epidemiology; Patient Outcome Assessment; Severity of Illness Index

PPR-SSM: personalized PageRank and semantic similarity measures for entity linking.

Lamurias, Andre; Ruas, Pedro; Couto, Francisco M.

BMC Bioinformatics ; 20(1): 534, 2019 Oct 29.

Article in English | MEDLINE | ID: mdl-31664891

ABSTRACT

BACKGROUND: Biomedical literature concerns a wide range of concepts, requiring controlled vocabularies to maintain a consistent terminology across different research groups. However, as new concepts are introduced, biomedical literature is prone to ambiguity, specifically in fields that are advancing more rapidly, for example, drug design and development. Entity linking is a text mining task that aims at linking entities mentioned in the literature to concepts in a knowledge base. For example, entity linking can help find all documents that mention the same concept and improve relation extraction methods. Existing approaches focus on the local similarity of each entity and the global coherence of all entities in a document, but do not take into account the semantics of the domain. RESULTS: We propose a method, PPR-SSM, to link entities found in documents to concepts from domain-specific ontologies. Our method is based on Personalized PageRank (PPR), using the relations of the ontology to generate a graph of candidate concepts for the mentioned entities. We demonstrate how the knowledge encoded in a domain-specific ontology can be used to calculate the coherence of a set of candidate concepts, improving the accuracy of entity linking. Furthermore, we explore weighting the edges between candidate concepts using semantic similarity measures (SSM). We show how PPR-SSM can be used to effectively link named entities to biomedical ontologies, namely chemical compounds, phenotypes, and gene-product localization and processes. CONCLUSIONS: We demonstrated that PPR-SSM outperforms state-of-the-art entity linking methods in four distinct gold standards, by taking advantage of the semantic information contained in ontologies. Moreover, PPR-SSM is a graph-based method that does not require training data. Our method improved the entity linking accuracy of chemical compounds by 0.1385 when compared to a method that does not use SSMs.

Subject(s)

Semantics; Biological Ontologies; Data Mining/methods; Databases, Factual; Humans; Knowledge Bases; Vocabulary, Controlled
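
The PPR-SSM abstract above mentions weighting the edges between candidate concepts with semantic similarity measures. The abstract does not say which measures were used, so the sketch below assumes one well-known option, Resnik similarity over an intrinsic information content, purely as an illustration. The toy ontology, its term names and the function names are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "chemical": ["root"],
    "acid": ["chemical"],
    "base": ["chemical"],
    "acetic acid": ["acid"],
    "citric acid": ["acid"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Intrinsic IC: -log of the share of terms at or below each term."""
    total = len(PARENTS)
    counts = {term: 0 for term in PARENTS}
    for term in PARENTS:
        for ancestor in ancestors(term):
            counts[ancestor] += 1
    return {term: -math.log(counts[term] / total) for term in PARENTS}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[term] for term in common)


ic = information_content()
# Edge weights that a graph-based linker could use instead of uniform edges.
weighted_edges = {
    ("acetic acid", "citric acid"): resnik("acetic acid", "citric acid", ic),
    ("acetic acid", "base"): resnik("acetic acid", "base", ic),
}
for pair, weight in weighted_edges.items():
    print(pair, round(weight, 3))
```

Two acids share the specific ancestor "acid" and therefore get a heavier edge than an acid paired with a base, whose only shared ancestors are the more general "chemical" and "root".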
