1.
J Am Med Inform Assoc ; 31(6): 1380-1387, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38531680

ABSTRACT

OBJECTIVES: This study focuses on refining temporal relation extraction from medical documents by introducing a bimodal architecture. The overarching goal is to enhance our understanding of narrative processes in the medical domain, particularly through the analysis of extensive reports and notes concerning patient experiences. MATERIALS AND METHODS: We develop a bimodal architecture that integrates information from text documents and knowledge graphs, infusing common knowledge about events into the temporal relation extraction process. Testing was conducted on diverse clinical datasets, emulating real-world scenarios in which the extraction of temporal relationships is paramount. RESULTS: The performance of the proposed bimodal architecture was evaluated across multiple clinical datasets. Comparative analyses demonstrated its superiority over existing methods that rely solely on textual information for temporal relation extraction. Notably, the model remained effective even in scenarios where the additional knowledge-graph information was not provided. DISCUSSION: The combination of textual data and knowledge-graph information in our bimodal architecture is a notable advance in temporal relation extraction and addresses the need for a deeper understanding of narrative processes in medical contexts. CONCLUSION: Our study introduces a bimodal architecture that harnesses the synergy of text and knowledge-graph data and exhibits superior performance in temporal relation extraction from medical documents. This advance holds promise for improving the comprehension of patients' healthcare journeys and the extraction of temporal relationships from complex medical narratives.
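The abstract does not specify the architecture's internals, so the following is only a minimal sketch of the general idea: encode an event pair from the clinical text, look up knowledge-graph embeddings for the same events, fuse the two representations, and classify the temporal relation. The class name, dimensions, relation label set and fusion by concatenation are illustrative assumptions, not the authors' design.

```python
# Minimal illustrative sketch (not the authors' implementation): fuse a
# text-derived representation of an event pair with knowledge-graph embeddings
# of the same events, then classify the temporal relation. The label set
# (BEFORE / AFTER / OVERLAP) and all dimensions are assumptions.
import torch
import torch.nn as nn

class BimodalTemporalRelationClassifier(nn.Module):
    def __init__(self, text_dim=768, kg_dim=128, hidden_dim=256, num_relations=3):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.kg_proj = nn.Linear(kg_dim, hidden_dim)
        # Classify the temporal relation from the fused representation.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_relations),
        )

    def forward(self, text_vec, kg_vec):
        fused = torch.cat([torch.relu(self.text_proj(text_vec)),
                           torch.relu(self.kg_proj(kg_vec))], dim=-1)
        return self.classifier(fused)

# Toy usage: one event pair, with text features from any sentence encoder and
# knowledge-graph features from any KG embedding method.
model = BimodalTemporalRelationClassifier()
logits = model(torch.randn(1, 768), torch.randn(1, 128))
print(logits.shape)  # torch.Size([1, 3])
```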


Subject(s)
Electronic Health Records , Natural Language Processing , Humans , Data Mining/methods , Narration , Machine Learning , Datasets as Topic , Information Storage and Retrieval/methods
2.
Artif Intell Med ; 91: 23-38, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30030089

ABSTRACT

Because online communication is related to the adoption of new beliefs, attitudes and, ultimately, behaviors, analyzing it is of utmost importance for medicine. Multiple health-care and academic communities, such as those working on information seeking and dissemination and on persuasive technologies, acknowledge this need. However, to obtain such understanding, a suitable way of modeling online communication for the study of behavior is required. In this paper, we propose an automatic method to reveal process models of interrelated speech intentions from conversations. Specifically, we adopt a domain-independent taxonomy of speech intentions, release an annotated corpus of Reddit conversations, train supervised classifiers that predict speech intentions from utterances and assess them with 10-fold cross-validation (in multi-class, one-versus-all and multi-label setups), and design an approach to transform conversations into the well-defined, representative logs of verbal behavior needed by process mining techniques. The experimental results show that: (1) the automatic classification of intentions is feasible (Kappa scores between 0.52 and 1); (2) predicting pairs of intentions (adjacency pairs), or including additional utterances even from other, heterogeneous corpora, can improve the predictions of some classes; and (3) the classifiers, in their current state, are robust enough to be used on other corpora, although the results are poorer and suggest that the input corpus may not capture sufficiently varied ways of expressing certain speech intentions. The extracted process models of interrelated speech intentions open new views on how beliefs and behavioral intentions form in and from speech, but an in-depth evaluation of these conversational models is still required.
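As an illustration of the classification step described above, the sketch below trains a toy speech-intention classifier on utterances and reports Cohen's kappa under 10-fold cross-validation. The utterances, label set and the TF-IDF plus logistic-regression pipeline are placeholders; the paper's actual features, taxonomy and one-versus-all/multi-label setups are not reproduced here.

```python
# Illustrative sketch only (not the paper's pipeline): predict a speech
# intention for each utterance and report chance-corrected agreement with the
# true labels (Cohen's kappa) under 10-fold cross-validation.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import cohen_kappa_score

# Toy corpus: repeated placeholder utterances so that 10-fold CV has enough data.
utterances = ["Does anyone know a good cardiologist?",
              "You should really get a second opinion.",
              "Thanks, that helped a lot!",
              "I disagree, the guidelines say otherwise."] * 30
intentions = ["question", "recommendation", "thanking", "disagreement"] * 30

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
predicted = cross_val_predict(clf, utterances, intentions, cv=10)
print("Cohen's kappa:", round(cohen_kappa_score(intentions, predicted), 2))
```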


Subject(s)
Consumer Health Information/methods , Data Mining/methods , Information Seeking Behavior , Internet , Speech , Communication , Humans , Intention , Machine Learning , Natural Language Processing
3.
BMC Bioinformatics ; 16 Suppl 16: S1, 2015.
Article in English | MEDLINE | ID: mdl-26551454

ABSTRACT

BACKGROUND: Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. The biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and the results of related experiments. To capture these relations in an explicit, computer-readable format, they were at first curated manually into databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. RESULTS: We develop a computational approach for the extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulatory network of the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable the extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each able to extract a different relationship type. Following the shared task, we conducted additional analyses with different system settings that reduced the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Features constructed from mention words and their prefixes and suffixes are the most important for extraction accuracy. An analysis of the distances between different mention types in the text shows that transforming the data into skip-mention sequences is an appropriate choice for detecting relations between distant mentions. CONCLUSIONS: Linear-chain conditional random fields, along with appropriate data transformations, can be used efficiently to extract relations. The sieve-based architecture simplifies the system, as new sieves can easily be added or removed and each sieve can use the results of the previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and are hence applicable to a broad range of relation extraction tasks and data domains.
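A minimal sketch of the skip-mention transformation mentioned above, under our reading of it: from the ordered mentions of a sentence, the order-k sequences are built by taking every (k+1)-th mention, so that mentions separated by k intervening mentions become neighbors and are therefore reachable to a first-order (linear-chain) model. The function name and the example mentions are illustrative, not the authors' code.

```python
# Illustrative sketch of skip-mention sequences: for order k, split the
# ordered mention list into k+1 subsequences, each taking every (k+1)-th
# mention, so previously distant mentions become adjacent.
def skip_mention_sequences(mentions, k):
    """Return the k+1 subsequences formed by taking every (k+1)-th mention."""
    step = k + 1
    return [mentions[start::step] for start in range(step)]

mentions = ["sigF", "regulates", "spoIIR", "which", "activates", "sigE"]
for seq in skip_mention_sequences(mentions, 1):
    print(seq)
# ['sigF', 'spoIIR', 'activates']   <- 'sigF' and 'spoIIR' are now adjacent
# ['regulates', 'which', 'sigE']
```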


Subject(s)
Gene Regulatory Networks , Information Storage and Retrieval , Publications , Algorithms , Models, Theoretical
4.
J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S2, 2015.
Article in English | MEDLINE | ID: mdl-25810773

ABSTRACT

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators following annotation guidelines defined specifically for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each chemical entity mention was labeled manually according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic or trivial. The difficulty and consistency of tagging chemicals in text were measured in an agreement study between annotators, which obtained a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the gold-standard manual annotations but also the mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver-standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has also been generated. We propose a standard for the minimum information required about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.
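How the inter-annotator agreement was computed is not detailed in the abstract; the sketch below shows one common way of obtaining a percentage agreement over span-level entity annotations, counting a mention as agreed only when both annotators produced the same (start, end, class) triple. The spans and class names are made up for illustration.

```python
# Illustrative sketch (not the CHEMDNER tooling): percentage agreement between
# two annotators who mark chemical mentions as (start, end, class) spans in the
# same abstract; a span counts as agreed only if both annotators produced it.
def percentage_agreement(annotations_a, annotations_b):
    a, b = set(annotations_a), set(annotations_b)
    agreed, total = len(a & b), len(a | b)
    return 100.0 * agreed / total if total else 100.0

annotator_a = {(0, 7, "TRIVIAL"), (25, 36, "SYSTEMATIC"), (50, 54, "FORMULA")}
annotator_b = {(0, 7, "TRIVIAL"), (25, 36, "SYSTEMATIC")}
print(f"{percentage_agreement(annotator_a, annotator_b):.1f}%")  # 66.7%
```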

5.
PLoS One ; 9(6): e100101, 2014.
Article in English | MEDLINE | ID: mdl-24956272

ABSTRACT

Coreference resolution tries to identify all expressions (called mentions) in observed text that refer to the same entity. Besides entity extraction and relation extraction, it is one of the three complementary tasks in information extraction. In this paper we describe SkipCor, a novel coreference resolution system that reformulates the problem as a sequence labeling task. Unlike existing supervised, unsupervised, pairwise or sequence-based models, our approach uses only linear-chain conditional random fields and supports high scalability, with fast model training and inference and straightforward parallelization. We evaluate the proposed system against the ACE 2004, CoNLL 2012 and SemEval 2010 benchmark datasets. SkipCor clearly outperforms two baseline systems that detect coreferentiality using the same features as SkipCor. The obtained results are at least comparable to the current state of the art in coreference resolution.
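As a rough illustration of the sequence-labeling reformulation (our reading, not the SkipCor implementation): if a labeller marks, for each mention, whether it corefers with the preceding mention in its sequence, coreference clusters can be recovered by chaining those marks. SkipCor additionally uses skip-mention sequences so that distant coreferent mentions become adjacent; the sketch below shows only the simplest, consecutive-mention case, with invented mentions and labels.

```python
# Illustrative sketch: recover coreference clusters from per-mention sequence
# labels, where links[i] is True if mention i corefers with mention i-1.
def clusters_from_links(mentions, links):
    """Group an ordered mention list into clusters by chaining coref links."""
    clusters, current = [], [mentions[0]]
    for mention, linked in zip(mentions[1:], links[1:]):
        if linked:
            current.append(mention)
        else:
            clusters.append(current)
            current = [mention]
    clusters.append(current)
    return clusters

mentions = ["Barack Obama", "the president", "he", "Michelle", "she"]
links = [False, True, True, False, True]
print(clusters_from_links(mentions, links))
# [['Barack Obama', 'the president', 'he'], ['Michelle', 'she']]
```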


Subject(s)
Artificial Intelligence , Models, Theoretical