
Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts.

Ladas, Nektarios; Borchert, Florian; Franz, Stefan; Rehberg, Alina; Strauch, Natalia; Sommer, Kim Katrin; Marschollek, Michael; Gietzelt, Matthias.

Health Informatics J ; 29(2): 14604582231164696, 2023.

Article in English | MEDLINE | ID: mdl-37068028

ABSTRACT

BACKGROUND: Extracting medical terms and their corresponding values from semi-structured and unstructured medical reports can be a time-consuming and error-prone process. Natural language processing (NLP) methods can help define an extraction pipeline that transforms such texts into a structured format. OBJECTIVES: In this paper, we build an NLP pipeline to extract values of the classification of malignant tumors (TNM) from unstructured and semi-structured pathology reports and to import them into a structured data source for a clinical study. Our research interest is not standard performance metrics such as precision, recall, and F-measure on the test and validation data. Instead, we discuss how software programming techniques can improve the readability of rule-based (RB) information extraction (IE) pipelines, thereby minimizing the time needed to correct or update the rules and making it easier to transfer them to another programming language. METHODS: The extraction rules were programmed manually using training data of TNM classifications and tested in two separate pipelines based on design specifications from domain experts and data curators. First, we implemented each rule directly as a single line per extraction item. Second, we reprogrammed the rules in a readable fashion through decomposition and intention-revealing variable names. To measure the impact of both methods, we measured the time needed to fine-tune and program the extractions on test data of semi-structured and unstructured texts. RESULTS: We analyze the benefits of improving rule readability through parallel programming with regular expressions (REGEX) and the Apache UIMA Ruta language (AURL). The time for correcting the readable rules in AURL and REGEX was significantly reduced. Complicated REGEX rules were decomposed, and the intention-revealing declarations were reprogrammed in AURL in 5 min.
CONCLUSION: We discuss the importance of readability as a factor and how it can be improved when programming RB text IE pipelines. Independent of the features of the programming language and the tools applied, a readable coding strategy can prove beneficial for future maintenance and offers an interpretable solution for understanding the extraction and for transferring the rules to other domains and NLP pipelines.
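The contrast between a one-line rule and a decomposed rule with intention-revealing names can be illustrated with a minimal Python sketch. It covers only a simplified subset of TNM notation; the patterns, the helpers `category` and `optional`, and the group names are illustrative assumptions, not the rules used in the study (which were written in REGEX and Apache UIMA Ruta).

```python
import re

# Opaque variant: the whole rule on one line (simplified TNM subset, for illustration only).
ONE_LINE_RULE = re.compile(
    r"\b(?P<tumour>y?[cp]?T(?:is|[0-4][a-d]?|[xX]))\b(?:\s+(?P<nodes>y?[cp]?N(?:[0-3][a-c]?|[xX]))\b)?(?:\s+(?P<metastasis>y?[cp]?M[01][a-c]?)\b)?"
)

# Readable variant: the same rule decomposed into intention-revealing parts.
PREFIX = r"y?[cp]?"                          # optional y (after therapy), then c (clinical) or p (pathological)
PRIMARY_TUMOUR = r"T(?:is|[0-4][a-d]?|[xX])"  # Tis, T0-T4 with optional subdivision, TX
REGIONAL_NODES = r"N(?:[0-3][a-c]?|[xX])"     # N0-N3 with optional subdivision, NX
DISTANT_METASTASIS = r"M[01][a-c]?"           # M0, M1 with optional subdivision


def category(name, body):
    """Named group for one TNM category, ending at a word boundary."""
    return rf"(?P<{name}>{PREFIX}{body})\b"


def optional(pattern):
    """A category that may follow the previous one after whitespace."""
    return rf"(?:\s+{pattern})?"


TNM_RULE = re.compile(
    r"\b"
    + category("tumour", PRIMARY_TUMOUR)
    + optional(category("nodes", REGIONAL_NODES))
    + optional(category("metastasis", DISTANT_METASTASIS))
)


def extract_tnm(text):
    """Return the first TNM formula found as a dict, or None if there is none."""
    match = TNM_RULE.search(text)
    return match.groupdict() if match else None


# Both variants compile to exactly the same pattern; only readability differs.
assert ONE_LINE_RULE.pattern == TNM_RULE.pattern

print(extract_tnm("Tumor formula: pT2 pN1a M0, G2, R0."))
# {'tumour': 'pT2', 'nodes': 'pN1a', 'metastasis': 'M0'}
```

Because each category lives in its own named constant, correcting one of them (for example, adding a new T subdivision) touches a single short line instead of the middle of a long expression.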

Subject(s)

Electronic Health Records , Natural Language Processing , Humans , Comprehension , Algorithms , Information Storage and Retrieval

openEHR-to-FHIR: Converting openEHR Compositions to Fast Healthcare Interoperability Resources (FHIR) for the German Corona Consensus Dataset (GECCO).

Ladas, Nektarios; Franz, Stefan; Haarbrandt, Birger; Sommer, Kim Katrin; Kohler, Severin; Ballout, Sarah; Fiebeck, Johanna; Marschollek, Michael; Gietzelt, Matthias.

Stud Health Technol Inform ; 289: 485-486, 2022 Jan 14.

Article in English | MEDLINE | ID: mdl-35062196

ABSTRACT

The German Corona Consensus (GECCO) established a uniform dataset in FHIR format for exchanging and sharing interoperable, patient-specific COVID-19 data between the health information systems (HIS) of universities. For sharing COVID-19 information with other locations that use openEHR, the data must be converted into FHIR format. In this paper, we introduce our solution, a web tool named "openEHR-to-FHIR" that converts compositions from an openEHR repository and stores them in their respective GECCO FHIR profiles. The tool provides a REST web service for ad hoc conversion of openEHR compositions to FHIR profiles.
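What such a conversion involves can be sketched in Python: one value from an openEHR composition, assumed here to be in a flat (path to value) representation, is mapped onto a bare FHIR Observation. The flat paths, the unit table, and the function name `to_fhir_observation` are hypothetical; the output is not validated against the GECCO profiles and does not reflect how the openEHR-to-FHIR tool is implemented. LOINC 8310-5 (body temperature) and the UCUM code `Cel` are standard codes.

```python
import json

# Hypothetical flat-format paths of an openEHR body-temperature entry;
# real paths depend on the template deployed in the openEHR repository.
BASE = "gecco_temperature/body_temperature:0/any_event:0"
MAGNITUDE_PATH = BASE + "/temperature|magnitude"
UNIT_PATH = BASE + "/temperature|unit"
TIME_PATH = BASE + "/time"

# Assumed openEHR unit strings mapped to the UCUM codes that FHIR quantities use.
UCUM_CODES = {"°C": "Cel", "°F": "[degF]"}


def to_fhir_observation(flat, patient_id):
    """Build a minimal FHIR Observation (not validated against GECCO profiles)."""
    unit = flat[UNIT_PATH]
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs",
        }]}],
        "code": {"coding": [{
            "system": "http://loinc.org",
            "code": "8310-5",
            "display": "Body temperature",
        }]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": flat[TIME_PATH],
        "valueQuantity": {
            "value": flat[MAGNITUDE_PATH],
            "unit": unit,
            "system": "http://unitsofmeasure.org",
            "code": UCUM_CODES[unit],
        },
    }


composition = {
    MAGNITUDE_PATH: 38.2,
    UNIT_PATH: "°C",
    TIME_PATH: "2021-03-01T08:30:00+01:00",
}
print(json.dumps(to_fhir_observation(composition, "example"), indent=2, ensure_ascii=False))
```

A production converter would more likely read template paths and profile requirements from configuration instead of hard-coding them per data element.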

Subject(s)

COVID-19 , Electronic Health Records , Consensus , Delivery of Health Care , Humans , SARS-CoV-2

Transformation of microbiology data into a standardised data representation using OpenEHR.

Wulff, Antje; Baier, Claas; Ballout, Sarah; Tute, Erik; Sommer, Kim Katrin; Kaase, Martin; Sargeant, Anneka; Drenkhahn, Cora; Schlüter, Dirk; Marschollek, Michael; Scheithauer, Simone.

Sci Rep ; 11(1): 10556, 2021 05 18.

Article in English | MEDLINE | ID: mdl-34006956

ABSTRACT

The spread of multidrug-resistant organisms (MDRO) is a global healthcare challenge. Nosocomial outbreaks caused by MDRO are an important contributor to this threat. Computer-based applications facilitating outbreak detection can be essential to address this issue. To allow application reusability across institutions, the various heterogeneous microbiology data representations need to be transformed into standardised, unambiguous data models. In this work, we present a multicentric standardisation approach using openEHR as the modelling standard. The data models were agreed upon by consensus in a multicentre, international process. Participating sites integrated microbiology reports from primary source systems into an openEHR-based data platform. For evaluation, we implemented a prototypical application, compared the transformed data with the original reports, and conducted automated data quality checks. We were able to develop standardised and interoperable microbiology data models. The publicly available data models can be used across institutions to transform real-life microbiology reports into standardised representations. The implementation of a proof-of-principle and quality control application demonstrated that the new formats as well as the integration processes are feasible. Holistic transformation of microbiological data into standardised openEHR-based formats is feasible in a real-life multicentre setting and lays the foundation for developing cross-institutional, automated outbreak detection systems.

Subject(s)

Cross Infection/microbiology , Drug Resistance, Microbial , Electronic Health Records/standards , Computer Simulation , Cross Infection/epidemiology , Disease Outbreaks , Humans , Interinstitutional Relations , Proof of Concept Study , Reference Standards

A Report on Archetype Modelling in a Nationwide Data Infrastructure Project.

Wulff, Antje; Sommer, Kim Katrin; Ballout, Sarah; Haarbrandt, Birger; Gietzelt, Matthias.

Stud Health Technol Inform ; 258: 146-150, 2019.

Article in English | MEDLINE | ID: mdl-30942733

ABSTRACT

BACKGROUND: The nationwide data infrastructure project HiGHmed strives to achieve semantic interoperability through the use of openEHR archetypes. To this end, a knowledge governance framework defining collaborative modelling processes has been established. For long-term success and the creation of high-quality archetypes, continuous monitoring is vital. OBJECTIVES: To present an update on archetype modelling and the establishment of the governance framework in HiGHmed. METHODS: Qualitative and quantitative analyses of the progress in establishing modelling groups, roles, and users, realizing modelling workflows, and modelling archetypes. RESULTS: Currently, 25 modellers and 17 domain experts are participating. 79 archetypes have been identified, of which 69 are pre-existing and internationally published; completion rates of review rounds are satisfactory but could be improved. CONCLUSIONS: The governance framework is valuable for making the activities manageable and for accelerating modelling. Combined with highly engaged data stewards and clinicians, it has already enabled the development of a reasonable number of archetypes.

Subject(s)

Electronic Health Records , Semantics , Data Systems
