Results 1 - 7 of 7
1.
J Biomed Semantics ; 14(1): 18, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38017587

ABSTRACT

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe work that attempts to go beyond bibliometric metadata to predict influential scholarly documents, and we also examine the prediction task over categorized scholarly documents. We introduce a new approach that enhances the document representation with a domain-independent knowledge graph in order to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. The study compares several document representation methods for machine learning, including TF-IDF, bag-of-words (BOW), and embedding-based language models (BERT); TF-IDF performed best. Of the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, while random forest obtained the best results for influential scholarly document prediction when the document representation was enhanced with a domain-independent knowledge graph, specifically DBpedia. In that setting, our study combines state-of-the-art machine learning methods with the BOW representation, enhanced with direct types (RDF types) and unqualified relations from DBpedia. The enhanced representation had no measurable impact on scholarly document category classification, but it did improve influential scholarly document prediction with categorical data.
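
A minimal sketch of the kind of pipeline this abstract outlines, assuming scikit-learn; the toy corpus, the labels, and the dbpedia_types lookup are hypothetical placeholders (the paper uses the WHO COVID-19 corpus and direct types retrieved from DBpedia):

    # Sketch only: BOW features enriched with DBpedia direct types (RDF types),
    # then random forest for influential-document prediction. The corpus, the
    # labels and the dbpedia_types mapping below are hypothetical placeholders.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["covid transmission model", "vaccine efficacy trial"]  # toy corpus
    influential = [1, 0]                                           # toy labels

    # Hypothetical entity -> DBpedia direct-type mapping (normally obtained by
    # querying DBpedia); underscores keep each type as a single BOW token.
    dbpedia_types = {"covid": "dbo_Disease", "vaccine": "dbo_ChemicalSubstance"}

    def enrich(text):
        # Append the direct type of every recognised entity, so the BOW
        # vocabulary also contains knowledge-graph features.
        extra = [dbpedia_types[t] for t in text.split() if t in dbpedia_types]
        return " ".join([text] + extra)

    bow = CountVectorizer()
    X = bow.fit_transform(enrich(d) for d in docs)

    clf = RandomForestClassifier(random_state=0).fit(X, influential)
    print(clf.predict(X))

Swapping CountVectorizer for TfidfVectorizer would give the TF-IDF variant the abstract reports as the strongest plain-text representation, and logistic regression would take the random forest's place for the category classification task.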


Subject(s)
COVID-19 , Pattern Recognition, Automated , Humans , Machine Learning , Algorithms , Language
2.
Methods Inf Med ; 56(3): 217-229, 2017 May 18.
Article in English | MEDLINE | ID: mdl-28451691

ABSTRACT

OBJECTIVES: Our main objective is to design a method, and supporting software, for interactive correction and semantic annotation of narrative clinical reports that allows their easier and less error-prone processing outside their original context: first, by physicians unfamiliar with the original language (and possibly also the source specialty), and second, by tools requiring structured information, such as decision-support systems. An additional goal is to gain insight into the process of narrative report creation, including the errors and ambiguities arising therein, and into the process of annotating reports with clinical terms. Finally, we aim to provide a dataset of ground-truth transformations (specific to Czech as the source language), set up by expert physicians, which can be reused for subsequent analytical studies and for training automated transformation procedures. METHODS: A three-phase preprocessing method was developed to support secondary use of the narrative clinical reports commonly stored in electronic health records. In the first phase a narrative clinical report is tokenized. In the second phase the tokenized report is normalized; the normalized report is easily readable by health professionals who know the language used in it. In the third phase the normalized report is enriched with extracted structured information, yielding a semi-structured normalized report in which the extracted clinical terms are matched to codebook terms. Software tools for interactive correction, expansion and semantic annotation of narrative clinical reports have been developed, and the three-phase preprocessing method was validated in the cardiology domain. RESULTS: The three-phase preprocessing method was validated on 49 anonymized Czech narrative clinical reports in the field of cardiology. Descriptive statistics were calculated from the database of accomplished transformations. Two cardiologists participated in the annotation phase: the first annotated 1500 clinical terms found in the 49 reports to codebook terms using the classification systems ICD-10, SNOMED CT, LOINC and LEKY; the second validated these annotations. The correct clinical terms and the codebook terms were stored in a database. CONCLUSIONS: The proposed three-phase preprocessing method extracted structured information from Czech narrative clinical reports and linked it to electronic health records. The software tool, although generic, is tailored to Czech as the language of the electronic health record pool under study, providing a potential reference point for porting this approach to dozens of other less widely spoken languages. The structured information can support medical decision making, quality assurance tasks and further medical research.
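
A minimal sketch of the three-phase method as the abstract describes it: tokenization, normalization, and matching of extracted terms to codebook terms. The abbreviation table, the codebook entry, and the sample report are hypothetical stand-ins for the Czech cardiology material:

    # Sketch only: the three preprocessing phases as described in the abstract.
    # The codebook, abbreviation table and sample report are hypothetical.
    import re

    codebook = {"myocardial infarction": "ICD-10 I21"}  # toy codebook entry

    def tokenize(report):                   # phase 1: tokenization
        return re.findall(r"\w+", report.lower())

    def normalize(tokens):                  # phase 2: normalization
        abbrev = {"mi": "myocardial infarction"}  # hypothetical expansions
        return [abbrev.get(t, t) for t in tokens]

    def annotate(tokens):                   # phase 3: match terms to codes
        text = " ".join(tokens)
        return {t: c for t, c in codebook.items() if t in text}

    report = "pt with MI in 2015"
    print(annotate(normalize(tokenize(report))))
    # -> {'myocardial infarction': 'ICD-10 I21'}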


Subject(s)
Electronic Health Records/standards , Machine Learning , Natural Language Processing , Semantics , Vocabulary, Controlled , Word Processing/standards , Writing/standards , Data Accuracy , Guidelines as Topic , International Classification of Diseases , Meaningful Use/standards , Software , User-Computer Interface
3.
J Biomed Semantics ; 3 Suppl 2: S4, 2012 Sep 21.
Article in English | MEDLINE | ID: mdl-23046606

ABSTRACT

BACKGROUND: Although policy providers have outlined minimal metadata guidelines and naming conventions, today's ontologies still display inter- and intra-ontology heterogeneity in class labelling schemes and metadata completeness, at least partly because of missing or inappropriate tools. Software support can ease this situation and contribute to overall ontology consistency and quality by helping to enforce such conventions. OBJECTIVE: We provide a plugin for the Protégé ontology editor that allows easy checks of compliance with ontology naming conventions and metadata completeness, as well as curation where violations are found. IMPLEMENTATION: In a requirements analysis, derived from a prior standardization effort within the OBO Foundry, we investigated the capabilities a software tool needs in order to check, curate and maintain class naming conventions. A Protégé tab plugin was implemented accordingly using the Protégé 4.1 libraries and tested on six different ontologies; based on the test results, the plugin was refined and new functionality integrated. RESULTS: The new Protégé plugin, OntoCheck, runs such tests on OWL ontologies. In particular, it helps clean up an ontology with regard to lexical heterogeneity, i.e. enforcing naming conventions and metadata completeness, and meets most of the requirements outlined for such a tool. Detected violations can be corrected to foster consistency in entity naming and meta-annotation within an artefact. Once specified, check constraints such as name patterns can be stored and exchanged for later reuse. We describe a first version of the software, illustrate its capabilities and its use within running ontology development efforts, and briefly outline improvements resulting from its application. We also discuss OntoCheck's capabilities in the context of related tools and highlight potential future extensions. CONCLUSIONS: The OntoCheck plugin facilitates labelling error detection and curation, contributing to lexical quality assurance in OWL ontologies. Ultimately, we hope this Protégé extension will ease ontology alignment as well as lexical post-processing of annotated data, and hence increase overall secondary data usage by humans and computers.
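
OntoCheck itself is a Java plugin for Protégé; the following is only a language-agnostic Python sketch of the kind of test it performs, checking class labels against a naming pattern and flagging missing metadata. The pattern, the required-metadata set, and the classes are assumptions for illustration:

    # Sketch only: check class labels against a naming convention and flag
    # missing metadata. The pattern, rules and classes are hypothetical.
    import re

    NAME_PATTERN = re.compile(r"^[a-z][a-z ]*$")   # assumed label convention
    REQUIRED_METADATA = {"definition", "creator"}  # assumed completeness rule

    classes = [
        {"label": "blood pressure", "metadata": {"definition", "creator"}},
        {"label": "BloodSample", "metadata": {"definition"}},
    ]

    for cls in classes:
        if not NAME_PATTERN.match(cls["label"]):
            print(f"naming violation: {cls['label']!r}")
        missing = REQUIRED_METADATA - cls["metadata"]
        if missing:
            print(f"{cls['label']!r} missing metadata: {sorted(missing)}")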

4.
Stud Health Technol Inform ; 101: 132-6, 2004.
Article in English | MEDLINE | ID: mdl-15537215

ABSTRACT

The Stepper tool was developed to assist a knowledge engineer in developing a computable version of narrative guidelines. The system is document-centric: it formalises the initial text in multiple user-definable steps corresponding to interactive XML transformations. In this paper, we report on experience gained by applying the tool to a narrative guideline document addressing unstable angina pectoris. The possible role of the tool and the associated methodology in developing a guideline-based application is also discussed.
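
A minimal sketch of the document-centric idea: a narrative fragment passes through a sequence of user-definable XML transformation steps. The element names, the marking rule, and the sample sentences are hypothetical, not Stepper's actual transformation language:

    # Sketch only: a guideline fragment passes through user-definable steps,
    # each an XML transformation. Names and the marking rule are hypothetical.
    import xml.etree.ElementTree as ET

    def mark_up(root):        # step 1: tag sentences that look like recommendations
        for sent in root.iter("sentence"):
            if "should" in (sent.text or ""):
                sent.set("kind", "recommendation")
        return root

    def extract_rules(root):  # step 2: lift tagged sentences into a rule base
        rules = ET.Element("rules")
        for sent in root.iter("sentence"):
            if sent.get("kind") == "recommendation":
                ET.SubElement(rules, "rule").text = sent.text
        return rules

    doc = ET.fromstring(
        "<guideline>"
        "<sentence>Patients with unstable angina should receive aspirin.</sentence>"
        "<sentence>Background text.</sentence>"
        "</guideline>"
    )
    for step in (mark_up, extract_rules):  # the step sequence is user-definable
        doc = step(doc)
    print(ET.tostring(doc, encoding="unicode"))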


Subject(s)
Artificial Intelligence , Decision Support Systems, Clinical , Practice Guidelines as Topic , Humans , Programming Languages , User-Computer Interface
5.
Stud Health Technol Inform ; 101: 157-61, 2004.
Article in English | MEDLINE | ID: mdl-15537220

ABSTRACT

While guideline-based decision support is safety-critical and typically requires human interaction, offline analysis of guideline compliance can be performed largely automatically. We examine the possibility of automatic detection of potential non-compliance, followed by (statistical) association mining. Only frequent associations of non-compliance patterns with various patient data are submitted to a medical expert for interpretation. The initial experiment was carried out in the domain of hypertension management.
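
A minimal sketch of the analysis the abstract proposes: non-compliance is flagged automatically, associations between the flag and patient attributes are mined, and only those exceeding support and confidence thresholds would reach the expert. The records, attributes, and thresholds are hypothetical:

    # Sketch only: mine frequent associations between a non-compliance flag
    # and patient attributes. Records and thresholds are hypothetical.
    records = [
        {"age_over_65": True,  "diabetic": True,  "non_compliant": True},
        {"age_over_65": True,  "diabetic": False, "non_compliant": True},
        {"age_over_65": False, "diabetic": True,  "non_compliant": False},
        {"age_over_65": True,  "diabetic": True,  "non_compliant": True},
    ]

    MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.8  # assumed mining thresholds

    for attr in ("age_over_65", "diabetic"):
        both = sum(r[attr] and r["non_compliant"] for r in records)
        support = both / len(records)
        confidence = both / max(1, sum(r[attr] for r in records))
        if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
            # only such frequent patterns would go to the medical expert
            print(f"{attr} -> non_compliant "
                  f"(sup={support:.2f}, conf={confidence:.2f})")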


Subject(s)
Decision Support Systems, Clinical , Guideline Adherence , Practice Guidelines as Topic , Decision Making, Computer-Assisted , Evidence-Based Medicine , Humans , Practice Patterns, Physicians' , Software
6.
Int J Med Inform ; 70(2-3): 329-35, 2003 Jul.
Article in English | MEDLINE | ID: mdl-12909185

ABSTRACT

Approaches to the formalization of medical guidelines can be divided into model-centric and document-centric. While model-centric approaches dominate in the development of clinical decision support applications, document-centric, mark-up-based formalization is suitable for tasks requiring the 'literal' content of the document to be transferred into the formal model, such as logical verification of the document or compliance analysis of health records. The quality and efficiency of document-centric formalization can be improved by decomposing the whole process into several explicit steps. We present a methodology and a software tool supporting this step-by-step formalization process: knowledge elements can be marked up in the source text, refined into a tree structure with increasing levels of detail, rearranged into an XML knowledge base, and finally exported into the operational representation. User-definable transformation rules make it possible to automate a large part of the process. The approach is being tested in the domain of cardiology: for parts of the WHO/ISH Guidelines for Hypertension, the process has been carried out through all the stages, up to an executable application generated automatically from the XML knowledge base.
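
A minimal sketch of the final export stage: one user-definable transformation rule turning elements of the XML knowledge base into an operational if-then representation. The tags and rule content are hypothetical, not the actual WHO/ISH guideline encoding:

    # Sketch only: export XML knowledge-base elements into an operational
    # if-then form. The tags and the rule content are hypothetical.
    import xml.etree.ElementTree as ET

    kb = ET.fromstring(
        "<kb><rule><condition>systolic BP &gt;= 140</condition>"
        "<action>consider antihypertensive therapy</action></rule></kb>"
    )

    def export_rule(rule):
        # one transformation rule: XML <rule> -> executable-style text
        return f"IF {rule.findtext('condition')} THEN {rule.findtext('action')}"

    for rule in kb.iter("rule"):
        print(export_rule(rule))
    # -> IF systolic BP >= 140 THEN consider antihypertensive therapy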


Subject(s)
Artificial Intelligence , Decision Support Systems, Clinical , Medical Records Systems, Computerized , Practice Guidelines as Topic , Cardiology/standards , Decision Trees , Humans , Hypertension/therapy
7.
Stud Health Technol Inform ; 90: 591-5, 2002.
Article in English | MEDLINE | ID: mdl-15460762

ABSTRACT

The quality of document-centric formalisation of medical guidelines can be improved by decomposing the whole process into several explicit steps. We present a methodology and a software tool supporting this step-by-step formalisation process. Knowledge elements can be marked up in the text with increasing levels of detail, rearranged into an XML knowledge base and exported into the operational representation. Semi-automated transitions can be specified by means of rules. The approach has been tested in a hypertension application.


Subject(s)
Documentation , Practice Guidelines as Topic , Czech Republic , Programming Languages