Results 1 - 20 of 91
1.
AMIA Jt Summits Transl Sci Proc ; 2024: 545-554, 2024.
Article in English | MEDLINE | ID: mdl-38827070

ABSTRACT

SNOMED CT is the most comprehensive clinical terminology used worldwide, and enhancing its accuracy is of utmost importance. In this work, we introduce an automated approach to identifying erroneous IS-A relations in SNOMED CT. We first extract linked concept-pairs, from which we generate Term Difference Pairs (TDPs) that contain the differences between the concepts. Given a TDP, if the reversed TDP also exists and fewer linked-pairs generate this TDP than generate the reversed TDP, then we suggest the former linked-pairs as potentially erroneous IS-A relations. We applied this approach to the Clinical finding and Procedure subhierarchies of the March 2022 US Edition of SNOMED CT, and obtained 52 potentially erroneous IS-A relations and a candidate list of 48 linked-pairs. A domain expert confirmed that 41 of the 52 (78.8%) were valid and identified 26 erroneous IS-A relations among the 48 linked-pairs, demonstrating the effectiveness of the approach.
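The reversed-TDP check in abstract 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a concept name is reduced to its set of lowercase words and that a TDP is the pair (words only in the child, words only in the parent); the function names `term_difference_pair` and `flag_suspicious` and the toy concept names are hypothetical.

```python
from collections import defaultdict


def term_difference_pair(child: str, parent: str) -> tuple[frozenset, frozenset]:
    """Hypothetical TDP: (words only in the child name, words only in the parent name)."""
    c, p = set(child.lower().split()), set(parent.lower().split())
    return frozenset(c - p), frozenset(p - c)


def flag_suspicious(linked_pairs):
    """Return linked (child, parent) pairs whose TDP is outnumbered by its reverse."""
    by_tdp = defaultdict(list)
    for child, parent in linked_pairs:
        by_tdp[term_difference_pair(child, parent)].append((child, parent))

    flagged = []
    for (child_only, parent_only), pairs in by_tdp.items():
        # The reversed TDP swaps the child-only and parent-only word sets.
        reverse = by_tdp.get((parent_only, child_only), [])
        if reverse and len(pairs) < len(reverse):
            flagged.extend(pairs)
    return flagged


# Toy data: two pairs add "chronic" when going parent -> child, one pair removes it.
pairs = [
    ("chronic ulcer of foot", "ulcer of foot"),
    ("chronic ulcer of leg", "ulcer of leg"),
    ("ulcer of hand", "chronic ulcer of hand"),
]
print(flag_suspicious(pairs))
```

The minority direction (a child whose name is less specific than its parent) is the one flagged for review, mirroring the majority-vote intuition in the abstract.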

2.
J Biomed Semantics ; 15(1): 6, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38693592

ABSTRACT

Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology could be detrimental to its downstream uses. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept-pairs within non-lattice subgraphs: graph fragments within a terminology that are likely to contain various inconsistencies. Our approach first checks whether the logical definition of one concept is more general than that of the other concept. Then, we check whether the lexical features of the first concept are contained in those of the other concept. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potentially missing IS-A relations for SNOMED CT and 100 for the NCI thesaurus. To assess the efficacy of our approach, domain experts evaluated a random sample of results belonging to the "Clinical Findings" and "Procedure" subhierarchies of SNOMED CT and results belonging to the "Drug, Food, Chemical or Biomedical Material" subhierarchy of the NCI thesaurus. The evaluation revealed that 118 out of 150 suggestions were valid for SNOMED CT and 17 out of 20 were valid for the NCI thesaurus.


Subject(s)
Systematized Nomenclature of Medicine , Terminology as Topic , Vocabulary, Controlled , Logic
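The two constraints described in abstract 2 can be illustrated with a simplified sketch. It assumes that a logical definition is a flat set of (attribute, value) pairs and that "more general" means a proper subset of those pairs; real SNOMED CT and NCI thesaurus definitions also involve parents, role groups, and value subsumption, which this ignores. The `Concept` class, `suggest_missing_isa`, and the toy concepts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    definition: frozenset  # hypothetical: set of (attribute, value) pairs


def words(c: Concept) -> set[str]:
    """Lexical features approximated as the set of lowercase words in the name."""
    return set(c.name.lower().split())


def suggest_missing_isa(a: Concept, b: Concept):
    """Return (child, parent) names if one concept looks like a missing subtype of the other."""
    for child, parent in ((a, b), (b, a)):
        more_general = parent.definition < child.definition  # proper subset
        lexically_contained = words(parent) <= words(child)
        if more_general and lexically_contained:
            return child.name, parent.name
    return None


specific = Concept(
    "fracture of left femur",
    frozenset({("finding site", "femur"), ("morphology", "fracture"), ("laterality", "left")}),
)
general = Concept(
    "fracture of femur",
    frozenset({("finding site", "femur"), ("morphology", "fracture")}),
)
print(suggest_missing_isa(general, specific))
```

Requiring both the logical and the lexical test mirrors the abstract's "if both constraints are satisfied" rule; either test alone would admit more false positives.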
3.
BMC Med Inform Decis Mak ; 24(Suppl 3): 103, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641585

ABSTRACT

BACKGROUND: Alzheimer's Disease (AD) is a devastating disease that destroys memory and other cognitive functions. There has been an increasing research effort to prevent and treat AD. In the US, two major data sharing resources for AD research are the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). Additionally, the National Institutes of Health (NIH) Common Data Elements (CDE) Repository has been developed to facilitate data sharing and improve interoperability among data sets in various disease research areas. METHOD: To better understand how AD-related data elements in these resources are interoperable with each other, we leverage different representation models to map data elements from different resources: NACC to ADNI, NACC to NIH CDE, and ADNI to NIH CDE. We explore bag-of-words-based and word-embedding-based models (Word2Vec and BioWordVec) to perform the data element mappings in these resources. RESULTS: The data dictionaries downloaded on November 23, 2021 contain 1,195 data elements in NACC, 13,918 in ADNI, and 27,213 in the NIH CDE Repository. Data element preprocessing reduced the numbers of NACC and ADNI data elements for mapping to 1,099 and 7,584, respectively. Manual evaluation of the mapping results showed that the bag-of-words-based approach achieved the best precision, while the BioWordVec-based approach attained the best recall. In total, the three approaches mapped 175 out of 1,099 (15.92%) NACC data elements to ADNI; 107 out of 1,099 (9.74%) NACC data elements to NIH CDE; and 171 out of 7,584 (2.25%) ADNI data elements to NIH CDE. CONCLUSIONS: The bag-of-words-based and word-embedding-based approaches showed promise in mapping AD-related data elements between different resources. 
Although the mapping approaches need further improvement, our result indicates that there is a critical need to standardize CDEs across these valuable AD research resources in order to maximize the discoveries regarding AD pathophysiology, diagnosis, and treatment that can be gleaned from them.


Subject(s)
Alzheimer Disease , United States/epidemiology , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/epidemiology , Common Data Elements , Neuroimaging , National Institutes of Health (U.S.)
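A bag-of-words mapping like the one in abstract 3 is often implemented as cosine similarity between word-count vectors of data element descriptions. The sketch below assumes that scoring scheme and a similarity threshold of 0.5; neither is stated in the abstract, and the element IDs and descriptions are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase alphanumeric tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(source: str, targets: dict[str, str], threshold: float = 0.5):
    """Map one source data element description to the most similar target element ID.

    Returns (element_id, score), or (None, score) when the best score is below threshold.
    """
    src = bag_of_words(source)
    scored = [(cosine(src, bag_of_words(desc)), elem_id) for elem_id, desc in targets.items()]
    score, elem_id = max(scored, default=(0.0, None))
    return (elem_id, score) if score >= threshold else (None, score)


# Hypothetical target dictionary; not actual NACC, ADNI, or NIH CDE elements.
targets = {"AGE": "Participant age at visit", "SEX": "Participant sex"}
print(best_match("Subject age at visit", targets))
```

Exact word overlap favors precision, which is consistent with the abstract's finding that the bag-of-words approach had the best precision while embeddings (which also match near-synonyms) had the best recall.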
4.
Yearb Med Inform ; 32(1): 225-229, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38147864

ABSTRACT

OBJECTIVES: To select, present, and summarize the best papers in 2022 for the Knowledge Representation and Management (KRM) section of the International Medical Informatics Association (IMIA) Yearbook. METHODS: We conducted PubMed queries and followed the IMIA Yearbook guidelines for performing biomedical informatics literature review to select the best papers in KRM published in 2022. RESULTS: We retrieved 1,847 publications from PubMed. We nominated 15 candidate best papers, and two of them were finally selected as the best papers in the KRM section. The topics covered by the candidate papers include ontology and knowledge graph creation, ontology applications, ontology quality assurance, ontology mapping standard, and conceptual model. CONCLUSIONS: In the KRM best paper selection for 2022, the candidate best papers encompassed a broad range of topics, with ontology and knowledge graph creation remaining a considerable research focus.


Subject(s)
Medical Informatics , Knowledge Management
5.
BMC Med Inform Decis Mak ; 23(Suppl 1): 151, 2023 08 04.
Article in English | MEDLINE | ID: mdl-37542312

ABSTRACT

BACKGROUND: In the United States, the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI) are two major data sharing resources for Alzheimer's Disease (AD) research. NACC and ADNI strive to make their data more FAIR (findable, accessible, interoperable, and reusable) for the broader research community. However, there is limited work harmonizing and supporting cross-cohort interoperability of the two resources. METHOD: In this paper, we leverage an ontology-based approach to harmonize data elements in the two resources and develop a web-based query system to search patient cohorts across the two resources. We first mapped data elements across NACC and ADNI, and performed value harmonization for the mapped data elements with inconsistent permissible values. Then we built an Alzheimer's Disease Data Element Ontology (ADEO) to model the mapped data elements in NACC and ADNI. We further developed a prototype cross-cohort query system to search patient cohorts across NACC and ADNI. RESULTS: After manual review, we found 172 mappings between NACC and ADNI. These 172 mappings were further used to construct common concepts in ADEO. Our data element mapping and harmonization resulted in five files storing, respectively, common concepts, variables in NACC and ADNI, mappings between variables and common concepts, permissible values of categorical data elements, and coding inconsistency harmonization. Our cross-cohort query system consists of three core architectural elements: a web-based interface, an advanced query engine, and a backend MongoDB database. CONCLUSIONS: In this work, ADEO has been specifically designed to facilitate data harmonization and cross-cohort query of NACC and ADNI data resources. 
Although our prototype cross-cohort query system was developed for exploring NACC and ADNI, its backend and frontend framework has been designed and implemented to be generally applicable to other domains for querying patient cohorts from multiple heterogeneous data sources.


Subject(s)
Alzheimer Disease , Humans , United States , Alzheimer Disease/diagnostic imaging , Neuroimaging
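The value harmonization and cross-cohort query steps in abstract 5 can be pictured with a small in-memory sketch. Everything here is hypothetical: the source names, variable names, and codes are invented and do not reflect how NACC or ADNI actually encode their data, and the real system stores data in MongoDB rather than Python lists.

```python
# Hypothetical mapping tables, invented for illustration only.
VARIABLE_MAP = {  # (source, source variable) -> common concept
    ("source_a", "SEX"): "sex",
    ("source_b", "GENDER_CD"): "sex",
}
VALUE_MAP = {  # (source, common concept) -> {source code: harmonized value}
    ("source_a", "sex"): {1: "male", 2: "female"},
    ("source_b", "sex"): {"M": "male", "F": "female"},
}


def harmonize(record: dict, source: str) -> dict:
    """Rename mapped variables to common concepts and recode their values."""
    out = {}
    for var, raw in record.items():
        concept = VARIABLE_MAP.get((source, var))
        if concept is None:
            continue  # unmapped variable: not part of the shared model
        codes = VALUE_MAP.get((source, concept))
        out[concept] = codes.get(raw, raw) if codes else raw
    return out


def query(records_by_source: dict, **criteria):
    """Return (source, harmonized record) pairs matching all criteria."""
    hits = []
    for source, records in records_by_source.items():
        for rec in records:
            h = harmonize(rec, source)
            if all(h.get(k) == v for k, v in criteria.items()):
                hits.append((source, h))
    return hits


data = {"source_a": [{"SEX": 2}, {"SEX": 1}], "source_b": [{"GENDER_CD": "F"}]}
print(query(data, sex="female"))
```

Separating the variable map from the value map reflects the distinction the abstract draws between mapping data elements and harmonizing their inconsistent permissible values.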
6.
Lab Chip ; 23(17): 3794-3801, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37498210

ABSTRACT

As core parts of microfluidic chip analysis systems, micromixers are widely applied across many fields. However, owing to limitations in fabrication technology, it remains challenging to achieve high-quality micromixers with both a delicately designed structure and efficient mixing. In this study, based on the theory of chaotic mixing, sinusoidal structures with variable phases were designed and then fabricated through scanning probe lithography (SPL) and post-selective etching. It was found that scratches with phase differences can lead to the periodic formation of amorphous silicon (a-Si), which resists etching. Consequently, misaligned sine channels with alternating thick-thin 3D shapes can be generated in situ from the scratched traces after etching. Further analysis showed that a thicker a-Si layer can be obtained by reducing the line spacing during scratching, as confirmed by Raman measurements and simulations. With the proposed method, the misaligned sine micromixer achieved higher mixing efficiency than ever. The duplicating process was also investigated for high-precision production of micromixers. The study provides strategies for the miniaturization of high-performance microfluidic chips.

7.
Discov Nano ; 18(1): 78, 2023 May 27.
Article in English | MEDLINE | ID: mdl-37382849

ABSTRACT

Metallic micro/nanostructures have a wide range of applications owing to their small size and superior performance. To obtain high-performance devices, it is of great importance to develop new methods for preparing metallic micro/nanostructures with high quality, low cost, and precise positioning. It has been found that metallic micro/nanostructures can be obtained by scratch-induced directional deposition of metals on a silicon surface, where the mask plays a key role in the process. This study focuses on the preparation of keto-aldehyde resin masks and their effects on the formation of scratch-induced gold (Au) micro/nanostructures. It was also found that keto-aldehyde resin of a certain thickness can act as a satisfactory mask for high-quality Au deposition, and that scratches produced under a lower normal load and fewer scratching cycles are more conducive to the formation of compact Au structures. Using the proposed method, two-dimensional Au structures can be prepared on the designed scratching traces, providing a feasible path for fabricating high-quality metal-based sensors.

8.
Article in English | MEDLINE | ID: mdl-37350887

ABSTRACT

Laterality is an important anatomic directional property indicating the sidedness of body structures, diseases, and procedures. Errors in laterality could have catastrophic consequences in patient care. In this paper, we investigate how different biomedical terminologies organize terms indicating laterality. We leverage the Unified Medical Language System (UMLS) to identify lateral terms in different terminologies. For each lateral term, we attempt to obtain other matched lateral terms and further analyze how they are interrelated. Our results indicated that only 1.68% of the matched lateral term-pairs are hierarchically related, while 44.24% of matched pairs are siblings. We found that in SNOMED CT, unlike in most other terminologies, bilateral concepts are hierarchically related to both left and right lateral concepts. Further investigation revealed that the likely cause of these relations is the way the logical definitions of SNOMED CT concepts are structured.
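Matching lateral terms, as described in abstract 8, can be approximated lexically: strip the laterality word and group terms that share the remainder. This is only a sketch under that assumption; the paper uses the UMLS to identify lateral terms, and the vocabulary `LATERALITY`, the helper names, and the toy terms below are hypothetical.

```python
import re
from collections import defaultdict

LATERALITY = {"left", "right", "bilateral"}  # hypothetical, deliberately small vocabulary


def laterality_key(term: str):
    """Return (words without laterality, side) if the term has exactly one laterality word."""
    tokens = re.findall(r"[a-z]+", term.lower())
    sides = [w for w in tokens if w in LATERALITY]
    if len(sides) != 1:
        return None
    rest = tuple(w for w in tokens if w not in LATERALITY)
    return rest, sides[0]


def match_lateral_terms(terms):
    """Group lateral terms that differ only in their laterality word."""
    groups = defaultdict(dict)
    for t in terms:
        key = laterality_key(t)
        if key:
            rest, side = key
            groups[rest][side] = t
    # Keep only groups with at least two sides, i.e. actual matches.
    return {rest: sides for rest, sides in groups.items() if len(sides) > 1}


terms = ["Left knee pain", "Right knee pain", "Bilateral knee pain", "Left hand", "Knee pain"]
print(match_lateral_terms(terms))
```

Once terms are grouped this way, each pair inside a group can be checked against the terminology's hierarchy to see whether it is hierarchically related, siblings, or unrelated.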

9.
AMIA Jt Summits Transl Sci Proc ; 2023: 350-359, 2023.
Article in English | MEDLINE | ID: mdl-37350916

ABSTRACT

Self-controlled case series (SCCS) is an epidemiological study design and statistical method in which individuals serve as their own controls, with comparisons made within the same individuals at different time points of observation. SCCS has been applied in settings where it is difficult to identify comparison or control groups. To provide computational support for SCCS, we introduce a query engine called Self-Controlled Case Query (SCCQ) and use it to extract cohorts of self-controlled case series from a large-scale COVID-19 Electronic Health Records (EHR) dataset. SCCQ presents researchers with a query-result dashboard, built with the R Shiny visualization framework, that visually summarizes the queried population. SCCQ also allows the export of query-generated raw data files in a portable format that researchers can extend to create more intricate and robust visualizations without needing a high level of technical or statistical background. Our validation and evaluation experiments showed COVID-19 outcomes consistent with existing research findings. With SCCQ, cohort exploration, data extraction, and information visualization can be provided for structured EHR data to lower the barrier for clinical and epidemiological research.
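The core SCCS data preparation step can be sketched as splitting each case's observation period into a post-exposure risk window and control time, then counting outcome events in each. This is an illustrative sketch, not SCCQ's implementation: the `Patient` record layout, the half-open date intervals, and the assumption that the exposure falls inside the observation period are all hypothetical choices. A full SCCS analysis would then fit a conditional Poisson model to these counts.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Patient:  # hypothetical record layout
    pid: str
    obs_start: date  # observation period is [obs_start, obs_end)
    obs_end: date
    exposure: date  # assumed to fall within the observation period
    outcomes: list[date]


def sccs_table(patients, risk_days: int):
    """Per case: (pid, events in risk, risk days, events in control, control days)."""
    rows = []
    for p in patients:
        outcomes = [d for d in p.outcomes if p.obs_start <= d < p.obs_end]
        if not outcomes:
            continue  # SCCS uses cases only: individuals with at least one outcome
        risk_end = min(p.exposure + timedelta(days=risk_days), p.obs_end)
        risk_time = (risk_end - p.exposure).days
        control_time = (p.obs_end - p.obs_start).days - risk_time
        in_risk = sum(p.exposure <= d < risk_end for d in outcomes)
        rows.append((p.pid, in_risk, risk_time, len(outcomes) - in_risk, control_time))
    return rows


p1 = Patient("p1", date(2021, 1, 1), date(2021, 12, 31), date(2021, 3, 1),
             [date(2021, 3, 10), date(2021, 8, 1)])
p2 = Patient("p2", date(2021, 1, 1), date(2021, 12, 31), date(2021, 5, 1), [])
print(sccs_table([p1, p2], risk_days=28))
```

Because each case contributes both risk and control time, time-invariant confounders (for example, sex or chronic conditions) cancel out, which is why SCCS is useful when a separate control group is hard to define.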

10.
AMIA Jt Summits Transl Sci Proc ; 2023: 515-524, 2023.
Article in English | MEDLINE | ID: mdl-37350927

ABSTRACT

Early onset of seizures is a potential risk factor for Sudden Unexpected Death in Epilepsy (SUDEP). However, information on the first seizure onset is often documented only as clinical narratives in epilepsy monitoring unit (EMU) discharge summaries, and manually extracting the first seizure onset time from these summaries is time-consuming and labor-intensive. In this work, we developed a rule-based natural language processing pipeline for automatically extracting the temporal information of patients' first seizure onset from EMU discharge summaries. We use the Epilepsy and Seizure Ontology (EpSO) as the core knowledge resource and construct 4 extraction rules based on 300 randomly selected EMU discharge summaries. To evaluate the effectiveness of the extraction pipeline, we apply the constructed rules to another 200 unseen discharge summaries and compare the results against the manual evaluation of a domain expert. Overall, our extraction pipeline achieved a precision of 0.75, recall of 0.651, and F1-score of 0.697. This is an encouraging initial result that will allow us to gain insights into potentially better-performing approaches.
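A rule-based extractor of the kind described in abstract 10 can be as simple as an ordered list of regular expressions, with the first match winning. The two patterns below are hypothetical stand-ins; the paper's four EpSO-based rules are not given in the abstract. The `f1_score` helper also reproduces the reported F1 from the reported precision and recall.

```python
import re

# Hypothetical rules, tried in order; not the paper's actual EpSO-based rules.
RULES = [
    re.compile(r"first seizure (?:was )?at (?:the )?age (?:of )?(\d{1,2})", re.IGNORECASE),
    re.compile(r"seizure onset (?:was )?at (?:the )?age (?:of )?(\d{1,2})", re.IGNORECASE),
]


def extract_first_onset_age(text: str):
    """Return the first-seizure age (in years) matched by the first applicable rule, else None."""
    for rule in RULES:
        m = rule.search(text)
        if m:
            return int(m.group(1))
    return None


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


print(extract_first_onset_age("Her first seizure was at the age of 14."))
print(round(f1_score(0.75, 0.651), 3))  # matches the reported F1 of 0.697
```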

11.
ACS Nano ; 17(10): 9255-9261, 2023 May 23.
Article in English | MEDLINE | ID: mdl-37171168

ABSTRACT

Nanowires (NWs) provide opportunities for building high-performance sensors and devices at the micro-/nanoscale. Directional movement and assembly of NWs have attracted extensive attention; however, controllable manipulation remains challenging, partly due to the lack of understanding of interfacial interactions between NWs and substrates (or contacting probes). In the present study, lateral bending of Ag NWs was investigated under various bending angles and pushing velocities, and the mechanical performance corresponding to the microstructures was clarified based on high-resolution transmission electron microscope (HRTEM) observations. The bending-angle-dependent fractures of Ag NWs were detected by an atomic force microscope (AFM) and a scanning electron microscope (SEM), and fractures occurred when the bending angle was larger than 80°. Compared with an Ag substrate, Ag NWs exhibited lower system stiffness according to nanoindentation with an AFM probe. HRTEM observations indicated that there were grain boundaries inside the Ag NWs, which are likely contributors to the generation of fractures and cracks in Ag NWs during lateral bending and nanoindentation. This study provides guidance for controllably manipulating NWs and fabricating high-performance micro-/nanodevices.

12.
BMC Med Inform Decis Mak ; 23(Suppl 1): 87, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37161566

ABSTRACT

BACKGROUND: Biomedical ontologies are representations of biomedical knowledge that provide terms with precisely defined meanings. They play a vital role in facilitating biomedical research in a cross-disciplinary manner. Quality issues in biomedical ontologies can hinder their effective use. One such quality issue is missing concepts. In this study, we introduce a logical definition-based approach to identify potential missing concepts in SNOMED CT. A unique contribution of our approach is that it is capable of obtaining both logical definitions and fully specified names for potential missing concepts. METHOD: The logical definitions of unrelated pairs of fully defined concepts in non-lattice subgraphs, which indicate quality issues, are intersected to generate the logical definitions of potential missing concepts. A text summarization model (called PEGASUS) is fine-tuned to predict the fully specified names of the potential missing concepts from their generated logical definitions. Furthermore, the identified potential missing concepts are validated using external resources, including the Unified Medical Language System (UMLS), biomedical literature in PubMed, and a newer version of SNOMED CT. RESULTS: From the March 2021 US Edition of SNOMED CT, we obtained a total of 30,313 unique logical definitions for potential missing concepts through the intersecting process. We fine-tuned a PEGASUS summarization model with 289,169 training instances and tested it on 36,146 instances. The model achieved a ROUGE-1 of 72.83, a ROUGE-2 of 51.06, and a ROUGE-L of 71.76 on the test dataset. The model correctly predicted 11,549 of the 36,146 fully specified names in the test dataset. Applying the fine-tuned model to the 30,313 unique logical definitions, we identified a total of 23,031 potential missing concepts. Of these, 2,312 (10.04%) were automatically validated by at least one of the three resources. 
CONCLUSIONS: The results showed that our logical definition-based approach for identification of potential missing concepts in SNOMED CT is encouraging. Nevertheless, there is still room for improving the performance of naming concepts based on logical definitions.


Subject(s)
Biological Ontologies , Biomedical Research , Humans , Systematized Nomenclature of Medicine , Knowledge , Language
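The intersection step from abstract 12 can be illustrated by treating each logical definition as a set of (attribute, value) pairs and intersecting the definitions of unrelated concept pairs; a non-empty intersection that matches no existing definition becomes a candidate missing concept. This flattening is a simplifying assumption (SNOMED CT definitions also include parents and role groups), the concept IDs and attributes are invented, and the PEGASUS naming step is not shown.

```python
def candidate_definitions(defs: dict[str, frozenset], unrelated_pairs):
    """Intersect definitions of unrelated concept pairs to propose missing concepts."""
    existing = set(defs.values())
    out = set()
    for a, b in unrelated_pairs:
        shared = defs[a] & defs[b]
        # Keep only non-empty intersections that no existing concept already defines.
        if shared and shared not in existing:
            out.add(shared)
    return out


# Toy definitions; attribute names and values are illustrative, not actual SNOMED CT content.
defs = {
    "A": frozenset({("site", "lung"), ("morphology", "inflammation"), ("agent", "virus")}),
    "B": frozenset({("site", "lung"), ("morphology", "inflammation"), ("agent", "bacterium")}),
}
print(candidate_definitions(defs, [("A", "B")]))
```

If a concept with exactly the shared definition already existed, nothing would be proposed, which is why the intersected definitions are checked against the existing ones before naming.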
13.
J Alzheimers Dis ; 92(4): 1323-1339, 2023.
Article in English | MEDLINE | ID: mdl-36872776

ABSTRACT

BACKGROUND: Accurately identifying cognitive changes in Mexican American (MA) adults using the Mini-Mental State Examination (MMSE) requires knowledge of population-based norms for the MMSE, a scale widely used in research settings. OBJECTIVE: To describe the distribution of MMSE scores in a large cohort of MA adults, assess the impact of MMSE requirements on their clinical trial eligibility, and explore which factors are most strongly associated with their MMSE scores. METHODS: Visits between 2004 and 2021 in the Cameron County Hispanic Cohort were analyzed. Eligible participants were ≥18 years old and of Mexican descent. MMSE distributions before and after stratification by age and years of education (YOE) were assessed, as was the proportion of trial-aged (50-85-year-old) participants with MMSE <24, the minimum MMSE cutoff most frequently used in Alzheimer's disease (AD) clinical trials. As a secondary analysis, random forest models were constructed to estimate the relative association of the MMSE with potentially relevant variables. RESULTS: The sample (n = 3,404) had a mean age of 44.4 (SD, 16.0) years and was 64.5% female. Median MMSE was 28 (IQR, 28-29). The percentage of trial-aged participants (n = 1,267) with MMSE <24 was 18.6% overall and 54.3% among the subset with 0-4 YOE (n = 230). The five variables most associated with the MMSE in the study sample were education, age, exercise, C-reactive protein, and anxiety. CONCLUSION: The minimum MMSE cutoffs in most phase III prodromal-to-mild AD trials would exclude a significant proportion of trial-aged participants in this MA cohort, including over half of those with 0-4 YOE.


Subject(s)
Alzheimer Disease , Mental Status and Dementia Tests , Mexican Americans , Aged , Aged, 80 and over , Female , Humans , Male , Alzheimer Disease/diagnosis , Alzheimer Disease/psychology , Educational Status , Mexican Americans/psychology , Texas , Reference Values , Adult , Middle Aged
14.
J Am Med Inform Assoc ; 30(3): 475-484, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36539234

ABSTRACT

OBJECTIVE: SNOMED CT is the largest clinical terminology worldwide. Quality assurance of SNOMED CT is of utmost importance to ensure that it provides accurate domain knowledge to various SNOMED CT-based applications. In this work, we introduce a deep learning-based approach to uncover missing is-a relations in SNOMED CT. MATERIALS AND METHODS: Our focus is to identify missing is-a relations between concept-pairs exhibiting a containment pattern (ie, the set of words of one concept being a proper subset of that of the other concept). We use hierarchically related containment concept-pairs as positive instances and hierarchically unrelated containment concept-pairs as negative instances to train a model predicting whether an is-a relation exists between two concepts with a containment pattern. The model is a binary classifier leveraging concept name features, hierarchical features, enriched lexical attribute features, and logical definition features. We introduce a cross-validation inspired approach to identify missing is-a relations among all hierarchically unrelated containment concept-pairs. RESULTS: We trained and applied our model on the Clinical finding subhierarchy of SNOMED CT (September 2019 US edition). Our model (based on the validation sets) achieved a precision of 0.8164, recall of 0.8397, and F1 score of 0.8279. Applying the model to predict actual missing is-a relations, we obtained a total of 1661 potential candidates. Domain experts evaluated 230 randomly selected samples and verified that 192 (83.48%) are valid. CONCLUSIONS: The results showed that our deep learning approach is effective in uncovering missing is-a relations between containment concept-pairs in SNOMED CT.


Subject(s)
Deep Learning , Systematized Nomenclature of Medicine
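The containment pattern in abstract 14 (one concept's word set is a proper subset of the other's) can be enumerated directly, splitting pairs into hierarchically related ones (positives) and unrelated ones (the pool from which negatives and prediction candidates both come, hence the cross-validation-inspired scheme). This sketch assumes whitespace tokenization and an `ancestors` set of (descendant, ancestor) pairs; the concept IDs and names are toy examples.

```python
from itertools import permutations


def containment_pairs(names: dict[str, str], ancestors: set[tuple[str, str]]):
    """Split containment concept-pairs into (related, unrelated) lists of (specific, general)."""
    words = {cid: frozenset(n.lower().split()) for cid, n in names.items()}
    related, unrelated = [], []
    for specific, general in permutations(names, 2):
        if words[general] < words[specific]:  # proper subset: containment pattern
            target = related if (specific, general) in ancestors else unrelated
            target.append((specific, general))
    return related, unrelated


names = {"c1": "pain", "c2": "chest pain", "c3": "acute chest pain"}
ancestors = {("c2", "c1"), ("c3", "c1")}  # hypothetical hierarchy: c3 is not under c2
print(containment_pairs(names, ancestors))
```

In this toy hierarchy the unrelated pair ("c3", "c2") is exactly the kind of candidate the classifier would be asked to judge as a possible missing is-a relation.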
15.
AMIA Annu Symp Proc ; 2023: 977-986, 2023.
Article in English | MEDLINE | ID: mdl-38222357

ABSTRACT

The Unified Medical Language System (UMLS), a large repository of biomedical vocabularies, has been used for supporting various biomedical applications. Ensuring the quality of the UMLS is critical to maintain both the accuracy of its content and the reliability of downstream applications. In this work, we present a Graph Convolutional Network (GCN)-based approach to identify misaligned synonymous terms organized under different UMLS concepts. We used synonymous terms grouped under the same concept as positive samples and top lexically similar terms as negative samples to train the GCN model. We applied the model to a test set and suggested those negative samples predicted to be synonymous as potentially misaligned synonymous terms. A total of 147,625 suggestions were made. A human expert evaluated 100 randomly selected suggestions and agreed with 60 of them. The results indicate that our GCN-based approach shows promise to help improve the synonymy grouping in the UMLS.


Subject(s)
Unified Medical Language System , Humans , Reproducibility of Results
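For readers unfamiliar with GCNs, the layer below is the widely used propagation rule of Kipf and Welling. The abstract does not say which GCN variant was used or how the term graph was built, so reading it as the paper's exact model is an assumption. Here $A$ is the adjacency matrix over $N$ nodes (for example, terms), $H^{(l)}$ holds node features at layer $l$ (with $H^{(0)}$ the input features), $W^{(l)}$ is a learned weight matrix, and $\sigma$ is a nonlinearity such as ReLU.

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I_N,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Adding the identity gives each node a self-loop so its own features are retained, and the symmetric normalization by $\tilde{D}^{-1/2}$ keeps feature magnitudes stable regardless of node degree.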
16.
Proc Int World Wide Web Conf ; 2023(Companion): 820-825, 2023 Apr.
Article in English | MEDLINE | ID: mdl-38327770

ABSTRACT

Model card reports provide a transparent description of machine learning models, including information about their evaluation, limitations, intended use, etc. Federal health agencies have expressed interest in model card reports for research studies using machine learning-based AI. Previously, we developed an ontology model for model card reports to structure and formalize these reports. In this paper, we demonstrate a Java-based library (OWL API, FaCT++) that leverages our ontology to publish computable model card reports. We discuss future directions and other use cases that highlight the applicability and feasibility of ontology-driven systems to address FAIR challenges.

17.
Yearb Med Inform ; 31(1): 236-240, 2022 Aug.
Article in English | MEDLINE | ID: mdl-36463882

ABSTRACT

OBJECTIVES: To select, present, and summarize the best papers in the field of Knowledge Representation and Management (KRM) published in 2021. METHODS: Following the International Medical Informatics Association (IMIA) Yearbook guidelines, a comprehensive and standardized review of the biomedical informatics literature was performed to select the best KRM papers published in 2021, based on PubMed queries. RESULTS: A total of 1,231 publications were retrieved from PubMed. We nominated 15 candidate best papers, and four of them were finally selected as the best papers in the KRM section. The topics covered by these papers include knowledge graphs, ontology development, ontology alignment, and the International Classification of Diseases. CONCLUSION: In the KRM best paper selection for 2021, the candidate best papers covered a wider spectrum of topics compared with the previous year's strong focus on ontology curation. In particular, ontology development for specific domains (e.g., Alzheimer's disease, infectious diseases, bioethics) received the most attention.


Subject(s)
Bioethics , Medical Informatics , International Classification of Diseases , Knowledge Management
18.
Front Big Data ; 5: 965715, 2022.
Article in English | MEDLINE | ID: mdl-36059922

ABSTRACT

Epilepsy affects ~2-3 million individuals in the United States, a third of whom have uncontrolled seizures. Sudden unexpected death in epilepsy (SUDEP) is a catastrophic and fatal complication of poorly controlled epilepsy and is the primary cause of mortality in such patients. Despite its large public health impact, SUDEP, with an incidence of ~1/1,000 in persons with epilepsy, is uncommon enough that well-powered studies require multi-center efforts. We developed the Multimodal SUDEP Data Resource (MSDR), a comprehensive system for sharing multimodal epilepsy data in the NIH-funded Center for SUDEP Research. The MSDR aims to accelerate research addressing critical questions about personalized SUDEP risk assessment. We used a metadata-guided approach, with a set of common epilepsy-specific terms enforcing uniform semantic interpretation of data elements across three main components: (1) multi-site annotated datasets; (2) user interfaces for capturing, managing, and accessing data; and (3) computational approaches for the analysis of multimodal clinical data. We incorporated the process for managing dataset-specific data use agreements, evidence of Institutional Review Board review, and the corresponding access control into the MSDR web portal. The metadata-guided approach facilitates structural and semantic interoperability, ultimately leading to enhanced data reusability and scientific rigor. MSDR prospectively integrated and curated epilepsy patient data from seven institutions, and it currently contains data on 2,739 subjects and 10,685 multimodal clinical data files in different data formats. In total, 55 users have registered with the current MSDR data repository, and 6 projects have been funded to apply MSDR in epilepsy research, including three R01 projects and three R21 projects.

19.
J Biomed Inform ; 134: 104162, 2022 10.
Article in English | MEDLINE | ID: mdl-36029954

ABSTRACT

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) provides a unified model to integrate disparate real-world data (RWD) sources. An integral part of the OMOP CDM is the Standardized Vocabularies (henceforth referred to as the OMOP vocabulary), which enables organization and standardization of medical concepts across various clinical domains of the OMOP CDM. For concepts with the same meaning from different source vocabularies, one is designated as the standard concept, while the others are specified as non-standard or source concepts and mapped to the standard one. However, due to the heterogeneity of source vocabularies, there may be mapping issues, such as erroneous and missing mappings, in the OMOP vocabulary, which could affect the results of downstream analyses with RWD. In this paper, we focus on quality assurance of vaccine concept mappings in the OMOP vocabulary, which is necessary to accurately harness the power of RWD on vaccines. We introduce a semi-automated lexical approach to audit vaccine mappings in the OMOP vocabulary. We generated two types of vaccine-pairs: mapped and unmapped, where mapped vaccine-pairs are pairs of vaccine concepts with a "Maps to" relationship, while unmapped vaccine-pairs are those without a "Maps to" relationship. We represented each vaccine concept name as a set of words, and derived term-difference pairs (i.e., name differences) for mapped and unmapped vaccine-pairs. If the same term-difference pair can be obtained from both mapped and unmapped vaccine-pairs, it is considered a potential mapping inconsistency. Applying this approach to the vaccine mappings in OMOP, we obtained a total of 2087 potential mapping inconsistencies. A random sample of 200 cases was evaluated by domain experts to identify, validate, and categorize the inconsistencies. Experts identified 95 cases revealing valid mapping issues. 
The remaining 105 cases were found to be invalid because the mappings relied on external and/or contextual information that was not reflected in the vaccine concept names. This indicates that our semi-automated approach shows promise in identifying mapping inconsistencies among vaccine concepts in the OMOP vocabulary.


Subject(s)
Vaccines , Vocabulary , Quality Improvement , Vocabulary, Controlled
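The inconsistency test in abstract 19 (the same term-difference pair arising from both a mapped and an unmapped vaccine-pair) can be sketched as below. The sketch assumes each pair is ordered (source concept, target concept) and that a term-difference pair is (words only in the first name, words only in the second name); the vaccine names are invented and are not actual OMOP concepts.

```python
from collections import defaultdict


def term_difference_pair(a: str, b: str) -> tuple[frozenset, frozenset]:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return frozenset(wa - wb), frozenset(wb - wa)


def mapping_inconsistencies(mapped, unmapped):
    """Group pairs by term-difference pair; keep groups containing both mapped and unmapped pairs."""
    by_tdp = defaultdict(lambda: {"mapped": [], "unmapped": []})
    for kind, pairs in (("mapped", mapped), ("unmapped", unmapped)):
        for a, b in pairs:
            by_tdp[term_difference_pair(a, b)][kind].append((a, b))
    return {k: v for k, v in by_tdp.items() if v["mapped"] and v["unmapped"]}


# Invented names: the same lexical difference ("injection") is mapped in one case but not the other.
mapped = [("hepatitis b vaccine injection", "hepatitis b vaccine"),
          ("rabies vaccine 1 ml", "rabies vaccine")]
unmapped = [("influenza vaccine injection", "influenza vaccine")]
for tdp, groups in mapping_inconsistencies(mapped, unmapped).items():
    print(tdp, groups)
```

As the expert review in the abstract shows, a flagged group is only a candidate: the lexical difference may be shared while external or contextual information justifies mapping one pair and not the other.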
20.
J Biomed Semantics ; 13(1): 22, 2022 08 13.
Article in English | MEDLINE | ID: mdl-35964149

ABSTRACT

BACKGROUND: The Vaccine Ontology (VO) is a biomedical ontology that standardizes vaccine annotation. Errors in VO can affect the many applications in which it is used. Quality assurance of VO is imperative to ensure that it provides accurate domain knowledge to these downstream tasks. Manual review to identify and fix quality issues (such as missing hierarchical is-a relations) is challenging given the complexity of the ontology. Automated approaches are highly desirable to facilitate the quality assurance of VO. METHODS: We developed an automated lexical approach that identifies potentially missing is-a relations in VO. First, we construct two types of VO concept-pairs: (1) linked; and (2) unlinked. From each concept-pair, we further derive an Acquired Term Pair (ATP) based on its lexical features. If the same ATP is obtained from a linked concept-pair and an unlinked concept-pair, this is considered to indicate a potentially missing is-a relation between the unlinked pair of concepts. RESULTS: Applying this approach to version 1.1.192 of VO, we identified 232 potentially missing is-a relations. A manual review by a VO domain expert of a random sample of 70 potentially missing is-a relations revealed that 65 of the cases were valid missing is-a relations in VO (a precision of 92.86%). CONCLUSIONS: The results indicate that our approach is highly effective in identifying missing is-a relations in VO.


Subject(s)
Biological Ontologies , Vaccines , Adenosine Triphosphate
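The ATP rule in abstract 20 can be sketched as: collect the ATPs of linked (child, parent) pairs, then suggest a missing is-a relation for every unlinked pair that yields one of those ATPs. The abstract does not define how an ATP is computed, so this sketch assumes, hypothetically, that it is the pair of word sets unique to each name; the vaccine names are invented.

```python
def acquired_term_pair(child: str, parent: str) -> tuple[frozenset, frozenset]:
    """Hypothetical ATP: (words only in the child name, words only in the parent name)."""
    c, p = set(child.lower().split()), set(parent.lower().split())
    return frozenset(c - p), frozenset(p - c)


def missing_isa(linked, unlinked):
    """Suggest unlinked (child, parent) pairs whose ATP also arises from some linked pair."""
    linked_atps = {acquired_term_pair(c, p) for c, p in linked}
    return [(c, p) for c, p in unlinked if acquired_term_pair(c, p) in linked_atps]


linked = [("live attenuated measles vaccine", "measles vaccine")]
unlinked = [("live attenuated mumps vaccine", "mumps vaccine"),
            ("mumps vaccine", "rabies vaccine")]
print(missing_isa(linked, unlinked))
```

Using a set of linked ATPs makes each lookup constant time, which matters when every unlinked pair in a large ontology must be checked.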