Search | VHL Regional Portal

1.

Mapping of Alzheimer's disease related data elements and the NIH Common Data Elements.

Hao, Xubing; Abeysinghe, Rashmie; Zheng, Fengbo; Schulz, Paul E; Cui, Licong.

BMC Med Inform Decis Mak ; 24(Suppl 3): 103, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38641585

ABSTRACT

BACKGROUND: Alzheimer's Disease (AD) is a devastating disease that destroys memory and other cognitive functions. There has been an increasing research effort to prevent and treat AD. In the US, two major data sharing resources for AD research are the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). Additionally, the National Institutes of Health (NIH) Common Data Elements (CDE) Repository has been developed to facilitate data sharing and improve the interoperability among data sets in various disease research areas. METHOD: To better understand how AD-related data elements in these resources are interoperable with each other, we leverage different representation models to map data elements from different resources: NACC to ADNI, NACC to NIH CDE, and ADNI to NIH CDE. We explore bag-of-words based and word embeddings based models (Word2Vec and BioWordVec) to perform the data element mappings in these resources. RESULTS: The data dictionaries downloaded on November 23, 2021 contain 1,195 data elements in NACC, 13,918 in ADNI, and 27,213 in NIH CDE Repository. Data element preprocessing reduced the numbers of NACC and ADNI data elements for mapping to 1,099 and 7,584 respectively. Manual evaluation of the mapping results showed that the bag-of-words based approach achieved the best precision, while the BioWordVec based approach attained the best recall. In total, the three approaches mapped 175 out of 1,099 (15.92%) NACC data elements to ADNI; 107 out of 1,099 (9.74%) NACC data elements to NIH CDE; and 171 out of 7,584 (2.25%) ADNI data elements to NIH CDE. CONCLUSIONS: The bag-of-words based and word embeddings based approaches showed promise in mapping AD-related data elements between different resources. Although the mapping approaches need further improvement, our result indicates that there is a critical need to standardize CDEs across these valuable AD research resources in order to maximize the discoveries regarding AD pathophysiology, diagnosis, and treatment that can be gleaned from them.

Subject(s)

Alzheimer Disease , United States/epidemiology , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/epidemiology , Common Data Elements , Neuroimaging , National Institutes of Health (U.S.)
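
The bag-of-words mapping described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the cosine similarity measure, the threshold, and the element names (`NACC_A`, `ADNI_X`, and so on) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case a data element description and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_elements(source, target, threshold=0.7):
    """Map each source element to its most similar target element.

    A mapping is kept only if the similarity reaches the (assumed) threshold.
    """
    target_bags = {name: bag_of_words(desc) for name, desc in target.items()}
    mappings = {}
    for name, desc in source.items():
        bag = bag_of_words(desc)
        scored = [(cosine(bag, tb), t) for t, tb in target_bags.items()]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            mappings[name] = best
    return mappings


# Hypothetical toy data dictionaries, not taken from NACC or ADNI.
nacc = {"NACC_A": "Subject's age at visit", "NACC_B": "Years of education"}
adni = {"ADNI_X": "Age of subject at visit", "ADNI_Y": "Body weight"}

mappings = map_elements(nacc, adni)
print(mappings)
```

In practice the candidate mappings produced this way would still need the manual evaluation the abstract mentions; a word-embedding variant would replace `bag_of_words` and `cosine` with vector lookups from a pretrained model.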

2.

An ontology-based approach for harmonization and cross-cohort query of Alzheimer's disease data resources.

Hao, Xubing; Li, Xiaojin; Zhang, Guo-Qiang; Tao, Cui; Schulz, Paul E; Cui, Licong.

BMC Med Inform Decis Mak ; 23(Suppl 1): 151, 2023 Aug 04.

Article in English | MEDLINE | ID: mdl-37542312

ABSTRACT

BACKGROUND: In the United States, the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI) are two major data sharing resources for Alzheimer's Disease (AD) research. NACC and ADNI strive to make their data more FAIR (findable, interoperable, accessible and reusable) for the broader research community. However, there is limited work harmonizing and supporting cross-cohort interoperability of the two resources. METHOD: In this paper, we leverage an ontology-based approach to harmonize data elements in the two resources and develop a web-based query system to search patient cohorts across the two resources. We first mapped data elements across NACC and ADNI, and performed value harmonization for the mapped data elements with inconsistent permissible values. Then we built an Alzheimer's Disease Data Element Ontology (ADEO) to model the mapped data elements in NACC and ADNI. We further developed a prototype cross-cohort query system to search patient cohorts across NACC and ADNI. RESULTS: After manual review, we found 172 mappings between NACC and ADNI. These 172 mappings were further used to construct common concepts in ADEO. Our data element mapping and harmonization resulted in five files storing common concepts, variables in NACC and ADNI, mappings between variables and common concepts, permissible values of categorical type data elements, and coding inconsistency harmonization, respectively. Our cross-cohort query system consists of three core architectural elements: a web-based interface, an advanced query engine, and a backend MongoDB database. CONCLUSIONS: In this work, ADEO has been specifically designed to facilitate data harmonization and cross-cohort query of NACC and ADNI data resources. Although our prototype cross-cohort query system was developed for exploring NACC and ADNI, its backend and frontend framework has been designed and implemented to be generally applicable to other domains for querying patient cohorts from multiple heterogeneous data sources.

Subject(s)

Alzheimer Disease , Humans , United States , Alzheimer Disease/diagnostic imaging , Neuroimaging

3.

A UMLS-based Investigation of Laterality in Biomedical Terminologies.

Abeysinghe, Rashmie; Hao, Xubing; Cui, Licong; Zhang, Guo-Qiang.

AMIA Jt Summits Transl Sci Proc ; 2023: 16-24, 2023.

Article in English | MEDLINE | ID: mdl-37350887

ABSTRACT

Laterality is an important anatomic directional property indicating the sidedness of body structures, diseases, and procedures. Errors in laterality could have catastrophic consequences in patient care. In this paper, we investigate how different biomedical terminologies organize terms indicating laterality. We leverage the Unified Medical Language System (UMLS) to identify lateral terms in different terminologies. For each lateral term, we attempt to obtain other matched lateral terms and further analyze how they are interrelated. Our results indicated that only 1.68% of the matched lateral term-pairs are hierarchically related. It was also seen that 44.24% of matched-pairs were siblings. We found that in SNOMED CT, bilateral concepts were hierarchically related to both left and right lateral concepts, unlike in most other terminologies. Further investigation revealed that the likely cause of these relations is how the logical definitions of SNOMED CT concepts are arranged.

4.

Logical definition-based identification of potential missing concepts in SNOMED CT.

Hao, Xubing; Abeysinghe, Rashmie; Roberts, Kirk; Cui, Licong.

BMC Med Inform Decis Mak ; 23(Suppl 1): 87, 2023 May 09.

Article in English | MEDLINE | ID: mdl-37161566

ABSTRACT

BACKGROUND: Biomedical ontologies are representations of biomedical knowledge that provide terms with precisely defined meanings. They play a vital role in facilitating biomedical research in a cross-disciplinary manner. Quality issues of biomedical ontologies will hinder their effective usage. One such quality issue is missing concepts. In this study, we introduce a logical definition-based approach to identify potential missing concepts in SNOMED CT. A unique contribution of our approach is that it is capable of obtaining both logical definitions and fully specified names for potential missing concepts. METHOD: The logical definitions of unrelated pairs of fully defined concepts in non-lattice subgraphs that indicate quality issues are intersected to generate the logical definitions of potential missing concepts. A text summarization model (called PEGASUS) is fine-tuned to predict the fully specified names of the potential missing concepts from their generated logical definitions. Furthermore, the identified potential missing concepts are validated using external resources including the Unified Medical Language System (UMLS), biomedical literature in PubMed, and a newer version of SNOMED CT. RESULTS: From the March 2021 US Edition of SNOMED CT, we obtained a total of 30,313 unique logical definitions for potential missing concepts through the intersecting process. We fine-tuned a PEGASUS summarization model with 289,169 training instances and tested it on 36,146 instances. The model achieved 72.83 of ROUGE-1, 51.06 of ROUGE-2, and 71.76 of ROUGE-L on the test dataset. The model correctly predicted 11,549 out of 36,146 fully specified names in the test dataset. Applying the fine-tuned model on the 30,313 unique logical definitions, 23,031 total potential missing concepts were identified. Out of these, a total of 2,312 (10.04%) were automatically validated by either of the three resources. 
CONCLUSIONS: The results showed that our logical definition-based approach for identification of potential missing concepts in SNOMED CT is encouraging. Nevertheless, there is still room for improving the performance of naming concepts based on logical definitions.

Subject(s)

Biological Ontologies , Biomedical Research , Humans , Systematized Nomenclature of Medicine , Knowledge , Language
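
The intersection step described in the abstract above can be sketched under a strong simplification: each logical definition is treated as a flat set of attribute-value pairs, which ignores role groups and value hierarchies in real SNOMED CT definitions. The function name and the two toy definitions are hypothetical and are not drawn from any SNOMED CT release.

```python
def intersect_definitions(def_a, def_b):
    """Return the attribute-value pairs shared by two logical definitions.

    Assumption: a definition is a flat set of (attribute, value) pairs.
    An empty intersection yields None, meaning no candidate concept.
    """
    shared = frozenset(def_a) & frozenset(def_b)
    return shared if shared else None


# Hypothetical, simplified definitions of two hierarchically unrelated concepts.
concept_a = {
    ("is_a", "Disorder"),
    ("finding_site", "Femur"),
    ("associated_morphology", "Fracture"),
    ("severity", "Severe"),
}
concept_b = {
    ("is_a", "Disorder"),
    ("finding_site", "Femur"),
    ("associated_morphology", "Fracture"),
    ("clinical_course", "Chronic"),
}

candidate = intersect_definitions(concept_a, concept_b)
print(sorted(candidate))
```

The shared pairs form the logical definition of a candidate missing concept. In the study, naming that candidate is a separate step handled by the fine-tuned summarization model, which this sketch does not attempt.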

5.

A GCN-based approach to uncover misaligned synonymous terms in the UMLS Metathesaurus.

Hao, Xubing; Abeysinghe, Rashmie; Shi, Jay; Cui, Licong.

AMIA Annu Symp Proc ; 2023: 977-986, 2023.

Article in English | MEDLINE | ID: mdl-38222357

ABSTRACT

The Unified Medical Language System (UMLS), a large repository of biomedical vocabularies, has been used for supporting various biomedical applications. Ensuring the quality of the UMLS is critical to maintain both the accuracy of its content and the reliability of downstream applications. In this work, we present a Graph Convolutional Network (GCN)-based approach to identify misaligned synonymous terms organized under different UMLS concepts. We used synonymous terms grouped under the same concept as positive samples and top lexically similar terms as negative samples to train the GCN model. We applied the model to a test set and suggested those negative samples predicted to be synonymous as potentially misaligned synonymous terms. A total of 147,625 suggestions were made. A human expert evaluated 100 randomly selected suggestions and agreed with 60 of them. The results indicate that our GCN-based approach shows promise to help improve the synonymy grouping in the UMLS.

Subject(s)

Unified Medical Language System , Humans , Reproducibility of Results

6.

Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology.

Mohtashamian, Maryamsadat; Hu, Ran; Abeysinghe, Rashmie; Hao, Xubing; Xu, Hua; Cui, Licong.

AMIA Annu Symp Proc ; 2022: 785-794, 2022.

Article in English | MEDLINE | ID: mdl-37128366

ABSTRACT

Auditing the Human Phenotype Ontology (HPO) is necessary to provide accurate terminology for its use in clinical research. We investigate an approach leveraging the lexical features of concepts in HPO to identify missing IS-A relations among HPO concepts. We first model the names of HPO concepts as sets of words in lower case. Then, we generate two types of concept-pairs which have at least a single common word: (1) Linked concept-pairs generated from concept-pairs having an IS-A relation; (2) Unlinked concept-pairs generated from concept-pairs without an IS-A relation. Concept-pairs generate Derived Term Pairs (DTPs) emphasizing unique lexical information of each concept. If a linked concept-pair and an unlinked concept-pair generate the same DTP, then we suggest a potential missing IS-A relation among the unlinked concept-pair. Applying our approach to the 2022-02-14 release of HPO, we uncovered 2,516 potential missing IS-A relations in HPO. We validated 59 missing IS-A relations leveraging the Unified Medical Language System (UMLS) by mapping the concept-pair to UMLS concepts and verifying whether UMLS records an IS-A relation between the pair of concepts.

Subject(s)

Unified Medical Language System , Humans , Phenotype
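
The Derived Term Pair (DTP) idea in the abstract above can be sketched as follows. The abstract does not spell out how a DTP is computed, so this sketch assumes a DTP is the pair of word sets left over after removing the words the two names share. The concept names are toy examples, and the `is_a` set is assumed to list every known (child, parent) pair.

```python
from itertools import permutations


def words(name):
    """Model a concept name as a set of lower-case words."""
    return frozenset(name.lower().split())


def derived_term_pair(child, parent):
    """Return the unique words of each name, or None if no word is shared.

    Assumption: DTP = (child words - common words, parent words - common words).
    """
    a, b = words(child), words(parent)
    common = a & b
    if not common:
        return None
    return (a - common, b - common)


def suggest_missing_is_a(names, is_a):
    """Suggest unlinked pairs whose DTP also arises from some linked pair."""
    linked_dtps = set()
    for child, parent in is_a:
        dtp = derived_term_pair(child, parent)
        if dtp is not None:
            linked_dtps.add(dtp)

    suggestions = []
    for child, parent in permutations(names, 2):
        # Skip pairs that are already linked in either direction.
        if (child, parent) in is_a or (parent, child) in is_a:
            continue
        dtp = derived_term_pair(child, parent)
        if dtp is not None and dtp in linked_dtps:
            suggestions.append((child, parent))
    return suggestions


# Toy concept names and one known IS-A relation (illustrative only).
names = [
    "Severe hearing impairment",
    "Hearing impairment",
    "Severe visual impairment",
    "Visual impairment",
]
is_a = {("Severe hearing impairment", "Hearing impairment")}

suggestions = suggest_missing_is_a(names, is_a)
print(suggestions)
```

Here the linked pair yields the DTP ({severe}, {}), and the unlinked pair "Severe visual impairment" / "Visual impairment" yields the same DTP, so it is suggested as a potential missing IS-A relation.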

7.

A substring replacement approach for identifying missing IS-A relations in SNOMED CT.

Hao, Xubing; Abeysinghe, Rashmie; Shi, Jay; Cui, Licong.

Proceedings (IEEE Int Conf Bioinformatics Biomed) ; 2022: 2611-2618, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36776766

ABSTRACT

Biomedical ontologies provide formalized information and knowledge in the biomedical domain. Over the years, biomedical ontologies have played an important role in facilitating biomedical research and applications. Common quality issues of biomedical ontologies include inconsistent naming of concepts, redundant concepts, redundant relations, incomplete/incorrect concept definitions, and incomplete/incorrect class hierarchies. In this work, we focus on addressing the incompleteness of the class hierarchy in SNOMED CT. We develop a substring replacement approach, leveraging concepts' lexical features and existing IS-A relations to identify potential missing IS-A relations in SNOMED CT. To evaluate the effectiveness of our approach, we performed both automated and manual validation. For the automated evaluation, we leverage relations from external terminologies in the Unified Medical Language System (UMLS) to validate the identified missing IS-A relations. For the manual validation, 100 randomly selected samples from the results were reviewed by a domain expert. Applying our approach to the March 2022 release of SNOMED CT US Edition, we identified 3,228 potential missing IS-A relations, among which 63 were validated through the UMLS. The evaluation by the domain expert revealed that 89 out of 100 (a precision of 89%) missing IS-A relations are valid cases, showing the effectiveness of this substring replacement approach to facilitate the quality assurance of IS-A relations in SNOMED CT.
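
The abstract above does not give the details of the substring replacement step, so the following is only one plausible reading, offered as an assumption: if A IS-A B is known, and replacing A with B inside another concept name produces the name of an existing concept, the resulting pair is a candidate missing IS-A relation. The function name and toy concept names are hypothetical.

```python
def suggest_by_substring_replacement(names, is_a):
    """Suggest (child, parent) pairs obtained by substituting a known parent.

    Assumption: names are plain lower-case strings and `is_a` lists every
    known (child, parent) pair.
    """
    name_set = set(names)
    suggestions = set()
    for child, parent in is_a:
        for name in names:
            if name == child or child not in name:
                continue
            candidate = name.replace(child, parent)
            if (
                candidate in name_set
                and candidate != name
                and (name, candidate) not in is_a
            ):
                suggestions.add((name, candidate))
    return sorted(suggestions)


# Toy hierarchy (illustrative only, not SNOMED CT content).
names = ["femur", "bone", "fracture of femur", "fracture of bone"]
is_a = {("femur", "bone")}

suggestions = suggest_by_substring_replacement(names, is_a)
print(suggestions)
```

Plain substring matching can fire on partial words, so a fuller version would likely match on word boundaries and check the transitive closure of IS-A before reporting a suggestion.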

8.

Identifying Missing IS-A Relations in Orphanet Rare Disease Ontology.

Mohtashamian, Maryamsadat; Abeysinghe, Rashmie; Hao, Xubing; Cui, Licong.

Proceedings (IEEE Int Conf Bioinformatics Biomed) ; 2022: 3274-3279, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36776767

ABSTRACT

The Orphanet Rare Disease Ontology (ORDO) provides a structured vocabulary encapsulating rare diseases. Downstream applications of ORDO depend on its accuracy to effectively perform their tasks. In this paper, we implement an automated quality assurance pipeline to identify missing is-a relations in ORDO. We first obtain lexical features from concept names. Then we generate related and unrelated feature sharing concept-pairs, where a feature sharing concept-pair can further generate derived term-pairs. If an unrelated and related feature sharing concept-pair generate the same derived term-pair, then we suggest a potential missing is-a relation between the unrelated feature sharing concept-pair. Applying this approach on the 2022-06-27 release of ORDO, we obtained 705 potential missing is-a relations. Leveraging external ontological information in the Unified Medical Language System, we validated 164 missing is-a relations. This indicates that our approach is a promising way to audit is-a relations in ORDO, even though further domain expert evaluation is still needed to validate the remaining potential missing is-a relations identified.

9.

Leveraging non-lattice subgraphs for suggestion of new concepts for SNOMED CT.

Hao, Xubing; Abeysinghe, Rashmie; Zheng, Fengbo; Cui, Licong.

Proceedings (IEEE Int Conf Bioinformatics Biomed) ; 2021: 1805-1812, 2021 Dec.

Article in English | MEDLINE | ID: mdl-35291311

ABSTRACT

Missing hierarchical is-a relations and missing concepts are common quality issues in biomedical ontologies. Non-lattice subgraphs have been extensively studied for automatically identifying missing is-a relations in biomedical ontologies like SNOMED CT. However, little is known about non-lattice subgraphs' capability to uncover new or missing concepts in biomedical ontologies. In this work, we investigate a lexical-based intersection approach based on non-lattice subgraphs to identify potential missing concepts in SNOMED CT. We first construct lexical features of concepts using their fully specified names. Then we generate hierarchically unrelated concept pairs in non-lattice subgraphs as the candidates to derive new concepts. For each candidate pair of concepts, we conduct an order-preserving intersection based on the two concepts' lexical features, with the intersection result serving as the potential new concept name suggested. We further perform automatic validation through terminologies in the Unified Medical Language System (UMLS) and literature in PubMed. Applying this approach to the March 2021 release of SNOMED CT US Edition, we obtained 7,702 potential missing concepts, among which 1,288 were validated through UMLS and 1,309 were validated through PubMed. The results showed that non-lattice subgraphs have the potential to facilitate suggestion of new concepts for SNOMED CT.
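
The order-preserving intersection mentioned in the abstract above can be sketched in a few lines. The abstract does not define the operation precisely, so this sketch assumes it keeps the words of the first name that also occur in the second, in the first name's order. The concept names are toy examples.

```python
def order_preserving_intersection(name_a, name_b):
    """Keep the words of name_a that also occur in name_b, in name_a's order.

    Assumption: lexical features are lower-cased, whitespace-separated words.
    """
    words_b = set(name_b.lower().split())
    return " ".join(w for w in name_a.lower().split() if w in words_b)


# Toy pair of hierarchically unrelated concept names (illustrative only).
suggested = order_preserving_intersection(
    "Chronic ulcer of left foot",
    "Chronic ulcer of right foot",
)
print(suggested)
```

The result, "chronic ulcer of foot", would be the candidate name for a new concept; in the study such candidates are then checked against UMLS terminologies and PubMed.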

10.

Detection and Comparative Analysis of Methylomic Biomarkers of Rheumatoid Arthritis.

Feng, Xin; Hao, Xubing; Shi, Ruoyao; Xia, Zhiqiang; Huang, Lan; Yu, Qiong; Zhou, Fengfeng.

Front Genet ; 11: 238, 2020.

Article in English | MEDLINE | ID: mdl-32292416

ABSTRACT

Rheumatoid arthritis (RA) is a common autoimmune disorder influenced by both genetic and environmental factors. To investigate possible contributions of DNA methylation to the etiology of RA with minimum confounding genetic heterogeneity, we investigated genome-wide DNA methylation in disease-discordant monozygotic twin pairs. This study hypothesized that methylomic biomarkers might facilitate accurate RA detection. A comprehensive series of biomarker detection algorithms were utilized to find the best methylomic biomarkers for detecting RA patients using the methylomic data of the peripheral blood samples. The best model achieved 100.00% in accuracy (Acc) with 81 methylomic biomarkers and a 10-fold cross-validation (10FCV) strategy. Some of the methylomic biomarkers were experimentally confirmed to be associated with the onset or development of RA. It is also interesting to observe that many of the detected biomarkers were from chromosome Y, supporting the knowledge that RA has a significant gender discrepancy.

11.

Detecting Methylomic Biomarkers of Pediatric Autism in the Peripheral Blood Leukocytes.

Feng, Xin; Hao, Xubing; Xin, Ruihao; Gao, Xiaoqian; Liu, Minge; Li, Fei; Wang, Yubo; Shi, Ruoyao; Zhao, Shishun; Zhou, Fengfeng.

Interdiscip Sci ; 11(2): 237-246, 2019 Jun.

Article in English | MEDLINE | ID: mdl-30993567

ABSTRACT

Autism is a spectrum of multiple complex diseases that requires an interdisciplinary group of experts to make a diagnostic decision. Both genetic and environmental factors play essential roles in causing the onset of Autism. Therefore, this study hypothesized that methylomic biomarkers may facilitate accurate Autism detection. A comprehensive series of biomarker detection algorithms were utilized to find the best methylomic biomarkers for Autism detection using the methylomic data of the peripheral blood samples. The best model achieved 99.70% in accuracy with 678 methylomic biomarkers and a tenfold cross validation strategy. Some of the methylomic biomarkers were experimentally confirmed to be associated with the onset or development of Autism.

Subject(s)

Autistic Disorder/blood , Autistic Disorder/genetics , Biomarkers/blood , DNA Methylation/genetics , Leukocytes/metabolism , Algorithms , Child , Humans
