1.
Arab J Gastroenterol ; 24(2): 104-108, 2023 May.
Article in English | MEDLINE | ID: mdl-36725375

ABSTRACT

BACKGROUND AND STUDY AIMS: The introduction of direct-acting antiviral (DAA) drugs has dramatically improved chronic hepatitis C (CHC) treatment. The pangenotype DAA therapy glecaprevir/pibrentasvir (G/P) was recently recommended for treating CHC in Korea. Unfortunately, given its recent introduction, little real-world data from a Korean population exists. We examined the effectiveness and safety of G/P treatment in Koreans with CHC. PATIENTS AND METHODS: We analyzed CHC patients at Samsung Changwon Hospital from June 2018 to December 2020. Sustained virologic response at 12 weeks posttreatment (SVR 12) was evaluated after treatment, and the associated factors were analyzed. Furthermore, the degree of liver fibrosis before and after treatment was compared to determine whether liver fibrosis improved. RESULTS: In total, 102 patients were enrolled; 35.3 % had compensated liver cirrhosis (LC), and 11.8 % had received previous treatment. Of the 102 patients, 99 (97.1 %) reached SVR 12. Of the 81 patients who completed 8 weeks of G/P treatment, 80 (98.8 %) reached SVR 12, while 19 of the 21 (90.5 %) patients in the 12- or 16-week group reached SVR 12, with no significant difference between the two groups (P = 0.107). As a secondary endpoint, liver fibrosis before and after treatment was also compared. The Fibrosis-4 index (FIB-4) (3.3 vs 2.8, P = 0.010), aspartate transaminase (AST)-platelet ratio index (APRI) (1.3 vs 1.0, P < 0.001), and liver stiffness measurements (LSM) (9.5 vs 4.6, P < 0.001) were significantly different after G/P treatment. CONCLUSIONS: Regardless of genotype, G/P treatment for Koreans with CHC is safe, highly effective, and can improve liver fibrosis.
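The group comparison reported above can be checked from the published counts alone. The sketch below assumes a two-sided Fisher's exact test. The abstract does not name the test used, so this is one plausible reading, not the authors' stated method. The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Counts from the abstract: (reached SVR 12, did not reach SVR 12).
svr_8wk = (80, 1)       # 8-week group, 81 patients
svr_12_16wk = (19, 2)   # 12- or 16-week group, 21 patients

rate_8wk = round(100 * svr_8wk[0] / sum(svr_8wk), 1)
rate_12_16wk = round(100 * svr_12_16wk[0] / sum(svr_12_16wk), 1)
p_value = fisher_exact_two_sided(*svr_8wk, *svr_12_16wk)
print(rate_8wk, rate_12_16wk, round(p_value, 3))
```

Under that assumption the computed rates and p-value agree with the figures quoted in the abstract (98.8 %, 90.5 %, P = 0.107).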


Subject(s)
Antiviral Agents , Hepatitis C, Chronic , Humans , Antiviral Agents/therapeutic use , East Asian People , Genotype , Hepatitis C, Chronic/complications , Hepatitis C, Chronic/drug therapy , Liver Cirrhosis , Retrospective Studies
2.
Korean J Gastroenterol ; 81(1): 17-28, 2023 01 25.
Article in Korean | MEDLINE | ID: mdl-36695063

ABSTRACT

Acute liver failure (ALF) is a rare disease condition with a dynamic clinical course and catastrophic outcomes. Several etiologies are involved in ALF. Hepatitis A and B infections and indiscriminate use of untested herbs or supplemental agents are the most common causes of ALF in Korea. Noninvasive neurological monitoring tools have recently been used in patients with ALF. Ongoing improvements in intensive care, including continuous renal replacement therapy, therapeutic plasma exchange, vasopressors, and extracorporeal membrane oxygenation, have reduced the mortality rate of patients with ALF. However, liver transplantation is still the most effective treatment for patients with intractable ALF. Further research is needed on better prognostication and precise selection of patients for emergency transplantation.


Subject(s)
Chemical and Drug Induced Liver Injury , Hepatitis A , Liver Failure, Acute , Liver Transplantation , Humans , Liver Failure, Acute/diagnosis , Liver Failure, Acute/etiology , Liver Failure, Acute/therapy , Treatment Outcome , Liver Transplantation/adverse effects , Hepatitis A/complications , Chemical and Drug Induced Liver Injury/complications
3.
Pac Symp Biocomput ; 28: 371-382, 2023.
Article in English | MEDLINE | ID: mdl-36540992

ABSTRACT

Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been extensively performed to identify differentially expressed genes (DEGs). The decisions to investigate DEGs experimentally are biased by many factors, causing many DEGs to remain uninvestigated. A set of DEGs that is associated with a disease experimentally but has no known association with the disease in the literature is known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools facilitating knowledge-based analyses, which are capable of combining disparate data from many sources in order to suggest underlying mechanisms of action, may be a valuable resource to support discovery and improve our understanding of this disease. In this work we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined list in the meta-analysis-derived list was identified as the set of uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
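The ignorome definition in the abstract above amounts to a set difference: DEGs from the meta-analysis minus genes already investigated in the literature. A minimal sketch follows. The gene symbols are hypothetical placeholders and the function name is illustrative; neither comes from the paper.

```python
def ignorome(degs, investigated):
    """Return the DEGs that have no literature-derived association.

    This is the relative complement of `investigated` in `degs`.
    """
    return set(degs) - set(investigated)

# Hypothetical placeholder symbols, used only to illustrate the operation.
meta_analysis_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
text_mined_genes = {"GENE_B", "GENE_D", "GENE_E"}

uninvestigated = ignorome(meta_analysis_degs, text_mined_genes)
print(sorted(uninvestigated))  # ['GENE_A', 'GENE_C']
```

Note that the operation is not symmetric: GENE_E, which appears only in the text-mined list, is not part of the result.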


Subject(s)
Pre-Eclampsia , Pregnancy , Female , Humans , Pre-Eclampsia/genetics , Computational Biology/methods , Placenta , Fetus
4.
J Biomed Inform ; 126: 103973, 2022 02.
Article in English | MEDLINE | ID: mdl-34995810

ABSTRACT

MOTIVATION: Node embedding of biological entity networks has been widely investigated for downstream applications. To embed the full semantics of genes and diseases, we consider a multi-relational heterogeneous graph in which uni-relational links between genes or diseases and other heterogeneous entities are abundant, while multi-relational links between genes and diseases are relatively sparse. Given this novel graph format, a dedicated data integration algorithm is needed to fully capture the graph information and produce high-quality embeddings. RESULTS: First, a typical multi-relational triple dataset was introduced, which carried significant associations between genes and diseases. Second, we curated all human genes and diseases in seven mainstream datasets and constructed a large-scale gene-disease network comprising 163,024 nodes and 25,265,607 edges, covering 27,165 genes, 2,665 diseases, 15,067 chemicals, 108,023 mutations, 2,363 pathways, and 7,732 phenotypes. Third, we proposed a Joint Decomposition of Heterogeneous Matrix and Tensor (JDHMT) model, which integrated all heterogeneous data resources and obtained an embedding for each gene or disease. Fourth, a visualized intrinsic evaluation was performed, which investigated the embeddings in terms of interpretable data clustering. Furthermore, an extrinsic evaluation was performed in the form of link prediction. Both intrinsic and extrinsic evaluation results showed that the JDHMT model outperformed eleven other state-of-the-art (SOTA) methods under the relation-learning, proximity-preserving, or message-passing paradigms. Finally, the constructed gene-disease network, embedding results, and code were made available. DATA AND CODES AVAILABILITY: The constructed massive gene-disease network is available at: https://hzaubionlp.com/heterogeneous-biological-network/. The codes are available at: https://github.com/bionlp-hzau/JDHMT.


Subject(s)
Algorithms , Semantics , Learning , Phenotype
6.
Inquiry ; 58: 469580211035727, 2021.
Article in English | MEDLINE | ID: mdl-34541956

ABSTRACT

This study aimed to investigate factors affecting blood glucose control among middle-aged and older diabetic patients taking medications or receiving insulin therapy. Using 2015-2017 data from the Korean National Health and Nutrition Examination Survey (KNHANES), 1257 patients with diabetes were divided into a controlled group and an uncontrolled group based on blood glucose levels (cutoff ≥126 mg/dL). After adjusting for confounding factors, the BMI, total cholesterol level, and triglyceride level of the uncontrolled group were significantly higher than those of the controlled group. The total amount of moderate-intensity activity in the controlled group was significantly higher than that of the uncontrolled group. Total energy, fat, saturated fatty acid, and cholesterol intakes were significantly higher in the uncontrolled group than in the controlled group. Intakes of calcium, phosphorus, potassium, riboflavin, niacin, and vitamin C were significantly lower in the uncontrolled group than in the controlled group. Adequate nutrition intake and physical activity are required for effective blood glucose management in patients undergoing diabetes therapy, whether with diabetic drugs or insulin.


Subject(s)
Diabetes Mellitus , Glycemic Control , Aged , Cross-Sectional Studies , Eating , Exercise , Humans , Middle Aged , Nutrition Surveys , Republic of Korea
7.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33847357

ABSTRACT

Bridging heterogeneous mutation data fills the gap between various data categories and propels the discovery of disease-related genes. Genome-wide association studies (GWAS) infer significant mutation associations that link genotype and phenotype. However, because GWAS differ in size and quality, not all truly important variations pass multiple testing correction. Meanwhile, mutation events widely reported in the literature reveal typical functional biological processes, including mutation types such as gain of function and loss of function. To bring together the heterogeneous mutation data, we propose a 'Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)' pipeline with a statistical generative model. The model learns the distribution parameters of mutation associations and mutation types and recovers false-negative GWAS mutations that fail significance testing but are supported in the literature as evidence of functional biological processes. We applied GDAMDB to Alzheimer's disease (AD) and predicted 79 AD-associated genes. Of these, 12 come from the original GWAS, 60 are supported as AD-related by other GWAS or literature reports, and the rest are newly predicted genes. Our model is capable of enhancing GWAS-based gene association discovery by effectively incorporating text mining results. The positive result indicates that bridging heterogeneous mutation data contributes to the discovery of novel disease-related genes.


Subject(s)
Alzheimer Disease/genetics , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Mutation , Polymorphism, Single Nucleotide , Algorithms , Computational Biology/methods , Data Mining/methods , Gene Regulatory Networks/genetics , Genotype , Humans , Phenotype , Protein Interaction Maps/genetics , Reproducibility of Results
8.
J Liver Cancer ; 21(2): 146-154, 2021 Sep.
Article in English | MEDLINE | ID: mdl-37383084

ABSTRACT

Background/Aims: Surgical resection, transplantation, and radiofrequency ablation (RFA) are generally accepted treatments for small hepatocellular carcinoma (HCC). Recently, drug-eluting beads (DEB), which have several treatment advantages, were introduced for transarterial chemoembolization (TACE). The aim of this study was to evaluate the feasibility and safety of DEB-TACE compared with RFA for the treatment of single small HCC. Methods: In this pilot non-randomized trial, we assessed retrospective data of 40 patients who underwent DEB-TACE (n=21) or RFA (n=19) for single small (≤3 cm in greatest dimension) HCC. The primary outcomes were tumor response and time to recurrence. The secondary outcome was treatment-related complications. Results: The complete response rate to DEB-TACE and RFA at the first follow-up assessment was 90.5% and 94.7%, respectively (P=1.000). During a mean follow-up of 87.6 months (95% confidence interval, 74.4-102), 7 patients experienced local recurrence. The 6- and 12-month cumulative local recurrence rate was 5.0% and 21.8% in the DEB-TACE group vs. 11.1% and 17.0% in the RFA group (P=0.877). A total of 14 distant intrahepatic recurrences developed, and the 12- and 24-month cumulative distant intrahepatic recurrence rate was 20.6% and 42.7% in the DEB-TACE group vs. 17.2% and 36.3% in the RFA group (P=0.844). Two patients experienced gangrenous cholecystitis after DEB-TACE, requiring cholecystectomy, as a treatment-related adverse event. Conclusions: Tumor response and recurrence rate after a single session of DEB-TACE or RFA were similar. DEB-TACE could be applied selectively in patients with a single small HCC if the other therapeutic modality is unfeasible.

10.
Genomics Inform ; 18(2): e15, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32634869

ABSTRACT

Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will miss important information or identify spurious text that will frustrate users. Most tools do not capture non-contiguous entities, which are separate spans of text that together refer to an entity, e.g., the entity "type 1 diabetes" in the phrase "type 1 and type 2 diabetes." This type is commonly found in biomedical texts, especially in lists, where multiple biomedical entities are named in shortened form to avoid repeating words. Most text annotation systems that enable users to view and edit entity annotations do not support non-contiguous entities. Therefore, experts cannot even visualize non-contiguous entities, let alone annotate them to build valuable datasets for machine learning methods. To combat this problem and as part of the BLAH6 hackathon, we extended the TextAE platform to allow visualization and annotation of non-contiguous entities. This enables users to add new subspans to existing entities by selecting additional text. We integrate this new functionality with TextAE's existing editing functionality to allow easy changes to entity annotation and editing of relation annotations involving non-contiguous entities, with importing and exporting to the PubAnnotation format. Finally, we roughly quantify the problem across the entire accessible biomedical literature to highlight that there are a substantial number of non-contiguous entities that appear in lists that would be missed by most text mining systems.
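The "type 1 and type 2 diabetes" example in the abstract above can be made concrete with a small data structure in which one entity owns several character spans. This is a hypothetical illustration only: the class, field names, and label are assumptions, and the structure is not the PubAnnotation format or TextAE's internal model.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Entity:
    """An entity annotation made of one or more (begin, end) character spans.

    Offsets are 0-based and `end` is exclusive.
    """
    label: str
    spans: Tuple[Tuple[int, int], ...]

    def surface(self, text: str) -> str:
        # Join the covered subspans in order to recover the full mention.
        return " ".join(text[begin:end] for begin, end in self.spans)

    def add_subspan(self, begin: int, end: int) -> "Entity":
        # Return a new entity extended with one more subspan, kept sorted.
        return Entity(self.label, tuple(sorted(self.spans + ((begin, end),))))

text = "type 1 and type 2 diabetes"

# Contiguous entity: a single span covers the whole mention.
type2 = Entity("Disease", ((11, 26),))

# Non-contiguous entity: start from "type 1", then attach "diabetes".
type1 = Entity("Disease", ((0, 6),)).add_subspan(18, 26)

print(type1.surface(text))  # type 1 diabetes
print(type2.surface(text))  # type 2 diabetes
```

A tool that stores only one span per entity could represent `type2` but not `type1`, which is the gap the abstract describes.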

11.
Genomics Inform ; 18(2): e24, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32634878

ABSTRACT

Despite a growing number of natural language processing shared tasks dedicated to the use of Twitter data, there is currently no dedicated annotation tool for the purpose. During the 6th edition of BLAH, after a short review of 19 generic annotation tools, we adapted GATE and TextAE for annotating Twitter timelines. Although none of the tools reviewed allows the annotation of all the information inherent in Twitter timelines, a few may be suitable provided that annotators are willing to compromise on some functionality.

12.
F1000Res ; 9: 136, 2020.
Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


Subject(s)
Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results
14.
Genomics Inform ; 17(2): e19, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31307134

ABSTRACT

In this paper we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.

15.
Bioinformatics ; 35(21): 4372-4380, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30937439

ABSTRACT

MOTIVATION: Most currently available text mining tools share two characteristics that make them less than optimal for use by biomedical researchers: they require extensive specialist skills in natural language processing and they were built on the assumption that they should optimize global performance metrics on representative datasets. This is a problem because most end-users are not natural language processing specialists and because biomedical researchers often care less about global metrics like F-measure or representative datasets than they do about more granular metrics such as precision and recall on their own specialized datasets. Thus, there are fundamental mismatches between the assumptions of much text mining work and the preferences of potential end-users. RESULTS: This article introduces the concept of Agile text mining, and presents the PubAnnotation ecosystem as an example implementation. The system approaches the problems from two perspectives: it allows the reformulation of text mining by biomedical researchers from the task of assembling a complete system to the task of retrieving warehoused annotations, and it makes it possible to do very targeted customization of the pre-existing system to address specific end-user requirements. Two use cases are presented: assisted curation of the GlycoEpitope database, and assessing coverage in the literature of pre-eclampsia-associated genes. AVAILABILITY AND IMPLEMENTATION: The three tools that make up the ecosystem, PubAnnotation, PubDictionaries and TextAE are publicly available as web services, and also as open source projects. The dictionaries and the annotation datasets associated with the use cases are all publicly available through PubDictionaries and PubAnnotation, respectively.


Subject(s)
Computational Biology , Ecosystem , Data Mining , Female , Humans , Natural Language Processing , Pregnancy , PubMed
16.
Math Biosci Eng ; 16(3): 1376-1391, 2019 02 20.
Article in English | MEDLINE | ID: mdl-30947425

ABSTRACT

For the discovery of new uses of drugs, the function type of their target genes plays an important role, and the hypothesis of "Antagonist-GOF" and "Agonist-LOF" has laid a solid foundation for supporting drug repurposing. In this research, an active gene annotation corpus was used as training data to predict the gain-of-function, loss-of-function, or unknown character of each human gene after variation events. Unlike the design of (entity, predicate, entity) triples in a traditional three-way tensor, a four-way and a five-way tensor, the GMFD- and GMAFD-tensors, were designed to represent higher-order links among all or some of these entities: genes (G), mutations (M), functions (F), diseases (D), and annotation labels (A). A tensor decomposition algorithm, CP decomposition, was applied to the higher-order tensors to unveil the correlation among entities. Meanwhile, a state-of-the-art baseline tensor decomposition algorithm, RESCAL, was applied to the three-way tensor for comparison. The results showed that CP decomposition on a higher-order tensor performed better than RESCAL on a traditional three-way tensor in recovering masked data and making predictions. In addition, the four-way tensor proved to be the best format for our problem. Finally, a case study reproducing two disease-gene-drug links (Myelodysplastic Syndromes-IL2RA-Aldesleukin, Lymphoma-IL2RA-Aldesleukin) demonstrated the feasibility of our prediction model for drug repurposing.
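In a CP decomposition, each tensor entry is approximated by a sum over rank-one components of the product of one factor weight per mode. A masked (gene, mutation, function, disease) entry would then be scored by its reconstructed value. The sketch below shows only that reconstruction step for a four-way tensor. Fitting the factors is omitted, the factor values are made-up toy numbers, and all names are illustrative rather than taken from the paper.

```python
from itertools import product

def cp_reconstruct(factors):
    """Rebuild every tensor entry from CP factor matrices.

    factors[m][i][r] is the weight of index i of mode m on component r.
    Returns a dict mapping each index tuple to
    sum_r prod_m factors[m][idx[m]][r].
    """
    rank = len(factors[0][0])
    dims = [len(f) for f in factors]
    tensor = {}
    for idx in product(*(range(d) for d in dims)):
        total = 0.0
        for r in range(rank):
            term = 1.0
            for mode, i in enumerate(idx):
                term *= factors[mode][i][r]
            total += term
        tensor[idx] = total
    return tensor

# Toy rank-2 factors for the four modes G, M, F, D (made-up numbers).
G = [[1.0, 0.0], [0.0, 1.0]]   # 2 genes
M = [[1.0, 1.0], [2.0, 0.0]]   # 2 mutations
F = [[1.0, 3.0]]               # 1 function type
D = [[2.0, 1.0], [0.0, 1.0]]   # 2 diseases

scores = cp_reconstruct([G, M, F, D])
print(scores[(0, 1, 0, 0)])  # 1*2*1*2 + 0*0*3*1 = 4.0
print(scores[(1, 0, 0, 1)])  # 0*1*1*0 + 1*1*3*1 = 3.0
```

The same function handles a five-way tensor by passing a fifth factor matrix, since the loop runs over however many modes are supplied.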


Subject(s)
Drug Repositioning/economics , Drug Repositioning/methods , Genetic Variation , Machine Learning , Mutation , Algorithms , Cost-Benefit Analysis , Genetic Diseases, Inborn/genetics , Humans , Interleukin-2/analogs & derivatives , Interleukin-2/therapeutic use , Interleukin-2 Receptor alpha Subunit/genetics , Lymphoma/genetics , Models, Genetic , Molecular Sequence Annotation , Myelodysplastic Syndromes/genetics , Recombinant Proteins/therapeutic use , Software
17.
Hepatology ; 70(2): 621-629, 2019 08.
Article in English | MEDLINE | ID: mdl-30194739

ABSTRACT

Acute liver failure (ALF) caused by hepatitis A is a rare but fatal disease. Here, we developed a model to predict outcome in patients with ALF caused by hepatitis A. The derivation set consisted of 294 patients diagnosed with hepatitis A-related ALF (ALFA) from Korea, and the validation set of 56 patients from Japan, India, and the United Kingdom. Using a multivariate proportional hazard model, a risk-prediction model (ALFA score) consisting of age, international normalized ratio, bilirubin, ammonia, creatinine, and hemoglobin levels acquired on the day of ALF diagnosis was developed. The ALFA score showed the highest discrimination in the prediction of liver transplant or death at 1 month (c-statistic, 0.87; 95% confidence interval [CI], 0.84-0.92) versus King's College criteria (KCC; c-statistic, 0.56; 95% CI, 0.53-0.59), U.S. Acute Liver Failure Study Group index specific for hepatitis A virus (HAV-ALFSG; c-statistic, 0.70; 95% CI, 0.65-0.76), the new ALFSG index (c-statistic, 0.79; 95% CI, 0.74-0.84), Model for End-Stage Liver Disease (MELD; c-statistic, 0.79; 95% CI, 0.74-0.84), and MELD including sodium (MELD-Na; c-statistic, 0.78; 95% CI, 0.73-0.84) in the derivation set (all P < 0.01). In the validation set, the performance of the ALFA score (c-statistic, 0.84; 95% CI, 0.74-0.94) was significantly better than that of KCC (c-statistic, 0.65; 95% CI, 0.52-0.79), MELD (c-statistic, 0.74; 95% CI, 0.61-0.87), and MELD-Na (c-statistic, 0.72; 95% CI, 0.58-0.85) (all P < 0.05), and better, though not significantly so, than that of the HAV-ALFSG (c-statistic, 0.76; 95% CI, 0.61-0.90; P = 0.28) and new ALFSG indices (c-statistic, 0.79; 95% CI, 0.65-0.93; P = 0.41). The model was well-calibrated in both sets. Conclusion: Our disease-specific score provides refined prediction of outcome in patients with ALF caused by hepatitis A.
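The c-statistic used above to compare scores measures discrimination: the probability that a randomly chosen patient who had the event received a higher risk score than a randomly chosen patient who did not. The sketch below assumes a binary outcome at a fixed time point with no censoring. The study itself used a proportional hazards model and may have computed a censoring-aware variant. The scores and outcomes are made-up toy values, not study data.

```python
def c_statistic(scores, events):
    """Concordance for a binary outcome.

    Counts the fraction of (event, non-event) pairs in which the event
    case has the higher score. Tied scores count as half concordant.
    """
    with_event = [s for s, e in zip(scores, events) if e]
    without_event = [s for s, e in zip(scores, events) if not e]
    if not with_event or not without_event:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for s_event in with_event:
        for s_none in without_event:
            if s_event > s_none:
                concordant += 1.0
            elif s_event == s_none:
                concordant += 0.5
    return concordant / (len(with_event) * len(without_event))

# Toy risk scores and outcomes (1 = transplant or death, 0 = neither).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_events = [1, 0, 1, 0, 0]

c = c_statistic(toy_scores, toy_events)
print(round(c, 3))  # 5 of 6 pairs concordant -> 0.833
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values (for example 0.87 versus 0.56) are read.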


Subject(s)
Hepatitis A/complications , Liver Failure, Acute/etiology , Liver Failure, Acute/surgery , Liver Transplantation/statistics & numerical data , Models, Statistical , Adult , Female , Humans , Liver Failure, Acute/mortality , Male , Middle Aged , Prognosis , Risk Assessment , Time Factors
18.
Am J Hum Genet ; 103(3): 389-399, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30173820

ABSTRACT

Recently, to speed up the differential-diagnosis process based on symptoms and signs observed from an affected individual in the diagnosis of rare diseases, researchers have developed and implemented phenotype-driven differential-diagnosis systems. The performance of those systems relies on the quantity and quality of underlying databases of disease-phenotype associations (DPAs). Although such databases are often developed by manual curation, they inherently suffer from limited coverage. To address this problem, we propose a text-mining approach to increase the coverage of DPA databases and consequently improve the performance of differential-diagnosis systems. Our analysis showed that a text-mining approach using one million case reports obtained from PubMed could increase the coverage of manually curated DPAs in Orphanet by 125.6%. We also present PubCaseFinder (see Web Resources), a new phenotype-driven differential-diagnosis system in a freely available web application. By utilizing automatically extracted DPAs from case reports in addition to manually curated DPAs, PubCaseFinder improves the performance of automated differential diagnosis. Moreover, PubCaseFinder helps clinicians search for relevant case reports by using phenotype-based comparisons and confirm the results with detailed contextual information.


Subject(s)
Rare Diseases/diagnosis , Rare Diseases/genetics , Data Mining/methods , Databases, Genetic , Diagnosis, Differential , Humans , Phenotype
19.
J Gastroenterol Hepatol ; 33(4): 910-917, 2018 Apr.
Article in English | MEDLINE | ID: mdl-28910501

ABSTRACT

BACKGROUND AND AIM: Although serum cystatin C level is considered a more accurate marker of renal function in patients with liver cirrhosis, its prognostic efficacy remains uncertain. This study aimed to evaluate the prognostic efficacy of serum cystatin C level in patients with cirrhotic ascites. METHODS: Patients with cirrhotic ascites from 15 hospitals were prospectively enrolled between September 2009 and March 2013. Cox regression analyses were performed to identify independent predictive factors of mortality and development of type 1 hepatorenal syndrome (HRS-1). RESULTS: In total, 350 patients were enrolled in this study. The mean age was 55.4 ± 10.8 years, and 267 patients (76.3%) were men. The leading cause of liver cirrhosis was alcoholic liver disease (64.3%), followed by chronic viral hepatitis (29.7%). Serum creatinine and cystatin C levels were 0.9 ± 0.4 mg/dL and 1.1 ± 0.5 mg/L, respectively. Multivariate analyses revealed that international normalized ratio and serum bilirubin, sodium, and cystatin C levels were independent predictors of mortality and international normalized ratio and serum sodium and cystatin C levels were independent predictors of the development of HRS-1. Serum creatinine level was not significantly associated with mortality and development of HRS-1 on multivariate analysis. CONCLUSION: Serum cystatin C level was an independent predictor of mortality and development of HRS-1 in patients with cirrhotic ascites, while serum creatinine level was not. Predictive models based on serum cystatin C level instead of serum creatinine level would be more helpful in the assessment of the condition and prognosis of patients with cirrhotic ascites.


Subject(s)
Ascites/diagnosis , Cystatin C/blood , Liver Cirrhosis/diagnosis , Aged , Ascites/etiology , Biomarkers/blood , Female , Hepatitis, Viral, Human/complications , Hepatorenal Syndrome/etiology , Humans , Liver Cirrhosis/etiology , Liver Diseases, Alcoholic/complications , Male , Middle Aged , Predictive Value of Tests , Prognosis , Proportional Hazards Models , Prospective Studies
20.
PeerJ ; 5: e2990, 2017.
Article in English | MEDLINE | ID: mdl-28265499

ABSTRACT

BACKGROUND: In the era of the semantic web, life science ontologies play an important role in tasks such as annotating biological objects, linking relevant data pieces, and verifying data consistency. Understanding ontology structures and overlapping ontologies is essential for tasks such as ontology reuse and development. We present an exploratory study in which we examine structure and look for patterns in BioPortal, a comprehensive publicly available repository of life science ontologies. METHODS: We report an analysis of biomedical ontology mapping data over time. We apply graph theory methods such as Modularity Analysis and Betweenness Centrality to analyse data gathered at five different time points. We identify communities, i.e., sets of overlapping ontologies, and define similar and closest communities. We demonstrate the evolution of identified communities over time and identify core ontologies of the closest communities. We use BioPortal project and category data to measure community coherence. We also validate identified communities with their mutual mentions in scientific literature. RESULTS: By comparing mapping data gathered at five different time points, we identified similar and closest communities of overlapping ontologies, and demonstrated the evolution of communities over time. Results showed that anatomy and health ontologies tend to form more isolated communities compared to other categories. We also showed that communities contain all or the majority of ontologies being used in narrower projects. In addition, we identified major changes in mapping data after migration to BioPortal Version 4.
