Results 1 - 20 of 48
1.
iScience ; 26(11): 108214, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37953943

ABSTRACT

Repetitive sequences represent about 45% of the human genome. Some are transposable elements (TEs), with the ability to change their position in the genome, creating genetic variability as both insertions and deletions, with potential pathogenic consequences. We used long-read nanopore sequencing to identify TE variants in the genomes of 24 patients with antithrombin deficiency. We identified 7 344 TE insertions and 3 056 TE deletions; 2 926 of these were not previously described in publicly available databases. The insertions affected 3 955 genes, with 6 insertions located in exons, 3 929 in introns, and 147 in promoters. The potential functional impact was evaluated with gene annotation and enrichment analysis, which suggested a strong relationship with neuron-related functions and autism. We conclude that this study encourages the generation of a complete map of TEs in the human genome, which would be useful for identifying new TEs involved in genetic disorders.
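
As a minimal illustrative sketch (not the authors' nanopore pipeline), the exon/intron/promoter breakdown reported above could be tallied from annotated insertion calls; all records below are invented:

```python
# Illustrative only: tally TE insertion calls by overlapping genomic
# feature and by TE family, mirroring the kind of breakdown reported.
from collections import Counter

# Hypothetical insertion calls: (TE family, overlapping genomic feature)
insertions = [
    ("Alu", "intron"), ("L1", "intron"), ("Alu", "promoter"),
    ("SVA", "exon"), ("Alu", "intron"), ("L1", "promoter"),
]

by_feature = Counter(feature for _, feature in insertions)
by_family = Counter(family for family, _ in insertions)

print(by_feature)  # counts per feature, e.g. introns dominate here
print(by_family)
```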

2.
PLoS One ; 18(8): e0290372, 2023.
Article in English | MEDLINE | ID: mdl-37616197

ABSTRACT

The World Health Organization has estimated that air pollution will be one of the most significant environmental challenges in the coming years, and air quality monitoring and climate change mitigation actions have been promoted under the Paris Agreement because of their impact on mortality risk. It is therefore essential to develop a methodology that supports experts in making decisions based on exposure data, identifying exposure-related activities, and proposing mitigation scenarios. In this context, Interactive Process Mining, a discipline that has progressed in recent years in healthcare, could help to develop a methodology based on human knowledge. For this reason, we propose a new methodology for sequence-oriented sensitivity analysis to identify the best activities and parameters for a mitigation policy. This methodology is innovative in the following respects: i) we present the first application of Interactive Process Mining to personal pollution exposure mitigation; ii) our solution reduces the computational cost and time of traditional sensitivity analysis; iii) the methodology is human-oriented, in the sense that the process should be carried out with the environmental expert; and iv) our solution has been tested with synthetic data to explore its viability before moving to physical exposure measurements, taking the city of Valencia as the use case and overcoming the difficulty of performing exposure measurements. This dataset was generated with a model that considers the demographic and epidemiological statistics of the city of Valencia. We have demonstrated that assessments based on sequence-oriented sensitivity analysis can identify target activities. The proposed scenarios can improve the initial KPIs: in the best scenario, we reduce the population exposure by 18% and the relative risk by 12%.
Consequently, our proposal could be applied to real data in future work, contributing to air pollution mitigation and environmental improvement.
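
The reported KPI improvements can be expressed as simple percent reductions; the sketch below uses invented baseline and scenario values, not the study's data:

```python
# Toy illustration (not the authors' model): percent reduction of a KPI
# between a baseline and a mitigation scenario.
def pct_reduction(baseline, scenario):
    """Percent reduction of a KPI relative to its baseline value."""
    return 100.0 * (baseline - scenario) / baseline

exposure_baseline, exposure_best = 50.0, 41.0   # arbitrary exposure units
risk_baseline, risk_best = 1.25, 1.10           # relative risk (invented)

print(round(pct_reduction(exposure_baseline, exposure_best)))  # 18
print(round(pct_reduction(risk_baseline, risk_best)))          # 12
```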


Subject(s)
Air Pollution , Humans , Risk Assessment , Climate Change , Decision Making , Particulate Matter
3.
Front Plant Sci ; 14: 1120183, 2023.
Article in English | MEDLINE | ID: mdl-36778675

ABSTRACT

Short-term experiments have identified heat shock and cold response elements in many biological systems. However, the effect of long-term low or high temperatures is not well documented. To address this gap, we grew Antirrhinum majus plants from two weeks old until maturity under control (22/16°C), cold (15/5°C), and hot (30/23°C) conditions for a period of two years. Flower size, petal anthocyanin content and pollen viability reached higher values under cold conditions and decreased at intermediate and high temperatures. Leaf chlorophyll content was higher under cold conditions and stable under control and hot temperatures, while pedicel length increased under hot conditions. The control conditions were optimal for scent emission and seed production. Scent complexity was low at cold temperatures. Transcriptomic analysis of mature flowers, followed by gene enrichment analysis and CNET plot visualization, revealed two groups of genes. One group comprised genes controlling the affected traits, and a second group represented long-term adaptation to non-optimal temperatures. The latter included genes related to hypoxia, unsaturated fatty acid metabolism, ribosomal proteins, carboxylic acid, sugar and organic ion transport, and protein folding. We found differential expression of floral organ identity functions, supporting the flower size data. Pollinator-related traits such as scent and color followed opposite trends, indicating an equilibrium in keeping the organs for pollination attractive under changing climate conditions. Prolonged heat or cold causes structural adaptations in protein synthesis and folding, membrane composition, and transport. Thus, adaptations to cope with non-optimal temperatures occur in basic cellular processes.

4.
J Biomed Semantics ; 13(1): 19, 2022 07 15.
Article in English | MEDLINE | ID: mdl-35841031

ABSTRACT

BACKGROUND: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data, thus creating the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision. RESULTS: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). Then, we analyzed the top-level ancestors of matched classes to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, and 0.55 for BioPortal and 0.66, 0.53, and 0.58 for the UMLS, respectively. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that, on average, 90% of the mappings' classes belonged to top-level classes that matched. CONCLUSIONS: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. Hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data.
Future research should focus on developing methods for the evaluation of mappings used in such mapping services, leading to their implementation in a FAIR data ecosystem.
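
The vote-based consensus idea can be sketched in a few lines; the mappings and reference alignment below are hypothetical, not the study's data:

```python
# Minimal sketch: build a vote-based consensus alignment from several
# matching systems, then score it against a reference alignment with
# precision, recall and F1. All mappings here are invented.
from collections import Counter

system_outputs = [
    {("nci:C1", "ordo:O1"), ("nci:C2", "ordo:O2"), ("nci:C3", "ordo:O9")},
    {("nci:C1", "ordo:O1"), ("nci:C2", "ordo:O2")},
    {("nci:C1", "ordo:O1"), ("nci:C3", "ordo:O9"), ("nci:C4", "ordo:O4")},
]
reference = {("nci:C1", "ordo:O1"), ("nci:C2", "ordo:O2"), ("nci:C4", "ordo:O4")}

votes = Counter(m for output in system_outputs for m in output)
consensus = {m for m, n in votes.items() if n >= 2}  # majority of 3 systems

tp = len(consensus & reference)       # true positives
precision = tp / len(consensus)
recall = tp / len(reference)
f1 = 2 * precision * recall / (precision + recall)
print(sorted(consensus), round(f1, 2))
```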


Subject(s)
Biological Ontologies , Ecosystem , Consensus , Information Storage and Retrieval , Systematized Nomenclature of Medicine , Unified Medical Language System
5.
Comput Struct Biotechnol J ; 20: 2728-2744, 2022.
Article in English | MEDLINE | ID: mdl-35685360

ABSTRACT

The process of gene regulation extends as a network in which both genetic sequences and proteins are involved. There are multiple levels of regulation and multiple mechanisms involved. Transcription is the main control mechanism for most genes, with the downstream steps responsible for refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies have analyzed the contribution of enhancers to the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next-generation sequencing techniques. This has generated a large volume of information that has been transferred to a growing number of public repositories. In this article, we analyze the content of the public repositories that contain information about human enhancers, with the aim of detecting whether the knowledge generated by scientific research is represented in those databases in a way that can be computationally exploited. The analysis is based on three main aspects identified in the literature: types of enhancers, types of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, that most types of enhancers are not represented in the databases, and that there is a need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.

6.
Biochim Biophys Acta Gene Regul Mech ; 1864(11-12): 194766, 2021.
Article in English | MEDLINE | ID: mdl-34710644

ABSTRACT

Computational gene regulation research requires handling and integrating large amounts of heterogeneous data. The Gene Ontology has demonstrated that ontologies play a fundamental role in biological data interoperability and integration. Ontologies help to express data and knowledge in a machine-processable way, which enables complex querying and advanced exploitation of distributed data. Contributing to improved data interoperability in gene regulation is a major objective of the GREEKC Consortium, which aims to develop a standardized gene regulation knowledge commons. GREEKC proposes the use of ontologies and semantic tools for developing interoperable gene regulation knowledge models, which should support data annotation. In this work, we study how such knowledge models can be generated from cartoons of gene regulation scenarios. The proposed method consists of generating natural language descriptions of the cartoons; extracting the entities from the texts; finding those entities in existing ontologies to reuse as much content as possible, especially from well-known and maintained ontologies such as the Gene Ontology, the Sequence Ontology, the Relations Ontology and ChEBI; and implementing the knowledge models. The models have been implemented using Protégé, a general ontology editor, and Noctua, the tool developed by the Gene Ontology Consortium for building causal activity models, which capture more comprehensive annotations of genes and link their activities in a causal framework for Gene Ontology annotations. We applied the method to two gene regulation scenarios and illustrate how the generated models can be used to support the annotation of data from research articles.


Subject(s)
Gene Expression Regulation , Models, Genetic , Data Curation , Gene Ontology , Molecular Sequence Annotation
7.
BMC Med Inform Decis Mak ; 20(Suppl 10): 284, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319711

ABSTRACT

BACKGROUND: The increasing adoption of ontologies in biomedical research and the growing number of available ontologies have made it necessary to assure the quality of these resources. Most of the well-established ontologies, such as the Gene Ontology or SNOMED CT, have their own quality assurance processes. These have demonstrated their usefulness for the maintenance of the resources but are unable to detect all of the modelling flaws in the ontologies. Consequently, the development of efficient and effective quality assurance methods is needed. METHODS: Here, we propose a series of quantitative metrics, based on the processing of the lexical regularities existing in the content of an ontology, to analyse readability and structural accuracy. The readability metrics account for the ratio of labels, descriptions, and synonyms associated with the ontology entities. The structural accuracy metrics evaluate how well two ontology modelling best practices are followed: (1) lexically suggest, logically define (LSLD), that is, whether what is expressed in natural language for humans is also available as logical axioms for machines; and (2) systematic naming, which accounts for the amount of label content shared by the classes in a given taxonomy. RESULTS: We applied the metrics to different versions of SNOMED CT. Both readability and structural accuracy metrics remained stable over time but captured some changes in the modelling decisions in SNOMED CT. The value of the LSLD metric increased from 0.27 to 0.31, and the value of the systematic naming metric was around 0.17. We analysed readability and structural accuracy in the SNOMED CT July 2019 release. The results showed that the fulfilment of the structural accuracy criteria varied among the SNOMED CT hierarchies. The values of the metrics for the hierarchies were in the ranges 0-0.92 (LSLD) and 0.08-1 (systematic naming). We also identified the cases that did not meet the best practices.
CONCLUSIONS: We generated useful information about the engineering of the ontology, making the following contributions: (1) a set of readability metrics, (2) the use of lexical regularities to define structural accuracy metrics, and (3) the generation of quality assurance information for SNOMED CT.
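
A much-simplified reading of the systematic naming idea (our own toy formulation, not the paper's exact metric definition) can be sketched as: in a well-named taxonomy, a child's label should contain its parent's label content.

```python
# Toy hierarchy: child label -> parent label. Labels are invented.
taxonomy = {
    "viral pneumonia": "pneumonia",
    "bacterial pneumonia": "pneumonia",
    "aspiration pneumonia": "pneumonia",
    "lung disease": "disease",
    "hepatitis": "disease",  # label does not repeat the parent's content
}

def systematic_naming(tax):
    """Fraction of child classes whose label contains the parent's label."""
    hits = sum(parent in child for child, parent in tax.items())
    return hits / len(tax)

print(systematic_naming(taxonomy))  # 0.8
```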


Subject(s)
Biological Ontologies , Systematized Nomenclature of Medicine , Comprehension , Gene Ontology , Humans , Language , Natural Language Processing
8.
Comput Methods Programs Biomed ; 197: 105616, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32629294

ABSTRACT

BACKGROUND AND OBJECTIVE: Effective sharing and reuse of Electronic Health Records (EHR) requires technological solutions that deal with different representations and different models of data. This includes information models, domain models and, ideally, inference models, which enable clinical decision support based on a knowledge base and facts. Our goal is to develop a framework to support EHR interoperability based on transformation and reasoning services intended for clinical data and knowledge. METHODS: Our framework is based on workflows whose primary components are reusable mappings. Key features are the integrated representation, storage, and exploitation of different types of mappings for clinical data transformation purposes, as well as support for the discovery of new workflows. The current framework supports mappings that take advantage of the best features of EHR standards and ontologies. Our proposal is based on our previous results and experience working with both technological infrastructures. RESULTS: We have implemented CLIN-IK-LINKS, a web-based platform that enables users to create, modify and delete mappings, as well as to define and execute workflows. The platform has been applied in two use cases: semantic publishing of clinical laboratory test results, and the implementation of two colorectal cancer screening protocols. Real data have been used in both use cases. CONCLUSIONS: The CLIN-IK-LINKS platform allows the composition and execution of clinical data transformation workflows to convert EHR data into EHR and/or Semantic Web standards. Having proved useful for implementing clinical data transformation applications, CLIN-IK-LINKS can be regarded as a valuable contribution to improving the semantic interoperability of EHR systems.


Subject(s)
Decision Support Systems, Clinical , Electronic Health Records , Workflow , Computer Systems , Knowledge Bases
9.
Brief Bioinform ; 21(2): 473-485, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30715146

ABSTRACT

The development and application of biological ontologies have increased significantly in recent years. These ontologies can be retrieved from different repositories, which do not provide much information about their quality. In recent years, some ontology structural metrics have been proposed, but their validity as measurement instruments has not been sufficiently studied to date. In this work, we evaluate a set of reproducible and objective ontology structural metrics. Given the lack of standard methods for this purpose, we have applied an evaluation method based on the stability and goodness of the classifications of ontologies produced by each metric on an ontology corpus. The evaluation has been done using ontology repositories as corpora. More concretely, we have used 119 ontologies from the OBO Foundry repository and 78 ontologies from AgroPortal. First, we study the correlations between the metrics. Second, we study whether the clusters for a given metric are stable and well structured. The results show that the existing correlations do not bias the evaluation, that no metric generates unstable clusterings, and that all the metrics evaluated provide at least a reasonable clustering structure. Furthermore, our work makes it possible to identify the most reliable ontology structural metrics in terms of the stability and goodness of their classifications. Availability: http://sele.inf.um.es/ontology-metrics.
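
The goodness of a metric's clustering can be illustrated with a mean silhouette score; the sketch below uses invented metric values and a hand-made one-dimensional two-cluster split, not the study's corpus or clustering algorithm:

```python
# Toy sketch: score how well-separated a clustering of ontologies is,
# using the mean silhouette coefficient for 1-D points.
def silhouette(values, labels):
    """Mean silhouette coefficient; assumes every cluster has >= 2 points."""
    scores = []
    for i, (v, l) in enumerate(zip(values, labels)):
        same = [abs(v - w) for j, (w, m) in enumerate(zip(values, labels))
                if m == l and j != i]
        other = [abs(v - w) for w, m in zip(values, labels) if m != l]
        a = sum(same) / len(same)      # mean intra-cluster distance
        b = sum(other) / len(other)    # mean distance to the other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical values of one structural metric for six ontologies,
# split into two clearly separated clusters.
metric = [0.10, 0.12, 0.15, 0.80, 0.85, 0.90]
labels = [0, 0, 0, 1, 1, 1]
print(round(silhouette(metric, labels), 2))  # close to 1: good structure
```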


Subject(s)
Biological Ontologies , Database Management Systems , Public Sector
10.
Sci Data ; 6(1): 255, 2019 10 31.
Article in English | MEDLINE | ID: mdl-31672979

ABSTRACT

Colorectal cancer (CRC) is the third leading cause of cancer mortality worldwide. Different pathological pathways and molecular drivers have been described, and some of the associated markers are used to select effective anti-neoplastic therapy. More recent evidence points to a causal role of the microbiota and altered microRNA expression in CRC carcinogenesis, but their relationship with pathological drivers or molecular phenotypes is not clearly established. Joint analysis of clinical and omics data can help clarify such relations. We present ColPortal, a platform that integrates transcriptomic, microtranscriptomic, methylomic and microbiota data of patients with colorectal cancer. ColPortal also includes detailed information on histological features and digital histological slides from the study cases, since histology is a morphological manifestation of a complex molecular change. The current cohort consists of Caucasian patients from Europe. For each patient, demographic information, location, histology, tumor staging, tissue prognostic factors, molecular biomarker status and clinical outcomes are integrated with the omics data. ColPortal allows one to perform multiomics analyses for groups of patients selected by their clinical data.


Subject(s)
Colorectal Neoplasms/genetics , Epigenesis, Genetic , Europe , Gene Expression Regulation, Neoplastic , Humans , Microbiota , Transcriptome
11.
Bioinformatics ; 34(22): 3788-3794, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29868922

ABSTRACT

Motivation: Translation is a key biological process, controlled in eukaryotes by the initiation AUG codon. Variations affecting this codon may have pathological consequences by disturbing the correct initiation of translation. Unfortunately, there is no systematic study describing these variations in the human genome. Moreover, we aimed to develop new tools for in silico prediction of the pathogenicity of gene variations affecting AUG codons, because to date these gene defects have been wrongly classified as missense. Results: Whole-exome analysis revealed a mean of 12 gene variations per person affecting initiation codons, mostly with high (>0.01) minor allele frequency (MAF). Moreover, analysis of Ensembl data (December 2017) revealed 11 261 genetic variations affecting the initiation AUG codon of 7 205 genes. Most of these variations (99.5%) have low or unknown MAF, probably reflecting deleterious consequences. Only 62 variations had high MAF. Genetic variations with high MAF had closer alternative downstream AUG codons than those with low MAF. In addition, the high-MAF group better maintained both the signal peptide and the reading frame. These differentiating elements could help to determine the pathogenicity of this kind of variation. Availability and implementation: Data and scripts in Perl and R are freely available at https://github.com/fanavarro/hemodonacion. Supplementary information: Supplementary data are available at Bioinformatics online.
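
One of the differentiating features mentioned above, the distance to an alternative downstream in-frame AUG, can be sketched as follows; the sequence is invented and this is not the authors' Perl/R code:

```python
# Illustrative sketch: after a variant destroys the annotated initiation
# AUG, find the nearest downstream in-frame AUG (ATG in DNA notation).
def next_inframe_aug(cds):
    """Distance (in codons) to the first downstream in-frame ATG, or None."""
    for codon_index in range(1, len(cds) // 3):
        if cds[3 * codon_index:3 * codon_index + 3] == "ATG":
            return codon_index
    return None

# Toy CDS whose original ATG start (codon 0) has been lost; an
# alternative ATG sits at codon 3, preserving the reading frame.
cds = "GTGAAACCCATGGGTTAA"
print(next_inframe_aug(cds))  # 3
```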


Subject(s)
Codon, Initiator , Computational Biology , Genome, Human , Codon , Humans , Protein Biosynthesis
12.
J Biomed Inform ; 84: 59-74, 2018 08.
Article in English | MEDLINE | ID: mdl-29908358

ABSTRACT

Ontologies and terminologies have been identified as key resources for the achievement of semantic interoperability in biomedical domains. The development of ontologies is performed as joint work by domain experts and knowledge engineers. The maintenance and auditing of these resources is also the responsibility of such experts, and is usually a time-consuming, mostly manual task. Manual auditing is impractical and ineffective for most biomedical ontologies, especially for larger ones. An example is SNOMED CT, a key resource in many countries for codifying medical information. SNOMED CT contains more than 300,000 concepts. Consequently, its auditing requires the support of automatic methods. Many biomedical ontologies contain natural language content for humans and logical axioms for machines. The 'lexically suggest, logically define' principle means that there should be a relation between what is expressed in natural language and what is expressed as logical axioms, and that such a relation should be useful for auditing and quality assurance. This principle also implies that the natural language content for humans could be used to generate the logical axioms for machines. In this work, we propose a method that combines lexical analysis and clustering techniques to (1) identify regularities in the natural language content of ontologies; (2) cluster, by similarity, labels exhibiting a regularity; (3) extract relevant information from those clusters; and (4) propose logical axioms for each cluster with the support of axiom templates. These logical axioms can then be evaluated against the existing axioms in the ontology to check their correctness and completeness, which are two fundamental objectives in auditing and quality assurance.
In this paper, we describe the application of the method to two SNOMED CT modules: a 'congenital' module, obtained using concepts exhibiting the attribute Occurrence - Congenital, and a 'chronic' module, using concepts exhibiting the attribute Clinical course - Chronic. We obtained a precision of 75% and a recall of 28% for the 'congenital' module, and 64% and 40%, respectively, for the 'chronic' one. We consider these results promising, so our method can contribute to supporting content editors by using automatic methods for assuring the quality of biomedical ontologies and terminologies.
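
The 'cluster labels by regularity, then propose template axioms' pipeline can be caricatured in a few lines; the labels and attribute expressions below are invented for illustration, not actual SNOMED CT content:

```python
# Simplified sketch of 'lexically suggest, logically define': group class
# labels sharing a lexical regularity, then emit a template axiom per
# cluster member for later evaluation against the ontology's axioms.
from collections import defaultdict

labels = [
    "congenital stenosis of aorta",
    "congenital stenosis of trachea",
    "congenital anomaly of heart",
    "chronic bronchitis",
]

clusters = defaultdict(list)
for label in labels:
    first_word = label.split()[0]
    if first_word in ("congenital", "chronic"):  # regularities of interest
        clusters[first_word].append(label)

# One hypothetical template axiom per regularity.
templates = {"congenital": "Occurrence some Congenital",
             "chronic": "ClinicalCourse some Chronic"}
for regularity, members in sorted(clusters.items()):
    for m in members:
        print(f"'{m}' SubClassOf {templates[regularity]}")
```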


Subject(s)
Biological Ontologies , Computational Biology/methods , Systematized Nomenclature of Medicine , Algorithms , Cluster Analysis , Language , Medical Informatics , Natural Language Processing , Pattern Recognition, Automated , Programming Languages , Quality Control , Reproducibility of Results , Software , Terminology as Topic
13.
Bioinformatics ; 34(2): 323-329, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968857

ABSTRACT

The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. Here we report highlights and discussion points from the QfO meeting 2015, held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as the design of standardized datasets and formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing the sources and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, the special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, the particular needs of different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts expected to be reported on during the upcoming 2017 QfO meeting.

14.
AMIA Annu Symp Proc ; 2018: 922-931, 2018.
Article in English | MEDLINE | ID: mdl-30815135

ABSTRACT

Clinical Practice Guidelines (CPGs) contain recommendations intended to optimize patient care, produced on the basis of a systematic review of evidence. In turn, Computer-Interpretable Guidelines (CIGs) are formalized versions of CPGs for use in decision-support systems. We consider enriching a CIG by means of an OWL ontology that describes the clinical domain of the CIG, which could be exploited, e.g., for interoperability with the Electronic Health Record (EHR). As a first step, in this paper we describe a method to support the development of such an ontology starting from a CIG. The method uses an alignment algorithm for the automated identification of ontological terms relevant to the clinical domain of the CIG, as well as a web platform to manually review the alignments and select the appropriate ones. Finally, we present the results of applying the method to a small corpus of CIGs.
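
The abstract does not detail the alignment algorithm, so the following is only a hedged sketch of a string-similarity matcher that produces candidate alignments for manual review; the terms and labels are invented:

```python
# Illustrative sketch (not the authors' algorithm): match guideline terms
# against ontology labels by string similarity, keeping the best
# candidate above a threshold for an expert to review.
from difflib import SequenceMatcher

cig_terms = ["blood glucose", "systolic blood pressure"]
ontology_labels = ["Blood glucose measurement", "Systolic blood pressure",
                   "Heart rate"]

def best_match(term, labels, threshold=0.6):
    """Return the most similar label, or None if nothing clears the bar."""
    scored = [(SequenceMatcher(None, term.lower(), l.lower()).ratio(), l)
              for l in labels]
    score, label = max(scored)
    return label if score >= threshold else None

for term in cig_terms:
    print(term, "->", best_match(term, ontology_labels))
```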


Subject(s)
Decision Support Systems, Clinical , Electronic Health Records , Practice Guidelines as Topic , Vocabulary, Controlled , Algorithms , Health Information Interoperability , Humans , Semantics
15.
J Biomed Semantics ; 8(1): 46, 2017 Sep 29.
Article in English | MEDLINE | ID: mdl-28962670

ABSTRACT

BACKGROUND: Regional and epidemiological cancer registries are important for cancer research and the quality management of cancer treatment. Many technological solutions are now available to collect and analyse data for cancer registries. However, the lack of a well-defined common semantic model is a problem when user-defined analyses and data linking to external resources are required. The objectives of this study are: (1) the design of a semantic model for local cancer registries; (2) the development of a semantically enabled cancer registry based on this model; and (3) the semantic exploitation of the cancer registry for analysing and visualising disease courses. RESULTS: Our proposal is based on our previous results and experience working with semantic technologies. Data stored in a cancer registry database were transformed into RDF using a process driven by OWL ontologies. The semantic representation of the data was then processed to extract semantic patient profiles, which were exploited by means of SPARQL queries to identify groups of similar patients and to analyse the disease timelines of patients. Based on the requirements analysis, we have produced a draft of an ontology that models the semantics of a local cancer registry in a pragmatic, extensible way. We have implemented a Semantic Web platform that allows transforming and storing data from cancer registries in RDF. This platform also permits users to formulate incremental user-defined queries through a graphical user interface. The query results can be displayed in several customisable ways. The complex disease timelines of individual patients can be clearly represented. Different events, e.g. different therapies and disease courses, are presented according to their temporal and causal relations. CONCLUSION: The presented platform is an example of the parallel development of ontologies and applications that take advantage of Semantic Web technologies in the medical field.
The semantic structure of the representation makes it easy to analyse key figures for the patients and their evolution at different levels of granularity.


Subject(s)
Database Management Systems , Neoplasms/therapy , Registries , Semantics , Databases, Factual , Humans , Information Storage and Retrieval/methods , Internet , Medical Informatics/methods , Neoplasms/diagnosis
16.
Stud Health Technol Inform ; 235: 416-420, 2017.
Article in English | MEDLINE | ID: mdl-28423826

ABSTRACT

ArchMS is a framework that represents clinical information and knowledge using OWL ontologies, which facilitates semantic interoperability and thereby the exploitation and secondary use of clinical data. However, it does not yet support the automated assessment of the quality of care. CLIF is a stepwise method for formalizing quality indicators. The method has been implemented in the CLIF tool, which supports its users in generating computable queries based on a patient data model that can be based on archetypes. To enable the automated computation of quality indicators using ontologies and archetypes, we tested whether ArchMS and the CLIF tool can be integrated. We successfully automated the process of generating SPARQL queries from quality indicators formalized with CLIF and integrated them into ArchMS. Hence, ontologies and archetypes can be combined for the execution of formalized quality indicators.


Subject(s)
Medical Informatics , Quality Indicators, Health Care , Semantics , Biological Ontologies , Humans , Knowledge
17.
Stud Health Technol Inform ; 235: 426-430, 2017.
Article in English | MEDLINE | ID: mdl-28423828

ABSTRACT

The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task, as they evolve rapidly, with new versions regularly published. Therefore, methods to support ontology developers in analysing and tracking the evolution of their ontologies are needed. OQuaRE is an ontology evaluation framework based on quantitative metrics that provides normalised scores for different ontologies. In this work, OQuaRE has been applied to 408 versions of the eight OBO Foundry member ontologies. The OBO Foundry member ontologies are supposed to have been built by applying the OBO Foundry principles. Our results show that this set of ontologies actually follows principles such as the naming convention, and that the evolution of the OBO Foundry member ontologies is generating ontologies with higher OQuaRE quality scores.


Subject(s)
Biological Ontologies , Software
18.
J Biomed Semantics ; 7(1): 63, 2016 10 17.
Article in English | MEDLINE | ID: mdl-27751176

ABSTRACT

BACKGROUND: The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task and biomedical ontologies evolve rapidly, so new versions are regularly and frequently published in ontology repositories. This has the implication of there being a high number of ontology versions over a short time span. Given this level of activity, ontology designers need to be supported in the effective management of the evolution of biomedical ontologies as the different changes may affect the engineering and quality of the ontology. This is why there is a need for methods that contribute to the analysis of the effects of changes and evolution of ontologies. RESULTS: In this paper we approach this issue from the ontology quality perspective. In previous work we have developed an ontology evaluation framework based on quantitative metrics, called OQuaRE. Here, OQuaRE is used as a core component in a method that enables the analysis of the different versions of biomedical ontologies using the quality dimensions included in OQuaRE. Moreover, we describe and use two scales for evaluating the changes between the versions of a given ontology. The first one is the static scale used in OQuaRE and the second one is a new, dynamic scale, based on the observed values of the quality metrics of a corpus defined by all the versions of a given ontology (life-cycle). In this work we explain how OQuaRE can be adapted for understanding the evolution of ontologies. Its use has been illustrated with the ontology of bioinformatics operations, types of data, formats, and topics (EDAM). CONCLUSIONS: The two scales included in OQuaRE provide complementary information about the evolution of the ontologies. The application of the static scale, which is the original OQuaRE scale, to the versions of the EDAM ontology reveals a design based on good ontological engineering principles. 
The application of the dynamic scale enabled a more detailed analysis of the evolution of the ontology, measured through the differences between versions. The statistics of change based on the OQuaRE quality scores make it possible to identify key versions in which changes in the engineering of the ontology triggered a change from the OQuaRE quality perspective. In the case of EDAM, this study allowed us to identify the fifth version of the ontology as the one with the largest impact on its quality metrics when pairs of consecutive versions are compared.
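The contrast between the two scales can be sketched as follows. This is a minimal illustration, not the actual OQuaRE implementation: the metric values, the fixed thresholds, and the min-max normalisation used for the dynamic scale are all assumptions made for the example.

```python
# Hypothetical sketch of the two scoring scales described above. A static
# scale maps a metric value to 1-5 against fixed thresholds; a dynamic scale
# scores each version relative to the range observed over the ontology's
# whole life-cycle. Thresholds and metric values are invented.

def static_score(value, thresholds):
    """Map a metric value to a 1-5 score using fixed (static) thresholds,
    given ascending-sorted thresholds."""
    score = 1
    for t in thresholds:
        if value >= t:
            score += 1
    return score

def dynamic_scores(values):
    """Score each version's metric value on 1-5 relative to the observed
    min-max range across all versions (the dynamic scale)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for flat metrics
    return [1 + round(4 * (v - lo) / span) for v in values]

# One quality metric across six successive versions (made-up values with a
# jump at the fifth version).
versions = [0.42, 0.45, 0.44, 0.47, 0.71, 0.73]
print([static_score(v, [0.2, 0.4, 0.6, 0.8]) for v in versions])  # static view
print(dynamic_scores(versions))                                   # dynamic view
```

On data like this the static scale barely moves while the dynamic scale makes the jump at the fifth version stand out, which is the kind of complementary signal the abstract describes.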


Subject(s)
Biological Ontologies , Quality Control
19.
Stud Health Technol Inform ; 228: 384-8, 2016.
Article in English | MEDLINE | ID: mdl-27577409

ABSTRACT

The number of biomedical ontologies has increased significantly in recent years. Many such ontologies are the result of the efforts of communities of domain experts and ontology engineers. The development and application of quality assurance (QA) methods should help these communities to develop ontologies that are useful for both humans and machines. According to previous studies, biomedical ontologies are rich in natural language content, but most of them are not so rich in axioms. Here, we are interested in studying the relation between content in natural language and content in axiomatic form. Analysing the labels of the classes makes it possible to identify lexical regularities (LRs), which are sets of words shared by the labels of different classes. Our assumption is that classes exhibiting an LR should be logically related through axioms; we use this assumption to propose an algorithm for detecting missing relations in the ontology. Here, we analyse one lexical regularity of SNOMED CT, "congenital stenosis", which has been reported as problematic by the SNOMED CT maintenance team.


Subject(s)
Biological Ontologies , Natural Language Processing , Aortic Valve Stenosis/congenital , Language , Systematized Nomenclature of Medicine
20.
Stud Health Technol Inform ; 228: 765-9, 2016.
Article in English | MEDLINE | ID: mdl-27577489

ABSTRACT

The construction and publication of predications from scientific literature databases like MEDLINE is necessary given the large number of resources available. The main goal is to infer meaningful predicates between relevant co-occurring MeSH concepts manually annotated in MEDLINE records. The resulting predications take the form of subject-predicate-object triples. We exploit the content of the MRCOC file to extract the MeSH indexing terms (main headings and subheadings) of MEDLINE. The predications were inferred by combining the semantic predicates from SemMedDB, the clustering of MeSH terms by their associated MeSH subheadings, and the frequency of relevant terms in the abstracts of MEDLINE records. The inference process also assigns a weight to each generated predication. As a result, we published the generated dataset of predications following the Linked Data principles, making it available for future projects.
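The weighted-triple construction described above can be sketched roughly as follows. The predicate table, concept names, and co-occurrence counts are invented for illustration, and frequency-based normalisation stands in for the paper's combined weighting scheme.

```python
# Minimal sketch of the predication-building step: combine a predicate
# assigned to a MeSH concept pair (e.g. drawn from SemMedDB) with the
# pair's co-occurrence frequency to produce weighted
# subject-predicate-object triples. All data below is hypothetical.

from collections import Counter

# Predicate suggested for each co-occurring concept pair (invented).
predicate_for = {
    ("Aspirin", "Myocardial Infarction"): "PREVENTS",
    ("Aspirin", "Hemorrhage"): "CAUSES",
}

# How often each pair is co-annotated in MEDLINE records (invented counts).
cooccurrence = Counter({
    ("Aspirin", "Myocardial Infarction"): 80,
    ("Aspirin", "Hemorrhage"): 20,
})

def build_predications(predicates, counts):
    """Emit (subject, predicate, object, weight) tuples, with weights
    normalised by the subject's total co-occurrence frequency."""
    totals = Counter()
    for (subj, _), n in counts.items():
        totals[subj] += n
    return [
        (subj, pred, obj, counts[(subj, obj)] / totals[subj])
        for (subj, obj), pred in sorted(predicates.items())
    ]

for triple in build_predications(predicate_for, cooccurrence):
    print(triple)
```

Each resulting tuple is one predication plus its weight, ready to be serialised as an RDF triple with a weight annotation when publishing under the Linked Data principles.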


Subject(s)
MEDLINE , Medical Subject Headings , Cluster Analysis , Semantics