Search | VHL Regional Portal

An automated process for supporting decisions in clustering-based data analysis.

Bernabé-Díaz, José Antonio; Franco, Manuel; Vivo, Juana-María; Quesada-Martínez, Manuel; Fernández-Breis, Jesualdo T.

Comput Methods Programs Biomed ; 219: 106765, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35367914

ABSTRACT

BACKGROUND AND OBJECTIVE: Metrics are commonly used by biomedical researchers and practitioners to measure and evaluate properties of individuals, instruments, models, methods, or datasets. Due to the lack of a standardized validation procedure for a metric, it is assumed that if a metric is appropriate for analyzing a dataset in a certain domain, then it will be appropriate for other datasets in the same domain. However, such generalizability cannot be taken for granted, since the behavior of a metric can vary in different scenarios. The study of such behavior of a metric is the objective of this paper, since it would allow for assessing its reliability before drawing any conclusion about biomedical datasets. METHODS: We present a method to support the evaluation of the behavior of quantitative metrics on datasets. Our approach assesses a metric by using clustering-based data analysis and enhances the decision-making process for the optimal classification. Our method assesses the metrics by applying two important criteria of unsupervised classification validation that are calculated on the clusterings generated by the metric, namely stability and goodness of the clusters. Our evaluomeR tool makes the method readily available to biomedical researchers. RESULTS: The analytical power of our method is shown by applying it to analyze (1) the behavior of the impact factor metric for a series of journal categories; (2) which structural metrics provide a better partitioning of the content of a repository of biomedical ontologies; and (3) the heterogeneity sources in effect size metrics of biomedical primary studies. CONCLUSIONS: The use of statistical properties such as stability and goodness of classifications allows for a useful analysis of the behavior of quantitative metrics, which can be used for supporting decisions about which metrics to apply on a certain dataset.
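
The "goodness of the clusters" criterion mentioned in this abstract can be illustrated with a small sketch. The abstract does not name the index that is used, so the silhouette width below is an assumption chosen for illustration, and the function name `silhouette_width` is hypothetical; it is not the evaluomeR API. Stability, the second criterion, is not shown.

```python
from statistics import mean


def silhouette_width(values, labels):
    """Mean silhouette width of a clustering of one-dimensional metric values.

    Assumption: goodness is measured with the silhouette index; the abstract
    only says "goodness of the clusters".
    """
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for value, label in zip(values, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(value - other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(abs(value - other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append(0.0 if denominator == 0 else (b - a) / denominator)
    return mean(scores)


# Hypothetical metric values for four items and two candidate partitions.
metric_values = [1.0, 1.1, 5.0, 5.1]
well_separated = silhouette_width(metric_values, [0, 0, 1, 1])
mixed_up = silhouette_width(metric_values, [0, 1, 0, 1])
```

Values close to 1 suggest compact, well-separated clusters, while values near or below 0 suggest that the partition induced by the metric has little structure.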

Subject(s)

Biological Ontologies , Data Analysis , Benchmarking , Cluster Analysis , Humans , Reproducibility of Results

Analysis of readability and structural accuracy in SNOMED CT.

Abad-Navarro, Francisco; Quesada-Martínez, Manuel; Duque-Ramos, Astrid; Fernández-Breis, Jesualdo Tomás.

BMC Med Inform Decis Mak ; 20(Suppl 10): 284, 2020 12 15.

Article in English | MEDLINE | ID: mdl-33319711

ABSTRACT

BACKGROUND: The increasing adoption of ontologies in biomedical research and the growing number of ontologies available have made it necessary to assure the quality of these resources. Most of the well-established ontologies, such as the Gene Ontology or SNOMED CT, have their own quality assurance processes. These have demonstrated their usefulness for the maintenance of the resources but are unable to detect all of the modelling flaws in the ontologies. Consequently, the development of efficient and effective quality assurance methods is needed. METHODS: Here, we propose a series of quantitative metrics based on the processing of the lexical regularities existing in the content of the ontology, to analyse readability and structural accuracy. The readability metrics account for the ratio of labels, descriptions, and synonyms associated with the ontology entities. The structural accuracy metrics evaluate how two ontology modelling best practices are followed: (1) lexically suggest locally define (LSLD), that is, if what is expressed in natural language for humans is available as logical axioms for machines; and (2) systematic naming, which accounts for the amount of label content shared by the classes in a given taxonomy. RESULTS: We applied the metrics to different versions of SNOMED CT. Both readability and structural accuracy metrics remained stable over time but could capture some changes in the modelling decisions in SNOMED CT. The value of the LSLD metric increased from 0.27 to 0.31, and the value of the systematic naming metric was around 0.17. We analysed the readability and structural accuracy in the SNOMED CT July 2019 release. The results showed that the fulfilment of the structural accuracy criteria varied among the SNOMED CT hierarchies. The value of the metrics for the hierarchies was in the range of 0-0.92 (LSLD) and 0.08-1 (systematic naming). We also identified the cases that did not meet the best practices.
CONCLUSIONS: We generated useful information about the engineering of the ontology, making the following contributions: (1) a set of readability metrics, (2) the use of lexical regularities to define structural accuracy metrics, and (3) the generation of quality assurance information for SNOMED CT.
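
The readability metrics described above can be sketched as simple ratios. The abstract does not give the exact formulas, so this minimal example assumes each ratio is the number of annotations of a given kind divided by the number of entities; the function name, the dictionary layout, and the sample data are hypothetical.

```python
ANNOTATION_KINDS = ("labels", "descriptions", "synonyms")


def readability_ratios(entities):
    """Average number of labels, descriptions, and synonyms per entity.

    `entities` maps an entity identifier to a dict of annotation lists.
    Assumption: a ratio is annotations of one kind divided by entity count.
    """
    if not entities:
        return {kind: 0.0 for kind in ANNOTATION_KINDS}
    totals = {kind: 0 for kind in ANNOTATION_KINDS}
    for annotations in entities.values():
        for kind in ANNOTATION_KINDS:
            totals[kind] += len(annotations.get(kind, []))
    return {kind: totals[kind] / len(entities) for kind in ANNOTATION_KINDS}


# Hypothetical two-entity ontology fragment.
sample = {
    "E1": {"labels": ["Stenosis"], "synonyms": ["Narrowing", "Stricture"]},
    "E2": {"labels": ["Aneurysm"], "descriptions": ["A dilated vessel."]},
}
ratios = readability_ratios(sample)
```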

Subject(s)

Biological Ontologies , Systematized Nomenclature of Medicine , Comprehension , Gene Ontology , Humans , Language , Natural Language Processing

Evaluation of ontology structural metrics based on public repository data.

Franco, Manuel; Vivo, Juana María; Quesada-Martínez, Manuel; Duque-Ramos, Astrid; Fernández-Breis, Jesualdo Tomás.

Brief Bioinform ; 21(2): 473-485, 2020 03 23.

Article in English | MEDLINE | ID: mdl-30715146

ABSTRACT

The development and application of biological ontologies have increased significantly in recent years. These ontologies can be retrieved from different repositories, which do not provide much information about quality aspects of the ontologies. In recent years, some ontology structural metrics have been proposed, but their validity as measurement instruments has not been sufficiently studied to date. In this work, we evaluate a set of reproducible and objective ontology structural metrics. Given the lack of standard methods for this purpose, we have applied an evaluation method based on the stability and goodness of the classifications of ontologies produced by each metric on an ontology corpus. The evaluation has been done using ontology repositories as corpora. More concretely, we have used 119 ontologies from the OBO Foundry repository and 78 ontologies from AgroPortal. First, we study the correlations between the metrics. Second, we study whether the clusters for a given metric are stable and have a good structure. The results show that the existing correlations are not biasing the evaluation, there are no metrics generating unstable clusterings and all the metrics evaluated provide at least reasonable clustering structure. Furthermore, our work makes it possible to review and suggest the most reliable ontology structural metrics in terms of stability and goodness of their classifications. Availability: http://sele.inf.um.es/ontology-metrics.

Subject(s)

Biological Ontologies , Database Management Systems , Public Sector

From lexical regularities to axiomatic patterns for the quality assurance of biomedical terminologies and ontologies.

van Damme, Philip; Quesada-Martínez, Manuel; Cornet, Ronald; Fernández-Breis, Jesualdo Tomás.

J Biomed Inform ; 84: 59-74, 2018 08.

Article in English | MEDLINE | ID: mdl-29908358

ABSTRACT

Ontologies and terminologies have been identified as key resources for the achievement of semantic interoperability in biomedical domains. The development of ontologies is performed as a joint work by domain experts and knowledge engineers. The maintenance and auditing of these resources is also the responsibility of such experts, and this is usually a time-consuming, mostly manual task. Manual auditing is impractical and ineffective for most biomedical ontologies, especially for larger ones. An example is SNOMED CT, a key resource in many countries for codifying medical information. SNOMED CT contains more than 300000 concepts. Consequently its auditing requires the support of automatic methods. Many biomedical ontologies contain natural language content for humans and logical axioms for machines. The 'lexically suggest, logically define' principle means that there should be a relation between what is expressed in natural language and as logical axioms, and that such a relation should be useful for auditing and quality assurance. Besides, the meaning of this principle is that the natural language content for humans could be used to generate the logical axioms for the machines. In this work, we propose a method that combines lexical analysis and clustering techniques to (1) identify regularities in the natural language content of ontologies; (2) cluster, by similarity, labels exhibiting a regularity; (3) extract relevant information from those clusters; and (4) propose logical axioms for each cluster with the support of axiom templates. These logical axioms can then be evaluated with the existing axioms in the ontology to check their correctness and completeness, which are two fundamental objectives in auditing and quality assurance. 
In this paper, we describe the application of the method to two SNOMED CT modules, a 'congenital' module, obtained using concepts exhibiting the attribute Occurrence - Congenital, and a 'chronic' module, using concepts exhibiting the attribute Clinical course - Chronic. We obtained a precision and a recall of respectively 75% and 28% for the 'congenital' module, and 64% and 40% for the 'chronic' one. We consider these results to be promising, so our method can contribute to the support of content editors by using automatic methods for assuring the quality of biomedical ontologies and terminologies.
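
The precision and recall figures reported above compare proposed axioms with those already in the ontology. The abstract does not state how axioms are matched, so the sketch below assumes exact matching on string representations; the function name and the sample axioms are hypothetical.

```python
def precision_recall(proposed, reference):
    """Precision and recall of proposed axioms against reference axioms.

    Assumption: two axioms match only when their representations are equal.
    """
    proposed, reference = set(proposed), set(reference)
    true_positives = len(proposed & reference)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical axioms: four proposed, three of which exist among six reference axioms.
proposed_axioms = {"A1", "A2", "A3", "X9"}
reference_axioms = {"A1", "A2", "A3", "A4", "A5", "A6"}
precision, recall = precision_recall(proposed_axioms, reference_axioms)
```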

Subject(s)

Biological Ontologies , Computational Biology/methods , Systematized Nomenclature of Medicine , Algorithms , Cluster Analysis , Language , Medical Informatics , Natural Language Processing , Pattern Recognition, Automated , Programming Languages , Quality Control , Reproducibility of Results , Software , Terminology as Topic

Towards the semantic enrichment of Computer Interpretable Guidelines: a method for the identification of relevant ontological terms.

Quesada-Martínez, Manuel; Marcos, Mar; Abad-Navarro, Francisco; Martínez-Salvador, Begoña; Fernández-Breis, Jesualdo Tomás.

AMIA Annu Symp Proc ; 2018: 922-931, 2018.

Article in English | MEDLINE | ID: mdl-30815135

ABSTRACT

Clinical Practice Guidelines (CPGs) contain recommendations intended to optimize patient care, produced based on a systematic review of evidence. In turn, Computer-Interpretable Guidelines (CIGs) are formalized versions of CPGs for use as decision-support systems. We consider the enrichment of the CIG by means of an OWL ontology that describes the clinical domain of the CIG, which could be exploited e.g. for the interoperability with the Electronic Health Record (EHR). As a first step, in this paper we describe a method to support the development of such an ontology starting from a CIG. The method uses an alignment algorithm for the automated identification of ontological terms relevant to the clinical domain of the CIG, as well as a web platform to manually review the alignments and select the appropriate ones. Finally, we present the results of the application of the method to a small corpus of CIGs.

Subject(s)

Decision Support Systems, Clinical , Electronic Health Records , Practice Guidelines as Topic , Vocabulary, Controlled , Algorithms , Health Information Interoperability , Humans , Semantics

Preliminary Analysis of the OBO Foundry Ontologies and Their Evolution Using OQuaRE.

Quesada-Martínez, Manuel; Duque-Ramos, Astrid; Iniesta-Moreno, Miguela; Fernández-Breis, Jesualdo Tomás.

Stud Health Technol Inform ; 235: 426-430, 2017.

Article in English | MEDLINE | ID: mdl-28423828

ABSTRACT

The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task as they evolve rapidly, with new versions being regularly published. Therefore, methods to support ontology developers in analysing and tracking the evolution of their ontologies are needed. OQuaRE is an ontology evaluation framework based on quantitative metrics that makes it possible to obtain normalised scores for different ontologies. In this work, OQuaRE has been applied to 408 versions of the eight OBO Foundry member ontologies. The OBO Foundry member ontologies are supposed to have been built by applying the OBO Foundry principles. Our results show that this set of ontologies is actually following principles such as the naming convention, and that the evolution of the OBO Foundry member ontologies is generating ontologies with higher OQuaRE quality scores.

Subject(s)

Biological Ontologies , Software

Supporting the analysis of ontology evolution processes through the combination of static and dynamic scaling functions in OQuaRE.

Duque-Ramos, Astrid; Quesada-Martínez, Manuel; Iniesta-Moreno, Miguela; Fernández-Breis, Jesualdo Tomás; Stevens, Robert.

J Biomed Semantics ; 7(1): 63, 2016 10 17.

Article in English | MEDLINE | ID: mdl-27751176

ABSTRACT

BACKGROUND: The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task and biomedical ontologies evolve rapidly, so new versions are regularly and frequently published in ontology repositories. This has the implication of there being a high number of ontology versions over a short time span. Given this level of activity, ontology designers need to be supported in the effective management of the evolution of biomedical ontologies as the different changes may affect the engineering and quality of the ontology. This is why there is a need for methods that contribute to the analysis of the effects of changes and evolution of ontologies. RESULTS: In this paper we approach this issue from the ontology quality perspective. In previous work we have developed an ontology evaluation framework based on quantitative metrics, called OQuaRE. Here, OQuaRE is used as a core component in a method that enables the analysis of the different versions of biomedical ontologies using the quality dimensions included in OQuaRE. Moreover, we describe and use two scales for evaluating the changes between the versions of a given ontology. The first one is the static scale used in OQuaRE and the second one is a new, dynamic scale, based on the observed values of the quality metrics of a corpus defined by all the versions of a given ontology (life-cycle). In this work we explain how OQuaRE can be adapted for understanding the evolution of ontologies. Its use has been illustrated with the ontology of bioinformatics operations, types of data, formats, and topics (EDAM). CONCLUSIONS: The two scales included in OQuaRE provide complementary information about the evolution of the ontologies. The application of the static scale, which is the original OQuaRE scale, to the versions of the EDAM ontology reveals a design based on good ontological engineering principles. 
The application of the dynamic scale has enabled a more detailed analysis of the evolution of the ontology, measured through differences between versions. The statistics of change based on the OQuaRE quality scores make it possible to identify key versions where some changes in the engineering of the ontology triggered a change from the OQuaRE quality perspective. In the case of EDAM, this study allowed us to identify that the fifth version of the ontology has the largest impact on the quality metrics of the ontology, when comparative analyses between the pairs of consecutive versions are performed.

Subject(s)

Biological Ontologies , Quality Control

Suggesting Missing Relations in Biomedical Ontologies Based on Lexical Regularities.

Quesada-Martínez, Manuel; Fernández-Breis, Jesualdo Tomás; Karlsson, Daniel.

Stud Health Technol Inform ; 228: 384-8, 2016.

Article in English | MEDLINE | ID: mdl-27577409

ABSTRACT

The number of biomedical ontologies has increased significantly in recent years. Many such ontologies are the result of efforts of communities of domain experts and ontology engineers. The development and application of quality assurance (QA) methods should help these communities to develop useful ontologies for both humans and machines. According to previous studies, biomedical ontologies are rich in natural language content, but most of them are not so rich in axiomatic terms. Here, we are interested in studying the relation between content in natural language and content in axiomatic form. The analysis of the labels of the classes makes it possible to identify lexical regularities (LRs), which are sets of words that are shared by labels of different classes. Our assumption is that the classes exhibiting an LR should be logically related through axioms, an assumption we use to propose an algorithm to detect missing relations in the ontology. Here, we analyse a lexical regularity of SNOMED CT, congenital stenosis, which is reported as problematic by the SNOMED CT maintenance team.
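
The notion of a lexical regularity defined in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes that regularities are contiguous word sequences, that labels are tokenized on whitespace after lowercasing, and that a sequence qualifies when at least `min_classes` classes share it. The function name, parameters, and sample labels are hypothetical.

```python
from collections import defaultdict


def lexical_regularities(labels, min_classes=2, max_len=3):
    """Map each shared word sequence to the set of classes whose label contains it.

    `labels` maps a class identifier to its label.
    Assumptions: contiguous n-grams, whitespace tokenization, lowercasing.
    """
    support = defaultdict(set)
    for class_id, label in labels.items():
        tokens = label.lower().split()
        for n in range(1, max_len + 1):
            for start in range(len(tokens) - n + 1):
                support[tuple(tokens[start:start + n])].add(class_id)
    return {
        ngram: class_ids
        for ngram, class_ids in support.items()
        if len(class_ids) >= min_classes
    }


# Hypothetical labels, loosely modelled on the 'congenital stenosis' case.
sample_labels = {
    "C1": "Congenital stenosis of aorta",
    "C2": "Congenital stenosis of pulmonary valve",
    "C3": "Aortic aneurysm",
}
regularities = lexical_regularities(sample_labels)
```

Classes grouped under one regularity, such as C1 and C2 under `("congenital", "stenosis")`, would then be candidates for checking whether an axiom relates them.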

Subject(s)

Biological Ontologies , Natural Language Processing , Aortic Valve Stenosis/congenital , Language , Systematized Nomenclature of Medicine

Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective.

Quesada-Martínez, Manuel; Mikroyannidi, Eleni; Fernández-Breis, Jesualdo Tomás; Stevens, Robert.

Artif Intell Med ; 65(1): 35-48, 2015 Sep.

Article in English | MEDLINE | ID: mdl-25488031

ABSTRACT

OBJECTIVE: The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO). METHODS: In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we showed that the labels can present a high degree of regularity. In this work, we extend our method with a cross-products extension (CPE) metric, which estimates the potential interest of a specific regularity for axiomatic enrichment in the lexical analysis, using information on exact matches in external ontologies. The GO consortium recently enriched the GO by using so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class with classes from the GO or other biomedical ontologies. We apply our method to the GO and study how its lexical analysis can identify and reconstruct the cross-products that are defined by the GO consortium. RESULTS: The labels of the classes of the GO are highly regular in lexical terms, and the exact matches with labels of external ontologies affect 80% of the GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in two external ontologies that are selected for our experiment, namely, the Cell Ontology and the Chemical Entities of Biological Interest ontology, and 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric allows our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study is completed with an analysis of false positives to explain this precision value.
CONCLUSIONS: We think that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and that it can provide new insights into the engineering of biomedical ontologies.

Subject(s)

Gene Ontology/statistics & numerical data , Natural Language Processing , Humans , Vocabulary, Controlled
