Results 1 - 20 of 46
1.
BMC Med Inform Decis Mak ; 20(Suppl 10): 305, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319709

ABSTRACT

BACKGROUND: Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can have a deleterious effect on the quality of an ontology. An abstraction network is a structure that overlays an ontology and provides an alternate, summarization view of its contents. One kind of abstraction network is called an area taxonomy, and a variation of it is called a subtaxonomy. A methodology based on these taxonomies for more readily finding missing relationship errors is explored. METHODS: The area taxonomy and the subtaxonomy are deployed to help reveal concepts that have a high likelihood of exhibiting missing relationship errors. A specific top-level grouping unit found within the area taxonomy and subtaxonomy, when deemed to be anomalous, is used as an indicator that missing relationship errors are likely to be found among certain concepts. Two hypotheses pertaining to the effectiveness of our Quality Assurance approach are studied. RESULTS: Our Quality Assurance methodology was applied to the Biological Process hierarchy of the National Cancer Institute thesaurus (NCIt) and SNOMED CT's Eye/vision finding subhierarchy within its Clinical finding hierarchy. Many missing relationship errors were discovered and confirmed in our analysis. For both test-bed hierarchies, our Quality Assurance methodology yielded a statistically significantly higher number of concepts with missing relationship errors in comparison to a control sample of concepts. Two hypotheses are confirmed by these findings. 
CONCLUSIONS: Quality assurance is a critical part of an ontology's lifecycle, and automated or semi-automated tools for supporting this process are invaluable. We introduced a Quality Assurance methodology targeted at missing relationship errors. Its successful application to the NCIt's Biological Process hierarchy and SNOMED CT's Eye/vision finding subhierarchy indicates that it can be a useful addition to the arsenal of tools available to ontology maintenance personnel.


Subject(s)
Systematized Nomenclature of Medicine , Vocabulary, Controlled , Electronic Data Processing , Humans , Probability
2.
AMIA Annu Symp Proc ; 2018: 750-759, 2018.
Article in English | MEDLINE | ID: mdl-30815117

ABSTRACT

Many major medical ontologies go through a regular (bi-annual, monthly, etc.) release cycle. A new release will contain corrections to the previous release, as well as genuinely new concepts that are the result of either user requests or new developments in the domain. New concepts need to be placed at the correct place in the ontology hierarchy. Traditionally, this is done by an expert modeling a new concept and running a classifier algorithm. We propose an alternative approach that is based on providing only the name of a new concept and using a Convolutional Neural Network-based machine learning method. We first tested this approach within one version of SNOMED CT and achieved an average 88.5% precision and an F1 score of 0.793. In comparing the July 2017 release with the January 2018 release, limiting ourselves to predicting one out of two or more parents, our average F1 score was 0.701.


Subject(s)
Machine Learning , Neural Networks, Computer , Systematized Nomenclature of Medicine , Support Vector Machine
3.
J Healthc Eng ; 2017: 3495723, 2017.
Article in English | MEDLINE | ID: mdl-29158885

ABSTRACT

Ontologies are important components of health information management systems. As such, the quality of their content is of paramount importance. It has been proven to be practical to develop quality assurance (QA) methodologies based on automated identification of sets of concepts expected to have higher likelihood of errors. Four kinds of such sets (called QA-sets) organized around the themes of complex and uncommonly modeled concepts are introduced. A survey of different methodologies based on these QA-sets and the results of applying them to various ontologies are presented. Overall, following these approaches leads to higher QA yields and better utilization of QA personnel. The formulation of additional QA-set methodologies will further enhance the suite of available ontology QA tools.


Subject(s)
Biological Ontologies , Classification , Quality Assurance, Health Care , Humans
4.
Methods Inf Med ; 56(3): 200-208, 2017 May 18.
Article in English | MEDLINE | ID: mdl-28244549

ABSTRACT

OBJECTIVES: Ontologies are knowledge structures that lend support to many health-information systems. A study is carried out to assess the quality of ontological concepts based on a measure of their complexity. The results show a relation between complexity of concepts and error rates of concepts. METHODS: A measure of lateral complexity defined as the number of exhibited role types is used to distinguish between more complex and simpler concepts. Using a framework called an area taxonomy, a kind of abstraction network that summarizes the structural organization of an ontology, concepts are divided into two groups along these lines. Various concepts from each group are then subjected to a two-phase QA analysis to uncover and verify errors and inconsistencies in their modeling. A hierarchy of the National Cancer Institute thesaurus (NCIt) is used as our test-bed. A hypothesis pertaining to the expected error rates of the complex and simple concepts is tested. RESULTS: Our study was done on the NCIt's Biological Process hierarchy. Various errors, including missing roles, incorrect role targets, and incorrectly assigned roles, were discovered and verified in the two phases of our QA analysis. The overall findings confirmed our hypothesis by showing a statistically significant difference between the amounts of errors exhibited by more laterally complex concepts vis-à-vis simpler concepts. CONCLUSIONS: QA is an essential part of any ontology's maintenance regimen. In this paper, we reported on the results of a QA study targeting two groups of ontology concepts distinguished by their level of complexity, defined in terms of the number of exhibited role types. The study was carried out on a major component of an important ontology, the NCIt. The findings suggest that more complex concepts tend to have a higher error rate than simpler concepts. These findings can be utilized to guide ongoing efforts in ontology QA.
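The grouping step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each concept is available as a list of role-type names, and the role names, the toy concepts, and the `threshold` parameter are all hypothetical (the abstract does not state the cutoff used to separate "complex" from "simple" concepts).

```python
from collections import defaultdict


def group_into_areas(concept_roles):
    """Group concepts by the exact set of role types they exhibit.

    Each distinct set of role types forms one group, loosely mirroring
    the idea of an "area" in an area taxonomy.
    """
    areas = defaultdict(list)
    for concept, roles in concept_roles.items():
        areas[frozenset(roles)].append(concept)
    return dict(areas)


def split_by_lateral_complexity(concept_roles, threshold):
    """Split concepts into (complex, simple) by number of role types.

    A concept counts as complex when it exhibits more than `threshold`
    distinct role types. The threshold is an assumption of this sketch.
    """
    complex_concepts, simple_concepts = [], []
    for concept, roles in concept_roles.items():
        if len(set(roles)) > threshold:
            complex_concepts.append(concept)
        else:
            simple_concepts.append(concept)
    return sorted(complex_concepts), sorted(simple_concepts)


# Hypothetical toy data; not taken from the NCIt.
toy = {
    "Apoptosis": ["has_result", "has_associated_location", "is_part_of"],
    "Mitosis": ["is_part_of"],
    "Cell Growth": ["is_part_of"],
    "Signaling": ["has_initiator", "has_result"],
}

complex_concepts, simple_concepts = split_by_lateral_complexity(toy, threshold=1)
areas = group_into_areas(toy)
```

Samples drawn from the two resulting groups could then be handed to reviewers, and the observed error counts compared statistically, as the abstract describes.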


Subject(s)
Biological Ontologies , Comprehension , Meaningful Use/standards , Models, Statistical , National Cancer Institute (U.S.)/standards , Neoplasms/classification , Computer Simulation , Humans , Natural Language Processing , Quality Assurance, Health Care/standards , United States , Vocabulary, Controlled
5.
Stud Health Technol Inform ; 245: 978-982, 2017.
Article in English | MEDLINE | ID: mdl-29295246

ABSTRACT

Maintenance and use of a large ontology, consisting of thousands of knowledge assertions, are hampered by its scope and complexity. It is important to provide tools for summarization of ontology content in order to facilitate user "big picture" comprehension. We present a parameterized methodology for the semi-automatic summarization of major topics in an ontology, based on a compact summary of the ontology, called an "aggregate partial-area taxonomy", followed by manual enhancement. An experiment is presented to test the effectiveness of such summarization measured by coverage of a given list of major topics of the corresponding application domain. SNOMED CT's Specimen hierarchy is the test-bed. A domain-expert provided a list of topics that serves as a gold standard. The enhanced results show that the aggregate taxonomy covers most of the domain's main topics.


Subject(s)
Biological Ontologies , Systematized Nomenclature of Medicine , Automation , Humans , Knowledge Bases
6.
Ann N Y Acad Sci ; 1387(1): 12-24, 2017 01.
Article in English | MEDLINE | ID: mdl-27750400

ABSTRACT

The purpose of the Big Data to Knowledge initiative is to develop methods for discovering new knowledge from large amounts of data. However, if the resulting knowledge is so large that it resists comprehension, referred to here as Big Knowledge (BK), how can it be used properly and creatively? We call this secondary challenge, Big Knowledge to Use. Without a high-level mental representation of the kinds of knowledge in a BK knowledgebase, effective or innovative use of the knowledge may be limited. We describe summarization and visualization techniques that capture the big picture of a BK knowledgebase, possibly created from Big Data. In this research, we distinguish between assertion BK and rule-based BK (rule BK) and demonstrate the usefulness of summarization and visualization techniques of assertion BK for clinical phenotyping. As an example, we illustrate how a summary of many intracranial bleeding concepts can improve phenotyping, compared to the traditional approach. We also demonstrate the usefulness of summarization and visualization techniques of rule BK for drug-drug interaction discovery.


Subject(s)
Computational Biology/methods , Drug Interactions , Image Interpretation, Computer-Assisted , Intracranial Hemorrhages/classification , Knowledge Bases , Models, Neurological , Translational Research, Biomedical/methods , Animals , Computational Biology/trends , Data Mining/methods , Data Mining/trends , Decision Making, Computer-Assisted , Humans , Image Processing, Computer-Assisted , Intracranial Hemorrhages/epidemiology , Intracranial Hemorrhages/etiology , Intracranial Hemorrhages/physiopathology , Pharmaceutical Preparations/classification , Systematized Nomenclature of Medicine , Terminology as Topic , Translational Research, Biomedical/trends
7.
J Bioinform Comput Biol ; 14(3): 1642001, 2016 06.
Article in English | MEDLINE | ID: mdl-27301779

ABSTRACT

The gene ontology (GO) is used extensively in the field of genomics. Like other large and complex ontologies, quality assurance (QA) efforts for GO's content can be laborious and time consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology. They have been shown to successfully support QA work in the context of various ontologies. Two kinds of AbNs, called the area taxonomy and the partial-area taxonomy, are developed for GO hierarchies and derived specifically for the biological process (BP) hierarchy. Within this framework, several QA heuristics, based on the identification of groups of anomalous terms which exhibit certain taxonomy-defined characteristics, are introduced. Such groups are expected to have higher error rates when compared to other terms. Thus, by focusing QA efforts on anomalous terms one would expect to find relatively more erroneous content. By automatically identifying these potential problem areas within an ontology, time and effort will be saved during manual reviews of GO's content. BP is used as a testbed, with samples of three kinds of anomalous BP terms chosen for a taxonomy-based QA review. Additional heuristics for QA are demonstrated. From the results of this QA effort, it is observed that different kinds of inconsistencies in the modeling of GO can be exposed with the use of the proposed heuristics. For comparison, the results of QA work on a sample of terms chosen from GO's general population are presented.


Subject(s)
Computational Biology/methods , Gene Ontology , Quality Control
8.
J Biomed Inform ; 57: 278-87, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26260003

ABSTRACT

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is an extensive reference terminology with an attendant amount of complexity. It has been updated continuously and revisions have been released semi-annually to meet users' needs and to reflect the results of quality assurance (QA) activities. Two measures based on structural features are proposed to track the effects of both natural terminology growth and QA activities based on aspects of the complexity of SNOMED CT. These two measures, called the structural density measure and accumulated structural measure, are derived based on two abstraction networks, the area taxonomy and the partial-area taxonomy. The measures derive from attribute relationship distributions and various concept groupings that are associated with the abstraction networks. They are used to track the trends in the complexity of structures as SNOMED CT changes over time. The measures were calculated for consecutive releases of five SNOMED CT hierarchies, including the Specimen hierarchy. The structural density measure shows that natural growth tends to move a hierarchy's structure toward a more complex state, whereas the accumulated structural measure shows that QA processes tend to move a hierarchy's structure toward a less complex state. It is also observed that both the structural density and accumulated structural measures are useful tools to track the evolution of an entire SNOMED CT hierarchy and reveal internal concept migration within it.


Subject(s)
Data Accuracy , Systematized Nomenclature of Medicine
9.
Artif Intell Med ; 64(1): 1-16, 2015 May.
Article in English | MEDLINE | ID: mdl-25890687

ABSTRACT

OBJECTIVE: Terminologies and terminological systems have assumed important roles in many medical information processing environments, giving rise to the "big knowledge" challenge when terminological content comprises tens of thousands to millions of concepts arranged in a tangled web of relationships. Use and maintenance of knowledge structures on that scale can be daunting. The notion of abstraction network is presented as a means of facilitating the usability, comprehensibility, visualization, and quality assurance of terminologies. METHODS AND MATERIALS: An abstraction network overlays a terminology's underlying network structure at a higher level of abstraction. In particular, it provides a more compact view of the terminology's content, avoiding the display of minutiae. General abstraction network characteristics are discussed. Moreover, the notion of meta-abstraction network, existing at an even higher level of abstraction than a typical abstraction network, is described for cases where even the abstraction network itself represents a case of "big knowledge." Various features in the design of abstraction networks are demonstrated in a methodological survey of some existing abstraction networks previously developed and deployed for a variety of terminologies. RESULTS: The applicability of the general abstraction-network framework is shown through use-cases of various terminologies, including the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), the Medical Entities Dictionary (MED), and the Unified Medical Language System (UMLS). Important characteristics of the surveyed abstraction networks are provided, e.g., the magnitude of the respective size reduction referred to as the abstraction ratio. Specific benefits of these alternative terminology-network views, particularly their use in terminology quality assurance, are discussed. Examples of meta-abstraction networks are presented. 
CONCLUSIONS: The "big knowledge" challenge constitutes the use and maintenance of terminological structures that comprise tens of thousands to millions of concepts and their attendant complexity. The notion of abstraction network has been introduced as a tool in helping to overcome this challenge, thus enhancing the usefulness of terminologies. Abstraction networks have been shown to be applicable to a variety of existing biomedical terminologies, and these alternative structural views hold promise for future expanded use with additional terminologies.


Subject(s)
Health Information Management/organization & administration , Medical Informatics/organization & administration , Neural Networks, Computer , Vocabulary, Controlled
10.
Artif Intell Med ; 58(2): 73-80, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23602702

ABSTRACT

OBJECTIVE: By 2015, SNOMED CT (SCT) will become the USA's standard for encoding diagnoses and problem lists in electronic health records (EHRs). To facilitate this effort, the National Library of Medicine has published the "SCT Clinical Observations Recording and Encoding" and the "Veterans Health Administration and Kaiser Permanente" problem lists (collectively, the "PL"). The PL is studied in regard to its readiness to support meaningful use of EHRs. In particular, we wish to determine if inconsistencies appearing in SCT, in general, occur as frequently in the PL, and whether further quality-assurance (QA) efforts on the PL are required. METHODS AND MATERIALS: A study is conducted where two random samples of SCT concepts are compared. The first consists of concepts strictly from the PL and the second contains general SCT concepts distributed proportionally to the PL's in terms of their hierarchies. Each sample is analyzed for its percentage of primitive concepts and for frequency of modeling errors of various severity levels as quality measures. A simple structural indicator, namely, the number of parents, is suggested to locate high likelihood inconsistencies in hierarchical relationships. The effectiveness of this indicator is evaluated. RESULTS: PL concepts are found to be slightly better than other concepts in the respective SCT hierarchies with regard to the quality measure of the percentage of primitive concepts and the frequency of modeling errors. There were 58% primitive concepts in the PL sample versus 62% in the control sample. The structural indicator of number of parents is shown to be statistically significant in its ability to identify concepts having a higher likelihood of inconsistencies in their hierarchical relationships. The absolute number of errors in the group of concepts having 1-3 parents was shown to be significantly lower than that for concepts with 4-6 parents and those with 7 or more parents based on Chi-squared analyses.
CONCLUSION: PL concepts suffer from the same issues as general SCT concepts, although to a slightly lesser extent, and do require further QA efforts to promote meaningful use of EHRs. To support such efforts, a structural indicator is shown to effectively ferret out potentially problematic concepts where those QA efforts should be focused.
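The structural indicator in this abstract, the number of parents, lends itself to a short illustration. The sketch below only bins audited concepts into the three parent-count groups named in the abstract (1-3, 4-6, 7 or more) and tallies errors per group. The input format, the function names, and the toy audit records are assumptions, and the chi-squared comparison reported by the authors is not reproduced here.

```python
def parent_bucket(num_parents):
    """Map a parent count to one of the three groups named in the abstract."""
    if num_parents < 1:
        raise ValueError("a non-root concept has at least one parent")
    if num_parents <= 3:
        return "1-3"
    if num_parents <= 6:
        return "4-6"
    return "7+"


def tally_errors(audited):
    """Count audited concepts and erroneous concepts per parent-count group.

    `audited` is an iterable of (num_parents, has_error) pairs, a
    hypothetical record format chosen for this sketch.
    """
    tally = {
        "1-3": {"concepts": 0, "errors": 0},
        "4-6": {"concepts": 0, "errors": 0},
        "7+": {"concepts": 0, "errors": 0},
    }
    for num_parents, has_error in audited:
        group = tally[parent_bucket(num_parents)]
        group["concepts"] += 1
        if has_error:
            group["errors"] += 1
    return tally


# Hypothetical audit records, invented for illustration only.
records = [(1, False), (2, False), (3, True), (4, True), (5, False), (8, True)]
result = tally_errors(records)
```

The per-group counts form the contingency table on which a test of independence could then be run.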


Subject(s)
Artificial Intelligence , Data Mining/methods , Electronic Health Records , Meaningful Use , Medical Records, Problem-Oriented , Quality Assurance, Health Care , Systematized Nomenclature of Medicine , Unified Medical Language System , Artificial Intelligence/standards , Data Mining/standards , Electronic Health Records/standards , Humans , Meaningful Use/standards , Medical Records, Problem-Oriented/standards , National Library of Medicine (U.S.) , Quality Assurance, Health Care/standards , Terminology as Topic , Unified Medical Language System/standards , United States
11.
AMIA Annu Symp Proc ; 2013: 581-90, 2013.
Article in English | MEDLINE | ID: mdl-24551360

ABSTRACT

BioPortal contains over 300 ontologies, for which quality assurance (QA) is critical. Abstraction networks (ANs), compact summarizations of ontology structure and content, have been used in such QA efforts, typically in a "one-off" manner for a single ontology. Ontologies can be characterized, independently of knowledge-content focus, from a structural standpoint leading to the formulation of ontology families. A family is defined as a set of ontologies satisfying some overarching condition regarding their structural features. Seven such families, comprising 186 ontologies, are identified. To increase efficiency, a new family-based QA framework is introduced in which an automated, uniform AN derivation technique and accompanying semi-automated, uniform QA regimen are applicable to the ontologies of a given family. Specifically, across an entire family, the QA efforts exploit family-wide AN features in the characterization of sets of classes that are more likely to harbor errors. The approach is demonstrated on the Cancer Chemoprevention BioPortal ontology.


Subject(s)
Biological Ontologies , Quality Assurance, Health Care , Abstracting and Indexing , Antineoplastic Agents/therapeutic use , Humans , Neoplasms/prevention & control , Programming Languages
12.
AMIA Annu Symp Proc ; 2013: 1071-80, 2013.
Article in English | MEDLINE | ID: mdl-24551393

ABSTRACT

Abstraction networks are compact summarizations of terminologies used to support orientation and terminology quality assurance (TQA). Area taxonomies and partial-area taxonomies are abstraction networks that have been successfully employed in support of TQA of small SNOMED CT hierarchies. However, nearly half of SNOMED CT's concepts are in the large Procedure and Clinical Finding hierarchies. Abstraction network derivation methodologies applied to those hierarchies resulted in taxonomies that were too large to effectively support TQA. A methodology for deriving sub-taxonomies from large taxonomies is presented, and the resultant smaller abstraction networks are shown to facilitate TQA, allowing for the scaling of our taxonomy-based TQA regimen to large hierarchies. Specifically, sub-taxonomies are derived for the Procedure hierarchy and a review for errors and inconsistencies is performed. Concepts are divided into groups within the sub-taxonomy framework, and it is shown that small groups are statistically more likely to harbor erroneous and inconsistent concepts than large groups.


Subject(s)
Systematized Nomenclature of Medicine , Artificial Intelligence , Methods , Quality Control , Terminology as Topic
13.
J Cheminform ; 4(1): 9, 2012 May 11.
Article in English | MEDLINE | ID: mdl-22577759

ABSTRACT

BACKGROUND: Terms representing chemical concepts found in the Unified Medical Language System (UMLS) are used to derive an expanded semantic network with mutually exclusive semantic types. The UMLS Semantic Network (SN) is composed of a collection of broad categories called semantic types (STs) that are assigned to concepts. Within the UMLS's coverage of the chemical domain, we find a great deal of concepts being assigned more than one ST. This leads to the situation where the extent of a given ST may contain concepts elaborating variegated semantics. A methodology for expanding the chemical subhierarchy of the SN into a finer-grained categorization of mutually exclusive types with semantically uniform extents is presented. We call this network a Chemical Specialty Semantic Network (CSSN). A CSSN is derived automatically from the existing chemical STs and their assignments. The methodology incorporates a threshold value governing the minimum size of a type's extent needed for inclusion in the CSSN. Thus, different CSSNs can be created by choosing different threshold values based on varying requirements. RESULTS: A complete CSSN is derived using a threshold value of 300 and having 68 STs. It is used effectively to provide high-level categorizations for a random sample of compounds from the "Chemical Entities of Biological Interest" (ChEBI) ontology. The effect on the size of the CSSN using various threshold parameter values between one and 500 is shown. CONCLUSIONS: The methodology has several potential applications, including its use to derive a pre-coordinated guide for ST assignments to new UMLS chemical concepts, as a tool for auditing existing concepts, inter-terminology mapping, and to serve as an upper-level network for ChEBI.

14.
J Biomed Inform ; 45(1): 1-14, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21907827

ABSTRACT

Auditors of a large terminology, such as SNOMED CT, face a daunting challenge. To aid them in their efforts, it is essential to devise techniques that can automatically identify concepts warranting special attention. "Complex" concepts, which by their very nature are more difficult to model, fall neatly into this category. A special kind of grouping, called a partial-area, is utilized in the characterization of complex concepts. In particular, the complex concepts that are the focus of this work are those appearing in intersections of multiple partial-areas and are thus referred to as overlapping concepts. In a companion paper, an automatic methodology for identifying and partitioning the entire collection of overlapping concepts into disjoint, singly-rooted groups, that are more manageable to work with and comprehend, has been presented. The partitioning methodology formed the foundation for the development of an abstraction network for the overlapping concepts called a disjoint partial-area taxonomy. This new disjoint partial-area taxonomy offers a collection of semantically uniform partial-areas and is exploited herein as the basis for a novel auditing methodology. The review of the overlapping concepts is done in a top-down order within semantically uniform groups. These groups are themselves reviewed in a top-down order, which proceeds from the less complex to the more complex overlapping concepts. The results of applying the methodology to SNOMED's Specimen hierarchy are presented. Hypotheses regarding error ratios for overlapping concepts and between different kinds of overlapping concepts are formulated. Two phases of auditing the Specimen hierarchy for two releases of SNOMED are reported on. With the use of the double bootstrap and Fisher's exact test (two-tailed), the auditing of concepts and especially roots of overlapping partial-areas is shown to yield a statistically significant higher proportion of errors.


Subject(s)
Systematized Nomenclature of Medicine , Models, Theoretical , Terminology as Topic
15.
AMIA Annu Symp Proc ; 2012: 681-9, 2012.
Article in English | MEDLINE | ID: mdl-23304341

ABSTRACT

An abstraction network is an auxiliary network of nodes and links that provides a compact, high-level view of an ontology. Such a view lends support to ontology orientation, comprehension, and quality-assurance efforts. A methodology is presented for deriving a kind of abstraction network, called a partial-area taxonomy, for the Ontology of Clinical Research (OCRe). OCRe was selected as a representative of ontologies implemented using the Web Ontology Language (OWL) based on shared domains. The derivation of the partial-area taxonomy for the Entity hierarchy of OCRe is described. Utilizing the visualization of the content and structure of the hierarchy provided by the taxonomy, the Entity hierarchy is audited, and several errors and inconsistencies in OCRe's modeling of its domain are exposed. After appropriate corrections are made to OCRe, a new partial-area taxonomy is derived. The generalizability of the paradigm of the derivation methodology to various families of biomedical ontologies is discussed.


Subject(s)
Biomedical Research/classification , Vocabulary, Controlled , Medical Informatics
16.
J Biomed Inform ; 45(1): 15-29, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21878396

ABSTRACT

An algorithmically-derived abstraction network, called the partial-area taxonomy, for a SNOMED hierarchy has led to the identification of concepts considered complex. The designation "complex" is arrived at automatically on the basis of structural analyses of overlap among the constituent concept groups of the partial-area taxonomy. Such complex concepts, called overlapping concepts, constitute a tangled portion of a hierarchy and can be obstacles to users trying to gain an understanding of the hierarchy's content. A new methodology for partitioning the entire collection of overlapping concepts into singly-rooted groups, that are more manageable to work with and comprehend, is presented. Different kinds of overlapping concepts with varying degrees of complexity are identified. This leads to an abstract model of the overlapping concepts called the disjoint partial-area taxonomy, which serves as a vehicle for enhanced, high-level display. The methodology is demonstrated with an application to SNOMED's Specimen hierarchy. Overall, the resulting disjoint partial-area taxonomy offers a refined view of the hierarchy's structural organization and conceptual content that can aid users, such as maintenance personnel, working with SNOMED. The utility of the disjoint partial-area taxonomy as the basis for a SNOMED auditing regimen is presented in a companion paper.


Subject(s)
Algorithms , Systematized Nomenclature of Medicine , Humans , Models, Theoretical , Pattern Recognition, Automated/methods , Terminology as Topic
17.
MIXHS 12 (2012) ; 2012: 1-6, 2012.
Article in English | MEDLINE | ID: mdl-26870837

ABSTRACT

As SNOMED usage becomes more ingrained within applications, its range of concept descriptors, and particularly its synonym adequacy, becomes more important. A simulated clinical scenario involving various term-based concept searches is used to assess whether SNOMED's concept descriptors provide sufficient differentiation to enable possible concept selection between similar terms. Four random samples from different SNOMED concept populations are utilized. Of particular interest are concepts mapped duplicately into UMLS concepts due to shared term patterns. While overall synonym problems are rare (1%), some concept populations exhibited a high rate of potential problems for clinical use (17-62%). The vast majority of issues are due to SNOMED's inherent structure and fine granularity. Many findings hint at a lack of clear delineation between reference and interface terminological qualities. Closer attention should be given to practical clinical use-case scenarios. Reducing SNOMED's structural complexity may alleviate many of the described findings and encourage clinical adoption.

18.
AMIA Annu Symp Proc ; 2011: 529-36, 2011.
Article in English | MEDLINE | ID: mdl-22195107

ABSTRACT

A cycle in the parent relationship hierarchy of the UMLS is a configuration that effectively makes some concept(s) an ancestor of itself. Such a structural inconsistency can easily be found automatically. A previous strategy for disconnecting cycles is to break them with the deletion of one or more parent relationships, irrespective of the correctness of the deleted relationships. A methodology is introduced for auditing of cycles that seeks to discover and delete erroneous relationships only. Cycles involving three concepts are the primary consideration. Hypotheses about the high probability of locating an erroneous parent relationship in a cycle are proposed and confirmed with statistical confidence and lend credence to the auditing approach. A cycle may serve as an indicator of other non-structural inconsistencies that are otherwise difficult to detect automatically. An extensive auditing example shows how a cycle can indicate further inconsistencies.
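The abstract notes that cycles in a parent hierarchy can be found automatically. One common way to do so is a depth-first search that tracks the concepts on the current path, sketched below. The dictionary representation (`concept -> list of parents`) and the toy concept names are assumptions of this sketch; the abstract does not say which detection algorithm the authors used.

```python
def find_cycle(parents):
    """Return one cycle as a list of concepts, or None if none exists.

    `parents` maps each concept to a list of its parent concepts.
    A concept is GRAY while it is on the current search path; reaching
    a GRAY concept again means the path has looped back on itself.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for parent in parents.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # The cycle is the part of the path starting at `parent`.
                return path[path.index(parent):]
            if state == WHITE:
                cycle = visit(parent)
                if cycle is not None:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for concept in list(parents):
        if color.get(concept, WHITE) == WHITE:
            cycle = visit(concept)
            if cycle is not None:
                return cycle
    return None


# Hypothetical three-concept cycle: A -> B -> C -> A.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
acyclic = {"A": ["B"], "B": ["C"]}
```

The search is recursive, so a hierarchy with very long parent chains would call for an iterative variant. Detection only locates the cycle; deciding which relationship in it is erroneous remains the auditing task the abstract describes.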


Subject(s)
Unified Medical Language System/organization & administration , Systematized Nomenclature of Medicine , Vocabulary, Controlled
19.
Artif Intell Med ; 52(3): 141-51, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21646001

ABSTRACT

OBJECTIVE: The Unified Medical Language System (UMLS) integrates terms from different sources into concepts and supplements these with the assignment of one or more high-level semantic types (STs) from its Semantic Network (SN). For a composite organic chemical concept, multiple assignments of organic chemical STs often serve to enumerate the types of the composite's underlying chemical constituents. This practice sometimes leads to the introduction of a forbidden redundant ST assignment, where both an ST and one of its descendants are assigned to the same concept. A methodology for resolving redundant ST assignments for organic chemicals, better capturing the essence of such composite chemicals than the typical omission of the more general ST, is presented. MATERIALS AND METHODS: The typical SN resolution of a redundant ST assignment is to retain only the more specific ST assignment and omit the more general one. However, with organic chemicals, that is not always the correct strategy. A methodology for properly dealing with the redundancy based on the relative sizes of the chemical components is presented. It is more accurate to use the ST of the larger chemical component for capturing the category of the concept, even if that means using the more general ST. RESULTS: A sample of 254 chemical concepts having redundant ST assignments in older UMLS releases was audited to analyze the accuracy of current ST assignments. For 81 (32%) of them, our chemical analysis-based approach yielded a different recommendation from the UMLS (2009AA). New UMLS usage notes capturing rules of this methodology are proffered. CONCLUSIONS: Redundant ST assignments have typically arisen for organic composite chemical concepts. A methodology for dealing with this kind of erroneous configuration, capturing the proper category for a composite chemical, is presented and demonstrated.
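A redundant ST assignment, as defined in this abstract, is the assignment of both an ST and one of its descendants to the same concept. Detecting that configuration can be sketched as below. The sketch assumes the ST hierarchy is available as a child-to-parent mapping; the three-level fragment used here is illustrative only and should not be read as the actual Semantic Network, which would be loaded from the UMLS distribution. The function names are hypothetical. The authors' resolution rule, based on the relative sizes of the chemical components, requires chemical analysis and is not modeled.

```python
def ancestors(st, st_parent):
    """Yield every ancestor of `st` in a tree-shaped hierarchy.

    `st_parent` maps a semantic type to its single parent; roots are
    simply absent from the mapping.
    """
    seen = set()
    current = st_parent.get(st)
    while current is not None and current not in seen:
        seen.add(current)
        yield current
        current = st_parent.get(current)


def redundant_assignments(assigned, st_parent):
    """Return sorted (ancestor, descendant) pairs that are both assigned."""
    assigned = set(assigned)
    pairs = []
    for st in assigned:
        for ancestor in ancestors(st, st_parent):
            if ancestor in assigned:
                pairs.append((ancestor, st))
    return sorted(pairs)


# Illustrative fragment of a type hierarchy (an assumption of this sketch).
st_parent = {
    "Lipid": "Organic Chemical",
    "Steroid": "Lipid",
    "Carbohydrate": "Organic Chemical",
}

flagged = redundant_assignments({"Organic Chemical", "Steroid"}, st_parent)
clean = redundant_assignments({"Steroid", "Carbohydrate"}, st_parent)
```

Each flagged pair marks a concept for review; whether to keep the more general or the more specific ST is the decision the abstract's methodology addresses.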


Subject(s)
Chemistry, Organic , Unified Medical Language System
20.
AMIA Annu Symp Proc ; 2010: 212-6, 2010 Nov 13.
Article in English | MEDLINE | ID: mdl-21346971

ABSTRACT

Concepts whose terms are of a similar word structure are expected to have similar logical representations. Anecdotal examples from SNOMED CT indicate that this may not always be the case. An investigation into the extent of inconsistent modeling in SNOMED CT hierarchies is carried out. A lexical methodology is used to identify sets of similar concepts. It is applied to one of the most attribute-rich hierarchies, Procedure, from which a random sample of 60 sets is derived. These sets are examined in regard to hierarchical, definitional, attribute, attribute/value, and role-group aspects. Thirty percent of the sample sets were found to have at least one type of modeling inconsistency. Their presence may interfere with the performance of terminology-driven applications. With the use of SNOMED expanding, such inconsistencies may eventually affect clinical care. Due to this, external auditing should be encouraged to identify such issues and complement IHTSDO's efforts.


Subject(s)
Systematized Nomenclature of Medicine , Humans