Search | VHL Regional Portal

Representing the NCI Thesaurus in OWL DL: Modeling tools help modeling languages.

Noy, Natalya F; de Coronado, Sherri; Solbrig, Harold; Fragoso, Gilberto; Hartel, Frank W; Musen, Mark A.

Appl Ontol ; 3(3): 173-190, 2008 Jan 01.

Article in English | MEDLINE | ID: mdl-19789731

ABSTRACT

The National Cancer Institute's (NCI) Thesaurus is a biomedical reference ontology. The NCI Thesaurus is represented using description logic, more specifically Ontylog, a description logic implemented by Apelon, Inc. We are exploring the use of the DL species of the Web Ontology Language (OWL DL), a W3C-recommended standard for ontology representation, instead of Ontylog for representing the NCI Thesaurus. We have studied the requirements for knowledge representation of the NCI Thesaurus, and considered how OWL DL (and its implementation in Protégé-OWL) satisfies these requirements. In this paper, we discuss the areas where OWL DL was sufficient for representing required components, where tool support that would hide some of the complexity and extra levels of indirection would be required, and where language expressiveness is not sufficient given the representation requirements. Because many of the knowledge-representation issues that we encountered are very similar to the issues in representing other biomedical terminologies and ontologies in general, we believe that the lessons that we learned and the approaches that we developed will prove useful and informative for other researchers.

NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information.

Sioutos, Nicholas; de Coronado, Sherri; Haber, Margaret W; Hartel, Frank W; Shaiu, Wen-Ling; Wright, Lawrence W.

J Biomed Inform ; 40(1): 30-43, 2007 Feb.

Article in English | MEDLINE | ID: mdl-16697710

ABSTRACT

Over the last 8 years, the National Cancer Institute (NCI) has launched a major effort to integrate molecular and clinical cancer-related information within a unified biomedical informatics framework, with controlled terminology as its foundational layer. The NCI Thesaurus is the reference terminology underpinning these efforts. It is designed to meet the growing need for accurate, comprehensive, and shared terminology, covering topics including: cancers, findings, drugs, therapies, anatomy, genes, pathways, cellular and subcellular processes, proteins, and experimental organisms. The NCI Thesaurus provides a partial model of how these things relate to each other, responding to actual user needs and implemented in a deductive logic framework that can help maintain the integrity and extend the informational power of what is provided. This paper presents the semantic model for cancer diseases and its uses in integrating clinical and molecular knowledge, more briefly examines the models and uses for drug, biochemical pathway, and mouse terminology, and discusses limits of the current approach and directions for future work.

Subject(s)

Biomedical Research/methods; Database Management Systems; Databases, Factual; Information Storage and Retrieval/methods; Neoplasms/classification; Neoplasms/physiopathology; Vocabulary, Controlled; Computational Biology/methods; Humans; National Institutes of Health (U.S.); Neoplasm Proteins/metabolism; Semantics; Systems Integration; United States; User-Computer Interface

Modeling a description logic vocabulary for cancer research.

Hartel, Frank W; de Coronado, Sherri; Dionne, Robert; Fragoso, Gilberto; Golbeck, Jennifer.

J Biomed Inform ; 38(2): 114-29, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15797001

ABSTRACT

The National Cancer Institute has developed the NCI Thesaurus, a biomedical vocabulary for cancer research, covering terminology across a wide range of cancer research domains. A major design goal of the NCI Thesaurus is to facilitate translational research. We describe: the features of Ontylog, a description logic used to build NCI Thesaurus; our methodology for enhancing the terminology through collaboration between ontologists and domain experts, and for addressing certain real world challenges arising in modeling the Thesaurus; and finally, we describe the conversion of NCI Thesaurus from Ontylog into Web Ontology Language Lite. Ontylog has proven well suited for constructing big biomedical vocabularies. We have capitalized on the Ontylog constructs Kind and Role in the collaboration process described in this paper to facilitate communication between ontologists and domain experts. The artifacts and processes developed by NCI for collaboration may be useful in other biomedical terminology development efforts.

Subject(s)

Databases, Factual; Dictionaries as Topic; Information Storage and Retrieval/methods; Medical Oncology/methods; Neoplasms/classification; Research Design; Terminology as Topic; Vocabulary, Controlled; Animals; Database Management Systems; Humans; Models, Theoretical; Natural Language Processing

Overview and utilization of the NCI thesaurus.

Fragoso, Gilberto; de Coronado, Sherri; Haber, Margaret; Hartel, Frank; Wright, Larry.

Comp Funct Genomics ; 5(8): 648-54, 2004.

Article in English | MEDLINE | ID: mdl-18629178

ABSTRACT

The NCI Thesaurus is a reference terminology covering areas of basic and clinical science, built with the goal of facilitating translational research in cancer. It contains nearly 110,000 terms in approximately 36,000 concepts, partitioned in 20 subdomains, which include diseases, drugs, anatomy, genes, gene products, techniques, and biological processes, among others, all with a cancer-centric focus in content, and originally designed to support coding activities across the National Cancer Institute. Each concept represents a unit of meaning and contains a number of annotations, such as synonyms and preferred name, as well as annotations such as textual definitions and optional references to external authorities. In addition, concepts are modelled with description logic (DL) and defined by their relationships to other concepts; there are currently approximately 90 types of named relations declared in the terminology. The NCI Thesaurus is produced by the Enterprise Vocabulary Services project, a collaborative effort between the NCI Center for Bioinformatics and the NCI Office of Communications, and is part of the caCORE infrastructure stack (http://ncicb.nci.nih.gov/NCICB/core). It can be accessed programmatically through the open caBIO API and browsed via the web (http://nciterms.nci.nih.gov). A history of editing changes is also accessible through the API. In addition, the Thesaurus is available for download in various file formats, including OWL, the web ontology language, to facilitate its utilization by others.

caCORE: a common infrastructure for cancer informatics.

Covitz, Peter A; Hartel, Frank; Schaefer, Carl; De Coronado, Sherri; Fragoso, Gilberto; Sahni, Himanso; Gustafson, Scott; Buetow, Kenneth H.

Bioinformatics ; 19(18): 2404-12, 2003 Dec 12.

Article in English | MEDLINE | ID: mdl-14668224

ABSTRACT

MOTIVATION: Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets require annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications.
RESULTS: We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources.
AVAILABILITY: caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads.
SUPPLEMENTARY INFORMATION: http://ncicb.nci.nih.gov/core/publications contains links to the caBIO 1.0 class diagram and the caCORE 1.0 Technical Guide, which provide detailed information on the present caCORE architecture, data sources and APIs. Updated information appears on a regular basis on the caCORE web site (http://ncicb.nci.nih.gov/core).

Subject(s)

Databases, Factual/standards; Information Storage and Retrieval/methods; Information Storage and Retrieval/standards; Natural Language Processing; Neoplasms/classification; User-Computer Interface; Animals; Computational Biology/methods; Computational Biology/standards; Dictionaries, Medical as Topic; Humans; Internet; National Institutes of Health (U.S.); United States; Vocabulary, Controlled
