ABSTRACT
In this review, we summarise recent progress in ontology mapping (OM) at a crucial time, when biomedical research faces a deluge of data of increasing volume and variety. This is particularly important for realising the full potential of semantically enabled or enriched applications and for deriving meaningful insights, for example in drug discovery, using machine-learning technologies. We discuss challenges and solutions for better ontology mappings, as well as how to select ontologies before their application. In addition, we describe tools and algorithms for ontology mapping, including evaluation of tool capability and quality of mappings. Finally, we outline the requirements for an ontology mapping service (OMS) and the progress being made towards implementation of such sustainable services.
Subjects
Biological Ontologies, Drug Discovery/methods, Machine Learning, Semantics, Algorithms, Humans
ABSTRACT
Research in the life sciences requires ready access to primary data, derived information and relevant knowledge from a multitude of sources. Integration and interoperability of such resources are crucial for sharing content across research domains relevant to the life sciences. In this article we present a perspective review of data integration, with emphasis on a semantics-driven approach that pushes content into a shared infrastructure, reduces data redundancy and clarifies inconsistencies. This enables much improved access to life science data from numerous primary sources. The Semantic Enrichment of the Scientific Literature (SESL) pilot project demonstrates the feasibility of using already available open semantic web standards and technologies to integrate public and proprietary data resources that span structured and unstructured content. This has been accomplished through a precompetitive consortium, which provides a cost-effective way for numerous stakeholders to work together to solve common problems.
Subjects
Data Collection, Information Dissemination, Information Storage and Retrieval, Systems Integration, Biological Science Disciplines, Humans, Internet
ABSTRACT
The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends on integrating heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.