Results 1 - 6 of 6
1.
J Biomed Inform ; 116: 103716, 2021 04.
Article in English | MEDLINE | ID: mdl-33647519

ABSTRACT

Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems' outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs.


Subjects
Knowledge Discovery , Telemedicine , Algorithms , Humans , Language , Machine Learning , Natural Language Processing
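
The ensembling idea in the abstract above (combine several automatically annotated versions of a corpus, choosing the configuration that maximizes fitness against a reference collection) can be illustrated with a minimal sketch. The abstract does not specify the configuration space or the fitness function, so this example assumes weighted voting with an acceptance threshold, an exhaustive grid search, and F1 as the fitness measure. All names (`f1`, `ensemble`, `search`) and the annotation tuple layout are hypothetical.

```python
from itertools import product

# Assumption: an annotation is a hashable tuple such as
# (sentence_id, start, end, label). The layout is illustrative only.

def f1(predicted, reference):
    """F1 of a predicted annotation set against a reference set."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)

def ensemble(runs, weights, threshold):
    """Keep every annotation whose weighted vote total reaches the threshold."""
    votes = {}
    for run, weight in zip(runs, weights):
        for annotation in run:
            votes[annotation] = votes.get(annotation, 0.0) + weight
    return {a for a, total in votes.items() if total >= threshold}

def search(runs, reference, weight_options=(0.0, 1.0), thresholds=(1, 2, 3)):
    """Grid-search weights and threshold; return (best_f1, weights, threshold)."""
    best = (-1.0, None, None)
    for weights in product(weight_options, repeat=len(runs)):
        for threshold in thresholds:
            score = f1(ensemble(runs, weights, threshold), reference)
            if score > best[0]:
                best = (score, weights, threshold)
    return best

# Toy data: three system outputs and a small reference collection.
a, b, c = (0, 0, 5, "Concept"), (0, 6, 9, "Action"), (1, 0, 4, "Concept")
x, y = (1, 5, 8, "Action"), (2, 0, 3, "Concept")
reference = {a, b, c}
runs = [{a, b, x}, {a, c}, {b, c, y}]

best_score, best_weights, best_threshold = search(runs, reference)
print(best_score, best_weights, best_threshold)
```

In this toy case, majority voting over all three runs recovers the reference exactly, because each spurious annotation is proposed by only one system. A realistic configuration space would likely be too large for exhaustive search and would need a heuristic or sampled search instead.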
2.
Subj. procesos cogn ; 14(2): 113-126, Dec. 2010. tab
Article in Spanish | BINACIS | ID: bin-125394

ABSTRACT

This article presents a preliminary study of the phenomena present in Web 2.0, specifically in blogs, and of how they are reflected in the corresponding generated summaries. The main objective is to quantify the extent to which these phenomena occur in both the blogs and the summaries. Their presence in the summaries directly lowers summary quality in terms of criteria such as grammatical correctness and coherence. The preliminary results show that the new text genres derived from Web 2.0 contain a large number of characteristic linguistic features that must be handled with appropriate methods and tools so that they do not propagate to other Natural Language Processing tasks, in this study specifically to text summaries. In addition, possible solutions to the problem are proposed, so that the quality of the summaries is not affected by the presence of these phenomena.


Subjects
Psychology , Information Science , Blogging , Internet , Abstracts , Natural Language Processing
3.
Subj. procesos cogn ; 14(2): 113-126, Dec. 2010. tab
Article in Spanish | LILACS | ID: lil-576378

ABSTRACT

This article presents a preliminary study of the phenomena present in Web 2.0, specifically in blogs, and of how they are reflected in the corresponding generated summaries. The main objective is to quantify the extent to which these phenomena occur in both the blogs and the summaries. Their presence in the summaries directly lowers summary quality in terms of criteria such as grammatical correctness and coherence. The preliminary results show that the new text genres derived from Web 2.0 contain a large number of characteristic linguistic features that must be handled with appropriate methods and tools so that they do not propagate to other Natural Language Processing tasks, in this study specifically to text summaries. In addition, possible solutions to the problem are proposed, so that the quality of the summaries is not affected by the presence of these phenomena.


Subjects
Blogging , Information Science , Internet , Natural Language Processing , Psychology , Abstracts
4.
Gac Sanit ; 22(5): 421-33, 2008.
Article in Spanish | MEDLINE | ID: mdl-19000523

ABSTRACT

OBJECTIVES: Ontologies are a resource that allows the concept of meaning to be represented informatically, thus avoiding the limitations imposed by standardized terms. The objective of this study was to establish the extent to which terminologies could be used for the design of ontologies, which could serve as an aid to resolve problems such as semantic interoperability and knowledge reusability in healthcare information systems. METHODS: To determine the extent to which terminologies could be used as ontologies, six of the most important terminologies in clinical, epidemiologic, documentation and administrative-economic contexts were analyzed. The following characteristics were verified: conceptual coverage, hierarchical structure, conceptual granularity of the categories, conceptual relations, and the language used for conceptual representation. RESULTS: MeSH, DeCS and UMLS ontologies were considered lightweight. The main differences among these ontologies concern conceptual specification, the types of relation and the restrictions among the associated concepts. SNOMED and GALEN ontologies have a declarative formalism based on logical descriptions. These ontologies include explicit qualities and show greater restrictions among associated concepts and rule combinations, and were consequently considered heavyweight. CONCLUSIONS: Analysis of the declared representation of the terminologies shows the extent to which they could be reused as ontologies. Their degree of usability depends on whether the aim is for healthcare information systems to solve problems of semantic interoperability (lightweight ontologies) or to reuse the systems' knowledge as an aid to decision making (heavyweight ontologies) and for non-structured information retrieval, extraction, and classification.


Subjects
Information Services , Integrated Advanced Information Management Systems , Medical Informatics Applications , Terminology as Topic , Humans , Medical Subject Headings , Semantics , Vocabulary, Controlled
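
The lightweight versus heavyweight distinction drawn in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the terms, roles, and restrictions below are toy data invented for the example, not content taken from MeSH, DeCS, UMLS, SNOMED, or GALEN, and the function names are hypothetical.

```python
# Lightweight ontology (assumed shape): terms arranged in an is-a hierarchy,
# with no formal constraints on how concepts may be combined.
PARENT = {
    "viral pneumonia": "pneumonia",
    "pneumonia": "lung disease",
    "lung disease": "disease",
}

def ancestors(term):
    """Walk the is-a hierarchy upward and return all broader terms."""
    broader = []
    while term in PARENT:
        term = PARENT[term]
        broader.append(term)
    return broader

# Heavyweight ontology (assumed shape): concepts also carry explicit
# restrictions on which relations and values they may take.
ALLOWED = {
    "pneumonia": {
        "finding_site": {"lung"},
        "causative_agent": {"virus", "bacterium"},
    },
}

def is_valid(concept, relations):
    """Accept a description only if every relation satisfies the restrictions."""
    rules = ALLOWED.get(concept, {})
    return all(
        role in rules and value in rules[role]
        for role, value in relations.items()
    )

print(ancestors("viral pneumonia"))
print(is_valid("pneumonia", {"finding_site": "lung"}))
print(is_valid("pneumonia", {"finding_site": "liver"}))
```

The hierarchy alone supports tasks such as query expansion across systems, whereas the restriction check shows the kind of explicit constraint that lets a system reject an inconsistent description, which is closer to what decision support would need.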
5.
Gac. sanit. (Barc., Ed. impr.) ; 22(5): 421-433, Oct. 2008. ilus, tab
Article in Spanish | IBECS | ID: ibc-61226

ABSTRACT

Objectives: Ontologies are a resource that allows the concept of meaning to be represented informatically, thus avoiding the limitations imposed by standardized terms. The objective of this study was to establish the extent to which terminologies could be used for the design of ontologies, which could serve as an aid to resolve problems such as semantic interoperability and knowledge reusability in healthcare information systems. Methods: To determine the extent to which terminologies could be used as ontologies, six of the most important terminologies in clinical, epidemiologic, documentation and administrative-economic contexts were analyzed. The following characteristics were verified: conceptual coverage, hierarchical structure, conceptual granularity of the categories, conceptual relations, and the language used for conceptual representation. Results: MeSH, DeCS and UMLS ontologies were considered lightweight. The main differences among these ontologies concern conceptual specification, the types of relation and the restrictions among the associated concepts. SNOMED and GALEN ontologies have a declarative formalism based on logical descriptions. These ontologies include explicit qualities and show greater restrictions among associated concepts and rule combinations, and were consequently considered heavyweight. Conclusions: Analysis of the declared representation of the terminologies shows the extent to which they could be reused as ontologies. Their degree of usability depends on whether the aim is for healthcare information systems to solve problems of semantic interoperability (lightweight ontologies) or to reuse the systems' knowledge as an aid to decision making (heavyweight ontologies) and for non-structured information retrieval, extraction, and classification.


Subjects
Terminology as Topic , Computer Literacy , Medical Informatics/methods , Medical Informatics/standards , Medical Informatics Computing/standards , Medical Informatics Computing , Semantics , Information Services/organization & administration , Information Services , Medical Subject Headings , Systematized Nomenclature of Medicine
6.
Comput Biol Med ; 37(10): 1511-21, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17374369

ABSTRACT

In this paper, a restricted domain question answering (QA) system is described. The design architecture of this QA system and the features that allow the adaptation of the QA system to the medical domain are also presented. The advantages of this QA system include the simple process of defining the question taxonomy answered by the system as well as the possibility of locally or remotely managed document collections. The main computing methods of the QA system are based on the application of natural language processing (NLP) techniques to infer the logic forms and on the treatment of the logic forms. The knowledge of the system is acquired through the use of two different resources: Unified Medical Language System (UMLS) to handle the medical terminology and WordNet to manage the open-domain terminology.


Subjects
Knowledge Bases , Computational Biology , Computer Systems , Information Storage and Retrieval , Medical Informatics Applications , Natural Language Processing , Terminology as Topic , Unified Medical Language System