Results 1 - 5 of 5
1.
J Biomed Semantics ; 2 Suppl 5: S8, 2011 Oct 06.
Article in English | MEDLINE | ID: mdl-22166355

ABSTRACT

BACKGROUND: The treatment of negation and hedging in natural language processing has received much interest recently, especially in the biomedical domain. However, open-access corpora annotated for negation and/or speculation are scarce for training and testing applications, and those that exist sometimes follow different design principles. In this paper, the annotation principles of the two largest corpora containing annotation for negation and speculation, BioScope and Genia Event, are compared. BioScope marks linguistic cues and their scopes for negation and hedging, while in Genia biological events are marked for uncertainty and/or negation. RESULTS: Differences between the annotations of the two corpora are thematically categorized and the frequency of each category is estimated. We found that most differences arise because scopes, which cover text spans, are tied to key events, so each argument of such an event (including events within events) falls under the scope as well. In contrast, Genia treats the modality of events within events independently. CONCLUSIONS: The analysis of multiple layers of annotation (linguistic scopes and biological events) showed that the detection of negation/hedge keywords and their scopes can contribute to determining the modality of key events (denoted by the main predicate). On the other hand, to detect the negation and speculation status of events within events, additional syntax-based rules that examine the dependency path between the modality cue and the event cue have to be employed.
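
The contrast between scope-based and event-based annotation can be illustrated with a minimal sketch. It assumes token-index spans for a cue's scope and for event triggers; the names `Scope`, `Event`, and `events_in_scope` are hypothetical and belong to neither BioScope nor Genia Event, and the sentence is invented. The sketch implements only the scope-containment heuristic, under which every event whose trigger lies inside the scope inherits the modality, nested events included. The dependency-path rules mentioned in the conclusions are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    cue: str      # negation or hedge keyword, e.g. "not" or "may"
    kind: str     # "negation" or "speculation"
    start: int    # first token index covered by the scope (inclusive)
    end: int      # last token index covered by the scope (inclusive)


@dataclass(frozen=True)
class Event:
    trigger: str  # event cue word
    index: int    # token index of the trigger
    nested: bool  # True if the event is an argument of another event


def events_in_scope(scope, events):
    """Return the events whose trigger token lies inside the scope span."""
    return [e for e in events if scope.start <= e.index <= scope.end]


# Invented example sentence; token indices run from 0 to 7.
tokens = "IL-2 does not induce expression of the gene".split()
scope = Scope(cue="not", kind="negation", start=2, end=7)
events = [
    Event(trigger="induce", index=3, nested=False),      # key event
    Event(trigger="expression", index=4, nested=True),   # event within event
]

for e in events_in_scope(scope, events):
    role = "nested" if e.nested else "key"
    print(f"{e.trigger}: {scope.kind} ({role} event)")
```

Under this heuristic both events are marked as negated. An event-level annotation in the style described for Genia would judge the nested event separately, which is where the two corpora can diverge.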

2.
J Am Med Inform Assoc ; 16(4): 601-5, 2009.
Article in English | MEDLINE | ID: mdl-19390097

ABSTRACT

OBJECTIVE: In this study the authors describe the system submitted by the team of the University of Szeged to the second i2b2 Challenge in Natural Language Processing for Clinical Data. The challenge focused on the development of automatic systems that analyzed clinical discharge summary texts and addressed the following question: "Who's obese and what co-morbidities do they (definitely/most likely) have?". Target diseases included obesity and its 15 most frequent comorbidities exhibited by patients, while the target labels corresponded to expert judgments based on textual evidence and intuition (separately). DESIGN: The authors applied statistical methods to preselect the most common and confident terms and evaluated outlier documents by hand to discover infrequent spelling variants. The authors expected a system with dictionaries gathered semi-automatically to perform well at moderate development cost (the authors examined just a small proportion of the records manually). MEASUREMENTS: Following the standard evaluation method of the second Workshop on Challenges in Natural Language Processing for Clinical Data, the authors used both macro- and microaveraged F(beta=1) measures for evaluation. RESULTS: The authors' submission achieved a microaverage F(beta=1) score of 97.29% for classification based on textual evidence (macroaverage F(beta=1) = 76.22%) and 96.42% for intuitive judgments (macroaverage F(beta=1) = 67.27%). CONCLUSIONS: The results demonstrate the feasibility of the authors' approach and show that even very simple systems with a shallow linguistic analysis can achieve remarkable accuracy scores for classifying clinical records on a limited set of concepts.
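
The gap between the micro- and macroaveraged scores reported above follows from how the two averages are defined. The sketch below uses the generic textbook definitions, not the challenge's official scorer; the function names and the counts are invented for illustration. Microaveraging pools true positives, false positives, and false negatives over all classes, so frequent classes dominate, while macroaveraging gives each class equal weight, so one rare, poorly recognized class pulls the score down.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-measure from raw counts; returns 0.0 when it is undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def micro_macro_f(counts, beta=1.0):
    """counts maps each class label to a (tp, fp, fn) tuple."""
    # Micro: pool the counts over all classes, then compute one score.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f_beta(tp, fp, fn, beta)
    # Macro: compute one score per class, then take the unweighted mean.
    macro = sum(f_beta(*c, beta=beta) for c in counts.values()) / len(counts)
    return micro, macro


# Invented counts: one frequent class that is handled well and one
# rare class that is handled poorly.
counts = {
    "present": (90, 5, 5),
    "questionable": (1, 3, 4),
}
micro, macro = micro_macro_f(counts)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

With these counts the microaverage stays above 0.9 while the macroaverage drops below 0.6, the same qualitative pattern as in the reported results.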


Subjects
Information Storage and Retrieval/methods , Computerized Medical Records Systems , Natural Language Processing , Obesity , Comorbidity , Humans , Patient Discharge , Statistics as Topic
3.
BMC Bioinformatics ; 9 Suppl 11: S9, 2008 Nov 19.
Article in English | MEDLINE | ID: mdl-19025695

ABSTRACT

BACKGROUND: Detecting uncertain and negative assertions is essential in most biomedical text mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus). RESULTS: The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief linguist, who was also responsible for setting up the annotation guidelines and who resolved cases where the annotators disagreed. The resulting corpus consists of more than 20,000 sentences that were considered for annotation, and over 10% of them actually contain one (or more) linguistic annotations suggesting negation or uncertainty. CONCLUSION: Statistics are reported on corpus size, ambiguity levels and the consistency of annotations. The corpus is accessible for academic purposes and is free of charge. Apart from the intended goal of serving as a common resource for training, testing and comparing biomedical Natural Language Processing systems, the corpus is also a good resource for the linguistic analysis of scientific and clinical texts.


Subjects
Abstracting and Indexing/methods , Bibliographic Databases , Information Storage and Retrieval/methods , Artificial Intelligence , Database Management Systems , Natural Language Processing , Controlled Vocabulary
4.
BMC Bioinformatics ; 9 Suppl 3: S10, 2008 Apr 11.
Article in English | MEDLINE | ID: mdl-18426545

ABSTRACT

BACKGROUND: In this paper we focus on the problem of automatically constructing ICD-9-CM coding systems for radiology reports. ICD-9-CM codes are used for billing purposes by health institutes and are assigned to clinical records manually following clinical treatment. Since this labeling task requires expert knowledge in the field of medicine, the process itself is costly and prone to errors, as human annotators have to consider thousands of possible codes when assigning the right ICD-9-CM labels to a document. In this study we use the datasets made available for training and testing automated ICD-9-CM coding systems by the organisers of an International Challenge on Classifying Clinical Free Text Using Natural Language Processing in spring 2007. The challenge itself was dominated by entirely or partly rule-based systems that solve the coding task using a set of hand-crafted expert rules. Since the feasibility of constructing such systems for thousands of ICD codes is questionable, we decided to examine the problem of automatically constructing rule sets similar to those that achieved remarkable accuracy in the shared task challenge. RESULTS: Our results are very promising in the sense that we managed to achieve results comparable with purely hand-crafted ICD-9-CM classifiers. Our best model achieved a 90.26% F measure on the training dataset and an 88.93% F measure on the challenge test dataset, using the micro-averaged F(beta=1) measure, the official evaluation metric of the International Challenge on Classifying Clinical Free Text Using Natural Language Processing. This result would have placed second in the challenge, with a hand-crafted system achieving slightly better results. CONCLUSIONS: Our results demonstrate that hand-crafted systems, which proved to be successful in ICD-9-CM coding, can be reproduced by replacing several laborious steps in their construction with machine learning models. These hybrid systems preserve the favourable aspects of rule-based classifiers, such as good performance, while their development is rapid and requires less human effort. Hence the construction of such hybrid systems can be feasible for a set of labels an order of magnitude larger, and with more labeled data.
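
The idea of deriving coding rules from labeled reports rather than writing them by hand can be sketched as follows. This is not the authors' method: it swaps in a deliberately simple technique that keeps a token as a rule for a code when the token is frequent enough and almost always co-occurs with that code. The function names, the thresholds, the example reports, and the code strings `CODE_COUGH` and `CODE_FEVER` are all hypothetical placeholders, not real ICD-9-CM codes.

```python
from collections import Counter, defaultdict


def induce_rules(reports, min_support=2, min_precision=0.9):
    """Learn token -> code rules from (text, set_of_codes) pairs.

    A token becomes a rule for a code when it occurs in at least
    min_support reports and at least min_precision of those reports
    carry the code.
    """
    token_count = Counter()
    token_code_count = defaultdict(Counter)
    for text, codes in reports:
        for token in set(text.lower().split()):
            token_count[token] += 1
            for code in codes:
                token_code_count[token][code] += 1

    rules = defaultdict(set)
    for token, n in token_count.items():
        if n < min_support:
            continue
        for code, k in token_code_count[token].items():
            if k / n >= min_precision:
                rules[token].add(code)
    return dict(rules)


def assign_codes(text, rules):
    """Apply the induced rules: union of codes triggered by the tokens."""
    codes = set()
    for token in set(text.lower().split()):
        codes |= rules.get(token, set())
    return codes


# Invented training reports with placeholder code labels.
training = [
    ("persistent cough and fever", {"CODE_COUGH", "CODE_FEVER"}),
    ("chronic cough", {"CODE_COUGH"}),
    ("fever for two days", {"CODE_FEVER"}),
    ("mild cough", {"CODE_COUGH"}),
]
rules = induce_rules(training)
print(rules)
print(assign_codes("cough with fever", rules))
```

A keyword rule of this kind ignores negation and context, which a practical coding system would have to handle.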


Subjects
Algorithms , Artificial Intelligence , Clinical Decision Support Systems , International Classification of Diseases , Natural Language Processing , Automated Pattern Recognition/methods , Radiology/methods , Terminology as Topic , Controlled Vocabulary
5.
J Am Med Inform Assoc ; 14(5): 574-80, 2007.
Article in English | MEDLINE | ID: mdl-17823086

ABSTRACT

OBJECTIVE: The anonymization of medical records is of great importance in the human life sciences because a de-identified text can be made publicly available for non-hospital researchers as well, to facilitate research on human diseases. Here the authors have developed a de-identification model that can successfully remove personal health information (PHI) from discharge records to make them conform to the guidelines of the Health Insurance Portability and Accountability Act. DESIGN: We introduce here a novel, machine learning-based iterative Named Entity Recognition approach intended for use on semi-structured documents like discharge records. Our method identifies PHI in several steps. First, it labels all entities whose tags can be inferred from the structure of the text; it then uses this information to find further PHI phrases in the free-text parts of the document. MEASUREMENTS: Following the standard evaluation method of the first Workshop on Challenges in Natural Language Processing for Clinical Data, we used token-level Precision, Recall and F(beta=1) measure metrics for evaluation. RESULTS: Our system achieved outstanding accuracy on the standard evaluation dataset of the de-identification challenge, with an F measure of 99.7534% for the best submitted model. CONCLUSION: Our system is competitive with the current state-of-the-art solutions, and we describe several techniques that can be beneficial in other tasks that need to handle structured documents such as clinical records.
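
The two-step idea in the DESIGN section, where structure is used first and the findings are then carried into the running text, can be sketched as below. The sketch is a rule-based stand-in and not the machine-learning model the abstract describes. The field names `Patient` and `Doctor`, the `[PHI]` placeholder, and the sample record are all invented for illustration.

```python
import re

# Hypothetical header fields whose values are assumed to be PHI.
FIELD_PATTERN = re.compile(r"^(Patient|Doctor):\s*(.+)$", re.MULTILINE)


def find_structured_phi(record):
    """Step 1: collect PHI strings from 'Field: value' header lines."""
    return {m.group(2).strip() for m in FIELD_PATTERN.finditer(record)}


def deidentify(record, placeholder="[PHI]"):
    """Step 2: mask every occurrence of the collected strings,
    including mentions in the free-text part of the record."""
    phi = find_structured_phi(record)
    # Longest first, so a longer string is masked before any substring of it.
    for value in sorted(phi, key=len, reverse=True):
        record = record.replace(value, placeholder)
    return record


# Invented sample record.
record = (
    "Patient: John Smith\n"
    "Doctor: Jane Doe\n"
    "History: John Smith was seen by Jane Doe for chest pain.\n"
)
print(deidentify(record))
```

Exact string matching misses partial mentions such as a surname on its own, which is one reason a learned model is used for the free-text part.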


Subjects
Artificial Intelligence , Confidentiality , Computerized Medical Records Systems , Evaluation Studies as Topic , Humans