Identifying and extracting patient smoking status information from clinical narrative texts in Spanish.

Figueroa, Rosa L; Soto, Diego A; Pino, Esteban J.

Annu Int Conf IEEE Eng Med Biol Soc; 2014: 2710-3, 2014.

Article in English | MEDLINE | ID: mdl-25570550

ABSTRACT

In this work we present a system that identifies and extracts patients' smoking status from clinical narrative text in Spanish. The clinical narrative text was processed using natural language processing techniques and annotated by four people with a biomedical background. The dataset used for classification contained 2,465 documents, each annotated with one of four smoking status categories. We used two feature representations: single-word tokens and bigrams. The classification problem was divided into two levels: the first distinguishes smokers (S) from non-smokers (NS), and the second distinguishes current smokers (CS) from past smokers (PS). For each feature representation and classification level, we used two classifiers: Support Vector Machines (SVM) and Bayesian Networks (BN). We split the dataset into a training set containing 66% of the available documents, used to build the classifiers, and a test set containing the remaining 34%, used to evaluate the models. Our results show that SVM with the bigram representation performed best at both classification levels. For the S vs. NS level, the performance measures were accuracy (ACC) = 85%, precision = 85%, and recall = 90%. For the CS vs. PS level, they were ACC = 87%, precision = 91%, and recall = 94%.
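The pipeline described in the abstract (bigram features, a 66%/34% split, two classification levels, and accuracy/precision/recall) can be sketched as follows. This is only an illustrative sketch using the Python standard library, not the authors' implementation: the function names, the whitespace tokenizer, the label strings "CS", "PS" and "NS", and the assumption that S covers both CS and PS are all hypothetical. The abstract does not name the fourth category, so it is left out, and the classifiers themselves (SVM, BN) are omitted.

```python
import random
from collections import Counter


def bigram_features(text):
    """Count adjacent word pairs in a lowercased, whitespace-tokenized note."""
    tokens = text.lower().split()
    return Counter(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))


def split_dataset(documents, train_fraction=0.66, seed=0):
    """Shuffle a copy of the documents and cut it into train and test parts."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def level1_label(status):
    """Level 1 target: map CS and PS to S (an assumption), keep NS as NS."""
    if status in ("CS", "PS"):
        return "S"
    if status == "NS":
        return "NS"
    # The fourth category is not named in the abstract, so it is not handled.
    raise ValueError(f"unhandled smoking status: {status!r}")


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision and recall with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

In such a setup, the level 2 classifier (CS vs. PS) would be trained and evaluated only on documents whose level 1 label is S, and `binary_metrics` would be called once per level on the held-out test part.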

Subjects

Databases, Factual; Electronic Health Records/classification; Natural Language Processing; Smoking; Bayes Theorem; Chile; Humans; Narration; Support Vector Machine
