Search | Portal Regional da BVS

Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection.

Silva, Rildo Pinto da; Pollettini, Juliana Tarossi; Pazin Filho, Antonio.

Cad Saude Publica ; 39(11): e00243722, 2023.

Article in Portuguese, English | MEDLINE | ID: mdl-38055548

ABSTRACT

Patients with post-COVID-19 syndrome benefit from health promotion programs, and identifying them quickly is important for using these programs cost-effectively. Traditional identification techniques perform poorly, especially during pandemics. A descriptive observational study was carried out on 105,008 prior authorizations paid by a private health insurer, applying an unsupervised natural language processing method (topic modeling) to identify patients suspected of being infected with COVID-19. Six models were generated: three using the BERTopic algorithm and three using Word2Vec. The BERTopic model creates disease groups automatically; in the Word2Vec models, manual review of the first 100 cases of each topic was needed to determine which topics were related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic and no word preprocessing selected more severe patients: an average cost of BRL 10,206 per paid prior authorization and a total expenditure of BRL 20.3 million (5.4%) across 1,987 prior authorizations (1.9%). Its accuracy was 70% relative to human review, and 20% of its cases were of potential interest, all of which could be reviewed for inclusion in a health promotion program. Compared with the traditional search method using structured language, it missed a substantial number of cases, and it also identified other disease groups: orthopedic, mental health, and cancer. The BERTopic model served as an exploratory method for labeling cases for later use in supervised models. The automatic identification of other diseases raises ethical questions about how machine learning handles health information.
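The cost figures reported for the selected BERTopic model can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the abstract; the helper names are hypothetical, and the implied total expenditure is approximate because the 5.4% share is itself rounded.

```python
# Cross-check of the figures reported for the BERTopic model
# (more than 1,000 authorizations per topic, no word preprocessing).

TOTAL_AUTHORIZATIONS = 105_008   # paid prior authorizations in the study
SELECTED_AUTHORIZATIONS = 1_987  # authorizations selected by the model
MEAN_COST_BRL = 10_206           # average cost per selected authorization
REPORTED_SHARE_OF_SPEND = 0.054  # 5.4% of total expenditure (rounded in the source)


def selected_spend_brl(count: int, mean_cost: float) -> float:
    """Total spend of the selected group = number of authorizations x mean cost."""
    return count * mean_cost


def share_of_authorizations(selected: int, total: int) -> float:
    """Fraction of all authorizations that the model selected."""
    return selected / total


def implied_total_spend_brl(selected_spend: float, share: float) -> float:
    """Total spend implied by the selected spend and its reported share.

    Approximate only, because the reported share is rounded.
    """
    return selected_spend / share


if __name__ == "__main__":
    spend = selected_spend_brl(SELECTED_AUTHORIZATIONS, MEAN_COST_BRL)
    share = share_of_authorizations(SELECTED_AUTHORIZATIONS, TOTAL_AUTHORIZATIONS)
    total = implied_total_spend_brl(spend, REPORTED_SHARE_OF_SPEND)
    print(f"selected spend: BRL {spend / 1e6:.1f} million")
    print(f"share of authorizations: {100 * share:.1f}%")
    print(f"implied total spend: about BRL {total / 1e6:.0f} million")
```

The first two printed values round to BRL 20.3 million and 1.9%, consistent with the abstract.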


Subjects

COVID-19, Humans, Natural Language Processing, Post-Acute COVID-19 Syndrome, Brazil/epidemiology, Machine Learning

Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection

Silva, Rildo Pinto da; Pollettini, Juliana Tarossi; Pazin Filho, Antonio.

Cad. Saúde Pública (Online) ; 39(11): e00243722, 2023. tables, graphs

Article in Portuguese | LILACS-Express | LILACS | ID: biblio-1550174


A Health Surveillance Software Framework to deliver information on preventive healthcare strategies.

Macedo, Alessandra Alaniz; Pollettini, Juliana Tarossi; Baranauskas, José Augusto; Chaves, Julia Carmona Almeida.

J Biomed Inform ; 62: 159-70, 2016 Aug.

Article in English | MEDLINE | ID: mdl-27318270

ABSTRACT

A software framework can reduce application development costs because it allows developers to reuse both design and code. Companies and research groups have recently reported using health software frameworks. This paper presents the design, proof-of-concept implementations, and experimental evaluation of the Health Surveillance Software Framework (HSSF). HSSF addresses the demand for recommending surveillance information to support preventive healthcare strategies. Examples of such strategies are the automatic recommendation of surveillance levels for patients in need of healthcare and the automatic recommendation of scientific literature that elucidates epigenetic problems related to patients. HSSF was derived from two health surveillance systems developed in our previous work: Automatic-SL and CISS. The Automatic-SL system assists healthcare professionals in making decisions and in identifying children with developmental problems. The CISS service associates genetic and epigenetic risk factors for chronic diseases with patients' clinical records. To evaluate HSSF, two new systems, CISS+ and CISS-SW, were created by abstracting and instantiating the framework (design and code). We show that HSSF supported the development of both new systems: each recommends scientific papers using medical records as queries, even though they rely on different computational technologies. In an experiment with simulated patient medical records, the CISS, CISS+, and CISS-SW systems recommended more closely related and somewhat related documents than Google, Google Scholar, and PubMed. Of the recall and precision measures considered, CISS+ surpassed CISS-SW in precision.
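The comparison between CISS+ and CISS-SW rests on precision and recall. As a minimal sketch, assuming the standard set-based definitions and a binary relevant/not-relevant judgment per document (how the paper maps its "closely related" and "somewhat related" categories onto relevance is not stated here), the two measures can be computed per query as follows. The document IDs are hypothetical.

```python
from typing import Iterable, Set, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Empty sets yield 0.0 instead of raising ZeroDivisionError.
    """
    retrieved_set: Set[str] = set(retrieved)
    relevant_set: Set[str] = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical relevance judgments and result lists (not data from the paper).
    relevant = {"d1", "d2", "d3", "d4"}
    system_a = ["d1", "d2", "d5"]                    # short list, mostly relevant
    system_b = ["d1", "d2", "d3", "d6", "d7", "d8"]  # longer list, more noise
    print(precision_recall(system_a, relevant))
    print(precision_recall(system_b, relevant))
```

In this toy example the first system has higher precision (2/3 vs. 1/2) while the second has higher recall (3/4 vs. 1/2), which is why a system can lead on one measure and not the other.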

Subjects

Computer Systems, Health Status, Population Surveillance, Software, Child, Chronic Disease, Diagnosis, Humans, Medical Records

Surveillance for the prevention of chronic diseases through information association.

Pollettini, Juliana Tarossi; Baranauskas, José Augusto; Ruiz, Evandro Seron; da Graça Pimentel, Maria; Macedo, Alessandra Alaniz.

BMC Med Genomics ; 7: 7, 2014 Jan 30.

Article in English | MEDLINE | ID: mdl-24479447

ABSTRACT

BACKGROUND: Research in genomic medicine suggests that patients' exposure to early-life risk factors may induce the development of chronic diseases in adulthood, because premature risk factors can influence gene expression. The large number of scientific papers published in this area makes it difficult for healthcare professionals to keep up with individual results and to establish associations between them. Our aim was therefore to build a computational system offering an innovative approach that alerts health professionals to human development problems such as cardiovascular disease, obesity, and type 2 diabetes. METHODS: We built a computational system called the Chronic Illness Surveillance System (CISS), which retrieves scientific studies that establish associations (conceptual relationships) between chronic diseases (cardiovascular diseases, diabetes, and obesity) and the risk factors described in clinical records. To evaluate our approach, we submitted ten queries to CISS and to three other search engines (Google™, Google Scholar™, and PubMed®); the queries were composed of terms and expressions from a list of risk factors provided by specialists. RESULTS: Compared with the three other systems, CISS retrieved more closely related (+) and somewhat related (+/-) documents and fewer unrelated (-) and almost unrelated (-/+) documents. Friedman's test with the post-hoc Holm procedure (95% confidence), using our system as the control against each of the three other engines, indicated that our system performed best in three categories: (+), (-), and (+/-). This is an important result, since these are the most relevant categories for our users. CONCLUSION: Our system should help researchers and health professionals find relationships between potential risk factors and chronic diseases in scientific papers.
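The post-hoc Holm procedure controls the family-wise error rate when one control system (here, CISS) is compared against several others. The sketch below implements only the Holm step-down rule at alpha = 0.05, matching the 95% confidence level; the Friedman test that would produce the p-values is not reproduced, and the engine labels and p-values in the example are hypothetical.

```python
from typing import Dict


def holm(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm step-down procedure.

    Returns, for each hypothesis name, whether it is rejected while keeping
    the family-wise error rate at ``alpha``.
    """
    m = len(p_values)
    # Order hypotheses from smallest to largest p-value.
    ordered = sorted(p_values, key=lambda name: p_values[name])
    rejected = {name: False for name in p_values}
    for i, name in enumerate(ordered):
        # The (i+1)-th smallest p-value is compared with alpha / (m - i).
        if p_values[name] <= alpha / (m - i):
            rejected[name] = True
        else:
            # Stop at the first failure: this and all larger p-values are retained.
            break
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for "control vs. engine" comparisons (not from the paper).
    example = {"engine_A": 0.004, "engine_B": 0.03, "engine_C": 0.04}
    print(holm(example, alpha=0.05))
```

Here engine_A is rejected (0.004 <= 0.05/3), engine_B is not (0.03 > 0.05/2), and engine_C is retained even though 0.04 < 0.05, because the step-down procedure stops at the first non-rejected hypothesis.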

Subjects

Chronic Disease/epidemiology, Chronic Disease/prevention & control, Epidemiological Monitoring, Genetic Predisposition to Disease, Humans, Language, Risk Factors, Search Engine
