Results 1 - 20 of 46
1.
Educ. med. super ; 37(2)jun. 2023. ilus, tab
Artigo em Espanhol | LILACS, CUMED | ID: biblio-1528540

RESUMO

Introduction: The advances of some technologies and the obsolescence of others march at an unimaginable speed, especially in this twenty-first century. In the last months of 2022 and the first months of 2023, many questions and controversies arose in different fields around ChatGPT, an innovation that poses previously unimagined challenges for today's society, as well as new demands that will directly impact the training and performance of professors, students, health professionals, legal practitioners, politicians, computer scientists, librarians, scientists and any citizen. Objective: To identify some characteristics of ChatGPT and its possible impact on education. Authors' position: Assessments by specialists appear in news and reports; virtual meetings and presentations have been held; and several articles and videos on the topic are available, some of them even produced with the assistant itself. Given the novelty of the subject, its recent adoption as a tool for professional development, the interest shown in recent days by the community of Cuban medical sciences professors, and the fact that this tool is a product of artificial intelligence, it is worth asking: what is it, and what are its prospects? Conclusions: It is timely to approach the subject through the possibilities and challenges it opens up for education and learning, particularly medical teaching. (AU)


Assuntos
Humanos , Ensino/educação , Inteligência Artificial/história , Inteligência Artificial/tendências , Educação Médica/métodos , Educação Médica/tendências , Aprendizado de Máquina , Aprendizagem , Universidades , Processamento de Linguagem Natural , Comunicação não Verbal
2.
Chinese Journal of Biotechnology ; (12): 1815-1824, 2023.
Artigo em Chinês | WPRIM | ID: wpr-981172

RESUMO

Antimicrobial peptides (AMPs) are small peptides that are widely found in living organisms and exhibit broad-spectrum antibacterial activity and immunomodulatory effects. Because resistance to them emerges more slowly, and given their excellent clinical potential and wide range of applications, AMPs are a strong alternative to conventional antibiotics. AMP recognition is a significant direction in AMP research. Wet-lab methods are costly, inefficient and slow, so they cannot meet the need for large-scale AMP recognition; computer-aided identification methods are therefore an important complement, and a key issue is how to improve their accuracy. A protein sequence can be viewed as a language whose alphabet is the amino acids, so rich features can be extracted with natural language processing (NLP) techniques. In this paper, we combine the pre-trained model BERT with the fine-tuning structure Text-CNN from the NLP field to model the protein language, develop an open-source antimicrobial peptide recognition tool, and compare it with five other published tools. The experimental results show that the two-phase training approach brings an overall improvement in accuracy, sensitivity, specificity, and Matthews correlation coefficient, offering a novel approach for further research on AMP recognition.
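The premise that a protein sequence can be read as a language of amino acids can be illustrated with a minimal k-mer tokenizer. This is a sketch of the idea only, not the authors' BERT/Text-CNN pipeline, and the peptide sequence is hypothetical:

```python
from collections import Counter

def kmer_tokens(sequence: str, k: int = 3) -> list:
    """Split an amino-acid sequence into overlapping k-mers,
    analogous to tokenizing a sentence into words."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# A short hypothetical peptide sequence
seq = "GIGKFLHSAKKF"
tokens = kmer_tokens(seq, k=3)      # 10 overlapping 3-mers
vocab = Counter(tokens)             # token frequencies, the simplest "feature"
```

Pre-trained protein language models are typically trained over exactly this kind of tokenization before fine-tuning on a downstream classification task.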


Assuntos
Antibacterianos/química , Sequência de Aminoácidos , Peptídeos Catiônicos Antimicrobianos/química , Peptídeos Antimicrobianos , Processamento de Linguagem Natural
3.
Chinese Acupuncture & Moxibustion ; (12): 327-331, 2022.
Artigo em Chinês | WPRIM | ID: wpr-927383

RESUMO

The paper analyzes the specifics of term recognition in the acupuncture clinical literature and compares the advantages and disadvantages of three named entity recognition (NER) methods adopted in the field of traditional Chinese medicine. It argues that bi-directional long short-term memory networks with a conditional random field layer (BiLSTM-CRF) can capture context information and complete NER with fewer hand-crafted feature rules, making this model suitable for term recognition in the acupuncture clinical literature. Based on this model, it proposes that term recognition in the acupuncture clinical literature should proceed in four steps: literature preprocessing, sequence labeling, model training and effect evaluation. This provides an approach to structuring the terminology of the acupuncture clinical literature.
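The sequence-labeling step in such a pipeline is commonly expressed with BIO tags; a minimal sketch follows (the tokens and the ACUPOINT label are illustrative assumptions, not from the paper):

```python
def spans_to_bio(tokens: list, spans: list) -> list:
    """Convert entity spans (start token, end token exclusive, label)
    into one BIO tag per token: B- opens an entity, I- continues it,
    O marks tokens outside any entity."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

# Hypothetical sentence with one two-token acupoint mention
tokens = ["needling", "Zusanli", "ST36", "relieved", "pain"]
tags = spans_to_bio(tokens, [(1, 3, "ACUPOINT")])
# tags == ["O", "B-ACUPOINT", "I-ACUPOINT", "O", "O"]
```

A BiLSTM-CRF model is then trained to predict exactly this tag sequence from the token sequence.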


Assuntos
Terapia por Acupuntura , Registros Eletrônicos de Saúde , Processamento de Linguagem Natural
4.
Artigo em Espanhol | LILACS, CUMED | ID: biblio-1408108

RESUMO

The purpose of this article was to characterize the free text available in the electronic health records (EHRs) of an institution devoted to the care of pregnant patients. More than a data repository, the EHR has become a clinical decision support system. However, because of the high volume of information, and because part of the key information in EHRs is in free-text form, exploiting the full potential of EHR information to improve clinical decision-making requires the support of text mining and natural language processing (NLP) methods. In gynecology and obstetrics in particular, NLP methods could help speed up the identification of factors associated with maternal risk. Despite this, the literature records no work integrating NLP techniques into Spanish-language EHRs associated with maternal follow-up. To address this gap, a corpus was generated and characterized from the EHRs of a gynecology and obstetrics service that treats high-risk maternal patients. Text mining methods yielded 659 789 tokens and a dictionary of 7 334 unique words, and the most frequent words and n-grams were studied. A vector representation was built with a CBOW (continuous bag of words) neural network architecture for word embedding. Clustering algorithms provided evidence that words that are close in the 300-dimensional embedding space can represent associations between types of patients, or can group similar words together, including misspelled words. The corpus generated and the results found lay the foundations for future work on entity detection (symptoms, signs, diagnoses, treatments), spelling correction and semantic relationships between words, in order to generate summaries of medical records or assist maternal follow-up through automated review of the EHR. (AU)
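The token and n-gram frequency characterization described above can be sketched in a few lines of Python; the toy phrases below are invented, not drawn from the article's corpus:

```python
from collections import Counter

def ngrams(tokens: list, n: int) -> list:
    """Return the list of n-grams (as tuples) over one token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy clinical phrases (hypothetical, Spanish like the source corpus)
docs = [d.split() for d in ("paciente con embarazo de alto riesgo",
                            "embarazo de alto riesgo sin control prenatal")]

unigram_freq = Counter(tok for doc in docs for tok in doc)
bigram_freq = Counter(bg for doc in docs for bg in ngrams(doc, 2))
# ("alto", "riesgo") occurs in both phrases, so its count is 2
```

Counting n-grams per document, rather than over the concatenated corpus, avoids creating spurious n-grams that cross document boundaries.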


Assuntos
Humanos , Feminino , Gravidez , Processamento de Linguagem Natural , Registros Eletrônicos de Saúde
5.
Journal of Biomedical Engineering ; (6): 105-110, 2021.
Artigo em Chinês | WPRIM | ID: wpr-879255

RESUMO

Subject recruitment is a key component that affects the progress and results of clinical trials, and it is generally conducted with eligibility criteria (inclusion and exclusion criteria). Semantic category analysis of eligibility criteria can help optimize clinical trial design and build automated patient recruitment systems. This study explored the automatic classification of Chinese eligibility criteria into semantic categories through an academic shared task on artificial intelligence. We collected a total of 38 341 annotated eligibility criteria sentences and predefined 44 semantic categories. A total of 75 teams participated in the competition, and 27 teams submitted system outputs. Based on the results, we found that most teams adopted mixed models. The mainstream solution was to apply pre-trained language models, which provide rich semantic representations, combine them with neural network models, fine-tune the models for the classification task, and finally improve classification performance through model ensembling. The best-performing system achieved a macro


Assuntos
Humanos , Inteligência Artificial , China , Idioma , Processamento de Linguagem Natural , Redes Neurais de Computação
6.
Rev. méd. Chile ; 147(10): 1229-1238, oct. 2019. tab, graf
Artigo em Espanhol | LILACS | ID: biblio-1058589

RESUMO

Background: Free text poses a challenge in health data analysis, since the lack of structure makes information extraction and integration difficult, particularly for massive data. Appropriate machine interpretation of electronic health records in Chile could unlock the knowledge contained in large volumes of clinical text, expanding clinical management and national research capabilities. Aim: To illustrate the use of a weighted frequency algorithm to find keywords in the diagnostic suspicion field of the Chilean specialty consultation waiting list, for diseases not covered by the Chilean Explicit Health Guarantees plan. Material and Methods: The waiting lists for a first specialty consultation for the period 2008-2018 were obtained from 17 of the 29 Chilean health services, and a total of 2,592,925 diagnostic suspicions were identified. A natural language processing technique called Term Frequency-Inverse Document Frequency was used to retrieve diagnostic suspicion keywords. Results: For each specialty, the four keywords with the highest weighted frequency were determined. Word clouds showing words weighted by their importance were created as a visual representation; they are available at cimt.uchile.cl/lechile/. Conclusions: The algorithm made it possible to summarize unstructured clinical free-text data, improving its usefulness and accessibility.
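Term Frequency-Inverse Document Frequency can be computed in pure Python; this is a sketch with invented toy documents, not the waiting-list data:

```python
import math
from collections import Counter

def tf_idf(docs: list) -> list:
    """Score each term in each document by tf * idf,
    with idf = log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter()                      # in how many documents each term appears
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

# Toy diagnostic-suspicion phrases (hypothetical)
docs = [["sospecha", "de", "catarata"],
        ["sospecha", "de", "glaucoma"],
        ["control", "de", "glaucoma"]]
scores = tf_idf(docs)
# "de" appears in every document, so its idf (and hence its score) is 0
```

This is why the technique surfaces specialty-specific keywords: ubiquitous function words receive zero weight, while terms concentrated in few documents score highest.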


Assuntos
Humanos , Processamento de Linguagem Natural , Processamento Eletrônico de Dados/métodos , Prontuários Médicos , Armazenamento e Recuperação da Informação/métodos , Técnicas e Procedimentos Diagnósticos , Mineração de Dados/métodos , Encaminhamento e Consulta/estatística & dados numéricos , Fatores de Tempo , Computação em Informática Médica , Chile , Reprodutibilidade dos Testes , Medicina
7.
Healthcare Informatics Research ; : 99-105, 2019.
Artigo em Inglês | WPRIM | ID: wpr-740235

RESUMO

OBJECTIVES: This study analyzed health technology trends and user sentiments using Twitter data, in an attempt to examine the public's opinions and identify their needs. METHODS: Twitter data related to health technology, from January 2010 to October 2016, were collected. An ontology related to health technology was developed. Frequently occurring keywords were analyzed and visualized with the word cloud technique. The keywords were then reclassified and analyzed using the developed ontology and a sentiment dictionary. Python and R were used for crawling, natural language processing, and sentiment analysis. RESULTS: In the developed ontology, the keywords are divided into 'health technology' and 'health information'. Under health technology there are six subcategories: health technology, wearable technology, biotechnology, mobile health, medical technology, and telemedicine. Under health information there are four subcategories: health information, privacy, clinical informatics, and consumer health informatics. The number of tweets about health technology has consistently increased since 2010; the number of posts in 2014, about 150 thousand, was double that in 2010. Posts about mHealth accounted for the majority, and the dominant words were 'care', 'new', 'mental', and 'fitness'. Sentiment analysis by subcategory showed that most of the posts in nearly all subcategories had a positive tone. CONCLUSIONS: Interest in mHealth has risen recently, and consequently posts about mHealth were the most frequent. Examining social media users' responses to new health technology can be a useful method to understand trends in rapidly evolving fields.
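Dictionary-based sentiment scoring of posts, as used here, can be sketched as follows; the lexicon and posts are tiny invented examples (real sentiment dictionaries score thousands of words):

```python
# Hypothetical sentiment lexicon: word -> signed score
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def sentiment_score(text: str) -> int:
    """Sum lexicon scores over the tokens of a post;
    unknown words contribute 0."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

posts = ["Great new fitness tracker", "Terrible privacy policy"]
scores = [sentiment_score(p) for p in posts]
# scores == [2, -2]
```

Averaging such scores per subcategory gives the kind of per-topic positive/negative summary the study reports.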


Assuntos
Tecnologia Biomédica , Biotecnologia , Boidae , Mineração de Dados , Informática , Informática Médica , Métodos , Processamento de Linguagem Natural , Privacidade , Opinião Pública , Mídias Sociais , Telemedicina
8.
Healthcare Informatics Research ; : 305-312, 2019.
Artigo em Inglês | WPRIM | ID: wpr-763951

RESUMO

OBJECTIVES: Triage is a process to accurately assess and classify symptoms in order to identify patients and provide them rapid treatment. The Korean Triage and Acuity Scale (KTAS) is used as the triage instrument in all emergency centers. The aim of this study was to train and compare machine learning models to predict KTAS levels. METHODS: This was a cross-sectional study using data from a single emergency department of a tertiary university hospital. Information collected during triage was used in the analysis. Logistic regression, random forest, and XGBoost were used to predict the KTAS level. RESULTS: The models with the highest area under the receiver operating characteristic curve (AUROC) were the random forest and XGBoost models trained on the entire dataset (AUROC = 0.922, 95% confidence interval 0.917–0.925 and AUROC = 0.922, 95% confidence interval 0.918–0.925, respectively). The AUROC of the models trained on clinical data was higher than that of models trained on text data only, and the models trained on all variables had the highest AUROC among similar machine learning models. CONCLUSIONS: Machine learning can robustly predict the KTAS level at triage, which may have many practical uses, and the addition of text data improves predictive performance over structured data alone.
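The AUROC reported above measures how well a model ranks positives above negatives; a minimal sketch of its computation from predicted scores (toy labels and scores, not the study's data):

```python
def auroc(labels: list, scores: list) -> float:
    """Probability that a randomly chosen positive is scored above
    a randomly chosen negative (ties count one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2]
# of the 4 positive/negative pairs, 3 are ranked correctly -> 0.75
assert auroc(labels, scores) == 0.75
```

This pairwise formulation is equivalent to the area under the ROC curve, which is why AUROC is insensitive to any monotone rescaling of the model's scores.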


Assuntos
Humanos , Estudos Transversais , Conjunto de Dados , Emergências , Serviço Hospitalar de Emergência , Florestas , Modelos Logísticos , Aprendizado de Máquina , Processamento de Linguagem Natural , Curva ROC , Triagem
9.
Genomics & Informatics ; : e15-2019.
Artigo em Inglês | WPRIM | ID: wpr-763809

RESUMO

Automatically detecting mentions of pharmaceutical drugs and chemical substances is key for the subsequent extraction of relations between chemicals and other biomedical entities such as genes, proteins, diseases, adverse reactions or symptoms. The identification of drug mentions is also a prior step for complex event types such as drug dosage recognition, duration of medical treatments or drug repurposing. Formally, this task is known as named entity recognition (NER): automatically identifying mentions of predefined entities of interest in running text. In the domain of medical texts, techniques based on hand-crafted rules and graph-based models can provide adequate performance for chemical entity recognition (CER). In recent years, the field of natural language processing has largely pivoted to deep learning, and state-of-the-art results for most natural language tasks are usually obtained with artificial neural networks. Competitive resources for drug name recognition in English medical texts are already available and heavily used, while for other languages such as Spanish these tools, although clearly needed, were missing. In this work, we adapt an existing neural NER system, NeuroNER, to the particular domain of Spanish clinical case texts, and extend the neural network to take into account additional features beyond the plain text. NeuroNER can be considered a competitive baseline system for Spanish drug and chemical entity recognition, promoted by the Spanish national plan for the advancement of language technologies (Plan TL).


Assuntos
Reposicionamento de Medicamentos , Aprendizagem , Aprendizado de Máquina , Processamento de Linguagem Natural , Redes Neurais de Computação , Neurônios , Corrida
10.
Genomics & Informatics ; : e16-2019.
Artigo em Inglês | WPRIM | ID: wpr-763808

RESUMO

Medical Subject Headings (MeSH), a medical thesaurus created by the National Library of Medicine (NLM), is a useful resource for natural language processing (NLP). In this article, the current status of the Japanese version of MeSH is reviewed. Online investigation found Japanese-English dictionaries that assign MeSH information to applicable terms, but they proved difficult to access for NLP use due to license restrictions. Here, we investigate an open-source Japanese-English glossary as an alternative method for assigning MeSH IDs to Japanese terms, to obtain preliminary data for an NLP proof of concept.
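The glossary-pivot idea, mapping a Japanese term to its English gloss and then to a MeSH ID, can be sketched as a pair of dictionary lookups; both tables and the IDs below are placeholders for illustration, not real MeSH data:

```python
# Hypothetical glossary rows: Japanese term -> English term
GLOSSARY = {"糖尿病": "diabetes mellitus", "高血圧": "hypertension"}

# Hypothetical English-to-MeSH-ID table (placeholder IDs,
# not real MeSH descriptor IDs)
MESH_IDS = {"diabetes mellitus": "D000001", "hypertension": "D000002"}

def mesh_id_for(japanese_term):
    """Map a Japanese term to a MeSH ID by pivoting through its
    English gloss; return None when either lookup fails."""
    english = GLOSSARY.get(japanese_term)
    return MESH_IDS.get(english) if english else None

assert mesh_id_for("糖尿病") == "D000001"
assert mesh_id_for("未知語") is None
```

Counting how often the second lookup fails gives a simple coverage estimate for the open-source glossary against the MeSH vocabulary.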


Assuntos
Humanos , Povo Asiático , Licenciamento , Medical Subject Headings , Métodos , Processamento de Linguagem Natural , Vocabulário Controlado
11.
Genomics & Informatics ; : e17-2019.
Artigo em Inglês | WPRIM | ID: wpr-763807

RESUMO

Text mining has become an important research method in biology; its original purpose was to extract biological entities, such as genes, proteins and phenotypic traits, to extend the knowledge in scientific papers. However, few thorough studies on text mining and application development have been performed for plant molecular biology data, especially for rice, resulting in a lack of datasets for named-entity recognition tasks in this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches for automatically extracting information on gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of titles and abstracts, extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task on rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using this dataset, to facilitate open comparison and evaluation of different approaches to the task.


Assuntos
Benchmarking , Biologia , Mineração de Dados , Conjunto de Dados , Aprendizado de Máquina , Métodos , Biologia Molecular , Processamento de Linguagem Natural , Oryza , Plantas
12.
Genomics & Informatics ; : e19-2019.
Artigo em Inglês | WPRIM | ID: wpr-763805

RESUMO

In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.


Assuntos
Linguística , Aprendizado de Máquina , Processamento de Linguagem Natural
13.
Genomics & Informatics ; : e21-2019.
Artigo em Inglês | WPRIM | ID: wpr-763803

RESUMO

Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.


Assuntos
Processamento de Linguagem Natural
14.
Healthcare Informatics Research ; : 376-380, 2018.
Artigo em Inglês | WPRIM | ID: wpr-717652

RESUMO

OBJECTIVES: This research presents the design and development of a software architecture that uses natural language processing tools and a knowledge ontology as its knowledge base. METHODS: The software extracts, manages and represents the knowledge contained in natural-language text. A corpus of more than 200 medical-domain documents from the general medicine and palliative care areas was validated, demonstrating knowledge elements relevant to physicians. RESULTS: Precision, recall and F-measure indicators were applied. An ontology called "knowledge elements of the medical domain" was created to manipulate patient information; it can be read or accessed from any other software platform. CONCLUSIONS: The developed software architecture extracts medical knowledge from patients' clinical histories in two different corpora. The architecture was validated using the metrics of information extraction systems.
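The precision, recall and F-measure indicators mentioned can be computed from extraction counts as follows; the counts below are invented for illustration, not the study's results:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard information-extraction metrics from counts of
    true positives, false positives and false negatives."""
    precision = tp / (tp + fp)          # fraction of extractions that are correct
    recall = tp / (tp + fn)             # fraction of gold items that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy counts: 80 correct extractions, 20 spurious, 20 missed
p, r, f = precision_recall_f1(80, 20, 20)
# p == 0.8, r == 0.8, f ≈ 0.8
```

F-measure is the harmonic mean of precision and recall, so it only rewards systems that balance the two rather than maximizing one at the expense of the other.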


Assuntos
Humanos , Armazenamento e Recuperação da Informação , Bases de Conhecimento , Gestão do Conhecimento , Processamento de Linguagem Natural , Cuidados Paliativos
15.
Genomics & Informatics ; : 75-77, 2018.
Artigo em Inglês | WPRIM | ID: wpr-716819

RESUMO

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus of this journal annotated with various levels of linguistic information would be a valuable resource, as information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics with an NLTK (Natural Language ToolKit)-based text mining script. This preliminary version of the corpus can serve as a training and test set for systems providing a variety of future biomedical text mining functions.


Assuntos
Mineração de Dados , Genoma , Genômica , Informática , Armazenamento e Recuperação da Informação , Coreia (Geográfico) , Linguística , Processamento de Linguagem Natural , Semântica
16.
Journal of Korean Critical Care Nursing ; (3): 1-14, 2018.
Artigo em Coreano | WPRIM | ID: wpr-788140

RESUMO

PURPOSE: As the intensive care unit (ICU) survival rate increases, interest in the lives of ICU survivors has also been increasing. The purpose of this study was to identify the sentiments of ICU survivors. METHOD: The author analyzed quotations from previous qualitative studies of ICU survivors; a total of 1,074 sentences comprising 429 quotations from 25 relevant studies were analyzed. A word cloud created in R was used to identify the most frequent adjectives, and sentiment and emotion scores were calculated with an artificial intelligence (AI) program. RESULTS: The ten adjectives appearing most often in the quotations were 'difficult', 'different', 'normal', 'able', 'hard', 'bad', 'ill', 'better', 'weak', and 'afraid', in decreasing order of occurrence. The mean sentiment score was negative (-.31±.23), and the three emotions with the highest scores were 'sadness' (.52±.13), 'joy' (.35±.22), and 'fear' (.30±.25). CONCLUSION: The natural language processing used in this study is a relatively new method, so the methodology needs to be refined through repeated research in various nursing fields. In addition, further studies are needed on nursing interventions that improve the coherence of survivors' ICU memories and on familial support for ICU survivors.


Assuntos
Humanos , Inteligência Artificial , Cuidados Críticos , Estado Terminal , Unidades de Terapia Intensiva , Memória , Métodos , Processamento de Linguagem Natural , Enfermagem , Taxa de Sobrevida , Sobreviventes
17.
Healthcare Informatics Research ; : 179-186, 2018.
Artigo em Inglês | WPRIM | ID: wpr-716037

RESUMO

OBJECTIVES: Clinical discharge summaries provide valuable information about patients' clinical history, which is helpful for realizing intelligent healthcare applications. These documents tend to take the form of separate segments based on temporal or topical information. If a patient's clinical history can be seen as a consecutive sequence of clinical events, then each temporal segment can be seen as a snapshot providing a certain clinical context at a specific moment. This study aimed to demonstrate, as a proof of concept, a temporal segmentation method for Korean clinical narratives that identifies textual snapshots of patient history. METHODS: Our method uses pattern-based segmentation to approximate human recognition of temporal or topical shifts in clinical documents. We used rheumatic patients' discharge summaries and transformed them into sequences of constituent chunks. We built 97 single pattern functions that denote whether a certain chunk has attributes indicating it can be a segment boundary, and we manually defined the relationships between the pattern functions to resolve multiple pattern matches and make a final decision. RESULTS: The algorithm segmented 30 discharge summaries and processed 1,849 decision points. Three human judges were asked whether they agreed with the algorithm's predictions, and agreement with the judges' majority opinion was 89.61%. CONCLUSIONS: Although this method is based on manually constructed rules, our findings demonstrate that the proposed algorithm achieves fairly good segmentation results, and it may be the basis for methodological improvement in the future.
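Pattern-based boundary detection of this kind can be sketched with a small set of pattern functions (the paper used 97); the two patterns and the chunks below are illustrative assumptions, not the study's rules:

```python
import re

# Hypothetical pattern functions: each returns True when a chunk
# looks like a temporal or topical segment boundary
PATTERNS = [
    lambda chunk: bool(re.match(r"\d{4}-\d{2}-\d{2}", chunk)),  # starts with a date
    lambda chunk: chunk.endswith(":"),                          # section-like header
]

def is_boundary(chunk: str) -> bool:
    """A chunk opens a new segment if any pattern fires."""
    return any(p(chunk) for p in PATTERNS)

def segment(chunks: list) -> list:
    """Group consecutive chunks, opening a new segment at each boundary."""
    segments = []
    for chunk in chunks:
        if not segments or is_boundary(chunk):
            segments.append([chunk])
        else:
            segments[-1].append(chunk)
    return segments

chunks = ["2014-03-01", "admitted with joint pain", "Medication:", "MTX"]
# -> [["2014-03-01", "admitted with joint pain"], ["Medication:", "MTX"]]
```

The paper's extra step, manually defined relationships between pattern functions, would replace the simple `any(...)` vote with precedence rules for resolving conflicting matches.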


Assuntos
Humanos , Atenção à Saúde , Registros Eletrônicos de Saúde , Métodos , Processamento de Linguagem Natural , Reconhecimento Automatizado de Padrão , Doenças Reumáticas
18.
Journal of Peking University(Health Sciences) ; (6): 256-263, 2018.
Artigo em Chinês | WPRIM | ID: wpr-691492

RESUMO

OBJECTIVE: There is a huge amount of diagnostic and treatment information in electronic medical records (EMRs), a concrete record of clinicians' actual diagnosis and treatment details. Many episodes in EMRs, such as chief complaints, present illness, past history, differential diagnosis, diagnostic imaging and surgical records, describe the clinical process in natural-language Chinese. How to extract effective information from this Chinese narrative text and organize it in tabular form for medical research, so that real-world clinical data can be put to practical use, is a difficult problem in Chinese medical data processing. METHODS: Based on EMR narrative text from a tertiary hospital in China, we propose a method of learning customized information extraction rules and applying rule-based extraction. It consists of three steps. (1) A random sample of 600 records (including history of present illness, past history, personal history, family history, etc.) was extracted as the raw corpus. Using our Chinese clinical narrative annotation platform, trained clinicians and nurses marked the tokens and phrases to be extracted (taking a history of diabetes as the example). (2) From the annotated corpus, extraction templates were induced and then rewritten as regular expressions in Perl. Using these rules as the knowledge base, we developed Perl packages to extract data from the EMR text, and organized the extracted items into tabular form for later use in clinical research or hospital surveillance. (3) Finally, the method was evaluated and validated on the National Clinical Service Data Integration Platform, checking the extraction results with a combination of manual and automated verification. RESULTS: Among patients diagnosed with diabetes in the hospital's Department of Endocrinology, 1 436 were discharged in 2015; extraction of their diabetes history achieved a recall of 87.6%, a precision of 99.5% and an F-score of 0.93. For a 10% sample of diabetes patients (1 223 patients) discharged from the same department by August 2017, extraction of the diabetes history achieved a recall of 89.2%, a precision of 99.2% and an F-score of 0.94. CONCLUSION: This study combines natural language processing with rule-based information extraction, designing and implementing an algorithm for extracting customized information from unstructured Chinese EMR text. It performs better than existing work.
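A rule of this kind can be sketched in Python rather than the paper's Perl; the regular expression below is an illustrative simplification, not the study's actual rule set:

```python
import re

# Simplified rule in the spirit of the paper's regular expressions:
# capture affirmed vs. negated mentions of a diabetes history
# (否认 = "denies", 糖尿病(病)史 = "history of diabetes")
RULE = re.compile(r"(否认)?糖尿病(病)?史")

def diabetes_history(text: str):
    """Return 'positive', 'negative', or None for a history sentence."""
    m = RULE.search(text)
    if not m:
        return None
    return "negative" if m.group(1) else "positive"

assert diabetes_history("患者有糖尿病史10年") == "positive"
assert diabetes_history("否认糖尿病史") == "negative"
assert diabetes_history("否认高血压病史") is None
```

A production rule set, like the study's, would cover many surface variants and negation cues per item, which is why the templates are induced from an annotated corpus rather than written from scratch.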


Subjects
Humans , Algorithms , China , Electronic Health Records , Information Storage and Retrieval , Natural Language Processing
19.
Healthcare Informatics Research ; : 141-146, 2017.
Article in English | WPRIM | ID: wpr-41215

ABSTRACT

OBJECTIVES: With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. METHODS: This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. RESULTS: Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. CONCLUSIONS: Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
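One core step the review names, text clustering, rests on representing documents as term vectors and comparing them. A minimal self-contained sketch (bag-of-words vectors with cosine similarity, using toy sentences as stand-ins for abstracts):

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "text mining extracts knowledge from biomedical text",
    "mining biomedical text yields hidden knowledge",
    "the surgeon repaired the fractured femur",
]
# Pairwise similarities: the two text-mining documents should score
# higher with each other than either does with the surgical note.
sims = {(i, j): cosine(vectorize(docs[i]), vectorize(docs[j]))
        for i in range(3) for j in range(i + 1, 3)}
```

Real pipelines add the pre-processing steps the review lists (stop-word removal, stemming, TF-IDF weighting) before clustering, but the vector-space comparison shown here is the common foundation.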


Subjects
Classification , Cluster Analysis , Data Mining , Mining , Natural Language Processing
20.
J. health inform ; 8(supl.I): 373-380, 2016. tab
Article in English | LILACS | ID: biblio-906292

ABSTRACT



Correctly translated and standardized clinical ontologies are essential for the development of natural language processing applications in the medical domain. To develop an ontology-driven semantic search application for Portuguese clinical notes, we needed to implement the Unified Medical Language System (UMLS) ontologies, specifically for Brazilian Portuguese. OBJECTIVES: To translate UMLS terms from European Portuguese to Brazilian Portuguese. METHODS: A semi-automatic translation algorithm based on string-replacement rules was developed. RESULTS: Following the experiments and the specialists' evaluation, the algorithm mistranslated only 0.1% of the terms in our test set. CONCLUSION: The proposed method proved effective for translating UMLS clinical terms and can be useful for subsequent adaptation of sets of clinical terms from European Portuguese to Brazilian Portuguese.
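A string-replacement rule engine of the kind the abstract describes can be sketched briefly. The rules below are illustrative examples of common European-to-Brazilian Portuguese differences, not the paper's actual rule set, which was built and validated by specialists:

```python
import re

# Illustrative EP -> BP substitution rules (hypothetical examples).
RULES = [
    (re.compile(r"\bcancro\b"), "câncer"),            # EP "cancro" -> BP "câncer"
    (re.compile(r"\bregisto(s?)\b"), r"registro\1"),  # EP "registo" -> BP "registro"
    (re.compile(r"\bcontacto(s?)\b"), r"contato\1"),  # EP "contacto" -> BP "contato"
]

def translate_ep_to_bp(term):
    """Apply every substitution rule in order to one clinical term."""
    for pattern, replacement in RULES:
        term = pattern.sub(replacement, term)
    return term
```

Applying such rules in a fixed order over the whole term list is what makes the approach semi-automatic: the rules encode the regular orthographic differences, and specialists review only the residual mistranslations.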


Subjects
Humans , Translating , Natural Language Processing , Congresses as Topic