Search | Global Index Medicus

CONTEXTUAL LEARNING APPROACH FOR SYNDROME DISEASE NAMED ENTITY RECOGNITION

Uma, Dr. E.; K, Kamatchi; Elangovan, Mehala.

Article | IMSEAR | ID: sea-221381

ABSTRACT

Biological entity recognition from medical documents is a difficult research area that lays the groundwork for extracting large amounts of biomedical information from unstructured text into structured formats. Existing work implements named entity recognition for diseases using a sequence labelling framework. The performance of this strategy, however, is not always adequate, and it frequently cannot fully exploit the semantic information in the dataset. This work presents the Syndrome Diseases Named Entity problem as sequence labelling with multi-context learning. By using well-designed text/queries, this formulation can incorporate more prior information and decode it using decoding techniques such as conditional random fields (CRF). We performed experiments on three biomedical datasets, BC5CDR-Disease, JNLPBA and NCBI-Disease. The outcomes show that our methodology is effective compared with other techniques, with accuracy levels of 96.70%, 98.65% and 96.72%, respectively.

Entity extraction and graph construction based on Chinese medical text / 中国药科大学学报

Ye YANG; Lei PEI; Fengzhen HOU.

Journal of China Pharmaceutical University ; (6): 363-371, 2023.

Article in Chinese | WPRIM | ID: wpr-987653

ABSTRACT

Knowledge graph technology has promoted progress in new drug research and development, but domestic research started late and domain knowledge is mostly stored in text, resulting in a low rate of knowledge graph reuse. Based on multi-source, heterogeneous medical texts, this paper designed a Chinese named entity recognition model built on the BERT-wwm-ext pre-trained model and integrating a cascade approach, which reduced the complexity of traditional single-step classification and further improved the efficiency of text recognition. The experimental results showed that the model achieved its best performance on the self-built dataset, with an F1-score of 0.903, a precision of 89.2%, and a recall of 91.5%. The model was also applied to the public dataset CCKS2019, and the results showed that it had better performance and recognition effect. Using this model, this paper constructed a Chinese medical knowledge graph involving 13 530 entities, 10 939 attributes and 39 247 relationships in total. The Chinese medical entity extraction and graph construction method proposed in this paper is expected to help researchers accelerate the discovery of new medical knowledge and shorten the process of new drug discovery.
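
The reported F1-score can be cross-checked against the reported precision and recall, assuming F1 here is the usual harmonic mean of the two (the abstract does not state the formula). A minimal Python sketch; the function name `f1_score` is illustrative only:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the self-built dataset.
print(round(f1_score(0.892, 0.915), 3))  # 0.903
```

Under that assumption, the three reported figures are mutually consistent.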

Automatic labeling and extraction of terms in natural language processing in acupuncture clinical literature / 中国针灸

Hua-Yun LIU; Chen-Jing HAN; Jie XIONG; Hai-Yan LI; Lei LEI; Bao-Yan LIU.

Chinese Acupuncture & Moxibustion ; (12): 327-331, 2022.

Article in Chinese | WPRIM | ID: wpr-927383

ABSTRACT

The paper analyzes the specific features of term recognition in acupuncture clinical literature and compares the advantages and disadvantages of three named entity recognition (NER) methods adopted in the field of traditional Chinese medicine. The authors suggest that the bidirectional long short-term memory network with conditional random fields (BiLSTM-CRF) can capture contextual information and complete NER with fewer feature rules, which makes the model suitable for term recognition in acupuncture clinical literature. Based on this model, the paper proposes that the process of term recognition in acupuncture clinical literature should include four steps: literature preprocessing, sequence labeling, model training and effect evaluation. This provides an approach to structuring the terminology in acupuncture clinical literature.

Subject(s)

Acupuncture Therapy , Electronic Health Records , Natural Language Processing
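
The sequence labeling step named in the abstract above can be illustrated with the common BIO tagging scheme. This is a sketch under assumptions: the abstract does not say which tagging scheme the authors use, and the example sentence, the labels `ACUPOINT` and `SYMPTOM`, and the helper `spans_to_bio` are invented for illustration. Model training with BiLSTM-CRF is not shown.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, label) into per-token BIO tags."""
    tags = ["O"] * len(tokens)          # "O" marks tokens outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the entity
    return tags


# Hypothetical annotated sentence (not taken from the paper).
tokens = ["Needling", "Zusanli", "relieved", "epigastric", "pain"]
spans = [(1, 2, "ACUPOINT"), (3, 5, "SYMPTOM")]

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

A sequence model is then trained to predict one such tag per token, and evaluation compares the predicted entities with the annotated ones.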

Research on the construction of drug knowledge base based on machine learning / 中华医院管理杂志

Yunfei HOU; Yicheng LI; Zongyu ZOU; Zijun ZHOU.

Chinese Journal of Hospital Administration ; (12): 232-236, 2021.

Article in Chinese | WPRIM | ID: wpr-912731

ABSTRACT

Objective: To construct a drug knowledge base based on drug instructions. Methods: Six hundred randomly selected drug instructions were labeled manually and divided into a training set and a test set. A model based on bidirectional long short-term memory (Bi-LSTM) and conditional random fields (CRF) was trained to recognize medical entities. The extracted entities were standardized by a hybrid model of "similarity calculation and rule mapping table", and the drug information was then imported into an Access database. Results: In the named entity recognition task based on the Bi-LSTM and CRF model, all entities except the crowd entities achieved good results, with F-values higher than 85%. With the hybrid model of "similarity calculation and rule mapping table", the accuracy of entity standardization was 88.23%. Conclusions: The performance of the machine learning model in this study is similar to that reported in other named entity recognition and entity standardization studies, and the model can complete the task of drug knowledge base construction satisfactorily.
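
One way to picture a hybrid of "similarity calculation and rule mapping table" is sketched below. The abstract does not specify the similarity measure, the order of the two steps, or any threshold, so everything here is an assumption: the rule table is consulted first, the fallback uses the `SequenceMatcher` ratio from Python's standard `difflib` module, and the names `standardize`, `rule_table`, `vocabulary` and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def standardize(mention: str,
                rule_table: Dict[str, str],
                vocabulary: List[str],
                threshold: float = 0.8) -> Optional[str]:
    """Map an extracted mention to a standard term, or None if no match is good enough."""
    key = mention.strip().lower()
    # Step 1 (assumed): exact lookup in a hand-written rule mapping table.
    if key in rule_table:
        return rule_table[key]
    # Step 2 (assumed): fall back to the most similar standard term.
    best_term, best_score = None, 0.0
    for term in vocabulary:
        score = SequenceMatcher(None, key, term.lower()).ratio()
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None


# Toy data for illustration only.
rule_table = {"asa": "aspirin"}
vocabulary = ["aspirin", "ibuprofen"]

print(standardize("ASA", rule_table, vocabulary))     # resolved by the rule table
print(standardize("asprin", rule_table, vocabulary))  # resolved by similarity
print(standardize("xyz", rule_table, vocabulary))     # no acceptable match
```

Checking the rule table first lets curated mappings override the fuzzy match, which is one plausible reason for combining the two.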

OryzaGP: rice gene and protein dataset for named-entity recognition

Pierre LARMANDE; Huy DO; Yue WANG.

Genomics & Informatics ; : e17-2019.

Article in English | WPRIM | ID: wpr-763807

ABSTRACT

Text mining has become an important research method in biology, with its original purpose to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development, for plant molecular biology data, have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since there are rare benchmarks available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species, and is downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

Subject(s)

Benchmarking , Biology , Data Mining , Dataset , Machine Learning , Methods , Molecular Biology , Natural Language Processing , Oryza , Plants

Improving spaCy dependency annotation and PoS tagging web service using independent NER services

Nico COLIC; Fabio RINALDI.

Genomics & Informatics ; : e21-2019.

Article in English | WPRIM | ID: wpr-763803

ABSTRACT

Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.

Subject(s)

Natural Language Processing

Medical Name Entity Recognition and Application in Chinese Admission Record of Stroke Patients Based on CRF and RUTA rule / 中山大学学报(医学科学版)

Yuan XU; Yan-Qiu GE; Qiang WANG; Gang XIONG; Ying-Ping YI.

Journal of Sun Yat-sen University(Medical Sciences) ; (6): 455-462, 2018.

Article in Chinese | WPRIM | ID: wpr-712974

ABSTRACT

[Objective] To study the construction and optimization of a natural language processing model for unstructured medical records, and to use the model to extract structured data from the medical records of stroke patients in the Jiangxi Medical Big Data Platform. [Methods] According to the actual needs of clinical research, a stroke specialty entity annotation system and a named entity annotation corpus were constructed from 500 hospital admission records of stroke patients, randomly selected from 2011 to 2016 from the Jiangxi provincial medical big data platform. The corpus was used to construct a named entity extraction model based on CRF and RUTA rules, and recognition accuracy was improved by adjusting the RUTA rules and parameters. [Results] The accuracy rate of the extraction model was 0.960, the recall rate was 0.916 and the F-score was 0.939. The extraction model was used to extract 264 580 entities and 1 161 077 entity relations from the admission records of 10 295 stroke patients on the medical big data platform. [Conclusions] The constructed natural language extraction model has high recognition accuracy. It can accurately obtain valuable research data on patients' past history, life history and clinical manifestations from a large number of unstructured medical records, and can effectively improve the efficiency and scientific level of clinical research on cerebrovascular diseases.

A Study on the Named Entity Recognition Method on Symptom Names in the History of Present Illness in Traditional Chinese Medical (TCM) Clinic / 世界科学技术-中医药现代化

Yuhu YUAN; Xuezhong ZHOU; Runshun ZHANG; Xiaodong LI.

World Science and Technology-Modernization of Traditional Chinese Medicine ; (12): 70-77, 2017.

Article in Chinese | WPRIM | ID: wpr-513107

ABSTRACT

Clinical cases of TCM are important clinical data that record, in text form, the whole process of interaction between doctors and patients. However, in the context of big data, there is a lack of research on the use of the information contained in clinical cases. In this paper, we therefore studied a method of extracting symptom terms from the history of present illness in TCM clinics, in order to lay the foundation for further use of clinical cases. First, 12,367 histories of present illness were obtained by random selection and expert review. According to disease type, they were divided into experimental groups: 4,838 records in the diabetes group, 7,529 records in the spleen and stomach disease group, and all 12,367 records in the mixed (combined) group. A glossary of symptom terms covering 22,996 words was compiled. Then, five feature templates, such as the sliding window feature, prefix and suffix characters, and lexical features, were selected, and a CRF model was adopted to carry out the named entity extraction experiment. In the open test, the performance of the diabetes, spleen and stomach disease, and mixed groups was (0.83, 0.8, 0.82), (0.9, 0.9, 0.89) and (0.88, 0.87, 0.87), respectively, while the results in ten-fold cross validation were (0.83, 0.82, 0.83), (0.95, 0.95, 0.95) and (0.93, 0.92, 0.92). In conclusion, the results showed that CRF is an excellent sequence labeling algorithm and is applicable to the task of extracting symptom named entities from the history of present illness.
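
The feature templates mentioned in the abstract above can be sketched as a per-character feature function of the kind typically fed to a CRF. The abstract does not define its five templates, so the three shown here are one interpretation only: a sliding window of neighbouring characters, flags for whether the character begins or ends some glossary term, and a lexical flag for whether the position lies inside a glossary term found in the text. The function name `extract_features`, the window size and the English example are hypothetical.

```python
from typing import Dict, Set


def extract_features(text: str, i: int, glossary: Set[str], window: int = 2) -> Dict[str, object]:
    """Build illustrative CRF features for the character at position i of text."""
    terms = {t for t in glossary if t}          # ignore empty glossary entries
    first_chars = {t[0] for t in terms}
    last_chars = {t[-1] for t in terms}

    feats: Dict[str, object] = {}
    # Sliding window: the character itself and its neighbours, padded at the edges.
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"char[{offset:+d}]"] = text[j] if 0 <= j < len(text) else "<PAD>"
    # Prefix / suffix character: does this character start or end a glossary term?
    feats["is_term_prefix"] = text[i] in first_chars
    feats["is_term_suffix"] = text[i] in last_chars
    # Lexical feature: is position i covered by an occurrence of a glossary term?
    feats["in_glossary_term"] = any(
        text[s:s + len(term)] == term
        for term in terms
        for s in range(max(0, i - len(term) + 1), i + 1)
    )
    return feats


# Toy example (not from the paper's data).
text = "dry mouth and fatigue"
glossary = {"dry mouth", "fatigue"}
print(extract_features(text, 0, glossary))
```

A CRF toolkit would consume one such feature dictionary per character, paired with its gold tag, during training.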

Steps and tools of text mining in biomedical field / 中华医学图书情报杂志

Lei CUI.

Chinese Journal of Medical Library and Information Science ; (12): 1-5, 2017.

Article in Chinese | WPRIM | ID: wpr-511115

ABSTRACT

This paper describes the steps of text mining in the biomedical field and the methods used in each step, with emphasis on the tools used in each step, in order to promote text mining in the biomedical field.
