Results 1 - 11 of 11
1.
Article in Chinese | WPRIM | ID: wpr-1023491

ABSTRACT

Purpose/Significance: The paper discusses the application of artificial intelligence technology to key entity recognition in unstructured text data from the electronic medical records (EMR) of lymphedema patients. Method/Process: It describes a solution for fine-tuning a model when samples are scarce. A total of 594 patients admitted to the Department of Lymphatic Surgery of Beijing Shijitan Hospital, Capital Medical University, are selected as the research subjects. The prediction layer of the GlobalPointer model is fine-tuned on 15 key entity categories labeled by clinicians, and nested and non-nested key entities are identified with its global pointer. The accuracy of the experimental results and the feasibility of clinical application are analyzed. Result/Conclusion: After fine-tuning, the average accuracy rate, recall rate and Macro_F1 of the model are 0.795, 0.641 and 0.697, respectively, which lays a foundation for accurate mining of lymphedema EMR data.
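The macro-averaged scores reported above can be illustrated with a minimal sketch. It assumes that "macro" means every entity category contributes equally to the average, which is the usual definition but is not spelled out in the abstract. The category names and counts below are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one entity category from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, ...]:
    """Average per-category precision, recall and F1 with equal weight per category."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical (true positive, false positive, false negative) counts.
example_counts = {"symptom": (8, 2, 2), "body_site": (1, 1, 3)}
macro_p, macro_r, macro_f1 = macro_average(example_counts)
```

Because the F1 values are averaged per category, the macro F1 generally differs from the harmonic mean of the macro precision and macro recall.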

2.
Article in Chinese | WPRIM | ID: wpr-1026199

ABSTRACT

Objective: To present a named entity recognition method for eligibility criteria, referred to as BioBERT-Att-BiLSTM-CRF, based on the BioBERT pretrained model. The method can automatically extract relevant information from clinical trials and provide assistance in efficiently formulating eligibility criteria. Methods: Based on the UMLS medical semantic network and expert-defined rules, the study established medical entity annotation rules and constructed a named entity recognition corpus to clarify the entity recognition task. BioBERT-Att-BiLSTM-CRF converted the text into BioBERT vectors and fed them into a bidirectional long short-term memory network to capture contextual semantic features. Attention mechanisms were applied to extract key features, and a conditional random field was used to decode and output the optimal label sequence. Results: BioBERT-Att-BiLSTM-CRF outperformed other baseline models on the eligibility criteria named entity recognition dataset. Conclusion: BioBERT-Att-BiLSTM-CRF can efficiently extract eligibility criteria-related information from clinical trials, thereby enhancing the scientific validity of clinical trial registration data and providing assistance in the formulation of eligibility criteria for clinical trials.
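The final decoding step, in which a conditional random field outputs the optimal label sequence, can be sketched as follows. The abstract does not name the decoding algorithm; this sketch assumes Viterbi decoding, the standard choice for linear-chain CRFs. The scores and the three-label scheme (0 = O, 1 = B, 2 = I) are hypothetical.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][j]   : score of label j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for ending in label j at position t.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical scores: the transition O -> I is heavily penalised.
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.5, 0.0], [0.0, 0.5, 2.0], [1.0, 0.0, 0.0]]
decoded = viterbi_decode(emissions, transitions)
```

Choosing the best label at each position independently would give O, I, O here, which is not a valid sequence. The transition scores let the decoder prefer B, I, O instead.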

3.
Article | IMSEAR | ID: sea-221381

ABSTRACT

Biological entity recognition from medical documents is a difficult research area that lays the groundwork for extracting large amounts of biomedical information from unstructured text into structured formats. Existing work implements named entity recognition for diseases using a sequence labelling framework. The performance of this strategy, however, is not always adequate, and it frequently cannot fully exploit the semantic information in the dataset. This work presents the Syndrome Diseases Named Entity problem as sequence labelling with multi-context learning. By using well-designed text/queries, this formulation can incorporate more prior information and decode it using techniques such as conditional random fields (CRF). We performed experiments on three biomedical datasets, BC5CDR-Disease, JNLPBA and NCBI-Disease. Compared with other techniques, our methodology proves effective, reaching accuracy levels of 96.70%, 98.65% and 96.72%, respectively.

4.
Article in Chinese | WPRIM | ID: wpr-987653

ABSTRACT

Knowledge graph technology has promoted progress in new drug research and development, but domestic research started late and domain knowledge is mostly stored in text, resulting in a low rate of knowledge graph reuse. Based on multi-source, heterogeneous medical texts, this paper designed a Chinese named entity recognition model built on the Bert-wwm-ext pre-trained model and integrated a cascade approach, which reduced the complexity of traditional single-stage classification and further improved the efficiency of text recognition. The experimental results showed that the model achieved its best performance with an F1-score of 0.903, a precision of 89.2%, and a recall rate of 91.5% on the self-built dataset. The model was also applied to the public dataset CCKS2019, and the results showed that it had better performance and recognition effect. Using this model, this paper constructed a Chinese medical knowledge graph involving 13 530 entities, 10 939 attributes and 39 247 relationships in total. The Chinese medical entity extraction and graph construction method proposed in this paper is expected to help researchers accelerate the discovery of medical knowledge and shorten the process of new drug discovery.
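As a quick check on how the three reported figures relate, the sketch below recomputes F1 from the stated precision and recall. It assumes the F1-score is the harmonic mean of those two values, which the abstract does not state explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision 89.2% and recall 91.5% as reported for the self-built dataset.
recomputed_f1 = f1_score(0.892, 0.915)
```

Rounded to three decimals, the recomputed value matches the reported F1-score of 0.903.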

5.
Zhongguo zhenjiu ; (12): 327-331, 2022.
Article in Chinese | WPRIM | ID: wpr-927383

ABSTRACT

The paper analyzes the specific challenges of term recognition in acupuncture clinical literature and compares the advantages and disadvantages of three named entity recognition (NER) methods adopted in the field of traditional Chinese medicine. It argues that the bidirectional long short-term memory network with conditional random fields (BiLSTM-CRF) can propagate context information and complete NER using fewer feature rules, and that this model is therefore suitable for term recognition in acupuncture clinical literature. Based on this model, it proposes that term recognition in acupuncture clinical literature should include four steps: literature preprocessing, sequence labeling, model training and effect evaluation. This provides an approach to structuring the terminology in acupuncture clinical literature.
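The sequence labeling step can be illustrated with a minimal sketch that converts annotated term spans into per-token tags. The BIO tagging scheme, the label names and the example sentence are assumptions made for illustration; the abstract does not say which scheme the authors use.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str],
                 spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) token spans into BIO tags.

    Spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the term
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the term
    return tags


# Hypothetical sentence and annotations.
tokens = ["needle", "Zusanli", "(", "ST36", ")", "for", "stomach", "pain"]
spans = [(1, 2, "ACUPOINT"), (6, 8, "SYMPTOM")]
tags = spans_to_bio(tokens, spans)
```

A tagged corpus in this form is what a BiLSTM-CRF model would be trained on in the model training step.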


Subject(s)
Acupuncture Therapy , Electronic Health Records , Natural Language Processing
6.
Article in Chinese | WPRIM | ID: wpr-912731

ABSTRACT

Objective: To construct a drug knowledge base based on drug instructions. Methods: Six hundred randomly selected drug instructions were labeled manually and divided into a training set and a test set. A bidirectional long short-term memory (Bi-LSTM) and conditional random fields (CRF) model was trained to recognize medical entities. The extracted entities were standardized by a hybrid model of "similarity calculation and rule mapping table", and the drug information was then imported into an Access database. Results: In the named entity recognition task based on the Bi-LSTM and CRF model, all entities except the crowd entities achieved good results, with an F-value higher than 85%. With the hybrid model of "similarity calculation and rule mapping table", the accuracy of entity standardization was 88.23%. Conclusions: The performance of the machine learning model in this study is similar to that of other named entity recognition and entity standardization studies, and the model can complete the task of drug knowledge base construction satisfactorily.
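One way to read the hybrid "similarity calculation and rule mapping table" step is sketched below: look the mention up in a hand-written mapping table first, and fall back to string similarity against a standard vocabulary. The order of the two stages, the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the threshold and all example terms are assumptions, not details from the study.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalize_entity(mention: str,
                     mapping_table: Dict[str, str],
                     vocabulary: List[str],
                     threshold: float = 0.8) -> Optional[str]:
    """Map an extracted mention to a standard term, or None if nothing fits."""
    key = mention.strip().lower()
    # Stage 1: exact lookup in the rule mapping table.
    if key in mapping_table:
        return mapping_table[key]
    # Stage 2: most similar standard term, accepted only above the threshold.
    best_term, best_score = None, 0.0
    for term in vocabulary:
        score = SequenceMatcher(None, key, term.lower()).ratio()
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None


# Hypothetical mapping table and standard vocabulary.
mapping_table = {"tab.": "tablet"}
vocabulary = ["tablet", "capsule", "injection"]
by_rule = normalize_entity("Tab.", mapping_table, vocabulary)
by_similarity = normalize_entity("capsules", mapping_table, vocabulary)
unmatched = normalize_entity("xyz", mapping_table, vocabulary)
```

Checking the mapping table first keeps known abbreviations deterministic, while the similarity fallback handles spelling variants that no rule covers.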

7.
Article in English | WPRIM | ID: wpr-763803

ABSTRACT

Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.


Subject(s)
Natural Language Processing
8.
Article in English | WPRIM | ID: wpr-763807

ABSTRACT

Text mining has become an important research method in biology, with its original purpose to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.


Subject(s)
Benchmarking , Biology , Data Mining , Dataset , Machine Learning , Methods , Molecular Biology , Natural Language Processing , Oryza , Plants
9.
Article in Chinese | WPRIM | ID: wpr-712974

ABSTRACT

[Objective] To study the construction and optimization of a natural language processing model for unstructured medical records, and to use the model to extract structured data from the medical records of stroke patients in the Jiangxi Medical Big Data Platform. [Methods] According to the actual needs of clinical research, a stroke specialist entity annotation system and a named entity annotation corpus were constructed based on 500 hospital admission records of stroke patients, which were randomly selected from the period 2011 to 2016 in the Jiangxi provincial medical big data platform. The corpus was used to construct a named entity extraction model based on CRF and RUTA rules, and the recognition accuracy was improved by adjusting the RUTA rules and parameters. [Results] The accuracy rate of the extraction model was 0.960, the recall rate was 0.916 and the F-score was 0.939. The extraction model was used to extract 264 580 entities and 1 161 077 entity relations from the admission records of 10 295 stroke patients in the medical big data platform. [Conclusions] The constructed natural language extraction model has high recognition accuracy. It can accurately obtain valuable research data on patients' past history, life history and clinical manifestations from a large number of unstructured medical records, and can effectively improve the clinical research efficiency and scientific research level for cerebrovascular diseases.

10.
Article in Chinese | WPRIM | ID: wpr-511115

ABSTRACT

The steps of text mining in the biomedical field and the methods used in each step are described, with emphasis on the tools used at each step, in order to promote text mining in the biomedical field.

11.
Article in Chinese | WPRIM | ID: wpr-513107

ABSTRACT

Clinical cases of TCM are important clinical data that record, in text form, the whole process of the interaction between doctors and patients. However, in the context of big data, there is a lack of research on the use of the information contained in clinical cases. Therefore, this paper studied a method of extracting symptom terms from the history of present illness in TCM clinical records, in order to lay the foundation for the further use of clinical cases. First, 12,367 history-of-present-illness records were obtained by random selection and expert review. According to disease type, they were divided into experimental groups: 4,838 records in the diabetes group, 7,529 records in the spleen and stomach disease group and 12,367 records in the mixed (combined) group. A glossary of symptom terms covering 22,996 words was compiled. Then, five feature templates, such as sliding window features, prefix and suffix characters and lexical features, were selected, and a CRFs model was adopted to carry out the named entity extraction experiment. In the open test, the performance of the diabetes, spleen and stomach disease, and mixed groups was (0.83, 0.8, 0.82), (0.9, 0.9, 0.89) and (0.88, 0.87, 0.87), respectively, while the results were (0.83, 0.82, 0.83), (0.95, 0.95, 0.95) and (0.93, 0.92, 0.92) in the ten-fold cross validation. In conclusion, the results showed that the CRFs algorithm is an excellent sequence labeling algorithm and can be applied to the named entity extraction task for symptom history.
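Two of the feature templates named above, sliding window features and glossary-based lexical features, can be sketched as a per-character feature function of the kind a CRF toolkit consumes. The feature names, window size and example text are hypothetical; the abstract does not give the exact templates.

```python
from typing import Dict, List, Set


def glossary_mask(text: str, glossary: Set[str]) -> List[bool]:
    """Mark every character position covered by some glossary term."""
    mask = [False] * len(text)
    for term in glossary:
        if not term:
            continue
        start = text.find(term)
        while start != -1:
            for k in range(start, start + len(term)):
                mask[k] = True
            start = text.find(term, start + 1)
    return mask


def char_features(text: str, i: int, mask: List[bool],
                  window: int = 1) -> Dict[str, str]:
    """Sliding-window character features plus a glossary flag for position i."""
    feats = {"in_glossary": str(mask[i])}
    for offset in range(-window, window + 1):
        j = i + offset
        # Positions outside the text are padded.
        feats[f"char[{offset}]"] = text[j] if 0 <= j < len(text) else "<PAD>"
    return feats


# Hypothetical example: "口干" (dry mouth) is in the glossary, "乏力" (fatigue) is not.
text = "口干乏力"
mask = glossary_mask(text, {"口干"})
first = char_features(text, 0, mask)
```

Each character's feature dictionary would then be paired with its tag to train the CRF.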
