1.
Artif Intell Med ; 103: 101772, 2020 03.
Article in English | MEDLINE | ID: mdl-32143787

ABSTRACT

The representation of knowledge based on first-order logic captures the richness of natural language and supports multiple probabilistic inference models. Although symbolic representation enables quantitative reasoning with statistical probability, it is difficult to use with machine learning models because those models perform numerical operations. In contrast, knowledge embedding (i.e., high-dimensional and continuous vectors) is a feasible approach to complex reasoning that not only retains the semantic information of knowledge but also establishes quantifiable relationships among embeddings. In this paper, we propose a recursive neural knowledge network (RNKN), which combines medical knowledge based on first-order logic with a recursive neural network for multi-disease diagnosis. After the RNKN is efficiently trained using manually annotated Chinese Electronic Medical Records (CEMRs), diagnosis-oriented knowledge embeddings and weight matrices are learned. The experimental results confirm that the diagnostic accuracy of the RNKN is superior to that of four machine learning models, four classical neural networks and a Markov logic network. The results also demonstrate that the more explicit the evidence extracted from CEMRs, the better the performance. The RNKN gradually reveals the interpretation of knowledge embeddings as the number of training epochs increases.


Subject(s)
Diagnosis, Computer-Assisted/methods , Electronic Health Records/organization & administration , Neural Networks, Computer , Algorithms , Humans , Machine Learning
2.
PLoS One ; 14(5): e0216046, 2019.
Article in English | MEDLINE | ID: mdl-31048840

ABSTRACT

Specific entity terms such as diseases, tests, symptoms, and genes in Electronic Medical Records (EMRs) can be extracted by Named Entity Recognition (NER). However, the limited availability of labeled EMRs poses a great challenge for mining medical entity terms. In this study, a novel multitask bi-directional RNN model combined with deep transfer learning is proposed as a potential solution for transferring knowledge and augmenting data to enhance NER performance with limited data. The proposed model has been evaluated using micro average F-score, macro average F-score and accuracy. The proposed model outperforms the baseline model on the discharge datasets. For instance, for discharge summaries, the micro average F-score is improved by 2.55% and the overall accuracy is improved by 7.53%. For progress notes, the micro average F-score and the overall accuracy are improved by 1.63% and 5.63%, respectively.
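
The abstract above reports both micro and macro average F-scores. The following is a minimal sketch of how the two averages differ, using the standard definitions; the entity labels and counts in `toy_counts` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity type
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts over all classes first, then compute a single F1."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in per_class.values()) / len(per_class)


# Hypothetical counts: a frequent class scored well, a rare class scored poorly.
toy_counts = {
    "disease": (90, 10, 10),
    "test": (40, 10, 10),
    "symptom": (5, 5, 15),
}

micro = micro_f1(toy_counts)
macro = macro_f1(toy_counts)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

Because the micro average pools counts, frequent entity types dominate it, whereas the macro average weights every type equally. This is one reason a model can improve the two scores by different amounts.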


Subject(s)
Data Collection/methods , Data Mining/methods , Electronic Health Records/classification , Asian People , Database Management Systems , Deep Learning/trends , Humans , Machine Learning , Neural Networks, Computer , Records/classification
3.
BMC Bioinformatics ; 19(Suppl 17): 499, 2018 Dec 28.
Article in English | MEDLINE | ID: mdl-30591015

ABSTRACT

BACKGROUND: Electronic Medical Records (EMRs) comprise patients' medical information gathered by medical staff for providing better health care. Named Entity Recognition (NER) is a sub-field of information extraction aimed at identifying specific entity terms such as diseases, tests, symptoms, genes, etc. NER can relieve healthcare providers and medical specialists by extracting useful information automatically and filtering out unnecessary and unrelated information in EMRs. However, the limited availability of EMRs poses a great challenge for mining entity terms. Therefore, a multitask bi-directional RNN model is proposed here as a potential data augmentation solution to enhance NER performance with limited data. METHODS: A multitask bi-directional RNN model is proposed for extracting entity terms from Chinese EMRs. The proposed model can be divided into a shared layer and a task-specific layer. First, the vector representation of each word is obtained as a concatenation of a word embedding and a character embedding. Then a bi-directional RNN is used to extract context information from the sentence. After that, all these layers are shared by two different task layers, namely the part-of-speech tagging task layer and the named entity recognition task layer. These two task layers are trained alternately so that the knowledge learned from the named entity recognition task can be enhanced by the knowledge gained from the part-of-speech tagging task. RESULTS: The performance of our proposed model has been evaluated in terms of micro average F-score, macro average F-score and accuracy. The proposed model outperforms the baseline model in all cases. For instance, experimental results on the discharge summaries show that the micro average F-score and the macro average F-score are improved by 2.41 percentage points and 4.16 percentage points, respectively, and the overall accuracy is improved by 5.66 percentage points. CONCLUSIONS: In this paper, a novel multitask bi-directional RNN model is proposed for improving the performance of named entity recognition in EMRs. Evaluation results using real datasets demonstrate the effectiveness of the proposed model.


Subject(s)
Electronic Health Records , Language , Models, Theoretical , China , Humans , Information Storage and Retrieval
4.
Comput Methods Programs Biomed ; 156: 179-190, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29428070

ABSTRACT

BACKGROUND AND OBJECTIVE: The application of medical knowledge strongly affects the performance of intelligent diagnosis, and the method of learning the weights of medical knowledge plays a substantial role in probabilistic graphical models (PGMs). The purpose of this study is to investigate a discriminative weight-learning method based on a medical knowledge network (MKN). METHODS: We propose a training model called the maximum margin medical knowledge network (M3KN), which is strictly derived for calculating the weight of medical knowledge. Using the definition of a reasonable margin, the weight learning can be transformed into a margin optimization problem. To solve the optimization problem, we adopt a sequential minimal optimization (SMO) algorithm and the clique property of a Markov network. Ultimately, M3KN not only incorporates the inference ability of PGMs but also deals with high-dimensional logic knowledge. RESULTS: The experimental results indicate that M3KN obtains a higher F-measure score than the maximum likelihood learning algorithm of MKN for both Chinese Electronic Medical Records (CEMRs) and Blood Examination Records (BERs). Furthermore, the proposed approach is clearly superior to some classical machine learning algorithms for medical diagnosis. To demonstrate the importance of domain knowledge, we numerically verify that the diagnostic accuracy of M3KN gradually improves as the number of learned CEMRs, which contain important medical knowledge, increases. CONCLUSIONS: Our experimental results show that the proposed method performs reliably for learning the weights of medical knowledge. M3KN outperforms other existing methods by achieving an F-measure of 0.731 for CEMRs and 0.4538 for BERs. This further illustrates that M3KN can facilitate the investigation of intelligent healthcare.


Subject(s)
Diagnosis, Computer-Assisted/methods , Models, Statistical , Neural Networks, Computer , Signal Processing, Computer-Assisted , Algorithms , China , Computer Graphics , Electronic Health Records , Humans , Likelihood Functions , Machine Learning , Markov Chains , Models, Theoretical , Reproducibility of Results
5.
J Biomed Inform ; 69: 203-217, 2017 05.
Article in English | MEDLINE | ID: mdl-28404537

ABSTRACT

OBJECTIVE: To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts, with corresponding annotation guidelines and methods, and to develop tools trained on the annotated corpus that supply baselines for research on Chinese texts in the clinical domain. MATERIALS AND METHODS: An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality, and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. RESULTS: The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents annotated with 39,511 entities, their assertions, and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. DISCUSSION: The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage, and active learning methods should be utilized to promote annotation efficiency. CONCLUSIONS: In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules was constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain.
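
The abstract above does not state which inter-annotator agreement statistic was used. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two annotators; the label sequences `annotator_a` and `annotator_b` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independence, from each annotator's label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical entity-type labels assigned by two annotators to six mentions.
annotator_a = ["disease", "disease", "symptom", "test", "symptom", "disease"]
annotator_b = ["disease", "symptom", "symptom", "test", "symptom", "disease"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

Raw percentage agreement here is 5/6, but kappa is lower because some of that agreement would be expected by chance given how often each annotator uses each label.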


Subject(s)
Data Curation , Natural Language Processing , Semantics , China , Data Mining , Humans , Language , Narration
6.
Prim Care Diabetes ; 2(3): 121-6, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18779035

ABSTRACT

AIMS: (1) To determine the incidence of type 1 diabetes mellitus in children aged <15 years in Harbin, China and (2) to examine the trend in incidence over the period from 1990 to 2000. METHODS: Newly diagnosed cases of type 1 diabetes from 1990 to 2000 were identified among 1,286,154 Chinese children aged 0-14 years in Harbin. The primary source of case ascertainment was hospital records and the secondary source was the health records of school clinics. RESULTS: One hundred and three cases were identified between 1990 and 2000. The annual incidence rate was 0.73 per 100,000 (95% CI: 0.59-0.88 per 100,000). No significant difference between males and females in the incidence of type 1 diabetes was observed. The incidence was significantly associated with age. With those aged <5 years as reference, the rate ratios were 2.06 and 4.1 for those aged 5-9 and 10-14 years, respectively. The incidence was higher in urban than in suburban regions, particularly among those aged 10-14 years. No significant seasonality was observed. There was a significant increasing trend in the incidence of type 1 diabetes over the period from 1990 to 2000, with an annual increase of 7.4% (95% CI: 1.6-13.5%). CONCLUSIONS: There is a significantly increasing trend in the incidence of type 1 diabetes among children in Harbin. The increased number of cases has important implications for diabetes care providers. Understanding the etiology of this rise is critical for developing preventive measures to halt the trend.
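
The reported annual incidence can be approximately reproduced from the figures in the abstract above. The sketch below makes assumptions the abstract does not confirm: the population of 1,286,154 is treated as constant over 11 calendar years (1990 to 2000 inclusive), and the confidence interval uses a simple normal approximation to the Poisson count, which may differ from the authors' method.

```python
import math
from typing import Tuple


def incidence_per_100k(cases: int, population: int, years: int) -> float:
    """Crude annual incidence per 100,000, assuming a constant population."""
    person_years = population * years
    return cases / person_years * 100_000


def approx_ci_per_100k(
    cases: int, population: int, years: int, z: float = 1.96
) -> Tuple[float, float]:
    """Approximate 95% CI: treat the case count as Poisson, so SD = sqrt(cases)."""
    person_years = population * years
    half_width = z * math.sqrt(cases)
    lower = max(cases - half_width, 0.0) / person_years * 100_000
    upper = (cases + half_width) / person_years * 100_000
    return lower, upper


# Figures from the abstract; the 11-year span is an assumption (1990-2000 inclusive).
rate = incidence_per_100k(cases=103, population=1_286_154, years=11)
lower, upper = approx_ci_per_100k(cases=103, population=1_286_154, years=11)
print(f"{rate:.2f} per 100,000 (approx. 95% CI: {lower:.2f}-{upper:.2f})")
```

Under these assumptions the point estimate rounds to the reported 0.73 per 100,000, and the approximate interval is close to, though not identical with, the reported 0.59-0.88.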


Subject(s)
Diabetes Mellitus, Type 1/epidemiology , Adolescent , Child , Child, Preschool , China/epidemiology , Ethnicity/statistics & numerical data , Female , Humans , Incidence , Infant , Male , Medical Records , Sex Characteristics