Results 1 - 10 of 10
1.
Sensors (Basel) ; 22(24)2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36560177

ABSTRACT

Query understanding (QU) plays a vital role in natural language processing, particularly in question answering and dialogue systems. QU identifies the named entities and the query intent in users' questions. Traditional pipeline approaches handle these two tasks, named entity recognition (NER) and question classification (QC), separately. NER is treated as a sequence labeling task that predicts keywords, while QC is a semantic classification task that predicts the user's intent. Given the correlation between the two tasks, training them jointly could benefit both. Kazakh is a low-resource language with rich lexical and agglutinative characteristics. We argue that current QU techniques do not fully exploit the word-level and sentence-level features of agglutinative languages, especially stems, suffixes, POS tags, and gazetteers. This paper proposes a new multi-task learning model for query understanding (MTQU). The MTQU model establishes direct connections between the QC and NER tasks so that they reinforce each other, and it uses a multi-feature input layer that significantly influenced the model's performance during training. In addition, we constructed new corpora for the Kazakh query understanding task, named KQU. The MTQU model is simple and effective and obtains competitive results on KQU.
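The abstract does not give MTQU's loss function. As a minimal sketch of the generic idea of training NER and QC together, assuming a shared model that outputs one tag distribution per token and one intent distribution per query, the two cross-entropy losses can be combined into a single weighted objective. The weight `lam` and all function names are hypothetical, and the sketch does not reproduce MTQU's direct task connections or its multi-feature input layer.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], gold: int) -> float:
    """Negative log-likelihood of the gold class under a predicted distribution."""
    return -math.log(probs[gold])


def joint_loss(
    ner_probs: List[Sequence[float]],  # one tag distribution per token
    ner_gold: List[int],               # gold tag index per token
    qc_probs: Sequence[float],         # intent distribution for the whole query
    qc_gold: int,                      # gold intent index
    lam: float = 0.5,                  # hypothetical task weight, not from the paper
) -> float:
    """Weighted sum of the mean token-level NER loss and the sentence-level QC loss."""
    ner = sum(cross_entropy(p, g) for p, g in zip(ner_probs, ner_gold)) / len(ner_gold)
    qc = cross_entropy(qc_probs, qc_gold)
    return lam * ner + (1.0 - lam) * qc
```

Because both losses flow into one scalar, gradients from each task update any shared parameters, which is the basic mechanism by which joint training lets the tasks inform each other.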


Subjects
Language, Natural Language Processing, Semantics, Machine Learning, Learning
2.
Heliyon ; 7(10): e08216, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34746470

ABSTRACT

Question classification is a crucial task for answer selection. It can help define the structure of a question sentence through features extracted from it, such as who, when, where, and how. In this paper, we propose a methodology that improves question classification from text by using feature selection and word embedding techniques. We conducted several experiments to evaluate the proposed methodology on two datasets (the TREC-6 dataset and a Thai sentence dataset), using term frequency and combined term frequency-inverse document frequency (TF-IDF) with Unigram, Unigram+Bigram, and Unigram+Trigram features. Both traditional and deep learning classifiers were used. The traditional models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM). The deep learning models were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and a hybrid model combining CNN and BiLSTM. The experimental results showed that our methodology based on Part-of-Speech (POS) tagging improved question classification accuracy the most. On the TREC-6 dataset, question categories were classified with an average micro F1-score of 0.98 when the SVM model was applied with all POS tags added. On the Thai sentence dataset, the highest average micro F1-score, 0.8, was achieved by applying GloVe embeddings with the CNN model and adding focusing tags.
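The n-gram feature settings in this abstract (Unigram, Unigram+Bigram) and the micro F1 metric can be illustrated with a small standard-library sketch. It assumes pre-tokenized questions and the common idf variant log(N / df); the paper's exact weighting and preprocessing are not given in the abstract, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n_max: int = 2) -> List[str]:
    """All contiguous n-grams for n = 1..n_max (n_max=2 gives Unigram+Bigram)."""
    feats: List[str] = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def tfidf(docs: List[List[str]], n_max: int = 2) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: raw count times log(N / document frequency)."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter(g for c in counts for g in c)  # each document counts a gram once
    n = len(docs)
    return [{g: tf * math.log(n / df[g]) for g, tf in c.items()} for c in counts]


def micro_f1(gold: List[str], pred: List[str]) -> float:
    """Micro-averaged F1 over all classes for single-label predictions."""
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = len(pred) - tp  # a wrong prediction is a false positive for the predicted class
    fn = len(gold) - tp  # and also a false negative for the gold class
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

For single-label classification, each wrong prediction counts once as a false positive and once as a false negative, so micro F1 reduces to accuracy.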

3.
PeerJ Comput Sci ; 7: e570, 2021.
Article in English | MEDLINE | ID: mdl-34435091

ABSTRACT

Question classification is one of the essential tasks in implementing automatic question answering in natural language processing (NLP). Recently, several text-mining problems, such as text classification, document categorization, web mining, sentiment analysis, and spam filtering, have been successfully addressed by deep learning approaches. In this study, we investigated several deep learning approaches for question classification in Turkish, a highly inflected language. We trained and tested the deep learning architectures on a dataset of Turkish questions. We used three main deep learning approaches (Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN)) and two combined architectures, CNN-GRU and CNN-LSTM. Furthermore, we applied the Word2vec technique with both the skip-gram and CBOW methods to learn word embeddings of various vector sizes on a large corpus of user questions. For comparison, we evaluated the deep learning architectures by test accuracy and 10-fold cross-validation accuracy. The experimental results show that the choice of Word2vec technique has a considerable impact on accuracy across the different deep learning approaches. We attained an accuracy of 93.7% on the question dataset using these techniques.
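The 10-fold cross-validation mentioned above splits the data into ten folds and uses each fold once as the test set. A minimal index-splitting sketch follows, assuming the examples were already shuffled (the abstract does not describe the split procedure); `evaluate` is a hypothetical callback that trains a model on the training indices and returns its accuracy on the test indices.

```python
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds.

    Fold sizes differ by at most one; the data are assumed to be shuffled beforehand.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_accuracy(
    n: int, evaluate: Callable[[List[int], List[int]], float], k: int = 10
) -> float:
    """Mean of the per-fold accuracies returned by the hypothetical `evaluate` callback."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)
```

Every example appears in exactly one test fold, so the averaged score uses all of the data for evaluation while never testing on an example the model was trained on in that fold.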

4.
BMC Public Health ; 20(1): 990, 2020 Jun 23.
Article in English | MEDLINE | ID: mdl-32576159

ABSTRACT

BACKGROUND: Today, most people use the Internet to seek health-related information from general public health websites and discussion groups. However, until now there has been no Internet-based analysis of health information needs pertaining to diabetes in China. With the development of artificial intelligence, we can analyze this online health-related information and provide references for health providers to improve their health services. METHODS: We statistically analyzed 151,589 questions about diabetes collected from 39 health websites. We divided these questions into 9 categories using a convolutional neural network. RESULTS: Consumers' diabetes questions were distributed as follows: diagnosis: 34.95%, treatment: 25.17%, lifestyle: 21.09%, complication: 8.00%, maternity-related: 5.00%, prognosis: 2.59%, health provider choosing: 1.40%, prevention: 1.23%, others: 0.58%. The elderly are more concerned about the treatment and complications of diabetes, while the young are more concerned about maternity-related issues and the prognosis of diabetes. The diabetes drugs most frequently mentioned by consumers are insulin, metformin, and Xiaoke pills. The complications of greatest concern are cardiovascular disease and diabetic eye disease. CONCLUSION: Diabetes health education should focus on how to prevent diabetes, and its content should differ across age groups; regarding diabetes treatment, education on the use of insulin and oral hypoglycemic drugs should be strengthened.


Subjects
Consumer Health Information/statistics & numerical data, Diabetes Mellitus, Internet/statistics & numerical data, Needs Assessment/statistics & numerical data, Adult, Aged, Artificial Intelligence, China, Female, Humans, Male, Middle Aged, Neural Networks, Computer, Pregnancy, Young Adult
5.
J Med Internet Res ; 22(4): e13071, 2020 04 16.
Article in English | MEDLINE | ID: mdl-32297872

ABSTRACT

BACKGROUND: Since the turn of this century, the internet has become an invaluable resource for people seeking health information and answers to health-related queries. Health question and answer websites have grown in popularity in recent years as a means for patients to obtain health information from medical professionals. For patients suffering from chronic illnesses, it is vital that health care providers become better acquainted with patients' information needs and learn how they express them in text format. OBJECTIVE: The aims of this study were to: (1) explore whether patients can accurately and adequately express their information needs on health question and answer websites, (2) identify what types of problems are of most concern to those suffering from chronic illnesses, and (3) determine the relationship between question characteristics and the number of answers received. METHODS: Questions were collected from a leading Chinese health question and answer website called "All questions will be answered" in January 2018. We focused on questions relating to diabetes and hepatitis, including those that were free and those that were financially rewarded. Content analysis was completed on a total of 7068 (diabetes) and 6685 (hepatitis) textual questions. Correlations between the characteristics of questions (number of words per question, value of reward) and the number of answers received were evaluated using linear regression analysis. RESULTS: The majority of patients are able to accurately express their problem in text format, while some patients may require minor social support. The questions posted were related to three main topics: (1) prevention and examination, (2) diagnosis, and (3) treatment. Patients with diabetes were most concerned with the treatment received, whereas patients with hepatitis focused on the diagnosis results. The number of words per question and the value of the reward were negatively correlated with the number of answers. 
CONCLUSIONS: This study provides valuable insights into the ability of patients suffering from chronic illnesses to make an understandable request on health question and answer websites. Health topics relating to diabetes and hepatitis were classified to address the health information needs of chronically ill patients. Furthermore, identification of the factors affecting the number of answers received per question can help users of these websites to better frame their questions to obtain more valuable answers.
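The study evaluated the relationship between question characteristics and the number of answers with linear regression. As a sketch of the underlying computation, assuming a single predictor (for example, words per question) and ordinary least squares, the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x; a negative slope corresponds to the negative relationship reported. The function name is illustrative.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b * x by ordinary least squares; return (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # n times the covariance
    sxx = sum((xi - mx) ** 2 for xi in x)                      # n times the variance of x
    b = sxy / sxx
    return my - b * mx, b
```

For instance, `ols_fit([1, 2, 3, 4], [10, 8, 6, 4])` on invented toy data (not data from the study) returns `(12.0, -2.0)`: each additional unit of x is associated with two fewer answers.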


Subjects
Data Collection/methods, Internet, Patients/statistics & numerical data, Physicians/standards, China, Humans, Surveys and Questionnaires
6.
Article in English | MEDLINE | ID: mdl-33664987

ABSTRACT

In recent years, the social web has been increasingly used for health information seeking, sharing, and subsequent health-related research. Women often use the Internet or social networking sites to seek information related to different stages of pregnancy. They may ask questions about birth control, trying to conceive, labor, or caring for a newborn or baby. Classifying different types of pregnancy-related questions (e.g., before, during, and after pregnancy) can inform the design of social media and professional websites for pregnancy education and support. This research investigates attention mechanisms, either built into the BERT model or added on top of it, for classifying and annotating pregnancy-related questions posted on a community Q&A site. We evaluated two BERT-based models and compared them against traditional machine learning models for question classification. Most importantly, we investigated two attention mechanisms for relevant term annotation: the built-in self-attention mechanism of BERT and an additional attention layer on top of BERT. The classification results showed that the BERT-based models performed better than the traditional models, and that BERT with an additional attention layer can achieve higher overall precision than the basic BERT model. The results also showed that the two attention mechanisms behave differently when annotating relevant content, and that both could serve as feature selection methods for text mining in general.
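The abstract does not describe the structure of the additional attention layer. One common form, assumed here, scores each contextual token vector (such as a BERT output) against a learned vector, normalizes the scores with softmax, and pools the tokens by those weights; the weights themselves can then be read as per-token relevance for annotation. The vector `w` and the function names are hypothetical, and plain lists stand in for BERT's output tensor.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    tokens: List[List[float]], w: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (per-token weights, pooled vector) for dot-product attention pooling.

    tokens: one contextual vector per token; w: hypothetical learned scoring vector.
    """
    scores = [sum(t_d * w_d for t_d, w_d in zip(t, w)) for t in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    pooled = [sum(a * t[d] for a, t in zip(weights, tokens)) for d in range(dim)]
    return weights, pooled
```

In a trained model, the pooled vector would feed the classification layer, and tokens with the largest weights would be candidates for the "relevant term" annotations the abstract describes.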

7.
J Biomed Inform ; 93: 103143, 2019 05.
Article in English | MEDLINE | ID: mdl-30872137

ABSTRACT

Question classification is considered one of the most significant phases of a typical Question Answering (QA) system. It assigns answer types to each question, which narrows down the search space of possible answers for factoid and list questions. The process of assigning answer types to each question is also known as Lexical Answer Type (LAT) prediction. Although much work has been done to improve the classification of questions into coarse and fine classes in diverse domains, it remains a challenging task in the biomedical field. The difficulty of biomedical question classification stems from the fact that one question may have more than one label, or expected answer type, associated with it (referred to as multi-label classification). In the biomedical domain, only preliminary work has been done on classifying multi-label questions, by transforming them into single-label questions through the copy transformation technique. In this paper, we generated a multi-labeled corpus (MLBioMedLAT) by exploring the process of the Open Advancement of Question Answering (OAQA) system for the task of biomedical question classification. We use 780 biomedical questions from the BioASQ challenge and assign them appropriate labels. To annotate these labels, we use the answers to each question and assign semantic type labels to the question by leveraging an existing corpus and the OAQA system. The paper introduces a data transformation approach, Label Power Set with logistic regression (LPLR), for multi-label biomedical question classification and compares its performance with Structured SVM (SSVM), Restricted Boltzmann Machine (RBM), and copy transformation based logistic regression (CLR) (previously used for a similar task in the OAQA system). To evaluate the introduced data transformation technique, we use three prominent evaluation measures: MicroF1, Accuracy, and Hamming Loss.
Regarding MicroF1, our introduced technique coupled with a new feature set surpasses CLR, SSVM, and RBM with a margin of 7%, 8%, and 22% respectively.
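The difference between the copy transformation and the Label Power Set transformation can be shown on toy label sets. Copy transformation emits one single-label example per assigned label, while Label Power Set maps each distinct label combination to one class, so an ordinary multi-class classifier such as logistic regression can be trained on the class ids. Hamming loss is the fraction of question-label pairs on which the gold and predicted label sets disagree. This is an illustrative sketch with hypothetical names, not the LPLR implementation from the paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def copy_transform(questions: List[str], label_sets: List[Set[str]]) -> List[Tuple[str, str]]:
    """Copy transformation: one single-label (question, label) pair per assigned label."""
    return [(q, lab) for q, labs in zip(questions, label_sets) for lab in sorted(labs)]


def label_powerset(label_sets: List[Set[str]]) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label Power Set: each distinct label combination becomes one multi-class id."""
    mapping: Dict[FrozenSet[str], int] = {}
    ids: List[int] = []
    for labs in label_sets:
        key = frozenset(labs)
        if key not in mapping:
            mapping[key] = len(mapping)
        ids.append(mapping[key])
    return ids, mapping


def hamming_loss(gold: List[Set[str]], pred: List[Set[str]], n_labels: int) -> float:
    """Fraction of (question, label) pairs where gold and predicted membership differ."""
    errors = sum(len(g ^ p) for g, p in zip(gold, pred))  # symmetric difference per question
    return errors / (len(gold) * n_labels)
```

Decoding a Label Power Set prediction inverts `mapping` to recover the label set; one known limitation of the approach is that it can only predict label combinations seen during training.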


Subjects
Semantics, Humans, Logistic Models, Machine Learning
8.
Methods Inf Med ; 56(3): 209-216, 2017 May 18.
Article in English | MEDLINE | ID: mdl-28361158

ABSTRACT

BACKGROUND AND OBJECTIVE: Biomedical question type classification is one of the important components of an automatic biomedical question answering system. The performance of the latter depends directly on the performance of its biomedical question type classification system, which assigns a category to each question in order to determine the appropriate answer extraction algorithm. This study aims to automatically classify biomedical questions into one of four categories: (1) yes/no, (2) factoid, (3) list, and (4) summary. METHODS: In this paper, we propose a biomedical question type classification method based on machine learning approaches to automatically assign a category to a biomedical question. First, we extract features from biomedical questions using the proposed handcrafted lexico-syntactic patterns. Then, we feed these features to machine-learning algorithms. Finally, the class label is predicted using the trained classifiers. RESULTS: Experimental evaluations on large standard annotated datasets of biomedical questions, provided by the BioASQ challenge, demonstrated that our method performs significantly better than four baseline systems. The proposed method achieves a roughly 10-point increase in accuracy over the best baseline. Moreover, the results show that using handcrafted lexico-syntactic patterns as features for a support vector machine (SVM) leads to the highest accuracy, 89.40%. CONCLUSION: The proposed method can automatically classify BioASQ questions into one of four categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrated that our method produced the best classification performance compared to four baseline systems.
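The paper's handcrafted lexico-syntactic patterns are not listed in the abstract. The sketch below only illustrates the general approach of turning surface patterns into binary features that a classifier such as an SVM could consume; the regular expressions and feature names are invented for illustration and are not the authors' patterns.

```python
import re
from typing import Dict

# Invented, illustrative patterns; NOT the handcrafted patterns from the paper.
PATTERNS: Dict[str, str] = {
    "starts_with_aux": r"^(is|are|was|were|do|does|did|can|could|has|have|should|will)\b",
    "wh_word": r"^(what|which|who|where|when|how)\b",
    "list_cue": r"\b(list|name all|what are)\b",
    "summary_cue": r"\b(describe|explain|what is the (role|function|mechanism))\b",
}


def pattern_features(question: str) -> Dict[str, int]:
    """Binary lexico-syntactic features (1 if the pattern fires) for a downstream classifier."""
    q = question.strip().lower()
    return {name: int(re.search(rx, q) is not None) for name, rx in PATTERNS.items()}
```

An auxiliary verb at the start of a question is a typical cue for the yes/no type, and words such as "list" or "describe" hint at list and summary questions; the classifier, rather than the patterns alone, decides how to weigh these cues.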


Subjects
Biological Ontologies, Information Storage and Retrieval/methods, Machine Learning, Natural Language Processing, Semantics, Surveys and Questionnaires/classification, Pattern Recognition, Automated/methods
9.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-512153

ABSTRACT

Taking the dietary questions of diabetic patients as an example, the paper proposes a function-based question classification system from the users' perspective, classifies the questions raised by patients using the Support Vector Machine (SVM) algorithm, and provides important support for building a deep automatic Question Answering (QA) system.

10.
Health Informatics J ; 22(3): 523-35, 2016 09.
Article in English | MEDLINE | ID: mdl-25759063

ABSTRACT

This article examines methods for the automated classification of cancer-related questions that people have asked on the web. This work is part of a broader effort to provide automated question answering for health education. We created a new corpus of consumer-health questions related to cancer and a new taxonomy for those questions. We then compared the effectiveness of different statistical methods for developing classifiers, including weighted classification and resampling. Basic methods for building classifiers were limited by the high variability in the natural distribution of questions, and typical refinement approaches, such as feature selection and merging categories, achieved only small improvements in classifier accuracy. The best performance was achieved using weighted classification and resampling methods, the latter yielding F1 = 0.963. Thus, it would appear that statistical classifiers can be trained on natural data, but only if the natural class distributions are smoothed. Such classifiers would be useful for automated question answering, for enriching web-based content, or for assisting clinical professionals in answering questions.
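The abstract names weighted classification and resampling without giving details. Two standard forms, assumed here, are inverse-frequency class weights, n / (k * count), where n is the number of examples and k the number of classes, and random oversampling of minority classes until every class matches the largest one. Function names and the fixed seed are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights n / (k * count_c): rare classes weigh more in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def random_oversample(
    xs: Sequence[T], ys: Sequence[str], seed: int = 0
) -> Tuple[List[T], List[str]]:
    """Duplicate random minority-class examples until each class matches the largest class."""
    rng = random.Random(seed)
    by_class: Dict[str, List[T]] = {}
    for x, y in zip(xs, ys):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_x: List[T] = []
    out_y: List[str] = []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_x.extend(items + extra)
        out_y.extend([y] * (len(items) + len(extra)))
    return out_x, out_y
```

Both approaches smooth the class distribution the classifier sees: weighting changes how much each error costs, while oversampling changes how often minority examples appear, at the price of repeated duplicates.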


Subjects
Algorithms, Information Storage and Retrieval/classification, Neoplasms, Databases, Factual, Health Education, Humans, Information Dissemination/methods, Internet