Search | Portal Regional da BVS

Comparing deep learning architectures for sentiment analysis on drug reviews.

Colón-Ruiz, Cristóbal; Segura-Bedmar, Isabel.

J Biomed Inform ; 110: 103539, 2020 10.

Article in English | MEDLINE | ID: mdl-32818665

ABSTRACT

Since the turn of the century, as millions of users' opinions have become available on the web, sentiment analysis has become one of the most fruitful research fields in Natural Language Processing (NLP). Research on sentiment analysis has covered a wide range of domains such as the economy, politics, and medicine, among others. In the pharmaceutical field, automatic analysis of online user reviews makes it possible to process large volumes of users' opinions and to obtain relevant information about the effectiveness and side effects of drugs, which could be used to improve pharmacovigilance systems. Over the years, approaches to sentiment analysis have progressed from simple rules to advanced machine learning techniques such as deep learning, which has become an emerging technology in many NLP tasks. Sentiment analysis has shared in this success: several systems based on deep learning have recently demonstrated their superiority over earlier methods, achieving state-of-the-art results on standard sentiment analysis datasets. However, prior work shows that very few attempts have been made to apply deep learning to sentiment analysis of drug reviews. We present a benchmark comparison of various deep learning architectures such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) recurrent neural networks. We propose several combinations of these models and also study the effect of different pre-trained word embedding models. As transformers have revolutionized the NLP field, achieving state-of-the-art results for many NLP tasks, we also explore Bidirectional Encoder Representations from Transformers (BERT) with a Bi-LSTM for the sentiment analysis of drug reviews. Our experiments show that BERT obtains the best results, but with a very high training time. CNN, on the other hand, achieves acceptable results while requiring less training time.
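
As a rough illustration of the convolutional approach mentioned in the abstract, the sketch below slides a single 1D filter over a sequence of word vectors and applies max-over-time pooling, a common building block of CNN text classifiers. The toy embeddings, the hand-set filter weights, and the names `relu` and `conv1d_feature` are assumptions made for illustration; they are not the authors' architecture or trained parameters.

```python
from typing import List

Vector = List[float]


def relu(x: float) -> float:
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def conv1d_feature(sequence: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Slide one filter over the word vectors and max-pool the activations.

    `kernel` spans len(kernel) consecutive words; each of its rows has the
    same dimensionality as a word vector.
    """
    width = len(kernel)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for word_vec, weights in zip(window, kernel):
            total += sum(x * w for x, w in zip(word_vec, weights))
        activations.append(relu(total))
    # Max-over-time pooling: keep the strongest response in the sequence.
    return max(activations)


# Hypothetical 2-dimensional embeddings for a short review (not real vectors).
toy_embeddings = {
    "this": [0.0, 0.1],
    "drug": [0.1, 0.0],
    "works": [0.9, 0.2],
    "well": [0.8, 0.3],
}
review = ["this", "drug", "works", "well"]
vectors = [toy_embeddings[token] for token in review]

# A hand-set bigram filter (not learned) that sums the first dimension
# of two adjacent words.
bigram_filter = [[1.0, 0.0], [1.0, 0.0]]
feature = conv1d_feature(vectors, bigram_filter)
```

In a real classifier many such filters of several widths would be learned from data, and their pooled outputs would feed a dense layer that predicts the sentiment class.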

Subjects

Deep Learning; Pharmaceutical Preparations; Machine Learning; Natural Language Processing; Neural Networks, Computer

Predicting of anaphylaxis in big data EMR by exploring machine learning approaches.

Segura-Bedmar, Isabel; Colón-Ruíz, Cristobal; Tejedor-Alonso, Miguél Ángel; Moro-Moro, Mar.

J Biomed Inform ; 87: 50-59, 2018 11.

Article in English | MEDLINE | ID: mdl-30266231

ABSTRACT

Anaphylaxis is a life-threatening allergic reaction that occurs suddenly after contact with an allergen. Epidemiological studies of anaphylaxis are very important for planning and evaluating new strategies to prevent this reaction, and also for guiding the treatment of patients who have just suffered an anaphylactic reaction. Electronic Medical Records (EMR) are one of the most effective and richest sources for the epidemiology of anaphylaxis, because they provide a low-cost way of accessing rich longitudinal data on large populations. A drawback, however, is that researchers have to manually review a huge amount of information, which is a very costly and time-consuming task. Our goal is therefore to explore different machine learning techniques for processing Big Data EMR, reducing the effort needed to perform epidemiological studies of anaphylaxis. In particular, we aim to study the incidence of anaphylaxis through the automatic classification of EMR. To do this, we employ the most widely used and efficient classifiers in text classification and compare different document representations, which range from well-known methods such as Bag of Words (BoW) to more recent ones based on word embedding models, such as a simple average of word embeddings or a bag of centroids of word embeddings. Because the identification of anaphylaxis cases in EMR is a class-imbalanced problem (less than 1% of the records describe anaphylaxis cases), we employ a novel undersampling technique based on clustering to balance our dataset. In addition to classical machine learning algorithms, we also use a Convolutional Neural Network (CNN) to classify our dataset. In general, experiments show that most classifiers and representations are effective (F1 above 90%). Logistic Regression, Linear SVM, Multilayer Perceptron and Random Forest achieve an F1 of around 95%; however, the linear methods have considerably lower training times. CNN provides slightly better performance (F1 = 95.6%).
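
To make two of the document representations named in the abstract more concrete, along with the F1 score used to report results, here is a minimal sketch that uses only the Python standard library. The vocabulary, the toy embedding values, the example labels, and the function names are assumptions made for illustration; the study's actual preprocessing, embedding models, classifiers, and undersampling technique are not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Sequence


def bag_of_words(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def average_embedding(tokens: Sequence[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token has an embedding")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tokenized clinical note, vocabulary, and 2-d embeddings.
note = ["severe", "reaction", "after", "peanut", "reaction"]
vocabulary = ["reaction", "peanut", "rash"]
toy_embeddings = {"reaction": [1.0, 0.0], "peanut": [0.0, 1.0]}

bow = bag_of_words(note, vocabulary)                 # sparse count vector
mean_vec = average_embedding(note, toy_embeddings)   # dense averaged vector

# Hypothetical gold labels and predictions (1 = anaphylaxis case).
score = f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Either vector could be passed to a classifier such as logistic regression. F1 matters here because, with under 1% positive records, plain accuracy would look high even for a model that never predicts an anaphylaxis case.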

Subjects

Anaphylaxis/diagnosis; Anaphylaxis/epidemiology; Electronic Health Records; Machine Learning; Medical Informatics/methods; Neural Networks, Computer; Algorithms; Big Data; Cluster Analysis; Decision Making; Humans; Language; Linear Models; Regression Analysis
