Search | Portal Regional da BVS

Hipson, Will E; Mohammad, Saif M.

PLoS One ; 16(9): e0256153, 2021.

Article in English | MEDLINE | ID: mdl-34543312

ABSTRACT

Emotion dynamics is a framework for measuring how an individual's emotions change over time. It is a powerful tool for understanding how we behave and interact with the world. In this paper, we introduce a framework to track emotion dynamics through one's utterances. Specifically, we introduce a number of utterance emotion dynamics (UED) metrics inspired by work in Psychology. We use this approach to trace emotional arcs of movie characters. We analyze thousands of such character arcs to test hypotheses that inform our broader understanding of stories. Notably, we show that there is a tendency for characters to use increasingly more negative words and become increasingly emotionally discordant with each other until about 90% of the narrative length. UED also has applications in behavior studies, social sciences, and public health.
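The abstract does not spell out the UED metrics, so the following is only a minimal sketch of one plausible ingredient: a rolling average of word valence over a speaker's utterances in temporal order. The lexicon `TOY_VALENCE`, the function `emotion_arc`, and the window size are all hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical toy valence lexicon (scores in [0, 1]); not the authors' resource.
TOY_VALENCE = {"love": 0.9, "happy": 0.9, "fine": 0.6,
               "lost": 0.3, "hate": 0.1, "dead": 0.1}

def emotion_arc(utterances, lexicon, window=3):
    """Return the rolling mean valence of lexicon words, in temporal order."""
    # Keep only words that the lexicon covers, preserving utterance order.
    scores = [lexicon[w] for u in utterances
              for w in u.lower().split() if w in lexicon]
    # One arc point per full window of consecutive scored words.
    return [mean(scores[i:i + window])
            for i in range(len(scores) - window + 1)]

# Made-up lines for a single character, earliest first.
lines = ["I love this happy town", "we are fine",
         "I lost it all", "I hate him", "he is dead"]
arc = emotion_arc(lines, TOY_VALENCE, window=3)
```

Under these assumptions, a falling `arc` would correspond to a character using increasingly negative words as the narrative progresses.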

Subjects

Communication, Emotions/physiology, Mental Processes/physiology, Motion Pictures/instrumentation, Psychological Theory, Humans

The natural selection of words: Finding the features of fitness.

Turney, Peter D; Mohammad, Saif M.

PLoS One ; 14(1): e0211512, 2019.

Article in English | MEDLINE | ID: mdl-30689665

ABSTRACT

We introduce a dataset for studying the evolution of words, constructed from WordNet and the Google Books Ngram Corpus. The dataset tracks the evolution of 4,000 synonym sets (synsets), containing 9,000 English words, from 1800 AD to 2000 AD. We present a supervised learning algorithm that is able to predict the future leader of a synset: the word in the synset that will have the highest frequency. The algorithm uses features based on a word's length, the characters in the word, and the historical frequencies of the word. It can predict change of leadership (including the identity of the new leader) fifty years in the future, with an F-score considerably above random guessing. Analysis of the learned models provides insight into the causes of change in the leader of a synset. The algorithm confirms observations linguists have made, such as the trend to replace the -ise suffix with -ize, the rivalry between the -ity and -ness suffixes, and the struggle between economy (shorter words are easier to remember and to write) and clarity (longer words are more distinctive and less likely to be confused with one another). The results indicate that integration of the Google Books Ngram Corpus with WordNet has significant potential for improving our understanding of how language evolves.
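As a rough illustration of the setup described above, the sketch below shows how the leader of a synset could be identified from frequencies and what features of the kinds listed (length, characters, historical frequency) might look like. The function names, the specific features, and the frequency values are assumptions made for illustration; they are not the authors' feature set or data, and the learning step itself is omitted.

```python
def leader(freqs):
    """Return the word with the highest frequency in a synset."""
    return max(freqs, key=freqs.get)

def features(word, history):
    """Toy feature dictionary: length, suffix flags, and frequency trend."""
    return {
        "length": len(word),
        "ends_ize": word.endswith("ize"),
        "ends_ise": word.endswith("ise"),
        # Change in frequency from the first to the last observed period.
        "trend": history[-1] - history[0],
    }

# Made-up frequencies over three periods for a two-word synset.
synset = {"organise": [5.0, 4.0, 3.0], "organize": [2.0, 4.5, 6.0]}

# The current leader is decided by the most recent period only.
now = {word: history[-1] for word, history in synset.items()}
current_leader = leader(now)
feats = {word: features(word, history) for word, history in synset.items()}
```

A supervised learner would then be trained on such feature dictionaries from one period to predict the leader some years later.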

Subjects

Choice Behavior, Language, Theoretical Models, Terminology as Topic, Verbal Learning, Humans, Phonetics

Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task.

Sarker, Abeed; Belousov, Maksim; Friedrichs, Jasper; Hakala, Kai; Kiritchenko, Svetlana; Mehryary, Farrokh; Han, Sifei; Tran, Tung; Rios, Anthony; Kavuluru, Ramakanth; de Bruijn, Berry; Ginter, Filip; Mahata, Debanjan; Mohammad, Saif M; Nenadic, Goran; Gonzalez-Hernandez, Graciela.

J Am Med Inform Assoc ; 25(10): 1274-1283, 2018 Oct 1.

Article in English | MEDLINE | ID: mdl-30272184

ABSTRACT

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.

Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks.

Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems.

Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1).

Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
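To make the evaluation terms concrete, here is a minimal sketch of a micro-averaged F1-score restricted to a subset of classes, together with a simple majority-vote ensemble. The abstract does not say how the system combinations were built, so majority voting is an assumption, and the labels and runs below are made up.

```python
from collections import Counter

def micro_f1(gold, pred, classes):
    """Micro-averaged F1 with counts pooled over the given classes only."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == p and p in classes)
    fp = sum(1 for g, p in pairs if g != p and p in classes)
    fn = sum(1 for g, p in pairs if g != p and g in classes)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def majority_vote(runs):
    """Combine per-instance labels from several systems by majority vote.

    On a tie, the label seen first (from the earliest run) is kept.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*runs)]

# Made-up gold labels and three hypothetical system runs (classes 1, 2, 3).
gold = [1, 2, 3, 1, 2, 3]
run_a = [1, 2, 3, 3, 3, 3]
run_b = [1, 3, 3, 1, 2, 1]
run_c = [2, 2, 1, 1, 2, 3]

ensemble = majority_vote([run_a, run_b, run_c])
score_a = micro_f1(gold, run_a, classes={1, 2})
score_ensemble = micro_f1(gold, ensemble, classes={1, 2})
```

In this toy case each run makes different errors, so the vote recovers every gold label, which is the kind of complementarity that can let an ensemble outperform its individual systems.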

Subjects

Drug-Related Side Effects and Adverse Reactions/classification, Natural Language Processing, Neural Networks (Computer), Social Media/classification, Support Vector Machine, Data Mining/methods, Humans, Pharmacovigilance

Binary classifiers and latent sequence models for emotion detection in suicide notes.

Cherry, Colin; Mohammad, Saif M; de Bruijn, Berry.

Biomed Inform Insights ; 5(Suppl. 1): 147-54, 2012.

Article in English | MEDLINE | ID: mdl-22879771

ABSTRACT

This paper describes the National Research Council of Canada's submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.
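The "one classifier per emotion" idea can be sketched as follows: each emotion gets its own yes/no classifier, and a sentence receives every label whose classifier fires. This sketch substitutes a plain perceptron over bag-of-words features for the large-margin learner mentioned in the abstract, and the sentences, labels, and function names are invented for illustration.

```python
from collections import defaultdict

def tokens(sentence):
    """Bag-of-words features plus a constant bias feature."""
    return sentence.lower().split() + ["<bias>"]

def train_binary(sentences, targets, epochs=5):
    """Train a perceptron for a single yes/no label (a stand-in learner)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for sent, y in zip(sentences, targets):
            feats = tokens(sent)
            predicted = sum(w[f] for f in feats) > 0
            if predicted != y:
                # Move the weights toward the correct answer.
                for f in feats:
                    w[f] += 1.0 if y else -1.0
    return w

def train_per_label(sentences, label_sets, labels):
    """Train one independent binary classifier per emotion label."""
    return {lab: train_binary(sentences, [lab in s for s in label_sets])
            for lab in labels}

def predict(models, sentence):
    """Return every label whose classifier scores the sentence above zero."""
    feats = tokens(sentence)
    return {lab for lab, w in models.items()
            if sum(w.get(f, 0.0) for f in feats) > 0}

# Invented training data: each sentence carries zero or more labels.
train_x = ["i love you all", "i am so sorry",
           "i love you and i am sorry", "please feed the cat"]
train_y = [{"love"}, {"guilt"}, {"love", "guilt"}, set()]
models = train_per_label(train_x, train_y, labels=["love", "guilt"])
```

Because each label is trained separately, a sentence can end up with no labels, one label, or several, which matches the multi-label setting of the task.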
