1.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 3605-3608, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946657

ABSTRACT

Mental health is a growing concern, and its problems range from an inability to cope with day-to-day stress to severe conditions such as depression. The ability to detect these symptoms relies heavily on accurate measurement of emotion and its components, such as emotional valence, comprising positive, negative, and neutral affect. Speech is an attractive bio-signal for measuring valence because of the ubiquity of smartphones, which can easily record and process speech signals. Speech-based emotion detection uses a broad spectrum of features derived from audio samples, including pitch, energy, Mel Frequency Cepstral Coefficients (MFCCs), Linear Predictive Cepstral Coefficients, log frequency power coefficients, spectrograms, and so on. Despite this array of features and classifiers, detecting valence from speech alone remains a challenge. Further, the algorithms for extracting some of these features are compute-intensive. This becomes a problem particularly in smartphone applications, where the algorithms have to be executed on the device itself. We propose a novel time-domain feature that not only improves valence detection accuracy but also saves 10% of the computational cost of extraction compared to that of MFCCs. A Random Forest Regressor operating on the proposed feature set detects speaker-independent valence on a non-acted database with 70% accuracy. The algorithm also achieves 100% accuracy when tested on the acted speech database Emo-DB.


Subject(s)
Algorithms , Emotions , Speech , Databases, Factual , Humans
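
The abstract above outlines a feature-extraction-plus-regression pipeline. The following is a minimal Python sketch of that idea, assuming librosa and scikit-learn: cheap time-domain descriptors (RMS energy and zero-crossing rate, used here only as stand-ins, since the paper's novel feature is not specified in the abstract) are combined with an MFCC baseline and fed to a Random Forest Regressor. File names, valence labels, and hyperparameters are illustrative assumptions, not the authors' implementation.

# Sketch only: stand-in features and hypothetical data, not the paper's method.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestRegressor

def extract_features(path, sr=16000, use_mfcc=True):
    """Return a fixed-length feature vector for one utterance."""
    y, sr = librosa.load(path, sr=sr)
    # Cheap time-domain descriptors: RMS energy and zero-crossing rate.
    rms = librosa.feature.rms(y=y)
    zcr = librosa.feature.zero_crossing_rate(y)
    feats = [rms.mean(), rms.std(), zcr.mean(), zcr.std()]
    if use_mfcc:
        # Frequency-domain baseline the abstract compares against.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        feats.extend(mfcc.mean(axis=1))
        feats.extend(mfcc.std(axis=1))
    return np.array(feats)

# Hypothetical utterance list with continuous valence labels in [-1, 1].
paths = ["utt_001.wav", "utt_002.wav", "utt_003.wav"]
valence = np.array([0.4, -0.7, 0.0])

X = np.vstack([extract_features(p) for p in paths])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, valence)
print(model.predict(X[:1]))

In practice, dropping use_mfcc and relying only on time-domain descriptors is what would yield the extraction-cost saving the abstract claims for its proposed feature.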
2.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 4241-4244, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30441290

ABSTRACT

Concern for psychological well-being at the workplace has increased the demand for detecting emotions with higher accuracy. Speech, one of the least obtrusive modes of capturing emotions at the workplace, still lacks robust emotion annotation mechanisms for non-acted speech corpora. In this paper, we extend the experiments on our non-acted speech database in two ways. First, we report how participants themselves perceive the emotion in their own voice after a long gap of about six months, and how a third person, who had not heard the clips earlier, perceives the emotion in the same utterances. Both annotators also rated the intensity of the emotion. They agreed better on neutral (84%) and negative clips (74%) than on positive ones (38%). Second, we restrict our attention to the samples on which the annotators agreed and show a classification accuracy of 80% by machine learning, an improvement of 7% over the state-of-the-art results for speaker-dependent classification. This result suggests that the high-level perception of emotion does translate to the low-level features of speech. Further analysis shows that silently expressed positive and negative emotions are often misinterpreted as neutral. For the speaker-independent test set, we report an overall accuracy of 61%.


Subject(s)
Emotions , Speech Perception , Speech , Voice , Humans , Machine Learning
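
The second abstract describes two analysis steps: measuring per-class agreement between self- and third-person annotations, and training a classifier only on the clips where both annotators agreed. Below is a minimal sketch of both steps, assuming pandas and scikit-learn; the annotation table, the random stand-in features, and the Random Forest classifier are illustrative assumptions rather than the authors' setup.

# Sketch only: hypothetical annotations and random stand-in features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical annotation table: one row per clip.
df = pd.DataFrame({
    "self_label":  ["neutral", "positive", "negative", "neutral", "positive"],
    "third_label": ["neutral", "neutral",  "negative", "neutral", "positive"],
})

# Percent agreement between the two annotators, broken down by class.
agreed = df["self_label"] == df["third_label"]
print(agreed.groupby(df["self_label"]).mean())

# Keep only the clips with agreement and train a classifier on precomputed
# acoustic features (random numbers stand in for them here).
features = np.random.rand(len(df), 20)
X, y = features[agreed.values], df.loc[agreed, "self_label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))

Restricting training to agreed-upon samples is what the abstract credits for the 7% improvement in speaker-dependent accuracy; the split here would be replaced by speaker-dependent or speaker-independent partitions in a real evaluation.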