Results 1 - 5 of 5
1.
J Acoust Soc Am ; 147(5): 3657, 2020 May.
Article in English | MEDLINE | ID: mdl-32486769

ABSTRACT

Carnatic music (CM) is characterized by continuous pitch variations called gamakas, which are learned by example. Precision is measured on the points of zero-slope in gamaka- and non-gamaka-segments of the pitch curve as the standard deviation (SD) of the error in their pitch with respect to targets. Two previous techniques are considered to identify targets: the nearest semitone and the most likely mean of a semi-continuous Gaussian mixture model. These targets are employed irrespective of where the points of zero-slope occur in the pitch curve. The authors propose segmenting CM pitch curves into non-overlapping components called constant-pitch notes (CPNs) and stationary points (STAs), i.e., points where the pitch curve outside the CPNs changes direction. Targets are obtained statistically from the histograms of the mean pitch-values of CPNs, anchors (CPNs adjacent to STAs), and STAs. The upper and lower quartiles of SDs of errors in long CPNs (9-15 cents), short CPNs (20-26 cents), and STAs (41-54 cents) are separable, which justifies the component-wise treatment. The CPN-STA model also brings out a hitherto unreported structure in ragas and explains the precision obtained using the previous techniques.
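The CPN-STA segmentation can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the flatness tolerance, the minimum CPN length, and the simple run-based CPN detection are all assumptions.

```python
def segment_pitch(pitch, flat_tol=10.0, min_cpn_len=5):
    """Split a sampled pitch curve (in cents) into constant-pitch notes
    (CPNs) and stationary points (STAs), loosely following the CPN-STA
    idea: a CPN is a sufficiently long run where pitch stays within
    flat_tol of the run's start; STAs are direction changes of the
    curve outside the CPNs."""
    cpns, i, n = [], 0, len(pitch)
    covered = [False] * n
    while i < n:
        j = i
        while j + 1 < n and abs(pitch[j + 1] - pitch[i]) <= flat_tol:
            j += 1
        if j - i + 1 >= min_cpn_len:
            cpns.append((i, j))               # (start, end) indices of the CPN
            for k in range(i, j + 1):
                covered[k] = True
        i = j + 1
    # STAs: slope sign changes (local extrema) not inside any CPN
    stas = [k for k in range(1, n - 1)
            if not covered[k]
            and (pitch[k] - pitch[k - 1]) * (pitch[k + 1] - pitch[k]) < 0]
    return cpns, stas
```

Per-component histograms of CPN means, anchor means, and STA pitch values would then yield the statistical targets described above.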

2.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 3605-3608, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946657

ABSTRACT

Mental health is a growing concern, and its problems range from inability to cope with day-to-day stress to severe conditions like depression. The ability to detect these symptoms relies heavily on accurate measurements of emotion and its components, such as emotional valence, comprising positive, negative, and neutral affect. Speech as a bio-signal to measure valence is interesting because of the ubiquity of smartphones that can easily record and process speech signals. Speech-based emotion detection uses a broad spectrum of features derived from audio samples, including pitch, energy, Mel Frequency Cepstral Coefficients (MFCCs), linear predictive cepstral coefficients, log frequency power coefficients, spectrograms, and so on. Despite the array of features and classifiers, detecting valence from speech alone remains a challenge. Further, the algorithms for extracting some of these features are compute-intensive. This becomes a problem particularly in smartphone applications, where the algorithms have to be executed on the device itself. We propose a novel time-domain feature that not only improves the valence detection accuracy but also saves 10% of the computational cost of extraction compared to that of MFCCs. A Random Forest Regressor operating on the proposed feature set detects speaker-independent valence on a non-acted database with 70% accuracy. The algorithm also achieves 100% accuracy when tested on the acted speech database Emo-DB.


Subject(s)
Algorithms; Emotions; Speech; Databases, Factual; Humans
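The abstract does not specify the novel time-domain feature, so the sketch below computes two generic time-domain stand-ins (short-time energy and zero-crossing rate) to illustrate why such features are cheap: they avoid the FFT and filterbank stages of MFCC extraction entirely. The feature choice and framing are assumptions; per-frame vectors like these would then feed a regressor such as scikit-learn's RandomForestRegressor.

```python
def time_domain_features(frame):
    """Cheap time-domain descriptors for one speech frame: short-time
    energy and zero-crossing rate. Both are stand-ins for the paper's
    (unspecified) feature; neither needs a frequency transform, which
    is the source of the compute savings relative to MFCCs."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (n - 1)
    return [energy, zcr]
```

In a smartphone pipeline, each recorded utterance would be split into short frames (e.g., 25 ms) and the per-frame vectors aggregated before regression.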
3.
IEEE Trans Signal Process ; 67(11): 2923-2936, 2019 Jun 01.
Article in English | MEDLINE | ID: mdl-33981133

ABSTRACT

Spike estimation from calcium (Ca2+) fluorescence signals is a fundamental and challenging problem in neuroscience. Several models and algorithms have been proposed for this task over the past decade. Nevertheless, it is still hard to recover accurate spike positions from Ca2+ fluorescence signals. While existing methods rely on data-driven techniques and the physiology of neurons to model the spiking process, this work exploits the nature of the fluorescence responses to spikes using signal processing. We first motivate the problem by a novel analysis of the high-resolution property of minimum-phase group delay (GD) functions for multi-pole resonators, which may be connected either in series or in parallel. The Ca2+ indicator responds to a spike with a sudden rise followed by an exponential decay. We interpret the Ca2+ signal as the response to an impulse train of spikes, where the Ca2+ response to each spike corresponds to a resonator. We perform minimum-phase group delay-based filtering of the Ca2+ signal to resolve spike locations. The performance of the proposed algorithm is evaluated on nine datasets spanning various indicators, sampling rates, and mouse brain regions. The proposed approach, GDspike, is compared with other spike estimation methods, including MLspike, the Vogelstein deconvolution algorithm, and the data-driven Spike Triggered Mixture (STM) model. The performance of GDspike is superior to that of the Vogelstein algorithm and comparable to that of MLspike. GDspike can also be used to post-process the output of MLspike, which further enhances performance.
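The group delay function that underlies GDspike can be computed numerically with the standard identity τ(ω) = (X_r·Y_r + X_i·Y_i)/|X|², where Y is the DFT of n·x[n]. The toy transient and its parameters below are illustrative, not taken from the paper:

```python
import numpy as np

def group_delay(x, nfft=512):
    """Group delay of a signal via the standard identity
    tau(w) = (X_r*Y_r + X_i*Y_i) / |X|^2, with Y = DFT(n * x[n]).
    For near-minimum-phase decays such as a Ca2+ transient, peaks of
    tau are sharper than peaks of |X|, which is the high-resolution
    property the paper builds on."""
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)

# Toy Ca2+-like response: sudden rise at sample n0, exponential decay
n0, tau = 20, 30.0
t = np.arange(200)
sig = np.where(t >= n0, np.exp(-(t - n0) / tau), 0.0)
gd = group_delay(sig)
```

At DC, the group delay equals the temporal centroid of the signal, so a later onset shifts the whole curve up; the filtering step in GDspike exploits the sharpened peaks at the resonances.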

4.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 4241-4244, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30441290

ABSTRACT

Psychological well-being at the workplace has increased the demand for detecting emotions with higher accuracy. Speech, one of the least obtrusive modes of capturing emotions at the workplace, still lacks robust emotion annotation mechanisms for non-acted speech corpora. In this paper, we extend our experiments on our non-acted speech database in two ways. First, we report how participants themselves perceive the emotion in their voice after a long gap of about six months, and how a third person, who has not heard the clips earlier, perceives the emotion in the same utterances. Both annotators also rated the intensity of the emotion. They agreed better on neutral (84%) and negative clips (74%) than on positive ones (38%). Second, we restrict our attention to the samples on which the annotators agreed and show a classification accuracy of 80% by machine learning, an improvement of 7% over the state-of-the-art results for speaker-dependent classification. This result suggests that the high-level perception of emotion does translate to the low-level features of speech. Further analysis shows that subtly expressed positive and negative emotions are often misinterpreted as neutral. For the speaker-independent test set, we report an overall accuracy of 61%.


Subject(s)
Emotions; Speech Perception; Speech; Voice; Humans; Machine Learning
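The per-class agreement figures quoted above (84% neutral, 74% negative, 38% positive) are presumably of the following simple form; this is a minimal sketch assuming two label sequences over the same set of clips:

```python
def per_class_agreement(labels_a, labels_b,
                        classes=("neutral", "negative", "positive")):
    """For each class assigned by the first rater, return the fraction
    of those clips on which the second rater gave the same label --
    the kind of per-class agreement rate quoted in the abstract."""
    out = {}
    for c in classes:
        idx = [i for i, a in enumerate(labels_a) if a == c]
        if idx:
            out[c] = sum(labels_a[i] == labels_b[i] for i in idx) / len(idx)
    return out
```

Restricting the training data to clips where this agreement holds is what yields the cleaner labels behind the reported 80% accuracy.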
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 4987-4990, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30441461

ABSTRACT

Reliably detecting emotions is a topic of current research in understanding mental health. Among the many modes of detecting emotion, audio has a prominent place. In this paper, we propose a two-level, multi-way classifier applied to the classification of seven emotions from the standard Emo-DB database. The multi-way classifier is an automated methodology that analyzes the confusion matrix of a first-level classifier to build further classifiers at the next level. A random forest classifier is used on state-of-the-art features for analyzing affective speech. The confusion matrix from this classification level is analyzed to decide, for each class, which other classes it is most confused with, using a threshold on the misclassification rate. For the chosen pairs, second-level classifiers are built and trained on the same data. The performance on the training set (73.3%) as well as on a non-intersecting set (72.9%) is better than state-of-the-art performance. We offer a possible explanation of the performance improvement by considering the confusion among emotions placed on Russell's circumplex model.


Subject(s)
Emotions; Speech; Databases, Factual
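The confusion-matrix analysis that drives the second level can be sketched as below; the symmetric-average definition of the misclassification rate and the threshold value are assumptions, not the paper's exact criterion:

```python
def confused_pairs(cm, threshold=0.15):
    """Given a confusion matrix cm[i][j] (row = true class, column =
    predicted class), return the class pairs whose average mutual
    misclassification rate exceeds threshold -- the pairs for which
    dedicated second-level classifiers would then be trained."""
    pairs = []
    for i in range(len(cm)):
        row_i = sum(cm[i])
        for j in range(i + 1, len(cm)):
            row_j = sum(cm[j])
            rate = ((cm[i][j] / row_i if row_i else 0.0)
                    + (cm[j][i] / row_j if row_j else 0.0)) / 2
            if rate > threshold:
                pairs.append((i, j))
    return pairs
```

At inference time, a sample whose first-level prediction falls in a selected pair would be re-classified by that pair's second-level model.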