1.
Sensors (Basel) ; 22(9)2022 May 06.
Article in English | MEDLINE | ID: mdl-35591224

ABSTRACT

In this paper, we introduce an approach for predicting future frames from a single input image. Our method generates an entire video sequence from the information contained in that input frame. We adopt an autoregressive generation process, i.e., the output of each time step is fed as the input to the next step. Unlike video prediction methods that use "one shot" generation, our method preserves far more detail from the input image, while also capturing the critical pixel-level changes between frames. We overcome the problem of degrading generation quality by introducing a "complementary mask" module in our architecture, and we show that this allows the model to focus only on generating the pixels that need to change, while reusing those that should remain static from the previous frame. We empirically validate our method against various video prediction models on the UT Dallas Dataset and show that our approach generates high-quality, realistic video sequences from one static input image. In addition, we validate the robustness of our method by testing a pre-trained model on the unseen ADFES facial expression dataset. We also provide qualitative results of our model on a human action dataset, the Weizmann Action database.
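
As a rough illustration of the "complementary mask" idea described above, the hedged Python sketch below combines newly generated pixels with pixels reused from the previous frame during an autoregressive rollout; the generator interface, tensor shapes, and function names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of an autoregressive rollout with a complementary mask:
# a (hypothetical) generator predicts new pixel content plus a mask in [0, 1]
# deciding which pixels to regenerate and which to copy from the previous frame.
import torch

def rollout(generator, first_frame, num_steps):
    """Autoregressively predict num_steps frames from a single input frame."""
    frames, prev = [], first_frame                  # prev: (B, C, H, W)
    for _ in range(num_steps):
        gen, mask = generator(prev)                 # assumed generator interface
        nxt = mask * gen + (1.0 - mask) * prev      # regenerate changing pixels, reuse static ones
        frames.append(nxt)
        prev = nxt                                  # output of this step feeds the next step
    return torch.stack(frames, dim=1)               # (B, T, C, H, W)
```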


Subject(s)
Algorithms , Databases, Factual , Humans
2.
Sensors (Basel) ; 23(1)2022 Dec 29.
Article in English | MEDLINE | ID: mdl-36616980

ABSTRACT

Music is capable of conveying many emotions. The level and type of emotion perceived in a piece of music, however, are highly subjective. In this study, we present the Music Emotion Recognition with Profile information dataset (MERP). This database was collected through Amazon Mechanical Turk (MTurk) and features dynamic (time-varying) valence and arousal ratings of 54 selected full-length songs. The dataset contains music features, as well as user profile information of the annotators. The songs were selected from the Free Music Archive using an innovative method (a Triple Neural Network with the OpenSmile toolkit) to identify 50 songs with the most distinctive emotions. Specifically, the songs were chosen to fully cover the four quadrants of the valence-arousal space. Four additional songs were selected from the DEAM dataset to act as a benchmark in this study and to filter out low-quality ratings. A total of 452 participants annotated the dataset, with 277 remaining after thorough data cleaning. Their demographic information, listening preferences, and musical background were recorded. We offer an extensive analysis of the resulting dataset, together with baseline emotion prediction models (a fully connected model and an LSTM model) for our newly proposed MERP dataset.
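
As a hedged illustration of the kind of LSTM baseline mentioned above, the sketch below maps a sequence of frame-level audio features to per-timestep valence and arousal values; the feature dimension, hidden size, and input lengths are assumptions, not the MERP baseline configuration.

```python
# Illustrative LSTM regressor for dynamic (per-timestep) valence/arousal prediction.
# Dimensions and hyperparameters are placeholders, not the published setup.
import torch
import torch.nn as nn

class VADynamicsLSTM(nn.Module):
    def __init__(self, feat_dim=260, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # per-timestep (valence, arousal)

    def forward(self, x):                          # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out)                      # (batch, time, 2)

model = VADynamicsLSTM()
preds = model(torch.randn(4, 60, 260))             # e.g. 60 feature frames per excerpt
```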


Subject(s)
Music , Humans , Arousal , Auditory Perception , Emotions , Music/psychology , Neural Networks, Computer
3.
Sensors (Basel) ; 21(24)2021 Dec 14.
Article in English | MEDLINE | ID: mdl-34960450

ABSTRACT

In this paper, we tackle the problem of predicting the affective responses of movie viewers based on the content of the movies. Current studies on this topic focus on video representation learning and on fusion techniques that combine the extracted features to predict affect. Yet, they typically ignore the correlation between the different modality inputs as well as the correlation between temporal inputs (i.e., sequential features). To explore these correlations, we propose a neural network architecture, AttendAffectNet (AAN), that uses the self-attention mechanism to predict the emotions of movie viewers from different input modalities. In particular, visual, audio, and text features are considered for predicting emotions (expressed in terms of valence and arousal). We analyze three variants of our proposed AAN: Feature AAN, Temporal AAN, and Mixed AAN. The Feature AAN applies the self-attention mechanism in an innovative way to the features extracted from the different modalities (video, audio, and movie subtitles) of a whole movie, thereby capturing the relationships between them. The Temporal AAN takes the time domain of the movies and the sequential dependency of affective responses into account: self-attention is applied to the concatenated (multimodal) feature vectors representing subsequent movie segments. The Mixed AAN combines the strong points of the Feature AAN and the Temporal AAN by applying self-attention first to the vectors of features obtained from the different modalities in each movie segment and then to the feature representations of all subsequent (temporal) movie segments. We extensively trained and validated our proposed AAN on both the MediaEval 2016 dataset for the Emotional Impact of Movies Task and the extended COGNIMUSE dataset. Our experiments demonstrate that audio features play a more influential role than features extracted from video and movie subtitles when predicting the emotions of movie viewers on these datasets. Models that use all visual, audio, and text features simultaneously as inputs performed better than those using features from each modality separately. In addition, the Feature AAN outperformed the other AAN variants on the above-mentioned datasets, highlighting the importance of treating different features as context for one another when fusing them. The Feature AAN also performed better than the baseline models when predicting the valence dimension.
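
The sketch below illustrates, under assumptions, the core idea behind the Feature AAN variant: treat per-modality feature vectors as tokens and let self-attention relate them before predicting valence and arousal. Dimensions, projections, and pooling are illustrative and not the published architecture.

```python
# Sketch of self-attention over per-modality feature vectors (Feature-AAN style).
# Each modality attends to the others; the pooled result predicts affect.
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)                       # (valence, arousal)

    def forward(self, visual, audio, text):                 # each: (batch, dim)
        tokens = torch.stack([visual, audio, text], dim=1)  # (batch, 3, dim)
        attended, _ = self.attn(tokens, tokens, tokens)     # relate the modalities
        return self.head(attended.mean(dim=1))              # pool modalities, predict affect

model = ModalityAttention()
out = model(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))
```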


Subject(s)
Emotions , Motion Pictures , Arousal , Neural Networks, Computer
4.
Sensors (Basel) ; 21(16)2021 Aug 18.
Article in English | MEDLINE | ID: mdl-34450996

ABSTRACT

Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish children with healthy coughs from children with pathological coughs caused by asthma, upper respiratory tract infection (URTI), or lower respiratory tract infection (LRTI). To train the deep neural network model, we collected a new dataset of cough sounds labelled with a clinician's diagnosis. The chosen model is a bidirectional long short-term memory network (BiLSTM) based on Mel-Frequency Cepstral Coefficient (MFCC) features. When trained to classify two classes of coughs, healthy or pathological (in general or belonging to a specific respiratory pathology), the model reaches an accuracy exceeding 84% against the label provided by the physicians' diagnosis. To classify a subject's respiratory pathology, the results of multiple cough epochs per subject were combined; the resulting prediction accuracy exceeds 91% for all three respiratory pathologies. However, when the model is trained to discriminate among four classes of coughs, overall accuracy drops, as one class of pathological coughs is often misclassified as another. If one only requires that a healthy cough be classified as healthy and a pathological cough be classified as having some kind of pathology, the overall accuracy of the four-class model is above 84%. A longitudinal study of the MFCC feature space, comparing pathological and recovered coughs collected from the same subjects, revealed that pathological coughs, irrespective of the underlying condition, occupy the same feature space, making them harder to differentiate using MFCC features alone.
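
A minimal sketch of the MFCC-plus-BiLSTM pipeline described above is given below; the MFCC settings, layer sizes, and label set are assumptions for illustration rather than the trained model's configuration.

```python
# Hedged sketch: extract MFCC frames from a cough recording and classify the
# sequence with a bidirectional LSTM. Hyperparameters are illustrative only.
import librosa
import torch
import torch.nn as nn

def mfcc_features(path, n_mfcc=20):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return torch.tensor(mfcc.T, dtype=torch.float32)          # (frames, n_mfcc)

class CoughBiLSTM(nn.Module):
    def __init__(self, n_mfcc=20, hidden=64, n_classes=2):    # 2 = healthy vs pathological
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                                      # x: (batch, frames, n_mfcc)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                           # classify from the final state
```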


Subject(s)
Asthma , Cough , Asthma/diagnosis , Child , Cough/diagnosis , Humans , Longitudinal Studies , Neural Networks, Computer , Respiratory Sounds/diagnosis , Sound
5.
J Acoust Soc Am ; 148(3): EL253, 2020 09.
Article in English | MEDLINE | ID: mdl-33003873

ABSTRACT

Cough is a common symptom presenting in asthmatic children. In this investigation, an audio-based classification model is presented that can differentiate between healthy and asthmatic children, based on the combination of cough and vocalised /ɑ:/ sounds. A Gaussian mixture model using mel-frequency cepstral coefficients and constant-Q cepstral coefficients was trained. When comparing the predicted labels with the clinician's diagnosis, this cough sound model reaches an overall accuracy of 95.3%. The vocalised /ɑ:/ model reaches an accuracy of 72.2%, which is still significant because the dataset contains only 333 /ɑ:/ sounds versus 2029 cough sounds.
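
The sketch below shows, under assumptions, the shape of a Gaussian mixture model classifier of the kind described: fit one mixture per class on cepstral feature frames and assign a recording to the class with the higher likelihood. The feature extraction step and mixture sizes are placeholders.

```python
# Minimal GMM-based two-class classifier over cepstral feature frames.
# Mixture sizes and feature preparation are assumptions, not the paper's setup.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(healthy_feats, asthma_feats, n_components=8):
    # Each feats array: (n_frames, n_cepstral_coeffs)
    gmm_h = GaussianMixture(n_components).fit(healthy_feats)
    gmm_a = GaussianMixture(n_components).fit(asthma_feats)
    return gmm_h, gmm_a

def classify(feats, gmm_h, gmm_a):
    # Average per-frame log-likelihood under each class model decides the label.
    return "healthy" if gmm_h.score(feats) > gmm_a.score(feats) else "asthmatic"
```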


Subject(s)
Cough , Sound , Child , Cough/diagnosis , Humans , Normal Distribution , Sound Spectrography
6.
Front Psychol ; 9: 2292, 2018.
Article in English | MEDLINE | ID: mdl-30534100

ABSTRACT

Practice is an essential part of music training, but critical content-based analyses of practice behaviors still lack tools that convey informative representations of practice sessions. To bridge this gap, we present a novel visualization system, the Music Practice Browser, for representing, identifying, and analysing music practice behaviors. The Music Practice Browser provides a graphical interface for reviewing recorded practice sessions, which allows musicians, teachers, and researchers to examine aspects and features of music practice behaviors. The system takes beat and practice segment information, together with a musical score in XML format, as input and produces a number of different visualizations: Practice Session Work Maps give an overview of contiguous practice segments; Practice Segment Arcs make transitions and repeated segments evident; Practice Session Precision Maps facilitate the identification of errors; Tempo-Loudness Evolution Graphs track expressive variations over the course of a practice session. We then test the new system on practice sessions of pianists of varying levels of expertise, ranging from novice to expert. The practice patterns found include Drill-Correct, Drill-Smooth, Memorization Strategy, Review and Explore, and Expressive Evolution. The analysis reveals practice patterns and behavioral differences between beginners and experts, such as a higher proportion of Drill-Smooth patterns in expert practice.
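
As a loose illustration of an arc-style view like the Practice Segment Arcs mentioned above, the sketch below draws an arc between the start and end beat of each practice segment so that repeats and jumps stand out; the segment format and styling are assumptions, not the Music Practice Browser's implementation.

```python
# Hedged sketch: draw one semicircular arc per (start_beat, end_beat) practice segment.
import matplotlib.pyplot as plt
import numpy as np

def plot_segment_arcs(segments):
    """segments: list of (start_beat, end_beat) tuples from one practice session."""
    fig, ax = plt.subplots(figsize=(8, 3))
    for start, end in segments:
        center, radius = (start + end) / 2.0, abs(end - start) / 2.0
        theta = np.linspace(0.0, np.pi, 50)
        ax.plot(center + radius * np.cos(theta), radius * np.sin(theta), lw=1)
    ax.set_xlabel("score position (beats)")
    ax.set_yticks([])
    plt.show()

plot_segment_arcs([(0, 16), (8, 16), (8, 24), (0, 24)])   # example practice segments
```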

7.
J Acoust Soc Am ; 143(6): 3300, 2018 06.
Article in English | MEDLINE | ID: mdl-29960505

ABSTRACT

In many applications, it is desirable to achieve a signal that is as close as possible to ideal white noise. One example is the design of an artificial reverberator, where the output of its lossless prototype in response to an impulse input should be as perceptually white as possible. The Ljung-Box test, the Drouiche test, and the Wiener Entropy (also called the Spectral Flatness Measure) are three well-known methods for quantifying the similarity of a given signal to ideal white noise. In this paper, listening tests are conducted to measure the Just Noticeable Difference (JND) in the perception of white noise, that is, the JND between ideal Gaussian white noise and noise with a specified deviation from a flat spectrum. The JND values are reported using one of these measures of whiteness, the Ljung-Box test. This paper finds considerable disagreement between the Ljung-Box test and the other two methods and shows that none of the methods is a significantly better predictor of listeners' perception of whiteness. This suggests a need for a whiteness test that is more closely correlated with human perception.
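
For orientation, the sketch below computes two of the whiteness measures named above on a candidate signal: the Ljung-Box test (via statsmodels) and the spectral flatness measure, i.e., the ratio of the geometric to the arithmetic mean of the power spectrum. The signal and lag choice are illustrative.

```python
# Sketch: quantify how "white" a signal is with the Ljung-Box test and
# the spectral flatness measure. Signal length and lags are arbitrary choices.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)                        # candidate "white" signal

print(acorr_ljungbox(x, lags=[20]))                  # small p-value suggests the signal is not white

power = np.abs(np.fft.rfft(x)) ** 2
flatness = np.exp(np.mean(np.log(power + 1e-12))) / np.mean(power)
print(flatness)                                      # 1.0 for a perfectly flat spectrum
```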

8.
Front Psychol ; 7: 1999, 2016.
Article in English | MEDLINE | ID: mdl-28119641

ABSTRACT

An empirical investigation of how local harmonic structures (e.g., chord progressions) contribute to the experience and enjoyment of uplifting trance (UT) music is presented. The connection between rhythmic and percussive elements and resulting trance-like states has been highlighted by musicologists, but no research, to our knowledge, has explored whether repeated harmonic elements influence affective responses in listeners of trance music. Two alternative hypotheses are discussed: the first posits a direct relationship between repetition/complexity and enjoyment, and the second is based on the theoretical inverted-U relationship described by the Wundt curve. We investigate the connection between harmonic structure and subjective enjoyment through interdisciplinary behavioral and computational methods. First, we discuss an experiment in which listeners provided enjoyment ratings for computer-generated UT anthems with varying levels of harmonic repetition and complexity. The anthems were generated using a statistical model trained on a corpus of 100 uplifting trance anthems created for this purpose, and harmonic structure was constrained by imposing particular repetition structures (semiotic patterns defining the order of chords in the sequence) on a professional UT music production template. Second, the relationship between harmonic structure and enjoyment is further explored using two computational approaches, one based on average Information Content and another that measures average tonal tension between chords. The results of the listening experiment indicate that harmonic repetition does in fact contribute to the enjoyment of uplifting trance music. More compelling evidence was found for the second hypothesis discussed above; however, some maximally repetitive structures were also preferred. Both computational models provide evidence for a Wundt-type relationship between complexity and enjoyment. By systematically manipulating the structure of chord progressions, we have identified specific harmonic contexts in which repetitive or complex structures contribute to the enjoyment of uplifting trance music.
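
As a hedged sketch of the average Information Content idea mentioned above, the example below scores a chord progression by its mean surprisal under a first-order Markov model; the corpus, chord vocabulary, and smoothing are assumptions, not the paper's exact statistical model.

```python
# Score a chord progression by its average information content (mean surprisal)
# under a first-order Markov model trained on a small illustrative corpus.
import math
from collections import Counter, defaultdict

def train_markov(progressions):
    counts = defaultdict(Counter)
    for prog in progressions:
        for a, b in zip(prog, prog[1:]):
            counts[a][b] += 1
    return counts

def mean_information_content(prog, counts, alpha=1.0, vocab_size=24):
    ics = []
    for a, b in zip(prog, prog[1:]):
        # Additive smoothing so unseen transitions still get a finite probability.
        p = (counts[a][b] + alpha) / (sum(counts[a].values()) + alpha * vocab_size)
        ics.append(-math.log2(p))                    # surprisal of each transition
    return sum(ics) / len(ics)

corpus = [["Am", "F", "C", "G", "Am", "F", "C", "G"], ["Am", "G", "F", "G"]]
model = train_markov(corpus)
print(mean_information_content(["Am", "F", "C", "G"], model))
```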
