Results 1 - 2 of 2
1.
J Acoust Soc Am; 151(6): 4028, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35778198

ABSTRACT

Deep learning is one established tool for carrying out classification tasks on complex, multi-dimensional data. Since audio recordings contain a frequency and temporal component, long-term monitoring of bioacoustics recordings is made more feasible with these computational frameworks. Unfortunately, these neural networks are rarely designed for the task of open set classification, in which examples belonging to the training classes must not only be correctly classified but also crucially separated from any spurious or unknown classes. Closed set classifiers are ill-suited to monitoring applications, in which many non-relevant sounds are likely to be encountered; to address this, the performance of several open set classification frameworks is compared on environmental audio datasets recorded and published within this work, containing both biological and anthropogenic sounds. The inference-based open set classification techniques include prediction score thresholding, distance-based thresholding, and OpenMax. Each open set classification technique is evaluated under multi-, single-, and cross-corpus scenarios for two different types of unknown data, configured to highlight common challenges inherent to real-world classification tasks. The performance of each method is highly dependent upon the degree of similarity between the training, testing, and unknown domain.


Subjects
Neural Networks, Computer; Sound; Animals; Birds
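The simplest of the techniques named in the abstract, prediction score thresholding, can be sketched as follows: accept the network's argmax class only if its softmax confidence clears a cutoff, and otherwise reject the input as unknown. This is a minimal illustration of the general idea, not the paper's implementation; the threshold value and the `unknown_label` sentinel are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def open_set_predict(logits, threshold=0.5, unknown_label=-1):
    """Prediction score thresholding: keep the argmax class only if its
    softmax probability reaches the threshold; otherwise mark as unknown.
    The threshold of 0.5 is an illustrative assumption, not a tuned value."""
    probs = softmax(np.asarray(logits, dtype=float))
    conf = probs.max(axis=-1)
    best = probs.argmax(axis=-1)
    return np.where(conf >= threshold, best, unknown_label)

# A confidently peaked example versus a nearly flat, ambiguous one:
print(open_set_predict([[4.0, 0.1, 0.2], [1.0, 1.1, 0.9]]))
```

In practice the threshold is chosen on a validation set to trade off false rejections of known classes against false acceptances of unknown sounds.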
2.
J Acoust Soc Am; 149(2): 885, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33639830

ABSTRACT

Emotion is a central component of verbal communication between humans. Due to advances in machine learning and the development of affective computing, automatic emotion recognition is increasingly possible and sought after. To examine the connection between emotional speech and significant group dynamics perceptions, such as leadership and contribution, a new dataset (14 group meetings, 45 participants) is collected for analyzing collaborative group work based on the lunar survival task. To establish a training database, each participant's audio is manually annotated both categorically and along a three-dimensional scale with axes of activation, dominance, and valence and then converted to spectrograms. The performance of several neural network architectures for predicting speech emotion is compared for two tasks: categorical emotion classification and 3D emotion regression using multitask learning. Pretraining each neural network architecture on the well-known IEMOCAP (Interactive Emotional Dyadic Motion Capture) corpus improves the performance on this new group dynamics dataset. For both tasks, the two-dimensional convolutional long short-term memory network achieves the highest overall performance. By regressing the annotated emotions against post-task questionnaire variables for each participant, it is shown that the emotional speech content of a meeting can predict 71% of perceived group leaders and 86% of major contributors.


Subjects
Memory, Short-Term; Speech; Emotions; Group Processes; Humans; Neural Networks, Computer
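The spectrogram conversion step mentioned in the abstract can be illustrated with a minimal short-time Fourier transform: slice the waveform into overlapping windowed frames and take the magnitude of each frame's FFT. The frame length, hop size, and Hann window below are illustrative assumptions, not the paper's actual preprocessing parameters.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    overlapping Hann-windowed frames, magnitude of each frame's real FFT.
    Returns an array of shape (time frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

# Toy example: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)
```

Such a time-frequency image is what a two-dimensional convolutional LSTM, the best-performing architecture in the abstract, would consume: convolutions over the spectrogram patches, recurrence over the time axis.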