1.
Front Neurosci ; 17: 1120311, 2023.
Article in English | MEDLINE | ID: mdl-37397449

ABSTRACT

Introduction: The Autonomous Sensory Meridian Response (ASMR) is a combination of sensory phenomena involving electrostatic-like tingling sensations that emerge in response to certain stimuli. Despite the overwhelming popularity of ASMR on social media, no open-source databases of ASMR-related stimuli are yet available, which makes this phenomenon largely inaccessible to the research community and thus almost completely unexplored. In this regard, we present the ASMR Whispered-Speech (ASMR-WS) database. Methods: ASMR-WS is a novel database of whispered speech, specifically tailored to promote the development of ASMR-like unvoiced Language Identification (unvoiced-LID) systems. The database encompasses 38 videos, for a total duration of 10 h and 36 min, and includes seven target languages (Chinese, English, French, Italian, Japanese, Korean, and Spanish). Along with the database, we present baseline results for unvoiced-LID on ASMR-WS. Results: Our best results on the seven-class problem, based on 2 s segments, a CNN classifier, and MFCC acoustic features, were an unweighted average recall of 85.74% and an accuracy of 90.83%. Discussion: For future work, we would like to focus more deeply on the duration of the speech samples, as we observed varied results with the segment-length combinations applied herein. To enable further research in this area, the ASMR-WS database, as well as the partitioning used in the presented baseline, is made accessible to the research community.
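The baseline above feeds MFCC acoustic features into a CNN. A minimal, numpy-only sketch of a standard MFCC pipeline (framing, power spectrum, mel filterbank, log, DCT-II) is shown below; all parameter values (16 kHz sampling, 512-point FFT, 160-sample hop, 26 mel bands, 13 coefficients) are illustrative assumptions, not the configuration actually used in the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Compute MFCCs of a mono signal; returns (num_frames, n_ceps)."""
    # Frame the signal with a Hann window.
    frames = np.array([signal[s:s + n_fft] * np.hanning(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank, equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)  # falling slope
    # Log mel energies, then DCT-II to decorrelate into cepstral coefficients.
    energies = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return energies @ dct.T
```

For a 2 s segment at the assumed 16 kHz rate (32,000 samples), this yields a (197, 13) feature matrix, which could then be fed to a CNN classifier over the seven language classes.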

2.
IEEE Internet Things J ; 8(21): 16035-16046, 2021 Nov 01.
Article in English | MEDLINE | ID: mdl-35782182

ABSTRACT

Computer audition (CA) has developed rapidly over the past decades by leveraging advanced signal processing and machine learning techniques. In particular, owing to its noninvasive and ubiquitous nature, CA-based applications in healthcare have attracted increasing attention in recent years. During the tough time of the global crisis caused by the coronavirus disease 2019 (COVID-19), scientists and engineers in data science have collaborated on novel approaches to the prevention, diagnosis, treatment, tracking, and management of this global pandemic. On the one hand, we have witnessed the power of 5G, the Internet of Things, big data, computer vision, and artificial intelligence in applications such as epidemiological modeling, drug and/or vaccine discovery and design, fast CT screening, and quarantine management. On the other hand, relevant studies exploring the capacity of CA are extremely scarce and underestimated. To this end, we propose a novel multitask speech corpus for COVID-19 research. We collected in-the-wild speech data from 51 confirmed COVID-19 patients in Wuhan, China. We define three main tasks in this corpus: three-category classification tasks for evaluating the physical and/or mental status of patients, namely sleep quality, fatigue, and anxiety. Benchmarks are given using both classic machine learning methods and state-of-the-art deep learning techniques. We believe this study and corpus can facilitate not only the ongoing research on using data science to fight COVID-19, but also the monitoring of contagious diseases in general.
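Each of the corpus's three tasks is a three-category classification problem, and the abstract mentions classic machine learning baselines. As a hedged illustration (not the paper's actual benchmark pipeline), one such classic baseline is multinomial logistic regression trained by batch gradient descent on fixed-length feature vectors; everything below, including the synthetic feature dimensions, is an assumption for the sketch:

```python
import numpy as np

def train_softmax(X, y, n_classes=3, lr=0.1, epochs=200):
    """Multinomial logistic regression (softmax) via batch gradient descent.

    X: (n_samples, n_features) feature matrix; y: integer labels in [0, n_classes).
    Returns weight matrix W and bias vector b.
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # class probabilities
        grad = P - Y                                 # gradient of cross-entropy
        W -= lr * X.T @ grad / n
        b -= lr * grad.mean(axis=0)
    return W, b

def predict(W, b, X):
    """Assign each sample to the highest-scoring of the three categories."""
    return np.argmax(X @ W + b, axis=1)
```

For imbalanced clinical labels, the unweighted average recall (the mean of per-class recalls) is a more informative metric than raw accuracy, which is why it is commonly reported alongside accuracy in this line of work.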
