Search | BVS Regional Portal

Dataset of directional room impulse responses for realistic speech data.

Fragner, Stefan; Pfeifenberger, Lukas; Hagmüller, Martin; Pernkopf, Franz.

Data Brief ; 53: 110229, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38445201

ABSTRACT

Obtaining real-world multi-channel speech recordings is expensive and time-consuming. Therefore, multi-channel recordings are often generated artificially by convolving existing monaural speech recordings with simulated Room Impulse Responses (RIRs) from a so-called shoebox room [1] for stationary (non-moving) speakers. Far-field speech processing for home automation or smart assistants, however, has to cope with moving speakers in reverberant environments. With this dataset, we aim to support the generation of realistic speech data by providing multiple directional RIRs along a fine grid of locations in a real room. We provide directional RIR recordings for a classroom and a large corridor. These RIRs can be used to simulate moving speakers by generating random trajectories on that grid and quantizing the trajectories to the grid points. For each matching grid point, the monaural speech recording can be convolved with the RIR at this grid point. Then, the spatialized recording can be assembled using the overlap-add method for each grid point [2]. An example is provided with the data.
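The procedure described in the abstract (quantize a trajectory to the grid, convolve each speech frame with the RIR at its grid point, then overlap-add) can be sketched as follows. This is a minimal single-channel illustration, not the example shipped with the dataset: the function names, the frame-per-position trajectory format, and the `rirs` dictionary layout are all assumptions made for this sketch.

```python
import math


def nearest_grid_point(position, grid_points):
    """Quantize a continuous (x, y) position to the closest grid point."""
    return min(grid_points, key=lambda g: math.dist(g, position))


def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def spatialize(speech, trajectory, rirs, frame_len):
    """Convolve each frame with the RIR of its grid point and overlap-add.

    Assumed (hypothetical) data layout:
      trajectory[k] is the speaker position during frame k,
      rirs maps a grid point (x, y) to a one-channel impulse response.
    Frames beyond the end of the trajectory are left unprocessed.
    """
    grid = list(rirs)
    max_rir = max(len(h) for h in rirs.values())
    out = [0.0] * (len(speech) + max_rir - 1)
    for k, pos in enumerate(trajectory):
        start = k * frame_len
        frame = speech[start:start + frame_len]
        if not frame:
            break
        h = rirs[nearest_grid_point(pos, grid)]
        # The convolution tail spills into the following frames;
        # summing the overlapping tails is the overlap-add step.
        for n, v in enumerate(convolve(frame, h)):
            out[start + n] += v
    return out
```

For a multi-channel array, the same routine would be run once per microphone channel with that channel's RIR. If the speaker stays on one grid point, the result equals an ordinary convolution of the whole recording with that RIR, which is a convenient sanity check.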

Towards Objective Voice Assessment: The Diplophonia Diagram.

Aichinger, Philipp; Roesner, Imme; Schneider-Stickler, Berit; Leonhard, Matthias; Denk-Linnert, Doris-Maria; Bigenzahn, Wolfgang; Fuchs, Anna Katharina; Hagmüller, Martin; Kubin, Gernot.

J Voice ; 31(2): 253.e17-253.e26, 2017 Mar.

Article in English | MEDLINE | ID: mdl-27473932

ABSTRACT

OBJECTIVES: Diplophonia is an often misinterpreted symptom of disordered voice and needs objectification. An audio signal processing algorithm for the detection of diplophonia is proposed. Diplophonia is produced by two distinct oscillators, which yields a profound physiological interpretation. The algorithm's performance is compared with the clinical standard parameter degree of subharmonics (DSH). STUDY DESIGN: This is a prospective study. METHODS: A total of 50 dysphonic subjects (28 with diplophonia and 22 without diplophonia) and 30 subjects with euphonia were included in the study. From each subject, up to five sustained phonations were recorded during rigid telescopic high-speed video laryngoscopy. A total of 185 phonations were split into 285 analysis segments of homogeneous voice quality. In accordance with the clinical group allocation, the considered segmental voice qualities were (1) diplophonic, (2) dysphonic without diplophonia, and (3) euphonic. The Diplophonia Diagram is a scatter plot that relates the one-oscillator synthesis quality (SQ1) to the two-oscillator synthesis quality (SQ2). Multinomial logistic regression is used to distinguish between diplophonic and nondiplophonic segments. RESULTS: Diplophonic segments can be well distinguished from nondiplophonic segments in the Diplophonia Diagram because two-oscillator synthesis is more appropriate for imitating diplophonic signals than one-oscillator synthesis. The detection of diplophonia using the Diplophonia Diagram clearly outperforms the DSH in terms of positive likelihood ratio (56.8 versus 3.6). CONCLUSIONS: The diagnostic accuracy of the newly proposed method for detecting diplophonia is superior to the DSH approach, which should be taken into account for future clinical and scientific work.
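The positive likelihood ratio used for the comparison above is the standard quantity sensitivity / (1 - specificity). The sketch below shows how it is computed from a confusion matrix. The function name is an assumption, and the counts in the usage note are made up for illustration and are not taken from the study.

```python
import math


def positive_likelihood_ratio(tp, fn, fp, tn):
    """Return LR+ = sensitivity / (1 - specificity).

    tp, fn: diplophonic segments detected / missed
    fp, tn: nondiplophonic segments flagged / correctly rejected
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if specificity == 1.0:
        # No false positives at all: the ratio is unbounded.
        return math.inf
    return sensitivity / (1.0 - specificity)
```

With hypothetical counts `tp=45, fn=5, fp=10, tn=90`, sensitivity and specificity are both 0.9 and the ratio is about 9. A larger ratio means a positive detection raises the odds of diplophonia more strongly.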

Subjects

Acoustics; Algorithms; Phonation; Signal Processing, Computer-Assisted; Voice Disorders/diagnosis; Voice Quality; Humans; Laryngoscopy/methods; Logistic Models; Multivariate Analysis; Pattern Recognition, Automated; Predictive Value of Tests; Prospective Studies; Reproducibility of Results; Severity of Illness Index; Sound Spectrography; Time Factors; Video Recording; Voice Disorders/physiopathology

Diplophonia Disturbs Jitter and Shimmer Measurement.

Aichinger, Philipp; Hagmüller, Martin; Roesner, Imme; Bigenzahn, Wolfgang; Schneider-Stickler, Berit; Schoentgen, Jean.

Folia Phoniatr Logop ; 68(1): 22-8, 2016.

Article in English | MEDLINE | ID: mdl-27439009

ABSTRACT

OBJECTIVES: The aims of this study are to investigate the effects of diplophonia on jitter and shimmer and to identify measurement limitations with regard to material selection and clinical interpretation. MATERIALS AND METHODS: Four hundred and ninety-eight audio samples of sustained phonations were analyzed. The audio samples were assessed for the grade of hoarseness and the presence of diplophonia. Jitter and shimmer were reported with regard to perceptual ratings. We investigated cycle marker positions exemplarily and qualitatively to understand their implications for perturbation measurements. RESULTS: Medians of jitter and shimmer were higher for diplophonic voices than for nondiplophonic voices with equal grades of hoarseness. The variance of jitter for moderately dysphonic voices was larger than the variance observed in a corpus from which diplophonic samples had been discarded. The positions of cycle markers in diplophonic voices did not match the positions of the pulses, indicating that the validity of jitter and shimmer values for these voices was questionable. CONCLUSION: Diplophonia biases the reporting of dysphonia severity via perturbation measures, and their validity is questionable for these voices. In addition, diplophonia is an influential source of variance in jitter measurements. Thus, diplophonic fragments of voice samples should be excluded prior to perturbation analysis.
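To make the dependence on cycle markers concrete, the sketch below computes jitter and shimmer from marker times and per-cycle peak amplitudes. The abstract does not state which jitter and shimmer variants were measured. This sketch assumes the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean), and the function names are illustrative.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_from_markers(marker_times):
    """Local jitter: perturbation of the cycle lengths between markers (seconds)."""
    periods = [t1 - t0 for t0, t1 in zip(marker_times, marker_times[1:])]
    return local_perturbation(periods)


def shimmer_from_peaks(peak_amplitudes):
    """Local shimmer: perturbation of the per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)
```

Both measures are computed purely from the marker positions and the values read off at them. Markers that do not coincide with the actual glottal pulses therefore change the result directly, which is the validity concern the abstract raises for diplophonic voices.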

Subjects

Phonation; Voice Quality; Dysphonia; Humans; Speech Acoustics; Voice; Voice Disorders

Speech watermarking: an approach for the forensic analysis of digital telephonic recordings.

Faundez-Zanuy, Marcos; Lucena-Molina, Jose J; Hagmüller, Martin.

J Forensic Sci ; 55(4): 1080-7, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20412360

ABSTRACT

In this article, the authors discuss the problem of forensic authentication of digital audio recordings. Although forensic audio has been addressed in several articles, the existing approaches focus on analog magnetic recordings, which are less prevalent because of the large number of digital recorders available on the market (optical, solid state, hard disks, etc.). An approach based on digital signal processing that uses spread spectrum techniques for speech watermarking is presented. This approach has the advantage that the authentication is based on the signal itself rather than the recording format. Thus, it is valid for the usual recording devices in police-controlled telephone intercepts. In addition, our proposal allows relevant information such as the recording date and time to be embedded, along with any other relevant data (this is not always possible with classical systems). Our experimental results reveal that the speech watermarking procedure does not significantly interfere with subsequent forensic speaker identification.
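The abstract does not give the details of the watermarking scheme. As a rough illustration of the general idea behind spread spectrum watermarking, the sketch below uses a generic direct-sequence scheme: each payload bit is multiplied by a keyed pseudo-random ±1 sequence, scaled by a small gain, and added to the signal, and it is recovered by correlating with the same sequence. This is not the authors' method, and all names and parameters (`key`, `chips_per_bit`, `alpha`) are assumptions made for this sketch.

```python
import random


def pn_sequence(length, key):
    """Keyed pseudo-random sequence of +1/-1 chips."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, key, chips_per_bit, alpha):
    """Add alpha * (+/-1 per bit) * PN chips to the host samples."""
    needed = len(bits) * chips_per_bit
    if len(host) < needed:
        raise ValueError("host signal too short for payload")
    pn = pn_sequence(needed, key)
    out = list(host)
    for i in range(needed):
        sign = 1.0 if bits[i // chips_per_bit] else -1.0
        out[i] += alpha * sign * pn[i]
    return out


def detect(signal, n_bits, key, chips_per_bit):
    """Recover bits from the sign of the correlation with the PN chips."""
    pn = pn_sequence(n_bits * chips_per_bit, key)
    bits = []
    for b in range(n_bits):
        lo = b * chips_per_bit
        corr = sum(signal[i] * pn[i] for i in range(lo, lo + chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits
```

In this blind detector the speech itself acts as noise in the correlation, so reliable detection at an inaudible gain requires many chips per bit. A payload such as the recording date and time would first be serialized to a bit list before embedding.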
