2.
J Acoust Soc Am ; 155(3): 1767-1779, 2024 03 01.
Article in English | MEDLINE | ID: mdl-38441439

ABSTRACT

Our previous investigation of the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Ueda, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970-980]. In that study, however, gap duration and temporal resolution were confounded. In the current investigation, we therefore measured the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% when it was interrupted with 20 and 80 ms gaps, respectively. Intelligibility improved, however, to 92% and 54% (gains of 14 and 21 percentage points for 20 and 80 ms gaps, respectively) when the mosaic segments were stretched to fill the silent gaps (n = 21). By contrast, intelligibility fell to a minimum of 9% (a loss of 7 percentage points) when stimuli interrupted with 160 ms gaps were stretched. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement by stretching, but not for the loss. The probability summation model accounted for the "U"-shaped intelligibility curves and for both the gain and the loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli.


Subjects
Cognition, Speech, Probability, Software
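Probability summation, invoked in the abstract above, treats recognition as succeeding if at least one of several independent chances succeeds, so that P = 1 - ∏(1 - p_i). The abstract does not say how the authors parameterized or fitted their model, so the Python below is only a generic sketch of the rule, and the function name is hypothetical. It also recomputes the reported stretching gains from the intelligibility values stated in the abstract.

```python
from math import prod


def probability_summation(probs):
    """Chance that at least one of several independent events succeeds.

    Generic rule only (hypothetical helper); not the authors' fitted model.
    """
    return 1.0 - prod(1.0 - p for p in probs)


# Intelligibility (%) reported in the abstract: interrupted vs. stretched.
interrupted = {20: 78, 80: 33}  # gap duration (ms) -> percent correct
stretched = {20: 92, 80: 54}
gains = {gap: stretched[gap] - interrupted[gap] for gap in interrupted}

print(gains)                              # {20: 14, 80: 21}
print(probability_summation([0.5, 0.5]))  # 0.75
```

The gains are percentage-point differences, which is why they match the 14 and 21 quoted in the abstract.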
3.
J Voice ; 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37806902

ABSTRACT

INTRODUCTION: Singers use the whistle register to sing at fundamental frequencies above 1000 Hz. In previous studies, vocal fold vibrations with or without complete closure, as well as partial vocal fold vibrations, were observed depending on the subject. However, the production mechanism of the whistle register is not yet clearly understood because of limitations in both the devices used to image the glottis and the available subjects. OBJECTIVES: This study aims to examine vocal fold vibrations in the whistle register. METHODS: The dynamic behavior of the glottis was recorded for six singers (four females and two males) using a high-speed digital imaging device with a frame rate above 10,000 fps. Audio signals were recorded simultaneously. The data were analyzed in the form of topography, glottal area waveforms, spectrograms, and phonovibrography to examine spatiotemporal patterns of glottal motion. RESULTS: The vibratory motion of the vocal folds was classified into six patterns. The first pattern was vibration of the entire vocal folds with complete closure during the closed phase. The second to fifth patterns were vibration of the entire vocal folds without complete closure, with a gap observed along the full length of the vocal folds (second), at the posterior part of the glottis (third), at the anterior part (fourth), and at both ends (fifth). In the sixth pattern, the vocal folds vibrated only partially. Our results support previous findings on vocal fold vibration. In addition, we identified novel vibratory patterns of the vocal folds. CONCLUSION: We conclude that the production of the whistle register is not just an extension of the falsetto register to a higher fundamental-frequency region; rather, the production mechanism of the whistle register appears to be diverse as a means of vocalization.
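The glottal area waveform mentioned in the methods can be illustrated with a short sketch. It assumes each high-speed frame has already been segmented into a binary mask (1 = glottal opening), a step the abstract does not describe; all function names and the synthetic waveform are hypothetical. A waveform that reaches zero area indicates complete closure (as in the first pattern), and the vibration frequency is estimated from the autocorrelation peak at the camera frame rate. A per-cycle closure check would be stricter than the whole-recording check shown here.

```python
def glottal_area_waveform(masks):
    """Glottal area (pixel count) per frame from binary segmentation masks."""
    return [sum(sum(row) for row in mask) for mask in masks]


def has_complete_closure(gaw):
    """True if the glottal area reaches zero in at least one frame."""
    return min(gaw) == 0


def estimate_f0(gaw, fps, min_lag=2):
    """Estimate vibration frequency (Hz) from the autocorrelation peak."""
    n = len(gaw)
    mean = sum(gaw) / n
    x = [v - mean for v in gaw]
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, n // 2):
        # Mean lagged product; dividing by the overlap length keeps short
        # lags from winning just because they sum more terms.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        # Strict ">" keeps the shortest lag on ties (the fundamental period
        # rather than one of its multiples).
        if r > best_r:
            best_lag, best_r = lag, r
    return fps / best_lag


# Synthetic example: 8 frames per cycle at 10,000 fps, closing fully each cycle.
gaw = [0, 2, 5, 8, 8, 5, 2, 0] * 8
print(has_complete_closure(gaw))       # True
print(estimate_f0(gaw, fps=10_000))    # 1250.0
```

At fundamental frequencies above 1000 Hz, a 10,000 fps camera yields only about ten frames per cycle, which is why the frame rate bounds how finely each cycle can be resolved.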

4.
J Voice ; 2022 Nov 24.
Article in English | MEDLINE | ID: mdl-36437171

ABSTRACT

OBJECTIVES: Auditory-perceptual evaluation frameworks, such as the grade-roughness-breathiness-asthenia-strain (GRBAS) scale, are the gold standard for the quantitative evaluation of pathological voice quality. However, the evaluation is subjective; thus, the ratings lack reproducibility due to inter- and intra-rater variation. Previous studies have proposed deep-learning-based automatic GRBAS score estimation to address this problem. However, these methods require large amounts of labeled voice data. Therefore, this study investigates the potential of automatic GRBAS estimation using deep learning with smaller amounts of data. METHODS: A dataset of 300 pathological sustained /a/ vowel samples was created and rated by eight experts (200 samples for training, 50 for validation, and 50 for testing). A neural network model that predicts the probability distribution of GRBAS scores from an onset-to-offset waveform was proposed. Random speed perturbation, random crop, and frequency masking were investigated as data augmentation techniques, and power, instantaneous frequency, and group delay were investigated as time-frequency representations. RESULTS: Five-fold cross-validation was conducted, and the automatic scoring performance was evaluated using the quadratic weighted Cohen's kappa. The kappa values for automatic scoring were comparable to the experts' inter-rater reliability for all GRBAS items and to their intra-rater reliability for items G, B, A, and S. Random speed perturbation was the most effective data augmentation technique overall. When data augmentation was applied, power was the most effective representation for items G, R, A, and S; for item B, combining group delay and power yielded additional performance gains. CONCLUSION: The automatic GRBAS scoring achieved by the proposed model using a small amount of labeled data was comparable to that of experts. This suggests that the challenges resulting from insufficient data can be alleviated. The findings of this study can also contribute to performance improvements in other tasks, such as automatic voice disorder detection.
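The quadratic weighted Cohen's kappa used for evaluation is κ = 1 - Σ w_ij O_ij / Σ w_ij E_ij, with weights w_ij = (i - j)^2 / (k - 1)^2, observed counts O_ij, and expected counts E_ij computed from the two raters' marginal totals. The following is a minimal Python sketch, assuming integer GRBAS scores on the usual 0-3 scale (k = 4); it is not the authors' evaluation code, and the function name is hypothetical.

```python
def quadratic_weighted_kappa(a, b, n_classes=4):
    """Quadratic weighted Cohen's kappa for two lists of integer ratings."""
    n = len(a)
    # Observed agreement matrix O[i][j]: rater a gave i, rater b gave j.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    # Marginal totals for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2   # quadratic penalty
            expected = hist_a[i] * hist_b[j] / n       # chance agreement
            num += w * observed[i][j]
            den += w * expected
    # Undefined (division by zero) if both raters use a single score only.
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3]))  # 1.0
```

Quadratic weights penalize a disagreement of two grades four times as heavily as a one-grade disagreement, which suits ordinal scales such as GRBAS.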

5.
J Acoust Soc Am ; 152(2): 970-980, 2022 08.
Article in English | MEDLINE | ID: mdl-36050149

ABSTRACT

The intelligibility of interrupted speech stimuli is known to be almost perfect when the segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has remained largely unknown. Here, we show that the intelligibility of mosaic speech, in which the original speech was segmented in frequency and time and noise-vocoded with the average power in each time-frequency unit, was greatly reduced by periodic interruption. At the same time, intelligibility could be recovered by promoting auditory grouping of the interrupted segments, that is, by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was sufficient (≥ 4) and the original segment duration was 40 ms or less. The interruption was devastating for mosaic speech stimuli, very likely because the loss of periodicity and temporal fine structure through mosaicking prevented successful auditory grouping of the interrupted segments.


Subjects
Speech Intelligibility, Speech Perception, Acoustic Stimulation, Noise
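The mosaicking step described in the abstract above (averaging power within each time-frequency unit) can be sketched on a precomputed power spectrogram. This is a simplified illustration under stated assumptions: bands are equal-width groups of bins, whereas the band edges used in the studies are not given in the abstract, and the noise-vocoding resynthesis is omitted. The function name and the toy grid are hypothetical.

```python
def mosaic_power(power, frames_per_segment, bins_per_band):
    """Average power within each (time segment x frequency band) cell.

    power: list of frames, each a list of per-bin power values.
    Returns one row per time segment, one value per frequency band.
    """
    n_frames, n_bins = len(power), len(power[0])
    mosaic = []
    for t0 in range(0, n_frames, frames_per_segment):
        row = []
        for b0 in range(0, n_bins, bins_per_band):
            # Collect every spectrogram value that falls inside this cell.
            cell = [
                power[t][f]
                for t in range(t0, min(t0 + frames_per_segment, n_frames))
                for f in range(b0, min(b0 + bins_per_band, n_bins))
            ]
            row.append(sum(cell) / len(cell))
        mosaic.append(row)
    return mosaic


# Toy 4-frame x 4-bin spectrogram, mosaicked into 2 segments x 2 bands.
spec = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(mosaic_power(spec, frames_per_segment=2, bins_per_band=2))
# [[3.5, 5.5], [11.5, 13.5]]
```

In a full stimulus pipeline, each cell's average power would set the level of band-limited noise for that segment; interruption replaces segments with silence, and stretching lengthens the segments into the following gaps.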
6.
J Acoust Soc Am ; 118(1): 428-43, 2005 Jul.
Article in English | MEDLINE | ID: mdl-16119363

ABSTRACT

A measurement principle for a three-dimensional electromagnetic articulographic device is presented. The state of the miniature receiver coil is described by five variables representing its position in the three-dimensional coordinate system and its rotation angles relative to that system. When the receiver coil is placed in the magnetic field produced by the distributed transmitter coils, its state can be optimally estimated by minimizing the difference between the measured strength of the received signal and the strength predicted from the known spatial pattern of the magnetic field. Therefore, the design and calibration of the field function inherently determine the accuracy with which the state of the receiver coil is estimated. In our method, the field function is expressed as a multivariate B-spline function of position in three-dimensional space. Because of the piecewise nature of the basis functions and the freedom in selecting their rank and number, the spline field function can represent the actual magnetic field flexibly and accurately. Given a set of calibration data, the spline function is designed to form a smooth curved surface that interpolates all of the data samples. An iterative procedure is then employed to solve the nonlinear estimation problem for the receiver state variables. Because the spline basis functions are polynomials, the Jacobian or Hessian required to update the state variables can also be calculated efficiently. Finally, experimental results show a measurement accuracy of about 0.2 mm under preliminary conditions, indicating that the method can achieve the precision required for observing articulatory movements in three-dimensional space. The experiments also show that the Marquardt method is a better nonlinear programming technique than the Gauss-Newton or Newton-Raphson method for solving the receiver state problem.


Subjects
Electromagnetic Phenomena, Three-Dimensional Imaging, Theoretical Models, Speech Articulation Tests/methods, Humans, Speech Articulation Tests/standards
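The iterative receiver-state estimation described in the abstract above can be illustrated with a Marquardt-damped least-squares loop. This sketch makes strong simplifying assumptions: it estimates only the three position variables (not the two rotation angles), and it replaces the calibrated B-spline field function with a hypothetical 1/r^3 amplitude model whose Jacobian is analytic. The transmitter layout, true position, and all function names are invented for illustration.

```python
import numpy as np


def field(p, tx):
    """Hypothetical field model: amplitude falls off as 1 / r**3 from each transmitter."""
    d = p - tx                      # (K, 3) offsets from each transmitter
    r = np.linalg.norm(d, axis=1)   # (K,) distances
    return 1.0 / r**3


def jacobian(p, tx):
    """Analytic Jacobian of `field` with respect to the receiver position p."""
    d = p - tx
    r = np.linalg.norm(d, axis=1)
    return -3.0 * d / r[:, None] ** 5


def marquardt(measured, tx, p0, lam=1e-3, iters=100, tol=1e-12):
    """Minimize ||measured - field(p)||^2 over p with Marquardt damping."""
    p = np.asarray(p0, dtype=float)
    cost = np.sum((measured - field(p, tx)) ** 2)
    for _ in range(iters):
        resid = measured - field(p, tx)
        J = jacobian(p, tx)
        A = J.T @ J
        g = J.T @ resid
        # Marquardt damping: inflate the diagonal of the Gauss-Newton matrix.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), g)
        new_cost = np.sum((measured - field(p + step, tx)) ** 2)
        if new_cost < cost:   # accept: next step is closer to Gauss-Newton
            p, cost = p + step, new_cost
            lam *= 0.1
            if np.linalg.norm(step) < tol:
                break
        else:                 # reject: next step is closer to gradient descent
            lam *= 10.0
    return p


# Six hypothetical transmitters on the coordinate axes, 2 units from the origin.
tx = 2.0 * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                     [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
true_p = np.array([0.3, -0.2, 0.1])
measured = field(true_p, tx)
est = marquardt(measured, tx, p0=[0.0, 0.0, 0.0])
print(est)
```

The adaptive λ is what separates this scheme from plain Gauss-Newton: failed steps increase the damping and shorten the step, so the iteration stays stable when the starting estimate is poor.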