Search | VHL Regional Portal

1.

Impact of cochlear implants use on voice production and quality.

Guastamacchia, Angela; Albera, Andrea; Puglisi, Giuseppina Emma; Nudelman, Charles J; Soleimanifar, Simin; Astolfi, Arianna; Aronoff, Justin M; Bottalico, Pasquale.

Sci Rep ; 14(1): 12787, 2024 06 04.

Article in English | MEDLINE | ID: mdl-38834775

ABSTRACT

Cochlear implant users experience difficulties controlling their vocalizations compared to normal hearing peers. However, less is known about their voice quality. The primary aim of the present study was to determine if cochlear implant users' voice quality would be categorized as dysphonic by the Acoustic Voice Quality Index (AVQI) and smoothed cepstral peak prominence (CPPS). A secondary aim was to determine if vocal quality is further impacted when using bilateral implants compared to using only one implant. The final aim was to determine how residual hearing impacts voice quality. Twenty-seven cochlear implant users participated in the present study and were recorded while sustaining a vowel and while reading a standardized passage. These recordings were analyzed to calculate the AVQI and CPPS. The results indicate that CI users' voice quality was detrimentally affected by using their CI, raising to the level of a dysphonic voice. Specifically, when using their CI, mean AVQI scores were 4.0 and mean CPPS values were 11.4 dB, which indicates dysphonia. There were no significant differences in voice quality when comparing participants with bilateral implants to those with one implant. Finally, for participants with residual hearing, as hearing thresholds worsened, the likelihood of a dysphonic voice decreased.

Subject(s)

Cochlear Implants , Voice Quality , Humans , Male , Female , Middle Aged , Aged , Adult , Dysphonia/physiopathology , Speech Acoustics , Cochlear Implantation

2.

Effects of spatial configuration and fundamental frequency on speech intelligibility in multiple-talker conditions in the ipsilateral horizontal plane and median planea).

Yao, Dingding; Zhao, Jiale; Wang, Linyi; Shang, Zengqiang; Gu, Jianjun; Wang, Yunan; Jia, Maoshen; Li, Junfeng.

J Acoust Soc Am ; 155(5): 2934-2947, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38717201

ABSTRACT

Spatial separation and fundamental frequency (F0) separation are effective cues for improving the intelligibility of target speech in multi-talker scenarios. Previous studies predominantly focused on spatial configurations within the frontal hemifield, overlooking the ipsilateral side and the entire median plane, where localization confusion often occurs. This study investigated the impact of spatial and F0 separation on intelligibility under the above-mentioned underexplored spatial configurations. The speech reception thresholds were measured through three experiments for scenarios involving two to four talkers, either in the ipsilateral horizontal plane or in the entire median plane, utilizing monotonized speech with varying F0s as stimuli. The results revealed that spatial separation in symmetrical positions (front-back symmetry in the ipsilateral horizontal plane or front-back, up-down symmetry in the median plane) contributes positively to intelligibility. Both target direction and relative target-masker separation influence the masking release attributed to spatial separation. As the number of talkers exceeds two, the masking release from spatial separation diminishes. Nevertheless, F0 separation remains as a remarkably effective cue and could even facilitate spatial separation in improving intelligibility. Further analysis indicated that current intelligibility models encounter difficulties in accurately predicting intelligibility in scenarios explored in this study.

Subject(s)

Cues , Perceptual Masking , Sound Localization , Speech Intelligibility , Speech Perception , Humans , Female , Male , Young Adult , Adult , Speech Perception/physiology , Acoustic Stimulation , Auditory Threshold , Speech Acoustics , Speech Reception Threshold Test , Noise

3.

Comparing perception of L1 and L2 English by human listeners and machines: Effect of interlocutor adaptationsa).

Vonessen, Jules; Aoki, Nicholas B; Cohn, Michelle; Zellou, Georgia.

J Acoust Soc Am ; 155(5): 3060-3070, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38717210

ABSTRACT

Speakers tailor their speech to different types of interlocutors. For example, speech directed to voice technology has different acoustic-phonetic characteristics than speech directed to a human. The present study investigates the perceptual consequences of human- and device-directed registers in English. We compare two groups of speakers: participants whose first language is English (L1) and bilingual L1 Mandarin-L2 English talkers. Participants produced short sentences in several conditions: an initial production and a repeat production after a human or device guise indicated either understanding or misunderstanding. In experiment 1, a separate group of L1 English listeners heard these sentences and transcribed the target words. In experiment 2, the same productions were transcribed by an automatic speech recognition (ASR) system. Results show that transcription accuracy was highest for L1 talkers for both human and ASR transcribers. Furthermore, there were no overall differences in transcription accuracy between human- and device-directed speech. Finally, while human listeners showed an intelligibility benefit for coda repair productions, the ASR transcriber did not benefit from these enhancements. Findings are discussed in terms of models of register adaptation, phonetic variation, and human-computer interaction.

Subject(s)

Multilingualism , Speech Intelligibility , Speech Perception , Humans , Male , Female , Adult , Young Adult , Speech Acoustics , Phonetics , Speech Recognition Software

4.

Aging affects auditory contributions to focus perception in Jianghuai Mandarina).

Zhao, Xinxian; Yang, Xiaohu.

J Acoust Soc Am ; 155(5): 2990-3004, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38717206

ABSTRACT

Speakers can place their prosodic prominence on any locations within a sentence, generating focus prosody for listeners to perceive new information. This study aimed to investigate age-related changes in the bottom-up processing of focus perception in Jianghuai Mandarin by clarifying the perceptual cues and the auditory processing abilities involved in the identification of focus locations. Young, middle-aged, and older speakers of Jianghuai Mandarin completed a focus identification task and an auditory perception task. The results showed that increasing age led to a decrease in listeners' accuracy rate in identifying focus locations, with all participants performing the worst when dynamic pitch cues were inaccessible. Auditory processing abilities did not predict focus perception performance in young and middle-aged listeners but accounted significantly for the variance in older adults' performance. These findings suggest that age-related deteriorations in focus perception can be largely attributed to declined auditory processing of perceptual cues. Poor ability to extract frequency modulation cues may be the most important underlying psychoacoustic factor for older adults' difficulties in perceiving focus prosody in Jianghuai Mandarin. The results contribute to our understanding of the bottom-up mechanisms involved in linguistic prosody processing in aging adults, particularly in tonal languages.

Subject(s)

Aging , Cues , Speech Perception , Humans , Middle Aged , Aged , Male , Female , Aging/psychology , Aging/physiology , Young Adult , Adult , Speech Perception/physiology , Age Factors , Speech Acoustics , Acoustic Stimulation , Pitch Perception , Language , Voice Quality , Psychoacoustics , Audiometry, Speech

5.

Acoustic cues to femininity and masculinity in spontaneous speech.

Nylén, Fredrik; Holmberg, Jenny; Södersten, Maria.

J Acoust Soc Am ; 155(5): 3090-3100, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38717212

ABSTRACT

The perceived level of femininity and masculinity is a prominent property by which a speaker's voice is indexed, and a vocal expression incongruent with the speaker's gender identity can greatly contribute to gender dysphoria. Our understanding of the acoustic cues to the levels of masculinity and femininity perceived by listeners in voices is not well developed, and an increased understanding of them would benefit communication of therapy goals and evaluation in gender-affirming voice training. We developed a voice bank with 132 voices with a range of levels of femininity and masculinity expressed in the voice, as rated by 121 listeners in independent, individually randomized perceptual evaluations. Acoustic models were developed from measures identified as markers of femininity or masculinity in the literature using penalized regression and tenfold cross-validation procedures. The 223 most important acoustic cues explained 89% and 87% of the variance in the perceived level of femininity and masculinity in the evaluation set, respectively. The median fo was confirmed to provide the primary cue, but other acoustic properties must be considered in accurate models of femininity and masculinity perception. The developed models are proposed to afford communication and evaluation of gender-affirming voice training goals and improve voice synthesis efforts.

Subject(s)

Cues , Speech Acoustics , Speech Perception , Voice Quality , Humans , Female , Male , Adult , Young Adult , Masculinity , Middle Aged , Femininity , Adolescent , Gender Identity , Acoustics

6.

Acoustic analysis of English tense and lax vowels: Comparing the production between Mandarin Chinese learners and native English speakers.

Feng, Hui; Wang, Lijuan.

J Acoust Soc Am ; 155(5): 3071-3089, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38717213

ABSTRACT

This study investigated how 40 Chinese learners of English as a foreign language (EFL learners) differed from 40 native English speakers in the production of four English tense-lax contrasts, /i-Éª/, /u-Ê/, /É-Ê/, and /æ-Îµ/, by examining the acoustic measurements of duration, the first three formant frequencies, and the slope of the first formant movement (F1 slope). The dynamic formant trajectory was modeled using discrete cosine transform coefficients to demonstrate the time-varying properties of formant trajectories. A discriminant analysis was employed to illustrate the extent to which Chinese EFL learners relied on different acoustic parameters. This study found that: (1) Chinese EFL learners overemphasized durational differences and weakened spectral differences for the /i-Éª/, /u-Ê/, and /É-Ê/ pairs, although they maintained sufficient spectral differences for /æ-Îµ/. In contrast, native English speakers predominantly used spectral differences across all four pairs; (2) in non-low tense-lax contrasts, unlike native English speakers, Chinese EFL learners failed to exhibit different F1 slope values, indicating a non-nativelike tongue-root placement during the articulatory process. The findings underscore the contribution of dynamic spectral patterns to the differentiation between English tense and lax vowels, and reveal the influence of precise articulatory gestures on the realization of the tense-lax contrast.

Subject(s)

Multilingualism , Phonetics , Speech Acoustics , Humans , Male , Female , Young Adult , Speech Production Measurement , Adult , Language , Acoustics , Learning , Voice Quality , Sound Spectrography , East Asian People

7.

Cepstral Peak Prominence Smoothed - CPPS and Acoustic Voice Quality Index - AVQI in healthy and altered children's voices: comparation, relationship with auditory-perceptual judgment and cut-off points. / Cepstral Peak Prominence Smoothed - CPPS e Acoustic Voice Quality Index - AVQI em vozes infantis saudáveis e alteradas: comparação, relação com julgamento perceptivo-auditivo e pontos de corte.

Rabelo, Evelyn Carla Dos Santos; Dassie-Leite, Ana Paula; Ribeiro, Vanessa Veis; Madazio, Glaucya; Behlau, Mara Suzana.

Codas ; 36(4): e20230047, 2024.

Article in Portuguese, English | MEDLINE | ID: mdl-38808777

ABSTRACT

PURPOSE: To compare the acoustic measurements of Cepstral Peak Prominence Smoothed (CPPS) and Acoustic Voice Quality Index (AVQI) of children with normal and altered voices, to relationship with auditory-perceptual judgment (APJ) and to establish cut-off points. METHODS: Vocal recordings of the sustained vowel and number counting tasks of 185 children were selected from a database and submitted to acoustic analysis with extraction of CPPS and AVQI measurements, and to APJ. The APJ was performed individually for each task, classified as normal or altered, and for the tasks together defining whether the child would pass or fail in a situation of vocal screening. RESULTS: Children with altered APJ and who failed the screening had lower CPPS values and higher AVQI values, than those with normal APJ and who passed the screening. The APJ of the sustained vowel task was related to CPPS and AVQI, and APJ of the number counting task was related only to AVQI and CPPS numbers. The cut-off points that differentiate children with and without vocal deviation are 14.07 for the vowel CPPS, 7.62 for the CPPS numbers and 2.01 for the AVQI. CONCLUSION: Children with altered voices, have higher AVQI values and lower CPPS values, when detected in children with voices within the normal range. The acoustic measurements were related to the auditory perceptual judgment of vocal quality in the sustained vowel task, however, the number counting task was related only to the AVQI and CPPS. The cut-off points that differentiate children with and without vocal deviation are 14.07 for the CPPS vowel, 7.62 for the CPPS numbers and 2.01 for the AVQI. The three measures were similar in identifying voices without deviation and dysphonic voices.

OBJETIVO: Comparar as medidas acústicas de Cepstral Peak Prominence Smoothed (CPPS) e Acoustic Voice Quality Index (AVQI) de crianças com vozes normais e alteradas, relacionar com o julgamento perceptivo-auditivo (JPA) da voz e estabelecer pontos de corte. MÉTODO: Gravações vocais das tarefas de vogal sustentada e contagem de números de 185 crianças foram selecionadas em um banco de dados e submetidas a análise acústica com extração das medidas de CPPS e AVQI, e ao JPA. O JPA foi realizado individualmente para cada tarefa e as amostras foram classificadas posteriormente como normal ou alterada, e para as tarefas em conjunto definindo-se se a criança passaria ou falharia em uma situação de triagem vocal. RESULTADOS: Crianças com JPA alterado e que falharam na triagem apresentaram valores menores de CPPS e maiores de AVQI, do que as com JPA normal e que passaram na triagem. O JPA da tarefa de vogal sustentada se relacionou ao CPPS e AVQI, e da tarefa de contagem de números relacionou-se apenas ao AVQI e CPPS números. Os pontos de corte que diferenciam crianças com e sem desvio vocal são 14,07 para o CPPS vogal, 7,62 para o CPPS números e 2,01 para o AVQI. CONCLUSÃO: Crianças com JPA alterado apresentaram maiores valores de AVQI e menores valores de CPPs. O JPA da tarefa de vogal previu todas as medidas acústicas, porém, de contagem previu apenas as medidas extraídas dela. As três medidas foram semelhantes na identificação de vozes sem desvio e vozes disfônicas.

Subject(s)

Speech Acoustics , Voice Quality , Humans , Voice Quality/physiology , Child , Female , Male , Auditory Perception/physiology , Voice Disorders/diagnosis , Voice Disorders/physiopathology , Adolescent , Case-Control Studies , Speech Production Measurement , Judgment

8.

Phrase boundaries lacking word prosody: An articulatory investigation of Seoul Korean.

Jang, Jiyoung; Katsika, Argyro.

J Acoust Soc Am ; 155(5): 3521-3536, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38809098

ABSTRACT

This electromagnetic articulography study explores the kinematic profile of Intonational Phrase boundaries in Seoul Korean. Recent findings suggest that the scope of phrase-final lengthening is conditioned by word- and/or phrase-level prominence. However, evidence comes mainly from head-prominence languages, which conflate positions of word prosody with positions of phrasal prominence. Here, we examine phrase-final lengthening in Seoul Korean, an edge-prominence language with no word prosody, with respect to focus location as an index of phrase-level prominence and Accentual Phrase (AP) length as an index of word demarcation. Results show that phrase-final lengthening extends over the phrase-final syllable. The effect is greater the further away that focus occurs. It also interacts with the domains of AP and prosodic word: lengthening is greater in smaller APs, whereas shortening is observed in the initial gesture of the phrase-final word. Additional analyses of kinematic displacement and peak velocity revealed that Korean phrase-final gestures bear the kinematic profile of IP boundaries concurrently to what is typically considered prominence marking. Based on these results, a gestural coordination account is proposed, in which boundary-related events interact systematically with phrase-level prominence as well as lower prosodic levels, and how this proposal relates to the findings in head-prominence languages is discussed.

Subject(s)

Phonetics , Speech Acoustics , Humans , Male , Female , Young Adult , Biomechanical Phenomena , Adult , Language , Gestures , Speech Production Measurement , Republic of Korea , Voice Quality , Time Factors

9.

Differential spectral characteristics of the Spanish fricative /s/ in the articulation of individuals with dysarthria and apraxia of speech.

Melle, Natalia; Gallego, Carlos; Lahoz-Bengoechea, José María; Nieva, Silvia.

J Commun Disord ; 109: 106428, 2024.

Article in English | MEDLINE | ID: mdl-38744198

ABSTRACT

PURPOSE: This study examines whether there are differences in the speech of speakers with dysarthria, speakers with apraxia and healthy speakers in spectral acoustic measures during production of the central-peninsular Spanish alveolar sibilant fricative /s/. METHOD: To this end, production of the sibilant was analyzed in 20 subjects with dysarthria, 8 with apraxia of speech and 28 healthy speakers. Participants produced 12 sV(C) words. The variables compared across groups were the fricative's spectral amplitude difference (AmpD) and spectral moments in the temporal midpoint of fricative execution. RESULTS: The results indicate that individuals with dysarthria can be distinguished from healthy speakers in terms of the spectral characteristics AmpD, standard deviation (SD), center of gravity (CoG) and skewness, the last two in context with unrounded vowel, while no differences in kurtosis were detected. Participants with AoS group differ significantly from healthy speaker group in AmpD, SD and CoG and Kurtosis, the first one followed unrounded vowel and the latter two followed by rounded vowels. In addition, speakers with apraxia of speech group returned significant differences with respect to speakers with dysarthria group in AmpD, CoG and skewness. CONCLUSIONS: The differences found between the groups in the measures studied as a function of the type of vowel context could provide insights into the distinctive manifestations of motor speech disorders, contributing to the differential diagnosis between apraxia and dysarthria in motor control processes.

Subject(s)

Apraxias , Dysarthria , Speech Acoustics , Humans , Dysarthria/physiopathology , Dysarthria/etiology , Apraxias/physiopathology , Male , Female , Middle Aged , Adult , Aged , Phonetics , Speech Production Measurement

10.

Recurrence plot embeddings as short segment nonlinear features for multimodal speaker identification using air, bone and throat microphones.

Nawas, K Khadar; Shahina, A; Balachandar, Keshav; Maadeshwaran, P; Devanathan, N G; Kumar, Navein; Khan, A Nayeemulla.

Sci Rep ; 14(1): 12513, 2024 05 31.

Article in English | MEDLINE | ID: mdl-38822054

ABSTRACT

Speech is produced by a nonlinear, dynamical Vocal Tract (VT) system, and is transmitted through multiple (air, bone and skin conduction) modes, as captured by the air, bone and throat microphones respectively. Speaker specific characteristics that capture this nonlinearity are rarely used as stand-alone features for speaker modeling, and at best have been used in tandem with well known linear spectral features to produce tangible results. This paper proposes Recurrent Plot (RP) embeddings as stand-alone, non-linear speaker-discriminating features. Two datasets, the continuous multimodal TIMIT speech corpus and the consonant-vowel unimodal syllable dataset, are used in this study for conducting closed-set speaker identification experiments. Experiments with unimodal speaker recognition systems show that RP embeddings capture the nonlinear dynamics of the VT system which are unique to every speaker, in all the modes of speech. The Air (A), Bone (B) and Throat (T) microphone systems, trained purely on RP embeddings perform with an accuracy of 95.81%, 98.18% and 99.74%, respectively. Experiments using the joint feature space of combined RP embeddings for bimodal (A-T, A-B, B-T) and trimodal (A-B-T) systems show that the best trimodal system (99.84% accuracy) performs on par with trimodal systems using spectrogram (99.45%) and MFCC (99.98%). The 98.84% performance of the B-T bimodal system shows the efficacy of a speaker recognition system based entirely on alternate (bone and throat) speech, in the absence of the standard (air) speech. The results underscore the significance of the RP embedding, as a nonlinear feature representation of the dynamical VT system that can act independently for speaker recognition. It is envisaged that speech recognition too will benefit from this nonlinear feature.

Subject(s)

Pharynx , Humans , Pharynx/physiology , Speech/physiology , Nonlinear Dynamics , Male , Female , Speech Acoustics , Bone and Bones/physiology , Adult

11.

Immediate effect of inspiratory exercise with exerciser and respiratory encourager in women without vocal complaints. / Efeito imediato de exercício inspiratório com exercitador e incentivador respiratório em mulheres sem queixas vocais.

Beraldo, Alessandra Thais; Batistella, Julia; Martins, Perla do Nascimento; Dassie-Leite, Ana Paula; Pereira, Eliane Cristina.

Codas ; 36(4): e20230148, 2024.

Article in Portuguese, English | MEDLINE | ID: mdl-38775526

ABSTRACT

PURPOSE: To evaluate the immediate effect of the inspiratory exercise with a booster and a respiratory exerciser on the voice of women without vocal complaints. METHODS: 25 women with no vocal complaints, between 18 and 34 years old, with a score of 1 on the Vocal Disorder Screening Index (ITDV) participated. Data collection was performed before and after performing the inspiratory exercise and consisted of recording the sustained vowel /a/, connected speech and maximum phonatory times (MPT) of vowels, fricative phonemes and counting numbers. In the auditory-perceptual judgment, the Vocal Deviation Scale (VSD) was used to verify the general degree of vocal deviation. Acoustic evaluation was performed using the PRAAT software and the parameters fundamental frequency (f0), jitter, shimmer, harmonium-to-noise ratio (HNR), Cepstral Peak Prominence Smoothed (CPPS), Acoustic Voice Quality Index (AVQI) and Acoustic Breathiness Index (ABI). To measure the aerodynamic measurements, the time of each emission was extracted in the Audacity program. Data were statistically analyzed using the Statistica for Windows software and normality was tested using the Shapiro-Wilk test. To compare the results, Student's and Wilcoxon's t tests were applied, adopting a significance level of 5%. RESULTS: There were no significant differences between the results of the JPA and the acoustic measures, in the pre and post inspiratory exercise moments. As for the aerodynamic measures, it was possible to observe a significant increase in the value of the TMF /s/ (p=0.008). CONCLUSION: There was no change in vocal quality after the inspiratory exercise with stimulator and respiratory exerciser, but an increase in the MPT of the phoneme /s/ was observed after the exercise.

OBJETIVO: Avaliar o efeito imediato do exercício inspiratório com incentivador e exercitador respiratório na voz de mulheres sem queixas vocais. MÉTODO: Participaram 25 mulheres sem queixas vocais, entre 18 e 34 anos, com pontuação 1 no Índice de Triagem para Distúrbio Vocal (ITDV). A coleta de dados foi realizada nos momentos antes e após realização de exercício inspiratório e consistiu na gravação de vogal sustentada /a/, fala encadeada e tempos máximos fonatórios (TMF) de vogais, fonemas fricativos e contagem de números. No julgamento perceptivo-auditivo foi utilizada a Escala de Desvio Vocal (EDV) para verificar o grau geral do desvio vocal. Avaliação acústica foi feita no software PRAAT e foram extraídos os parâmetros frequência fundamental (f0), jitter, shimmer, proporção harmônico -ruído (HNR), Cepstral Peak Prominence Smoothed (CPPS), Acoustic Voice Quality Index (AVQI) e Acoustic Breathiness Index (ABI). Para mensuração das medidas aerodinâmicas, o tempo de emissão foi extraído no programa Audacity. Para comparar os resultados utilizou-se o teste paramétrico t de Student para amostras dependentes na análise das variáveis com distribuição normal e o teste de Wilcoxon para variáveis com distribuição não normal. RESULTADOS: Não houve diferenças entre os resultados do JPA e das medidas acústicas, nos momentos pré e pós exercício inspiratório. Quanto às medidas aerodinâmicas foi possível observar aumento significativo no valor do TMF /s/ (p=0,008). CONCLUSÃO: Não houve modificação na qualidade vocal após o exercício inspiratório com incentivador e exercitador respiratório, porém foi observado aumento do TMF do fonema /s/ após a realização do exercício.

Subject(s)

Breathing Exercises , Voice Quality , Humans , Female , Adult , Young Adult , Adolescent , Breathing Exercises/methods , Speech Acoustics , Voice Disorders/physiopathology , Voice Disorders/diagnosis , Phonation/physiology

12.

Clear speech effects in production of sentence-medial Mandarin lexical tonesa).

Rittenberry, Jack; Shport, Irina A.

JASA Express Lett ; 4(5)2024 May 01.

Article in English | MEDLINE | ID: mdl-38804812

ABSTRACT

Adding to limited research on clear speech in tone languages, productions of Mandarin lexical tones were examined in pentasyllabic sentences. Fourteen participants read sentences imagining a hard-of-hearing addressee or a friend in a casual social setting. Tones produced in clear speech had longer duration, higher intensity, and larger F0 values. This style effect was rarely modulated by tone, preceding tonal context, or syllable position, consistent with an overall signal enhancement strategy. Possible evidence for tone enhancement was observed only in one set of analysis for F0 minimum and F0 range, contrasting tones with low targets and tones with high targets.

Subject(s)

Language , Humans , Female , Male , Speech Acoustics , Adult , Young Adult , Speech , Speech Perception/physiology , Phonetics

13.

Validity of Acoustic Measures Obtained Using Various Recording Methods Including Smartphones With and Without Headset Microphones.

Awan, Shaheen N; Bahr, Ruth; Watts, Stephanie; Boyer, Micah; Budinsky, Robert; Bensoussan, Yael.

J Speech Lang Hear Res ; 67(6): 1712-1730, 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38749007

ABSTRACT

PURPOSE: The goal of this study was to assess various recording methods, including combinations of high- versus low-cost microphones, recording interfaces, and smartphones in terms of their ability to produce commonly used time- and spectral-based voice measurements. METHOD: Twenty-four vowel samples representing a diversity of voice quality deviations and severities from a wide age range of male and female speakers were played via a head-and-thorax model and recorded using a high-cost, research standard GRAS 40AF (GRAS Sound & Vibration) microphone and amplification system. Additional recordings were made using various combinations of headset microphones (AKG C555 L [AKG Acoustics GmbH], Shure SM35-XLR [Shure Incorporated], AVID AE-36 [AVID Products, Inc.]) and audio interfaces (Focusrite Scarlett 2i2 [Focusrite Audio Engineering Ltd.] and PC, Focusrite and smartphone, smartphone via a TRRS adapter), as well as smartphones direct (Apple iPhone 13 Pro, Google Pixel 6) using their built-in microphones. The effect of background noise from four different room conditions was also evaluated. Vowel samples were analyzed for measures of fundamental frequency, perturbation, cepstral peak prominence, and spectral tilt (low vs. high spectral ratio). RESULTS: Results show that a wide variety of recording methods, including smartphones with and without a low-cost headset microphone, can effectively track the wide range of acoustic characteristics in a diverse set of typical and disordered voice samples. Although significant differences in acoustic measures of voice may be observed, the presence of extremely strong correlations (rs > .90) with the recording standard implies a strong linear relationship between the results of different methods that may be used to predict and adjust any observed differences in measurement results. CONCLUSION: Because handheld smartphone distance and positioning may be highly variable when used in actual clinical recording situations, smartphone + a low-cost headset microphone is recommended as an affordable recording method that controls mouth-to-microphone distance and positioning and allows both hands to be available for manipulation of the smartphone device.

Subject(s)

Smartphone , Speech Acoustics , Humans , Female , Male , Adult , Young Adult , Speech Production Measurement/instrumentation , Speech Production Measurement/methods , Reproducibility of Results , Voice Quality , Middle Aged , Adolescent

14.

The Accompanying Effect in Responses to Auditory Perturbations: Unconscious Vocal Adjustments to Unperturbed Parameters.

Ning, Li-Hsin; Hui, Tak-Cheung.

J Speech Lang Hear Res ; 67(6): 1731-1751, 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38754028

ABSTRACT

PURPOSE: The present study examined whether participants respond to unperturbed parameters while experiencing specific perturbations in auditory feedback. For instance, we aim to determine if speakers adjust voice loudness when only pitch is artificially altered in auditory feedback. This phenomenon is referred to as the "accompanying effect" in the present study. METHOD: Thirty native Mandarin speakers were asked to sustain the vowel /É/ for 3 s while their auditory feedback underwent single shifts in one of the three distinct ways: pitch shift (±100 cents; coded as PT), loudness shift (±6 dB; coded as LD), or first formant (F1) shift (±100 Hz; coded as FM). Participants were instructed to ignore the perturbations in their auditory feedback. Response types were categorized based on pitch, loudness, and F1 for each individual trial, such as Popp_Lopp_Fopp indicating opposing responses in all three domains. RESULTS: The accompanying effect appeared 93% of the time. Bayesian Poisson regression models indicate that opposing responses in all three domains (Popp_Lopp_Fopp) were the most prevalent response type across the conditions (PT, LD, and FM). The more frequently used response types exhibited opposing responses and significantly larger response curves than the less frequently used response types. Following responses became more prevalent only when the perturbed stimuli were perceived as voices from someone else (external references), particularly in the FM condition. In terms of isotropy, loudness and F1 tended to change in the same direction rather than loudness and pitch. CONCLUSION: The presence of the accompanying effect suggests that the motor systems responsible for regulating pitch, loudness, and formants are not entirely independent but rather interconnected to some degree.

Subject(s)

Bayes Theorem , Pitch Perception , Humans , Male , Female , Young Adult , Pitch Perception/physiology , Adult , Speech Perception/physiology , Loudness Perception/physiology , Feedback, Sensory/physiology , Voice/physiology , Acoustic Stimulation/methods , Speech Acoustics

15.

Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment.

Iob, Naomi Anna; He, Lei; Ternström, Sten; Cai, Huanchen; Brockmann-Bauser, Meike.

J Speech Lang Hear Res ; 67(6): 1660-1681, 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38758676

ABSTRACT

PURPOSE: Literature suggests a dependency of the acoustic metrics, smoothed cepstral peak prominence (CPPS) and harmonics-to-noise ratio (HNR), on human voice loudness and fundamental frequency (F0). Even though this has been explained with different oscillatory patterns of the vocal folds, so far, it has not been specifically investigated. In the present work, the influence of three elicitation levels, calibrated sound pressure level (SPL), F0 and vowel on the electroglottographic (EGG) and time-differentiated EGG (dEGG) metrics hybrid open quotient (OQ), dEGG OQ and peak dEGG, as well as on the acoustic metrics CPPS and HNR, was examined, and their suitability for voice assessment was evaluated. METHOD: In a retrospective study, 29 women with a mean age of 25 years (± 8.9, range: 18-53) diagnosed with structural vocal fold pathologies were examined before and after voice therapy or phonosurgery. Both acoustic and EGG signals were recorded simultaneously during the phonation of the sustained vowels /É/, /i/, and /u/ at three elicited levels of loudness (soft/comfortable/loud) and unconstrained F0 conditions. RESULTS: A linear mixed-model analysis showed a significant effect of elicitation effort levels on peak dEGG, HNR, and CPPS (all p < .01). Calibrated SPL significantly influenced HNR and CPPS (both p < .01). Furthermore, F0 had a significant effect on peak dEGG and CPPS (p < .0001). All metrics showed significant changes with regard to vowel (all p < .05). However, the treatment had no effect on the examined metrics, regardless of the treatment type (surgery vs. voice therapy). CONCLUSIONS: The value of the investigated metrics for voice assessment purposes when sampled without sufficient control of SPL and F0 is limited, in that they are significantly influenced by the phonatory context, be it speech or elicited sustained vowels. Future studies should explore the diagnostic value of new data collation approaches such as voice mapping, which take SPL and F0 effects into account.

Subject(s)

Dysphonia , Speech Acoustics , Humans , Female , Adult , Dysphonia/physiopathology , Dysphonia/therapy , Retrospective Studies , Young Adult , Middle Aged , Adolescent , Voice Quality/physiology , Electrodiagnosis/methods , Glottis/physiopathology , Phonation/physiology , Vocal Cords/physiopathology , Voice Training , Speech Production Measurement/methods

16.

What's special about human speech? A student exercise for comparing speech production between humans and chimpanzees.

Shofner, William P.

J Acoust Soc Am ; 155(5): 3206-3212, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38738937

ABSTRACT

Modern humans and chimpanzees share a common ancestor on the phylogenetic tree, yet chimpanzees do not spontaneously produce speech or speech sounds. The lab exercise presented in this paper was developed for undergraduate students in a course entitled "What's Special About Human Speech?" The exercise is based on acoustic analyses of the words "cup" and "papa" as spoken by Viki, a home-raised, speech-trained chimpanzee, as well as the words spoken by a human. The analyses allow students to relate differences in articulation and vocal abilities between Viki and humans to the known anatomical differences in their vocal systems. Anatomical and articulation differences between humans and Viki include (1) potential tongue movements, (2) presence or absence of laryngeal air sacs, (3) presence or absence of vocal membranes, and (4) exhalation vs inhalation during production.

Subject(s)

Pan troglodytes , Speech Acoustics , Speech , Humans , Animals , Pan troglodytes/physiology , Speech/physiology , Tongue/physiology , Tongue/anatomy & histology , Vocalization, Animal/physiology , Species Specificity , Speech Production Measurement , Larynx/physiology , Larynx/anatomy & histology , Phonetics

17.

Articulatory and acoustic dynamics of fronted back vowels in American English.

Havenhill, Jonathan.

J Acoust Soc Am ; 155(4): 2285-2301, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38557735

ABSTRACT

Fronting of the vowels /u, Ê, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, Ê, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda.

Subject(s)

Phonetics , Speech Acoustics , United States , Acoustics , Language , South Carolina

18.

Speech perception as information processing.

Redford, Melissa A.

J Acoust Soc Am ; 155(4): R7-R8, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38558083

ABSTRACT

The Reflections series takes a look back on historical articles from The Journal of the Acoustical Society of America that have had a significant impact on the science and practice of acoustics.

Subject(s)

Speech Perception , Acoustics , Speech Acoustics , Cognition

19.

Perceptual formant discrimination during speech movement planning.

Wang, Hantao; Ali, Yusuf; Max, Ludo.

PLoS One ; 19(4): e0301514, 2024.

Article in English | MEDLINE | ID: mdl-38564597

ABSTRACT

Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.

Subject(s)

Auditory Cortex , Speech Perception , Humans , Speech/physiology , Speech Perception/physiology , Acoustics , Movement , Phonetics , Speech Acoustics

20.

Acoustic, phonetic, and phonological features of Drehu vowels.

Torres, Catalina; Li, Weicong; Escudero, Paola.

J Acoust Soc Am ; 155(4): 2612-2626, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38629882

ABSTRACT

This study presents an acoustic investigation of the vowel inventory of Drehu (Southern Oceanic Linkage), spoken in New Caledonia. Reportedly, Drehu has a 14 vowel system distinguishing seven vowel qualities and an additional length distinction. Previous phonological descriptions were based on impressionistic accounts showing divergent proposals for two out of seven reported vowel qualities. This study presents the first phonetic investigation of Drehu vowels based on acoustic data from eight speakers. To examine the phonetic correlates of the proposed phonological vowel inventory, multi-point acoustic analyses were used, and vowel inherent spectral change (VISC) was investigated (F1, F2, and F3). Additionally, vowel duration was measured. Contrary to reports from other studies on VISC in monophthongs, we find that monophthongs in Drehu are mostly steady state. We propose a revised vowel inventory and focus on the acoustic description of open-mid /É/ and the central vowel /É/, whose status was previously unclear. Additionally, we find that vowel quality stands orthogonal to vowel quantity by demonstrating that the phonological vowel length distinction is primarily based on a duration cue rather than formant structure. Finally, we report the acoustic properties of the seven vowel qualities that were identified.

Subject(s)

Phonetics , Speech Acoustics , Acoustics

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL