Results 1 - 8 of 8
1.
Sci Rep ; 10(1): 19580, 2020 Nov 11.
Article in English | MEDLINE | ID: mdl-33177590

ABSTRACT

The role of isochrony in speech (the hypothetical division of speech into units of equal duration) has been the subject of a long-standing debate. Current approaches in neuroscience have brought new perspectives to that debate through the theoretical framework of predictive coding and cortical oscillations. Here we assess the comparative roles of naturalness and isochrony in the intelligibility of speech in noise for French and English, two languages representative of two well-established contrastive rhythm classes. We show that both top-down predictions associated with the natural timing of speech and, to a lesser extent, bottom-up predictions associated with isochrony at a syllabic timescale improve intelligibility. We found a similar pattern of results for both languages, suggesting that the temporal characterisation of speech from different rhythm classes could be unified around a single core speech unit, with a neurophysiologically defined duration and a linguistically anchored temporal location. Taken together, our results suggest that isochrony is not a primary dimension of speech processing, but may instead be a consequence of neurobiological processing constraints, manifesting in behavioural performance and ultimately explaining why isochronous stimuli occupy a particular status in speech and human perception in general.


Subject(s)
Speech Perception/physiology; Acoustic Stimulation; Adult; Female; Humans; Language; Male; Noise; Nontherapeutic Human Experimentation; Phonetics; Speech Intelligibility
2.
PLoS One ; 15(5): e0232209, 2020.
Article in English | MEDLINE | ID: mdl-32365075

ABSTRACT

Recent research on speech communication has revealed a tendency for speakers to imitate at least some of the characteristics of their interlocutor's speech sound shape. This phenomenon, referred to as phonetic convergence, entails a moment-to-moment adaptation of the speaker's speech targets to the perceived interlocutor's speech. It is thought to contribute to setting up a conversational common ground between speakers and to facilitate mutual understanding. However, it remains uncertain to what extent phonetic convergence occurs in voice fundamental frequency (F0), in spite of the major role played by pitch, F0's perceptual correlate, as a conveyor of both linguistic information and communicative cues associated with the speaker's social/individual identity and emotional state. In the present work, we investigated to what extent two speakers converge towards each other with respect to variations in F0 in a scripted dialogue. Pairs of speakers jointly performed a speech production task, in which they were asked to alternately read aloud a written story divided into a sequence of short reading turns. We devised an experimental set-up that allowed us to manipulate the speakers' F0 in real time across turns. We found that speakers tended to imitate each other's changes in F0 across turns that were both limited in amplitude and spread over large temporal intervals. This shows that, at the perceptual level, speakers monitor slow-varying movements in their partner's F0 with high accuracy and, at the production level, that speakers exert a very fine-tuned control on their laryngeal vibrator in order to imitate these F0 variations. Remarkably, F0 convergence across turns was found to occur in spite of the large melodic variations typically associated with reading turns. Our study sheds new light on speakers' perceptual tracking of F0 in speech processing, and the impact of this perceptual tracking on speech production.


Subject(s)
Imitative Behavior/physiology; Speech/physiology; Adult; Algorithms; Female; Humans; Middle Aged; Phonetics; Speech Perception; Young Adult
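The turn-by-turn F0 convergence described in this abstract could be quantified, in a highly simplified form, along the following lines. This is a hypothetical sketch, not the authors' actual analysis: it correlates each change in speaker A's mean turn F0 with speaker B's deviation from A on the intervening turn (all function and variable names here are invented).

```python
def _mean(xs):
    return sum(xs) / len(xs)

def _pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def convergence_score(f0_a, f0_b):
    """f0_a: speaker A's mean F0 (Hz) on each of A's reading turns;
    f0_b: speaker B's mean F0 on the turns in between.
    Returns the correlation between A's turn-to-turn F0 change and B's
    preceding deviation from A; positive values indicate that A moves
    toward B, i.e. convergence."""
    n = len(f0_a) - 1
    changes_a = [f0_a[i + 1] - f0_a[i] for i in range(n)]
    lead_b = [f0_b[i] - f0_a[i] for i in range(n)]
    return _pearson_r(changes_a, lead_b)
```

A score near +1 would mean A's F0 rises exactly when B's preceding turn was higher-pitched than A's, the pattern the study's real-time F0 manipulation was designed to probe.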
3.
J Deaf Stud Deaf Educ ; 24(3): 223-233, 2019 Jul 01.
Article in English | MEDLINE | ID: mdl-30809665

ABSTRACT

Speech perception in noise remains challenging for Deaf/Hard of Hearing (D/HH) people, even when fitted with hearing aids or cochlear implants. The perception of sentences in noise by 20 implanted or aided D/HH subjects mastering Cued Speech (CS), a system of hand gestures complementing lip movements, was compared with that of 15 typically hearing (TH) controls in three conditions: audio only, audiovisual, and audiovisual + CS. D/HH participants required signal-to-noise ratios (SNRs) 11 dB higher than TH participants to obtain similar audiovisual scores. Adding CS information enabled D/HH participants to reach a mean score of 83% in the audiovisual + CS condition at a mean SNR of 0 dB, similar to the usual audio score for TH participants at this SNR. This confirms that the combination of lipreading and the Cued Speech system remains extremely important for persons with hearing loss, particularly in adverse hearing conditions.


Subject(s)
Deafness/psychology; Noise; Persons With Hearing Impairments/psychology; Speech Perception/physiology; Speech/physiology; Acoustic Stimulation; Adolescent; Adult; Child; Cues; Female; Humans; Lipreading; Male; Perceptual Masking/physiology; Photic Stimulation; Young Adult
4.
J Acoust Soc Am ; 143(6): EL443, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29960455

ABSTRACT

Cochlea-scaled entropy (CSE) was proposed as a signal-based metric for automatically detecting the speech regions most important for intelligibility, but its proposed superiority over traditional linguistic and psychoacoustical characterisations was not subsequently confirmed. This paper shows that the CSE concept is closely related to intensity and as such captures similar speech regions. However, a slight but significant advantage of the CSE-based over the intensity-based characterisation was observed, associated with a time difference between the two metrics, suggesting that the CSE index may capture dynamic properties of the speech signal crucial for intelligibility.


Subject(s)
Cochlea/physiology; Cues; Speech Intelligibility; Speech Perception; Time Perception; Acoustic Stimulation; Audiometry, Speech; Female; Humans; Male; Recognition, Psychology; Time Factors
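As a rough illustration of the two frame-level characterisations compared here, one can compute a frame's RMS intensity and a CSE-like spectral-change index. This is a simplified sketch under stated assumptions: the real CSE operates on cochlea-scaled (ERB-spaced) spectral slices, which are replaced below by arbitrary per-frame band-energy vectors.

```python
import math

def frame_intensity(frame):
    """RMS energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def cse_like_index(band_energies):
    """Euclidean distance between each pair of successive per-frame
    band-energy vectors (one value per frame transition) -- a stand-in
    for cochlea-scaled entropy, which uses ERB-spaced cochlear bands."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, cur)))
            for prev, cur in zip(band_energies, band_energies[1:])]
```

Correlating the two series over an utterance, and at small relative time lags, is one way to probe the intensity-relatedness and time-offset effects the paper reports.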
5.
J Acoust Soc Am ; 141(6): 4126, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28618803

ABSTRACT

Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identified keywords in sentences that had undergone linear and nonlinear speech rate changes resulting in overall temporal lengthening in the presence of stationary and fluctuating maskers. Relative to unmodified speech, a slower speech rate produced no intelligibility gains for the stationary masker, suggesting that a reduction in information rate does not underlie intelligibility benefits of durationally modified speech. However, both linear and nonlinear modifications led to substantial intelligibility increases in fluctuating noise. One possibility is that overall increases in speech duration provide no new phonetic information in stationary masking conditions, but that temporal fluctuations in the background increase the likelihood of glimpsing additional salient speech cues. Alternatively, listeners may have benefitted from an increase in the difference in speech rates between the target and background.


Subject(s)
Cues; Noise/adverse effects; Perceptual Masking; Speech Acoustics; Speech Intelligibility; Speech Perception; Voice Quality; Acoustic Stimulation/methods; Adult; Audiometry, Speech; Auditory Threshold; Female; Humans; Linear Models; Male; Nonlinear Dynamics; Phonetics; Time Factors; Young Adult
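The linear versus nonlinear durational modifications contrasted in this study can be illustrated schematically. The sketch below is hypothetical and is not the algorithm used in the paper: the linear version scales every segment duration by the same factor, while the nonlinear version redistributes the same overall lengthening unevenly, here onto only the segments longer than the mean.

```python
def linear_stretch(durations, factor):
    """Scale every segment duration (in seconds) by the same factor."""
    return [d * factor for d in durations]

def nonlinear_stretch(durations, factor):
    """Apply the same total lengthening as linear_stretch, but only to
    segments longer than the mean duration. Purely illustrative; assumes
    at least one segment is longer than the mean."""
    mean_d = sum(durations) / len(durations)
    extra = sum(durations) * (factor - 1.0)          # total added duration
    total_long = sum(d for d in durations if d > mean_d)
    return [d + extra * d / total_long if d > mean_d else d
            for d in durations]
```

Both versions produce the same overall temporal lengthening, so comparing them isolates where in the signal the extra duration is placed, which is the nonlinear manipulation's point.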
6.
Front Hum Neurosci ; 10: 430, 2016.
Article in English | MEDLINE | ID: mdl-27630552

ABSTRACT

A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
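The isochronous retiming manipulation can be illustrated with a minimal sketch: given anchor-point times (e.g. syllable onsets or envelope peaks), compute target times that are strictly periodic over the same span. Only the grid computation is shown; the actual time-scale modification of the audio between anchors is not part of this sketch.

```python
def isochronous_targets(anchor_times):
    """Map anchor times (seconds, ascending) onto an evenly spaced grid
    that preserves the first and last anchors, hence the overall duration."""
    n = len(anchor_times)
    step = (anchor_times[-1] - anchor_times[0]) / (n - 1)
    return [anchor_times[0] + i * step for i in range(n)]

# Four anchors spanning 1.2 s give a 0.4 s period, i.e. a 2.5 Hz grid,
# matching the study's slow-timescale condition.
```

A matched anisochronous control, as used in the study, would perturb these target times while keeping the same total amount of temporal distortion.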

7.
Int J Audiol ; 53(9): 633-8, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24863133

ABSTRACT

OBJECTIVE: The current study describes the collection of a new phonemically balanced Spanish sentence resource, known as the Sharvard Corpus. DESIGN: The resource contains 700 sentences inspired by the original English Harvard sentences, along with speech recordings from a male and a female native peninsular Spanish talker. Each sentence contains five keywords for scoring; sentences are grouped into 70 lists of 10 using an automatic phoneme-balancing procedure. STUDY SAMPLE: Twenty-three native Spanish listeners identified keywords in the Sharvard sentences in speech-shaped noise. RESULTS: Psychometric functions for the Sharvard sentences indicate mean speech reception thresholds of -6.07 and -6.24 dB, and slopes of 10.53 and 11.03 percentage points per dB at the 50%-keywords-correct point, for the male and female talkers respectively. CONCLUSIONS: The resulting open-source collection of Spanish sentence material for speech perception testing is available online.


Subject(s)
Phonetics; Speech Acoustics; Speech Perception; Speech Reception Threshold Test/methods; Voice Quality; Acoustic Stimulation; Adult; Female; Humans; Male; Middle Aged; Noise/adverse effects; Perceptual Masking; Psychometrics; Recognition, Psychology; Spain; Speech Intelligibility; Speech Production Measurement
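The reported SRT and slope values fully specify a psychometric function of SNR if one assumes a logistic form (an assumption here; the paper's fitting procedure may differ). For a logistic curve 100/(1 + e^(-k(x - SRT))), the slope at the 50% point is 25k percentage points per dB, which gives k directly from the reported slope:

```python
import math

def percent_correct(snr_db, srt_db, slope_pp_per_db):
    """Predicted % keywords correct at snr_db for a logistic psychometric
    function with its 50% point at srt_db and the given midpoint slope in
    percentage points per dB."""
    k = 4.0 * slope_pp_per_db / 100.0  # midpoint slope of the logistic is 25*k pp/dB
    return 100.0 / (1.0 + math.exp(-k * (snr_db - srt_db)))
```

With the male talker's values (SRT -6.07 dB, slope 10.53 pp/dB), the function returns 50% at -6.07 dB SNR by construction and rises toward 100% at higher SNRs.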
8.
J Acoust Soc Am ; 134(4): 2884-94, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24116425

ABSTRACT

Studying how interlocutors exchange information efficiently during conversations in less-than-ideal acoustic conditions promises to both further the understanding of links between perception and production and inform the design of human-computer dialogue systems. The current study explored how interlocutors' speech changes in the presence of fluctuating noise. Pairs of talkers were recorded while solving puzzles cooperatively in quiet and with modulated-noise or competing speech maskers whose silent intervals were manipulated to produce either temporally sparse or dense maskers. Talkers responded to masked conditions by both increasing the amount of speech produced and locally changing their speech activity patterns, resulting in a net reduction in the proportion of speech in temporal overlap with the maskers, with larger relative reductions for sparse maskers. An analysis of talker activity in the vicinity of masker onset and offset events showed a significant reduction in onsets following masker onsets, and a similar increase in onsets following masker offsets. These findings demonstrate that talkers are sensitive to masking noise and respond to its fluctuations by adopting a "wait-and-talk" strategy.


Subject(s)
Noise/adverse effects; Perceptual Masking; Speech Acoustics; Speech Perception; Voice; Adaptation, Psychological; Adolescent; Cooperative Behavior; Cues; Female; Humans; Problem Solving; Sound Spectrography; Speech Production Measurement; Time Factors; Young Adult
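The "proportion of speech in temporal overlap with the maskers" used in this study can be computed directly from interval annotations. A minimal hypothetical sketch, assuming speech and masker-on intervals are given as (start, end) pairs in seconds and that masker intervals do not overlap one another:

```python
def interval_overlap(a, b):
    """Length of temporal overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def speech_masker_overlap(speech, masker):
    """Proportion of total speech time spent overlapping the masker."""
    total_speech = sum(end - start for start, end in speech)
    overlapped = sum(interval_overlap(s, m) for s in speech for m in masker)
    return overlapped / total_speech
```

A "wait-and-talk" strategy would show up as this proportion falling below what chance alignment of the same speech and masker intervals would predict.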