1.
Front Hum Neurosci ; 18: 1399316, 2024.
Article in English | MEDLINE | ID: mdl-38903407

ABSTRACT

A quick corrective mechanism of the tongue has previously been observed experimentally in speech posture stabilization in response to a sudden tongue-stretch perturbation. Given its relatively short latency (< 150 ms), the response could be driven by somatosensory feedback alone. The current study assessed this hypothesis by examining whether the response is induced in the absence of auditory feedback. We compared the response under two auditory conditions: normal versus masked auditory feedback. Eleven participants were tested. They were asked to whisper the vowel /e/ for a few seconds. The tongue was stretched horizontally with step patterns of force (1 N for 1 s) using a robotic device. Articulatory positions were recorded using electromagnetic articulography simultaneously with the produced sound. The tongue perturbation was applied randomly and unpredictably in one-fifth of the trials, and the two auditory conditions were tested in random order. A quick compensatory response was induced, similar to that reported in the previous study. The amplitudes of the compensatory responses did not differ significantly between the two auditory conditions, either for the tongue displacement or for the produced sounds. These results suggest that the observed quick correction mechanism is primarily based on somatosensory feedback. This correction mechanism could be learned in such a way as to maintain the auditory goal on the basis of somatosensory feedback alone.
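The key comparison in this design, response amplitude under normal versus masked auditory feedback within the same participants, reduces to a paired test. A minimal sketch in Python with purely illustrative amplitude values (not the study's data):

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant compensatory-response amplitudes (cm),
# one value per auditory condition, paired within the 11 participants.
amp_normal = np.array([0.42, 0.35, 0.51, 0.29, 0.44, 0.38, 0.47, 0.33, 0.40, 0.36, 0.49])
amp_masked = np.array([0.40, 0.37, 0.48, 0.31, 0.45, 0.36, 0.50, 0.32, 0.41, 0.35, 0.47])

# Paired comparison: does response amplitude differ with masked auditory feedback?
t, p = stats.ttest_rel(amp_normal, amp_masked)
print(f"t({len(amp_normal) - 1}) = {t:.2f}, p = {p:.3f}")
```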

2.
PLoS One ; 18(2): e0276691, 2023.
Article in English | MEDLINE | ID: mdl-36735662

ABSTRACT

OBJECTIVES: The study aims to better understand the rhythmic abilities of people who stutter and to identify which processes are potentially impaired in this population: (1) beat perception and reproduction; (2) the execution of movements, in particular their initiation; (3) sensorimotor integration. MATERIAL AND METHOD: The finger-tapping behavior of 16 adults who stutter (PWS) was compared with that of 16 matched controls (PNS) in five rhythmic tasks of varying complexity: three synchronization tasks (a simple 1:1 isochronous pattern, a complex non-isochronous pattern, and a 4 tap:1 beat isochronous pattern), a reaction task to an aperiodic and unpredictable pattern, and a reproduction task of an isochronous pattern after passive listening. RESULTS: PWS were able to reproduce an isochronous pattern on their own, without external auditory stimuli, with accuracy similar to that of PNS but with increased variability. This group difference in variability was observed immediately after passive listening, without prior motor engagement, and was neither enhanced nor reduced after several seconds of tapping. Although PWS showed increased tapping variability in the reproduction task as well as in the synchronization tasks, this timing variability did not correlate significantly with the variability in reaction times or tapping force. Compared to PNS, PWS exhibited larger negative mean asynchronies and increased synchronization variability in the synchronization tasks. These group differences were not affected by beat hierarchy (i.e., "strong" vs. "weak" beats), pattern complexity (non-isochronous vs. isochronous), or the presence versus absence of an external auditory stimulus (1:1 vs. 1:4 isochronous pattern). Differences between PWS and PNS were neither enhanced nor reduced with sensorimotor learning over the first taps of a synchronization task. CONCLUSION: Our observations support the hypothesis of a deficit in the coupling of neuronal oscillators in the production, but not the perception, of rhythmic patterns, and of a longer delay in multimodal feedback processing for PWS.
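The two synchronization measures reported above, negative mean asynchrony and synchronization variability, can be computed directly from tap and beat onset times. A minimal sketch with hypothetical onsets (variable names are illustrative):

```python
import numpy as np

def asynchrony_stats(tap_times, beat_times):
    """Mean asynchrony (tap minus beat; negative = taps lead the beat)
    and synchronization variability (SD of asynchronies), in seconds."""
    asynchronies = np.asarray(tap_times) - np.asarray(beat_times)
    return asynchronies.mean(), asynchronies.std(ddof=1)

# Hypothetical trial: beats every 600 ms, taps arriving slightly early.
beats = np.arange(0.0, 12.0, 0.6)
taps = beats - 0.03 + np.random.normal(0.0, 0.02, size=beats.size)
mean_async, sync_sd = asynchrony_stats(taps, beats)
print(f"mean asynchrony = {mean_async * 1000:.1f} ms, variability = {sync_sd * 1000:.1f} ms")
```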


Subject(s)
Auditory Perception , Stuttering , Humans , Adult , Auditory Perception/physiology , Movement/physiology , Reaction Time , Auscultation , Cognition
3.
J Acoust Soc Am ; 149(1): 191, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33514144

ABSTRACT

Acoustic characteristics, lingual and labial articulatory dynamics, and ventilatory behaviors were studied in a beatboxer producing twelve drum sounds belonging to five main categories of his repertoire (kick, snare, hi-hat, rimshot, cymbal). Several types of experimental data were collected synchronously (respiratory inductance plethysmography, electroglottography, electromagnetic articulography, and acoustic recording). Automatic unsupervised classification was successfully applied to the acoustic data with a t-SNE spectral clustering technique. A cluster purity of 94% was achieved, showing that each sound has a specific acoustic signature. The acoustic intensity of sounds produced with the humming technique was significantly lower than that of their non-humming counterparts. For these sounds, a dissociation between articulation and breathing was observed. Overall, a wide range of articulatory gestures was observed, some of which were non-linguistic. The tongue was systematically involved in the articulation of the explored beatboxing sounds, either as the main articulator or as an accompaniment to the lip dynamics. Two pulmonic and three non-pulmonic airstream mechanisms were identified. Ejectives were found in the production of all the sounds with bilabial or alveolar occlusion and an egressive airstream. A phonetic annotation using the IPA alphabet was performed, highlighting the complexity of such sound production and the limits of speech-based annotation.
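The unsupervised classification step, a t-SNE embedding followed by spectral clustering scored by cluster purity against the known sound labels, can be sketched with scikit-learn. The feature matrix, labels, and parameter settings below are placeholders, not the study's actual configuration:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import SpectralClustering

def cluster_purity(labels_true, labels_pred):
    """Fraction of tokens assigned to the majority true class of their cluster."""
    correct = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        correct += np.bincount(members).max()
    return correct / labels_true.size

# X: (n_tokens, n_acoustic_features), y: true sound identity (0..11) -- placeholders here.
X = np.random.rand(240, 13)           # stand-in for per-token acoustic descriptors
y = np.repeat(np.arange(12), 20)      # stand-in labels, 20 tokens per sound

X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
pred = SpectralClustering(n_clusters=12, random_state=0).fit_predict(X_2d)
print(f"cluster purity = {cluster_purity(y, pred):.2%}")
```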


Subject(s)
Phonetics , Speech , Acoustics , Electromagnetic Phenomena , Humans , Music , Tongue/diagnostic imaging
4.
Proc Natl Acad Sci U S A ; 117(11): 6255-6263, 2020 Mar 17.
Article in English | MEDLINE | ID: mdl-32123070

ABSTRACT

Auditory speech perception enables listeners to access phonological categories from speech sounds. During speech production and speech motor learning, speakers experience matched auditory and somatosensory inputs. Accordingly, access to phonetic units might also be provided by somatosensory information. The present study assessed whether humans can identify vowels using somatosensory feedback, without auditory feedback. A tongue-positioning task was used in which participants were required to achieve different tongue postures within the /e, ε, a/ articulatory range, in a procedure that was entirely nonspeech-like, involving distorted visual feedback of tongue shape. Tongue postures were measured using electromagnetic articulography. At the end of each tongue-positioning trial, subjects were required to whisper with the corresponding vocal tract configuration under masked auditory feedback and to identify the vowel associated with the reached tongue posture. Masked auditory feedback ensured that vowel categorization was based on somatosensory rather than auditory feedback. A separate group of subjects was required to classify the whispered sounds auditorily. In addition, we modeled the link between vowel categories and tongue postures in normal speech production with a Bayesian classifier based on the tongue postures recorded from the same speakers for several repetitions of the /e, ε, a/ vowels during a separate speech production task. Overall, our results indicate that vowel categorization is possible with somatosensory feedback alone, with an accuracy similar to that of the auditory perception of whispered sounds, and congruent with normal speech articulation, as accounted for by the Bayesian classifier.
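The Bayesian classifier linking tongue postures to vowel categories can be approximated by a Gaussian class-conditional model fitted to each speaker's own productions. A minimal sketch, with placeholder features and labels standing in for the EMA posture data:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# postures: (n_repetitions, n_features) tongue-sensor coordinates from the production
# task; vowels: corresponding labels "e", "E", "a" -- placeholders below.
postures = np.random.rand(90, 6)
vowels = np.repeat(np.array(["e", "E", "a"]), 30)

# Gaussian class-conditional (Bayes) classifier: p(vowel | posture).
clf = QuadraticDiscriminantAnalysis(store_covariance=True).fit(postures, vowels)

# Classify the posture reached at the end of a tongue-positioning trial.
reached_posture = np.random.rand(1, 6)    # placeholder posture vector
print(clf.predict(reached_posture), clf.predict_proba(reached_posture))
```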


Subject(s)
Feedback, Physiological , Phonetics , Sensation/physiology , Speech Perception/physiology , Tongue/physiology , Adult , Female , Humans , Male , Palate/physiology , Speech Production Measurement , Young Adult
5.
Multisens Res ; 31(1-2): 57-78, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-31264596

ABSTRACT

Speech unfolds in time and, as a consequence, its perception requires temporal integration. Yet, studies addressing audio-visual speech processing have often overlooked this temporal aspect. Here, we address the temporal course of audio-visual speech processing in a phoneme identification task using a Gating paradigm. We created disyllabic Spanish word-like utterances (e.g., /pafa/, /paθa/, …) from high-speed camera recordings. The stimuli differed only in the middle consonant (/f/, /θ/, /s/, /r/, /g/), which varied in visual and auditory saliency. As in classical Gating tasks, the utterances were presented in fragments of increasing length (gates), here in 10 ms steps, for identification and confidence ratings. We measured correct identification as a function of time (at each gate) for each critical consonant in audio, visual and audio-visual conditions, and computed the Identification Point and Recognition Point scores. The results revealed that audio-visual identification is a time-varying process that depends on the relative strength of each modality (i.e., saliency). In some cases, audio-visual identification followed the pattern of one dominant modality (either A or V), when that modality was very salient. In other cases, both modalities contributed to identification, hence resulting in audio-visual advantage or interference with respect to unimodal conditions. Both unimodal dominance and audio-visual interaction patterns may arise within the course of identification of the same utterance, at different times. The outcome of this study suggests that audio-visual speech integration models should take into account the time-varying nature of visual and auditory saliency.
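One common way to operationalize the Identification Point is as the first gate from which responses remain correct through the end of the utterance; a sketch of that definition follows (10 ms gates, hypothetical response vector):

```python
import numpy as np

def identification_point(correct_by_gate, gate_ms=10):
    """First gate after which all remaining responses are correct, in ms.
    Returns None if the item is never stably identified."""
    correct = np.asarray(correct_by_gate, dtype=bool)
    for g in range(correct.size):
        if correct[g:].all():
            return (g + 1) * gate_ms
    return None

# Hypothetical gate-by-gate correctness for one consonant in one modality.
responses = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1]
print(identification_point(responses))  # -> 60 ms (stable from the 6th gate on)
```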

6.
J Speech Lang Hear Res ; 60(2): 322-340, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28152131

ABSTRACT

Purpose: This study compares the precision of the electromagnetic articulographs used in speech research: Northern Digital Instruments' Wave and Carstens' AG200, AG500, and AG501 systems. Method: We determined the fluctuation of the distances between 3 pairs of sensors attached to a manually rotated device that positions them inside each system's measurement volume. For each device, 2 precision estimates based on the 95% quantile range of these distances (QR95) were defined: the local QR95, computed for bins around specific rotation angles, and the global QR95, computed for all angles pooled. Results: For all devices, although the local precision lies around 0.1 cm, the global precision is much more worrisome, ranging from 0.03 cm to 2.18 cm, and displays large variations as a function of the position of the sensors in the measurement volume. No influence of rotational speed was found. The AG501 produced by far the lowest errors, in particular for the global precision. Conclusions: The local precision can be considered suitable for speech articulatory measurements, but the variations of the global precision must be taken into account through knowledge of the spatial distribution of errors. A guideline for good practice in EMA recording is proposed for each system.
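Both precision estimates can be reproduced from the inter-sensor distance signal: the global QR95 pools all samples, while the local QR95 is computed within bins of rotation angle. A sketch under assumed array names and a simulated distance trace:

```python
import numpy as np

def qr95(x):
    """95% quantile range: distance between the 2.5th and 97.5th percentiles."""
    lo, hi = np.percentile(x, [2.5, 97.5])
    return hi - lo

def global_and_local_qr95(distances, angles_deg, bin_width=10):
    """Global QR95 over all samples, and local QR95 per rotation-angle bin."""
    distances = np.asarray(distances)
    bins = ((np.asarray(angles_deg) % 360) // bin_width).astype(int)
    local = {b: qr95(distances[bins == b]) for b in np.unique(bins)
             if np.sum(bins == b) > 1}
    return qr95(distances), local

# Simulated recording: a sensor pair whose measured distance drifts with rotation angle.
angles = np.linspace(0, 360, 5000)
dist = 3.0 + 0.05 * np.sin(np.radians(angles)) + np.random.normal(0, 0.005, angles.size)
g, loc = global_and_local_qr95(dist, angles)
print(f"global QR95 = {g:.3f} cm, median local QR95 = {np.median(list(loc.values())):.3f} cm")
```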


Subject(s)
Electrodiagnosis/instrumentation , Speech Production Measurement/instrumentation , Analysis of Variance , Electrical Equipment and Supplies , Humans , Linear Models
7.
PLoS Comput Biol ; 12(11): e1005119, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27880768

ABSTRACT

Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real time. To reach this goal, a prerequisite is to develop a speech synthesizer that produces intelligible speech in real time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued on different speech articulators into acoustic parameters, which are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between the new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
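The articulatory-to-acoustic mapping amounts to a frame-wise regression from EMA sensor coordinates to vocoder parameters. A minimal PyTorch sketch; layer sizes and feature dimensions are illustrative, not those used in the paper:

```python
import torch
import torch.nn as nn

# Illustrative dimensions: positions of a handful of EMA sensors per frame,
# mapped to vocoder parameters per frame.
N_EMA_FEATURES = 18     # e.g., 6 sensors x (x, y, z)
N_ACOUSTIC_PARAMS = 25  # e.g., spectral envelope + excitation parameters

class ArticulatoryToAcoustic(nn.Module):
    """Frame-wise DNN mapping EMA features to acoustic (vocoder) parameters."""
    def __init__(self, n_in=N_EMA_FEATURES, n_out=N_ACOUSTIC_PARAMS, n_hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, ema_frames):          # (batch, n_in) -> (batch, n_out)
        return self.net(ema_frames)

model = ArticulatoryToAcoustic()
dummy_batch = torch.randn(32, N_EMA_FEATURES)
acoustic_params = model(dummy_batch)        # feed these frames to a vocoder for synthesis
print(acoustic_params.shape)                # torch.Size([32, 25])
```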


Subject(s)
Biofeedback, Psychology/methods , Brain-Computer Interfaces , Communication Aids for Disabled , Neural Networks, Computer , Sound Spectrography/methods , Speech Production Measurement/methods , Biofeedback, Psychology/instrumentation , Computer Systems , Humans , Phonetics , Sound Spectrography/instrumentation , Speech Acoustics , Speech Intelligibility , Speech Production Measurement/instrumentation
8.
Front Psychol ; 5: 1179, 2014.
Article in English | MEDLINE | ID: mdl-25374551

ABSTRACT

We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment, we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience (i.e., exposure to two phonological codes during childhood) affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identification experiment with bilingual and monolingual adult participants. It was an ABX task involving a Bengali dental-retroflex contrast that does not exist in any of the participants' languages. The phonemes were presented in audiovisual (AV) and audio-only (A) conditions. The results revealed that in the audio-only condition, monolinguals and bilinguals had difficulty discriminating the retroflex non-native phoneme. They were phonologically "deaf" to it and assimilated it to the dental phoneme that exists in their native languages. In the audiovisual presentation, by contrast, both groups could overcome this phonological deafness and identify both Bengali phonemes. However, monolinguals were more accurate and responded more quickly than bilinguals. This suggests that bilinguals do not use the same processes as monolinguals to decode visual speech.

9.
J Acoust Soc Am ; 136(4): 1869-79, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25324087

ABSTRACT

The interaction between covert and overt orofacial gestures has been little studied apart from old and rather qualitative experiments. The question deserves special interest in the context of the debate between auditory and motor theories of speech perception, where dual tasks may be of great interest. It is shown here that dynamic mandible and lip movements produced by a participant result in strong and stable perturbation of a concurrent inner-speech counting task, whereas static orofacial configurations and static or dynamic manual actions produce no perturbation. This enables the authors to discuss how such orofacial perturbations could be introduced into dual-task paradigms to assess the role of motor processes in speech perception.


Subject(s)
Facial Expression , Gestures , Lip/physiology , Mandible/physiology , Mathematical Concepts , Speech , Female , Humans , Male , Movement , Psychophysics , Task Performance and Analysis , Thinking , Time Factors
10.
PLoS Comput Biol ; 10(7): e1003743, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25079216

ABSTRACT

An increasing number of neuroscience papers capitalize on the assumption, published in this journal, that visual speech is typically 150 ms ahead of auditory speech. However, the estimate of audiovisual asynchrony in the reference paper is valid only in very specific cases: for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures." When syllables are chained in sequences, as they typically are in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures," which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally, we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
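A generic way to quantify such audio-visual lags is to cross-correlate an acoustic envelope with a lip-aperture trace and read the lag of the correlation peak; the sketch below is a simplified stand-in for the event-based measurements used in the paper:

```python
import numpy as np

def visual_lead_ms(audio_envelope, lip_aperture, fps):
    """Estimated lead (ms) of the lip-aperture signal over the audio envelope,
    from the peak of the full cross-correlation. Positive = gesture precedes audio."""
    a = (audio_envelope - audio_envelope.mean()) / audio_envelope.std()
    v = (lip_aperture - lip_aperture.mean()) / lip_aperture.std()
    lags = np.arange(-len(v) + 1, len(a))
    return 1000.0 * lags[np.argmax(np.correlate(a, v, mode="full"))] / fps

# Hypothetical 100 Hz signals in which the gesture leads the audio by 5 frames (50 ms).
t = np.arange(0.0, 2.0, 0.01)
lips = np.sin(2 * np.pi * 2 * t)
audio = np.roll(lips, 5)                       # audio events arrive 5 frames later
print(visual_lead_ms(audio, lips, fps=100))    # ~50.0
```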


Subject(s)
Auditory Perception/physiology , Speech/physiology , Visual Perception/physiology , Computational Biology , Humans , Male , Models, Biological
11.
Clin Linguist Phon ; 28(4): 241-56, 2014 Apr.
Article in English | MEDLINE | ID: mdl-23837408

ABSTRACT

This article focuses on methodological issues related to the quantitative assessment of speech quality after glossectomy. Acoustic and articulatory data were collected for 8 consonants from two patients. The acoustic analysis is based on spectral moments and the Klatt VOT. Lingual movements were recorded with ultrasound, without calibration. The variations of acoustic and articulatory parameters across pre- and post-surgery conditions are analyzed in the light of perceptual evaluations of the stimuli; a parameter is considered relevant if its variation is congruent with the perceptual ratings. The most relevant acoustic parameters are the skewness and the center of gravity. The Klatt VOT explains differences that could not be explained by the spectral parameters. The SNTS ultrasound parameter provides information for describing impairments not accounted for by the acoustic parameters. These results suggest that the combination of articulatory, perceptual, and acoustic data provides comprehensive, complementary information for a quantitative assessment of speech after glossectomy.
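The spectral moments used here (center of gravity and skewness) treat the magnitude spectrum of a consonant noise segment as a distribution over frequency. A minimal sketch with assumed framing and windowing choices:

```python
import numpy as np

def spectral_moments(frame, sr):
    """Center of gravity (Hz) and skewness of a short frame's magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = spectrum / spectrum.sum()                  # normalize to a distribution
    cog = np.sum(freqs * p)                        # 1st moment: center of gravity
    var = np.sum(((freqs - cog) ** 2) * p)         # 2nd central moment
    skew = np.sum(((freqs - cog) ** 3) * p) / var ** 1.5
    return cog, skew

# Hypothetical 25 ms frame: white noise crudely high-pass filtered by differencing.
sr = 44100
noise = np.diff(np.random.randn(int(0.025 * sr) + 1))
print(spectral_moments(noise, sr))
```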


Subject(s)
Articulation Disorders/rehabilitation , Glossectomy/rehabilitation , Speech Articulation Tests , Speech Intelligibility , Speech Production Measurement , Tongue Neoplasms/surgery , Adult , Articulation Disorders/diagnosis , Female , Glossectomy/methods , Humans , Male , Middle Aged , Neck Dissection/rehabilitation , Phonetics , Postoperative Complications/diagnosis , Postoperative Complications/rehabilitation , Sound Spectrography , Speech Acoustics , Ultrasonography
12.
J Acoust Soc Am ; 125(2): 1184-96, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19206891

ABSTRACT

This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this end, an original audiovisual corpus was recorded with two speakers engaged in a face-to-face spontaneous dialog while located in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This setup was used to capture separate audio stimuli for each speaker and to synchronously monitor the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech plus nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silent sections. This results in a visual VAD that can be used in any kind of environmental noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.
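Such a detector essentially thresholds a single smoothed lip-movement parameter to separate silent from audible sections, optionally bridging very short silent gaps. A generic sketch, not the paper's exact parameter or decision rule:

```python
import numpy as np

def visual_vad(lip_param, frame_rate, threshold, min_silence_s=0.25):
    """Label each video frame as active (True) or silent (False) by thresholding
    a smoothed lip-movement parameter; very short silent stretches are bridged."""
    smoothed = np.convolve(np.abs(lip_param), np.ones(5) / 5, mode="same")
    active = smoothed > threshold
    min_gap = int(min_silence_s * frame_rate)
    # Bridge silent gaps shorter than min_silence_s (likely within-utterance pauses).
    silent_starts = np.flatnonzero(np.diff(active.astype(int)) == -1) + 1
    for start in silent_starts:
        end = start
        while end < active.size and not active[end]:
            end += 1
        if end - start < min_gap:
            active[start:end] = True
    return active

# Hypothetical lip-velocity trace at 50 fps: movement, a short pause, then silence.
lip_velocity = np.concatenate([np.random.rand(100), np.zeros(5),
                               np.random.rand(50), np.zeros(100)])
print(visual_vad(lip_velocity, frame_rate=50, threshold=0.1).sum(), "active frames")
```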


Subject(s)
Lip/physiology , Lipreading , Movement , Speech Perception , Visual Perception , Voice , Algorithms , Cues , Humans , Male , Pattern Recognition, Automated , Pattern Recognition, Physiological , Signal Detection, Psychological , Sound Spectrography , Video Recording
13.
J Acoust Soc Am ; 124(2): 1192-206, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18681607

ABSTRACT

The relations between production and perception in 4-year-old children were examined in a study of compensation strategies for a lip-tube perturbation. Acoustic and perceptual analyses of the rounded vowel [u] produced by twelve 4-year-old French speakers were conducted under two conditions: normal and with a 15-mm-diam tube inserted between the lips. Recordings of isolated vowels were made in the normal condition before any perturbation (N1), immediately upon insertion of the tube and for the next 19 trials in this perturbed condition, with (P2) or without articulatory instructions (P1), and in the normal condition after the perturbed trials (N2). The results of the acoustic analyses reveal speaker-dependent alterations of F1, F2, and/or F0 in the perturbed conditions and after the removal of the tube. For some subjects, the presence of the tube resulted in very little change; for others, an increase in F2 was observed in P1, which was generally reduced in some of the 20 repetitions, but not systematically and not continuously. The use of articulatory instructions provided in the P2 condition was detrimental to the achievement of a good acoustic target. Perceptual data are used to determine optimal combinations of F0, F1, and F2 (in bark) related to these patterns. The data are compared to a previous study conducted with adults [Savariaux et al., J. Acoust. Soc. Am. 106, 381-393 (1999)].
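Expressing F0, F1, and F2 in bark requires a hertz-to-bark conversion; the abstract does not specify which formula was used, so the sketch below assumes Traunmueller's (1990) conversion:

```python
def hz_to_bark(f_hz):
    """Hertz-to-bark conversion using Traunmueller's (1990) formula
    (assumed here; the study's exact conversion is not stated)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    if z < 2.0:               # low-frequency correction
        z += 0.15 * (2.0 - z)
    elif z > 20.1:            # high-frequency correction
        z += 0.22 * (z - 20.1)
    return z

# Illustrative child-like [u] values: F0 = 280 Hz, F1 = 450 Hz, F2 = 1100 Hz.
for label, f in [("F0", 280.0), ("F1", 450.0), ("F2", 1100.0)]:
    print(label, round(hz_to_bark(f), 2), "bark")
```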


Subject(s)
Language , Lip/physiology , Phonetics , Speech Acoustics , Speech Perception , Adaptation, Physiological , Adult , Child, Preschool , France , Humans , Learning , Sound Spectrography , Speech Production Measurement
14.
Cognition ; 93(2): B69-78, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15147940

ABSTRACT

Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition, owing to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture with a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.


Subject(s)
Lipreading , Speech Perception , Visual Perception , Cues , Humans , Time Factors