ABSTRACT
Purpose: Models of speech production often abstract away from shared physiology in pitch control and lingual articulation, positing independent control of tone and vowel units. We assess the validity of this assumption in Mandarin Chinese by evaluating the stability of lingual articulation for vowels across variation in tone. Method: Electromagnetic articulography was used to track flesh points on the tongue (tip, body, dorsum), lips, and jaw while native Mandarin speakers (n = 6) produced 3 vowels in the syllables /pa/, /pi/, /pu/, combined with 4 Mandarin tones: T1 "high," T2 "rising," T3 "low," and T4 "falling." Results: Consistent with physiological expectations, tones that begin low, T2 and T3, conditioned a lower position of the tongue body for the vowel /a/. For the vowel /i/, we found the opposite effect, whereby tones that begin low, T2 and T3, conditioned a higher tongue body position. Conclusions: The physiology of pitch control exerts systematic variation on the lingual articulation of /a/ across tones. The effects of tone on /i/ articulation are in the direction opposite to that predicted by physiological considerations. Physiologically arbitrary variation of the type observed for /i/ challenges the assumption that phonetic patterns can be determined by independent control of tone (source) and vowel (filter) production units.
Subject(s)
Jaw , Motor Activity , Phonetics , Speech Acoustics , Tongue , Adult , Biomechanical Phenomena , Female , Humans , Jaw/physiology , Male , Motor Activity/physiology , Motor Skills/physiology , Tongue/physiology , Young Adult
ABSTRACT
Noninvasive imaging is widely used in speech research as a means to investigate the shaping and dynamics of the vocal tract during speech production. 3-D dynamic MRI would be a major advance, as it would provide 3-D dynamic visualization of the entire vocal tract. We present a novel method for the creation of 3-D dynamic movies of vocal tract shaping based on the acquisition of 2-D dynamic data from parallel slices and temporal alignment of the image sequences using audio information. Multiple sagittal 2-D real-time movies with synchronized audio recordings are acquired for English vowel-consonant-vowel stimuli /ala/, /aɹa/, /asa/, and /aʃa/. Audio data are aligned using mel-frequency cepstral coefficients (MFCC) extracted from windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using dynamic time warping (DTW). The aligned image sequences enable dynamic 3-D visualization by creating synthesized movies of the moving airway in the coronal planes, visualizing desired tissue surfaces and the tube-shaped vocal tract airway after manual segmentation of targeted articulators and smoothing. The resulting volumes allow for dynamic 3-D visualization of salient aspects of lingual articulation, including the formation of tongue grooves and sublingual cavities, with a temporal resolution of 78 ms.
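The alignment step described above can be illustrated with a minimal sketch of classic dynamic time warping. This is not the authors' implementation: the function name `dtw_path`, the Euclidean frame distance, and the unconstrained step pattern are assumptions, and the toy one-dimensional frames stand in for real MFCC vectors extracted from the audio of each slice.

```python
import math


def dtw_path(ref, other):
    """Align two sequences of feature vectors with classic DTW.

    Hypothetical helper: each element of `ref` and `other` is a tuple of
    floats (for example, one MFCC vector per analysis window).
    Returns (total_cost, path), where path is a list of (i, j) index pairs
    mapping frames of `ref` onto frames of `other`.
    """
    n, m = len(ref), len(other)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning ref[:i] with other[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], other[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame in ref
                                 cost[i][j - 1],      # skip a frame in other
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy stand-ins for per-window audio features from two slice acquisitions.
slice_a = [(0.0,), (1.0,), (2.0,), (3.0,)]
slice_b = [(0.0,), (0.0,), (1.0,), (2.0,), (3.0,)]
total_cost, path = dtw_path(slice_a, slice_b)
```

In a pipeline of the kind the abstract describes, the recovered path would be used to pick, for each audio window of a reference slice, the matching window (and hence image frame) of every other slice.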
Subject(s)
Imaging, Three-Dimensional/methods , Magnetic Resonance Imaging/methods , Speech Production Measurement/methods , Vocal Cords/anatomy & histology , Vocal Cords/physiology , Adult , Humans , Male , Signal Processing, Computer-Assisted
ABSTRACT
PURPOSE: To develop a real-time imaging technique that allows for simultaneous visualization of vocal tract shaping in multiple scan planes, and provides dynamic visualization of complex articulatory features. MATERIALS AND METHODS: Simultaneous imaging of multiple slices was implemented using a custom real-time imaging platform. Midsagittal, coronal, and axial scan planes of the human upper airway were prescribed and imaged in real-time using a fast spiral gradient-echo pulse sequence. Two native speakers of English produced voiceless and voiced fricatives /f/-/v/, /θ/-/ð/, /s/-/z/, /ʃ/-/ʒ/ in symmetrical maximally contrastive vocalic contexts /a_a/, /i_i/, and /u_u/. Vocal tract videos were synchronized with noise-cancelled audio recordings, facilitating the selection of frames associated with production of English fricatives. RESULTS: Coronal slices intersecting the postalveolar region of the vocal tract revealed tongue grooving to be most pronounced during fricative production in back vowel contexts, and more pronounced for sibilants /s/-/z/ than for /ʃ/-/ʒ/. The axial slice best revealed differences in dorsal and pharyngeal articulation; voiced fricatives were observed to be produced with a larger cross-sectional area in the pharyngeal airway. Partial saturation of spins provided accurate location of imaging planes with respect to each other. CONCLUSION: Real-time MRI of multiple intersecting slices can provide valuable spatial and temporal information about vocal tract shaping, including details not observable from a single slice.
Subject(s)
Magnetic Resonance Imaging, Cine/methods , Magnetic Resonance Imaging/methods , Speech Production Measurement/methods , Tongue/physiology , Vocal Cords/physiology , Voice/physiology , Adult , Computer Systems , Female , Humans , Male , Tongue/anatomy & histology , Vocal Cords/anatomy & histology , Young Adult
ABSTRACT
Due to its aerodynamic, articulatory, and acoustic complexities, the fricative /s/ is known to require high precision in its control, and to be highly resistant to coarticulation. This study documents in detail how jaw, tongue front, tongue back, lips, and the first spectral moment covary during the production of /s/, to establish how coarticulation affects this segment. Data were obtained from 24 speakers in the Wisconsin x-ray microbeam database producing /s/ in prevocalic and pre-obstruent sequences. Analysis of the data showed that certain aspects of jaw and tongue motion had specific kinematic trajectories, regardless of context, and the first spectral moment trajectory corresponded to these in some aspects. In particular contexts, variability due to jaw motion is compensated for by tongue-tip motion and bracing against the palate, to maintain an invariant articulatory-aerodynamic goal, constriction degree. The change in the first spectral moment, which rises to a peak at the midpoint of the fricative, primarily reflects the motion of the jaw. Implications of the results for theories of speech motor control and acoustic-articulatory relations are discussed.
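For readers unfamiliar with the acoustic measure named above, the first spectral moment is conventionally the power-weighted mean frequency (spectral centroid) of a short frame of the signal. The sketch below shows that standard definition only. The abstract does not state the windowing, frequency range, or spectral estimator used in the study, so the function name `first_spectral_moment`, the unwindowed frame, and the naive DFT are all assumptions made for illustration.

```python
import cmath
import math


def first_spectral_moment(samples, sample_rate):
    """Return the power-weighted mean frequency (Hz) of one real-valued frame.

    Hypothetical helper: uses a naive DFT over bins 0..N/2 with no window,
    which is adequate for a short illustrative frame but slow for long ones.
    """
    n = len(samples)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        # DFT coefficient for bin k
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        power = abs(coeff) ** 2
        freq = k * sample_rate / n  # centre frequency of bin k in Hz
        weighted += freq * power
        total += power
    if total == 0.0:
        raise ValueError("frame has no spectral energy")
    return weighted / total


# Sanity check on a synthetic pure tone: the centroid of a 1000 Hz sinusoid
# that fits the frame exactly should sit at the tone frequency.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
m1 = first_spectral_moment(tone, 8000)
```

Computing this value frame by frame across a fricative yields a trajectory of the kind the abstract relates to jaw motion.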
Subject(s)
Jaw/physiology , Language , Mouth/physiology , Phonetics , Speech Acoustics , Biomechanical Phenomena , Databases as Topic , Female , Friction , Humans , Jaw/diagnostic imaging , Lip/physiology , Male , Mouth/diagnostic imaging , Radiography , Sound Spectrography , Speech Production Measurement , Tongue/physiology , Young Adult
ABSTRACT
A structural magnetic resonance imaging study has revealed that pharyngeal articulation varies considerably with voicing during the production of English fricatives. In a study of four speakers of American English, pharyngeal volume was generally found to be greater during the production of sustained voiced fricatives, compared to voiceless equivalents. Though pharyngeal expansion is expected for voiced stops, it is more surprising for voiced fricatives. For three speakers, all four voiced oral fricatives were produced with a larger pharynx than that used during the production of the voiceless fricative at the same place of articulation. For one speaker, pharyngeal volume during the production of voiceless labial fricatives was found to be greater, and sibilant pharyngeal volume varied with vocalic context as well as voicing. Pharyngeal expansion was primarily achieved through forward displacement of the anterior and lateral walls of the upper pharynx, but some displacement of the rear pharyngeal wall was also observed. These results suggest that the production of voiced fricatives involves the complex interaction of articulatory constraints from three separate goals: the formation of the appropriate oral constriction, the control of airflow through the constriction so as to achieve frication, and the maintenance of glottal oscillation by attending to transglottal pressure.