Results 1 - 20 of 67
1.
NMR Biomed ; : e5135, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38440911

ABSTRACT

This work develops and evaluates a self-navigated variable density spiral (VDS)-based manifold regularization scheme to prospectively improve dynamic speech magnetic resonance imaging (MRI) at 3 T. Short readout duration spirals (1.3-ms long) were used to minimize sensitivity to off-resonance. A custom 16-channel speech coil was used for improved parallel imaging of vocal tract structures. The manifold model leveraged similarities between frames sharing similar vocal tract postures without explicit motion binning. The self-navigating capability of VDS was leveraged to learn the Laplacian structure of the manifold. Reconstruction was posed as a sensitivity-encoding-based nonlocal soft-weighted temporal regularization scheme. Our approach was compared with view-sharing, low-rank, temporal finite difference, and extra-dimension-based sparsity reconstruction constraints. Undersampling experiments were conducted on five volunteers performing repetitive and arbitrary speaking tasks at different speaking rates. Quantitative evaluation in terms of mean square error over moving edges was performed in a retrospective undersampling experiment on one volunteer. For prospective undersampling, blinded image quality evaluation in the categories of alias artifacts, spatial blurring, and temporal blurring was performed by three experts in voice research. Region of interest analysis at articulator boundaries was performed in both experiments to assess articulatory motion. Improved performance with manifold reconstruction constraints was observed over existing constraints. With prospective undersampling, a spatial resolution of 2.4 × 2.4 mm²/pixel and a temporal resolution of 17.4 ms/frame for single-slice imaging, and 52.2 ms/frame for concurrent three-slice imaging, were achieved. We demonstrated implicit motion binning by analyzing the mechanics of the Laplacian matrix. Manifold regularization demonstrated superior image quality scores in reducing spatial and temporal blurring compared with all other reconstruction constraints. While it exhibited faint (nonsignificant) alias artifacts that were similar to temporal finite difference, it provided statistically significant improvements compared with the other constraints. In conclusion, the self-navigated manifold regularized scheme enabled robust high spatiotemporal resolution dynamic speech MRI at 3 T.
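The role of the Laplacian in a manifold penalty of this kind can be illustrated with a minimal sketch. It assumes a Gaussian similarity kernel on per-frame navigator vectors and real-valued toy frames; the kernel, the `sigma` parameter, and all function names are illustrative assumptions, not the reconstruction used in the study.

```python
import math

def gaussian_weights(navigators, sigma):
    """Soft similarity weights between frames from their navigator vectors (assumed kernel)."""
    n = len(navigators)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(navigators[i], navigators[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / sigma ** 2)
    return w

def laplacian(w):
    """Graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]

def manifold_penalty(frames, lap):
    """tr(X^T L X): small when frames with similar navigators have similar images."""
    n = len(frames)
    return sum(lap[i][j] * sum(a * b for a, b in zip(frames[i], frames[j]))
               for i in range(n) for j in range(n))

# Toy data: three frames, each with a 2-sample navigator and a 2-pixel image.
navs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
frames = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
W = gaussian_weights(navs, sigma=1.0)
L = laplacian(W)
print(manifold_penalty(frames, L))
```

The penalty equals half the weighted sum of squared differences between all frame pairs, so frames with similar postures are pulled together softly rather than being assigned to discrete motion bins.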

2.
J Acoust Soc Am ; 154(6): 3741-3759, 2023 12 01.
Article in English | MEDLINE | ID: mdl-38099832

ABSTRACT

The purpose of this study was to determine whether the threshold of velopharyngeal (VP) coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English was different for speech produced by a model based on an adult male, an adult female, and a 4-year-old child. V1CV2 stimuli were generated with a speech production model that encodes phonetic segments as relative acoustic targets imposed on an underlying vocal tract and laryngeal structure that can be scaled according to sex and age. Each V1CV2 was synthesized with a set of VP coupling functions whose maximum area ranged from 0 to 0.1 cm². Results showed that scaling the vocal tract and vocal folds had essentially no effect on the VP coupling area at which listener identification shifted from stop to nasal. The range of coupling areas at which the crossover occurred was 0.037-0.049 cm² for the male model, 0.040-0.055 cm² for the female model, and 0.039-0.052 cm² for the 4-year-old child model, and the overall mean was 0.044 cm². Calculations of band-limited peak nasalance indicated that 85% peak nasalance during the consonant was well aligned with listener responses.


Subject(s)
Larynx , Speech , Adult , Female , Male , Humans , Child, Preschool , Acoustics , Language , Nose
3.
J Voice ; 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37080890

ABSTRACT

Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels can not always be fulfilled.
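The size of the simulated parameter space can be checked with a short sketch, which also shows one way a dominant harmonic could be found in a linear source-filter setting. The semitone pitch steps, 1 dB/octave slope steps, second-order resonance shape, 100 Hz bandwidth, and placeholder vowel labels are assumptions chosen to match the stated counts; they are not taken from the study, and radiation effects are ignored.

```python
import math
from itertools import product

def midi_to_hz(note, a4_hz=440.0):
    """Equal-tempered pitch; MIDI 48 is C3 and MIDI 84 is C6."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)

PITCHES_HZ = [midi_to_hz(n) for n in range(48, 85)]   # 37 semitone steps (assumed)
SLOPES_DB_PER_OCTAVE = list(range(-4, -31, -1))        # 27 settings (assumed 1 dB steps)
VOWELS = [f"vowel_{i}" for i in range(13)]             # placeholder labels
GRID = list(product(PITCHES_HZ, SLOPES_DB_PER_OCTAVE, VOWELS))

def resonance_gain_db(freq_hz, center_hz, bandwidth_hz):
    """Gain of a generic second-order resonance (assumed filter shape)."""
    num = center_hz ** 2
    den = math.sqrt((center_hz ** 2 - freq_hz ** 2) ** 2
                    + (bandwidth_hz * freq_hz) ** 2)
    return 20.0 * math.log10(num / den)

def dominant_harmonic(fo_hz, slope_db_per_octave, resonances_hz,
                      bandwidth_hz=100.0, n_harmonics=20):
    """Index of the strongest harmonic: source slope plus summed resonance gains."""
    best_k, best_level = 1, -math.inf
    for k in range(1, n_harmonics + 1):
        level = slope_db_per_octave * math.log2(k) + sum(
            resonance_gain_db(k * fo_hz, r, bandwidth_hz) for r in resonances_hz)
        if level > best_level:
            best_k, best_level = k, level
    return best_k

print(len(GRID))
print(dominant_harmonic(300.0, -12, [600.0, 1800.0]))
```

In this toy case the first resonance sits at twice the fundamental, so the second harmonic dominates even with a -12 dB/octave source slope.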

4.
J Acoust Soc Am ; 152(6): 3548, 2022 12.
Article in English | MEDLINE | ID: mdl-36586864

ABSTRACT

A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances ( fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of -6, -12, and -18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.


Subject(s)
Singing , Voice , Male , Female , Humans , Computer Simulation , Sound , Vibration
5.
J Acoust Soc Am ; 152(3): 1783, 2022 09.
Article in English | MEDLINE | ID: mdl-36182331

ABSTRACT

The harmonics-to-noise ratio (HNR) and other spectral noise parameters are important in clinical objective voice assessment as they could indicate the presence of nonharmonic phenomena, which are tied to the perception of hoarseness or breathiness. Existing HNR estimators assume that voice signals are nearly periodic (fixed over a short period), although voice pathology can induce involuntary slow modulation that violates this assumption. This paper proposes the use of a deterministically time-varying harmonic model to improve the HNR measurements. To estimate the time-varying model, a two-stage iterative least squares algorithm is proposed to reduce model overfitting. The efficacy of the proposed HNR estimator is demonstrated with synthetic signals, simulated tremor signals, and recorded acoustic signals. Results indicate that the proposed algorithm can produce consistent HNR measures as the extent and rate of tremor are varied.
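For orientation, the stationary baseline that the time-varying model is meant to improve on can be sketched as follows. This is not the proposed two-stage algorithm: it assumes a known, fixed fundamental and an analysis window spanning a whole number of periods, so that the least-squares harmonic fit reduces to simple projections. The function name and test signal are illustrative.

```python
import math

def harmonic_hnr_db(signal, fs, f0, n_harmonics):
    """HNR from a fixed-f0 harmonic fit; assumes an integer number of periods."""
    n = len(signal)
    model = [0.0] * n
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs
        cos_k = [math.cos(w * i) for i in range(n)]
        sin_k = [math.sin(w * i) for i in range(n)]
        # Projection coefficients (valid because the harmonics are orthogonal here).
        a = 2.0 / n * sum(x * c for x, c in zip(signal, cos_k))
        b = 2.0 / n * sum(x * s for x, s in zip(signal, sin_k))
        for i in range(n):
            model[i] += a * cos_k[i] + b * sin_k[i]
    harmonic_energy = sum(m * m for m in model)
    noise_energy = sum((x - m) ** 2 for x, m in zip(signal, model))
    return 10.0 * math.log10(harmonic_energy / noise_energy)

# Toy signal: a 100 Hz tone plus a weak inharmonic 250 Hz component.
fs, n = 8000, 800
x = [math.sin(2 * math.pi * 100 * i / fs) + 0.1 * math.sin(2 * math.pi * 250 * i / fs)
     for i in range(n)]
hnr = harmonic_hnr_db(x, fs, 100.0, 10)
print(round(hnr, 2))
```

When the fundamental drifts within the window, as in tremor, energy from the modulated harmonics leaks into the residual and a fixed-f0 fit of this kind underestimates the HNR, which is the limitation the abstract describes.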


Subject(s)
Tremor , Voice , Acoustics , Humans , Noise , Speech Acoustics
6.
PLoS One ; 17(3): e0264981, 2022.
Article in English | MEDLINE | ID: mdl-35275939

ABSTRACT

PURPOSE: Normative data on the growth and development of the upper airway across the sexes is needed for the diagnosis and treatment of congenital and acquired respiratory anomalies and to gain insight on developmental changes in speech acoustics and disorders with craniofacial anomalies. METHODS: The growth of the upper airway in children ages birth to 5 years, as compared to adults, was quantified using an imaging database with computed tomography studies from typically developing individuals. Methodological criteria for scan inclusion and airway measurements included: head position, histogram-based airway segmentation, anatomic landmark placement, and development of a semi-automatic centerline for data extraction. A comprehensive set of 2D and 3D supra- and sub-glottal measurements from the choanae to tracheal opening were obtained including: naso-oro-laryngo-pharynx subregion volume and length, each subregion's superior and inferior cross-sectional-area, and antero-posterior and transverse/width distances. RESULTS: Growth of the upper airway during the first 5 years of life was more pronounced in the vertical and transverse/lateral dimensions than in the antero-posterior dimension. By age 5 years, females have larger pharyngeal measurement than males. Prepubertal sex-differences were identified in the subglottal region. CONCLUSIONS: Our findings demonstrate the importance of studying the growth of the upper airway in 3D. As the lumen length increases, its shape changes, becoming increasingly elliptical during the first 5 years of life. This study also emphasizes the importance of methodological considerations for both image acquisition and data extraction, as well as the use of consistent anatomic structures in defining pharyngeal regions.


Subject(s)
Imaging, Three-Dimensional , Larynx , Adult , Anatomic Landmarks , Child , Child, Preschool , Cross-Sectional Studies , Female , Humans , Imaging, Three-Dimensional/methods , Male , Pharynx/diagnostic imaging
7.
J Voice ; 36(2): 149, 2022 03.
Article in English | MEDLINE | ID: mdl-35177292

Subject(s)
Speech , Voice , Humans
8.
Cogn Neuropsychol ; 38(4): 309-317, 2021.
Article in English | MEDLINE | ID: mdl-34881683

ABSTRACT

We agree with Cristina Romani (CR) about reducing confusion and agree that the issues raised in her commentary are central to the study of apraxia of speech (AOS). However, CR critiques our approach from the perspective of basic cognitive neuropsychology. This is confusing and misleading because, contrary to CR's claim, we did not attempt to inform models of typical speech production. Instead, we relied on such models to study the impairment in the clinical category of AOS (translational cognitive neuropsychology). Thus, the approach along with the underlying assumptions is different. This response aims to clarify these assumptions, broaden the discussion regarding the methodological approach, and address CR's concerns. We argue that our approach is well-suited to meet the goals of our recent studies and is commensurate with the current state of the science of AOS. Ultimately, a plurality of approaches is needed to understand a phenomenon as complex as AOS.


Subject(s)
Aphasia , Apraxias , Aphasia/complications , Apraxias/etiology , Confusion/complications , Female , Humans , Speech , Speech Disorders , Speech Production Measurement
9.
J Acoust Soc Am ; 150(5): 3618, 2021 11.
Article in English | MEDLINE | ID: mdl-34852618

ABSTRACT

The purpose of this study was to determine the threshold of velopharyngeal coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English, based on V1CV2 stimuli generated with a speech production model that encodes phonetic segments as relative acoustic targets. Each V1CV2 was synthesized with a set of velopharyngeal coupling functions whose area ranged from 0 to 0.1 cm². Results show that consonants were identified by listeners as a stop when the coupling area was less than 0.035-0.057 cm², depending on place of articulation and final vowel. The smallest coupling area (0.035 cm²) at which the stop-to-nasal switch occurred was found for an alveolar consonant in the /ɑCi/ context, whereas the largest (0.057 cm²) was for a bilabial in /ɑCɑ/. For each stimulus, the balance of oral versus nasal acoustic energy was characterized by the peak nasalance during the consonant. Stimuli with peak nasalance below 40% were mostly identified by listeners as stops, whereas those above 40% were identified as nasals. This study was intended to be a precursor to further investigations using the same model but scaled to represent the developing speech production system of male and female talkers.
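A peak nasalance measure of the kind mentioned here can be sketched with one common definition: nasal level divided by the sum of nasal and oral levels, per frame, expressed in percent. The RMS-based levels, frame length, and function names are assumptions for illustration; how the study separated and filtered the oral and nasal signals is not stated in the abstract.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def nasalance_track(oral, nasal, frame_len):
    """Per-frame nasalance in percent: nasal / (nasal + oral) * 100 (assumed definition)."""
    track = []
    n = min(len(oral), len(nasal))
    for start in range(0, n - frame_len + 1, frame_len):
        o = rms(oral[start:start + frame_len])
        v = rms(nasal[start:start + frame_len])
        total = o + v
        track.append(100.0 * v / total if total > 0 else 0.0)
    return track

def classify_consonant(oral, nasal, frame_len, threshold_percent=40.0):
    """Label by peak nasalance, using the 40% boundary reported in the abstract."""
    peak = max(nasalance_track(oral, nasal, frame_len))
    return peak, ("nasal" if peak > threshold_percent else "stop")

# Toy signals: constant-level oral and nasal channels.
print(classify_consonant([1.0] * 100, [0.25] * 100, 50))
print(classify_consonant([1.0] * 100, [1.0] * 100, 50))
```

The inputs must contain at least one full frame, and the threshold is a tendency in listener responses rather than a hard perceptual boundary.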


Subject(s)
Speech Perception , Speech , Female , Humans , Male , North America , Phonetics , Speech Production Measurement
10.
J Voice ; 2021 Oct 23.
Article in English | MEDLINE | ID: mdl-34702610

ABSTRACT

PURPOSE: Studies on medical and behavioral interventions for essential vocal tremor (EVT) have shown inconsistent effects on acoustical and perceptual outcome measures across studies and across participants. Remote acoustical and perceptual assessments might facilitate studies with larger samples of participants and repeated measures that could clarify treatment effects and identify optimal treatment candidates. Furthermore, remote acoustical and perceptual assessment might allow clinicians to monitor clients' treatment responses and optimize treatment approaches during telepractice. Thus, the purpose of this study was to evaluate the accuracy of remote signal transmission and recording for acoustical and perceptual assessment of EVT. METHOD: Simulations of EVT were produced using a computational model and were recorded using local and remote procedures to represent client- and clinician-end recordings respectively. Acoustical analyses measured the extent and rate of fundamental frequency (fo) and intensity modulation to represent vocal tremor severity and the cepstral peak prominence (CPPS) to represent voice quality. The data were analyzed using repeated measures analysis of variance (ANOVA) with recording as the within-subjects factor and sex of the computational model as the between-subjects factor. RESULTS: There was a significant main effect of recording on the rate of fo modulation and significant interactions of recording and sex for the extent of intensity modulation, rate of intensity modulation, and CPPS. Posthoc pairwise comparisons and analysis of effect size indicated that recording procedures had the largest effect on the extent of intensity modulation for male simulations, the rate of intensity modulation for male and female simulations, and the CPPS for male and female simulations. 
Despite having disabled all known software and computer audio enhancing options and having stable ethernet connections, there was inconsistent attenuation of signal amplitude in remote recordings that was most problematic for samples with a breathy voice quality but also affected samples with typical and pressed voice qualities. CONCLUSIONS: Acoustical measures that correlate to perception of vocal tremor and voice quality were altered by remote signal transmission and recording. In particular, signal transmission and recording in Zoom altered time-based estimates of intensity modulation and CPPS with male and female simulations of EVT and magnitude-based estimates of intensity modulation with male simulations of EVT. In contrast, signal transmission and recording in Zoom minimally altered time- and magnitude-based estimates of fo modulation with male and female simulations of EVT. Therefore, acoustical and perceptual assessments of EVT should be performed using audio recordings that are collected locally on the participant- or client-end, particularly when measuring modulation of intensity and CPP or estimating vocal tremor severity and voice quality. Development of procedures for collecting local audio recordings in remote settings may expand data collection for treatment research and enhance telepractice.

11.
J Acoust Soc Am ; 149(6): 4565, 2021 06.
Article in English | MEDLINE | ID: mdl-34241428

ABSTRACT

In recent studies, it has been assumed that vocal tract formants (Fn) and the voice source could interact. However, only a few studies have analyzed this assumption in vivo. Here, the vowel transition /i/-/a/-/u/-/i/ of 12 professional classical singers (6 females, 6 males) when phonating on the pitch D4 [fundamental frequency (ƒo) ca. 294 Hz] was analyzed using transnasal high speed videoendoscopy (20,000 fps), electroglottography (EGG), and audio recordings. Fn data were calculated using a cepstral method. Source-filter interaction candidates (SFICs) were determined by (a) algorithmic detection of major intersections of Fn/nƒo and (b) perceptual assessment of the EGG signal. Although the open quotient showed some increase for the /i-a/ and /u-i/ transitions, there were no clear effects at the expected Fn/nƒo intersections. In contrast, ƒo adjustments and changes in the phonovibrogram occurred at perceptually derived SFICs, suggesting level-two interactions. In some cases, these were constituted by intersections between higher nƒo and Fn. The presented data partially corroborate that vowel transitions may result in level-two interactions also in professional singers. However, the lack of systematically detectable effects suggests either the absence of a strong interaction or the existence of confounding factors, which may potentially counterbalance the level-two interactions.


Subject(s)
Singing , Voice , Female , Humans , Male , Occupations , Phonation , Voice Quality
12.
JASA Express Lett ; 1(8): 085203, 2021 08.
Article in English | MEDLINE | ID: mdl-36154248

ABSTRACT

A recently developed speech production model, in which speech segments are specified by relative acoustic events called resonance deflection patterns, was used to generate speech signals that were presented to listeners in a perceptual test. The purpose was to determine the effect of variations of the magnitude and polarity of the third resonance deflection on identification of the consonant in a V1CV2 disyllable while the deflections of the first and second resonances were held constant. Results showed that listeners' identification changed from /d/ to /ɡ/ when the polarity of the third resonance deflection switched from positive to negative.


Subject(s)
Phonetics , Voice , Acoustics , Speech Acoustics
13.
Cogn Neuropsychol ; 38(1): 72-87, 2021 02.
Article in English | MEDLINE | ID: mdl-33249997

ABSTRACT

This study investigated the underlying nature of apraxia of speech (AOS) by testing two competing hypotheses. The Reduced Buffer Capacity Hypothesis argues that people with AOS can plan speech only one syllable at a time Rogers and Storkel [1999. Planning speech one syllable at a time: The reduced buffer capacity hypothesis in apraxia of speech. Aphasiology, 13(9-11), 793-805. https://doi.org/10.1080/026870399401885]. The Program Retrieval Deficit Hypothesis states that selecting a motor programme is difficult in face of competition from other simultaneously activated programmes Mailend and Maas [2013. Speech motor programming in apraxia of speech: Evidence from a delayed picture-word interference task. American Journal of Speech-Language Pathology, 22(2), S380-S396. https://doi.org/10.1044/1058-0360(2013/12-0101)]. Speakers with AOS and aphasia, aphasia without AOS, and unimpaired controls were asked to prepare and hold a two-word utterance until a go-signal prompted a spoken response. Phonetic similarity between target words was manipulated. Speakers with AOS had longer reaction times in conditions with two similar words compared to two identical words. The Control and the Aphasia group did not show this effect. These results suggest that speakers with AOS need additional processing time to retrieve target words when multiple motor programmes are simultaneously activated.


Subject(s)
Aphasia/physiopathology , Apraxias/physiopathology , Phonetics , Speech Disorders/physiopathology , Speech , Adult , Aged , Female , Humans , Male , Middle Aged , Reaction Time , Speech Production Measurement/methods
14.
J Acoust Soc Am ; 147(3): EL221, 2020 03.
Article in English | MEDLINE | ID: mdl-32237805

ABSTRACT

The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of number of LPC coefficients, anti-alias filter and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.


Subject(s)
Acoustics , Speech , Child , Female , Humans , Male , Speech Acoustics
15.
Elife ; 9: 2020 02 17.
Article in English | MEDLINE | ID: mdl-32048990

ABSTRACT

Khoomei is a unique singing style originating from the republic of Tuva in central Asia. Singers produce two pitches simultaneously: a booming low-frequency rumble alongside a hovering high-pitched whistle-like tone. The biomechanics of this biphonation are not well-understood. Here, we use sound analysis, dynamic magnetic resonance imaging, and vocal tract modeling to demonstrate how biphonation is achieved by modulating vocal tract morphology. Tuvan singers show remarkable control in shaping their vocal tract to narrowly focus the harmonics (or overtones) emanating from their vocal cords. The biphonic sound is a combination of the fundamental pitch and a focused filter state, which is at the higher pitch (1-2 kHz) and formed by merging two formants, thereby greatly enhancing sound-production in a very narrow frequency range. Most importantly, we demonstrate that this biphonation is a phenomenon arising from linear filtering rather than from a nonlinear source.


The republic of Tuva, a remote territory in southern Russia located on the border with Mongolia, is perhaps best known for its vast mountainous geography and the unique cultural practice of "throat singing". These singers simultaneously create two different pitches: a low-pitched drone, along with a hovering whistle above it. This practice has deep cultural roots and has now been shared more broadly via world music performances and the 1999 documentary Genghis Blues. Despite many scientists being fascinated by throat singing, it was unclear precisely how throat singers could create two unique pitches. Singing and speaking in general involve making sounds by vibrating the vocal cords found deep in the throat, and then shaping those sounds with the tongue, teeth and lips as they move up the vocal tract and out of the body. Previous studies using static images taken with magnetic resonance imaging (MRI) suggested how Tuvan singers might produce the two pitches, but a mechanistic understanding of throat singing was far from complete. Now, Bergevin et al. have better pinpointed how throat singers can produce their unique sound. The analysis involved high quality audio recordings of three Tuvan singers and dynamic MRI recordings of the movements of one of those singers. The images showed changes in the singer's vocal tract as they sang inside an MRI scanner, providing key information needed to create a computer model of the process. This approach revealed that Tuvan singers can create two pitches simultaneously by forming precise constrictions in their vocal tract. One key constriction occurs when the tip of the tongue nearly touches a ridge on the roof of the mouth, and a second constriction is formed by the base of the tongue. The computer model helped explain that these two constrictions produce the distinctive sounds of throat singing by selectively amplifying a narrow set of high frequency notes that are made by the vocal cords. Together these discoveries show how very small, targeted movements of the tongue can produce distinctive sounds.


Subject(s)
Pharynx/physiology , Singing , Audiovisual Aids , Humans , Magnetic Resonance Imaging , Pharynx/diagnostic imaging , Russia
16.
J Acoust Soc Am ; 146(4): 2522, 2019 10.
Article in English | MEDLINE | ID: mdl-31671993

ABSTRACT

A model is described in which the effects of articulatory movements to produce speech are generated by specifying relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies that, when associated with a temporal event function, are transformed via acoustic sensitivity functions, into time-varying modulations of the vocal tract shape. Because the time course of the events may be considerably overlapped in time, coarticulatory effects are automatically generated. Production of sentence-level speech with the model is demonstrated with audio samples and vocal tract animations.


Subject(s)
Models, Biological , Speech Production Measurement , Speech/physiology , Acoustics , Humans , Jaw/physiology , Larynx/physiology , Lip/physiology , Male , Tongue/physiology
17.
Neuropsychologia ; 127: 171-184, 2019 04.
Article in English | MEDLINE | ID: mdl-30817912

ABSTRACT

The purpose of this study was to test two competing hypotheses about the nature of the impairment in apraxia of speech (AOS). The Reduced Buffer Capacity Hypothesis argues that people with AOS can hold only one syllable at a time in the speech motor planning buffer. The Program Retrieval Deficit Hypothesis states that people with AOS have difficulty accessing the intended motor program in the context where several motor programs are activated simultaneously. The participants included eight speakers with AOS, most of whom also had aphasia, nine speakers with aphasia without AOS, and 25 age-matched control speakers. The experimental paradigm prompted single word production following three types of primes. In most trials, prime and target were the same (e.g., bill-bill). On some trials, the initial consonant differed in one phonetic feature (e.g., bill-dill; Similar) or in all phonetic features (fill-bill; Different). The dependent measures were accuracy and reaction time. The results revealed a switch cost - longer reaction times in trials where the prime and target differed compared to trials where they were the same words - in all groups; however, the switch cost was significantly larger in the AOS group compared to the other two groups. These findings are in line with the prediction of the Program Retrieval Deficit Hypothesis and suggest that speakers with AOS have difficulty with selecting one program over another when several programs compete for selection.


Subject(s)
Anticipation, Psychological , Aphasia/psychology , Phonetics , Speech Disorders/psychology , Speech , Adult , Aged , Apraxias , Female , Humans , Individuality , Male , Middle Aged , Psychomotor Performance , Reaction Time
18.
J Acoust Soc Am ; 143(5): 3079, 2018 05.
Article in English | MEDLINE | ID: mdl-29857736

ABSTRACT

The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts ranging in age from infancy to 12 yrs, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived, and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparison of formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.


Subject(s)
Child Development/physiology , Sex Characteristics , Speech/physiology , Vocal Cords/anatomy & histology , Vocal Cords/physiology , Adult , Age Factors , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Sex Factors , Vocal Cords/diagnostic imaging
19.
J Acoust Soc Am ; 141(5): EL458, 2017 05.
Article in English | MEDLINE | ID: mdl-28599542

ABSTRACT

The purpose of this study was to develop a method for visualizing and assessing the characteristics of vowel production by measuring the local density of normalized F1 and F2 formant frequencies. The result is a three-dimensional plot called the vowel space density (VSD) and indicates the regions in the vowel space most heavily used by a talker during speech production. The area of a convex hull enclosing the vowel space at specific threshold density values was proposed as a means of quantifying the VSD.
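The idea of a density-thresholded convex hull can be sketched as below. The median-based normalization, the neighbor-count density with a fixed radius, and the default threshold are assumptions made for illustration; the abstract does not specify how density was estimated or which thresholds were used.

```python
from statistics import median

def normalize(values):
    """Express formants relative to the talker's median (assumed normalization)."""
    m = median(values)
    return [(v - m) / m for v in values]

def local_density(points, radius):
    """Fraction-of-peak count of neighbors within `radius` of each point."""
    r2 = radius * radius
    counts = [sum(1 for q in points
                  if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= r2)
              for p in points]
    peak = max(counts)
    return [c / peak for c in counts]

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula; zero for fewer than three vertices."""
    if len(poly) < 3:
        return 0.0
    s = sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
            - poly[(i + 1) % len(poly)][0] * poly[i][1]
            for i in range(len(poly)))
    return abs(s) / 2.0

def vsd_hull_area(f1, f2, radius=0.05, threshold=0.25):
    """Area of the convex hull around points whose density meets the threshold."""
    points = list(zip(normalize(f1), normalize(f2)))
    density = local_density(points, radius)
    kept = [p for p, d in zip(points, density) if d >= threshold]
    return polygon_area(convex_hull(kept))
```

Raising the threshold keeps only the most heavily used regions, so the hull area can only shrink or stay the same; comparing areas across thresholds gives a compact summary of how concentrated a talker's vowel space is.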


Subject(s)
Acoustics , Phonetics , Speech Acoustics , Speech Production Measurement/methods , Voice Quality , Humans , Signal Processing, Computer-Assisted , Sound Spectrography
20.
J Speech Lang Hear Res ; 60(2): 306-321, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28199505

ABSTRACT

Purpose: The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. Method: A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric settings of parameters controlling glottal configuration. These parameters were then altered individually to determine their effect on maximum flow declination rate, spectral slope, cepstral peak prominence, harmonics-to-noise ratio, and perceived voice quality. Results: Asymmetry of each of the 5 vocal fold parameters influenced vocal function and voice quality; measured change was greatest for adduction and bulging. Increasing the symmetry of all parameters improved voice, and the best voice occurred with overcorrection of adduction, followed by bulging, nodal point ratio, starting phase, and amplitude of vibration. Conclusions: Although vocal process adduction and edge bulging asymmetries are most influential in voice quality for simulated vocal fold motion impairment, amplitude of vibration and starting phase asymmetries are also perceptually important. These findings are consistent with the current surgical approach to vocal fold motion impairment, where goals include medializing the vocal process and straightening concave edges. The results also explain many of the residual postoperative voice limitations.


Subject(s)
Computer Simulation , Models, Biological , Vocal Cord Paralysis/physiopathology , Voice Quality , Biomechanical Phenomena , Humans , Vibration , Vocal Cords/physiopathology , Voice Quality/physiology