Results 1 - 12 of 12
1.
J Acoust Soc Am ; 155(2): 1253-1263, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38341748

ABSTRACT

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.


Subject(s)
Voice , Child , Humans , Acoustics , Speech Acoustics , Vibration , Sound Spectrography
2.
J Acoust Soc Am ; 154(3): 1932-1944, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37768114

ABSTRACT

Fricatives have noise sources that are filtered by the vocal tract and that typically possess energy over a much broader range of frequencies than observed for vowels and sonorant consonants. This paper introduces and refines fricative measurements that were designed to reflect underlying articulatory and aerodynamic conditions. These show differences in the pattern of high-frequency energy for sibilants vs non-sibilants, voiced vs voiceless fricatives, and non-sibilants differing in place of articulation. The results confirm the utility of a spectral peak measure (FM) and low-mid frequency amplitude difference (AmpD) for sibilants. Using a higher-frequency range for defining FM for female voices for alveolars is justified; a still higher range was considered and rejected. High-frequency maximum amplitude (Fh) and amplitude difference between low- and higher-frequency regions (AmpRange) capture /f-θ/ differences in English and the dynamic amplitude range over the entire spectrum. For this dataset, with spectral information up to 15 kHz, a new measure, HighLevelD, was more effective than previously used LevelD and Slope in showing changes over time within the frication. Finally, isolated words and connected speech differ. This work contributes improved measures of fricative spectra and demonstrates the necessity of including high-frequency energy in those measures.


Subject(s)
Language , Speech , Female , Humans
3.
J Acoust Soc Am ; 153(2): 1412, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36859163

ABSTRACT

Means of characterizing acoustic signals of fricatives with a few parameters have long been sought. When Forrest, Weismer, Milenkovic, and Dougall [(1988) J. Acoust. Soc. Am. 84, 115-123] described their system of treating spectra as probability density functions and computing the first four spectral moments, others quickly adopted their clearly described method, although it did not distinguish /f/ and /θ/. Various problems with their method are described, including the lack of spectral averaging, the necessity of normalizing the amplitude, and correlation between pairs of moments. Even when these issues are rectified by alternative methods, the fact remains that moments are not ideal descriptors because they can only describe departures from the shape of a normal Gaussian distribution. Fricative spectra, particularly of non-sibilants, are often quite dissimilar in shape from Gaussians. Furthermore, shape descriptors do not lend themselves to direct inferences about the production variables that caused the acoustic effects. Here, alternative parameters are defined, it is shown how to adapt them to specific experimental conditions, and tests of efficacy are proposed. These parameters are strongly linked to the articulatory and aerodynamic variables that underlie fricative production.
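
As a rough illustration of the spectral-moments idea this abstract discusses (treating a spectrum as a probability density and computing its first four moments), a minimal Python sketch follows. It uses the textbook definitions of mean, standard deviation, skewness, and excess kurtosis; the abstract does not give the exact normalization or windowing used in the cited method, so the function name `spectral_moments` and its inputs (a list of bin frequencies and a matching list of power values) are assumptions made for illustration only.

```python
import math


def spectral_moments(freqs_hz, power):
    """Treat a power spectrum as a probability density and return
    (mean, standard deviation, skewness, excess kurtosis).

    Hypothetical helper: standard moment definitions, not necessarily
    the exact procedure of the cited method.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    # Amplitude normalization: weights sum to 1, like a probability density.
    p = [v / total for v in power]
    mean = sum(f * w for f, w in zip(freqs_hz, p))
    m2 = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, p))
    if m2 == 0:
        raise ValueError("all energy is in a single bin")
    m3 = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, p))
    m4 = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, p))
    sd = math.sqrt(m2)
    skewness = m3 / sd ** 3
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a Gaussian shape
    return mean, sd, skewness, excess_kurtosis


# Toy symmetric spectrum: three bins with power 1, 2, 1.
mean, sd, skew, kurt = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
```

Because skewness and excess kurtosis are both zero for a Gaussian, the moments mainly report how far a spectrum departs from that shape, which is the limitation the abstract points out.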

4.
J Acoust Soc Am ; 152(2): 933, 2022 08.
Article in English | MEDLINE | ID: mdl-36050157

ABSTRACT

Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.


Subject(s)
Acoustics , Speech Acoustics , Algorithms , Canada , Humans , Language
5.
Am J Speech Lang Pathol ; 29(4): 2012-2022, 2020 11 12.
Article in English | MEDLINE | ID: mdl-32870708

ABSTRACT

Purpose: The purpose of this study was to report the variability of electrolarynx (EL) users' speech intelligibility in quiet and in multitalker babble. Method: Ten EL users (five Servox® Digital, five TruTone™) who were at least 2 years postlaryngectomy provided recordings of five sentences from the 1965 Revised List of Phonetically Balanced Sentences. Recordings were judged by two groups of naïve listeners in quiet and in the presence of multitalker babble. Fifteen listeners orthographically transcribed a total of 750 sentences containing 3,750 key words in quiet, and another 15 listeners orthographically transcribed the same sentences mixed with multitalker babble. Results: Significant differences in speech intelligibility were observed between listening conditions; 17.9% more key words were correctly identified in quiet compared to multitalker babble. Significant differences in fundamental frequency (F0) standard deviation and range but not speech intelligibility were observed between EL device types. A positive correlation of moderate significance was observed between F0 standard deviation and intelligibility for TruTone users in multitalker babble. Conclusions: Findings suggest that listeners are able to identify a significantly higher percentage of EL users' speech in quiet compared to multitalker babble, but a large variability in EL users' speech intelligibility exists. Continued investigation involving a larger number of EL users is necessary to confirm this study's findings. Future research should explore the relationships among F0 measures, speaker characteristics (e.g., rate of speech, articulatory precision), and speech intelligibility, in addition to improving alaryngeal rehabilitation training protocols for EL users.


Subject(s)
Speech Intelligibility , Speech Perception , Auditory Perception , Humans , Language , Noise , Speech Disorders
6.
J Acoust Soc Am ; 145(5): EL360, 2019 05.
Article in English | MEDLINE | ID: mdl-31153348

ABSTRACT

Many developmental studies attribute reduction of acoustic variability to increasing motor control. However, linear prediction-based formant measurements are known to be biased toward the nearest harmonic of F0, especially at high F0s. Thus, the amount of reported formant variability generated by changes in F0 is unknown. Here, 470 000 vowels were synthesized, mimicking statistics reported in four developmental studies, to estimate the proportion of formant variability that can be attributed to F0 bias, as well as other formant measurement errors. Results showed that the F0-induced formant measurement errors are large and systematic, and cannot be eliminated by a large sample size.


Subject(s)
Acoustics , Bias , Speech Acoustics , Speech Perception/physiology , Humans , Phonetics , Sound Spectrography/methods
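
The bias toward the nearest harmonic described in record 6 can be made concrete with a small Python sketch. It only computes where the nearest harmonic of F0 lies relative to a given resonance; it is not the synthesis or measurement procedure of the study, and the function names and example values are hypothetical. The point it illustrates is that harmonics are spaced F0 apart, so the nearest one can lie up to F0/2 away from the true resonance, which is why the potential error grows at high F0.

```python
import math


def nearest_harmonic_hz(resonance_hz, f0_hz):
    """Frequency of the harmonic of f0 closest to a resonance (hypothetical helper)."""
    # Round half up, and never go below the first harmonic.
    k = max(1, math.floor(resonance_hz / f0_hz + 0.5))
    return k * f0_hz


def harmonic_offset_hz(resonance_hz, f0_hz):
    """Signed distance from the resonance to its nearest harmonic."""
    return nearest_harmonic_hz(resonance_hz, f0_hz) - resonance_hz


# A 500 Hz resonance with a low vs a high F0 (illustrative values only).
low_f0_offset = harmonic_offset_hz(500.0, 100.0)   # harmonic falls on the resonance
high_f0_offset = harmonic_offset_hz(500.0, 300.0)  # nearest harmonic is 600 Hz
```

A measurement method that is fully attracted to the nearest harmonic would report the resonance shifted by this offset; real methods are typically only partly attracted, so the offset is best read as an upper bound on this source of error.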
7.
PLoS One ; 13(9): e0202180, 2018.
Article in English | MEDLINE | ID: mdl-30192767

ABSTRACT

Speech motor actions are performed quickly, while simultaneously maintaining a high degree of accuracy. Are speed and accuracy in conflict during speech production? Speed-accuracy tradeoffs have been shown in many domains of human motor action, but have not been directly examined in the domain of speech production. The present work seeks evidence for Fitts' law, a rigorous formulation of this fundamental tradeoff, in speech articulation kinematics by analyzing USC-TIMIT, a real-time magnetic resonance imaging data set of speech production. A theoretical framework for considering Fitts' law with respect to models of speech motor control is elucidated. Methodological challenges in seeking relationships consistent with Fitts' law are addressed, including the operational definitions and measurement of key variables in real-time MRI data. Results suggest the presence of speed-accuracy tradeoffs for certain types of speech production actions, with wide variability across syllable position, and substantial variability also across subjects. Coda consonant targets immediately following the syllabic nucleus show the strongest evidence of this tradeoff, with correlations as high as 0.72 between speed and accuracy. A discussion is provided concerning the potentially limited applicability of Fitts' law in the context of speech production, as well as the theoretical context for interpreting the results.


Subject(s)
Motor Cortex/physiology , Psychomotor Performance/physiology , Reaction Time/physiology , Speech/physiology , Algorithms , Biomechanical Phenomena , Humans , Larynx/diagnostic imaging , Larynx/physiology , Magnetic Resonance Imaging , Models, Biological , Vocal Cords/diagnostic imaging , Vocal Cords/physiology
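
For readers unfamiliar with Fitts' law, which record 7 tests in speech articulation, a minimal Python sketch follows. It uses the classic formulation MT = a + b * log2(2D / W), where D is movement distance and W is target width, and fits a and b by ordinary least squares. The abstract does not say how distance, width, or movement time were operationalized in the real-time MRI data, so the inputs here are hypothetical placeholders and the function names are illustrative.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts' index of difficulty in bits (classic 2D/W form)."""
    return math.log2(2.0 * distance / width)


def fit_fitts(distances, widths, movement_times):
    """Least-squares fit of MT = a + b * ID; returns (a, b, Pearson r)."""
    ids = [index_of_difficulty(d, w) for d, w in zip(distances, widths)]
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(movement_times) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    syy = sum((y - mean_mt) ** 2 for y in movement_times)
    sxy = sum((x - mean_id) * (y - mean_mt) for x, y in zip(ids, movement_times))
    b = sxy / sxx
    a = mean_mt - b * mean_id
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Made-up data that follow the law exactly: MT = 0.10 + 0.05 * ID.
a, b, r = fit_fitts([1.0, 2.0, 4.0], [1.0, 1.0, 1.0], [0.15, 0.20, 0.25])
```

With real articulatory data the correlation r would indicate how well a speed-accuracy tradeoff of this form describes the movements; the made-up data above fit perfectly by construction.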
8.
J Acoust Soc Am ; 139(2): 713-27, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26936555

ABSTRACT

The measurement of formant frequencies of vowels is among the most common measurements in speech studies, but measurements are known to be biased by the particular fundamental frequency (F0) exciting the formants. Approaches to reducing the errors were assessed in two experiments. In the first, synthetic vowels were constructed with five different first formant (F1) values and nine different F0 values; formant bandwidths, and higher formant frequencies, were constant. Input formant values were compared to manual measurements and automatic measures using the linear prediction coding-Burg algorithm, linear prediction closed-phase covariance, the weighted linear prediction-attenuated main excitation (WLP-AME) algorithm [Alku, Pohjalainen, Vainio, Laukkanen, and Story (2013). J. Acoust. Soc. Am. 134(2), 1295-1313], spectra smoothed cepstrally and by averaging repeated discrete Fourier transforms. Formants were also measured manually from pruned reassigned spectrograms (RSs) [Fulop (2011). Speech Spectrum Analysis (Springer, Berlin)]. All but WLP-AME and RS had large errors in the direction of the strongest harmonic; the smallest errors occur with WLP-AME and RS. In the second experiment, these methods were used on vowels in isolated words spoken by four speakers. Results for the natural speech show that F0 bias affects all automatic methods, including WLP-AME; only the formants measured manually from RS appeared to be accurate. In addition, RS coped better with weaker formants and glottal fry.


Subject(s)
Signal Processing, Computer-Assisted , Speech Acoustics , Speech Production Measurement/methods , Voice Quality , Acoustics , Adult , Algorithms , Female , Fourier Analysis , Humans , Linear Models , Male , Middle Aged , Reproducibility of Results , Sound Spectrography , Young Adult
9.
J Acoust Soc Am ; 134(2): 1271-82, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23927125

ABSTRACT

Coarticulation and invariance are two topics at the center of theorizing about speech production and speech perception. In this paper, a quantitative scale is proposed that places coarticulation and invariance at the two ends of the scale. This scale is based on physical information flow in the articulatory signal, and uses Information Theory, especially the concept of mutual information, to quantify these central concepts of speech research. Mutual Information measures the amount of physical information shared across phonological units. In the proposed quantitative scale, coarticulation corresponds to greater and invariance to lesser information sharing. The measurement scale is tested by data from three languages: German, Catalan, and English. The relation between the proposed scale and several existing theories of coarticulation is discussed, and implications for existing theories of speech production and perception are presented.


Subject(s)
Motor Skills , Phonation , Phonetics , Speech Acoustics , Speech Intelligibility , Speech Perception , Stomatognathic System/innervation , Voice Quality , Biomechanical Phenomena , Electromagnetic Phenomena , Female , Humans , Information Theory , Linear Models , Male , Speech Production Measurement
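
Record 9 builds its scale on mutual information. As a sketch of that quantity only, the following Python computes the plug-in estimate I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y))) from paired discrete observations. The articulatory signals in the study are presumably continuous and the abstract does not describe the estimator used, so the discrete labels and the function name `mutual_information_bits` are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information_bits(xs, ys):
    """Plug-in mutual information (in bits) between two paired discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with counts to avoid extra divisions.
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical labels: one pair shares everything, the other shares nothing.
fully_shared = mutual_information_bits([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1])
```

On the scale proposed in the abstract, higher values of this quantity across phonological units would correspond to coarticulation and lower values to invariance.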
10.
J Speech Lang Hear Res ; 56(4): 1175-89, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23785194

ABSTRACT

PURPOSE: This article introduces theoretically driven acoustic measures of /s/ that reflect aerodynamic and articulatory conditions. The measures were evaluated by assessing whether they revealed expected changes over time and labiality effects, along with possible gender differences suggested by past work. METHOD: Productions of /s/ were extracted from various speaking tasks from typically speaking adolescents (6 boys, 6 girls). Measures were made of relative spectral energies in low- (550-3000 Hz), mid- (3000-7000 Hz), and high-frequency regions (7000-11025 Hz); the mid-frequency amplitude peak; and temporal changes in these parameters. Spectral moments were also obtained to permit comparison with existing work. RESULTS: Spectral balance measures in low-mid and mid-high frequency bands varied over the time course of /s/, capturing the development of sibilance at mid-fricative along with showing some effects of gender and labiality. The mid-frequency spectral peak was significantly higher in nonlabial contexts, and in girls. Temporal variation in the mid-frequency peak differentiated ±labial contexts while normalizing over gender. CONCLUSIONS: The measures showed expected patterns, supporting their validity. Comparison of these data with studies of adults suggests some developmental patterns that call for further study. The measures may also serve to differentiate some cases of typical and misarticulated /s/.


Subject(s)
Phonetics , Sex Characteristics , Speech Acoustics , Speech , Verbal Behavior , Adolescent , Female , Humans , Lip , Male , Reference Values
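
The relative band energies mentioned in record 10 can be sketched in Python as follows. The band edges (550-3000 Hz, 3000-7000 Hz, 7000-11025 Hz) come from the abstract; everything else is an assumption made for illustration, including the function names, the use of summed power per band, and the direction of the differences (low minus mid, mid minus high). The abstract does not specify how the published measures were actually computed.

```python
import math

# Band edges in Hz, as listed in the abstract.
BANDS_HZ = {
    "low": (550.0, 3000.0),
    "mid": (3000.0, 7000.0),
    "high": (7000.0, 11025.0),
}


def band_levels_db(freqs_hz, power):
    """Summed power per band, in dB (hypothetical definition)."""
    levels = {}
    for name, (lo, hi) in BANDS_HZ.items():
        energy = sum(p for f, p in zip(freqs_hz, power) if lo <= f < hi)
        levels[name] = 10.0 * math.log10(energy) if energy > 0 else float("-inf")
    return levels


def spectral_balance_db(freqs_hz, power):
    """Level differences between adjacent bands (sign convention assumed)."""
    lv = band_levels_db(freqs_hz, power)
    return {
        "low_minus_mid": lv["low"] - lv["mid"],
        "mid_minus_high": lv["mid"] - lv["high"],
    }


# Toy spectrum with one bin per band (illustrative values only).
balance = spectral_balance_db([1000.0, 5000.0, 9000.0], [1.0, 100.0, 10.0])
```

Computing these differences frame by frame over the duration of /s/ would give the kind of time course the abstract describes, under the assumptions stated above.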
11.
J Acoust Soc Am ; 129(2): 944-54, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21361451

ABSTRACT

Due to its aerodynamic, articulatory, and acoustic complexities, the fricative /s/ is known to require high precision in its control, and to be highly resistant to coarticulation. This study documents in detail how jaw, tongue front, tongue back, lips, and the first spectral moment covary during the production of /s/, to establish how coarticulation affects this segment. Data were obtained from 24 speakers in the Wisconsin x-ray microbeam database producing /s/ in prevocalic and pre-obstruent sequences. Analysis of the data showed that certain aspects of jaw and tongue motion had specific kinematic trajectories, regardless of context, and the first spectral moment trajectory corresponded to these in some aspects. In particular contexts, variability due to jaw motion is compensated for by tongue-tip motion and bracing against the palate, to maintain an invariant articulatory-aerodynamic goal, constriction degree. The change in the first spectral moment, which rises to a peak at the midpoint of the fricative, primarily reflects the motion of the jaw. Implications of the results for theories of speech motor control and acoustic-articulatory relations are discussed.


Subject(s)
Jaw/physiology , Language , Mouth/physiology , Phonetics , Speech Acoustics , Biomechanical Phenomena , Databases as Topic , Female , Friction , Humans , Jaw/diagnostic imaging , Lip/physiology , Male , Mouth/diagnostic imaging , Radiography , Sound Spectrography , Speech Production Measurement , Tongue/physiology , Young Adult
12.
J Acoust Soc Am ; 127(3): 1507-18, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20329851

ABSTRACT

A structural magnetic resonance imaging study has revealed that pharyngeal articulation varies considerably with voicing during the production of English fricatives. In a study of four speakers of American English, pharyngeal volume was generally found to be greater during the production of sustained voiced fricatives, compared to voiceless equivalents. Though pharyngeal expansion is expected for voiced stops, it is more surprising for voiced fricatives. For three speakers, all four voiced oral fricatives were produced with a larger pharynx than that used during the production of the voiceless fricative at the same place of articulation. For one speaker, pharyngeal volume during the production of voiceless labial fricatives was found to be greater, and sibilant pharyngeal volume varied with vocalic context as well as voicing. Pharyngeal expansion was primarily achieved through forward displacement of the anterior and lateral walls of the upper pharynx, but some displacement of the rear pharyngeal wall was also observed. These results suggest that the production of voiced fricatives involves the complex interaction of articulatory constraints from three separate goals: the formation of the appropriate oral constriction, the control of airflow through the constriction so as to achieve frication, and the maintenance of glottal oscillation by attending to transglottal pressure.


Subject(s)
Magnetic Resonance Imaging , Pharynx/anatomy & histology , Pharynx/physiology , Speech/physiology , Voice/physiology , Adult , Female , Glottis/anatomy & histology , Glottis/physiology , Humans , Larynx/anatomy & histology , Larynx/physiology , Male , Models, Biological , Phonetics , Young Adult