1.
J Acoust Soc Am ; 110(4): 2141-55, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11681391

ABSTRACT

This study is one in a series that has examined factors contributing to vowel perception in everyday listening. Four experimental variables have been manipulated to examine systematic differences between optimal laboratory testing conditions and those characterizing everyday listening. These include length of phonetic context, level of stimulus uncertainty, linguistic meaning, and amount of subject training. The present study investigated the effects of stimulus uncertainty from minimal to high uncertainty in two phonetic contexts, /V/ or /bVd/, when listeners had either little or extensive training. Thresholds for discriminating a small change in a formant for synthetic female vowels /ɪ, ɛ, æ, a, ʌ, o/ were obtained using adaptive tracking procedures. Experiment 1 optimized extensive training for five listeners by beginning under minimal uncertainty (only one formant tested per block) and then increasing uncertainty from 8 to 16 to 22 formants per block. Effects of higher uncertainty were less than expected; performance decreased by only about 30%. Thresholds for CVCs were 25% poorer than for isolated vowels. A previous study using similar stimuli [Kewley-Port and Zheng, J. Acoust. Soc. Am. 106, 2945-2958 (1999)] determined that the ability to discriminate formants was degraded by longer phonetic context. A comparison of those results with the present ones indicates that longer phonetic context degrades formant frequency discrimination more than higher levels of stimulus uncertainty do. In experiment 2, performance in the 22-formant condition was tracked over 1 h for 37 typical listeners without formal laboratory training. Performance for typical listeners was initially about 230% worse than for trained listeners. Individual listeners' performance ranged widely, with some listeners occasionally achieving performance similar to that of the trained listeners in just one hour.


Subject(s)
Attention , Phonetics , Practice, Psychological , Speech Perception , Adult , Auditory Threshold , Female , Humans , Male , Reference Values , Sound Spectrography , Speech Acoustics , Speech Reception Threshold Test
2.
J Acoust Soc Am ; 106(5): 2945-58, 1999 Nov.
Article in English | MEDLINE | ID: mdl-10573907

ABSTRACT

Thresholds for formant frequency discrimination have been established using optimal listening conditions. In normal conversation, the ability to discriminate formant frequency is probably substantially degraded. The purpose of the present study was to change the listening procedures in several substantial ways from optimal towards more ordinary listening conditions, including a higher level of stimulus uncertainty, longer phonetic contexts, and the addition of a sentence identification task. Four vowels synthesized from a female talker were presented in isolation, or in the phonetic context of /bVd/ syllables, three-word phrases, or nine-word sentences. In the first experiment, formant resolution was estimated under medium stimulus uncertainty for three levels of phonetic context. Some undesirable training effects were obtained and led to the design of a new protocol for the second experiment to reduce this problem and to manipulate both the length of phonetic context and the level of difficulty in the simultaneous sentence identification task. Similar results were obtained in both experiments. The effect of phonetic context on formant discrimination is reduced as context lengthens, such that no difference was found between vowels embedded in the phrase and sentence contexts. The addition of a challenging sentence identification task to the discrimination task did not degrade performance further, and a stable pattern for formant discrimination in sentences emerged. This norm for the resolution of vowel formants under these more ordinary listening conditions was shown to be nearly constant at 0.28 barks. Analysis of vowel spaces from 16 American English talkers determined that the closest vowels, on average, were 0.56 barks apart, that is, a factor of 2 larger than the norm obtained in these vowel formant discrimination tasks.
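The bark-scale comparison in this abstract (a 0.28-bark discrimination norm versus an average 0.56-bark minimum vowel spacing) can be sketched in code. This is an illustrative calculation, not the authors' analysis; it assumes the Traunmüller Hz-to-bark approximation, and the formant values in the example are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert frequency in Hz to the bark scale (Traunmueller approximation)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def formant_separation_barks(f_a_hz: float, f_b_hz: float) -> float:
    """Absolute bark-scale distance between two formant frequencies given in Hz."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz))

# Hypothetical F1 values for two neighboring vowels (illustrative numbers only):
sep = formant_separation_barks(500.0, 580.0)
DISCRIMINATION_NORM_BARKS = 0.28  # the norm reported in the abstract
print(f"separation = {sep:.2f} barks; "
      f"above the 0.28-bark norm: {sep > DISCRIMINATION_NORM_BARKS}")
```

A separation comfortably above the 0.28-bark norm would, on this account, leave the vowel pair discriminable even under the more ordinary listening conditions described above.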


Subject(s)
Speech Perception/physiology , Female , Humans , Male , Phonetics , Reproducibility of Results , Sound Spectrography , Speech/physiology , Speech Discrimination Tests , Teaching
3.
J Acoust Soc Am ; 104(6): 3597-607, 1998 Dec.
Article in English | MEDLINE | ID: mdl-9857518

ABSTRACT

This study examined both the identification and discrimination of vowels by three listener groups: elderly hearing-impaired, elderly normal-hearing, and young normal-hearing. Each hearing-impaired listener had a longstanding symmetrical, sloping, mild-to-moderate sensorineural hearing loss. Two signal levels [70 and 95 dB sound-pressure level (SPL)] were selected to assess the effects of audibility on both tasks. The stimuli were four vowels, /ɪ, e, ɛ, æ/, synthesized for a female talker. Difference limens (DLs) were estimated for both the F1 and F2 formants using adaptive tracking. Discrimination DLs for F1 were the same across groups and levels. Discrimination DLs for F2 showed that formant resolution was best for the young normal-hearing group, poorest for the elderly normal-hearing group, and intermediate for the elderly hearing-impaired group at both signal levels. Only the elderly hearing-impaired group had DLs that were significantly poorer than those of the young listeners at the lower (70 dB) level. In the identification task at both levels, young normal-hearing listeners demonstrated near-perfect performance (M = 95%), while both elderly groups were similar to one another and demonstrated lower performance (M = 71%). The results were examined using correlational analysis of the performance of the hearing-impaired subjects relative to that of the normal-hearing groups. The results suggest that both age and hearing impairment contribute to decreased vowel perception performance in elderly hearing-impaired persons.


Subject(s)
Hearing Loss, Sensorineural/diagnosis , Adult , Age Factors , Aged , Audiometry, Pure-Tone , Auditory Threshold , Female , Humans , Male , Middle Aged , Phonetics , Speech Discrimination Tests
4.
J Acoust Soc Am ; 103(3): 1654-66, 1998 Mar.
Article in English | MEDLINE | ID: mdl-9514029

ABSTRACT

Thresholds for formant discrimination of female and male vowels are significantly elevated by two stimulus factors, increases in formant frequency and fundamental frequency [Kewley-Port et al., J. Acoust. Soc. Am. 100, 2462-2470 (1996)]. The present analysis systematically examined whether auditory models of vowel sounds, including excitation patterns, specific loudness, and a Gammatone filterbank, could explain the effects of stimulus parameters on formant thresholds. The goal was to determine whether an auditory metric could be specified that reduced the variability observed in the thresholds to a single-valued function across four sets of female and male vowels. Based on Sommers and Kewley-Port [J. Acoust. Soc. Am. 99, 3770-3781 (1996)], four critical bands around the test formant were selected to calculate a metric derived from excitation patterns. A metric derived from specific loudness difference (delta Sone) was calculated across the entire frequency region. Since analyses of spectra from Gammatone filters gave similar results to those derived from excitation patterns, only the 4-ERB (equivalent rectangular bandwidth) and delta Sone metrics were analyzed in detail. Three criteria were applied to the two auditory metrics to determine whether they were single-valued functions relative to formant thresholds for female and male vowels. Both the 4-ERB and delta Sone metrics met the criteria of reduced slope and reduced effect of fundamental frequency, although delta Sone was superior to 4-ERB in reducing overall variability. Results suggest that the auditory system has an inherent nonlinear transformation in which differences in vowel discrimination thresholds are almost constant in the internal representation.


Subject(s)
Models, Biological , Speech Perception , Auditory Threshold , Female , Humans , Male , Phonetics , Speech Discrimination Tests
5.
J Acoust Soc Am ; 100(4 Pt 1): 2462-70, 1996 Oct.
Article in English | MEDLINE | ID: mdl-8865651

ABSTRACT

The present experiments examined the effect of fundamental frequency (F0) on thresholds for the discrimination of formant frequency for male vowels. Thresholds for formant-frequency discrimination were obtained for six vowels with two fundamental frequencies: normal F0 (126 Hz) and low F0 (101 Hz). Four well-trained subjects performed an adaptive tracking task under low stimulus uncertainty. Comparisons between the normal-F0 and the low-F0 conditions showed that formants were resolved more accurately for low F0. These thresholds for male vowels were compared to thresholds for female vowels previously reported by Kewley-Port and Watson [J. Acoust. Soc. Am. 95, 485-496 (1994)]. Analyses of the F0 sets demonstrated that formant thresholds were significantly degraded by increases both in formant frequency and in F0. A piecewise-linear function was fit to each of the three sets of delta F thresholds as a function of formant frequency. The shapes of the three parallel functions were similar, such that delta F was constant in the F1 region and increased with formant frequency in the F2 region. The capability of humans to discriminate formant frequency may therefore be described as uniform in the F1 region (< 800 Hz) when represented as delta F, and also uniform in the F2 region when represented as the ratio delta F/F. A model of formant discrimination is proposed in which the effects of formant frequency are represented by the shape of an underlying piecewise-linear function. Increases in F0 significantly degrade overall discrimination independently of formant frequency.


Subject(s)
Auditory Threshold , Phonetics , Speech Perception , Female , Humans , Male , Sex Factors , Speech Discrimination Tests
6.
J Acoust Soc Am ; 99(6): 3770-81, 1996 Jun.
Article in English | MEDLINE | ID: mdl-8655808

ABSTRACT

The present investigations were designed to establish the features of vowel spectra that mediate formant frequency discrimination. Thresholds for detecting frequency shifts in the first and second formants of two steady-state vowels were initially measured for conditions in which the amplitudes of all harmonics varied in accordance with a model of cascade formant synthesis. In this model, changes in formant frequency produce level variations in components adjacent to the altered formant as well as in harmonics spectrally remote from the shifted resonant frequency. Discrimination thresholds determined with the cascade synthesis procedure were then compared to difference limens (DLs) obtained when the number of harmonics exhibiting level changes was limited to the frequency region surrounding the altered formant. Results indicated that amplitude variations could be restricted to one to three components near the shifted formant before significant increases in formant frequency DLs were observed. In a second experiment, harmonics remote from the shifted formant were removed from the stimuli. In most cases, thresholds for these reduced-harmonic complexes were not significantly different from those obtained with full-spectrum vowels. Preliminary evaluation of an excitation-pattern model of formant frequency discrimination indicated that such a model can provide good accounts of the thresholds obtained in the present experiments once the salient regions of the vowel spectra have been identified. Implications of these findings for understanding the mechanism mediating vowel perception are discussed.


Subject(s)
Phonetics , Speech Perception , Auditory Threshold , Female , Humans , Speech Discrimination Tests
7.
J Acoust Soc Am ; 97(5 Pt 1): 3139-46, 1995 May.
Article in English | MEDLINE | ID: mdl-7759654

ABSTRACT

Thresholds for formant frequency discrimination were shown to be in the range of 1%-2% by Kewley-Port and Watson [J. Acoust. Soc. Am. 95, 485-496 (1994)]. The present experiment extends that study and one by Mermelstein [J. Acoust. Soc. Am. 63, 572-580 (1978)] to determine the effect of consonantal context on the discrimination of formant frequency. Thresholds for formant frequency were measured under minimal stimulus uncertainty for the vowel /ɪ/ synthesized in isolation and in CVC syllables with the consonants /b/, /d/, /g/, /z/, /m/, and /l/. Overall, the effects of consonantal context were similar to those reported by Mermelstein (1978), although his threshold estimates were a factor of 4-5 larger because less-than-optimal psychophysical methods had been used. Compared to the vowel in isolation, consonantal context had little effect on thresholds for F1 and a larger effect on F2. When a shift in threshold was observed, subject variability was high and resolution was degraded by as much as a factor of 2. Analyses of stimulus parameters indicated that resolution was degraded by shortening the steady-state vowel duration or when the separation between the onsets of the formant transitions was small. Overall, consonantal context makes it more difficult for some, but not all, listeners to resolve formant frequency as accurately as for vowels in isolation.


Subject(s)
Auditory Threshold , Phonetics , Speech Perception , Female , Humans , Pilot Projects , Psychophysics , Speech Discrimination Tests , Speech, Alaryngeal
8.
J Acoust Soc Am ; 95(1): 485-96, 1994 Jan.
Article in English | MEDLINE | ID: mdl-8120259

ABSTRACT

Thresholds for formant-frequency discrimination were obtained for ten synthetic English vowels patterned after a female talker. To estimate the resolution of the auditory system for these stimuli, thresholds were measured using well-trained subjects under minimal-stimulus-uncertainty procedures. Thresholds were estimated for both increments and decrements in formant frequency for the first and second formants. Reliable measurements of threshold were obtained for most formants tested, the exceptions occurring when a harmonic of the fundamental was aligned with the center frequency of the test formant. In these cases, unusually high thresholds were obtained from some subjects and asymmetrical thresholds were measured for increments versus decrements in formant frequency. Excluding those cases, thresholds for formant frequency, delta F, are best described as a piecewise-linear function of frequency which is constant at about 14 Hz in the F1 frequency region (< 800 Hz), and increases linearly in the F2 region. In the F2 region, the resolution for formant frequency is approximately 1.5%. The present thresholds are similar to previous estimates in the F1 region, but about a factor of three lower than those in the F2 region. Comparisons of these results to those for pure tones and for complex, nonspeech stimuli are discussed.
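The piecewise-linear threshold description in this abstract (a constant delta F of about 14 Hz in the F1 region below 800 Hz, and roughly 1.5% of formant frequency in the F2 region) can be sketched as a simple model. The 800-Hz breakpoint and both constants are taken from the abstract; this is an illustrative sketch, not a fit to the published data.

```python
def formant_dl_hz(formant_hz: float) -> float:
    """Approximate difference limen (DL) for formant frequency, in Hz.

    Sketch of the piecewise-linear description in the abstract: a constant
    ~14-Hz DL in the F1 region (< 800 Hz) and a fixed ~1.5% Weber fraction
    in the F2 region. Values are illustrative, not fitted.
    """
    F1_FLOOR_HZ = 14.0   # constant DL below the breakpoint
    F2_FRACTION = 0.015  # ~1.5% of formant frequency above it
    BREAK_HZ = 800.0
    if formant_hz < BREAK_HZ:
        return F1_FLOOR_HZ
    return F2_FRACTION * formant_hz

# A typical F1 gets the 14-Hz floor; a typical F2 scales with frequency:
print(formant_dl_hz(500.0))   # F1 region
print(formant_dl_hz(2000.0))  # F2 region
```

On this description, absolute resolution is constant for low formants while relative (percentage) resolution is constant for high formants, which is why the abstract reports the two regions in different units.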


Subject(s)
Attention , Phonetics , Pitch Discrimination , Speech Perception , Adult , Auditory Threshold , Female , Humans , Male , Psychoacoustics , Sound Spectrography
9.
J Acoust Soc Am ; 89(2): 820-9, 1991 Feb.
Article in English | MEDLINE | ID: mdl-2016431

ABSTRACT

A series of experiments on the detectability of vowels in isolation has been completed. Stimuli consisted of three sets of ten vowels: one synthetic, one from a male talker, and one from a female talker. Vowel durations ranged from 20 to 160 ms for each of the sets. Thresholds for detecting the vowels in isolation were obtained from well-trained, normal-hearing listeners using an adaptive-tracking paradigm. For a given duration, detection thresholds for vowels calibrated for equal rms sound pressure at the earphones differed by 22 dB across the 30 vowels. In addition, an orderly decrease in vowel thresholds was obtained for increased duration, as predicted from previous data on temporal integration. Several different analyses were performed in an attempt to explain the differential detectability across the 30 vowels. Analyses accounting for audibility reduced threshold variability significantly, but vowel thresholds still ranged over 15 dB. Vowel spectra were subsequently modeled as excitation patterns, and several detection hypotheses were examined. A simple average of excitation levels across excited critical bands provided the best prediction of the level variations needed to maintain threshold-level loudness across all vowels.


Subject(s)
Auditory Threshold , Phonetics , Speech Perception , Adult , Attention , Humans , Loudness Perception , Pitch Discrimination , Psychoacoustics , Speech Acoustics
10.
J Speech Hear Res ; 32(2): 245-51, 1989 Jun.
Article in English | MEDLINE | ID: mdl-2661917

ABSTRACT

Experimental comparisons are reported between computer-based and human judgments of speech quality for the same sets of utterances. Speech stimuli were recorded from two normal talkers, who intentionally varied the quality of their speech, and from a hearing-impaired child who was receiving speech therapy on the Indiana Speech Training Aid (ISTRA). The tape recordings were submitted for evaluation to a naive jury, an expert jury, and the ISTRA System, a microcomputer equipped with a speaker-dependent speech recognition board that generated scores representing how well an utterance matched a stored template. Correlational analyses of these data indicated that humans were slightly better at judging speech quality than was the computer, but that the computer was much more reliable. These results demonstrate that computer-based speech evaluation may be a reasonable substitute for human judgments for certain types of speech drill.


Subject(s)
Computer Systems , Diagnosis, Computer-Assisted , Speech Production Measurement/methods , Humans , Speech Perception , Speech Therapy/methods , Therapy, Computer-Assisted
11.
J Acoust Soc Am ; 85(4): 1726-40, 1989 Apr.
Article in English | MEDLINE | ID: mdl-2708689

ABSTRACT

The perception of subphonemic differences between vowels was investigated using multidimensional scaling techniques. Three experiments were conducted with natural-sounding synthetic stimuli generated by linear predictive coding (LPC) formant synthesizers. In the first experiment, vowel sets near the pairs /i-ɪ/, /ɛ-æ/, or /u-ʊ/ were synthesized, containing 11 vowels each. Listeners judged the dissimilarities between all pairs of vowels within a set several times. These perceptual differences were mapped into distances between the vowels in an n-dimensional space using two-way multidimensional scaling. Results for each vowel set showed that the physical stimulus space, which was specified by the two parameters F1 and F2, was always mapped into a two-dimensional perceptual space. The best metric for modeling the perceptual distances was the Euclidean distance between F1 and F2 in barks. The second experiment investigated the perception of the same vowels from the first experiment, but embedded in a consonantal context. Following the same procedures as experiment 1, listeners' perception of the /bV/ dissimilarities was not different from their perception of the isolated-vowel dissimilarities. The third experiment investigated dissimilarity judgments for the three vowels /æ-ɑ-ʌ/ located symmetrically in the F1 × F2 vowel space. While the perceptual space was again two dimensional, the influence of phonetic identity on vowel difference judgments was observed. Implications for determining metrics for subphonemic vowel differences using multidimensional scaling are discussed.
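The best-fitting metric reported in this abstract, the Euclidean distance between F1 and F2 in barks, can be sketched as follows. The Hz-to-bark conversion (Traunmüller approximation) and the example formant values are assumptions for illustration, not the study's own stimuli.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Convert frequency in Hz to the bark scale (Traunmueller approximation)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def vowel_distance_barks(v1: tuple[float, float],
                         v2: tuple[float, float]) -> float:
    """Euclidean distance between two vowels in (F1, F2) bark space.

    Each vowel is given as a (F1_hz, F2_hz) pair; both formants are
    converted to barks before the distance is computed.
    """
    d1 = hz_to_bark(v1[0]) - hz_to_bark(v2[0])
    d2 = hz_to_bark(v1[1]) - hz_to_bark(v2[1])
    return math.hypot(d1, d2)

# Hypothetical (F1, F2) values near an /i/-/ɪ/ pair (illustrative only):
print(vowel_distance_barks((300.0, 2300.0), (400.0, 2100.0)))
```

A distance computed this way is what the two-way multidimensional scaling solution in the abstract was found to approximate.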


Subject(s)
Phonetics , Sound Spectrography , Speech Perception , Adult , Attention , Female , Humans , Pitch Perception
12.
J Acoust Soc Am ; 84(6): 2266-70, 1988 Dec.
Article in English | MEDLINE | ID: mdl-3225355

ABSTRACT

Pastore [J. Acoust. Soc. Am. 84, 2262-2266 (1988)] has written a lengthy response to Kewley-Port, Watson, and Foyle [J. Acoust. Soc. Am. 83, 1133-1145 (1988)]. In this reply to Pastore's letter, several of his arguments are addressed, and new data are reported which support the conclusion of the original article. That conclusion is, basically, that the temporal acuity of the auditory system does not appear to be the origin of categorical perception of speech or nonspeech sounds differing in temporal onsets.


Subject(s)
Attention , Noise , Perceptual Masking , Semantics , Speech Perception , Humans
13.
J Acoust Soc Am ; 83(3): 1133-45, 1988 Mar.
Article in English | MEDLINE | ID: mdl-3356818

ABSTRACT

Experiments were conducted to determine the underlying resolving power of the auditory system for temporal changes at the onset of speech and nonspeech stimuli. Stimulus sets included a bilabial VOT continuum and an analogous nonspeech continuum similar to the "noise-buzz" stimuli used by Miller et al. [J. Acoust. Soc. Am. 60, 410-417 (1976)]. The main difference between these and earlier experiments was that efforts were made to minimize both the trial-to-trial stimulus uncertainty and the cognitive load inherent in some of the testing procedures. Under conditions of minimal psychophysical uncertainty, not only does discrimination performance improve overall, but the local maximum, usually interpreted as evidence of categorical perception, is eliminated. Instead, discrimination performance for voice onset time (VOT) or noise lead time (NLT) is very accurate for short onset times and generally decreases with increasing onset time. This result suggests that "categorization" of familiar sounds is not the result of a psychoacoustic threshold (as Miller et al. have suggested) but rather of processing at a more central level of the auditory system.


Subject(s)
Speech Discrimination Tests , Speech Perception/physiology , Acoustic Stimulation , Humans , Noise , Time Factors , Voice
15.
J Acoust Soc Am ; 75(4): 1168-76, 1984 Apr.
Article in English | MEDLINE | ID: mdl-6725765

ABSTRACT

Previous studies have reported that rise time of sawtooth waveforms may be discriminated in either a categorical-like manner under some experimental conditions or according to Weber's law under other conditions. In the present experiments, rise time discrimination was examined with two experimental procedures: the traditional labeling and ABX tasks used in speech perception studies and an adaptive tracking procedure used in psychophysical studies. Rise time varied from 0 to 80 ms in 10-ms intervals for sawtooth signals of 1-s duration. Discrimination functions for subjects who simply discriminated the signals on any basis whatsoever, as well as functions for subjects who practiced labeling the endpoint stimuli as "pluck" and "bow" before ABX discrimination, were not categorical in the ABX task. In the adaptive tracking procedure, the Weber fraction obtained from the jnds of rise time was found to be constant above 20-ms rise time. The results from the two discrimination paradigms were then compared by predicting a jnd for rise time from the ABX discrimination data by reference to the underlying psychometric function. Using this method of analysis, discrimination results from previous studies were shown to be quite similar to those observed in this study. Taken together, the results demonstrate clearly that rise time discrimination of sawtooth signals follows predictions derived from Weber's law.
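The Weber's-law result in this abstract (a constant Weber fraction for rise times above about 20 ms) can be illustrated with a short sketch. The 0.25 fraction used here is a hypothetical value chosen for illustration, not the fraction measured in the study.

```python
def rise_time_jnd_ms(rise_time_ms: float,
                     weber_fraction: float = 0.25) -> float:
    """Just-noticeable difference (jnd) in rise time under Weber's law.

    Above ~20 ms the jnd grows in proportion to the base rise time
    (constant Weber fraction). The default fraction is hypothetical.
    """
    if rise_time_ms <= 20.0:
        raise ValueError("the constant Weber fraction held only above ~20 ms")
    return weber_fraction * rise_time_ms

# Weber's law: doubling the base rise time doubles the jnd.
assert rise_time_jnd_ms(80.0) == 2 * rise_time_jnd_ms(40.0)
```

This proportional growth of the jnd with the base value is exactly what distinguishes the Weber's-law account from the categorical-like discrimination functions the earlier studies reported.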


Subject(s)
Loudness Perception , Pitch Discrimination , Adult , Humans , Psychoacoustics
16.
J Acoust Soc Am ; 73(5): 1779-93, 1983 May.
Article in English | MEDLINE | ID: mdl-6223060

ABSTRACT

Two recent accounts of the acoustic cues which specify place of articulation in syllable-initial stop consonants claim that they are located in the initial portions of the CV waveform and are context-free. Stevens and Blumstein [J. Acoust. Soc. Am. 64, 1358-1368 (1978)] have described the perceptually relevant spectral properties of these cues as static, while Kewley-Port [J. Acoust. Soc. Am. 73, 322-335 (1983)] describes these cues as dynamic. Three perceptual experiments were conducted to test predictions derived from these accounts. Experiment 1 confirmed that acoustic cues for place of articulation are located in the initial 20-40 ms of natural stop-vowel syllables. Next, short synthetic CVs modeled after natural syllables were generated using either a digital, parallel-resonance synthesizer in experiment 2 or linear prediction synthesis in experiment 3. One set of synthetic stimuli preserved the static spectral properties proposed by Stevens and Blumstein. Another set of synthetic stimuli preserved the dynamic properties suggested by Kewley-Port. Listeners in both experiments identified place of articulation significantly better from stimuli which preserved dynamic acoustic properties than from those based on static onset spectra. Evidently, the dynamic structure of the initial stop-vowel articulatory gesture can be preserved in context-free acoustic cues which listeners use to identify place of articulation.


Subject(s)
Phonetics , Speech Acoustics , Speech Perception , Speech , Communication Aids for Disabled , Computers , Cues , Humans , Sound Spectrography , Time Factors
17.
J Acoust Soc Am ; 73(1): 322-35, 1983 Jan.
Article in English | MEDLINE | ID: mdl-6826902

ABSTRACT

Running spectral displays derived from linear prediction analysis were used to examine the initial 40 ms of stop-vowel CV syllables for possible acoustic correlates to place of articulation. Known spectral and temporal properties associated with the stop consonant release gesture were used to define a set of three time-varying features observable in the visual displays. Judges identified place of articulation using these proposed features from running spectra of the syllables /b,d,g/ paired with eight vowels produced by three talkers. Average correct identification of place was 88%; identification was better for the two male talkers (92%) than for the one female talker (78%). Post hoc analyses suggested, however, that simple rules could be incorporated in the feature definitions to account for differences in vocal tract size. The nature of the information contained in linear prediction running spectra was analyzed further to take account of known properties of the peripheral auditory system. The three proposed time-varying features were shown to be displayed robustly in auditory-filtered running spectra. The advantages of describing acoustic correlates for place from the dynamically varying temporal and spectral information in running spectra are discussed with regard to the static template-matching approach advocated recently by Blumstein and Stevens [J. Acoust. Soc. Am. 66, 1001-1017 (1979)].


Subject(s)
Phonetics , Sound Spectrography/methods , Speech Acoustics , Speech , Cues , Female , Humans , Male , Time Factors
18.
J Acoust Soc Am ; 72(2): 379-89, 1982 Aug.
Article in English | MEDLINE | ID: mdl-7119280

ABSTRACT

Formant transitions have been considered important context-dependent acoustic cues to place of articulation in stop-vowel syllables. However, the bulk of earlier research supporting their perceptual importance was conducted with synthetic speech stimuli. The present study examined the acoustic correlates of place of articulation in the voiced formant transitions of natural speech. Linear prediction analysis was used to provide detailed temporal and spectral measurements of the formant transitions for /b,d,g/ paired with eight vowels produced by one talker. Measurements of the transition onset and steady-state frequencies, durations, and derived formant loci for F1, F2, and F3 are reported. Analysis of these measures showed little evidence of context-invariant acoustic correlates of place. When vowel context was known, most transition parameters were not reliable acoustic correlates of place, with the exception of the F2 transition and a two-dimensional representation of F2 × F3 onset frequencies. The results indicated that the information contained in the formant transitions in these natural stop-vowel syllables was not sufficient to distinguish place across all the vowel contexts studied.


Subject(s)
Speech Acoustics , Speech , Humans , Phonetics , Time Factors , Voice