Results 1 - 5 of 5
1.
J Acoust Soc Am; 99(6): 3707-17, 1996 Jun.
Article in English | MEDLINE | ID: mdl-8655802

ABSTRACT

Desirable characteristics of a vocal-tract parametrization include accuracy, low dimensionality, and generalizability across speakers and languages. A low-dimensional, speaker-independent linear parametrization of vowel tongue shapes can be obtained using the PARAFAC three-mode factor analysis procedure [Harshman et al., J. Acoust. Soc. Am. 62, 693-707 (1977)]. Harshman et al. applied PARAFAC to midsagittal x-ray vowel data from five English speakers, reporting that two speaker-independent factors are required to accurately represent the tongue shape measured along anatomically normalized vocal-tract diameter grid lines. Subsequently, the cross-linguistic generality of this parametrization was brought into question by the application of PARAFAC to Icelandic vowel data, where three nonorthogonal factors were reported [Jackson, J. Acoust. Soc. Am. 84, 124-143 (1988)]. This solution is shown to be degenerate; a reanalysis of Jackson's Icelandic data produces two factors that match Harshman et al.'s factors for English vowels, contradicting Jackson's distinction between English and Icelandic language-specific "articulatory primes". To obtain vowel factors not constrained by artificial measurement grid lines, x-ray tongue shape traces of six English speakers were marked with 13 equally spaced points. PARAFAC analysis of this unconstrained (x,y) coordinate data results in two factors that are clearly interpretable in terms of the traditional vowel quality dimensions front/back, high/low.
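
As an illustration of the PARAFAC analysis described above (not part of the published abstract), here is a minimal sketch of a rank-2 three-mode decomposition, assuming the third-party tensorly library; the tensor dimensions and random data are invented stand-ins, not the paper's measurements.

```python
# Hedged sketch only: the 5 x 10 x 13 tensor (speakers x vowels x measurement
# points) and the tensorly dependency are assumptions, not the paper's data.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
data = tl.tensor(rng.standard_normal((5, 10, 13)))  # stand-in tongue-shape data

# Rank-2 trilinear model: X[i,j,k] ~= sum_r speaker[i,r] * vowel[j,r] * shape[k,r].
# Unlike PCA, this decomposition is unique up to scaling and permutation,
# which is what permits speaker-independent shape factors to be identified.
weights, factors = parafac(data, rank=2, normalize_factors=True)
speaker, vowel, shape = factors
print(speaker.shape, vowel.shape, shape.shape)  # (5, 2) (10, 2) (13, 2)
```

A degenerate solution of the kind diagnosed in the reanalysis typically shows up as a pair of factors that are highly correlated with opposite signs and whose norms grow without bound as fitting proceeds.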


Subject(s)
Phonetics; Speech/physiology; Tongue/physiology; Cross-Cultural Comparison; Humans; Iceland; Linguistics; Speech Production Measurement; United States
2.
J Acoust Soc Am; 92(2 Pt 1): 688-700, 1992 Aug.
Article in English | MEDLINE | ID: mdl-1506525

ABSTRACT

This paper describes a method for inferring articulatory parameters from acoustics with a neural network trained on paired acoustic and articulatory data. An x-ray microbeam recorded the vertical movements of the lower lip, tongue tip, and tongue dorsum of three speakers saying the English stop consonants in repeated Ce syllables. A neural network was then trained to map from simultaneously recorded acoustic data to the articulatory data. To evaluate learning, acoustics from the training set were passed through the neural network. To evaluate generalization, acoustics from speakers or consonants excluded from the training set were passed through the network. The articulatory trajectories thus inferred were a good fit to the actual movements in both the learning and generalization conditions, as judged by root-mean-square error and correlation. Inferred trajectories were also matched to templates of lower lip, tongue tip, and tongue dorsum release gestures extracted from the original data. This technique correctly recognized from 94.4% to 98.9% of all gestures in the learning and cross-speaker generalization conditions, and 75% of gestures underlying consonants excluded from the training set. In addition, greater regularity was observed for movements of articulators that were critical in the formation of each consonant.
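
As an illustration of the acoustic-to-articulatory mapping described above (not from the paper), here is a minimal sketch; scikit-learn, the 16-dimensional acoustic features, and the network size are all illustrative assumptions.

```python
# Hedged sketch only: the library choice, feature dimensionality, and network
# architecture are assumptions, not the paper's configuration.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 16))  # stand-in acoustic frames
y_train = rng.standard_normal((1000, 3))   # lower lip, tongue tip, tongue dorsum heights

# One hidden layer mapping each acoustic frame to three articulator positions.
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

# "Generalization" in the paper's sense: pass acoustics from a held-out
# speaker or consonant through the trained network and score the inferred
# trajectories against measured movements (RMS error, correlation).
X_held_out = rng.standard_normal((100, 16))
inferred = net.predict(X_held_out)  # shape (100, 3)
```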


Subject(s)
Gestures; Neural Networks, Computer; Phonetics; Signal Processing, Computer-Assisted/instrumentation; Sound Spectrography/instrumentation; Speech Articulation Tests/instrumentation; Adult; Humans; Male; Reference Values
3.
J Acoust Soc Am; 85(2): 913-25, 1989 Feb.
Article in English | MEDLINE | ID: mdl-2926007

ABSTRACT

From a sample of young male Californians, ten speakers were selected whose voices were approximately normally distributed with respect to the "easy-to-remember" versus "hard-to-remember" judgments of a group of raters. A separate group of listeners each heard one of the voices, and, after delays of 1, 2, or 4 weeks, tried to identify the voice they had heard, using an open-set, independent-judgment task. Distributions of the results did not differ from the distributions expected under the hypothesis of independent judgments. For both "heard previously" and "not heard previously" responses, there was a trend toward increasing accuracy as a function of increasing listener certainty. Overall, heard previously responses were less accurate than not heard previously responses. For heard previously responses, there was a trend toward decreasing accuracy as a function of delay between hearing a voice and trying to identify it. Information-theoretic analysis showed loss of information as a function of delay and provided means to quantify the effects of patterns of voice confusability. Signal-detection analysis revealed the similarity of results from diverse experimental paradigms. A "prototype" model is advanced to explain the fact that certain voices are preferentially selected as having been heard previously. The model also unites several previously unconnected findings in the literature on voice recognition and makes testable predictions.
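
As a worked example of the signal-detection analysis mentioned above (the trial counts are invented for illustration, and scipy is an assumed dependency), here is how sensitivity and response bias are computed from hit and false-alarm rates.

```python
# Hedged sketch only: the trial counts are invented, not the study's data.
from scipy.stats import norm

hits, misses = 42, 18               # voice heard before: "heard" / "not heard" responses
false_alarms, correct_rej = 12, 48  # new voice: "heard" / "not heard" responses

H = hits / (hits + misses)                        # hit rate (0.70)
FA = false_alarms / (false_alarms + correct_rej)  # false-alarm rate (0.20)

# Classic signal-detection indices: sensitivity d' and response bias c.
d_prime = norm.ppf(H) - norm.ppf(FA)
criterion = -0.5 * (norm.ppf(H) + norm.ppf(FA))
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")  # d' = 1.37, c = 0.16
```

Because d' and c are computed on a common scale from any paradigm that yields hit and false-alarm rates, indices like these allow results from diverse experimental designs to be compared directly, which is the point of the abstract's signal-detection analysis.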


Subject(s)
Memory; Voice; Adult; Humans; Judgment; Male; Models, Psychological; Models, Statistical; Speech Perception; Time Factors
5.
J Acoust Soc Am; 54(4): 1105-8, 1973 Oct.
Article in English | MEDLINE | ID: mdl-4757456

Subject(s)
Speech; Cues; Humans; Reaction Time