1.
J Acoust Soc Am; 144(1): 375, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30075658

ABSTRACT

Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares the text-independent speaker discrimination ability of humans and machines on utterances shorter than 2 s in two speaking styles: read sentences and pet-directed speech, which is characterized by exaggerated prosody. Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. The performance of 65 human listeners was compared with that of i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients, voice quality features inspired by a psychoacoustic model of voice perception, or their combination via score-level fusion. Humans outperformed machines in all conditions except for style-mismatched pairs from perceptually marked speakers. Speaker representations by humans and machines were compared using multidimensional scaling (MDS). Canonical correlation analysis showed a weak correlation between the machine and human MDS spaces. Multiple regression showed that the means of the voice quality features represented the most important human MDS dimension well, but not the machine dimensions. These results suggest that humans and machines represent speakers differently, and that machine performance might be improved by a better understanding of how acoustic features relate to perceived speaker identity.


Subject(s)
Speech Acoustics , Speech Perception/physiology , Speech/physiology , Voice/physiology , Adolescent , Adult , Comprehension/physiology , Female , Humans , Language , Male , Voice Quality , Young Adult
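As a rough illustration of the kind of analysis described in the abstract above, the sketch below embeds two dissimilarity matrices with MDS and compares the resulting low-dimensional spaces using canonical correlation analysis via scikit-learn. The dissimilarity matrices here are random placeholders, not the study's listener or ASV data; the 2-D embedding dimensionality and all variable names are assumptions made for the example.

```python
# Minimal sketch (not the study's actual pipeline): embed hypothetical
# human and machine speaker-dissimilarity matrices with MDS, then test
# how well the two spaces align using canonical correlation analysis.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_speakers = 50  # matches the 50 speakers in the abstract

def random_dissimilarity(n):
    # Placeholder symmetric dissimilarity matrix; in the study these
    # would come from listener judgments and ASV scores, respectively.
    d = rng.random((n, n))
    d = (d + d.T) / 2          # symmetrize
    np.fill_diagonal(d, 0.0)   # zero self-dissimilarity
    return d

human_d = random_dissimilarity(n_speakers)
machine_d = random_dissimilarity(n_speakers)

# Embed each dissimilarity matrix in a 2-D MDS space.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
human_space = mds.fit_transform(human_d)
machine_space = mds.fit_transform(machine_d)

# Canonical correlation between the human and machine MDS spaces:
# weak correlations would indicate differing speaker representations.
cca = CCA(n_components=2)
h_c, m_c = cca.fit_transform(human_space, machine_space)
corrs = [np.corrcoef(h_c[:, i], m_c[:, i])[0, 1] for i in range(2)]
print("canonical correlations:", corrs)
```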
2.
Percept Psychophys; 69(7): 1070-83, 2007 Oct.
Article in English | MEDLINE | ID: mdl-18038946

ABSTRACT

A complete understanding of visual phonetic perception (lipreading) requires linking perceptual effects to physical stimulus properties. However, the talking face is a highly complex stimulus, affording innumerable possible physical measurements. In the search for isomorphism between stimulus properties and phonetic effects, second-order isomorphism was examined between the perceptual similarities of video-recorded, perceptually identified speech syllables and the physical similarities among the stimuli. Four talkers produced the stimulus syllables, which comprised 23 initial consonants each followed by one of three vowels. Six normal-hearing participants identified the syllables in a visual-only condition. Perceptual stimulus dissimilarity was quantified as the Euclidean distances between stimuli in perceptual spaces obtained via multidimensional scaling. Physical stimulus dissimilarity was quantified from face points recorded in three dimensions by an optical motion capture system. The variance accounted for in the relationship between the perceptual and physical dissimilarities was evaluated using both the raw and the weighted dissimilarities. With weighting and the full set of 3-D optical data, the variance accounted for ranged between 46% and 66% across talkers and between 49% and 64% across vowels. This robust second-order relationship between the sparse 3-D point representation of visible speech and the perceptual effects suggests that the representation is a viable basis for controlled studies of first-order relationships between visual phonetic perception and physical stimulus attributes.


Subject(s)
Phonetics , Speech Perception , Visual Perception , Adolescent , Adult , Female , Humans , Male , Models, Psychological , Reaction Time
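The second-order (variance-accounted-for) analysis in the abstract above can be sketched in the same spirit: compute pairwise Euclidean dissimilarities in a perceptual space and in a physical measurement space, then square their correlation. The coordinates below are random stand-ins for the MDS perceptual coordinates and 3-D optical point data, the number of tracked face points is a hypothetical choice, and the study's weighting scheme is not reproduced.

```python
# Minimal sketch (illustrative data, not the study's recordings): quantify
# the second-order relationship between perceptual and physical stimulus
# dissimilarities as variance accounted for (R^2).
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_syllables = 69   # 23 initial consonants x 3 vowels, as in the abstract
n_points = 20      # hypothetical number of tracked 3-D face points

# Perceptual space: stand-in for MDS coordinates derived from
# identification confusions.
perceptual_coords = rng.normal(size=(n_syllables, 3))
# Physical space: stand-in for flattened 3-D optical point data
# per syllable.
physical_coords = rng.normal(size=(n_syllables, n_points * 3))

# Pairwise Euclidean dissimilarities within each representation.
perceptual_d = pdist(perceptual_coords, metric="euclidean")
physical_d = pdist(physical_coords, metric="euclidean")

# Variance in perceptual dissimilarity accounted for by physical
# dissimilarity (squared Pearson correlation between the two vectors).
r = np.corrcoef(perceptual_d, physical_d)[0, 1]
print(f"variance accounted for: {r**2:.1%}")
```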