Results 1 - 4 of 4
1.
Proc Natl Acad Sci U S A ; 115(14): E3313-E3322, 2018 Apr 3.
Article in English | MEDLINE | ID: mdl-29563229

ABSTRACT

The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas" (the abstract structure shared by different occurrences of the same type of sound source) and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schemas were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
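A minimal sketch of the two schema-preserving transformations the abstract names, transposition and time dilation, applied to a hypothetical frequency-contour template. The template values, the 3-semitone shift, and the 1.5x stretch are illustrative assumptions, not the study's actual stimulus parameters.

```python
import numpy as np

FS = 44100  # sample rate (Hz)

def synthesize(trajectory_hz, duration_s):
    """Render a frequency trajectory (control points evenly spaced in time)
    as a frequency-modulated sinusoid."""
    n = int(FS * duration_s)
    inst_freq = np.interp(np.linspace(0, 1, n),
                          np.linspace(0, 1, len(trajectory_hz)),
                          trajectory_hz)
    phase = 2 * np.pi * np.cumsum(inst_freq) / FS  # integrate frequency -> phase
    return np.sin(phase)

# Hypothetical template: the abstract frequency contour (Hz) shared by
# every occurrence of the schema.
template = np.array([400.0, 550.0, 480.0, 620.0, 500.0])

# Transposition: scale every frequency by a constant ratio (+3 semitones
# here), shifting pitch while preserving the contour's shape on a log axis.
transposed = template * 2 ** (3 / 12)

# Time dilation: lengthen the rendered trajectory (1.5x here) without
# changing its frequencies.
occurrence_a = synthesize(template, duration_s=0.8)
occurrence_b = synthesize(transposed, duration_s=0.8 * 1.5)
```

Both occurrences vary acoustically yet share the same abstract contour, which is the sense in which the transformations preserve the schema.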


Subject(s)
Attention/physiology, Auditory Perception/physiology, Learning/physiology, Noise, Sound Localization/physiology, Acoustic Stimulation, Cues, Humans
2.
Atten Percept Psychophys ; 79(7): 2064-2072, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28695541

ABSTRACT

Psychophysical experiments conducted remotely over the internet permit data collection from large numbers of participants but sacrifice control over sound presentation and therefore are not widely employed in hearing research. To help standardize online sound presentation, we introduce a brief psychophysical test for determining whether online experiment participants are wearing headphones. Listeners judge which of three pure tones is quietest, with one of the tones presented 180° out of phase across the stereo channels. This task is intended to be easy over headphones but difficult over loudspeakers due to phase cancellation. We validated the test in the lab by testing listeners known to be wearing headphones or listening over loudspeakers. The screening test was effective and efficient, discriminating between the two modes of listening with a small number of trials. When run online, a bimodal distribution of scores was obtained, suggesting that some participants performed the task over loudspeakers despite instructions to use headphones. The ability to detect and screen out these participants mitigates concerns over sound quality for online experiments, a first step toward opening auditory perceptual research to the possibilities afforded by crowdsourcing.
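A minimal sketch of one trial of the antiphase stimulus the abstract describes. The 200 Hz frequency, 1 s duration, 500 ms gaps, and 6 dB decrement for the "quiet" target are illustrative assumptions, not the published parameters.

```python
import numpy as np
from scipy.io import wavfile

FS = 44100          # sample rate (Hz)
DUR = 1.0           # tone duration (s); assumed value
FREQ = 200.0        # pure-tone frequency (Hz); assumed value
QUIET_DB = -6.0     # attenuation of the target tone; assumed value

t = np.arange(int(FS * DUR)) / FS
tone = np.sin(2 * np.pi * FREQ * t)

def stereo(left, right):
    """Stack two mono signals into a (samples, 2) stereo array."""
    return np.stack([left, right], axis=1)

# Interval 1: in-phase tone at full level.
in_phase = stereo(tone, tone)
# Interval 2: the quieter target tone (also in phase).
quiet = stereo(tone, tone) * 10 ** (QUIET_DB / 20)
# Interval 3: antiphase tone -- left and right channels 180 degrees apart.
# Over headphones it sounds at full level; over loudspeakers the two
# channels largely cancel in the air, so it can sound quieter than the
# actual target, causing loudspeaker listeners to pick the wrong interval.
antiphase = stereo(tone, -tone)

gap = np.zeros((int(0.5 * FS), 2))  # 500 ms silence between intervals
trial = np.concatenate([in_phase, gap, quiet, gap, antiphase])

# Scale to 16-bit range and write out one trial.
wavfile.write("headphone_check_trial.wav", FS,
              (trial * 0.5 * 32767).astype(np.int16))
```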


Subject(s)
Acoustic Stimulation/methods, Auditory Perception/physiology, Hearing Tests/instrumentation, Hearing Tests/methods, Internet, Adult, Female, Hearing/physiology, Humans, Male
3.
Curr Biol ; 25(17): 2238-46, 2015 Aug 31.
Article in English | MEDLINE | ID: mdl-26279234

ABSTRACT

Auditory scenes often contain concurrent sound sources, but listeners are typically interested in just one of these and must somehow select it for further processing. One challenge is that real-world sounds such as speech vary over time and as a consequence often cannot be separated or selected based on particular values of their features (e.g., high pitch). Here we show that human listeners can circumvent this challenge by tracking sounds with a movable focus of attention. We synthesized pairs of voices that changed in pitch and timbre over random, intertwined trajectories, lacking distinguishing features or linguistic information. Listeners were cued beforehand to attend to one of the voices. We measured their ability to extract this cued voice from the mixture by subsequently presenting the ending portion of one voice and asking whether it came from the cued voice. We found that listeners could perform this task but that performance was mediated by attention: listeners who performed best were also more sensitive to perturbations in the cued voice than in the uncued voice. Moreover, the task was impossible if the source trajectories did not maintain sufficient separation in feature space. The results suggest a locus of attention that can follow a sound's trajectory through a feature space, likely aiding selection and segregation amid similar distractors.
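A minimal sketch of generating two random, intertwined trajectories in a two-dimensional (pitch, timbre) feature space while enforcing the minimum-separation constraint the abstract mentions. The walk length, step size, starting points, and separation threshold are illustrative assumptions, not the study's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_trajectory(start, n_steps=100, step=0.1):
    """A 2-D random walk through a normalized (pitch, timbre) feature
    space; rows are time points."""
    steps = rng.normal(scale=step, size=(n_steps, 2))
    return np.asarray(start, dtype=float) + np.cumsum(steps, axis=0)

def trajectory_pair(min_sep=0.3, max_tries=1000):
    """Draw pairs of trajectories, rejecting any pair that comes closer
    than `min_sep` at any time point -- mirroring the feature-space
    separation the abstract found necessary for the task."""
    for _ in range(max_tries):
        a = random_trajectory(start=(-0.5, 0.0))
        b = random_trajectory(start=(+0.5, 0.0))
        if np.min(np.linalg.norm(a - b, axis=1)) >= min_sep:
            return a, b
    raise RuntimeError("no sufficiently separated pair found")

voice_a, voice_b = trajectory_pair()
```

Rejection sampling keeps the trajectories intertwined yet never overlapping, so neither voice can be selected by a fixed feature value; a listener must track it over time.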


Subject(s)
Attention, Cues, Speech Perception, Adult, Female, Humans, Male, Perceptual Masking, Sound Spectrography, Speech Acoustics, Young Adult
4.
Sci Rep ; 5: 11475, 2015 Jun 19.
Article in English | MEDLINE | ID: mdl-26088739

ABSTRACT

Voice or speaker recognition is critical in a wide variety of social contexts. In this study, we investigated the contributions of acoustic, phonological, lexical, and semantic information to voice recognition. Native English-speaking participants were trained to recognize five speakers in five conditions: non-speech, Mandarin, German, pseudo-English, and English. We showed that voice recognition improved significantly as more information became available, from purely acoustic features in non-speech to additional phonological information varying in familiarity. Moreover, we found that recognition performance was transferable between training and testing in the phonologically familiar conditions (German, pseudo-English, and English), but not in the unfamiliar (Mandarin) or non-speech conditions. These results suggest that bottom-up acoustic analysis and top-down influence from phonological processing jointly govern voice recognition.


Subject(s)
Auditory Perception, Linguistics, Voice, Acoustic Stimulation, Adolescent, Adult, Female, Humans, Language, Male, Reproducibility of Results, Speech, Young Adult