Results 1-20 of 29
1.
Lang Speech ; : 238309241258162, 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38877720

ABSTRACT

Human communication is inherently multimodal: not only auditory speech but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds). Less is known about the influence of visual information on the perception of suprasegmental aspects of speech, such as lexical stress. In two experiments, we investigated the influence of different visual cues (e.g., facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or second syllable. Results showed that people successfully used visual articulatory cues to stress in muted videos. In audiovisual conditions, however, we found no effect of visual articulatory cues. In contrast, the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts.

2.
Psychon Bull Rev ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37783898

ABSTRACT

Statistical learning - the ability to extract distributional regularities from input - is suggested to be key to language acquisition. Yet evidence for the human capacity for statistical learning comes mainly from studies conducted in carefully controlled settings without auditory distraction. While such conditions permit careful examination of learning, they do not reflect the naturalistic language learning experience, which is replete with auditory distraction, including competing talkers. Here, we examine how statistical language learning proceeds in a virtual cocktail party environment, where the to-be-learned input is presented alongside a competing speech stream with its own distributional regularities. During exposure, participants in the Dual Talker group concurrently heard two novel languages, one produced by a female talker and one by a male talker, with each talker virtually positioned on opposite sides of the listener (left/right) using binaural acoustic manipulations. Selective attention was manipulated by instructing participants to attend to only one of the two talkers. At test, participants were asked to distinguish words from part-words for both the attended and the unattended languages. Accuracy was significantly higher for trials from the attended than from the unattended language. Further, the performance of the Dual Talker group did not differ from that of a control group who heard only one language from a single talker (Single Talker group). We thus conclude that statistical learning is modulated by selective attention while being relatively robust against the additional cognitive load imposed by competing speech, underscoring its efficiency in naturalistic language learning situations.
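The virtual left/right positioning can be illustrated with basic binaural signal processing. The abstract does not specify the manipulation used, so the sketch below lateralizes a mono talker with illustrative interaural time and level differences (the itd_s and ild_db values are assumptions, not the study's parameters):

```python
import numpy as np

def lateralize(x, sr, side, itd_s=0.0006, ild_db=6.0):
    """Place a mono talker to the left or right by delaying and
    attenuating the signal at the far ear (crude ITD + ILD cues)."""
    d = int(round(itd_s * sr))                       # interaural delay in samples
    near = np.concatenate([x, np.zeros(d)])
    far = np.concatenate([np.zeros(d), x]) * 10 ** (-ild_db / 20)
    left, right = (near, far) if side == "left" else (far, near)
    return np.stack([left, right], axis=1)           # (n_samples, 2) stereo

# Two simultaneous talkers, one per side of the head:
# mix = lateralize(talker_f, sr, "left") + lateralize(talker_m, sr, "right")
```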

3.
J Exp Psychol Hum Percept Perform ; 49(4): 549-565, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37184938

ABSTRACT

When recognizing spoken words, listeners are confronted by variability in the speech signal caused by talker differences. Previous research has focused on segmental talker variability; less is known about how suprasegmental variability is handled. Here we investigated the use of perceptual learning to deal with between-talker differences in lexical stress. Two groups of participants heard Dutch minimal stress pairs (e.g., VOORnaam vs. voorNAAM, "first name" vs. "respectable") spoken by two male talkers. Group 1 heard Talker 1 use only F0 to signal stress (intensity and duration values were ambiguous), while Talker 2 used only intensity (F0 and duration were ambiguous). Group 2 heard the reverse talker-cue mappings. After training, participants were tested on words from both talkers containing conflicting stress cues ("mixed items"; e.g., one spoken by Talker 1 with F0 signaling initial stress and intensity signaling final stress). We found that listeners used previously learned information about which talker used which cue to interpret the mixed items. For example, the mixed item described above tended to be interpreted as having initial stress by Group 1 but as having final stress by Group 2. This demonstrates that listeners learn how individual talkers signal stress and use that knowledge in spoken-word recognition. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subjects
Cues (Psychology), Speech Perception, Humans, Male, Learning, Speech, Language
4.
J Cogn Neurosci ; 35(8): 1262-1278, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37172122

ABSTRACT

While listening to meaningful speech, auditory input is processed more rapidly near the end (vs. beginning) of sentences. Although several studies have shown such word-to-word changes in auditory input processing, it is still unclear from which processing level these word-to-word dynamics originate. We investigated whether predictions derived from sentential context can result in auditory word-processing dynamics during sentence tracking. We presented healthy human participants with auditory stimuli consisting of word sequences, arranged into either predictable (coherent sentences) or less predictable (unstructured, random word sequences) 42-Hz amplitude-modulated speech, and a continuous 25-Hz amplitude-modulated distractor tone. We recorded RTs and frequency-tagged neuroelectric responses (auditory steady-state responses) to individual words at multiple temporal positions within the sentences, and quantified sentential context effects at each position while controlling for individual word characteristics (i.e., phonetics, frequency, and familiarity). We found that sentential context increasingly facilitates auditory word processing as evidenced by accelerated RTs and increased auditory steady-state responses to later-occurring words within sentences. These purely top-down contextually driven auditory word-processing dynamics occurred only when listeners focused their attention on the speech and did not transfer to the auditory processing of the concurrent distractor tone. These findings indicate that auditory word-processing dynamics during sentence tracking can originate from sentential predictions. The predictions depend on the listeners' attention to the speech, and affect only the processing of the parsed speech, not that of concurrently presented auditory streams.


Subjects
Speech Perception, Word Processing, Humans, Speech Perception/physiology, Auditory Perception, Language, Phonetics
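The frequency-tagging logic in the study above rests on imposing a sinusoidal amplitude envelope on each stream so that the steady-state EEG response can be read out at that stream's tagging frequency. A minimal sketch of such stimulus construction follows; the modulation depth, carrier choice, and placeholder signals are assumptions, as the abstract does not specify them.

```python
import numpy as np

def am_tag(x, sr, f_mod, depth=1.0):
    """Impose a sinusoidal amplitude envelope at f_mod Hz; the EEG
    steady-state response can then be read out at that frequency."""
    t = np.arange(len(x)) / sr
    mod = (1.0 + depth * np.sin(2 * np.pi * f_mod * t)) / (1.0 + depth)
    return x * mod

sr = 44100
speech = np.random.randn(2 * sr)                # placeholder for recorded words
tagged_speech = am_tag(speech, sr, f_mod=42.0)  # 42-Hz tag on the speech stream
tone = np.sin(2 * np.pi * 440.0 * np.arange(2 * sr) / sr)
distractor = am_tag(tone, sr, f_mod=25.0)       # 25-Hz tag on the distractor tone
```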
5.
Atten Percept Psychophys ; 84(7): 2303-2318, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35996057

ABSTRACT

Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often referred to as "rate-dependent speech perception," has been suggested to be the result of a robust, low-level perceptual process, typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening conditions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to a clear listening condition. Specifically, we tested effects of white noise and reverberation, with the latter specifically distorting temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations, eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study contributes towards understanding the consequences of different types of listening environments on the functioning of low-level perceptual processes that listeners use during speech perception.


Subjects
Noise, Speech Perception, Auditory Perception, Humans, Language, Speech
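Both degradations used above are standard signal manipulations: white noise mixed at a target SNR, and reverberation approximated by convolution with a decaying impulse response. The sketch below is illustrative only; the SNR, RT60, and synthetic impulse response are assumptions rather than the study's stimuli.

```python
import numpy as np

def add_white_noise(x, snr_db):
    """Mix in white noise at a target signal-to-noise ratio (dB)."""
    noise = np.random.randn(len(x))
    noise *= np.sqrt(x.var() / (10 ** (snr_db / 10) * noise.var()))
    return x + noise

def add_reverb(x, sr, rt60=1.0):
    """Approximate reverberation by convolving with exponentially
    decaying noise (amplitude down 60 dB at t = rt60 seconds)."""
    t = np.arange(int(rt60 * sr)) / sr
    ir = np.random.randn(len(t)) * 10 ** (-3.0 * t / rt60)
    return np.convolve(x, ir)[: len(x)]
```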
6.
Lang Speech ; 65(2): 472-490, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34227417

ABSTRACT

Individuals vary in how they produce speech. This variability affects both the segments (vowels and consonants) and the suprasegmental properties of their speech (prosody). Previous literature has demonstrated that listeners can adapt to variability in how different talkers pronounce the segments of speech. This study shows that listeners can also adapt to variability in how talkers produce lexical stress. Experiment 1 demonstrates a selective adaptation effect in lexical stress perception: repeatedly hearing Dutch trochaic words biased perception of a subsequent lexical stress continuum towards more iamb responses. Experiment 2 demonstrates a recalibration effect in lexical stress perception: when ambiguous suprasegmental cues to lexical stress were disambiguated by lexical orthographic context as signaling a trochaic word in an exposure phase, Dutch participants categorized a subsequent test continuum as more trochee-like. Moreover, the selective adaptation and recalibration effects generalized to novel words, not encountered during exposure. Together, the experiments demonstrate that listeners also flexibly adapt to variability in the suprasegmental properties of speech, thus expanding our understanding of the utility of listener adaptation in speech perception. Moreover, the combined outcomes speak for an architecture of spoken word recognition involving abstract prosodic representations at a prelexical level of analysis.


Subjects
Speech Perception, Cues (Psychology), Humans, Language, Phonetics, Speech, Speech Perception/physiology
7.
Brain Res ; 1769: 147605, 2021 10 15.
Article in English | MEDLINE | ID: mdl-34363790

ABSTRACT

One of the challenges in speech perception is that listeners must deal with considerable segmental and suprasegmental variability in the acoustic signal due to differences between talkers. Most previous studies have focused on how listeners deal with segmental variability. In this EEG experiment, we investigated whether listeners track talker-specific usage of suprasegmental cues to lexical stress to recognize spoken words correctly. In a three-day training phase, Dutch participants learned to map non-word minimal stress pairs onto different object referents (e.g., USklot meant "lamp"; usKLOT meant "train"). These non-words were produced by two male talkers. Critically, each talker used only one suprasegmental cue to signal stress (e.g., Talker A used only F0 and Talker B only intensity). We expected participants to learn which talker used which cue to signal stress. In the test phase, participants indicated whether spoken sentences including these non-words were correct ("The word for lamp is…"). We found that participants were slower to indicate that a stimulus was correct if the non-word was produced with the unexpected cue (e.g., Talker A using intensity). That is, if in training Talker A used F0 to signal stress, participants experienced a mismatch between predicted and perceived phonological word-forms if, at test, Talker A unexpectedly used intensity to cue stress. In contrast, the N200 amplitude, an event-related potential related to phonological prediction, was not modulated by the cue mismatch. Theoretical implications of these contrasting results are discussed. The behavioral findings illustrate talker-specific prediction of prosodic cues, picked up through perceptual learning during training.


Subjects
Electroencephalography/methods, Narration, Speech Perception/physiology, Acoustic Stimulation, Adolescent, Adult, Cues (Psychology), Evoked Potentials/physiology, Female, Humans, Language, Male, Middle Aged, Reaction Time/physiology, Young Adult
8.
Behav Res Methods ; 53(5): 1945-1953, 2021 10.
Article in English | MEDLINE | ID: mdl-33694079

ABSTRACT

Many studies of speech perception assess the intelligibility of spoken sentence stimuli by means of transcription tasks ('type out what you hear'). The intelligibility of a given stimulus is then often expressed in terms of percentage of words correctly reported from the target sentence. Yet scoring the participants' raw responses for words correctly identified from the target sentence is a time-consuming task, and hence resource-intensive. Moreover, there is no consensus among speech scientists about what specific protocol to use for the human scoring, limiting the reliability of human scores. The present paper evaluates various forms of fuzzy string matching between participants' responses and target sentences, as automated metrics of listener transcript accuracy. We demonstrate that one particular metric, the token sort ratio, is a consistent, highly efficient, and accurate metric for automated assessment of listener transcripts, as evidenced by high correlations with human-generated scores (best correlation: r = 0.940) and a strong relationship to acoustic markers of speech intelligibility. Thus, fuzzy string matching provides a practical tool for assessment of listener transcript accuracy in large-scale speech intelligibility studies. See https://tokensortratio.netlify.app for an online implementation.


Subjects
Names, Speech Perception, Cognition, Humans, Reproducibility of Results, Speech Intelligibility
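The token sort ratio itself is easy to reproduce: lowercase each string, sort its words alphabetically, rejoin, and compute a normalized string similarity. Below is a minimal Python approximation using the standard library's difflib; reference implementations such as the thefuzz package use a Levenshtein-based ratio, so scores may differ slightly.

```python
from difflib import SequenceMatcher

def token_sort_ratio(response: str, target: str) -> float:
    """Sort each string's words alphabetically, rejoin, and score the
    normalized similarity (0-100); word-order differences are ignored."""
    def norm(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return 100.0 * SequenceMatcher(None, norm(response), norm(target)).ratio()

# Transposed words still score 100; a missing word lowers the score.
target = "the boy threw the red ball"
print(token_sort_ratio("the boy the red ball threw", target))  # 100.0
print(token_sort_ratio("the boy threw the ball", target))      # < 100
```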
9.
Proc Biol Sci ; 288(1943): 20202419, 2021 01 27.
Article in English | MEDLINE | ID: mdl-33499783

ABSTRACT

Beat gestures, spontaneously produced biphasic movements of the hand, are among the most frequently encountered co-speech gestures in human communication. They are closely temporally aligned to the prosodic characteristics of the speech signal, typically occurring on lexically stressed syllables. Despite their prevalence across speakers of the world's languages, how beat gestures impact spoken word recognition is unclear. Can these simple 'flicks of the hand' influence speech perception? Across a range of experiments, we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress (e.g. distinguishing OBject from obJECT), and in turn can influence what vowels listeners hear. Thus, we provide converging evidence for a manual McGurk effect: relatively simple and widely occurring hand movements influence which speech sounds we hear.


Subjects
Gestures, Speech Perception, Humans, Language, Phonetics, Speech
10.
Behav Res Methods ; 53(2): 744-756, 2021 04.
Article in English | MEDLINE | ID: mdl-32869139

ABSTRACT

Despite advances in automatic speech recognition (ASR), human input is still essential for producing research-grade segmentations of speech data. Conventional approaches to manual segmentation are very labor-intensive. We introduce POnSS, a browser-based system that is specialized for the task of segmenting the onsets and offsets of words, which combines aspects of ASR with limited human input. In developing POnSS, we identified several sub-tasks of segmentation, and implemented each of these as separate interfaces for the annotators to interact with to streamline their task as much as possible. We evaluated segmentations made with POnSS against a baseline of segmentations of the same data made conventionally in Praat. We observed that POnSS achieved comparable reliability to segmentation using Praat, but required 23% less annotator time investment. Because of its greater efficiency without sacrificing reliability, POnSS represents a distinct methodological advance for the segmentation of speech data.


Subjects
Image Processing, Computer-Assisted, Speech, Humans, Reproducibility of Results
11.
J Neurosci ; 40(49): 9467-9475, 2020 12 02.
Article in English | MEDLINE | ID: mdl-33097640

ABSTRACT

Neural oscillations track linguistic information during speech comprehension (Ding et al., 2016; Keitel et al., 2018), and are known to be modulated by acoustic landmarks and speech intelligibility (Doelling et al., 2014; Zoefel and VanRullen, 2015). However, studies investigating linguistic tracking have either relied on non-naturalistic isochronous stimuli or failed to fully control for prosody. Therefore, it is still unclear whether low-frequency activity tracks linguistic structure during natural speech, where linguistic structure does not follow such a palpable temporal pattern. Here, we measured electroencephalography (EEG) and manipulated the presence of semantic and syntactic information apart from the timescale of their occurrence, while carefully controlling for the acoustic-prosodic and lexical-semantic information in the signal. EEG was recorded while 29 adult native speakers (22 women, 7 men) listened to naturally spoken Dutch sentences, jabberwocky controls with morphemes and sentential prosody, word lists with lexical content but no phrase structure, and backward acoustically matched controls. Mutual information (MI) analysis revealed sensitivity to linguistic content: MI was highest for sentences at the phrasal (0.8-1.1 Hz) and lexical (1.9-2.8 Hz) timescales, suggesting that the delta-band is modulated by lexically driven combinatorial processing beyond prosody, and that linguistic content (i.e., structure and meaning) organizes neural oscillations beyond the timescale and rhythmicity of the stimulus. This pattern is consistent with neurophysiologically inspired models of language comprehension (Martin, 2016, 2020; Martin and Doumas, 2017) where oscillations encode endogenously generated linguistic content over and above exogenous or stimulus-driven timing and rhythm information.

SIGNIFICANCE STATEMENT: Biological systems like the brain encode their environment not only by reacting in a series of stimulus-driven responses, but by combining stimulus-driven information with endogenous, internally generated, inferential knowledge and meaning. Understanding language from speech is the human benchmark for this. Much research focuses on the purely stimulus-driven response, but here, we focus on the goal of language behavior: conveying structure and meaning. To that end, we use naturalistic stimuli that contrast acoustic-prosodic and lexical-semantic information to show that, during spoken language comprehension, oscillatory modulations reflect computations related to inferring structure and meaning from the acoustic signal. Our experiment provides the first evidence to date that compositional structure and meaning organize the oscillatory response, above and beyond prosodic and lexical controls.


Subjects
Psycholinguistics, Acoustic Stimulation, Adult, Comprehension/physiology, Delta Rhythm/physiology, Electroencephalography, Female, Humans, Male, Mental Processes/physiology, Semantics, Speech Perception, Young Adult
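MI between a band-limited EEG signal and a stimulus feature can be estimated in several ways; the abstract does not name the estimator, so the sketch below uses a generic histogram estimate on band-specific phase, with illustrative filter settings and bin counts.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_phase(x, sr, lo, hi):
    """Instantaneous phase of x within a frequency band,
    e.g. the phrasal (0.8-1.1 Hz) or lexical (1.9-2.8 Hz) timescale."""
    b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, x)))

def mi_bits(x, y, bins=8):
    """Histogram estimate of the mutual information I(X; Y) in bits."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```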
12.
J Exp Psychol Hum Percept Perform ; 46(10): 1148-1163, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32614215

ABSTRACT

To comprehend speech sounds, listeners tune in to speech rate information in the proximal (immediately adjacent), distal (nonadjacent), and global context (further removed preceding and following sentences). Effects of global contextual speech rate cues on speech perception have been shown to follow constraints not found for proximal and distal speech rate. Therefore, listeners may process such global cues at distinct time points during word recognition. We conducted a printed-word eye-tracking experiment to compare the time courses of distal and global rate effects. Results indicated that the distal rate effect emerged immediately after target sound presentation, in line with a general-auditory account. The global rate effect, however, arose more than 200 ms later than the distal rate effect, indicating that distal and global context effects involve distinct processing mechanisms. Results are interpreted in a 2-stage model of acoustic context effects. This model posits that distal context effects involve very early perceptual processes, while global context effects arise at a later stage, involving cognitive adjustments conditioned by higher-level information. (PsycInfo Database Record (c) 2020 APA, all rights reserved).


Subjects
Cues (Psychology), Psycholinguistics, Speech Perception/physiology, Adolescent, Adult, Eye-Tracking Technology, Female, Humans, Male, Time Factors, Young Adult
13.
J Cogn Neurosci ; 32(8): 1428-1437, 2020 08.
Article in English | MEDLINE | ID: mdl-32427072

ABSTRACT

Recent neuroimaging evidence suggests that the frequency of entrained oscillations in auditory cortices influences the perceived duration of speech segments, impacting word perception [Kösem, A., Bosker, H. R., Takashima, A., Meyer, A., Jensen, O., & Hagoort, P. Neural entrainment determines the words we hear. Current Biology, 28, 2867-2875, 2018]. We further tested the causal influence of neural entrainment frequency during speech processing, by manipulating entrainment with continuous transcranial alternating current stimulation (tACS) at distinct oscillatory frequencies (3 and 5.5 Hz) above the auditory cortices. Dutch participants listened to speech and were asked to report their percept of a target Dutch word, which contained a vowel with an ambiguous duration. Target words were presented either in isolation (first experiment) or at the end of spoken sentences (second experiment). We predicted that the tACS frequency would influence neural entrainment and therewith how speech is perceptually sampled, leading to a perceptual overestimation or underestimation of the vowel's duration. Whereas results from Experiment 1 did not confirm this prediction, results from Experiment 2 suggested a small effect of tACS frequency on target word perception: Faster tACS leads to more long-vowel word percepts, in line with the previous neuroimaging findings. Importantly, the difference in word perception induced by the different tACS frequencies was significantly larger in Experiment 1 versus Experiment 2, suggesting that the impact of tACS is dependent on the sensory context. tACS may have a stronger effect on spoken word perception when the words are presented in continuous speech as compared to when they are isolated, potentially because prior (stimulus-induced) entrainment of brain oscillations might be a prerequisite for tACS to be effective.


Subjects
Auditory Cortex, Transcranial Direct Current Stimulation, Auditory Perception, Hearing, Humans, Speech
14.
J Acoust Soc Am ; 147(2): 721, 2020 02.
Article in English | MEDLINE | ID: mdl-32113258

ABSTRACT

Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech produce more pronounced amplitude modulations in noise vs in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute towards intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception with reference to neural oscillators phase-locking to the amplitude modulations in speech, guiding the processing of speech.
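Step (3) above, transplanting the amplitude modulations, amounts to envelope division and multiplication. The sketch below assumes time-aligned plain and Lombard recordings of the same sentence and an illustrative 10-Hz envelope cutoff; the study's actual stimulus processing is not detailed in the abstract.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope(x, sr, cutoff=10.0):
    """Slow amplitude envelope: magnitude of the analytic signal,
    low-pass filtered below ~10 Hz, where speech AM energy lies."""
    b, a = butter(2, cutoff / (sr / 2), btype="low")
    return np.maximum(filtfilt(b, a, np.abs(hilbert(x))), 1e-6)

def transplant_am(plain, lombard, sr):
    """Flatten the plain token's own envelope, then impose the
    Lombard token's envelope (assumes time-aligned, equal-rate audio)."""
    n = min(len(plain), len(lombard))
    plain, lombard = plain[:n], lombard[:n]
    return plain / envelope(plain, sr) * envelope(lombard, sr)
```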

15.
Q J Exp Psychol (Hove) ; 73(10): 1523-1536, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32160814

ABSTRACT

Spoken words are highly variable and therefore listeners interpret speech sounds relative to the surrounding acoustic context, such as the speech rate of a preceding sentence. For instance, a vowel midway between short /ɑ/ and long /a:/ in Dutch is perceived as short /ɑ/ in the context of preceding slow speech, but as long /a:/ if preceded by a fast context. Despite the well-established influence of visual articulatory cues on speech comprehension, it remains unclear whether visual cues to speech rate also influence subsequent spoken word recognition. In two "Go Fish"-like experiments, participants were presented with audio-only (auditory speech + fixation cross), visual-only (mute videos of a talking head), and audiovisual (speech + videos) context sentences, followed by ambiguous target words containing vowels midway between short /ɑ/ and long /a:/. In Experiment 1, target words were always presented auditorily, without visual articulatory cues. Although the audio-only and audiovisual contexts induced a rate effect (i.e., more long /a:/ responses after fast contexts), the visual-only condition did not. When, in Experiment 2, target words were presented audiovisually, rate effects were observed in all three conditions, including visual-only. This suggests that visual cues to speech rate in a context sentence influence the perception of subsequent visual target cues (e.g., duration of lip aperture), which at an audiovisual integration stage bias participants' target categorisation responses. These findings contribute to a better understanding of how what we see influences what we hear.


Subjects
Cues (Psychology), Phonetics, Speech Perception, Speech, Acoustic Stimulation, Acoustics, Adolescent, Adult, Female, Humans, Male, Photic Stimulation, Young Adult
16.
Sci Rep ; 10(1): 5607, 2020 03 27.
Article in English | MEDLINE | ID: mdl-32221376

ABSTRACT

Two fundamental properties of perception are selective attention and perceptual contrast, but how these two processes interact remains unknown. Does an attended stimulus history exert a larger contrastive influence on the perception of a following target than unattended stimuli? Dutch listeners categorized target sounds with a reduced prefix "ge-" marking tense (e.g., ambiguous between gegaan-gaan "gone-go"). In 'single talker' Experiments 1-2, participants perceived the reduced syllable (reporting gegaan) when the target was heard after a fast sentence, but not after a slow sentence (reporting gaan). In 'selective attention' Experiments 3-5, participants listened to two simultaneous sentences from two different talkers, followed by the same target sounds, with instructions to attend only one of the two talkers. Critically, the speech rates of attended and unattended talkers were found to equally influence target perception - even when participants could watch the attended talker speak. In fact, participants' target perception in 'selective attention' Experiments 3-5 did not differ from participants who were explicitly instructed to divide their attention equally across the two talkers (Experiment 6). This suggests that contrast effects of speech rate are immune to selective attention, largely operating prior to attentional stream segregation in the auditory processing hierarchy.


Subjects
Attention/physiology, Speech Perception/physiology, Adult, Female, Humans, Language, Male, Phonetics, Young Adult
17.
Psychol Rev ; 127(2): 281-304, 2020 03.
Article in English | MEDLINE | ID: mdl-31886696

ABSTRACT

That speakers can vary their speaking rate is evident, but how they accomplish this has hardly been studied. Consider this analogy: when walking, speed can be continuously increased, within limits, but to speed up further, humans must run. Are there multiple qualitatively distinct speech "gaits" that resemble walking and running? Or is control achieved by continuous modulation of a single gait? This study investigates these possibilities through simulations of a new connectionist computational model of the cognitive process of speech production, EPONA, which borrows from Dell, Burger, and Svec's (1997) model. The model has parameters that can be adjusted to fit the temporal characteristics of speech at different speaking rates. We trained the model on a corpus of disyllabic Dutch words produced at different speaking rates. During training, different clusters of parameter values (regimes) were identified for different speaking rates. In a one-gait system, the regimes used to achieve fast and slow speech are qualitatively similar but quantitatively different. In a multiple-gait system, there is no linear relationship between the parameter settings associated with each gait, resulting in an abrupt shift in parameter values to move from speaking slowly to speaking fast. After training, the model achieved good fits to all three speaking rates. The parameter settings associated with each speaking rate were not linearly related, suggesting the presence of cognitive gaits. Thus, we provide the first computationally explicit account of the ability to modulate the speech production system to achieve different speaking styles. (PsycINFO Database Record (c) 2020 APA, all rights reserved).


Subjects
Executive Function, Models, Theoretical, Neural Networks, Computer, Psycholinguistics, Speech, Humans
18.
Atten Percept Psychophys ; 82(3): 1318-1332, 2020 Jun.
Article in English | MEDLINE | ID: mdl-31338824

ABSTRACT

Speech sounds are perceived relative to spectral properties of surrounding speech. For instance, target words that are ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as "bet" after a "low F1" sentence, but as "bit" after a "high F1" sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham (J.Exp.Psychol.-Hum.Percept.Perform. 44(9), 1447-1457, 2018b) reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic "cocktail party" settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influence target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated based on studies without competing talkers.


Subjects
Attention, Speech Perception, Adult, Auditory Perception, Female, Humans, Male, Phonetics, Speech, Young Adult
19.
J Exp Psychol Learn Mem Cogn ; 46(3): 549-562, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31343252

ABSTRACT

During spoken language comprehension, listeners make use of both knowledge-based and signal-based sources of information, but little is known about how cues from these distinct levels of representational hierarchy are weighted and integrated online. In an eye-tracking experiment using the visual world paradigm, we investigated the flexible weighting and integration of morphosyntactic gender marking (a knowledge-based cue) and contextual speech rate (a signal-based cue). We observed that participants used the morphosyntactic cue immediately to make predictions about upcoming referents, even in the presence of uncertainty about the cue's reliability. Moreover, we found speech rate normalization effects in participants' gaze patterns even in the presence of preceding morphosyntactic information. These results demonstrate that cues are weighted and integrated flexibly online, rather than adhering to a strict hierarchy. We further found rate normalization effects in the looking behavior of participants who showed a strong behavioral preference for the morphosyntactic gender cue. This indicates that rate normalization effects are robust and potentially automatic. We discuss these results in light of theories of cue integration and the two-stage model of acoustic context effects. (PsycINFO Database Record (c) 2020 APA, all rights reserved).


Subjects
Comprehension/physiology, Cues (Psychology), Psycholinguistics, Speech Perception/physiology, Adult, Eye Movement Measurements, Female, Humans, Male, Young Adult
20.
J Acoust Soc Am ; 146(1): 179, 2019 07.
Article in English | MEDLINE | ID: mdl-31370593

ABSTRACT

Speech can be produced at different rates. Listeners take this rate variation into account by normalizing vowel duration for contextual speech rate: An ambiguous Dutch word /m?t/ is perceived as short /mɑt/ when embedded in a slow context, but long /ma:t/ in a fast context. While some have argued that this rate normalization involves low-level automatic perceptual processing, there is also evidence that it arises at higher-level cognitive processing stages, such as decision making. Prior research on rate-dependent speech perception has only used explicit recognition tasks to investigate the phenomenon, involving both perceptual processing and decision making. This study tested whether speech rate normalization can be observed without explicit decision making, using a cross-modal repetition priming paradigm. Results show that a fast precursor sentence makes an embedded ambiguous prime (/m?t/) sound (implicitly) more /a:/-like, facilitating lexical access to the long target word "maat" in a (explicit) lexical decision task. This result suggests that rate normalization is automatic, taking place even in the absence of an explicit recognition task. Thus, rate normalization is placed within the realm of everyday spoken conversation, where explicit categorization of ambiguous sounds is rare.
