Search | VHL Regional Portal

1.

What do you learn from a single cue? Dimensional reweighting and cue reassociation from experience with a newly unreliable phonetic cue.

Kapatsinski, Vsevolod; Bramlett, Adam A; Idemaru, Kaori.

Cognition ; 249: 105818, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38772253

ABSTRACT

In language comprehension, we use perceptual cues to infer meanings. Some of these cues reside on perceptual dimensions. For example, the difference between bear and pear is cued by a difference in voice onset time (VOT), which is a continuous perceptual dimension. The present paper asks whether, and when, experience with a single value on a dimension behaving unexpectedly is used by the learner to reweight the whole dimension. We show that learners reweight the whole VOT dimension when exposed to a single VOT value (e.g., 45 ms) and provided with feedback indicating that the speaker intended to produce a /b/ 50% of the time and a /p/ the other 50% of the time. Importantly, dimensional reweighting occurs only if 1) the 50/50 feedback is unexpected for the VOT value, and 2) there is another dimension that is predictive of feedback. When no predictive dimension is available, listeners reassociate the experienced VOT value with the more surprising outcome but do not downweight the entire VOT dimension. These results provide support for perceptual representations of speech sounds that combine cues and dimensions, for viewing perceptual learning in speech as a combination of error-driven cue reassociation and dimensional reweighting, and for considering dimensional reweighting to be reallocation of attention that occurs only when there is evidence that reallocating attention would improve prediction accuracy (Harmon, Z., Idemaru, K., & Kapatsinski, V. 2019. Learning mechanisms in cue reweighting. Cognition, 189, 76-88.).

Subject(s)

Cues , Learning , Phonetics , Speech Perception , Humans , Speech Perception/physiology , Learning/physiology , Adult , Young Adult , Female , Male

2.

Speaking rate normalization across different talkers in the perception of Japanese stop and vowel length contrasts.

Kawahara, Shigeto; Kato, Misaki; Idemaru, Kaori.

JASA Express Lett ; 2(3): 035204, 2022 03.

Article in English | MEDLINE | ID: mdl-36154627

ABSTRACT

Perception of duration is critically influenced by the speaking rate of the surrounding context. However, to what extent this speaking rate normalization is talker-specific is understudied. This experiment investigated whether Japanese listeners' perception of temporally contrastive phonemes is influenced by the speaking rate of the surrounding context, and more importantly, whether the effect of the contextual speaking rate persists across different talkers for different types of contrasts: a singleton-geminate stop contrast and short-long vowel contrast in Japanese. The results suggest that listeners generalized their rate-based adjustments to different talkers' speech regardless of whether the target contrasts depended on silent closure duration or vowel duration. Our results thus support the view that speaking rate normalization is an obligatory process that happens in the early phase of perception.

Subject(s)

Phonetics , Speech Perception , Japan , Speech , Speech Production Measurement

3.

Effects of First Language Background and Learning Experience in Perceiving Mandarin Lexical Tones: Learners and Nonlearners From English- and Japanese-Speaking Backgrounds.

Tsukada, Kimiko; Idemaru, Kaori.

J Speech Lang Hear Res ; 65(2): 829-842, 2022 02 09.

Article in English | MEDLINE | ID: mdl-35015971

ABSTRACT

PURPOSE: This research compared individuals from two first language (L1) backgrounds (English and Japanese) to determine how they may differ in their perception of Mandarin tones (Tones 1 vs. 2 [T1-T2], Tones 1 vs. 3 [T1-T3], Tones 1 vs. 4 [T1-T4], Tones 2 vs. 3 [T2-T3], Tones 2 vs. 4 [T2-T4], Tones 3 vs. 4 [T3-T4]) on account of their L1. METHOD: The participants included two groups of learners of Mandarin (23 English speakers, 18 Japanese speakers), two groups of nonlearners of Mandarin (24 English speakers, 21 Japanese speakers), and a control group of 10 Mandarin speakers. A four-alternative forced-choice discrimination task that included 360 trials was presented in three blocks of 120 trials. RESULTS: The native Mandarin group was more accurate in their tonal discrimination of all six tone pairs than all the nonnative groups. While Japanese nonlearners generally outperformed English nonlearners in their overall perception of Mandarin lexical tones, L1-based differences were less extensive for the two groups of learners. Both learner groups were least accurate on T2-T3 and most accurate on T3-T4. CONCLUSION: The results suggest that with classroom experience, English speakers can overcome their initial disadvantage and learn lexical tones in a new language as successfully as speakers of Japanese with classroom experience.

Subject(s)

Language , Speech Perception , Humans , Japan , Learning , Pitch Perception

4.

Rethinking the frequency code: a meta-analytic review of the role of acoustic body size in communicative phenomena.

Winter, Bodo; Oh, Grace Eunhae; Hübscher, Iris; Idemaru, Kaori; Brown, Lucien; Prieto, Pilar; Grawunder, Sven.

Philos Trans R Soc Lond B Biol Sci ; 376(1840): 20200400, 2021 12 20.

Article in English | MEDLINE | ID: mdl-34719247

ABSTRACT

The widely cited frequency code hypothesis attempts to explain a diverse range of communicative phenomena through the acoustic projection of body size. The set of phenomena includes size sound symbolism (using /i/ to signal smallness in words such as teeny), intonational phonology (using rising contours to signal questions) and the indexing of social relations via vocal modulation, such as lowering one's voice pitch to signal dominance. Among other things, the frequency code is commonly interpreted to suggest that polite speech should be universally signalled via high pitch owing to the association of high pitch with small size and submissiveness. We present a cross-cultural meta-analysis of polite speech of 101 speakers from seven different languages. While we find evidence for cross-cultural variation, voice pitch is on average lower when speakers speak politely, contrary to what the frequency code predicts. We interpret our findings in the light of the fact that pitch has a multiplicity of possible communicative meanings. Cultural and contextual variation determines which specific meanings become manifest in a specific interactional context. We use the evidence from our meta-analysis to propose an updated view of the frequency code hypothesis that is based on the existence of many-to-many mappings between speech acoustics and communicative interpretations. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.

Subject(s)

Speech Perception , Voice , Acoustics , Body Size , Language , Speech

5.

Perceptual tracking of distinct distributional regularities within a single voice.

Idemaru, Kaori; Vaughn, Charlotte.

J Acoust Soc Am ; 148(6): EL427, 2020 12.

Article in English | MEDLINE | ID: mdl-33379901

ABSTRACT

The speech signal is inherently variable, and listeners need to recalibrate when local, short-term distributions of acoustic dimensions deviate from long-term representations. The present experiment investigated the specificity of this perceptual adjustment, addressing whether the perceptual system is capable of tracking differing, simultaneous short-term acoustic distributions of the same speech categories, conditioned by context. The results indicated that instead of aggregating over the contextual variation, listeners tracked separate distributional statistics for instances of speech categories experienced in different phonetic/lexical contexts, suggesting that perceptual learning is not only influenced by distributional statistics, but also by external factors such as contextual information.

6.

Generalization of dimension-based statistical learning.

Idemaru, Kaori; Holt, Lori L.

Atten Percept Psychophys ; 82(4): 1744-1762, 2020 May.

Article in English | MEDLINE | ID: mdl-31907842

ABSTRACT

Recent research demonstrates that the relationship between an acoustic dimension and speech categories is not static. Rather, it is influenced by the evolving distribution of dimensional regularity experienced across time, and specific to experienced individual sounds. Three studies examine the nature of this perceptual, dimension-based statistical learning of artificially accented [b] and [p] speech categories in online word recognition by testing generalization of learning across contexts, and testing the effect of a larger word list across which learning is induced. The results indicate that whereas learning of accented [b] and [p] generalizes across contexts, generalization to contexts not experienced in the accent is weaker even for the same speech categories [b] and [p] spoken by the same speaker. The results support a rich model of speech representation that is sensitive to context-dependent variation in the way the acoustic dimensions are related to speech categories.

Subject(s)

Learning , Acoustic Stimulation , Acoustics , Generalization, Psychological , Humans , Phonetics , Speech , Speech Perception

7.

Loudness Trumps Pitch in Politeness Judgments: Evidence from Korean Deferential Speech.

Idemaru, Kaori; Winter, Bodo; Brown, Lucien; Oh, Grace Eunhae.

Lang Speech ; 63(1): 123-148, 2020 Mar.

Article in English | MEDLINE | ID: mdl-30732514

ABSTRACT

Social meaning is not conveyed through words alone, but also through how words are produced phonetically. This paper investigates the role of loudness and pitch in determining the perception of politeness-related judgments in Korean. It has been proposed that high pitch is universally associated with polite or deferential social meanings. In contrast to this, Experiment 1 examined the perceptual effect of pitch and found no effect. Experiment 2 tested the effect of loudness, and found that listeners associate quieter speech with deference. Finally, Experiment 3 investigated the simultaneous effects of loudness and pitch, and found again that loudness had a consistent effect, whereas pitch only had a weak effect. Analyses of individual differences suggest that in contrast to loudness, which is interpreted uniformly across Korean listeners, pitch has more variegated social meanings: Some listeners associate high pitch with deferential meaning, others associate low pitch with deferential meaning. Thus, we find loudness to be a more unambiguous indicator of deferential speech than pitch. These findings shed light on how different acoustic properties contribute to the indexing of social stances, and they suggest that the role of pitch in conveying politeness may have been overstated in past research.

Subject(s)

Judgment , Loudness Perception , Phonetics , Pitch Perception , Speech Perception , Adult , Female , Humans , Language , Male , Republic of Korea , Young Adult

8.

Learning mechanisms in cue reweighting.

Harmon, Zara; Idemaru, Kaori; Kapatsinski, Vsevolod.

Cognition ; 189: 76-88, 2019 08.

Article in English | MEDLINE | ID: mdl-30928780

ABSTRACT

Feedback has been shown to be effective in shifting attention across perceptual cues to a phonological contrast in speech perception (Francis, Baldwin & Nusbaum, 2000). However, the learning mechanisms behind this process remain obscure. We compare the predictions of supervised error-driven learning (Rescorla & Wagner, 1972) and reinforcement learning (Sutton & Barto, 1998) using computational simulations. Supervised learning predicts downweighting of an informative cue when the learner receives evidence that it is no longer informative. In contrast, reinforcement learning suggests that a reduction in cue weight requires positive evidence for the informativeness of an alternative cue. Experimental evidence supports the latter prediction, implicating reinforcement learning as the mechanism behind the effect of feedback on cue weighting in speech perception. Native English listeners were exposed to either bimodal or unimodal VOT distributions spanning the unaspirated/aspirated boundary (bear/pear). VOT is the primary cue to initial stop voicing in English. However, lexical feedback in training indicated that VOT was no longer predictive of voicing. Reduction in the weight of VOT was observed only when participants could use an alternative cue, F0, to predict voicing. Frequency distributions had no effect on learning. Overall, the results suggest that attention shifting in learning the phonetic cues to phonological categories is accomplished using simple reinforcement learning principles that also guide the choice of actions in other domains.
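The abstract above contrasts supervised error-driven learning (Rescorla & Wagner, 1972) with reinforcement learning. The sketch below illustrates only the supervised half of that contrast: under a Rescorla-Wagner (delta-rule) update, a cue's associative strength falls once feedback is no longer predictable from it, with no alternative cue required. It is a minimal sketch under assumed values, not the authors' simulation. The single binary cue standing in for a long VOT (`long_vot`), the starting strength of 1.0, the learning rate of 0.1, and the strictly alternating 50/50 feedback are all illustrative assumptions.

```python
def rescorla_wagner_update(weights, present_cues, outcome, alpha=0.1):
    """Apply one Rescorla-Wagner update in place.

    weights: dict mapping cue name -> associative strength
    present_cues: cues present on this trial
    outcome: 1.0 if the outcome occurred on this trial, else 0.0
    alpha: learning rate (assumed value)
    """
    # The prediction is the summed strength of all cues present on the trial.
    prediction = sum(weights[c] for c in present_cues)
    error = outcome - prediction
    # Every present cue moves by the same share of the shared prediction error.
    for c in present_cues:
        weights[c] += alpha * error
    return weights


def simulate_unreliable_cue(n_trials=200, alpha=0.1, start=1.0):
    """Hypothetical scenario: a cue that once predicted /p/ perfectly now
    co-occurs with /p/ feedback on half the trials and /b/ on the other half."""
    weights = {"long_vot": start}
    for trial in range(n_trials):
        outcome = 1.0 if trial % 2 == 0 else 0.0  # alternate /p/ and /b/ feedback
        rescorla_wagner_update(weights, {"long_vot"}, outcome, alpha)
    return weights["long_vot"]


if __name__ == "__main__":
    # Strength drops from 1.0 toward the 0.5 feedback rate; prints 0.47
    print(round(simulate_unreliable_cue(), 2))
```

Under the reinforcement-learning account the abstract favors, such a reduction would instead require positive evidence that an alternative cue (F0 in the experiments) predicts voicing; that mechanism is not modeled here.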

Subject(s)

Cues , Feedback, Psychological/physiology , Psycholinguistics , Reinforcement, Psychology , Speech Perception/physiology , Supervised Machine Learning , Computer Simulation , Humans , Phonetics

9.

Acoustic Sources of Accent in Second Language Japanese Speech.

Idemaru, Kaori; Wei, Peipei; Gubbins, Lucy.

Lang Speech ; 62(2): 333-357, 2019 Jun.

Article in English | MEDLINE | ID: mdl-29764295

ABSTRACT

This study reports an exploratory analysis of the acoustic characteristics of second language (L2) speech that give rise to the perception of a foreign accent. Japanese speech samples were collected from American English and Mandarin Chinese speakers (n = 16 in each group) studying Japanese. The L2 participants and native speakers (n = 10) provided speech samples modeled on six short sentences. Segmental (vowels and stops) and prosodic features (rhythm, tone, and fluency) were examined. Native Japanese listeners (n = 10) rated the samples with regard to degree of foreign accent. The analyses predicting accent ratings from the acoustic measurements indicated that one prosodic feature in particular, tone (defined in this study as high and low patterns of pitch accent and intonation), plays an important role in robustly predicting accent ratings in L2 Japanese across the two first language (L1) backgrounds. These results were consistent with the prediction based on phonological and phonetic comparisons between Japanese and English, as well as between Japanese and Mandarin Chinese. The results also revealed L1-specific predictors of perceived accent in Japanese. The findings of this study contribute to the growing literature that examines sources of perceived foreign accent.

Subject(s)

Multilingualism , Phonetics , Speech Acoustics , Speech Perception , Voice Quality , Acoustics , Adolescent , Adult , Female , Humans , Male , Pattern Recognition, Physiological , Speech Production Measurement , Young Adult

10.

Re-Examining Phonetic Variability in Native and Non-Native Speech.

Vaughn, Charlotte; Baese-Berk, Melissa; Idemaru, Kaori.

Phonetica ; 76(5): 327-358, 2019.

Article in English | MEDLINE | ID: mdl-30086539

ABSTRACT

BACKGROUND/AIMS: Non-native speech is frequently characterized as being more variable than native speech. However, the few studies that have directly investigated phonetic variability in the speech of second language learners have considered a limited subset of native/non-native language pairings and few linguistic features. METHODS: The present study examines group-level within-speaker variability and central tendencies in acoustic properties of vowels and stops produced by learners of Japanese from two native language backgrounds, English and Mandarin, as well as native Japanese speakers. RESULTS: Results show that non-native speakers do not always exhibit more phonetic variability than native speakers, but rather that patterns of variability are specific to individual linguistic features and their instantiations in L1 and L2. CONCLUSION: Adopting this more nuanced approach to variability offers important enhancements to several areas of linguistic theory.

Subject(s)

Multilingualism , Phonetics , Speech Acoustics , Speech , Adolescent , Adult , Female , Humans , Language , Linguistics , Male , Speech Production Measurement , Young Adult

11.

Specificity of dimension-based statistical learning in word recognition.

Idemaru, Kaori; Holt, Lori L.

J Exp Psychol Hum Percept Perform ; 40(3): 1009-21, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24364708

ABSTRACT

Speech perception flexibly adapts to short-term regularities of ambient speech input. Recent research demonstrates that the function of an acoustic dimension for speech categorization at a given time is relative to its relationship to the evolving distribution of dimensional regularity across time, and not simply to a fixed value along the dimension. Two experiments examine the nature of this dimension-based statistical learning in online word recognition, testing generalization of learning across phonetic categories. While engaged in a word recognition task guided by perceptually unambiguous voice-onset time (VOT) acoustics signaling stop voicing in either bilabial rhymes, beer and pier, or alveolar rhymes, deer and tear, listeners were exposed incidentally to an artificial "accent" deviating from English norms in its correlation of the pitch onset of the following vowel (F0) with VOT (Experiment 1). Exposure to the change in the correlation of F0 with VOT led listeners to down-weight reliance on F0 in voicing categorization, indicating dimension-based statistical learning. This learning was observed only for the "accented" contrast varying in its F0/VOT relationship during exposure; learning did not generalize to the other place of articulation. Another group of listeners experienced competing F0/VOT correlations across place of articulation such that the global correlation for voicing was stable, but locally correlations across voicing pairs were opposing (e.g., "accented" beer and pier, "canonical" deer and tear, Experiment 2). Listeners showed dimension-based learning only for the accented pair, not the canonical pair, indicating that they are able to track separate acoustic statistics across place of articulation, that is, for /b-p/ and /d-t/. This suggests that dimension-based learning does not operate obligatorily at the phonological level of stop voicing.

Subject(s)

Phonetics , Recognition, Psychology , Speech Acoustics , Speech Perception , Verbal Learning , Generalization, Psychological , Memory, Short-Term , Paired-Associate Learning , Perceptual Distortion , Psychoacoustics , Semantics

12.

The developmental trajectory of children's perception and production of English /r/-/l/.

Idemaru, Kaori; Holt, Lori L.

J Acoust Soc Am ; 133(6): 4232-46, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23742374

ABSTRACT

The English /l-r/ distinction is difficult to learn for some second language learners as well as for native-speaking children. This study examines the use of the second (F2) and third (F3) formants in the production and perception of /l/ and /r/ sounds in 4-, 4.5-, 5.5-, and 8.5-yr-old English-speaking children. The children were tested with elicitation and repetition tasks as well as word recognition tasks. The results indicate that whereas young children's /l/ and /r/ in both production and perception showed fairly high accuracy and were well defined along the primary acoustic parameter that differentiates them, F3 frequency, these children were still developing in regard to the integration of the secondary cue, F2 frequency. The pattern of development is consistent with the distribution of these features in the ambient input relative to the /l/ and /r/ category distinction: F3 is robust and reliable, whereas F2 is less reliable in distinguishing /l/ and /r/. With delayed development of F2, cue weighting of F3 and F2 for the English /l-r/ categorization seems to continue to develop beyond 8 or 9 yr of age. These data are consistent with a rather long trajectory of phonetic development whereby native categories are refined and tuned well into childhood.

Subject(s)

Language Development , Phonation , Phonetics , Speech Perception , Age Factors , Child , Child, Preschool , Cues , Female , Humans , Male , Sound Spectrography , Speech Acoustics , Speech Production Measurement

13.

Individual differences in cue weights are stable across time: the case of Japanese stop lengths.

Idemaru, Kaori; Holt, Lori L; Seltman, Howard.

J Acoust Soc Am ; 132(6): 3950-64, 2012 Dec.

Article in English | MEDLINE | ID: mdl-23231125

ABSTRACT

Speech categories are defined by multiple acoustic dimensions, and listeners give differential weighting to dimensions in phonetic categorization. The informativeness (predictive strength) of dimensions for categorization is considered an important factor in determining perceptual weighting. However, it is unknown how the perceptual system weighs acoustic dimensions with similar informativeness. This study investigates perceptual weighting of two acoustic dimensions with similar informativeness, exploiting the absolute and relative durations that are nearly equivalent in signaling Japanese singleton and geminate stop categories. In the perception experiments, listeners showed strong individual differences in their perceptual weighting of absolute and relative durations. Furthermore, these individual patterns were stable over repeated testing across as long as 2 months and were resistant to perturbation through short-term manipulation of speech input. Listeners' own speech productions were not predictive of how they weighted relative and absolute duration. Despite the theoretical advantage of relative (as opposed to absolute) duration cues across contexts, relative cues are not utilized by all listeners. Moreover, examination of individual differences in cue weighting is a useful tool in exposing the complex relationship between perceptual cue weighting and language regularities.

Subject(s)

Cues , Phonetics , Speech Acoustics , Speech Perception , Voice Quality , Adult , Audiometry, Speech , Female , Humans , Japan , Male , Regression Analysis , Sound Spectrography , Speech Production Measurement , Time Factors , Young Adult

14.

Word recognition reflects dimension-based statistical learning.

Idemaru, Kaori; Holt, Lori L.

J Exp Psychol Hum Percept Perform ; 37(6): 1939-56, 2011 Dec.

Article in English | MEDLINE | ID: mdl-22004192

ABSTRACT

Speech processing requires sensitivity to long-term regularities of the native language, yet it demands that listeners flexibly adapt to perturbations that arise from talker idiosyncrasies such as nonnative accent. The present experiments investigate whether listeners exhibit dimension-based statistical learning of correlations between acoustic dimensions defining perceptual space for a given speech segment. While engaged in a word recognition task guided by perceptually unambiguous voice-onset time (VOT) acoustics signaling beer, pier, deer, or tear, listeners were exposed incidentally to an artificial "accent" deviating from English norms in its correlation of the pitch onset of the following vowel (F0) with VOT. Results across four experiments are indicative of rapid, dimension-based statistical learning; reliance on the F0 dimension in word recognition was rapidly down-weighted in response to the perturbation of the correlation between F0 and VOT dimensions. However, listeners did not simply mirror the short-term input statistics. Instead, response patterns were consistent with a lingering influence of sensitivity to the long-term regularities of English. This suggests that the very acoustic dimensions defining perceptual space are not fixed and, rather, are dynamically and rapidly adjusted to the idiosyncrasies of local experience, such as might arise from nonnative accent, dialect, or dysarthria. The current findings extend demonstrations of "object-based" statistical learning across speech segments to include incidental, online statistical learning of regularities residing within a speech segment.

Subject(s)

Learning , Speech Perception , Acoustic Stimulation , Humans , Psycholinguistics , Speech , Time Factors

15.

Relational timing in the production and perception of Japanese singleton and geminate stops.

Idemaru, Kaori; Guion-Anderson, Susan.

Phonetica ; 67(1-2): 25-46, 2010.

Article in English | MEDLINE | ID: mdl-20798568

ABSTRACT

This work examines the production and perception of the Japanese singleton versus geminate stop contrast in order to investigate properties that distinguish the contrast in the face of variability due to speech rate. The acoustic study found two local relational durations, the ratio of the stop to the preceding mora and the ratio of the stop to the following vowel, to be stable across speaking rates and to accurately classify singleton and geminate productions. However, the subsequent perception study demonstrated an influential role of the preceding mora duration and a marginal role of the following vowel duration on listeners' categorization. These results demonstrate that Japanese listeners can take advantage of relative duration in the perception of the stop length contrast, and that relative strength of simultaneously available acoustic cues does not necessarily translate into equal perceptual importance.

Subject(s)

Language , Phonetics , Semantics , Speech Acoustics , Speech Perception , Speech Production Measurement , Verbal Behavior , Cues , Humans , Japan , Sound Spectrography
