1.
Lang Speech ; 64(1): 3-23, 2021 Mar.
Article in English | MEDLINE | ID: mdl-31957542

ABSTRACT

This paper presents the results of three perceptual experiments investigating the role of the auditory and visual channels in the identification of statements and echo questions in Brazilian Portuguese. Ten Brazilian speakers (five male) were video-recorded (frontal view of the face) while they produced a sentence ("Como você sabe"), either as a statement (meaning "As you know.") or as an echo question (meaning "As you know?"). Experiments were set up using these two intonation contours. Stimuli were presented in conditions with clear and degraded audio, as well as with congruent and incongruent information from the two channels. Results show that Brazilian listeners were able to distinguish statements from questions both prosodically and visually, with auditory cues dominating visual ones. In noisy conditions, the visual channel robustly improved the interpretation of prosodic cues, but degraded it when the visual information was incongruent with the auditory information. This study shows that auditory and visual information are integrated during speech perception, even where prosodic patterns are concerned.
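A minimal sketch (not the authors' analysis pipeline) of how identification results from such a task could be tallied per audio-quality and congruence condition; the trial records and field names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical trial records from an audiovisual identification task:
# audio quality, audio/video congruence, intended sentence type, and
# the listener's response.
trials = [
    {"audio": "clear", "congruent": True, "target": "question", "response": "question"},
    {"audio": "degraded", "congruent": False, "target": "statement", "response": "question"},
    # ... one record per trial ...
]

# Accuracy per (audio quality, congruence) cell, to expose effects such
# as visual cues helping in noise but hurting when incongruent.
hits, counts = defaultdict(int), defaultdict(int)
for t in trials:
    cell = (t["audio"], t["congruent"])
    counts[cell] += 1
    hits[cell] += t["response"] == t["target"]

for cell in sorted(counts):
    print(cell, hits[cell] / counts[cell])
```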


Subject(s)
Acoustic Stimulation/methods, Facial Expression, Phonetics, Photic Stimulation/methods, Speech Perception/physiology, Adult, Brazil, Cues, Female, Humans, Language, Male
2.
J Acoust Soc Am ; 143(1): 109, 2018 01.
Article in English | MEDLINE | ID: mdl-29390730

ABSTRACT

Acoustic variation in expressive speech is studied at the syllable level. Because emotions and attitudes can be conveyed by short spoken words, the analysis of paradigmatic variation in vowels is an important step in characterizing the expressive content of such speech segments. The corpus contains 160 sentences produced under seven expressive conditions (Neutral, Anger, Fear, Surprise, Sensuality, Joy, Sadness) acted by a French female speaker (a total of 1120 sentences, 13 140 vowels). Eleven base acoustic parameters are selected for the analysis of voice-source and vocal-tract related features. An acoustic description of the expressions is derived along the dimensions of melodic range, intensity, noise, spectral tilt, vocalic space, and dynamic features. The first three functions of a discriminant analysis explain 95% of the variance in the data, and these statistical dimensions are consistently associated with acoustic dimensions: covariation of intensity and F0 explains over 80% of the variance, followed by noise features (8%) and covariation of spectral tilt and F0 (7%). On the basis of isolated vowels alone, expressions are classified with a mean accuracy of 78%.
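The discriminant analysis reported here suggests the general shape of such a pipeline. A minimal sketch, assuming a feature matrix X with the eleven acoustic parameters per vowel and a label vector y with the seven expressive conditions; the placeholder data below stand in for the real measurements.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: X has one row per vowel with 11 acoustic
# parameters; y holds the expressive condition of each vowel.
rng = np.random.default_rng(0)
X = rng.normal(size=(1120, 11))    # placeholder for real features
y = rng.integers(0, 7, size=1120)  # placeholder for 7 expressions

# Fit an LDA and inspect how much variance the first discriminant
# functions carry (the paper reports ~95% for the first three).
lda = LinearDiscriminantAnalysis(n_components=3).fit(X, y)
print("variance explained per function:", lda.explained_variance_ratio_)

# Cross-validated classification accuracy from the vowels alone.
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print("mean accuracy:", acc.mean())
```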

3.
J Acoust Soc Am ; 135(6): 3601-12, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24907823

ABSTRACT

Cantor Digitalis, a real-time formant synthesizer controlled by a graphic tablet and a stylus, is used to assess melodic precision and accuracy in singing synthesis. Melodic accuracy and precision are measured in three experiments with groups of 20 and 28 subjects. The subjects' task is to sing musical intervals and short melodies, at various tempi, using chironomy (hand-controlled singing), mute chironomy (without audio feedback), and their own voices. The results show the high accuracy and precision obtained by all subjects for chironomic control of singing synthesis. Some subjects performed significantly better in chironomic singing than in natural singing, while others showed comparable proficiency in both. In the chironomic condition, mean note accuracy is less than 12 cents and mean interval accuracy less than 25 cents for all subjects. Comparing chironomy and mute chironomy shows that the skills used for writing and drawing transfer to chironomic singing, but that audio feedback helps interval accuracy. Analysis of blind chironomy (without visual reference) indicates that visual feedback greatly improves both note and interval accuracy and precision. This study demonstrates the capabilities of chironomy as a precise and accurate means of controlling singing synthesis.
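Note and interval accuracy in cents reduce to a log-ratio of frequencies (1200 cents per octave). A minimal sketch of the arithmetic, with hypothetical target and produced pitches; the source does not publish its exact scoring code.

```python
import math

def cents(f_produced_hz: float, f_target_hz: float) -> float:
    """Signed deviation of a produced pitch from a target, in cents
    (1200 cents per octave)."""
    return 1200.0 * math.log2(f_produced_hz / f_target_hz)

# Note accuracy: deviation of a produced note from its target.
print(cents(442.0, 440.0))  # ~ +7.85 cents sharp

# Interval accuracy: deviation of the produced interval from the
# target interval, i.e. the difference of the two note deviations.
produced_interval = cents(662.0, 440.0)  # produced fifth
target_interval = cents(660.0, 440.0)    # intended fifth (3:2)
print(produced_interval - target_interval)
```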

4.
Lang Speech ; 55(Pt 2): 263-93, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22783635

ABSTRACT

This study focuses on prosodic evolution in the French news announcer style, based on acoustic and perceptual analyses of French audiovisual archives. A 10-hour corpus covering six decades of broadcast news is investigated automatically. Two prosodic features that may convey an impression of emphatic style are explored: word-initial stress and penultimate vowel lengthening, especially before a pause. Objective measurements suggest that the following features have decreased since the 1940s: mean pitch, the pitch rise associated with initial stress, the vowel duration characterizing an emphatic initial stress, and prepausal penultimate lengthening. The onsets of stressed initial syllables have become longer, while speech rate (measured at the phonemic level) has not changed. This puzzling outcome raises interesting questions for research on French prosody, suggesting that the durational correlates of word-initial stress in the French news announcer style have changed over time. Three perceptual experiments were conducted using prosody transplantation (copying fundamental frequency and duration parameters onto a synthetic voice), delexicalization, and imitation. Rather than manipulating the parameters of, say, word-initial stress, we selected a subset of the corpus to represent the different decades under investigation. Results show that, among other factors, the fundamental frequency and duration correlates of prosody contribute to distinguishing early recordings from more recent ones: the higher the pitch and the greater the pitch movements associated with word-initial stress, the more the speech samples are perceived as dating back to the 1940s or 1950s.
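A minimal sketch of how the two correlates named above could be quantified from time-aligned vowel annotations; the data structures and field names are hypothetical illustrations, not the study's pipeline.

```python
import math

# Hypothetical time-aligned vowels of one prepausal word: duration (s),
# mean F0 (Hz), and position flags within the word / before the pause.
word = [
    {"dur": 0.09, "f0": 210.0, "initial": True,  "prepausal_penult": False},
    {"dur": 0.07, "f0": 180.0, "initial": False, "prepausal_penult": True},
    {"dur": 0.08, "f0": 170.0, "initial": False, "prepausal_penult": False},
]

def semitones(f_hi: float, f_lo: float) -> float:
    """Pitch interval in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hi / f_lo)

# Pitch rise associated with word-initial stress, here measured as the
# interval between the initial vowel and the following one.
rise = semitones(word[0]["f0"], word[1]["f0"])

# Prepausal penultimate lengthening: duration of the penultimate vowel
# relative to the mean duration of the other vowels.
others = [v["dur"] for v in word if not v["prepausal_penult"]]
penult = next(v["dur"] for v in word if v["prepausal_penult"])
print(rise, penult / (sum(others) / len(others)))
```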


Subject(s)
Phonetics, Speech Acoustics, Speech Perception, Voice Quality, Adult, Audiometry, Speech, Female, Humans, Male, Pattern Recognition, Automated, Signal Processing, Computer-Assisted, Speech Production Measurement, Time Factors
5.
J Acoust Soc Am ; 129(3): 1594-604, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21428522

ABSTRACT

Intonation stylization is studied using "chironomy," i.e., the analogy between hand gestures and prosodic movements. An intonation-mimicking paradigm is used: the task of the ten subjects is to copy the intonation pattern of sentences with a stylus on a graphic tablet, using a system for real-time manual intonation modification. Gestural imitation is compared to vocal imitation of the same sentences (seven from a male speaker, seven from a female speaker). Distance measures between gestural copies, vocal imitations, and the original sentences are computed for performance assessment. Perceptual testing is also used to assess the quality of the gestural copies: the perceptual difference between natural and stylized contours is measured with a mean opinion score paradigm for 15 subjects. The results indicate that intonation contours can be stylized accurately by chironomic imitation. The results of vocal and chironomic imitation are comparable, although subjects imitate somewhat better vocally. The best chironomically stylized contours seem perceptually indistinguishable, or almost indistinguishable, from natural contours, particularly for female speech. This indicates that chironomic stylization is effective and that hand movements can be analogous to intonation movements.
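One common way to compute such a distance between an original and a copied intonation contour is a root-mean-square difference in cents after resampling both contours to a common time base; a minimal numpy sketch under that assumption (the abstract does not specify the study's actual distance measure).

```python
import numpy as np

def contour_distance_cents(f0_a, f0_b, n_points: int = 100) -> float:
    """RMS distance between two voiced F0 contours (Hz), compared on a
    normalized time axis and expressed in cents."""
    f0_a, f0_b = np.asarray(f0_a, float), np.asarray(f0_b, float)
    t = np.linspace(0.0, 1.0, n_points)
    a = np.interp(t, np.linspace(0.0, 1.0, len(f0_a)), f0_a)
    b = np.interp(t, np.linspace(0.0, 1.0, len(f0_b)), f0_b)
    diff_cents = 1200.0 * np.log2(a / b)
    return float(np.sqrt(np.mean(diff_cents ** 2)))

# Hypothetical original vs. stylized contour:
orig = [180, 200, 230, 210, 190]
copy = [182, 205, 225, 212, 188]
print(contour_distance_cents(orig, copy))
```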


Subject(s)
Gestures, Hand, Imitative Behavior, Speech Acoustics, Adult, Analysis of Variance, Biomechanical Phenomena, Computer Graphics, Cues, Female, Humans, Male, Pattern Recognition, Physiological, Pitch Discrimination, Sound Spectrography, Time Factors, Young Adult
6.
Lang Speech ; 52(Pt 2-3): 223-43, 2009.
Article in English | MEDLINE | ID: mdl-19624031

ABSTRACT

Whereas several studies have explored the expression of emotions, little is known about how the visual and audio channels are combined during the production of what we call the more controlled social affects, for example, "attitudinal" expressions. This article presents a perception study of the audiovisual expression of 12 Japanese and 6 French attitudes, aimed at understanding the contribution of the audio and visual modalities to affective communication. The relative importance of each modality in the perceptual decoding of the expressions of four speakers is analyzed as a first step towards a deeper comprehension of their influence on the expression of social affects. The audiovisual productions of two speakers (one per language) are then analyzed acoustically (F0, duration, and intensity) and visually (in terms of Action Units), in order to relate objective parameters to listeners' perception of these social affects. The most pertinent objective features, acoustic or visual, are then discussed from a bilingual perspective: for example, the relative influence of fundamental frequency on attitudinal expression in the two languages is discussed, and the importance of a certain aspect of the voice quality dimension in Japanese is underlined.
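A minimal sketch of the kind of per-attitude acoustic summary such an analysis could start from; the attitude labels and measurements below are hypothetical, and the visual Action Unit side is omitted.

```python
from statistics import mean

# Hypothetical utterance-level measurements for attitudinal speech:
# attitude label, mean F0 (Hz), duration (s), mean intensity (dB).
utterances = [
    ("politeness", 190.0, 1.4, 62.0),
    ("politeness", 185.0, 1.5, 61.0),
    ("irritation", 240.0, 1.1, 70.0),
    ("irritation", 235.0, 1.0, 69.0),
]

# Average F0, duration, and intensity per attitude: the three acoustic
# parameters the study relates to listeners' perception.
for attitude in sorted({u[0] for u in utterances}):
    rows = [u for u in utterances if u[0] == attitude]
    print(attitude,
          mean(r[1] for r in rows),   # mean F0
          mean(r[2] for r in rows),   # mean duration
          mean(r[3] for r in rows))   # mean intensity
```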


Subject(s)
Affect, Auditory Perception, Language, Social Behavior, Speech Perception, Visual Perception, Adolescent, Adult, Culture, Female, France, Humans, Japan, Male, Psycholinguistics, Speech, Speech Acoustics, Young Adult