Search | VHL Regional Portal

1.

Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks.

Cavalcanti, Julio Cesar; da Silva, Ronaldo Rodrigues; Eriksson, Anders; Barbosa, Plinio A.

Front Artif Intell ; 7: 1287877, 2024.

Article in English | MEDLINE | ID: mdl-38405218

ABSTRACT

This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings from 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance evaluations involved comparing identical twins, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using equal error rates (EER) and the log-likelihood-ratio cost (Cllr). Results highlight the substantial challenge that identical twins pose to the ASR system, leading to a decrease in overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those using shorter samples. As sample length increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability in estimating speaker similarity/dissimilarity in longer speech stretches than in shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems. These outcomes align with prior research and are discussed within the context of the relevant literature.
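The two evaluation metrics named above can be illustrated with a minimal sketch. It assumes that same-speaker (target) and different-speaker (non-target) trials have already been scored, that Cllr is computed from natural-log likelihood ratios, and that EER is approximated by a simple threshold sweep. The function names are hypothetical; this is not the SpeechBrain implementation or the authors' evaluation code.

```python
import math
from typing import Sequence


def _log2_1p_exp(x: float) -> float:
    """Numerically stable log2(1 + e^x)."""
    if x > 0:
        return (x + math.log1p(math.exp(-x))) / math.log(2)
    return math.log1p(math.exp(x)) / math.log(2)


def cllr(target_llrs: Sequence[float], nontarget_llrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost; inputs are natural-log likelihood ratios (hypothetical helper)."""
    # Same-speaker trials are penalized by log2(1 + 1/LR) = log2(1 + e^-llr)
    c_tar = sum(_log2_1p_exp(-s) for s in target_llrs) / len(target_llrs)
    # Different-speaker trials are penalized by log2(1 + LR) = log2(1 + e^llr)
    c_non = sum(_log2_1p_exp(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (c_tar + c_non)


def eer(target_scores: Sequence[float], nontarget_scores: Sequence[float]) -> float:
    """Approximate equal error rate by sweeping a threshold over all observed scores."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a same-speaker trial scored below the threshold
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a different-speaker trial scored at or above the threshold
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

A system that outputs LLR = 0 for every trial carries no information and gets Cllr = 1; lower values of both Cllr and EER indicate better speaker discrimination.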

2.

NEOS: An odour-induced affect scale for use in the cosmetic industry.

Barbosa, Plinio A; Semenzim, Thaís Bellintani; Marques, Lucas Murrins; Serpa, Alexandre Luiz de Oliveira; Yoshimine, Elise; Tobo, Patricia.

Int J Cosmet Sci ; 46(1): 51-61, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37594727

ABSTRACT

This work proposes an odour-induced affect scale for use in the cosmetic industry that relies on the approach that produced the UniGEOS, a universal odour-related emotional scale from the Swiss Center for Affective Sciences. The Natura Emotion and Odor Scale (NEOS) was built on experiments conducted with a larger set of participants (491) and a set of 35 scents that combine seven commercial perfumes from the Natura & Co cosmetic company with 28 odours from different olfactory classes important for the cosmetic industry. The results showed the stability of 60 emotion-related terms in Brazilian Portuguese, split into five emotion-related dimensions: Romance, Attention, Energy, Well-being and Negative feelings. The association of the scents evoking these five dimensions has direct implications for the design of new products.

Subject(s)

Cosmetics , Odorants , Humans , Emotions , Smell , Brazil

3.

On the speaker discriminatory power asymmetry regarding acoustic-phonetic parameters and the impact of speaking style.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

Front Psychol ; 14: 1101187, 2023.

Article in English | MEDLINE | ID: mdl-37138997

ABSTRACT

This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed in different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on the speaker-discriminatory performance of different acoustic-phonetic estimates. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals and interviews conducted between each participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning temporal, melodic, and spectral acoustic-phonetic estimates. In addition, an analysis based on the combination of different parameters was conducted. Two speaker-discriminatory metrics were examined: the log-likelihood-ratio cost (Cllr) and the equal error rate (EER). A general speaker-discriminatory trend was observed when the parameters were assessed individually. Parameters in the temporal acoustic-phonetic class showed the weakest speaker-contrasting power, as evidenced by their relatively higher Cllr and EER values. Of the acoustic parameters assessed, spectral parameters, mainly the high formant frequencies F3 and F4, performed best in terms of speaker discrimination, yielding the lowest EER and Cllr scores. The results appear to suggest a speaker discriminatory power asymmetry across parameters from different acoustic-phonetic classes, with temporal parameters tending to present lower discriminatory power. The speaking-style mismatch also seemed to considerably affect the speaker comparison task by undermining overall discriminatory performance. A statistical model based on the combination of different acoustic-phonetic estimates was found to perform best in this case. Finally, data sampling proved to be of crucial relevance for the reliability of discriminatory power assessment.

4.

Multi-parametric analysis of speech timing in inter-talker identical twin pairs and cross-pair comparisons: Some forensic implications.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

PLoS One ; 17(1): e0262800, 2022.

Article in English | MEDLINE | ID: mdl-35061853

ABSTRACT

The purpose of this study was to assess the speaker-discriminatory potential of a set of speech timing parameters while probing their suitability for forensic speaker comparison applications. The recordings comprised spontaneous dialogues between twin pairs over mobile phones, directly recorded with professional headset microphones. Speaker comparisons were performed between twin speakers engaged in a dialogue (i.e., intra-twin pairs) and among all subjects (i.e., cross-twin pairs). The participants were 20 Brazilian Portuguese speakers, namely ten male identical twin pairs from the same dialectal area. A set of 11 speech timing parameters was extracted and analyzed, including speech rate, articulation rate, syllable duration (V-V unit), vowel duration, and pause duration. Three system performance estimates were considered for assessing the suitability of the parameters for speaker comparison purposes, namely global Cllr, EER, and AUC values. These were interpreted together with an analysis of effect sizes. Overall, speech rate and articulation rate were found to be the most reliable parameters, displaying the largest effect sizes for the factor "speaker" and the best system performance outcomes, namely the lowest Cllr and EER and the highest AUC values. Conversely, smaller effect sizes were found for the other parameters, which is compatible with a lower explanatory potential of speaker identity on the duration of such units and possibly greater linguistic control over their temporal variation. In addition, speech timing estimates based on larger temporal intervals tended to present larger effect sizes and better speaker-discriminatory performance. Finally, identical twin pairs were found to be remarkably similar in their speech temporal patterns at the macro and micro levels while engaging in a dialogue, resulting in poor system discriminatory performance. Possible underlying factors for such a striking convergence in identical twins' speech timing patterns are presented and discussed.
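Speech rate and articulation rate, the two best-performing parameters above, differ only in whether silent pauses count toward the time base. The sketch below assumes a hypothetical list of annotated intervals flagged as speech or pause and a known syllable count; the study's exact counting unit (e.g., V-V units) and pause criteria are not given here, so the `Interval` type and `speaking_rates` function are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interval:
    """Hypothetical annotated stretch of a recording."""
    start: float  # seconds
    end: float    # seconds
    is_pause: bool


def speaking_rates(intervals: List[Interval], n_syllables: int) -> Tuple[float, float]:
    """Return (speech rate, articulation rate) in syllables per second."""
    total = sum(iv.end - iv.start for iv in intervals)
    pauses = sum(iv.end - iv.start for iv in intervals if iv.is_pause)
    # Speech rate: pauses are included in the time base
    speech_rate = n_syllables / total
    # Articulation rate: silent pauses are excluded from the time base
    articulation_rate = n_syllables / (total - pauses)
    return speech_rate, articulation_rate
```

Because articulation rate removes pause time, it is always greater than or equal to speech rate for the same stretch.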

Subject(s)

Speech , Twins, Monozygotic/psychology , Adult , Forensic Psychology , Humans , Male , Phonetics , Speech Perception , Tape Recording , Time Factors , Young Adult

5.

Multiparametric Analysis of Speaking Fundamental Frequency in Genetically Related Speakers Using Different Speech Materials: Some Forensic Implications.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

J Voice ; 2021 Oct 07.

Article in English | MEDLINE | ID: mdl-34629229

ABSTRACT

OBJECTIVE: To assess the speaker-discriminatory potential of a set of fundamental frequency estimates in intra-twin-pair comparisons of identical twins and in cross-pair comparisons (i.e., among all speakers). PARTICIPANTS: A total of 20 Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recruited. METHOD: The participants were recorded directly through professional microphones while taking part in a spontaneous dialogue over mobile phones. Acoustic measurements were performed on connected speech samples and on lengthened vowels (at least 160 ms long) produced during spontaneous speech. RESULTS: f0 baseline, central tendency, and extreme values were found to be the most discriminatory in intra-twin-pair and cross-pair comparisons. These were also the estimates displaying the largest effect sizes. Overall, only three identical twins were found statistically different regarding their f0 patterns in connected speech, and none for lengthened vowel-based f0 metrics. Estimates of f0 variation and modulation were found to be the least discriminatory across speakers, which may signal the control of speaking style and dialect over dynamic f0 patterns. Concerning system performance, the base value of f0 (f0 baseline) was found to be the most reliable metric, displaying the lowest equal error rate (EER). CONCLUSIONS: The outcomes suggest that, although identical twins were very closely related regarding their f0 patterns, some pairs could still be differentiated acoustically, though only in connected speech. Such findings reinforce the relevance of analyzing long-term f0 metrics for speaker comparison purposes, with particular consideration of f0 baseline. Furthermore, f0 differences across subjects appeared more pronounced in connected speech than in lengthened vowels.

6.

Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

PLoS One ; 16(2): e0246645, 2021.

Article in English | MEDLINE | ID: mdl-33600430

ABSTRACT

The purpose of this study was to explore the speaker-discriminatory potential of mean vowel formant frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influence of lexical stress and of the vowels' acoustic distances on the discriminatory patterns of formant frequencies was also assessed. The first four formants (F1-F4) were extracted and analyzed acoustically using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, directly recorded through high-quality microphones. The subjects were 20 adult male speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 estimates were automatically extracted from the midpoint of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant under a psychoacoustic criterion. The results revealed consistent patterns in the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, F3 and F4 displayed similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels. Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found to be even more explanatory of the observed differences. Although identical twins displayed higher phonetic similarity, they were not found to be phonetically identical.
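The Hertz-to-Bark representation mentioned above can be sketched with Traunmüller's (1990) approximation, including its low-end and high-end corrections. This is an assumption: the abstract does not state which Bark formula the authors used, and other approximations give slightly different values. The function names are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) Hz-to-Bark approximation (assumed formula, not necessarily the study's)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    # Corrections at the low and high ends of the scale
    if z < 2.0:
        z += 0.15 * (2.0 - z)
    elif z > 20.1:
        z += 0.22 * (z - 20.1)
    return z


def bark_difference(f_a_hz: float, f_b_hz: float) -> float:
    """Absolute difference in Bark between two formant values, e.g. one twin's F4 vs. the other's."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz))


# Hypothetical F4 means for the two members of a twin pair (not data from the study)
print(round(bark_difference(3500.0, 3600.0), 3))
```

Expressing differences in Bark rather than Hertz compresses the high-frequency range, so a fixed Hertz gap in F4 counts for less than the same gap in F1.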

Subject(s)

Speech Acoustics , Speech/physiology , Verbal Behavior/physiology , Acoustics , Adult , Brazil , Forensic Sciences/methods , Humans , Language , Male , Phonetics , Psychoacoustics , Speech Perception/physiology , Twins, Monozygotic

7.

R.H. Stetson, Motor Phonetics: A Study of Speech Movements in Action, 2nd ed., Amsterdam, North Holland Publishing Co., 1951.

Barbosa, Plinio A.

Phonetica ; 74(4): 255-258, 2017.

Article in English | MEDLINE | ID: mdl-29131119

8.

Illuminating some methodological issues concerning speech timing research from a comparison between European and Brazilian Portuguese

Barbosa, Plínio A.

Cad. estud. linguist ; (39): 41-50, Jul.-Dec. 2000. tab

Article in English | Index Psychology - journals | ID: psi-17073

ABSTRACT

Some methodological issues concerning research on the temporal organization of speech are presented and discussed. Based on a comparison of two corpora of sentences read by two Brazilian speakers and one Portuguese female speaker, two techniques for determining the rhythmic type of languages are evaluated: a dynamic technique focused on prosodic aspects and a technique focused on segmental aspects. The importance of the ceteris paribus condition and of measurement criteria in Acoustic Phonetics experiments is discussed, in order to shed light on the appropriate use of both techniques. The first technique, however, seems to better account for what is known about the typology of the world's languages. The second can be improved if some basic methodological precautions are taken into account. Used together, the two techniques could be very useful for the study of rhythmic typology (AU)
