1.
Data Brief ; 48: 109205, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37383770

ABSTRACT

This speech dataset is primarily designed to investigate linguistic and speaker information in fricative sounds in Russian. Acoustic recordings were obtained from 59 students (30 females and 29 males) between 18 and 30 years of age. Eighteen participants were recorded in a second session. The participants were born in St. Petersburg and had lived there since early childhood, and none reported any speech or hearing impairment. The recording sessions were conducted in an audiometric booth at the phonetic laboratory of the Phonetic Institute in St. Petersburg, using the recording program Speech-Recorder version 3.28.0 at a sample rate of 44.1 kHz (16-bit encoding). During the recordings, a clip-on microphone (Sennheiser MKE 2-P) was placed at a distance of 15 cm from the speaker's mouth and connected through an audio interface (Zoom U-22) to a laptop computer. The participants were instructed to read 198 randomized sentences from a computer screen. The fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ], [vʲ], [zʲ] were embedded in those sentences. Two sentence structures were designed so that each real-word lexeme was produced in three different contexts. The first type is a carrier sentence with the structure "She said 'X' and not 'Y'". Minimal pairs of real words, each containing one of the 11 tested fricatives, were placed in the "X" and "Y" positions. The second type was a natural-language sentence including each of the lexemes. All raw audio files were first automatically pre-processed with the online Munich Automatic Segmentation system. The files of the first recording session were then filtered to remove energy below 80 Hz and above 20,050 Hz, and the segment boundaries were manually corrected in Praat. The dataset consists of 22,561 fricative tokens. The number of observations per sound differs across categories because of their natural distribution.
The dataset is made available as a collection of audio files in wav format along with companion Praat TextGrid files for each sentence. Target fricatives are furthermore available as individual wav files. The whole dataset can be accessed via the DOI https://doi.org/10.48656/4q9c-gz16. The experimental design additionally allows the investigation of other sound categories, and the number of recorded speakers opens further possibilities for phonetically oriented speaker identification studies.
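The band-limiting step described in the abstract (removing energy below 80 Hz and above 20,050 Hz) can be sketched as a simple FFT-domain filter. This is an illustrative reconstruction, not the pipeline actually used; the function name `bandpass_fft` and the use of a brick-wall FFT filter are assumptions for demonstration only.

```python
import numpy as np

def bandpass_fft(signal, fs=44100, lo=80.0, hi=20050.0):
    """Zero out spectral components below `lo` and above `hi` Hz,
    mirroring the band limits reported for the dataset."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(signal))

# Example: filter one second of white noise at the dataset's 44.1 kHz rate.
noise = np.random.default_rng(0).standard_normal(44100)
filtered = bandpass_fft(noise)
print(filtered.shape)
```

In practice a dedicated filter design (e.g. a Butterworth band-pass) would be preferred over a brick-wall FFT filter, which can introduce ringing; the sketch only illustrates the band limits.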

2.
J Acoust Soc Am ; 153(4): 2285, 2023 04 01.
Article in English | MEDLINE | ID: mdl-37092935

ABSTRACT

Acoustic variation is central to the study of speaker characterization. In this respect, specific phonemic classes such as vowels have received particular attention, whereas fricatives remain comparatively understudied. Fricatives exhibit substantial aperiodic energy, which can extend into a high-frequency range beyond that conventionally considered in phonetic analyses, often limited to 12 kHz. We adopt here an extended frequency range up to 20.05 kHz to study a corpus of 15,812 fricatives produced by 59 speakers in Russian, a language offering a rich inventory of fricatives. We extracted two sets of parameters: the first is composed of 11 parameters derived from the frequency spectrum and duration (the acoustic set), while the second is composed of 13 mel-frequency cepstral coefficients (MFCCs). As a first step, we implemented machine learning methods to evaluate the potential of each set to predict gender and speaker identity. We show that gender can be predicted well by the acoustic set and even better by the MFCCs (accuracies of 0.72 and 0.88, respectively). The MFCCs also predict individual speakers to some extent (accuracy = 0.64), unlike the acoustic set. In a second step, we provide a detailed analysis of the observed intra- and inter-speaker acoustic variation.
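The 13 MFCCs used as the second parameter set follow the standard recipe: power spectrum, triangular mel filterbank, log, then a DCT-II. The from-scratch sketch below shows that recipe for a single frame; it is not the extraction pipeline used in the study (which would typically rely on an established toolkit), and `mfcc_frame` is an illustrative name.

```python
import numpy as np

def mfcc_frame(frame, fs=44100, n_filters=26, n_coeffs=13):
    """Compute MFCCs for one frame: windowed power spectrum ->
    mel filterbank -> log energies -> DCT-II."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # Mel scale and its inverse.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edge frequencies, equally spaced on the mel scale.
    hz_pts = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, c, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        # Triangular filter rising from lo to c, falling from c to hi.
        fbank[i] = np.clip(
            np.minimum((freqs - lo) / (c - lo), (hi - freqs) / (hi - c)),
            0.0, None)
    log_e = np.log(fbank @ spec + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstral coefficients.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_filters))
    return dct @ log_e

frame = np.random.default_rng(1).standard_normal(1024)
coeffs = mfcc_frame(frame)
print(coeffs.shape)
```

With the coefficients in hand, gender or speaker prediction reduces to fitting any standard classifier on per-token MFCC vectors.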


Subject(s)
Phonetics , Speech Acoustics , Humans , Acoustics , Language , Russia
3.
J Acoust Soc Am ; 150(3): 1806, 2021 09.
Article in English | MEDLINE | ID: mdl-34598630

ABSTRACT

This paper shows that machine learning techniques are very successful at classifying the Russian voiceless non-palatalized fricatives [f], [s], and [ʃ] using a small set of acoustic cues. From a data sample of 6,320 tokens of read sentences produced by 40 participants, temporal and spectral measurements are extracted from the full sound, the frication noise duration, and the middle 30 ms window. Furthermore, 13 mel-frequency cepstral coefficients (MFCCs) are computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines, and neural networks are trained and tested to distinguish between these three fricatives. The results demonstrate, first, that the three acoustic cue extraction techniques are similar in terms of classification accuracy (93% to 99%), with the spectral measurements extracted from the full frication noise duration giving slightly better accuracy. Second, the center of gravity and the spectral spread are sufficient for the classification of [f], [s], and [ʃ] irrespective of contextual and speaker variation. Third, MFCCs show marginally higher predictive power than the spectral cues (by less than 2%). This suggests that both sets of measures provide sufficient information for the classification of these fricatives, and the choice between them depends on the particular research question or application.


Subject(s)
Cues , Speech Acoustics , Acoustics , Humans , Russia , Support Vector Machine