Results 1 - 11 of 11
1.
Data Brief ; 42: 108275, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35669006

ABSTRACT

The LaMIT database consists of recordings of 100 Italian sentences. The sentences were designed to include all phonemes of the Italian language while also reflecting the typical frequency of each phoneme in written Italian. Four native adult speakers of Standard Italian (two female and two male), raised and living in Rome, Italy, pronounced the sentences in two separate recording sessions; two repetitions of each sentence were therefore collected per speaker, for a total of 800 recordings. The database was created specifically for the LaMIT project, which focuses on applying to the Italian language the Lexical Access model proposed by Ken Stevens for American English. The model relies on detecting specific acoustic discontinuities, called landmarks, and other acoustic cues to the features that characterize each phoneme. Each recording was therefore processed to generate a set of labeling files that identify both the predicted landmarks and cues and the actual landmarks and cues. The labeling files, written in the labeling syntax used by the Praat speech processing software, are also made available as part of the LaMIT database.
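The LaMIT abstract (result 1) says only that the labeling files follow Praat's labeling syntax; it does not describe their tier layout. As a minimal sketch, assuming landmarks are stored as time points on a single point tier (a Praat TextTier), the function below writes Praat's long TextGrid text format. The function name, the tier name, and the "Sc"/"Sr" marks are hypothetical placeholders, not the database's actual labels.

```python
def landmark_textgrid(xmax, tier_name, points):
    """Build a Praat TextGrid (long text format) holding one point tier.

    xmax: duration of the recording in seconds.
    points: (time_in_seconds, label) pairs, assumed sorted by time.
    """
    def quote(text):
        # Praat escapes a double quote inside a string by doubling it.
        return '"' + text.replace('"', '""') + '"'

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "TextTier"',
        f"        name = {quote(tier_name)}",
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        points: size = {len(points)}",
    ]
    for i, (time, label) in enumerate(points, start=1):
        lines += [
            f"        points [{i}]:",
            f"            number = {time}",
            f"            mark = {quote(label)}",
        ]
    return "\n".join(lines) + "\n"


# Illustrative call: the tier name and the "Sc"/"Sr" marks are placeholders.
print(landmark_textgrid(2.3, "landmarks", [(0.9, "Sc"), (1.3, "Sr")]))
```

A file written this way can be opened in Praat alongside the matching recording; separate tiers for predicted and actual landmarks would just repeat the `item [n]` block with `size` raised accordingly.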

2.
J Acoust Soc Am ; 147(6): EL471, 2020 06.
Article in English | MEDLINE | ID: mdl-32611168

ABSTRACT

This study examines the acoustic realizations of American English intervocalic flaps in the TIMIT corpus, using the landmark-critical feature-cue-based framework. Three different acoustic patterns of flaps are described: (i) both closure and release landmarks present, (ii) only the closure landmark present, and (iii) both landmarks deleted. The patterns occur consistently across several phonological and morphological conditions but vary with sociolinguistic factors, including speaker dialect and gender. This method of analysing speech at the level of acoustic landmarks and other individual cues to distinctive features contributes to a deeper understanding of how speakers and listeners employ systematic variation in phonetic detail in speech processing.


Subject(s)
Cues , Speech Perception , Acoustics , Language , Phonetics , Speech Acoustics , United States
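The flap study (result 2) sorts each token by which landmarks survive. The sketch below encodes the three patterns from the abstract and tallies them per speaker group; the function names and the tuple input format are assumptions for illustration, and a release without a closure, which the abstract does not describe, is simply returned as `None`.

```python
from collections import Counter
from enum import Enum


class FlapPattern(Enum):
    BOTH_LANDMARKS = "i"    # closure and release landmarks both present
    CLOSURE_ONLY = "ii"     # only the closure landmark present
    BOTH_DELETED = "iii"    # neither landmark present


def classify_flap(has_closure, has_release):
    """Map landmark presence to one of the three patterns in the abstract.

    A release without a closure is not among the described patterns, so
    this sketch returns None for it.
    """
    if has_closure and has_release:
        return FlapPattern.BOTH_LANDMARKS
    if has_closure:
        return FlapPattern.CLOSURE_ONLY
    if not has_release:
        return FlapPattern.BOTH_DELETED
    return None


def tally_by_group(tokens):
    """Count patterns per speaker group.

    tokens: iterable of (group, has_closure, has_release) tuples, where
    group is any hashable label (e.g. a dialect or gender code).
    """
    counts = {}
    for group, closure, release in tokens:
        counts.setdefault(group, Counter())[classify_flap(closure, release)] += 1
    return counts
```

Comparing the per-group counters is one simple way to look at the dialect and gender variation the abstract reports.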
3.
J Acoust Soc Am ; 145(5): EL379, 2019 05.
Article in English | MEDLINE | ID: mdl-31153305

ABSTRACT

Irregular pitch periods (IPPs) are associated with grammatically, pragmatically, and clinically significant types of nonmodal phonation, but are challenging to identify. Automatic detection of IPPs is desirable because accurately hand-identifying IPPs is time-consuming and requires training. The authors evaluated an algorithm developed for creaky voice analysis to automatically identify IPPs in recordings of American English conversational speech. To determine a perceptually relevant threshold probability, frame-by-frame creak probabilities were compared to hand labels, yielding a threshold of approximately 0.02. These results indicate generally good agreement between hand-labeled IPPs and automatic detection; future work should investigate the effects of linguistic and prosodic context.


Subject(s)
Phonation/physiology , Pitch Perception/physiology , Speech Perception/physiology , Voice Quality/physiology , Adolescent , Adult , Female , Humans , Sound Spectrography/methods , Speech/physiology , Speech Acoustics , Young Adult
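Result 3 derives a threshold by comparing frame-by-frame creak probabilities with hand labels, but the abstract does not say which agreement criterion was used. As a minimal sketch, assuming frame-level F1 is the criterion, the code below sweeps every observed probability as a candidate threshold and keeps the best one. The function names and the F1 choice are assumptions, not the authors' procedure.

```python
def f1_score(predicted, reference):
    """Frame-level F1 of boolean predictions against boolean hand labels."""
    tp = sum(p and r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    fn = sum(r and not p for p, r in zip(predicted, reference))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(creak_probs, hand_labels):
    """Sweep each observed probability as a threshold (a frame counts as
    creaky if its probability >= threshold) and keep the highest F1."""
    reference = [bool(label) for label in hand_labels]
    best_t, best_f1 = None, -1.0
    for t in sorted(set(creak_probs)):
        predicted = [p >= t for p in creak_probs]
        score = f1_score(predicted, reference)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

On four toy frames, `best_threshold([0.01, 0.03, 0.5, 0.005], [0, 1, 1, 0])` returns `(0.03, 1.0)`. Swapping in a different criterion (for example Cohen's kappa) only means replacing `f1_score`.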
4.
Biomed Res Int ; 2013: 758731, 2013.
Article in English | MEDLINE | ID: mdl-24288686

ABSTRACT

This paper investigates the effectiveness of measures related to vocal tract characteristics in classifying normal and pathological speech. Unlike conventional approaches that mainly focus on features related to the vocal source, this work examines vocal tract characteristics to determine whether interaction effects between the vocal folds and the vocal tract can be used to detect pathological speech. In particular, this paper examines features related to formant frequencies to see whether vocal tract characteristics are affected by the nature of the vocal fold-related pathology. To test this hypothesis, stationary fragments of the vowel /aa/ produced by 223 normal subjects, 472 vocal fold polyp subjects, and 195 unilateral vocal cord paralysis subjects are analyzed. Based on acoustic-articulatory relationships, phonation in pathological subjects is found to be associated with measures correlated with a raised tongue body or an advanced tongue root. Kruskal-Wallis tests also show that vocal tract-related features are statistically significant in distinguishing normal and pathological speech. Classification results demonstrate that combining the formant measurements with vocal fold-related features improves performance in differentiating vocal pathologies, including vocal polyps and unilateral vocal cord paralysis, which suggests that measures related to vocal tract characteristics may provide additional information in diagnosing vocal disorders.


Subject(s)
Phonation , Speech Recognition Software , Vocal Cords/physiopathology , Voice Disorders , Voice , Adult , Female , Humans , Male , Middle Aged , Voice Disorders/diagnosis , Voice Disorders/physiopathology
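Result 4 uses the Kruskal-Wallis test to check whether formant-based features differ across the normal, polyp, and paralysis groups. The sketch below computes the H statistic with the standard tie correction using only the standard library, to show what the test measures; in practice `scipy.stats.kruskal` returns the same statistic together with a p-value from the chi-square distribution with k - 1 degrees of freedom. Any group variable names in a real call (for example per-subject F1 values for each group) would be hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        # Ranks are assigned on the pooled data, then summed per group.
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction
```

Because the test works on ranks, it does not assume normally distributed formant values, which is one reason it suits clinical voice data.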
5.
J Acoust Soc Am ; 133(4): 1862-6, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23556555

ABSTRACT

Voice quality features such as harmonic structure and spectral tilt are investigated for classifying vocalic segments into one of five boundary tones in the Tones and Break Indices (ToBI) system. Static and nonstatic features are examined, and performance is compared with that of features related to duration, pitch, and amplitude, along with adjacent segment characteristics. Statistical tests show that voice quality features are significant for classifying prosodic boundary tones, especially for distinguishing low-tone boundaries. Classification results using features selected by Kruskal-Wallis tests, by Akaike information criterion values, and by sequential forward search show that using voice quality features leads to lower balanced error rates.


Subject(s)
Acoustics , Speech Acoustics , Speech Production Measurement , Voice Quality , Female , Humans , Male , Models, Statistical , Signal Processing, Computer-Assisted , Time Factors
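Result 5 reports balanced error rates rather than plain error rates. A balanced error rate averages the per-class error rates, so a rare boundary tone counts as much as a frequent one. The short sketch below shows the computation; the function name and the toy labels in the usage note are illustrative only.

```python
def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates, so rare classes weigh as much
    as frequent ones."""
    per_class = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(wrong / len(idx))
    return sum(per_class) / len(per_class)
```

For example, a classifier that labels every segment `"L-L%"` on three `"L-L%"` tokens and one `"H-H%"` token has a plain error rate of 0.25 but a balanced error rate of 0.5, because the minority class is always wrong.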
6.
J Acoust Soc Am ; 131(3): EL197-202, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22423808

ABSTRACT

This paper describes acoustic cues for classifying consonant voicing in a distinctive feature-based speech recognition system. Initial acoustic cues are selected by studying consonant production mechanisms. Spectral representations, band-limited energies, and correlation values, along with Mel-frequency cepstral coefficient (MFCC) features, are also examined. Analysis of variance is performed to assess the relative significance of the features. Overall, classification rates of 82.2%, 80.6%, and 78.4% are obtained on the TIMIT database for stops, fricatives, and affricates, respectively. Combining acoustic parameters with MFCCs improves performance in all cases. In addition, performance on NTIMIT telephone-channel speech shows that the acoustic parameters are more robust than MFCCs.


Subject(s)
Cues , Phonetics , Speech Acoustics , Speech Recognition Software , Telephone , Voice/physiology , Analysis of Variance , Humans
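Result 6 uses analysis of variance to rank voicing cues. For a single feature, a one-way ANOVA compares the spread between class means with the spread within classes; a larger F suggests the feature separates voiced from voiceless tokens more clearly. The sketch below computes F with the standard library (`scipy.stats.f_oneway` provides the same statistic plus a p-value); treating each feature separately is an assumption, since the abstract does not give the exact design.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group mean square divided
    by within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own class mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

Calling it once per candidate cue, with one list per voicing class, and sorting the cues by F gives a simple significance ranking.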
7.
J Acoust Soc Am ; 131(2): 1536-46, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22352523

ABSTRACT

Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, Gaussian mixture model-based extrapolation of acoustic-phonetic features is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding the performance obtained with estimated Mel-frequency cepstral coefficients (MFCCs). Results also show that acoustic-phonetic features may be combined with MFCCs for best performance, suggesting that these features provide information complementary to MFCCs.


Subject(s)
Phonetics , Recognition, Psychology/physiology , Speech Acoustics , Speech Perception/physiology , Telephone , Algorithms , Cues , Humans , Sound Spectrography , Speech Recognition Software
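Result 7 extrapolates acoustic-phonetic features with Gaussian mixture models but does not spell out the estimator. A common choice, and the one assumed here, is the minimum mean-square-error estimate E[y | x] under a GMM fitted jointly to an observable cue x and a missing cue y. The sketch keeps both cues scalar and takes already-fitted component parameters; the dictionary keys and function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gmm_conditional_mean(x, components):
    """MMSE estimate E[y | x] under a joint two-dimensional GMM over (x, y).

    Each component is a dict with keys: weight, mu_x, mu_y, var_x, cov_yx.
    x is the cue still measurable in the degraded channel; y is the cue
    to be extrapolated (both scalars in this sketch).
    """
    # Posterior responsibility of each component given the observed x.
    likelihoods = [c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"])
                   for c in components]
    total = sum(likelihoods)
    estimate = 0.0
    for lik, c in zip(likelihoods, components):
        # Per-component Gaussian conditional mean of y given x.
        cond_mean = c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"])
        estimate += (lik / total) * cond_mean
    return estimate
```

The vector case replaces the scalar ratio `cov_yx / var_x` with the matrix product of the cross-covariance and the inverse covariance of x; the responsibility weighting stays the same.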
8.
J Acoust Soc Am ; 126(3): EL100-6, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19739699

ABSTRACT

This paper proposes an efficient method to improve speaker recognition performance by dynamically controlling the ratio of phoneme class information. It exploits the fact that each phoneme contains a different amount of speaker-discriminative information, which can be measured by mutual information. Phonemes are first grouped into five classes, and the optimal ratio of each class in both training and testing is then adjusted using a nonlinear optimization technique, the Nelder-Mead method. Speaker identification results show that the proposed method achieves an 18% improvement in error rate compared with a baseline system.


Subject(s)
Models, Theoretical , Pattern Recognition, Automated , Pattern Recognition, Physiological , Phonetics , Speech , Algorithms , Animals , Information Theory , Nonlinear Dynamics
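Result 8 measures how much speaker information each phoneme class carries using mutual information. As a minimal sketch, assuming that information is estimated from a table of co-occurrence counts (rows for speakers, columns for some discretized observation within one phoneme class, both assumptions), the function below returns I(S; X) in bits. The class-ratio search itself could then be handed to an off-the-shelf Nelder-Mead routine such as `scipy.optimize.minimize(..., method="Nelder-Mead")`; that step is not shown.

```python
import math


def mutual_information_bits(joint_counts):
    """I(S; X) in bits from a 2-D table of co-occurrence counts."""
    total = sum(sum(row) for row in joint_counts)
    row_p = [sum(row) / total for row in joint_counts]          # p(s)
    col_p = [sum(col) / total for col in zip(*joint_counts)]    # p(x)
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count:  # zero cells contribute nothing to the sum
                p = count / total
                mi += p * math.log2(p / (row_p[i] * col_p[j]))
    return mi
```

A table in which each speaker maps to a distinct observation, such as `[[1, 0], [0, 1]]`, gives 1 bit, while a table with no dependence, such as `[[1, 1], [1, 1]]`, gives 0.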
9.
J Acoust Soc Am ; 122(3): EL88, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17927313

ABSTRACT

This letter describes the perceptual relevance of using the temporal envelope to model the 4-7 kHz frequency band (the high band) of wideband speech signals. Based on theoretical work in psychoacoustics, we find that the temporal envelope can indeed serve as a perceptual cue for the high-band signal, i.e., a noiseless sound can be obtained if the temporal envelope is roughly preserved. Subjective listening tests verify that transparent quality can be obtained when the model is used for the 4.5-7 kHz band. The proposed model offers flexible scalability and reduces the cost of quantization in coding applications.


Subject(s)
Auditory Perception/physiology , Hearing/physiology , Speech Perception/physiology , Speech/physiology , Humans , Models, Biological , Perceptual Masking , Psychoacoustics , Sound Spectrography , Speech Intelligibility
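Result 9 models the high band by its temporal envelope. The sketch below illustrates the idea only: it assumes the input has already been band-pass filtered to the high band, measures the envelope as the RMS of non-overlapping frames, and re-imposes that envelope on some excitation signal (the choice of excitation is an assumption). Frame length and function names are hypothetical, and no quantization or coding step from the letter is reproduced.

```python
import math


def frame_rms_envelope(samples, frame_len):
    """RMS value of each complete, non-overlapping frame of the high band."""
    envelope = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        envelope.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return envelope


def impose_envelope(excitation, envelope, frame_len):
    """Rescale each frame of an excitation signal so its RMS matches the
    target envelope value; all-zero frames are left silent."""
    out = []
    for k, target in enumerate(envelope):
        frame = excitation[k * frame_len:(k + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
        gain = target / rms if rms > 0 else 0.0
        out.extend(s * gain for s in frame)
    return out
```

Only one gain per frame needs to be transmitted in such a scheme, which is consistent with the abstract's point about reduced quantization cost.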
10.
IEEE Trans Syst Man Cybern B Cybern ; 37(4): 980-92, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17702294

ABSTRACT

To replace compromised biometric templates, cancelable biometrics has recently been introduced. The concept is to transform a biometric signal or feature into a new one for enrollment and matching. To generate cancelable fingerprint templates, previous approaches used either the position of a minutia relative to a core point or the absolute position of a minutia in a given fingerprint image. Consequently, a query fingerprint must be accurately aligned with the enrolled fingerprint to obtain identically transformed minutiae. In this paper, we propose a new method for generating cancelable fingerprint templates that do not require alignment. For each minutia, a rotation- and translation-invariant value is computed from the orientation information of neighboring local regions around the minutia. The invariant value is used as the input to two changing functions that output, respectively, the translational and rotational movements of the original minutia in the cancelable template. When a template is compromised, it is replaced by a new one generated by different changing functions. Our approach preserves the original geometric relationships (translation and rotation) between the enrolled and query templates after they are transformed. Therefore, the transformed templates can be used to verify a person without requiring alignment of the input fingerprint images. In our experiments, we evaluated the proposed method in terms of two criteria: performance and changeability. To evaluate performance, we examined how verification accuracy varied when the transformed templates were used for matching. To evaluate changeability, we measured the dissimilarities between the original and transformed templates, and between two differently transformed templates obtained from the same original fingerprint. The experimental results show that the two criteria mutually affect each other and can be controlled by varying the control parameters of the changing functions.


Subject(s)
Algorithms , Artificial Intelligence , Dermatoglyphics/classification , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Humans , Subtraction Technique
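Result 10 feeds a rotation- and translation-invariant value into two changing functions, but the abstract does not define those functions or how the translation is oriented. The sketch below makes two labeled assumptions: the changing functions are lookup tables drawn from a seed (a new seed re-issues the template), and each minutia is shifted along its own direction. Under those assumptions, moving the whole print rigidly before or after the transform gives the same result, which is the property that removes the need for alignment. Computing the invariant value from neighboring orientations is not shown; `v` is simply given.

```python
import math
import random


def make_changing_functions(seed, n_bins=64, max_shift=20.0):
    """Hypothetical changing functions: seeded lookup tables indexed by an
    invariant value quantized from [0, 1). A new seed gives a new template."""
    rng = random.Random(seed)
    shifts = [rng.uniform(0.0, max_shift) for _ in range(n_bins)]
    turns = [rng.uniform(-math.pi, math.pi) for _ in range(n_bins)]

    def bin_of(v):
        return min(max(int(v * n_bins), 0), n_bins - 1)

    return (lambda v: shifts[bin_of(v)]), (lambda v: turns[bin_of(v)])


def transform_minutia(x, y, theta, v, f_shift, f_turn):
    """Move a minutia (x, y, theta) using its invariant value v.

    The translation is taken along the minutia's own direction theta, so
    a rigid motion of the whole print commutes with this transform."""
    d, phi = f_shift(v), f_turn(v)
    return (x + d * math.cos(theta),
            y + d * math.sin(theta),
            (theta + phi) % (2 * math.pi))


def rigid_motion(x, y, theta, alpha, tx, ty):
    """Rotate by alpha about the origin, then translate by (tx, ty)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (c * x - s * y + tx, s * x + c * y + ty, (theta + alpha) % (2 * math.pi))


# Transforming then moving matches moving then transforming, because v is
# assumed to be rotation and translation invariant.
f_shift, f_turn = make_changing_functions(seed=7)
m, v = (10.0, 20.0, 0.3), 0.42
a = rigid_motion(*transform_minutia(*m, v, f_shift, f_turn), 0.8, 5.0, -3.0)
b = transform_minutia(*rigid_motion(*m, 0.8, 5.0, -3.0), v, f_shift, f_turn)
print(a, b)
```

The `max_shift` and `n_bins` parameters stand in for the "control parameters" the abstract mentions: larger shifts make re-issued templates more dissimilar, at some cost to matching accuracy once the invariant value is noisy.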
11.
J Acoust Soc Am ; 118(4): 2579-87, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16266178

ABSTRACT

Acoustic cues related to the voice source, including harmonic structure and spectral tilt, were examined for relevance to prosodic boundary detection. The measurements considered here comprise five categories: duration, pitch, harmonic structure, spectral tilt, and amplitude. Distributions of the measurements and statistical analysis show that the measurements may be used to differentiate between prosodic categories. Detection experiments on the Boston University Radio Speech Corpus show equal error detection rates of around 70% for accent and boundary detection, using only the acoustic measurements described, without any lexical or syntactic information. Further investigation of the detection results shows that duration and amplitude measurements, and, to a lesser degree, pitch measurements, are useful for detecting accents, while all voice source measurements except pitch measurements are useful for boundary detection.


Subject(s)
Cues , Speech Acoustics , Speech/physiology , Analysis of Variance , Humans , Speech Production Measurement
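Result 11 reports detection performance at the equal-error operating point, where the miss rate equals the false-alarm rate. The sketch below approximates that point by sweeping a threshold over the observed scores of target tokens (e.g. accented syllables) and nontarget tokens; the score inputs, function name, and the "score >= threshold means detected" convention are assumptions. The detection rate at that operating point is 1 minus the returned EER.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    A score >= threshold counts as a detection. Returns (eer, threshold)
    at the sweep point where miss and false-alarm rates are closest.
    """
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, t)
    return best[1], best[2]
```

With few tokens the two rates rarely match exactly, so averaging them at the closest sweep point is a common approximation; interpolating the ROC curve between sweep points is a finer alternative.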