1.
J Speech Lang Hear Res ; 61(1): 1-24, 2018 01 22.
Article in English | MEDLINE | ID: mdl-29222538

ABSTRACT

Purpose: The aim of the study was to address the reported inconsistencies in the relationship between objective acoustic measures and perceptual ratings of vocal quality. Method: This tutorial moves away from the more widely examined problems related to obtaining the perceptual ratings and the acoustic measures and centers on less scrutinized issues regarding the procedure used to establish the correspondence. Expressions for the most common measure of association between perceptual and acoustic measures (Pearson's r) are derived using a multiple linear regression model. The particular case where the multiple linear regression involves only roughness and breathiness is discussed to illustrate the issues. Results: Most problems reported regarding inconsistent findings in the relationship between given acoustic measures and particular perceptual ratings could be linked to sample properties not directly related to the actual relationship. The influential sample properties are the collinearity between the regressors in the multiple linear regression and their relative variances. Recommendations on how to rule out this possible cause of inconsistency are given, ranging in scope from data collection and reporting to data manipulation and interpretation of results. Conclusions: The problems described extend to more general cases than the roughness and breathiness example. Ruling out this possible cause of inconsistency would increase the validity of the results reported.
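
The dependence of Pearson's r on sample properties can be illustrated with a small sketch. It does not reproduce the article's own derivation, which the abstract does not give. It assumes a hypothetical two-regressor model in which an acoustic measure y is a linear combination of roughness (x1) and breathiness (x2) plus independent noise; the function name and all parameter values are illustrative only.

```python
import math

def pearson_r_with_first_regressor(b1, b2, sd1, sd2, rho, sd_noise=0.0):
    """Population Pearson r between y = b1*x1 + b2*x2 + noise and x1.

    Assumed model (not taken from the article): x1 and x2 have standard
    deviations sd1 and sd2 and correlation rho; the noise is independent
    of both regressors.
    """
    cov_y_x1 = b1 * sd1**2 + b2 * rho * sd1 * sd2
    var_y = ((b1 * sd1)**2 + (b2 * sd2)**2
             + 2 * b1 * b2 * rho * sd1 * sd2 + sd_noise**2)
    return cov_y_x1 / (sd1 * math.sqrt(var_y))

# Identical regression coefficients (b1 = b2 = 1), different sample properties:
r_uncorrelated = pearson_r_with_first_regressor(1, 1, 1.0, 1.0, rho=0.0)  # about 0.707
r_collinear = pearson_r_with_first_regressor(1, 1, 1.0, 1.0, rho=0.8)     # about 0.949
r_low_variance = pearson_r_with_first_regressor(1, 1, 0.5, 1.0, rho=0.0)  # about 0.447
```

Under these assumptions the underlying relationship between y and x1 is the same in all three cases, yet the correlation changes with the collinearity of the regressors and with their relative variances, which is the kind of effect the abstract describes.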


Subject(s)
Speech Production Measurement/methods , Voice Quality , Auditory Perception , Humans , Linear Models , Speech Acoustics , Voice Disorders/diagnosis
2.
Logoped Phoniatr Vocol ; 41(3): 106-16, 2016 Oct.
Article in English | MEDLINE | ID: mdl-26016644

ABSTRACT

Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.


Subject(s)
Hoarseness/diagnosis , Pattern Recognition, Automated , Signal Processing, Computer-Assisted , Speech Acoustics , Speech Production Measurement/methods , Voice Quality , Acoustics , Adolescent , Adult , Aged , Aged, 80 and over , Chronic Disease , Female , Hoarseness/physiopathology , Humans , Male , Middle Aged , Predictive Value of Tests , Reading , Regression Analysis , Reproducibility of Results , Support Vector Machine , Young Adult
3.
Comput Math Methods Med ; 2015: 316325, 2015.
Article in English | MEDLINE | ID: mdl-26136813

ABSTRACT

Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text "The North Wind and the Sun" were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners' ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis.


Subject(s)
Hoarseness/diagnosis , Signal Processing, Computer-Assisted , Sound Spectrography/methods , Speech , Voice Disorders/diagnosis , Voice Quality , Adolescent , Adult , Aged , Aged, 80 and over , Child , Female , Humans , Male , Middle Aged , Regression Analysis , Reproducibility of Results , Software , Speech Perception , Speech Therapy , Young Adult
4.
Folia Phoniatr Logop ; 66(6): 219-26, 2014.
Article in English | MEDLINE | ID: mdl-25659422

ABSTRACT

OBJECTIVE: Automatic intelligibility assessment using automatic speech recognition is usually language specific. In this study, a language-independent approach is proposed. It uses models that are trained with Flemish speech, and it is applied to assess chronically hoarse German speakers. The research questions are: is it possible to construct suitable acoustic features that generalize to other languages and to a speech disorder, and is the generated model for intelligibility also suitable for specific subtypes of that disorder, i.e. functional and organic dysphonia? PATIENTS AND METHODS: 73 German-speaking persons with chronic hoarseness read the text 'Der Nordwind und die Sonne'. Perceptual intelligibility scores were used as ground truth during the training of an automatic model that converts speaker-level acoustic measurements into intelligibility scores. Cross-validation was used to assess model performance. RESULTS: The interrater agreement values for all patients (n = 73) and for the functional and organic dysphonia subgroups (n = 45 and n = 24) are r = 0.82, r = 0.83 and r = 0.75, respectively. The automatic assessment based on phonologically based acoustic models revealed correlations between perceptual and automatic intelligibility ratings of r = 0.79 (all patients), r = 0.78 (functional dysphonia) and r = 0.80 (organic dysphonia). CONCLUSION: The automatic, objective measurement of intelligibility is a valuable instrument in an evidence-based clinical practice.


Subject(s)
Hoarseness/diagnosis , Hoarseness/psychology , Language , Speech Intelligibility , Speech Recognition Software , Adult , Aged , Aged, 80 and over , Chronic Disease , Dysphonia/diagnosis , Female , Hoarseness/etiology , Humans , Male , Middle Aged , Phonetics , Speech Acoustics , Young Adult
5.
J Voice ; 26(4): 416-24, 2012 Jul.
Article in English | MEDLINE | ID: mdl-21940144

ABSTRACT

OBJECTIVES/HYPOTHESIS: Automatic voice evaluation is usually performed on stable sections of sustained vowels, which often cannot capture hoarseness properly. The measures cepstral peak prominence (CPP) and smoothed CPP (CPPS) do not require exact determination of the cycles of fundamental frequency like established perturbation-based measures. They can also be applied to text recordings. In this study, they were compared with perceptual evaluation of voice quality and the German roughness-breathiness-hoarseness (RBH) scheme. STUDY DESIGN: Retrospective data analysis. METHODS: Seventy-three hoarse patients (48.3±16.8 years) uttered the vowel /e/ and read the German version of the text "The North Wind and the Sun". The text recordings were evaluated perceptually by five speech therapists and physicians according to the RBH scale. The criterion "overall quality" was measured on a 4-point scale and a visual analog scale. For the human-machine correlation, the automatic measures of the Praat program (vowels only) and the "cpps" software were compared with the experts' ratings. The experiments were repeated for speakers with jitter ≤5% or shimmer ≤5% (n=47). RESULTS: For the entire group (n=73), the best human-machine results for most of the rating criteria were obtained for text-based CPP and CPPS (up to |ρ|=0.73). For the 47 selected speakers, the correlation was remarkably worse for all measures but still best for text-based CPP and CPPS (|ρ|≤0.50). CONCLUSIONS: Cepstrum analysis should be performed on a text recording. Then, it outperforms all perturbation-based measures, and it can be a meaningful objective support for perceptual analysis.


Subject(s)
Hoarseness , Speech Acoustics , Adult , Aged , Aged, 80 and over , Chronic Disease , Female , Humans , Male , Middle Aged , Retrospective Studies , Speech Perception , Young Adult
6.
J Voice ; 26(3): 390-7, 2012 May.
Article in English | MEDLINE | ID: mdl-21820272

ABSTRACT

OBJECTIVE: One aspect of voice and speech evaluation after laryngeal cancer is acoustic analysis. Perceptual evaluation by expert raters is a standard in the clinical environment for global criteria such as overall quality or intelligibility. So far, automatic approaches evaluate acoustic properties of pathologic voices based on voiced/unvoiced distinction and fundamental frequency analysis of sustained vowels. Because of the high amount of noisy components and the increasing aperiodicity of highly pathologic voices, a fully automatic analysis of fundamental frequency is difficult. We introduce a purely data-driven system for the acoustic analysis of pathologic voices based on recordings of a standard text. METHODS: Short-time segments of the speech signal are analyzed in the spectral domain, and speaker models based on this information are built. These speaker models act as a clustered representation of the acoustic properties of a person's voice and are thus characteristic for speakers with different kinds and degrees of pathologic conditions. The system is evaluated on two different data sets with speakers reading standardized texts. One data set contains 77 speakers after laryngeal cancer treated with partial removal of the larynx. The other data set contains 54 totally laryngectomized patients, equipped with a Provox shunt valve. Each speaker was rated by five expert listeners regarding three different criteria: strain, voice quality, and speech intelligibility. RESULTS/CONCLUSION: We show correlations for each data set with r and ρ≥0.8 between the automatic system and the mean value of the five raters. The interrater correlation of one rater to the mean value of the remaining raters is in the same range. We thus assume that for selected evaluation criteria, the system can serve as a validated objective support for acoustic voice and speech analysis.
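
The two agreement figures mentioned here (system versus the mean of all raters, and one rater versus the mean of the remaining raters) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function names and the toy ratings are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equally long score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rater_vs_rest(ratings):
    """Correlate each rater with the mean of the remaining raters.

    ratings[k][i] is rater k's score for speaker i.
    """
    results = []
    for k, own_scores in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != k]
        rest_mean = [sum(col) / len(col) for col in zip(*others)]
        results.append(pearson_r(own_scores, rest_mean))
    return results

def system_vs_panel(system_scores, ratings):
    """Correlate automatic scores with the mean rating of the whole panel."""
    panel_mean = [sum(col) / len(col) for col in zip(*ratings)]
    return pearson_r(system_scores, panel_mean)

# Hypothetical toy data: three raters, four speakers.
toy_ratings = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 3, 2, 4]]
per_rater = rater_vs_rest(toy_ratings)
system_r = system_vs_panel([1, 2, 3, 4], toy_ratings)
```

Comparing `system_r` with the values in `per_rater` mirrors the abstract's argument that the system is in the same range as a single human rater. A rank correlation (ρ) would follow the same pattern after converting the scores to ranks.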


Subject(s)
Laryngeal Neoplasms/surgery , Laryngectomy , Models, Statistical , Speech Acoustics , Speech Intelligibility , Speech Production Measurement/methods , Voice Disorders/surgery , Voice Quality , Adult , Aged , Aged, 80 and over , Automation , Germany , Humans , Laryngeal Neoplasms/complications , Laryngeal Neoplasms/physiopathology , Laryngectomy/adverse effects , Larynx, Artificial , Middle Aged , Observer Variation , Predictive Value of Tests , Reading , Regression Analysis , Reproducibility of Results , Signal Processing, Computer-Assisted , Speech, Alaryngeal/instrumentation , Time Factors , Treatment Outcome , Voice Disorders/diagnosis , Voice Disorders/etiology , Voice Disorders/physiopathology
7.
Logoped Phoniatr Vocol ; 36(4): 175-81, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21875389

ABSTRACT

Objective assessment of intelligibility on the telephone is desirable for voice and speech assessment and rehabilitation. A total of 82 patients after partial laryngectomy read a standardized text which was synchronously recorded by a headset and via telephone. Five experienced raters assessed intelligibility perceptually on a five-point scale. Objective evaluation was performed by support vector regression on the word accuracy (WA) and word correctness (WR) of a speech recognition system, and a set of prosodic features. WA and WR alone exhibited correlations to human evaluation between |r| = 0.57 and |r| = 0.75. The correlation was r = 0.79 for headset and r = 0.86 for telephone recordings when prosodic features and WR were combined. The best feature subset was optimal for both signal qualities. It consists of WR, the average duration of the silent pauses before a word, the standard deviation of the fundamental frequency on the entire sample, the standard deviation of jitter, and the ratio of the durations of the voiced sections and the entire recording.
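
For readers unfamiliar with the two recognition measures, the sketch below uses the conventional definitions: word correctness counts substitutions and deletions against the reference text, and word accuracy additionally penalizes inserted words. The abstract does not state the formulas, so these definitions, the function names, and the example sentences are assumptions for illustration.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # cell = (total_edits, substitutions, deletions, insertions)
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = table[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, k)             # match, no cost
            else:
                diagonal = (t + 1, s + 1, d, k)     # substitution
            t, s, d, k = table[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = table[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    return table[n][m][1:]

def word_rates(reference, hypothesis):
    """Return (word_correctness, word_accuracy) in percent."""
    subs, dels, ins = align_counts(reference, hypothesis)
    n = len(reference)
    correctness = 100.0 * (n - subs - dels) / n
    accuracy = 100.0 * (n - subs - dels - ins) / n
    return correctness, accuracy

reference = "the north wind and the sun".split()
recognized = "the north wind and a the son".split()   # one insertion, one substitution
wr, wa = word_rates(reference, recognized)            # about 83.3 and 66.7
```

Because insertions are subtracted, word accuracy defined this way can become negative for very poorly recognized speech, whereas word correctness cannot.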


Subject(s)
Laryngectomy/adverse effects , Signal Processing, Computer-Assisted , Speech Acoustics , Speech Intelligibility , Speech Perception , Speech Recognition Software , Telephone , Voice Quality , Adult , Aged , Aged, 80 and over , Automation , Cluster Analysis , Female , Germany , Humans , Male , Markov Chains , Middle Aged , Sound Spectrography , Speech Production Measurement , Speech, Alaryngeal , Support Vector Machine , Time Factors
8.
Folia Phoniatr Logop ; 63(3): 122-8, 2011.
Article in English | MEDLINE | ID: mdl-20938191

ABSTRACT

Treatment of small carcinoma of the larynx may lead to voice handicap and restricted quality of life. This study examined the relationship between the two. Sixty-five patients aged 62.1 ± 10.0 years rated their voice handicap and quality of life after treatment of T1 (n = 35) or T2 (n = 30) laryngeal carcinoma during regular out-patient examinations. For the self-assessment, the Voice Handicap Index (VHI) and the disease-independent Short Form-36 Health Survey (SF-36) questionnaires were used. Voice handicap (total score 38.9 ± 26.0) did not differ in the two tested groups, T1 and T2, and the data of SF-36 (physical score 43.0 ± 10.7; mental score 50.2 ± 9.1) showed significant differences for the mental score. After treatment of laryngeal carcinoma, patients rated their voice handicap worse than healthy persons did. VHI and SF-36 data were strongly correlated. Voice handicap is significantly related to the quality of life, especially affecting the mental domain. Thus, the rehabilitation of voice disorders should have a beneficial impact on quality of life.


Subject(s)
Carcinoma/therapy , Laryngeal Neoplasms/therapy , Postoperative Complications/etiology , Quality of Life , Voice Disorders/etiology , Adult , Aged , Aged, 80 and over , Carcinoma/drug therapy , Carcinoma/pathology , Carcinoma/radiotherapy , Carcinoma/surgery , Chemotherapy, Adjuvant/adverse effects , Diagnostic Self Evaluation , Emotions , Female , Humans , Laryngeal Neoplasms/drug therapy , Laryngeal Neoplasms/pathology , Laryngeal Neoplasms/radiotherapy , Laryngeal Neoplasms/surgery , Male , Middle Aged , Postoperative Complications/psychology , Radiotherapy, Adjuvant/adverse effects , Severity of Illness Index , Surveys and Questionnaires , Tumor Burden , Voice Disorders/psychology
9.
Folia Phoniatr Logop ; 61(2): 112-6, 2009.
Article in English | MEDLINE | ID: mdl-19321983

ABSTRACT

OBJECTIVE: The Hoarseness Diagram, a program for voice quality analysis used in German-speaking countries, was compared with an automatic speech recognition system with a module for prosodic analysis. The latter computed prosodic features on the basis of a text recording. We examined whether voice analysis of sustained vowels and text analysis correlate in tracheoesophageal speakers. PATIENTS AND METHODS: Test speakers were 24 male laryngectomees with tracheoesophageal substitute speech, age 60.6 ± 8.9 years. Each person read the German version of the text 'The North Wind and the Sun'. Additionally, five sustained vowels were recorded from each patient. The fundamental frequency (F0) detected by both programs was compared for all vowels. The correlation between the measures obtained by the Hoarseness Diagram and the features from the prosody module was computed. RESULTS: Both programs have problems in determining the F0 of highly pathologic voices. Parameters like jitter, shimmer, F0, and irregularity as computed by the Hoarseness Diagram from vowels show correlations of about -0.8 with prosodic features obtained from the text recordings. CONCLUSION: Voice properties can reliably be evaluated both on the basis of vowel and text recordings. Text analysis, however, also offers possibilities for the automatic evaluation of running speech since it realistically represents everyday speech.


Subject(s)
Phonetics , Speech Recognition Software , Speech, Alaryngeal/psychology , Hoarseness/diagnosis , Humans , Male , Middle Aged , Reading , Speech Acoustics , Voice Disorders/diagnosis , Voice Quality
10.
Folia Phoniatr Logop ; 61(1): 12-7, 2009.
Article in English | MEDLINE | ID: mdl-19122460

ABSTRACT

OBJECTIVE: Tracheoesophageal voice is state-of-the-art in voice rehabilitation after laryngectomy. Intelligibility on a telephone is an important evaluation criterion as it is a crucial part of social life. An objective measure of intelligibility when talking on a telephone is desirable in the field of postlaryngectomy speech therapy and its evaluation. PATIENTS AND METHODS: Based upon successful earlier studies with broadband speech, an automatic speech recognition (ASR) system was applied to 41 recordings of postlaryngectomy patients. Recordings were available in different signal qualities; quality was the crucial criterion for this study. RESULTS: Compared to the intelligibility rating of 5 human experts, the ASR system had a correlation coefficient of r = -0.87 and Krippendorff's alpha of 0.65 when broadband speech was processed. The rater group alone achieved alpha = 0.66. With the test recordings in telephone quality, the system reached r = -0.79 and alpha = 0.67. CONCLUSION: For medical purposes, a comprehensive diagnostic approach to (substitute) voice has to cover both subjective and objective tests. An automatic recognition system such as the one proposed in this study can be used for objective intelligibility rating with results comparable to those of human experts. This holds for broadband speech as well as for automatic evaluation via telephone.


Subject(s)
Speech Production Measurement/methods , Speech Recognition Software , Speech, Alaryngeal , Aged , Electronic Data Processing/methods , Female , Humans , Male , Middle Aged , Reproducibility of Results , Speech Intelligibility , Telephone
11.
Eur Arch Otorhinolaryngol ; 264(11): 1315-21, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17571273

ABSTRACT

In comparison with laryngeal voice, substitute voice after laryngectomy is characterized by restricted aero-acoustic properties. Until now, no objective means of measuring prosodic differences between substitute and normal voices has existed. In a pilot study, we applied an automatic prosody analysis module to 18 speech samples of laryngectomees (age: 64.2 ± 8.3 years) and 18 recordings of normal speakers of the same age (65.4 ± 7.6 years). Ninety-five different features per word based upon the speech energy, fundamental frequency F0 and duration measures on words, pauses and voiced/voiceless sections were measured. These reflect aspects of loudness, pitch and articulation rate. Subjective evaluation of the 18 patients' voices was performed by a panel of five experts on the criteria "noise", "speech effort", "roughness", "intelligibility", "match of breath and sense units" and "overall quality". These ratings were compared to the automatically computed features. Several of the features were found to be twice as high for the laryngectomees as for the normal speakers, and vice versa. Comparing the evaluation data of the human experts and the automatic rating, correlation coefficients of up to 0.84 were measured. The automatic analysis serves as a good means to objectify and quantify the global speech outcome of laryngectomees. Even better results are expected once both the computation of the features and the method of comparison with the human ratings have been revised and adapted to the special properties of substitute voices.


Subject(s)
Electronic Data Processing , Speech, Alaryngeal , Speech, Esophageal , Tracheoesophageal Fistula , Voice Quality , Aged , Humans , Laryngectomy , Male , Middle Aged
12.
Int J Pediatr Otorhinolaryngol ; 70(10): 1741-7, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16814875

ABSTRACT

OBJECTIVE: Cleft lip and palate (CLP) may cause functional limitations even after adequate surgical and non-surgical treatment, speech disorders being one of them. These disorders vary considerably between individuals, showing typical articulation specifics such as nasal emission and shift of articulation, and therefore diminished intelligibility. Until now, no objective means to determine and quantify intelligibility has existed. METHOD: An automatic speech recognition system, a new method, was applied to recordings of a standard test for evaluating articulation disorders (psycholinguistic analysis of speech disorders of children, PLAKSS) of 31 children aged 10.1 ± 3.8 years. Two had an isolated cleft lip, 20 a unilateral cleft lip and palate, 4 a bilateral cleft lip and palate, and 5 an isolated cleft palate. The speech recognition system was trained with adults and children without speech disorders and adapted to the speech of children with CLP. In this study, the automatic speech evaluation focused on the word accuracy, which represents the percentage of correctly recognized words. Results were compared with a perceptual evaluation of intelligibility performed by a panel of three experts. RESULTS: The automatic speech recognition yielded word accuracies between 1.2 and 75.8% (mean 48.0 ± 19.6%). The word accuracy was lowest for children with isolated cleft palate (36.9 ± 23.3%) and highest for children with isolated cleft lip (72.8 ± 2.9%). For children with unilateral cleft lip and palate it was 48.0 ± 18.6% and for children with bilateral cleft lip and palate 49.3 ± 9.4%. The automatic evaluation agreed with the experts' subjective evaluation of intelligibility (p<0.01). The multi-rater kappa of the experts alone differed only slightly from the multi-rater kappa of experts and recognizer. CONCLUSION: Automatic speech recognition may serve as a good means to objectify and quantify global speech outcome of children with cleft lip and palate.


Subject(s)
Cleft Lip/physiopathology , Cleft Palate/physiopathology , Speech Disorders/diagnosis , Speech Intelligibility , Adolescent , Audiometry, Speech , Child , Child, Preschool , Cleft Lip/complications , Cleft Palate/complications , Female , Humans , Male , Pilot Projects , Regression Analysis , Speech Disorders/etiology
13.
Eur Arch Otorhinolaryngol ; 263(2): 188-93, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16001246

ABSTRACT

Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties in comparison with laryngeal speech and therefore has lower intelligibility. Until now, an objective means to determine and quantify the intelligibility has not existed, although the intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied to recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained with normal laryngeal speakers and not adapted to severely disturbed voices. Substitute speech was compared to laryngeal speech of a control group. Subjective evaluation of intelligibility was performed by a panel of five experts and compared to automatic speech evaluation. Substitute speech showed a lower syllable rate (syllables/s) and lower word accuracy than laryngeal speech. Automatic speech recognition for substitute speech yielded word accuracy between 10.0 and 50% (28.7 ± 12.1%) with sufficient discrimination. It agreed with the experts' subjective evaluations of intelligibility. The multi-rater kappa of the experts alone did not differ from the multi-rater kappa of experts and the recognizer. Automatic speech recognition serves as a good means to objectify and quantify global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices and can also be applied in other languages.


Subject(s)
Laryngeal Neoplasms/surgery , Laryngectomy/rehabilitation , Larynx, Artificial , Speech Intelligibility , Speech Recognition Software , Voice Quality , Humans , Male , Middle Aged