Results 1 - 10 of 10
1.
J Voice ; 2023 Feb 09.
Article in English | MEDLINE | ID: mdl-36774263

ABSTRACT

Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that neural embeddings capture information about the change in vocal characteristics of a speaker during prolonged voice usage. We show that vocal fatigue can be reliably predicted using all three types of neural embeddings after 40 minutes of continuous speaking when temporal smoothing and normalization are applied to the extracted embeddings. We employ support vector machines for classification and achieve accuracy scores of 81% using x-vectors, 85% using ECAPA-TDNN embeddings, and 82% using wav2vec 2.0 embeddings as input features. We obtain an accuracy score of 76% when the trained system is applied to a different speaker and recording environment without any adaptation.
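
The abstract does not say which smoothing or normalization was used. As a minimal sketch only, assuming a centered moving average over consecutive embedding vectors followed by per-dimension z-normalization (the function names and the window size are hypothetical, not taken from the paper), the preprocessing before a classifier such as a support vector machine might look like this:

```python
from statistics import mean, pstdev


def moving_average(frames, window=5):
    """Smooth a time-ordered list of embedding vectors with a centered
    moving average. The window is truncated at the sequence edges.
    Assumption: the paper's actual smoothing method is not stated."""
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        chunk = frames[lo:hi]
        # Average each embedding dimension over the frames in the window.
        smoothed.append([mean(dim) for dim in zip(*chunk)])
    return smoothed


def z_normalize(frames):
    """Scale every embedding dimension to zero mean and unit variance.
    Constant dimensions are left at zero instead of dividing by zero."""
    columns = list(zip(*frames))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)]
            for row in frames]
```

The resulting vectors would then serve as input features for the classifier; the classifier itself is not shown here.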

3.
ORL J Otorhinolaryngol Relat Spec ; 79(5): 282-294, 2017.
Article in English | MEDLINE | ID: mdl-29131113

ABSTRACT

PURPOSE: To assess whether postlingual onset and shorter duration of deafness before cochlear implant (CI) provision predict higher speech intelligibility results of CI users. METHODS: For an objective judgement of speech intelligibility, we used an automatic speech recognition system computing the word recognition rate (WR) of 50 adult CI users and 50 age-matched control individuals. All subjects were recorded reading a standardized text. Subjects were divided into three groups: pre- or perilingual deafness (A), both >2 years before implantation, postlingual deafness <2 years before implantation (B), or postlingual deafness >2 years before implantation (C). RESULTS: CI users with short duration of postlingual deafness (B) had a significantly higher WR (median 74%) than CI users with long duration of postlingual deafness (C; 68%, p < 0.001) or pre-/perilingual onset (A; 56%, p < 0.001). Compared to their control groups, only CI users with short duration of postlingual deafness reached similar WR; the others showed significantly lower WR. Other factors such as hearing loss onset, duration of CI use, or duration of amplified hearing showed no consistent influence on speech quality. CONCLUSIONS: The speech production quality of adult CI users shows dependencies on the onset and duration of deafness. These features need to be considered while planning rehabilitation.


Subject(s)
Cochlear Implantation/methods , Hearing Loss/therapy , Speech Intelligibility/physiology , Speech Perception/physiology , Adolescent , Adult , Aged , Aged, 80 and over , Cochlear Implants , Female , Hearing Loss/physiopathology , Humans , Male , Middle Aged , Speech , Speech Production Measurement/methods , Time Factors , Young Adult
4.
Int J Prosthodont ; 27(1): 61-9, 2014.
Article in English | MEDLINE | ID: mdl-24392479

ABSTRACT

PURPOSE: Tooth loss and its prosthetic rehabilitation significantly affect speech intelligibility. However, little is known about the influence of speech deficiencies on oral health-related quality of life (OHRQoL). The aim of this study was to investigate whether speech intelligibility enhancement through prosthetic rehabilitation significantly influences OHRQoL in patients wearing complete maxillary dentures. Speech intelligibility by means of an automatic speech recognition system (ASR) was prospectively evaluated and compared with subjectively assessed Oral Health Impact Profile (OHIP) scores. MATERIALS AND METHODS: Speech was recorded in 28 edentulous patients 1 week prior to the fabrication of new complete maxillary dentures and 6 months thereafter. Speech intelligibility was computed based on the word accuracy (WA) by means of an ASR and compared with a matched control group. One week before and 6 months after rehabilitation, patients assessed themselves for OHRQoL. RESULTS: Speech intelligibility improved significantly after 6 months. Subjects reported a significantly higher OHRQoL after maxillary rehabilitation with complete dentures. No significant correlation was found between the OHIP sum score or its subscales and the WA. CONCLUSION: Speech intelligibility enhancement achieved through the fabrication of new complete maxillary dentures might not be in the forefront of the patients' perception of their quality of life. For the improvement of OHRQoL in patients wearing complete maxillary dentures, food intake and mastication as well as freedom from pain play a more prominent role.


Subject(s)
Denture, Complete, Upper , Oral Health , Quality of Life , Speech Intelligibility/physiology , Adult , Aged , Aged, 80 and over , Case-Control Studies , Denture Bases , Denture Design , Denture Retention , Denture, Complete, Upper/psychology , Eating/physiology , Female , Follow-Up Studies , Humans , Interpersonal Relations , Irritable Mood , Jaw, Edentulous/rehabilitation , Male , Middle Aged , Polymethyl Methacrylate/chemistry , Prospective Studies , Speech Recognition Software
5.
Int J Prosthodont ; 25(1): 24-32, 2012.
Article in English | MEDLINE | ID: mdl-22259792

ABSTRACT

PURPOSE: A completely edentulous or partially edentulous maxilla involving missing anterior teeth may impact speech production and lead to reduced speech intelligibility. The aim of this study was to prospectively evaluate the effect of a dental prosthetic rehabilitation on speech intelligibility in patients with a toothless or interrupted maxillary arch by means of an automatic, standardized speech recognition system. MATERIALS AND METHODS: The speech intelligibility of 45 patients with complete tooth loss or a loss including missing anterior teeth in the maxilla was evaluated by means of a polyphone-based automatic speech recognition system that assessed the percentage of correctly recognized words (word accuracy). To replace inadequate maxillary removable dentures, 20 patients from the overall sample had been rehabilitated with complete dentures and 25 patients with telescopic prostheses. Speech recordings were made in four recording sessions (with and without existing prostheses and then at 1 week and 6 months after placement of newly fabricated prostheses). RESULTS: Significantly higher speech intelligibility was observed in both patient groups compared to the original results without the dentures inserted. After 6 months of adaptation, both groups had reached a level of speech quality that was comparable to the healthy control group. However, patients receiving new telescopic prostheses showed significantly higher levels of speech intelligibility compared to those receiving new complete dentures. Within 6 months, speech intelligibility did not significantly improve from the level found 1 week after insertion of new prostheses for both groups. CONCLUSION: Patients benefit from the fabrication of new dentures in terms of speech intelligibility, regardless of the type of prosthesis. However, telescopic crown prostheses yield significantly better speech quality compared to complete dentures.
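
The abstract gives no formula for the word accuracy. The following is a minimal sketch under the common speech-recognition definitions, which are an assumption here: the recognized word sequence is aligned to the reference text by minimum edit distance, the word recognition rate counts correctly recognized words relative to the reference length, and the word accuracy additionally subtracts inserted words. All function names are hypothetical.

```python
def align_counts(reference, hypothesis):
    """Return (hits, substitutions, deletions, insertions) from a
    minimum-edit-distance alignment of two word lists."""
    n, m = len(reference), len(hypothesis)
    # Each cell holds (distance, hits, substitutions, deletions, insertions).
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, n + 1):
        d, h, s, dl, ins = table[i - 1][0]
        table[i][0] = (d + 1, h, s, dl + 1, ins)
    for j in range(1, m + 1):
        d, h, s, dl, ins = table[0][j - 1]
        table[0][j] = (d + 1, h, s, dl, ins + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, h, s, dl, ins = table[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (d, h + 1, s, dl, ins)      # correct word
            else:
                diag = (d + 1, h, s + 1, dl, ins)  # substituted word
            d, h, s, dl, ins = table[i - 1][j]
            up = (d + 1, h, s, dl + 1, ins)        # deleted word
            d, h, s, dl, ins = table[i][j - 1]
            left = (d + 1, h, s, dl, ins + 1)      # inserted word
            table[i][j] = min(diag, up, left, key=lambda c: c[0])
    return table[n][m][1:]


def word_recognition_rate(reference, hypothesis):
    """Percentage of reference words recognized correctly (assumed definition)."""
    hits, _, _, _ = align_counts(reference, hypothesis)
    return 100.0 * hits / len(reference)


def word_accuracy(reference, hypothesis):
    """Like the recognition rate, but inserted words are subtracted
    (assumed definition); it can become negative for very poor output."""
    hits, _, _, insertions = align_counts(reference, hypothesis)
    return 100.0 * (hits - insertions) / len(reference)
```

Under these definitions the word accuracy is never higher than the word recognition rate, because every wrongly inserted word lowers it.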


Subject(s)
Denture, Complete, Upper , Denture, Overlay , Denture, Partial , Jaw, Edentulous, Partially/rehabilitation , Jaw, Edentulous/rehabilitation , Maxilla/pathology , Speech Intelligibility/physiology , Adaptation, Physiological/physiology , Adult , Aged , Aged, 80 and over , Chromium Alloys/chemistry , Dental Materials/chemistry , Denture Design , Denture Precision Attachment , Denture Retention , Female , Follow-Up Studies , Humans , Jaw, Edentulous, Partially/classification , Male , Middle Aged , Polymethyl Methacrylate/chemistry , Prospective Studies , Speech Production Measurement/methods , Speech Recognition Software , Treatment Outcome
7.
J Voice ; 26(3): 390-7, 2012 May.
Article in English | MEDLINE | ID: mdl-21820272

ABSTRACT

OBJECTIVE: One aspect of voice and speech evaluation after laryngeal cancer is acoustic analysis. Perceptual evaluation by expert raters is a standard in the clinical environment for global criteria such as overall quality or intelligibility. So far, automatic approaches evaluate acoustic properties of pathologic voices based on voiced/unvoiced distinction and fundamental frequency analysis of sustained vowels. Because of the high amount of noisy components and the increasing aperiodicity of highly pathologic voices, a fully automatic analysis of fundamental frequency is difficult. We introduce a purely data-driven system for the acoustic analysis of pathologic voices based on recordings of a standard text. METHODS: Short-time segments of the speech signal are analyzed in the spectral domain, and speaker models based on this information are built. These speaker models act as a clustered representation of the acoustic properties of a person's voice and are thus characteristic for speakers with different kinds and degrees of pathologic conditions. The system is evaluated on two different data sets with speakers reading standardized texts. One data set contains 77 speakers after laryngeal cancer treated with partial removal of the larynx. The other data set contains 54 totally laryngectomized patients, equipped with a Provox shunt valve. Each speaker was rated by five expert listeners regarding three different criteria: strain, voice quality, and speech intelligibility. RESULTS/CONCLUSION: We show correlations for each data set with r and ρ ≥ 0.8 between the automatic system and the mean value of the five raters. The interrater correlation of one rater to the mean value of the remaining raters is in the same range. We thus assume that for selected evaluation criteria, the system can serve as a validated objective support for acoustic voice and speech analysis.
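
The comparison of one rater with the mean of the remaining raters can be sketched as follows. This is an illustration only, assuming that r denotes Pearson's correlation coefficient and ρ Spearman's rank correlation; the function names and the data layout (one list of per-speaker scores per rater) are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson's correlation coefficient r of two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))


def rater_vs_rest(ratings, index):
    """Correlate rater `index` with the per-speaker mean of all other raters.
    `ratings` holds one list of scores per rater, one score per speaker."""
    rest = [r for k, r in enumerate(ratings) if k != index]
    rest_mean = [mean(scores) for scores in zip(*rest)]
    return pearson(ratings[index], rest_mean), spearman(ratings[index], rest_mean)
```

The same two functions could be applied to the automatic scores and the mean of all five raters, which makes the system's correlation directly comparable with the interrater values.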


Subject(s)
Laryngeal Neoplasms/surgery , Laryngectomy , Models, Statistical , Speech Acoustics , Speech Intelligibility , Speech Production Measurement/methods , Voice Disorders/surgery , Voice Quality , Adult , Aged , Aged, 80 and over , Automation , Germany , Humans , Laryngeal Neoplasms/complications , Laryngeal Neoplasms/physiopathology , Laryngectomy/adverse effects , Larynx, Artificial , Middle Aged , Observer Variation , Predictive Value of Tests , Reading , Regression Analysis , Reproducibility of Results , Signal Processing, Computer-Assisted , Speech, Alaryngeal/instrumentation , Time Factors , Treatment Outcome , Voice Disorders/diagnosis , Voice Disorders/etiology , Voice Disorders/physiopathology
8.
J Oral Maxillofac Surg ; 69(5): 1493-500, 2011 May.
Article in English | MEDLINE | ID: mdl-21216061

ABSTRACT

PURPOSE: Treatment of oral carcinomas often causes reduced speech intelligibility. It was the aim of this study to objectively evaluate the speech intelligibility of patients after multimodal therapy for oral squamous cell carcinoma (OSCC) with a computer-based, automatic speech recognition system. MATERIALS AND METHODS: The speech intelligibility of 59 patients after multimodal tumor treatment for OSCC, located at the lateral tongue, floor of the mouth, or the alveolar crest of the lower jaw, was objectively analyzed by a computer-based speech recognition system that calculates the percentage of correct word recognition (WR). RESULTS: The patients' WR was significantly reduced compared with a healthy control group without speech impairment (P ≤ .001). Higher T-classification was associated with a reduced WR (P < .01). Tumors located at the tongue showed a significantly higher WR than tumors at the floor of the mouth or the alveolar crest (P ≤ .001). Surgical resection and reconstruction of the lower jaw bone significantly reduced the WR (P ≤ .001) compared with cases without osseous tumor infiltration. CONCLUSIONS: Speech intelligibility after treatment for OSCC, objectively quantified by a standardized automatic speech recognition system, is reduced for increasing tumor size, increasing resection volume, and tumor localization near the lower jaw. Surgical reconstruction techniques seem to have an impact on speech intelligibility.


Subject(s)
Carcinoma, Squamous Cell/surgery , Mouth Neoplasms/surgery , Speech Intelligibility/physiology , Speech Recognition Software , Adolescent , Adult , Aged , Aged, 80 and over , Alveolectomy/methods , Cohort Studies , Cross-Sectional Studies , Female , Humans , Male , Mandible/surgery , Mandibular Neoplasms/surgery , Middle Aged , Mouth Floor/surgery , Neck Dissection , Neoadjuvant Therapy , Neoplasm Staging , Radiotherapy, Adjuvant , Plastic Surgery Procedures , Speech Therapy , Surgical Flaps , Tongue Neoplasms/surgery , Young Adult
9.
J Acoust Soc Am ; 126(5): 2589-602, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19894838

ABSTRACT

Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy-to-apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such characteristics by a semiautomatic procedure and the second to evaluate a fully automatic, thus simple, procedure. The methods are based on a combination of speech processing algorithms. The semiautomatic method achieves moderate to good agreement (κ ≈ 0.6) for the detection of all phonetic disorders. On a speaker level, significant correlations between the perceptual evaluation and the automatic system of 0.89 are obtained. The fully automatic system yields a correlation on the speaker level of 0.81 to the perceptual evaluation. This correlation is in the range of the inter-rater correlation of the listeners. The automatic speech evaluation is able to detect phonetic disorders at an expert's level without any additional human postprocessing.
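
The abstract does not specify which kappa statistic was used. As a minimal sketch, assuming Cohen's kappa between the automatic decision and a single reference label per phoneme (the function name and label encoding are hypothetical), chance-corrected agreement can be computed like this:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences.
    Undefined (division by zero) when expected agreement is 1,
    for example when both sequences use one identical label throughout."""
    n = len(labels_a)
    # Share of items on which both sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from the two label distributions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

A value of 0 means agreement at chance level and 1 means perfect agreement, so a kappa near 0.6 lies well above chance but short of full agreement.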


Subject(s)
Articulation Disorders/diagnosis , Articulation Disorders/etiology , Cleft Lip/complications , Cleft Palate/complications , Models, Biological , Algorithms , Child , Humans , Phonation , Phonetics , Psycholinguistics , Speech Therapy
10.
Folia Phoniatr Logop ; 61(2): 112-6, 2009.
Article in English | MEDLINE | ID: mdl-19321983

ABSTRACT

OBJECTIVE: The Hoarseness Diagram, a program for voice quality analysis used in German-speaking countries, was compared with an automatic speech recognition system with a module for prosodic analysis. The latter computed prosodic features on the basis of a text recording. We examined whether voice analysis of sustained vowels and text analysis correlate in tracheoesophageal speakers. PATIENTS AND METHODS: Test speakers were 24 male laryngectomees with tracheoesophageal substitute speech, age 60.6 ± 8.9 years. Each person read the German version of the text 'The North Wind and the Sun'. Additionally, five sustained vowels were recorded from each patient. The fundamental frequency (F0) detected by both programs was compared for all vowels. The correlation between the measures obtained by the Hoarseness Diagram and the features from the prosody module was computed. RESULTS: Both programs have problems in determining the F0 of highly pathologic voices. Parameters like jitter, shimmer, F0, and irregularity as computed by the Hoarseness Diagram from vowels show correlations of about -0.8 with prosodic features obtained from the text recordings. CONCLUSION: Voice properties can reliably be evaluated both on the basis of vowel and text recordings. Text analysis, however, also offers possibilities for the automatic evaluation of running speech since it realistically represents everyday speech.
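
For orientation, jitter and shimmer describe cycle-to-cycle variation of the glottal period and of the peak amplitude. The sketch below uses the common textbook "local" definitions, which are an assumption here; the Hoarseness Diagram may compute these parameters differently, and all function names are hypothetical. It presumes that period lengths and peak amplitudes have already been extracted from a sustained vowel.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value. Needs at least two values."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the period lengths (in seconds)."""
    return relative_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitudes."""
    return relative_perturbation(peak_amplitudes)


def mean_f0(periods_s):
    """Mean fundamental frequency in Hz from period lengths in seconds."""
    return len(periods_s) / sum(periods_s)
```

These measures depend on a reliable period detection, which is the step the abstract reports as difficult for highly pathologic voices.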


Subject(s)
Phonetics , Speech Recognition Software , Speech, Alaryngeal/psychology , Hoarseness/diagnosis , Humans , Male , Middle Aged , Reading , Speech Acoustics , Voice Disorders/diagnosis , Voice Quality