Results 1 - 7 of 7
1.
J Speech Lang Hear Res ; 64(6S): 2134-2153, 2021 06 18.
Article in English | MEDLINE | ID: mdl-33979177

ABSTRACT

Purpose: This study aimed to evaluate a novel communication system designed to translate surface electromyographic (sEMG) signals from articulatory muscles into speech using a personalized, digital voice. The system was evaluated for word recognition, prosodic classification, and listener perception of synthesized speech. Method: sEMG signals were recorded from the face and neck as speakers with (n = 4) and without (n = 4) laryngectomy subvocally recited (silently mouthed) a speech corpus comprising 750 phrases (150 phrases with variable phrase-level stress). Corpus tokens were then translated into speech via personalized voice synthesis (n = 8 synthetic voices) and compared against phrases produced by each speaker when using their typical mode of communication (n = 4 natural voices, n = 4 electrolaryngeal [EL] voices). Naïve listeners (n = 12) evaluated synthetic, natural, and EL speech for acceptability and intelligibility in a visual sort-and-rate task, as well as phrasal stress discriminability via a classification mechanism. Results: Recorded sEMG signals were processed to translate sEMG muscle activity into lexical content and categorize variations in phrase-level stress, achieving a mean accuracy of 96.3% (SD = 3.10%) and 91.2% (SD = 4.46%), respectively. Synthetic speech was significantly higher in acceptability and intelligibility than EL speech, also leading to greater phrasal stress classification accuracy, whereas natural speech was rated as the most acceptable and intelligible, with the greatest phrasal stress classification accuracy. Conclusion: This proof-of-concept study establishes the feasibility of using subvocal sEMG-based alternative communication not only for lexical recognition but also for prosodic communication in healthy individuals, as well as those living with vocal impairments and residual articulatory function. Supplemental Material: https://doi.org/10.23641/asha.14558481.


Subjects
Speech Perception , Voice , Electromyography , Humans , Laryngectomy , Speech , Speech Intelligibility
2.
J Neural Eng ; 15(4): 046031, 2018 08.
Article in English | MEDLINE | ID: mdl-29855428

ABSTRACT

OBJECTIVE: Speech is among the most natural forms of human communication, thereby offering an attractive modality for human-machine interaction through automatic speech recognition (ASR). However, the limitations of ASR (including degradation in the presence of ambient noise, limited privacy and poor accessibility for those with significant speech disorders) have motivated the need for alternative non-acoustic modalities of subvocal or silent speech recognition (SSR). APPROACH: We have developed a new system of face- and neck-worn sensors and signal processing algorithms that are capable of recognizing silently mouthed words and phrases entirely from the surface electromyographic (sEMG) signals recorded from muscles of the face and neck that are involved in the production of speech. The algorithms were strategically developed by evolving speech recognition models: first for recognizing isolated words by extracting speech-related features from sEMG signals, then for recognizing sequences of words from patterns of sEMG signals using grammar models, and finally for recognizing a vocabulary of previously untrained words using phoneme-based models. The final recognition algorithms were integrated with specially designed multi-point, miniaturized sensors that can be arranged in flexible geometries to record high-fidelity sEMG signal measurements from small articulator muscles of the face and neck. MAIN RESULTS: We tested the system of sensors and algorithms during a series of subvocal speech experiments involving more than 1200 phrases generated from a 2200-word vocabulary and achieved an 8.9% word error rate (91.1% recognition rate), far surpassing previous attempts in the field.
SIGNIFICANCE: These results demonstrate the viability of our system as an alternative modality of communication for a multitude of applications including: persons with speech impairments following a laryngectomy; military personnel requiring hands-free covert communication; or the consumer in need of privacy while speaking on a mobile phone in public.
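
The abstract above reports an 8.9% word error rate alongside a 91.1% recognition rate. As a rough illustration of how such a figure is conventionally obtained, the following minimal sketch computes word error rate as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example phrases are hypothetical, and the abstract does not state the exact scoring procedure the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: whitespace tokenization and unit cost for each
    substitution, deletion, and insertion (the conventional definition).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion in four words.
wer = word_error_rate("a b c d", "a x c")
recognition_rate = 1.0 - wer
```

Reporting the recognition rate as one minus the word error rate matches the pairing of 8.9% and 91.1% in the abstract. Because insertions count as errors, the word error rate can in general exceed 100%, so that complement is only meaningful when errors are few.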


Subjects
Algorithms , Electromyography/methods , Electromyography/trends , Speech Perception/physiology , Speech Recognition Software/trends , Adult , Facial Muscles/physiology , Female , Humans , Male , Neck Muscles/physiology , Young Adult
3.
IEEE/ACM Trans Audio Speech Lang Process ; 25(12): 2386-2398, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29552581

ABSTRACT

Each year thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to verbally communicate. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from the neck and face, and used for automatic speech recognition to provide speech-to-text or synthesized speech as an alternative means of communication. This is true even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, 8 individuals at least 6 months after total laryngectomy were recorded using 8 sEMG sensors on their face (4) and neck (4) while reading phrases constructed from a 2,500-word vocabulary. A unique set of phrases was used for training phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used for testing word recognition of the models based on phoneme identification from running speech. Word error rates were on average 10.3% for the full 8-sensor set (averaging 9.5% for the top 4 participants), and 13.6% when reducing the sensor set to 4 locations per individual (n = 7). This study provides a compelling proof-of-concept for sEMG-based alaryngeal speech recognition, with the strong potential to further improve recognition performance.

4.
Article in English | MEDLINE | ID: mdl-22255424

ABSTRACT

sEMG-based silent speech recognition systems seek to bypass the limitations of acoustic speech recognition by measuring and interpreting muscle activity of the facial and neck musculature involved in speech production. However, this speech recognition modality introduces unique challenges of its own. This paper describes signal acquisition and processing strategies that we have employed to address these challenges during our development of a silent speech recognition system.


Subjects
Electromyography/methods , Signal Processing, Computer-Assisted , Speech , Humans
5.
Clin Linguist Phon ; 24(9): 742-58, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20687828

ABSTRACT

This study investigated the relationship between acoustic spectral/cepstral measures and listener severity ratings in normal and disordered voice samples. CAPE-V sentence samples and the vowel /a/ were elicited from eight normal speakers and 24 patients with varying degrees of dysphonia severity. Samples were analysed for measures of the cepstral peak prominence (CPP), the ratio of low-to-high spectral energy, and their respective standard deviations. Perceptual ratings of overall severity were also obtained for all samples. Results showed that all acoustic variables combined in a four-factor model which correlated with perceived severity with R = 0.81 (R² = 0.65). For the vowel /a/, a five-factor model incorporating all acoustic variables and gender correlated with perceived severity with R = 0.96 (R² = 0.91). Results indicate that a strong relationship between perceptual and acoustic estimates of dysphonia severity can be achieved in both continuous speech and vowel contexts using a model incorporating spectral/cepstral measures.


Subjects
Acoustics , Auditory Perception , Speech Disorders/psychology , Voice Quality , Adult , Aged , Dysphonia/diagnosis , Dysphonia/psychology , Female , Humans , Judgment , Language , Loudness Perception , Male , Middle Aged , Reference Values , Speech
6.
J Speech Lang Hear Res ; 48(4): 766-79, 2005 Aug.
Article in English | MEDLINE | ID: mdl-16378472

ABSTRACT

A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified acoustic abnormalities to the perception of degraded quality in EL speech. Ten normal listeners evaluated the sound quality of EL speech tokens that had been acoustically enhanced by (a) increased low-frequency energy, (b) EL-noise reduction, and (c) fundamental frequency variation to mimic normal pitch intonation in relation to nonenhanced EL speech, normal speech, and normal monotonous speech (fundamental frequency variation removed). In comparing all possible combinations of token pairs, listeners were asked to identify which one of each pair sounded most like normal natural speech, and then to rate on a visual analog scale how different the chosen token was from normal speech. The results indicate that although EL speech can be most improved by removing the EL noise and providing proper pitch information, the resulting quality is still well below that of normal natural speech or even that of monotonous natural speech. This suggests that, in addition to the widely acknowledged acoustic abnormalities examined in this investigation, there are other attributes that contribute significantly to the unnatural quality of EL speech. Such additional factors need to be clearly identified and remedied before EL speech can be made to more closely approximate the sound quality of normal natural speech.


Subjects
Phonetics , Speech Acoustics , Speech Perception , Speech, Alaryngeal , Voice Quality , Female , Humans , Laryngectomy , Larynx, Artificial , Male , Models, Biological
7.
J Acoust Soc Am ; 114(2): 1035-47, 2003 Aug.
Article in English | MEDLINE | ID: mdl-12942982

ABSTRACT

Measurements of the neck frequency response function (NFRF), defined as the ratio of the spectrum of the estimated volume velocity that excites the vocal tract to the spectrum of the acceleration delivered to the neck wall, were made at three different positions on the necks of nine laryngectomized subjects (five males and four females) and four normal laryngeal speakers (two males and two females). A minishaker driven by broadband noise provided excitation to the necks of subjects as they configured their vocal tracts to mimic the production of the vowels /a/, /ae/, and /I/. The sound pressure at the lips was measured with a microphone and an impedance head mounted on the shaker measured the acceleration. The neck wall passed low-frequency sound energy better than high-frequency sound energy, and thus the NFRF was accurately modeled as a low-pass filter. The NFRFs of the different subject groups (female laryngeal, male laryngeal speakers, laryngectomized males, and laryngectomized females) differed from each other in terms of corner frequency and gain, with both types of male subjects presenting NFRFs with larger overall gains. In addition, there was a notable amount of intersubject variability within groups. Because the NFRF is an estimate of how sound energy passes through the neck wall, these results should aid in the design of improved neck-type electrolarynx devices.
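
The abstract above defines the NFRF as a ratio of two spectra and describes it as a low-pass filter characterized by a corner frequency and a gain. The sketch below illustrates both ideas under stated assumptions: the abstract does not give the filter order, so a first-order response is assumed purely for illustration, and the function names and sample values are hypothetical rather than taken from the study.

```python
import math


def nfrf_magnitude(volume_velocity_spectrum, acceleration_spectrum):
    """Bin-by-bin magnitude of the ratio between the spectrum of the
    estimated volume velocity and the spectrum of the neck-wall acceleration.

    Both inputs are equal-length sequences of complex spectral values.
    """
    return [
        abs(u) / abs(a)
        for u, a in zip(volume_velocity_spectrum, acceleration_spectrum)
    ]


def lowpass_gain(freq_hz, corner_hz, dc_gain):
    """Magnitude of a first-order low-pass response (assumed order).

    The response equals dc_gain at 0 Hz and falls to dc_gain / sqrt(2)
    at the corner frequency.
    """
    return dc_gain / math.sqrt(1.0 + (freq_hz / corner_hz) ** 2)


# Hypothetical values, not measurements from the study.
ratio = nfrf_magnitude([2 + 0j, 1j], [1 + 0j, 2 + 0j])
gain_at_corner = lowpass_gain(500.0, corner_hz=500.0, dc_gain=2.0)
```

In this parametrization, a group with a larger overall gain would have a larger `dc_gain`, and a group whose neck wall attenuates high frequencies sooner would have a lower `corner_hz`.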


Subjects
Laryngectomy , Larynx, Artificial , Neck , Acoustic Stimulation/instrumentation , Aged , Equipment Design , Female , Humans , Lip/physiology , Male , Middle Aged , Models, Biological