Results 1 - 6 of 6
1.
J Speech Lang Hear Res; 66(8S): 3206-3221, 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37146629

ABSTRACT

PURPOSE: Current electromagnetic tongue-tracking devices are not amenable to daily use and are thus not suitable for silent speech interfaces and other applications. We have recently developed MagTrack, a novel wearable electromagnetic articulograph tongue-tracking device. This study aimed to validate MagTrack for potential silent speech interface applications. METHOD: We conducted two experiments: (a) classification of eight isolated vowels in consonant-vowel-consonant form and (b) continuous silent speech recognition. In these experiments, we used data from healthy adult speakers collected with MagTrack. Vowel classification performance was measured by classification accuracy, and continuous silent speech recognition was measured by phoneme error rate. Performance was then compared with results obtained using data collected with a commercial electromagnetic articulograph in a prior study. RESULTS: Isolated vowel classification using MagTrack achieved an average accuracy of 89.74% when leveraging all MagTrack signals (x, y, z coordinates; orientation; and magnetic signals), which outperformed the accuracy obtained with commercial electromagnetic articulograph data (only y, z coordinates) in our previous study. Continuous speech recognition from two subjects using MagTrack achieved phoneme error rates of 73.92% and 66.73%, respectively. The commercial electromagnetic articulograph achieved 64.53% for the same subject (66.73% using MagTrack data). CONCLUSIONS: MagTrack showed results comparable to the commercial electromagnetic articulograph when using the same localized information. Adding raw magnetic signals improved MagTrack's performance. Our preliminary testing demonstrated MagTrack's potential as a lightweight wearable device for silent speech interfaces. This work also lays the foundation for MagTrack's potential use in other applications, including visual feedback-based speech therapy and second language learning.
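As a rough illustration of the vowel-classification setup described above, the sketch below trains a simple classifier on fixed-length feature vectors derived from multi-channel articulatory signals (coordinates, orientation, and magnetic readings) and reports cross-validated accuracy. The feature dimensions, classifier choice, and placeholder data are assumptions for illustration; the study does not specify this exact pipeline.

```python
# Minimal sketch: classifying isolated vowels from MagTrack-style articulatory
# features (x/y/z coordinates, orientation, raw magnetic signals).
# Data loading and feature extraction are assumed; all values are placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one fixed-length feature vector per CVC token (e.g., resampled sensor
#    trajectories concatenated across channels); y: vowel label per token.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 120))      # placeholder for real articulatory features
y = rng.integers(0, 8, size=200)     # 8 vowel classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
scores = cross_val_score(clf, X, y, cv=5)   # accuracy per fold
print(f"mean classification accuracy: {scores.mean():.3f}")
```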


Subjects
Speech Perception, Speech, Adult, Humans, Phonetics, Motion (Physics), Tongue, Sensory Feedback
2.
ChemSusChem; 16(11): e202202184, 2023 Jun 09.
Article in English | MEDLINE | ID: mdl-36814358

ABSTRACT

Construction of a Z-scheme photocatalyst is an effective approach for using solar energy to produce hydrogen during water splitting. Herein, a 2D/2D WO3/g-C3N4 heterojunction photocatalyst was synthesized by a convenient and green method comprising exfoliation and heterojunction procedures in a reverse microemulsion system via supercritical carbon dioxide (scCO2). The resultant W/CN-10.3 composite exhibited enhanced photocatalytic activity toward hydrogen evolution during water splitting, with a hydrogen evolution rate of 688.51 µmol g⁻¹ h⁻¹, more than 16 times higher than that of bulk g-C3N4 with the same loading of Pt as cocatalyst. Owing to the effective separation of photogenerated carriers and their prolonged lifetime, more photoexcited electrons with high reduction ability could contribute to the production of H2. A possible formation mechanism of the 2D/2D WO3/g-C3N4 nanosheets obtained via scCO2 in the reverse microemulsion one-pot method is proposed. This work provides an efficient and green strategy for synthesizing 2D/2D heterojunctions for solar-to-fuel conversion.
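For readers unfamiliar with the reported unit, the short sketch below shows how a hydrogen evolution rate in µmol g⁻¹ h⁻¹ is obtained from the amount of H2 detected, the catalyst mass, and the irradiation time, and how an enhancement factor over a reference rate follows from it. All numeric inputs except the reported 688.51 µmol g⁻¹ h⁻¹ are illustrative assumptions, not values from the paper.

```python
# Back-of-the-envelope sketch: hydrogen evolution rate and enhancement factor.
# All input values below are assumed for illustration only.
h2_evolved_umol = 34.4       # H2 detected over the run (µmol), assumed
catalyst_mass_g = 0.010      # photocatalyst mass (g), assumed
irradiation_time_h = 5.0     # light exposure time (h), assumed

rate = h2_evolved_umol / (catalyst_mass_g * irradiation_time_h)
print(f"H2 evolution rate: {rate:.1f} µmol g^-1 h^-1")

# Enhancement relative to an assumed bulk g-C3N4 reference rate:
bulk_rate = 43.0             # µmol g^-1 h^-1, assumed reference value
print(f"enhancement over bulk g-C3N4: {688.51 / bulk_rate:.1f}x")
```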


Subjects
Carbon Dioxide, Solar Energy, Electrons, Hydrogen, Water
3.
Sensors (Basel); 22(16), 2022 Aug 13.
Article in English | MEDLINE | ID: mdl-36015817

ABSTRACT

Silent speech interfaces (SSIs) convert non-audio bio-signals, such as articulatory movement, to speech. This technology has the potential to recover the speech ability of individuals who have lost their voice but can still articulate (e.g., laryngectomees). Articulation-to-speech (ATS) synthesis is an SSI design with the advantages of easy implementation and low latency, and is therefore becoming more popular. Current ATS studies focus on speaker-dependent (SD) models to avoid the large variation of articulatory patterns and acoustic features across speakers. However, these designs are limited by the small amount of data available from individual speakers. Speaker adaptation designs that include multiple speakers' data have the potential to address the issue of limited data from single speakers; however, few prior studies have investigated their performance in ATS. In this paper, we investigated speaker adaptation of both the input articulation and the output acoustic signals (with or without direct inclusion of data from test speakers) using a publicly available electromagnetic articulography (EMA) dataset. We used Procrustes matching for articulation adaptation and voice conversion for voice adaptation. The performance of the ATS models was measured objectively by mel-cepstral distortion (MCD). Synthetic speech samples were generated and are provided in the supplementary material. The results demonstrated the improvement brought by both Procrustes matching and voice conversion in speaker-independent ATS. With the direct inclusion of target-speaker data in the training process, speaker-adaptive ATS achieved performance comparable to speaker-dependent ATS. To our knowledge, this is the first study to demonstrate that speaker-adaptive ATS can achieve performance that is not statistically different from speaker-dependent ATS.
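Since the abstract evaluates synthesis quality with mel-cepstral distortion (MCD), a minimal sketch of the standard MCD computation follows, assuming the reference and synthesized mel-cepstra are already time-aligned (e.g., by dynamic time warping) and exclude the 0th (energy) coefficient; the function name and array shapes are illustrative, not taken from the paper.

```python
# Minimal sketch of frame-averaged mel-cepstral distortion (MCD) in dB.
import numpy as np

def mel_cepstral_distortion(mcep_ref, mcep_syn):
    """MCD between two time-aligned mel-cepstral sequences.

    mcep_ref, mcep_syn: arrays of shape (frames, order), already aligned
    frame-by-frame and excluding the 0th coefficient.
    """
    diff = mcep_ref - mcep_syn
    per_frame = (10.0 / np.log(10)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```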


Subjects
Speech Perception, Voice, Acoustics, Humans, Speech, Speech Acoustics
4.
Int J Speech Lang Pathol; 20(6): 669-679, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30409057

ABSTRACT

Purpose: This research aimed to automatically predict the intelligible speaking rate of individuals with Amyotrophic Lateral Sclerosis (ALS) from speech acoustic and articulatory samples. Method: Twelve participants with ALS and two normal subjects produced a total of 1831 phrases. An NDI Wave system was used to collect tongue and lip movement data and acoustic data synchronously. A machine learning algorithm (i.e., a support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples. Result: Acoustic, lip movement, and tongue movement information separately yielded an R² of 0.652, 0.660, and 0.678 and a root mean squared error (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between the predicted and actual values, respectively. Combining acoustic, lip, and tongue information yielded the highest R² (0.712) and the lowest RMSE (37.562 WPM). Conclusion: The results revealed that the proposed analyses predicted the intelligible speaking rate of the participants with reasonably high accuracy by extracting acoustic and/or articulatory features from one short speech sample. With further development, these analyses may be well suited for clinical applications that require automatic prediction of speech severity.
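A minimal sketch of the prediction setup described above: a support vector regressor maps per-phrase features to intelligible speaking rate and is scored with R² and RMSE in WPM. The feature dimensionality, hyperparameters, and random placeholder data are assumptions; only the general approach (SVM regression evaluated with these metrics) comes from the abstract.

```python
# Minimal sketch: predicting intelligible speaking rate (WPM) with an SVM
# regressor from per-phrase acoustic and/or articulatory features.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# X: per-phrase feature vectors (placeholder values);
# y: intelligible speaking rate = intelligibility x speaking rate, in WPM.
rng = np.random.default_rng(0)
X = rng.normal(size=(1831, 60))
y = rng.uniform(20, 200, size=1831)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R^2  = {r2_score(y_te, pred):.3f}")
print(f"RMSE = {np.sqrt(mean_squared_error(y_te, pred)):.1f} WPM")
```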


Subjects
Amyotrophic Lateral Sclerosis/complications, Speech Disorders/diagnosis, Speech Disorders/etiology, Speech Recognition Software, Support Vector Machine, Adult, Aged, Female, Humans, Male, Middle Aged, Speech Acoustics, Speech Intelligibility/physiology, Speech Production Measurement/methods
5.
IEEE/ACM Trans Audio Speech Lang Process; 25(12): 2323-2336, 2017 Dec.
Article in English | MEDLINE | ID: mdl-30271809

ABSTRACT

Silent speech recognition (SSR) converts non-audio information, such as articulatory movements, into text. SSR has the potential to enable persons with laryngectomy to communicate through natural spoken expression. Current SSR systems have largely relied on speaker-dependent recognition models. The high degree of variability in articulatory patterns across speakers has been a barrier to developing effective speaker-independent SSR approaches. Speaker-independent approaches, however, are critical for reducing the amount of training data required from each speaker. In this paper, we investigate speaker-independent SSR from the movements of flesh points on the tongue and lips, using articulatory normalization methods that reduce inter-speaker variation. To minimize across-speaker physiological differences of the articulators, we propose Procrustes matching-based articulatory normalization, which removes locational, rotational, and scaling differences. To further normalize the articulatory data, we apply feature-space maximum likelihood linear regression and i-vectors. We adopt a bidirectional long short-term memory recurrent neural network (BLSTM) as the articulatory model to effectively model articulatory movements with long-range articulatory history. A silent speech dataset with flesh-point trajectories was collected with an electromagnetic articulograph (EMA) from twelve healthy and two laryngectomized English speakers. Experimental results showed the effectiveness of our speaker-independent SSR approaches on healthy as well as laryngectomized speakers. In addition, the BLSTM outperformed a standard deep neural network. The best performance was obtained by the BLSTM with all three normalization approaches combined.
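A minimal sketch of Procrustes-matching-based articulatory normalization as described above: each speaker's flesh-point trajectories are translated, scaled, and rotated so that the speaker's mean sensor configuration matches a reference configuration. The array shapes, the choice of reference, and the function name are assumptions for illustration; the paper's exact procedure may differ.

```python
# Minimal sketch: removing locational, rotational, and scaling differences
# between a speaker's flesh-point data and a reference configuration.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_normalize(frames, reference):
    """Normalize articulatory trajectories toward a reference sensor layout.

    frames:    (n_frames, n_points * dims) flesh-point trajectories
    reference: (n_points, dims) reference configuration (e.g., from a
               reference speaker); shapes and names are assumptions.
    """
    n_points, dims = reference.shape
    pts = frames.reshape(len(frames), n_points, dims)
    speaker_mean = pts.mean(axis=0)                  # speaker's own layout

    # Center both configurations, match scale, then solve for the rotation.
    ref_c = reference - reference.mean(axis=0)
    spk_c = speaker_mean - speaker_mean.mean(axis=0)
    scale = np.linalg.norm(ref_c) / np.linalg.norm(spk_c)
    R, _ = orthogonal_procrustes(spk_c * scale, ref_c)

    # Apply the same translation, scaling, and rotation to every frame.
    normalized = ((pts - speaker_mean.mean(axis=0)) * scale) @ R
    return (normalized + reference.mean(axis=0)).reshape(len(frames), -1)
```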

6.
Article in English | MEDLINE | ID: mdl-29423453

ABSTRACT

Individuals with an impaired larynx (vocal folds) have problems controlling glottal vibration and produce whispered speech with extreme hoarseness. Standard automatic speech recognition using only acoustic cues is typically ineffective for whispered speech because the corresponding spectral characteristics are distorted. Articulatory cues such as tongue and lip motion may help in recognizing whispered speech, since articulatory motion patterns are generally not affected. In this paper, we investigated whispered speech recognition for patients with a reconstructed larynx using articulatory movement data. A dataset with both acoustic and articulatory motion data was collected from a patient with a surgically reconstructed larynx using an electromagnetic articulograph. Two speech recognition systems, Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network-HMM (DNN-HMM), were used in the experiments. Experimental results showed that adding either tongue or lip motion data to acoustic features such as mel-frequency cepstral coefficients (MFCCs) significantly reduced the phone error rates of both speech recognition systems. Adding both tongue and lip data achieved the best performance.
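The sketch below illustrates the feature fusion described above: frame-level MFCCs extracted from the whispered audio are concatenated with time-aligned tongue/lip channels to form the input to a GMM-HMM or DNN-HMM recognizer. The use of librosa, the frame rate, and the simple alignment-by-trimming step are assumptions; the paper's exact front end may differ.

```python
# Minimal sketch: fusing MFCCs with articulatory (tongue/lip) features.
import numpy as np
import librosa

def fused_features(wav_path, articulatory, sr=16000, hop_s=0.010):
    """Concatenate MFCC frames with time-aligned articulatory frames.

    `articulatory` is assumed to be an (n_frames, n_channels) array already
    sampled at the same frame rate as the MFCCs; names are illustrative.
    """
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                hop_length=int(hop_s * sr)).T  # (frames, 13)
    n = min(len(mfcc), len(articulatory))    # trim to the common length
    return np.hstack([mfcc[:n], articulatory[:n]])
```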
