1.
Article in English | MEDLINE | ID: mdl-37938964

ABSTRACT

Dysarthria, a speech disorder often caused by neurological damage, impairs patients' control of the vocal muscles, making their speech unclear and communication difficult. Recently, voice-driven methods have been proposed to improve the speech intelligibility of patients with dysarthria. However, most methods require a substantial corpus from both the patient and the target speaker, which is burdensome to collect. This study proposes dysarthria voice conversion 3.1 (DVC 3.1), a data augmentation-based voice conversion (VC) system that combines text-to-speech with a StarGAN-VC architecture to synthesize large target-speaker and patient-like corpora, thereby reducing the recording burden on the speaker. An objective evaluation metric based on the Google automatic speech recognition (Google ASR) system and a listening test were used to demonstrate the speech intelligibility benefits of DVC 3.1 under free-talk conditions; the DVC system without data augmentation (DVC 3.0) was used for comparison. Subjective and objective evaluations of the experimental results indicated that the proposed DVC 3.1 system improved the Google ASR accuracy for two patients with dysarthria by approximately [62.4%, 43.3%] and [55.9%, 57.3%] compared to unprocessed dysarthric speech and the DVC 3.0 system, respectively. Further, DVC 3.1 increased the speech intelligibility of the two patients by approximately [54.2%, 22.3%] and [63.4%, 70.1%] compared to unprocessed dysarthric speech and the DVC 3.0 system, respectively. The proposed DVC 3.1 system offers significant potential to improve the speech intelligibility of patients with dysarthria and enhance their verbal communication quality.
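As a rough illustration of how an ASR-based objective intelligibility metric like the one above is typically scored, the following sketch computes word error rate (WER) via a word-level Levenshtein distance. The function name and normalization are illustrative assumptions, not the paper's actual evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """WER: word-level Levenshtein edit distance, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / max(len(ref), 1)
```

Recognition accuracy, as reported in the abstract, is then simply one minus the WER (expressed as a percentage).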


Subject(s)
Dysarthria , Voice , Humans , Dysarthria/etiology , Speech Intelligibility/physiology , Laryngeal Muscles
2.
J Voice ; 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36732109

ABSTRACT

OBJECTIVE: Doctors currently rely primarily on auditory-perceptual evaluation, such as the grade, roughness, breathiness, asthenia, and strain (GRBAS) scale, to assess voice quality and determine treatment. However, ratings often differ between physicians owing to subjective perception and the interval between diagnoses, especially when a patient's symptoms are hard to judge. An accurate computerized pathological voice quality assessment system would therefore improve the quality of assessment. METHOD: This study proposes a self-attention-based deep learning system named self-attention-based bidirectional long short-term memory (SA BiLSTM). Different pitches [low, normal, high] and vowels [/a/, /i/, /u/] were fed into the proposed model so that it learns, in a high-dimensional view, how professional doctors rate the GRBAS scale. RESULTS: The experimental results showed that the proposed system outperformed the baseline systems. More specifically, the macro average of the F1 score, presented as a decimal, was used to compare classification accuracy. The (G, R, and B) scores of the proposed system were (0.768±0.011, 0.820±0.009, and 0.815±0.009), higher than those of the baseline systems, a deep neural network (0.395±0.010, 0.312±0.019, 0.321±0.014) and a convolutional neural network (0.421±0.052, 0.306±0.043, 0.325±0.032), respectively. CONCLUSIONS: The proposed system, with SA BiLSTM, pitches, and vowels, provides a more accurate way to evaluate the voice. This will be helpful for clinical voice evaluation and will increase the benefit patients derive from voice therapy.
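The macro-averaged F1 score used in the results above is the unweighted mean of per-class F1 scores. A minimal pure-Python sketch follows; the function name is illustrative, and the study itself presumably used a standard library implementation.

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    # Each class contributes equally, regardless of its frequency
    return sum(scores) / len(scores)
```

Because every class weighs equally, macro F1 penalizes a classifier that ignores rare scale grades, which matters when GRBAS ratings are imbalanced.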

3.
Sensors (Basel) ; 22(19)2022 Sep 27.
Article in English | MEDLINE | ID: mdl-36236430

ABSTRACT

With the development of active noise cancellation (ANC) technology, ANC has been used to mitigate the effects of environmental noise on audiometric results. However, objective evaluation methods supporting the accuracy of audiometry with ANC under different levels of noise have not been reported. Accordingly, the audio characteristics of three ANC headphone models were quantified under different noise conditions, and the feasibility of ANC in noisy environments was investigated. Steady noise (pink noise) and non-steady noise (cafeteria babble) were used to simulate noisy environments. We compared the integrity of pure-tone signals obtained from the three ANC headphone models under different noise scenarios and analyzed the degree of correlation, based on the Pearson correlation coefficient, between the ANC-processed signals and pure-tone signals recorded in quiet. The objective signal correlation results were compared with audiometric screening results to confirm their correspondence. The results revealed that ANC helped mitigate the effects of environmental noise on the measured signal and that the combined ANC headset model retained the highest signal integrity. The degree of signal correlation was used as a confidence indicator for the accuracy of hearing-screening results in noise. The findings also suggest that the ANC technique can be further improved for more complex noisy environments.
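The Pearson-correlation check described above can be sketched as follows. The toy tone, noise level, and function names are illustrative assumptions, not the study's actual recordings or analysis code.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy example: a 1 kHz "pure tone" sampled at 16 kHz, with and without
# mild additive noise standing in for residual environmental noise.
random.seed(0)
fs, f = 16000, 1000.0
clean = [math.sin(2 * math.pi * f * t / fs) for t in range(fs // 10)]
noisy = [s + random.gauss(0, 0.1) for s in clean]
```

A correlation close to 1.0 between the in-noise recording and the in-quiet reference would indicate that the pure-tone signal survived largely intact, which is the intuition behind using it as a confidence indicator for screening accuracy.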


Subject(s)
Mass Screening , Noise , Audiometry, Pure-Tone/methods , Feasibility Studies , Hearing
4.
Article in English | MEDLINE | ID: mdl-36085875

ABSTRACT

Patients with dysarthria generally utter distorted sounds, and the intelligibility of their speech is limited for both human listeners and machines. To enhance the intelligibility of dysarthric speech, we applied a deep learning-based speech enhancement (SE) system to this task. Conventional SE approaches shrink noise components in the noise-corrupted input, thereby improving sound quality and intelligibility simultaneously. In this study, we focus on reconstructing severely distorted signals from dysarthric speech to improve intelligibility. The proposed SE system trains a convolutional neural network (CNN) model in the training phase, which is then used to process dysarthric speech in the testing phase. During training, paired dysarthric-normal speech utterances are required; we adopt a dynamic time warping technique to align the dysarthric-normal utterances. The aligned training data are then used to train a CNN-based SE model. The proposed SE system was evaluated with the Google automatic speech recognition (ASR) system and a subjective listening test. The results showed that the proposed method notably enhanced recognition performance, by more than 10% in both ASR and human recognition, relative to unprocessed dysarthric speech. Clinical Relevance- This study enhances the intelligibility and ASR accuracy of dysarthric speech by more than 10%.
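The dynamic time warping (DTW) alignment step mentioned above can be sketched as follows. This is a minimal textbook DTW on scalar sequences; a real system would align frame-level acoustic feature vectors, and the function name is illustrative.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping: return total alignment cost and the index path."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the cheapest of the three admissible moves
            cost[i][j] = dist(a[i - 1], b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to recover the frame-to-frame alignment
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min(((i - 1, j), (i, j - 1), (i - 1, j - 1)),
                   key=lambda p: cost[p[0]][p[1]])
    return cost[n][m], path[::-1]
```

The recovered path pairs each dysarthric frame with a normal-speech frame, yielding the parallel training pairs the CNN requires even though the two utterances differ in duration.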


Subject(s)
Dysarthria , Speech , Auditory Perception , Dysarthria/diagnosis , Humans , Neural Networks, Computer , Sound
5.
JASA Express Lett ; 2(5): 055202, 2022 05.
Article in English | MEDLINE | ID: mdl-36154065

ABSTRACT

Medical masks have become necessary of late because of the COVID-19 outbreak; however, they tend to attenuate the energy of speech signals and affect speech quality. Therefore, this study proposes an optical-based microphone approach to obtain speech signals from speakers' medical masks. Experimental results showed that the optical-based microphone approach achieved better performance (85.61%) than the two baseline approaches, namely, omnidirectional (24.17%) and directional microphones (31.65%), in the case of long-distance speech and background noise. The results suggest that the optical-based microphone method is a promising approach for acquiring speech from a medical mask.


Subject(s)
COVID-19 , Hearing Aids , Speech Perception , COVID-19/prevention & control , Equipment Design , Humans , Masks , Speech , Vibration
6.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 1972-1976, 2022 07.
Article in English | MEDLINE | ID: mdl-36086160

ABSTRACT

Envelope waveforms can be extracted from multiple frequency bands of a speech signal, and they carry important intelligibility information for human speech communication. This study aimed to investigate whether a deep learning-based model driven by temporal envelope features could synthesize intelligible speech, and to study the effect of reducing the number of temporal-envelope bands (from eight to two in this work) on the intelligibility of the synthesized speech. The objective evaluation metric of short-time objective intelligibility (STOI) showed that, on average, the synthesized speech of the proposed approach achieved high STOI scores (i.e., 0.8) in each test condition, and a human listening test showed that the average word correct rate of eight listeners was higher than 97.5%. These findings indicate that the proposed deep learning-based system is a potential approach for synthesizing highly intelligible speech from limited envelope information.
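As a rough illustration of the kind of temporal envelope described above, the following sketch extracts an envelope from one band by full-wave rectification followed by moving-average smoothing. The actual system presumably uses a multi-band filter bank and proper lowpass filtering; the window choice, toy signal, and names here are illustrative assumptions.

```python
import math


def envelope(signal, win):
    """Temporal envelope: full-wave rectification followed by a
    moving-average lowpass over a win-sample window."""
    rect = [abs(s) for s in signal]
    half = win // 2
    return [
        sum(rect[max(0, i - half): i + half + 1]) /
        len(rect[max(0, i - half): i + half + 1])
        for i in range(len(rect))
    ]


# Toy example: a 100 Hz carrier with a slow 2 Hz amplitude modulation,
# sampled at fs = 8 kHz; the envelope should follow the modulation.
fs = 8000
sig = [(1 + 0.5 * math.sin(2 * math.pi * 2 * t / fs)) *
       math.sin(2 * math.pi * 100 * t / fs) for t in range(fs)]
env = envelope(sig, win=fs // 100)  # window spans one carrier period
```

The smoothed output discards the fine structure of the carrier and keeps only the slow amplitude contour, which is the limited information the synthesis model receives.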


Subject(s)
Deep Learning , Speech Perception , Auditory Perception , Humans , Speech Intelligibility , Time Factors
7.
Comput Methods Programs Biomed ; 215: 106602, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35021138

ABSTRACT

BACKGROUND AND OBJECTIVE: Most dysarthric patients encounter communication problems due to unintelligible speech. Many voice-driven systems currently aim to improve their speech intelligibility; however, the intelligibility performance of these systems is affected by challenging application conditions (e.g., time variance of the patient's speech and background noise). To alleviate these problems, we proposed a dysarthria voice conversion (DVC) system for dysarthric patients and investigated its benefits under challenging application conditions. METHOD: A deep learning-based voice conversion system with phonetic posteriorgram (PPG) features, called the DVC-PPG system, was proposed in this study. An objective evaluation metric based on the Google automatic speech recognition (Google ASR) system and a listening test were used to demonstrate the speech intelligibility benefits of DVC-PPG under quiet and noisy test conditions; in addition, a well-known voice conversion system using mel-spectrograms, DVC-Mels, was used for comparison to verify the benefits of the proposed DVC-PPG system. RESULTS: In terms of the average Google ASR accuracy of two subjects in the duplicate and outside test conditions, the DVC-PPG system provided higher speech recognition rates (83.2% and 67.5%) than unprocessed dysarthric speech (36.5% and 26.9%) and DVC-Mels (52.9% and 33.8%) under quiet conditions, and it provided more stable performance than DVC-Mels under noisy test conditions. In addition, the listening test showed that the speech intelligibility of DVC-PPG was better than that of the dysarthric speech and DVC-Mels in both the duplicate and outside conditions.
CONCLUSIONS: The objective evaluation metric and listening test results showed that the recognition rate of the proposed DVC-PPG system was significantly higher than those obtained with the original dysarthric speech and the DVC-Mels system. It can therefore be inferred that the DVC-PPG system can improve the ability of dysarthric patients to communicate with people under challenging application conditions.


Subject(s)
Speech Intelligibility , Voice , Dysarthria , Humans , Phonetics , Speech Production Measurement
8.
J Med Internet Res ; 23(10): e25460, 2021 10 28.
Article in English | MEDLINE | ID: mdl-34709193

ABSTRACT

BACKGROUND: Cochlear implant technology is a well-known approach to help deaf individuals hear speech again and can improve speech intelligibility in quiet conditions; however, it still has room for improvement in noisy conditions. More recently, it has been proven that deep learning-based noise reduction, such as noise classification and deep denoising autoencoder (NC+DDAE), can benefit the intelligibility performance of patients with cochlear implants compared to classical noise reduction algorithms. OBJECTIVE: Following the successful implementation of the NC+DDAE model in our previous study, this study aimed to propose an advanced noise reduction system using knowledge transfer technology, called NC+DDAE_T; examine the proposed NC+DDAE_T noise reduction system using objective evaluations and subjective listening tests; and investigate which layer substitution of the knowledge transfer technology in the NC+DDAE_T noise reduction system provides the best outcome. METHODS: The knowledge transfer technology was adopted to reduce the number of parameters of the NC+DDAE_T compared with the NC+DDAE. We investigated which layer should be substituted using short-time objective intelligibility and perceptual evaluation of speech quality scores as well as t-distributed stochastic neighbor embedding to visualize the features in each model layer. Moreover, we enrolled 10 cochlear implant users for listening tests to evaluate the benefits of the newly developed NC+DDAE_T. RESULTS: The experimental results showed that substituting the middle layer (ie, the second layer in this study) of the noise-independent DDAE (NI-DDAE) model achieved the best performance gain regarding short-time objective intelligibility and perceptual evaluation of speech quality scores. Therefore, the parameters of layer 3 in the NI-DDAE were chosen to be replaced, thereby establishing the NC+DDAE_T. 
Both the objective and the listening test results showed that the proposed NC+DDAE_T noise reduction system achieved performance similar to that of the previous NC+DDAE in several noisy test conditions, while requiring only a quarter of its parameters. CONCLUSIONS: This study demonstrated that knowledge transfer technology can help reduce the number of parameters in an NC+DDAE while maintaining similar performance. This suggests that the proposed NC+DDAE_T model may reduce the implementation costs of this noise reduction system and provide more benefits for cochlear implant users.
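The layer-substitution idea behind NC+DDAE_T can be sketched abstractly as copying one pre-trained (teacher) layer's parameters into a smaller student model while leaving its other layers to be trained. The dictionary representation and all names below are illustrative assumptions, not the study's implementation.

```python
def transfer_layer(student_params, teacher_params, layer_name):
    """Substitute one student layer with the teacher's pre-trained
    parameters, leaving all other student layers untouched."""
    transferred = dict(student_params)  # shallow copy; student is unmodified
    transferred[layer_name] = teacher_params[layer_name]
    return transferred


# Hypothetical parameter dictionaries: a large pre-trained teacher
# (standing in for the NI-DDAE) and a compact, untrained student.
teacher = {"layer1": [0.1, 0.2], "layer2": [0.3, 0.4], "layer3": [0.5, 0.6]}
student = {"layer1": [0.0, 0.0], "layer2": [0.0, 0.0], "layer3": [0.0, 0.0]}
compact = transfer_layer(student, teacher, "layer2")
```

The study's core question, which layer to substitute, is then an empirical search over `layer_name`, scored here by STOI and PESQ.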


Subject(s)
Cochlear Implantation , Cochlear Implants , Speech Perception , Humans , Noise , Speech Intelligibility
9.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 803-807, 2020 07.
Article in English | MEDLINE | ID: mdl-33018107

ABSTRACT

Motion rehabilitation is increasingly required owing to aging populations and the prevalence of stroke, which makes human motion analysis increasingly important. Based on this, a deep-learning-based system is proposed in this work to track human motion from three-dimensional (3D) images; features from traditional red-green-blue (RGB) images, i.e., two-dimensional (2D) images, were used for comparison. The results indicate that 3D images have an advantage over 2D images owing to their information on spatial relationships, implying that the proposed system is a potential technology for human motion analysis applications.


Subject(s)
Algorithms , Deep Learning , Aged , Humans , Imaging, Three-Dimensional , Motion
10.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 1838-1841, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946255

ABSTRACT

Speakers with dysarthria suffer from poor communication, and voice conversion (VC) technology is a potential approach to improving their speech quality. This study presents a joint feature learning approach to improve a sub-band deep neural network-based VC system, termed J_SBDNN. A listening test of speech intelligibility was used to confirm the benefits of the proposed J_SBDNN VC system, with several well-known VC approaches used for comparison. The results showed that the J_SBDNN VC system provided higher speech intelligibility than the other VC approaches in most test conditions. This implies that the J_SBDNN VC system could serve as an electronic assistive technology for improving the speech quality of dysarthric speakers.


Subject(s)
Deep Learning , Dysarthria/therapy , Self-Help Devices , Speech Intelligibility , Voice , Humans , Speech Production Measurement
11.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 404-408, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440419

ABSTRACT

The performance of a deep-learning-based speech enhancement (SE) technology for hearing aid users, called a deep denoising autoencoder (DDAE), was investigated. The hearing-aid speech perception index (HASPI) and the hearing-aid sound quality index (HASQI), two well-known evaluation metrics for speech intelligibility and quality, were used to evaluate the performance of the DDAE SE approach on two typical high-frequency hearing loss (HFHL) audiograms. Our experimental results show that the DDAE SE approach yields higher intelligibility and quality scores than two classical SE approaches. These results suggest that a deep-learning-based SE method could be used to improve speech intelligibility and quality for hearing aid users in noisy environments.


Subject(s)
Deep Learning , Hearing Aids , Auditory Perception , Hearing Loss, Sensorineural/rehabilitation , Hearing Tests , Humans , Sound , Speech Intelligibility , Speech Perception