Search | VHL Regional Portal

1.

Cluster-Based Pairwise Contrastive Loss for Noise-Robust Speech Recognition.

Lee, Geon Woo; Kim, Hong Kook.

Sensors (Basel) ; 24(8)2024 Apr 17.

Article in English | MEDLINE | ID: mdl-38676191

ABSTRACT

This paper addresses a joint training approach applied to a pipeline comprising speech enhancement (SE) and automatic speech recognition (ASR) models, where an acoustic tokenizer is included in the pipeline to leverage the linguistic information from the ASR model to the SE model. The acoustic tokenizer takes the outputs of the ASR encoder and provides a pseudo-label through K-means clustering. To transfer the linguistic information, represented by pseudo-labels, from the acoustic tokenizer to the SE model, a cluster-based pairwise contrastive (CBPC) loss function is proposed, which is a self-supervised contrastive loss function, and combined with an information noise contrastive estimation (infoNCE) loss function. This combined loss function prevents the SE model from overfitting to outlier samples and represents the pronunciation variability in samples with the same pseudo-label. The effectiveness of the proposed CBPC loss function is evaluated on a noisy LibriSpeech dataset by measuring both the speech quality scores and the word error rate (WER). The experimental results reveal that the proposed joint training approach using the described CBPC loss function achieves a lower WER than the conventional joint training approaches. In addition, it is demonstrated that the speech quality scores of the SE model trained using the proposed training approach are higher than those of the standalone-SE model and SE models trained using conventional joint training approaches. An ablation study is also conducted to investigate the effects of different combinations of loss functions on the speech quality scores and WER. Here, it is revealed that the proposed CBPC loss function combined with infoNCE contributes to a reduced WER and an increase in most of the speech quality scores.

Subject(s)

Noise , Speech Recognition Software , Humans , Cluster Analysis , Algorithms , Speech/physiology

2.

Informer-Based Temperature Prediction Using Observed and Numerical Weather Prediction Data.

Jun, Jimin; Kim, Hong Kook.

Sensors (Basel) ; 23(16)2023 Aug 09.

Article in English | MEDLINE | ID: mdl-37631584

ABSTRACT

This paper proposes an Informer-based temperature prediction model to leverage data from an automatic weather station (AWS) and a local data assimilation and prediction system (LDAPS), where the Informer as a variant of a Transformer was developed to better deal with time series data. Recently, deep-learning-based temperature prediction models have been proposed, demonstrating successful performances, such as conventional neural network (CNN)-based models, bi-directional long short-term memory (BLSTM)-based models, and a combination of both neural networks, CNN-BLSTM. However, these models have encountered issues due to the lack of time data integration during the training phase, which also lead to the persistence of a long-term dependency problem in the LSTM models. These limitations have culminated in a performance deterioration when the prediction time length was extended. To overcome these issues, the proposed model first incorporates time-periodic information into the learning process by generating time-periodic information and inputting it into the model. Second, the proposed model replaces the LSTM with an Informer as an alternative to mitigating the long-term dependency problem. Third, a series of fusion operations between AWS and LDAPS data are executed to examine the effect of each dataset on the temperature prediction performance. The performance of the proposed temperature prediction model is evaluated via objective measures, including the root-mean-square error (RMSE) and mean absolute error (MAE) over different timeframes, ranging from 6 to 336 h. The experiments showed that the proposed model relatively reduced the average RMSE and MAE by 0.25 °C and 0.203 °C, respectively, compared with the results of the CNN-BLSTM-based model.

3.

Two-Step Joint Optimization with Auxiliary Loss Function for Noise-Robust Speech Recognition.

Lee, Geon Woo; Kim, Hong Kook.

Sensors (Basel) ; 22(14)2022 Jul 19.

Article in English | MEDLINE | ID: mdl-35891070

ABSTRACT

In this paper, a new two-step joint optimization approach based on the asynchronous subregion optimization method is proposed for training a pipeline model composed of two different models. The first-step processing of the proposed joint optimization approach trains the front-end model only, and the second-step processing trains all the parameters of the combined model together. In the asynchronous subregion optimization method, the first-step processing only supports the goal of the front-end model. However, the first-step processing of the proposed approach works with a new loss function to make the front-end model support the goal of the back-end model. The proposed optimization approach was applied, here, to a pipeline composed of a deep complex convolutional recurrent network (DCCRN)-based speech enhancement model and a conformer-transducer-based ASR model as a front-end and a back-end, respectively. Then, the performance of the proposed two-step joint optimization approach was evaluated on the LibriSpeech automatic speech recognition (ASR) corpus in noisy environments by measuring the character error rate (CER) and word error rate (WER). In addition, an ablation study was carried out to examine the effectiveness of the proposed optimization approach on each of the processing blocks in the conformer-transducer ASR model. Consequently, it was shown from the ablation study that the conformer-transducer-based ASR model with the joint network trained only by the proposed optimization approach achieved the lowest average CER and WER. Moreover, the proposed optimization approach reduced the average CER and WER on the Test-Noisy dataset under matched noise conditions by 0.30% and 0.48%, respectively, compared to the approach of separate optimization of speech enhancement and ASR. Compared to the conventional two-step joint optimization approach, the proposed optimization approach provided average CER and WER reductions of 0.22% and 0.31%, respectively. Moreover, it was revealed that the proposed optimization approach achieved a lower average CER and WER, by 0.32% and 0.43%, respectively, than the conventional optimization approach under mismatched noise conditions.

Subject(s)

Speech Perception , Speech , Noise , Speech Recognition Software

4.

An Efficient Compression Method of Underwater Acoustic Sensor Signals for Underwater Surveillance.

Kim, Yong Guk; Kim, Dong Gwan; Kim, Kyucheol; Choi, Chang-Ho; Park, Nam In; Kim, Hong Kook.

Sensors (Basel) ; 22(9)2022 Apr 29.

Article in English | MEDLINE | ID: mdl-35591105

ABSTRACT

In this paper, we propose a new compression method using underwater acoustic sensor signals for underwater surveillance. Generally, sonar applications that are used for surveillance or ocean monitoring are composed of many underwater acoustic sensors to detect significant sources of sound. It is necessary to apply compression methods to the acquired sensor signals due to data processing and storage resource limitations. In addition, depending on the purposes of the operation and the characteristics of the operating environment, it may also be necessary to apply compression methods of low complexity. Accordingly, in this research, a low-complexity and nearly lossless compression method for underwater acoustic sensor signals is proposed. In the design of the proposed method, we adopt the concepts of quadrature mirror filter (QMF)-based sub-band splitting and linear predictive coding, and we attempt to analyze an entropy coding technique suitable for underwater sensor signals. The experiments show that the proposed method achieves better performance in terms of compression ratio and processing time than popular or standardized lossless compression techniques. It is also shown that the compression ratio of the proposed method is almost the same as that of SHORTEN with a 10-bit maximum mode, and both methods achieve a similar peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index on average.

5.

Anti-Biofilm Effects of Torilis japonica Ethanol Extracts against Staphylococcus aureus.

Kim, Geun-Seop; Park, Chae-Rin; Kim, Ji-Eun; Kim, Hong-Kook; Kim, Byeong-Soo.

J Microbiol Biotechnol ; 32(2): 220-227, 2022 Feb 28.

Article in English | MEDLINE | ID: mdl-34866130

ABSTRACT

The spread of antibiotic-resistant strains of Staphylococcus aureus, a gram-positive opportunistic pathogen, has increased due to the frequent use of antibiotics. Inhibition of the quorum-sensing systems of biofilm-producing strains using plant extracts represents an efficient approach for controlling infections. Torilis japonica is a medicinal herb showing various bioactivities; however, no studies have reported the anti-biofilm effects of T. japonica extracts against drug-resistant S. aureus. In this study, we evaluated the inhibitory effects of T. japonica ethanol extract (TJE) on biofilm production in methicillin-sensitive S. aureus (MSSA) KCTC 1927, methicillin-resistant S. aureus (MRSA) KCCM 40510, and MRSA KCCM 40511. Biofilm assays showed that TJE could inhibit biofilm formation in all strains. Furthermore, the hemolysis of sheep blood was found to be reduced when the strains were treated with TJE. The mRNA expression of agrA, sarA, icaA, hla, and RNAIII was evaluated using reverse transcription-polymerase chain reaction to determine the effect of TJE on the regulation of genes encoding quorum sensing-related virulence factors in MSSA and MRSA. The expression of hla reduced in a concentration-dependent manner upon treatment with TJE. Moreover, the expression levels of other genes were significantly reduced compared to those in the control group. In conclusion, TJE can suppress biofilm formation and virulence factor-related gene expression in MSSA and MRSA strains. The extract may therefore be used to develop treatments for infections caused by antibiotic-resistant S. aureus.

Subject(s)

Methicillin-Resistant Staphylococcus aureus , Staphylococcal Infections , Animals , Anti-Bacterial Agents/pharmacology , Biofilms , Ethanol/pharmacology , Microbial Sensitivity Tests , Plant Extracts/pharmacology , Sheep , Staphylococcal Infections/drug therapy , Staphylococcus aureus

6.

End-to-End Model-Based Detection of Infants with Autism Spectrum Disorder Using a Pretrained Model.

Lee, Jung Hyuk; Lee, Geon Woo; Bong, Guiyoung; Yoo, Hee Jeong; Kim, Hong Kook.

Sensors (Basel) ; 23(1)2022 Dec 25.

Article in English | MEDLINE | ID: mdl-36616801

ABSTRACT

In this paper, we propose an end-to-end (E2E) neural network model to detect autism spectrum disorder (ASD) from children's voices without explicitly extracting the deterministic features. In order to obtain the decisions for discriminating between the voices of children with ASD and those with typical development (TD), we combined two different feature-extraction models and a bidirectional long short-term memory (BLSTM)-based classifier to obtain the ASD/TD classification in the form of probability. We realized one of the feature extractors as the bottleneck feature from an autoencoder using the extended version of the Geneva minimalistic acoustic parameter set (eGeMAPS) input. The other feature extractor is the context vector from a pretrained wav2vec2.0-based model directly applied to the waveform input. In addition, we optimized the E2E models in two different ways: (1) fine-tuning and (2) joint optimization. To evaluate the performance of the proposed E2E models, we prepared two datasets from video recordings of ASD diagnoses collected between 2016 and 2018 at Seoul National University Bundang Hospital (SNUBH), and between 2019 and 2021 at a Living Lab. According to the experimental results, the proposed wav2vec2.0-based E2E model with joint optimization achieved significant improvements in the accuracy and unweighted average recall, from 64.74% to 71.66% and from 65.04% to 70.81%, respectively, compared with a conventional model using autoencoder-based BLSTM and the deterministic features of the eGeMAPS.

Subject(s)

Autism Spectrum Disorder , Child , Humans , Infant , Autism Spectrum Disorder/diagnosis , Memory, Long-Term , Video Recording/methods

7.

Temperature Prediction Based on Bidirectional Long Short-Term Memory and Convolutional Neural Network Combining Observed and Numerical Forecast Data.

Jeong, Seongyoep; Park, Inyoung; Kim, Hyun Soo; Song, Chul Han; Kim, Hong Kook.

Sensors (Basel) ; 21(3)2021 Jan 31.

Article in English | MEDLINE | ID: mdl-33572653

ABSTRACT

Weather is affected by a complex interplay of factors, including topography, location, and time. For the prediction of temperature in Korea, it is necessary to use data from multiple regions. To this end, we investigate the use of deep neural-network-based temperature prediction model time-series weather data obtained from an automatic weather station and image data from a regional data assimilation and prediction system (RDAPS). To accommodate such different types of data into a single model, a bidirectional long short-term memory (BLSTM) model and a convolutional neural network (CNN) model are chosen to represent the features from the time-series observed data and the RDAPS image data. The two types of features are combined to produce temperature predictions for up to 14 days in the future. The performance of the proposed temperature prediction model is evaluated by objective measures, including the root mean squared error and mean bias error. The experiments demonstrated that the proposed model combining both the observed and RDAPS image data is better in all performance measures for all prediction periods compared with the BLSTM-based model using observed data and the CNN-BLSTM-based model using RDAPS image data alone.

8.

Deep-Learning-Based Detection of Infants with Autism Spectrum Disorder Using Auto-Encoder Feature Representation.

Lee, Jung Hyuk; Lee, Geon Woo; Bong, Guiyoung; Yoo, Hee Jeong; Kim, Hong Kook.

Sensors (Basel) ; 20(23)2020 Nov 26.

Article in English | MEDLINE | ID: mdl-33256061

ABSTRACT

Autism spectrum disorder (ASD) is a developmental disorder with a life-span disability. While diagnostic instruments have been developed and qualified based on the accuracy of the discrimination of children with ASD from typical development (TD) children, the stability of such procedures can be disrupted by limitations pertaining to time expenses and the subjectivity of clinicians. Consequently, automated diagnostic methods have been developed for acquiring objective measures of autism, and in various fields of research, vocal characteristics have not only been reported as distinctive characteristics by clinicians, but have also shown promising performance in several studies utilizing deep learning models based on the automated discrimination of children with ASD from children with TD. However, difficulties still exist in terms of the characteristics of the data, the complexity of the analysis, and the lack of arranged data caused by the low accessibility for diagnosis and the need to secure anonymity. In order to address these issues, we introduce a pre-trained feature extraction auto-encoder model and a joint optimization scheme, which can achieve robustness for widely distributed and unrefined data using a deep-learning-based method for the detection of autism that utilizes various models. By adopting this auto-encoder-based feature extraction and joint optimization in the extended version of the Geneva minimalistic acoustic parameter set (eGeMAPS) speech feature data set, we acquire improved performance in the detection of ASD in infants compared to the raw data set.

Subject(s)

Autism Spectrum Disorder , Deep Learning , Autism Spectrum Disorder/diagnosis , Child , Female , Humans , Infant , Male , Speech

9.

Convolutional Recurrent Neural Network-Based Event Detection in Tunnels Using Multiple Microphones.

Kim, Nam Kyun; Jeon, Kwang Myung; Kim, Hong Kook.

Sensors (Basel) ; 19(12)2019 Jun 14.

Article in English | MEDLINE | ID: mdl-31208007

ABSTRACT

This paper proposes a sound event detection (SED) method in tunnels to prevent further uncontrollable accidents. Tunnel accidents are accompanied by crashes and tire skids, which usually produce abnormal sounds. Since the tunnel environment always has a severe level of noise, the detection accuracy can be greatly reduced in the existing methods. To deal with the noise issue in the tunnel environment, the proposed method involves the preprocessing of tunnel acoustic signals and a classifier for detecting acoustic events in tunnels. For preprocessing, a non-negative tensor factorization (NTF) technique is used to separate the acoustic event signal from the noisy signal in the tunnel. In particular, the NTF technique developed in this paper consists of source separation and online noise learning. In other words, the noise basis is adapted by an online noise learning technique for enhancement in adverse noise conditions. Next, a convolutional recurrent neural network (CRNN) is extended to accommodate the contributions of the separated event signal and noise to the event detection; thus, the proposed CRNN is composed of event convolution layers and noise convolution layers in parallel followed by recurrent layers and the output layer. Here, a set of mel-filterbank feature parameters is used as the input features. Evaluations of the proposed method are conducted on two datasets: a publicly available road audio events dataset and a tunnel audio dataset recorded in a real traffic tunnel for six months. In the first evaluation where the background noise is low, the proposed CRNN-based SED method with online noise learning reduces the relative recognition error rate by 56.25% when compared to the conventional CRNN-based method with noise. In the second evaluation, where the tunnel background noise is more severe than in the first evaluation, the proposed CRNN-based SED method yields superior performance when compared to the conventional methods. In particular, it is shown that among all of the compared methods, the proposed method with the online noise learning provides the best recognition rate of 91.07% and reduces the recognition error rates by 47.40% and 28.56% when compared to the Gaussian mixture model (GMM)-hidden Markov model (HMM)-based and conventional CRNN-based SED methods, respectively. The computational complexity measurements also show that the proposed CRNN-based SED method requires a processing time of 599 ms for both the NTF-based source separation with online noise learning and CRNN classification when the tunnel noisy signal is one second long, which implies that the proposed method detects events in real-time.

10.

Burst packet loss concealment using multiple codebooks and comfort noise for CELP-type speech coders in wireless sensor networks.

Park, Nam In; Kim, Hong Kook; Jung, Min A; Lee, Seong Ro; Choi, Seung Ho.

Sensors (Basel) ; 11(5): 5323-36, 2011.

Article in English | MEDLINE | ID: mdl-22163902

ABSTRACT

In this paper, a packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed in order to improve the quality of decoded speech under burst packet loss conditions in a wireless sensor network. Conventional receiver-based PLC algorithms in the G.729 speech codec are usually based on speech correlation to reconstruct the decoded speech of lost frames by using parameter information obtained from the previous correctly received frames. However, this approach has difficulty in reconstructing voice onset signals since the parameters such as pitch, linear predictive coding coefficient, and adaptive/fixed codebooks of the previous frames are mostly related to silence frames. Thus, in order to reconstruct speech signals in the voice onset intervals, we propose a multiple codebook-based approach that includes a traditional adaptive codebook and a new random codebook composed of comfort noise. The proposed PLC algorithm is designed as a PLC algorithm for G.729 and its performance is then compared with that of the PLC algorithm currently employed in G.729 via a perceptual evaluation of speech quality, a waveform comparison, and a preference test under different random and burst packet loss conditions. It is shown from the experiments that the proposed PLC algorithm provides significantly better speech quality than the PLC algorithm employed in G.729 under all the test conditions.

Subject(s)

Biosensing Techniques/instrumentation , Computer Communication Networks/instrumentation , Speech , Wireless Technology/instrumentation , Biosensing Techniques/methods , Humans , Voice

11.

Adaptive redundant speech transmission over wireless multimedia sensor networks based on estimation of perceived speech quality.

Kang, Jin Ah; Kim, Hong Kook.

Sensors (Basel) ; 11(9): 8469-84, 2011.

Article in English | MEDLINE | ID: mdl-22164086

ABSTRACT

An adaptive redundant speech transmission (ARST) approach to improve the perceived speech quality (PSQ) of speech streaming applications over wireless multimedia sensor networks (WMSNs) is proposed in this paper. The proposed approach estimates the PSQ as well as the packet loss rate (PLR) from the received speech data. Subsequently, it decides whether the transmission of redundant speech data (RSD) is required in order to assist a speech decoder to reconstruct lost speech signals for high PLRs. According to the decision, the proposed ARST approach controls the RSD transmission, then it optimizes the bitrate of speech coding to encode the current speech data (CSD) and RSD bitstream in order to maintain the speech quality under packet loss conditions. The effectiveness of the proposed ARST approach is then demonstrated using the adaptive multirate-narrowband (AMR-NB) speech codec and ITU-T Recommendation P.563 as a scalable speech codec and the PSQ estimation, respectively. It is shown from the experiments that a speech streaming application employing the proposed ARST approach significantly improves speech quality under packet loss conditions in WMSNs.

Subject(s)

Radio Waves , Speech , Humans

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL