Search | VHL Regional Portal

1.

Robust Human Face Emotion Classification Using Triplet-Loss-Based Deep CNN Features and SVM.

Haider, Irfan; Yang, Hyung-Jeong; Lee, Guee-Sang; Kim, Soo-Hyung.

Sensors (Basel) ; 23(10)2023 May 15.

Article in English | MEDLINE | ID: mdl-37430689

ABSTRACT

Human facial emotion detection is one of the challenging tasks in computer vision. Owing to high inter-class variance, it is hard for machine learning models to predict facial emotions accurately. Moreover, a person with several facial emotions increases the diversity and complexity of classification problems. In this paper, we have proposed a novel and intelligent approach for the classification of human facial emotions. The proposed approach comprises customized ResNet18 by employing transfer learning with the integration of triplet loss function (TLF), followed by SVM classification model. Using deep features from a customized ResNet18 trained with triplet loss, the proposed pipeline consists of a face detector used to locate and refine the face bounding box and a classifier to identify the facial expression class of discovered faces. RetinaFace is used to extract the identified face areas from the source image, and a ResNet18 model is trained on cropped face images with triplet loss to retrieve those features. An SVM classifier is used to categorize the facial expression based on the acquired deep characteristics. In this paper, we have proposed a method that can achieve better performance than state-of-the-art (SoTA) methods on JAFFE and MMI datasets. The technique is based on the triplet loss function to generate deep input image features. The proposed method performed well on the JAFFE and MMI datasets with an accuracy of 98.44% and 99.02%, respectively, on seven emotions; meanwhile, the performance of the method needs to be fine-tuned for the FER2013 and AFFECTNET datasets.

Subject(s)

Emotions , Support Vector Machine , Humans , Intelligence , Machine Learning

2.

DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection.

Dinh, My-Tham; Choi, Deok-Jai; Lee, Guee-Sang.

Sensors (Basel) ; 23(13)2023 Jun 25.

Article in English | MEDLINE | ID: mdl-37447738

ABSTRACT

Detecting dense text in scene images is a challenging task due to the high variability, complexity, and overlapping of text areas. To adequately distinguish text instances with high density in scenes, we propose an efficient approach called DenseTextPVT. We first generated high-resolution features at different levels to enable accurate dense text detection, which is essential for dense prediction tasks. Additionally, to enhance the feature representation, we designed the Deep Multi-scale Feature Refinement Network (DMFRN), which effectively detects texts of varying sizes, shapes, and fonts, including small-scale texts. DenseTextPVT, then, is inspired by Pixel Aggregation (PA) similarity vector algorithms to cluster text pixels into correct text kernels in the post-processing step. In this way, our proposed method enhances the precision of text detection and effectively reduces overlapping between text regions under dense adjacent text in natural images. The comprehensive experiments indicate the effectiveness of our method on the TotalText, CTW1500, and ICDAR-2015 benchmark datasets in comparison to existing methods.

Subject(s)

Algorithms , Benchmarking , Electric Power Supplies

3.

Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst.

Trinh, Dang-Linh; Vo, Minh-Cong; Kim, Soo-Hyung; Yang, Hyung-Jeong; Lee, Guee-Sang.

Sensors (Basel) ; 23(1)2022 Dec 24.

Article in English | MEDLINE | ID: mdl-36616796

ABSTRACT

Speech emotion recognition (SER) is one of the most exciting topics many researchers have recently been involved in. Although much research has been conducted recently on this topic, emotion recognition via non-verbal speech (known as the vocal burst) is still sparse. The vocal burst is concise and has meaningless content, which is harder to deal with than verbal speech. Therefore, in this paper, we proposed a self-relation attention and temporal awareness (SRA-TA) module to tackle this problem with vocal bursts, which could capture the dependency in a long-term period and focus on the salient parts of the audio signal as well. Our proposed method contains three main stages. Firstly, the latent features are extracted using a self-supervised learning model from the raw audio signal and its Mel-spectrogram. After the SRA-TA module is utilized to capture the valuable information from latent features, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of 10 emotions. Our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, which achieves the first ranking of the high-dimensional emotion task in the 2022 ACII Affective Vocal Burst Workshop & Challenge.

Subject(s)

Emotions , Speech Perception , Speech , Attention

4.

Effects of Multiple Filters on Liver Tumor Segmentation From CT Images.

Vo, Vi Thi-Tuong; Yang, Hyung-Jeong; Lee, Guee-Sang; Kang, Sae-Ryung; Kim, Soo-Hyung.

Front Oncol ; 11: 697178, 2021.

Article in English | MEDLINE | ID: mdl-34660267

ABSTRACT

Segmentation of liver tumors from Computerized Tomography (CT) images remains a challenge due to the natural variation in tumor shape and structure as well as the noise in CT images. A key assumption is that the performance of liver tumor segmentation depends on the characteristics of multiple features extracted from multiple filters. In this paper, we design an enhanced approach based on a two-class (liver, tumor) convolutional neural network that discriminates tumor as well as liver from CT images. First, the contrast and intensity values in CT images are adjusted and high frequencies are removed using Hounsfield units (HU) filtering and standardization. Then, the liver tumor is segmented from entire images with multiple filter U-net (MFU-net). Finally, a quantitative analysis is carried out to evaluate the segmentation results using three different methods: boundary-distance-based metrics, size-based metrics, and overlap-based metrics. The proposed method is validated on CT images from the 3Dircadb and LiTS dataset. The results demonstrate that the multiple filters are useful for extracting local and global feature simultaneously, minimizing the boundary distance errors, and our approach demonstrates better performance in heterogeneous tumor regions of CT images.

5.

EEG-Based Emotion Recognition by Convolutional Neural Network with Multi-Scale Kernels.

Phan, Tran-Dac-Thinh; Kim, Soo-Hyung; Yang, Hyung-Jeong; Lee, Guee-Sang.

Sensors (Basel) ; 21(15)2021 Jul 27.

Article in English | MEDLINE | ID: mdl-34372327

ABSTRACT

Besides facial or gesture-based emotion recognition, Electroencephalogram (EEG) data have been drawing attention thanks to their capability in countering the effect of deceptive external expressions of humans, like faces or speeches. Emotion recognition based on EEG signals heavily relies on the features and their delineation, which requires the selection of feature categories converted from the raw signals and types of expressions that could display the intrinsic properties of an individual signal or a group of them. Moreover, the correlation or interaction among channels and frequency bands also contain crucial information for emotional state prediction, and it is commonly disregarded in conventional approaches. Therefore, in our method, the correlation between 32 channels and frequency bands were put into use to enhance the emotion prediction performance. The extracted features chosen from the time domain were arranged into feature-homogeneous matrices, with their positions following the corresponding electrodes placed on the scalp. Based on this 3D representation of EEG signals, the model must have the ability to learn the local and global patterns that describe the short and long-range relations of EEG channels, along with the embedded features. To deal with this problem, we proposed the 2D CNN with different kernel-size of convolutional layers assembled into a convolution block, combining features that were distributed in small and large regions. Ten-fold cross validation was conducted on the DEAP dataset to prove the effectiveness of our approach. We achieved the average accuracies of 98.27% and 98.36% for arousal and valence binary classification, respectively.

Subject(s)

Electroencephalography , Neural Networks, Computer , Arousal , Electrodes , Emotions , Humans

6.

Esophagus Segmentation in CT Images via Spatial Attention Network and STAPLE Algorithm.

Tran, Minh-Trieu; Kim, Soo-Hyung; Yang, Hyung-Jeong; Lee, Guee-Sang; Oh, In-Jae; Kang, Sae-Ryung.

Sensors (Basel) ; 21(13)2021 Jul 02.

Article in English | MEDLINE | ID: mdl-34283090

ABSTRACT

One essential step in radiotherapy treatment planning is the organ at risk of segmentation in Computed Tomography (CT). Many recent studies have focused on several organs such as the lung, heart, esophagus, trachea, liver, aorta, kidney, and prostate. However, among the above organs, the esophagus is one of the most difficult organs to segment because of its small size, ambiguous boundary, and very low contrast in CT images. To address these challenges, we propose a fully automated framework for the esophagus segmentation from CT images. The proposed method is based on the processing of slice images from the original three-dimensional (3D) image so that our method does not require large computational resources. We employ the spatial attention mechanism with the atrous spatial pyramid pooling module to locate the esophagus effectively, which enhances the segmentation performance. To optimize our model, we use group normalization because the computation is independent of batch sizes, and its performance is stable. We also used the simultaneous truth and performance level estimation (STAPLE) algorithm to reach robust results for segmentation. Firstly, our model was trained by k-fold cross-validation. And then, the candidate labels generated by each fold were combined by using the STAPLE algorithm. And as a result, Dice and Hausdorff Distance scores have an improvement when applying this algorithm to our segmentation results. Our method was evaluated on SegTHOR and StructSeg 2019 datasets, and the experiment shows that our method outperforms the state-of-the-art methods in esophagus segmentation. Our approach shows a promising result in esophagus segmentation, which is still challenging in medical analyses.

Subject(s)

Image Processing, Computer-Assisted , Neural Networks, Computer , Algorithms , Esophagus/diagnostic imaging , Humans , Male , Tomography, X-Ray Computed

7.

Context-Aware Emotion Recognition in the Wild Using Spatio-Temporal and Temporal-Pyramid Models.

Do, Nhu-Tai; Kim, Soo-Hyung; Yang, Hyung-Jeong; Lee, Guee-Sang; Yeom, Soonja.

Sensors (Basel) ; 21(7)2021 Mar 27.

Article in English | MEDLINE | ID: mdl-33801739

ABSTRACT

Emotion recognition plays an important role in human-computer interactions. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches. However, such methods have difficulties in exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a multi-modal flexible system for video-based emotion recognition in the wild. Our system tracks and votes on significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is that it proposes the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information with a temporal-pyramid model and a spatiotemporal model with "Conv2D+LSTM+3DCNN+Classify" architecture. Finally, we propose the best selection ensemble to improve the accuracy of multi-modal fusion. The best selection ensemble selects the best combination from spatiotemporal and temporal-pyramid models to achieve the best accuracy for classifying the seven basic emotions. In our experiment, we take benchmark measurement on the AFEW dataset with high accuracy.

Subject(s)

Awareness , Emotions , Humans , Photic Stimulation , Physical Therapy Modalities

8.

Distance Error Correction in Time-of-Flight Cameras Using Asynchronous Integration Time.

Baek, Eu-Tteum; Yang, Hyung-Jeong; Kim, Soo-Hyung; Lee, Gueesang; Jeong, Hieyong.

Sensors (Basel) ; 20(4)2020 Feb 20.

Article in English | MEDLINE | ID: mdl-32093157

ABSTRACT

A distance map captured using a time-of-flight (ToF) depth sensor has fundamental problems, such as ambiguous depth information in shiny or dark surfaces, optical noise, and mismatched boundaries. Severe depth errors exist in shiny and dark surfaces owing to excess reflection and excess absorption of light, respectively. Dealing with this problem has been a challenge due to the inherent hardware limitations of ToF, which measures the distance using the number of reflected photons. This study proposes a distance error correction method using three ToF sensors, set to different integration times to address the ambiguity in depth information. First, the three ToF depth sensors are installed horizontally at different integration times to capture distance maps at different integration times. Given the amplitude maps and error regions are estimated based on the amount of light, the estimated error regions are refined by exploiting the accurate depth information from the neighboring depth sensors that use different integration times. Moreover, we propose a new optical noise reduction filter that considers the distribution of the depth information biased toward one side. Experimental results verified that the proposed method overcomes the drawbacks of ToF cameras and provides enhanced distance maps.

9.

Separation of specular and diffuse components using tensor voting in color images.

Nguyen, Tam; Vo, Quang Nhat; Yang, Hyung-Jeong; Kim, Soo-Hyung; Lee, Guee-Sang.

Appl Opt ; 53(33): 7924-36, 2014 Nov 20.

Article in English | MEDLINE | ID: mdl-25607869

ABSTRACT

Most methods for the detection and removal of specular reflections suffer from nonuniform highlight regions and/or nonconverged artifacts induced by discontinuities in the surface colors, especially when dealing with highly textured, multicolored images. In this paper, a novel noniterative and predefined constraint-free method based on tensor voting is proposed to detect and remove the highlight components of a single color image. The distribution of diffuse and specular pixels in the original image is determined using tensors' saliency analysis, instead of comparing color information among neighbor pixels. The achieved diffuse reflectance distribution is used to remove specularity components. The proposed method is evaluated quantitatively and qualitatively over a dataset of highly textured, multicolor images. The experimental results show that our result outperforms other state-of-the-art techniques.

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL