1.
PLoS One ; 19(4): e0298699, 2024.
Article in English | MEDLINE | ID: mdl-38574042

ABSTRACT

Sign language recognition presents significant challenges due to the intricate nature of hand gestures and the necessity to capture fine-grained details. In response to these challenges, a novel approach is proposed: the Lightweight Attentive VGG16 with Random Forest (LAVRF) model. LAVRF introduces a refined adaptation of the VGG16 model integrated with attention modules, complemented by a Random Forest classifier. By streamlining the VGG16 architecture, the Lightweight Attentive VGG16 effectively manages complexity while incorporating attention mechanisms that dynamically concentrate on pertinent regions within input images, resulting in enhanced representation learning. Leveraging the Random Forest classifier provides notable benefits, including proficient handling of high-dimensional feature representations, reduction of variance and overfitting concerns, and resilience against noisy and incomplete data. Additionally, the model performance is further optimized through hyperparameter optimization, utilizing Optuna in conjunction with hill climbing, which efficiently explores the hyperparameter space to discover optimal configurations. The proposed LAVRF model demonstrates outstanding accuracy on three datasets, achieving remarkable results of 99.98%, 99.90%, and 100% on the American Sign Language, American Sign Language with Digits, and NUS Hand Posture datasets, respectively.
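The hill-climbing component of the hyperparameter search described above can be sketched as a greedy local search. This is a minimal illustration, not the paper's implementation: the real system pairs it with Optuna, and the objective here is a hypothetical stand-in for validation accuracy as a function of the number of Random Forest trees.

```python
import random

def hill_climb(objective, start, neighbors, iterations=50, seed=0):
    """Greedy hill climbing: move to a random neighbor only when it
    improves the objective; otherwise stay at the current point."""
    rng = random.Random(seed)
    current = start
    best_score = objective(current)
    for _ in range(iterations):
        candidate = rng.choice(neighbors(current))
        score = objective(candidate)
        if score > best_score:
            current, best_score = candidate, score
    return current, best_score

# Hypothetical objective standing in for validation accuracy as a
# function of the number of trees (peaks at 200 trees by construction).
def mock_val_accuracy(n_trees):
    return 1.0 - abs(n_trees - 200) / 1000.0

def tree_neighbors(n_trees):
    # Step the hyperparameter up or down by 10, never below 10 trees.
    return [max(10, n_trees - 10), n_trees + 10]

best, score = hill_climb(mock_val_accuracy, start=50, neighbors=tree_neighbors)
```

Because moves are accepted only on improvement, the search never drifts below its start nor past the peak on this toy objective.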


Subject(s)
Random Forest , Sign Language , Humans , Pattern Recognition, Automated/methods , Gestures , Upper Extremity
2.
Sensors (Basel) ; 23(22)2023 Nov 10.
Article in English | MEDLINE | ID: mdl-38005472

ABSTRACT

Recent successes in deep learning have inspired researchers to apply deep neural networks to Acoustic Event Classification (AEC). While deep learning methods can train effective AEC models, they are susceptible to overfitting due to the models' high complexity. In this paper, we introduce EnViTSA, an innovative approach that tackles key challenges in AEC. EnViTSA combines an ensemble of Vision Transformers with SpecAugment, a novel data augmentation technique, to significantly enhance AEC performance. Raw acoustic signals are transformed into Log Mel-spectrograms using the Short-Time Fourier Transform, resulting in a fixed-size spectrogram representation. To address data scarcity and overfitting issues, we employ SpecAugment to generate additional training samples through time masking and frequency masking. The core of EnViTSA resides in its ensemble of pre-trained Vision Transformers, harnessing the unique strengths of the Vision Transformer architecture. This ensemble approach not only reduces inductive biases but also effectively mitigates overfitting. In this study, we evaluate the EnViTSA method on three benchmark datasets: ESC-10, ESC-50, and UrbanSound8K. The experimental results underscore the efficacy of our approach, achieving impressive accuracy scores of 93.50%, 85.85%, and 83.20% on ESC-10, ESC-50, and UrbanSound8K, respectively. EnViTSA represents a substantial advancement in AEC, demonstrating the potential of Vision Transformers and SpecAugment in the acoustic domain.
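The time- and frequency-masking step of SpecAugment described above can be sketched on a spectrogram array. This is an illustrative minimal version (one mask of each kind, fixed widths, zero fill), not the paper's exact augmentation policy:

```python
import numpy as np

def spec_augment(spectrogram, time_mask=8, freq_mask=4, seed=0):
    """Apply one time mask and one frequency mask (zeroed bands) to a
    (freq_bins, time_steps) log Mel-spectrogram, SpecAugment-style."""
    rng = np.random.default_rng(seed)
    out = spectrogram.copy()
    n_freq, n_time = out.shape
    t0 = rng.integers(0, n_time - time_mask)
    out[:, t0:t0 + time_mask] = 0.0        # time masking: zero a column band
    f0 = rng.integers(0, n_freq - freq_mask)
    out[f0:f0 + freq_mask, :] = 0.0        # frequency masking: zero a row band
    return out

spec = np.ones((64, 100))                  # dummy 64-bin, 100-frame spectrogram
aug = spec_augment(spec)
```

Each call with a different seed yields a differently masked copy, which is how additional training samples are generated.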

3.
Sensors (Basel) ; 23(12)2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37420722

ABSTRACT

Hand gesture recognition (HGR) is a crucial area of research that enhances communication by overcoming language barriers and facilitating human-computer interaction. Although previous works in HGR have employed deep neural networks, they fail to encode the orientation and position of the hand in the image. To address this issue, this paper proposes HGR-ViT, a Vision Transformer (ViT) model with an attention mechanism for hand gesture recognition. Given a hand gesture image, it is first split into fixed-size patches, which are flattened and projected into patch embeddings. Positional embeddings are added to these patch embeddings to form learnable vectors that capture the positional information of the hand patches. The resulting sequence of vectors is then fed as input to a standard Transformer encoder to obtain the hand gesture representation. A multilayer perceptron head is added to the output of the encoder to classify the hand gesture into the correct class. The proposed HGR-ViT obtains an accuracy of 99.98%, 99.36%, and 99.85% on the American Sign Language (ASL) dataset, ASL with Digits dataset, and National University of Singapore (NUS) hand gesture dataset, respectively.
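The patch-splitting and positional-embedding step common to ViT models like the one above can be sketched in numpy. Random matrices stand in for the learned projection and positional embeddings; the dimensions are illustrative, not the paper's:

```python
import numpy as np

def image_to_patch_embeddings(image, patch_size, embed_dim, seed=0):
    """Split an HxW image into non-overlapping patches, flatten each patch,
    project it linearly, and add a positional embedding per patch.
    Random weights stand in for the learned parameters."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    p = patch_size
    # Rearrange (h, w) -> (n_patches, p*p) without copying patch contents apart.
    patches = image.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)
    projection = rng.normal(size=(p * p, embed_dim))      # learned in practice
    pos_embedding = rng.normal(size=(patches.shape[0], embed_dim))  # learned in practice
    return patches @ projection + pos_embedding

tokens = image_to_patch_embeddings(np.ones((32, 32)), patch_size=8, embed_dim=16)
```

A 32x32 image with 8x8 patches yields a sequence of 16 token vectors, which would then be fed to the Transformer encoder.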


Subject(s)
Gestures , Pattern Recognition, Automated , Humans , Pattern Recognition, Automated/methods , Neural Networks, Computer , Upper Extremity , Sign Language , Hand
4.
Plants (Basel) ; 12(14)2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37514256

ABSTRACT

Plant leaf classification involves identifying and categorizing plant species based on leaf characteristics, such as patterns, shapes, textures, and veins. In recent years, research has been conducted to improve the accuracy of plant classification using machine learning techniques. This involves training models on large datasets of plant images and using them to identify different plant species. However, these models are limited by their reliance on large amounts of training data, which can be difficult to obtain for many plant species. To overcome this challenge, this paper proposes a Plant-CNN-ViT ensemble model that combines the strengths of four pre-trained models: Vision Transformer, ResNet-50, DenseNet-201, and Xception. Vision Transformer utilizes self-attention to capture dependencies and focus on important leaf features. ResNet-50 introduces residual connections, aiding in efficient training and hierarchical feature extraction. DenseNet-201 employs dense connections, facilitating information flow and capturing intricate leaf patterns. Xception uses separable convolutions, reducing the computational cost while capturing fine-grained details in leaf images. The proposed Plant-CNN-ViT was evaluated on four plant leaf datasets and achieved remarkable accuracy of 100.00%, 100.00%, 100.00%, and 99.83% on the Flavia dataset, Folio Leaf dataset, Swedish Leaf dataset, and MalayaKew Leaf dataset, respectively.
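The score-fusion step of an ensemble like Plant-CNN-ViT can be sketched as averaging per-model class probabilities before taking the argmax. The four logit arrays below are hypothetical stand-ins for the outputs of the four backbones:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(model_logits):
    """Average per-model class probabilities, then pick the argmax class,
    mirroring the sum-and-average fusion described above."""
    probs = np.stack([softmax(l) for l in model_logits])  # (models, samples, classes)
    return probs.mean(axis=0).argmax(axis=-1)

# Hypothetical logits from four backbones, two samples, three classes.
logits = [np.array([[2.0, 0.1, 0.1], [0.1, 0.1, 2.0]]),
          np.array([[1.5, 0.2, 0.1], [0.2, 1.9, 1.8]]),
          np.array([[1.8, 0.3, 0.2], [0.1, 0.2, 2.2]]),
          np.array([[2.1, 0.1, 0.3], [0.3, 0.1, 1.7]])]
labels = ensemble_predict(logits)
```

Averaging probabilities (rather than hard votes) lets a confident majority outweigh one model's marginal disagreement, as in the second sample here.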

5.
Sensors (Basel) ; 23(10)2023 May 11.
Article in English | MEDLINE | ID: mdl-37430587

ABSTRACT

Autonomous vehicles have become a topic of interest in recent times due to the rapid advancement of automobile and computer vision technology. The ability of autonomous vehicles to drive safely and efficiently relies heavily on their ability to accurately recognize traffic signs. This makes traffic sign recognition a critical component of autonomous driving systems. To address this challenge, researchers have been exploring various approaches to traffic sign recognition, including machine learning and deep learning. Despite these efforts, the variability of traffic signs across different geographical regions, complex background scenes, and changes in illumination still poses significant challenges to the development of reliable traffic sign recognition systems. This paper provides a comprehensive overview of the latest advancements in the field of traffic sign recognition, covering various key areas, including preprocessing techniques, feature extraction methods, classification techniques, datasets, and performance evaluation. The paper also delves into the commonly used traffic sign recognition datasets and their associated challenges. Additionally, this paper sheds light on the limitations and future research prospects of traffic sign recognition.

6.
Neural Netw ; 165: 19-30, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37263089

ABSTRACT

Few-shot learning aims to train a model with a limited number of base class samples to classify novel class samples. However, attaining generalization with a limited number of samples is not a trivial task. This paper proposes a novel few-shot learning approach named Self-supervised Contrastive Learning (SCL) that enriches the model representation with multiple self-supervision objectives. Given the base class samples, the model is trained with the base class loss. Subsequently, contrastive-based self-supervision is introduced to minimize the distance between each training sample and its augmented variants, improving sample discrimination. To recognize distant samples, rotation-based self-supervision is proposed to enable the model to learn to recognize the rotation degree of the samples for better sample diversity. A multitask environment is introduced in which each training sample is assigned two class labels: a base class label and a rotation class label. Complex augmentation is put forth to help the model learn a deeper understanding of the object: the image structure of the training samples is augmented independently of the base class information. The proposed SCL is trained to minimize the base class loss, contrastive distance loss, and rotation class loss simultaneously to learn generic features and improve novel class performance. With the multiple self-supervision objectives, the proposed SCL outperforms state-of-the-art few-shot approaches on few-shot image classification benchmark datasets.
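The rotation-based self-supervision task described above can be sketched as follows: each training image spawns four rotated copies, each labelled with its rotation class. This is a minimal illustration of the label-generation step only, not the SCL training loop:

```python
import numpy as np

def rotation_task(image):
    """Create rotation self-supervision samples: the image rotated by
    0/90/180/270 degrees, each paired with its rotation class label."""
    samples = [np.rot90(image, k) for k in range(4)]
    rotation_labels = list(range(4))
    return samples, rotation_labels

img = np.arange(9).reshape(3, 3)
rotated, labels = rotation_task(img)
```

During multitask training, each rotated copy keeps its original base class label and additionally carries the rotation label, so both losses can be computed on the same batch.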


Subject(s)
Generalization, Psychological , Learning , Benchmarking , Rotation
7.
Sensors (Basel) ; 23(11)2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37300004

ABSTRACT

Human action recognition is a constantly evolving field that is driven by numerous applications. In recent years, significant progress has been made in this area due to the development of advanced representation learning techniques. Despite this progress, human action recognition still poses significant challenges, particularly due to the unpredictable variations in the visual appearance of an image sequence. To address these challenges, we propose the fine-tuned temporal dense sampling with 1D convolutional neural network (FTDS-1DConvNet). Our method involves the use of temporal segmentation and temporal dense sampling, which help to capture the most important features of a human action video. First, the human action video is partitioned into segments through temporal segmentation. Each segment is then processed through a fine-tuned Inception-ResNet-V2 model, where max pooling is performed along the temporal axis to encode the most significant features as a fixed-length representation. This representation is then fed into a 1DConvNet for further representation learning and classification. The experiments on UCF101 and HMDB51 demonstrate that the proposed FTDS-1DConvNet outperforms the state-of-the-art methods, with a classification accuracy of 88.43% on the UCF101 dataset and 56.23% on the HMDB51 dataset.
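The temporal segmentation and max-pooling steps described above can be sketched on per-frame feature vectors. This is an illustrative version operating on dummy features, in place of the fine-tuned Inception-ResNet-V2 outputs used in the paper:

```python
import numpy as np

def temporal_dense_sample(frame_features, n_segments):
    """Partition a (frames, dim) feature sequence into equal temporal
    segments and max-pool each segment along the time axis, yielding a
    fixed-length (n_segments, dim) representation for any video length."""
    t, d = frame_features.shape
    boundaries = np.linspace(0, t, n_segments + 1, dtype=int)
    pooled = [frame_features[boundaries[i]:boundaries[i + 1]].max(axis=0)
              for i in range(n_segments)]
    return np.stack(pooled)

# Dummy per-frame features: a 30-frame clip with 8-dimensional features.
features = np.random.default_rng(0).normal(size=(30, 8))
representation = temporal_dense_sample(features, n_segments=5)
```

The fixed-length output is what would then be flattened and fed into the 1D convolutional network for classification.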


Subject(s)
Image Processing, Computer-Assisted , Pattern Recognition, Automated , Humans , Pattern Recognition, Automated/methods , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Human Activities
8.
Sensors (Basel) ; 23(8)2023 Apr 07.
Article in English | MEDLINE | ID: mdl-37112147

ABSTRACT

Gait recognition, the task of identifying an individual based on their unique walking style, can be difficult because walking styles can be influenced by external factors such as clothing, viewing angle, and carrying conditions. To address these challenges, this paper proposes a multi-model gait recognition system that integrates Convolutional Neural Networks (CNNs) and a Vision Transformer (ViT). The first step in the process is to obtain a gait energy image, which is achieved by applying an averaging technique to a gait cycle. The gait energy image is then fed into three different models, DenseNet-201, VGG-16, and a Vision Transformer. These models are pre-trained and fine-tuned to encode the salient gait features that are specific to an individual's walking style. Each model provides prediction scores for the classes based on the encoded features, and these scores are then summed and averaged to produce the final class label. The performance of this multi-model gait recognition system was evaluated on three datasets, CASIA-B, OU-ISIR dataset D, and OU-ISIR Large Population dataset. The experimental results showed substantial improvement compared to existing methods on all three datasets. The integration of CNNs and ViT allows the system to learn both the pre-defined and distinct features, providing a robust solution for gait recognition even under the influence of covariates.
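The gait-energy-image step mentioned above is a per-pixel average of aligned binary silhouettes over one gait cycle. A minimal sketch, using tiny toy silhouettes in place of real segmented frames:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes over one gait cycle to obtain
    the gait energy image (GEI): pixels near 1 are static body regions,
    intermediate values mark limb motion."""
    frames = np.asarray(silhouettes, dtype=float)
    return frames.mean(axis=0)

# Two toy 2x2 "silhouette" frames standing in for a full gait cycle.
cycle = [np.array([[0, 1], [1, 1]]),
         np.array([[0, 1], [0, 1]])]
gei = gait_energy_image(cycle)
```

The resulting single image compresses the whole cycle, which is why one GEI per sequence suffices as input to the three models.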


Subject(s)
Machine Learning , Neural Networks, Computer , Gait , Learning , Models, Biological
9.
Sensors (Basel) ; 22(19)2022 Sep 28.
Article in English | MEDLINE | ID: mdl-36236462

ABSTRACT

Identifying an individual based on their physical or behavioral characteristics is known as biometric recognition. Gait is one of the most reliable biometrics due to its advantages, such as being perceivable at a long distance and difficult to replicate. Existing works mostly leverage Convolutional Neural Networks for gait recognition. Convolutional Neural Networks perform well in image recognition tasks; however, they lack an attention mechanism to emphasize the significant regions of the image. The attention mechanism encodes information in the image patches, which facilitates the model in learning the substantial features in specific regions. In light of this, this work employs the Vision Transformer (ViT) with an attention mechanism for gait recognition, referred to as Gait-ViT. In the proposed Gait-ViT, the gait energy image is first obtained by averaging the series of images over the gait cycle. The images are then split into patches and transformed into sequences by flattening and patch embedding. Position embeddings are added to the patch embeddings to restore the positional information of the patches. Subsequently, the sequence of vectors is fed to the Transformer encoder to produce the final gait representation. As for classification, the first element of the sequence is sent to the multi-layer perceptron to predict the class label. The proposed method obtained 99.93% on CASIA-B, 100% on OU-ISIR D, and 99.51% on OU-LP, which demonstrates the ability of the Vision Transformer model to outperform state-of-the-art methods.
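The classification step above, where only the first element of the encoder output is sent to the MLP head, is the standard ViT class-token pattern. A minimal sketch with stand-in encoder and head functions (the real components are a Transformer encoder and a learned MLP, not these placeholders):

```python
import numpy as np

def classify_with_class_token(patch_tokens, encoder, mlp_head):
    """Prepend a class token to the patch sequence, run the encoder over
    the full sequence, and classify from the first output element only."""
    cls = np.zeros((1, patch_tokens.shape[1]))          # learned vector in practice
    sequence = np.concatenate([cls, patch_tokens], axis=0)
    encoded = encoder(sequence)
    return mlp_head(encoded[0])                         # class-token output

# Placeholder encoder/head for illustration (not a real Transformer).
identity_encoder = lambda seq: seq + 1.0
argmax_head = lambda vec: int(vec.argmax())

tokens = np.random.default_rng(0).normal(size=(16, 4))
label = classify_with_class_token(tokens, identity_encoder, argmax_head)
```

Because self-attention lets the class token aggregate information from every patch, its output alone can represent the whole image for classification.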


Subject(s)
Neural Networks, Computer , Pattern Recognition, Automated , Biometry/methods , Gait , Pattern Recognition, Automated/methods
10.
Diagnostics (Basel) ; 12(8)2022 Jul 29.
Article in English | MEDLINE | ID: mdl-36010179

ABSTRACT

The COVID-19 pandemic has caused a devastating impact on social activity, the economy, and politics worldwide. Techniques to diagnose COVID-19 cases by examining anomalies in chest X-ray images are urgently needed. Inspired by the success of deep learning in various tasks, this paper evaluates the performance of four deep neural networks in detecting COVID-19 patients from their chest radiographs. The deep neural networks studied include VGG16, MobileNet, ResNet50, and DenseNet201. Preliminary experiments show that all deep neural networks perform promisingly, while DenseNet201 outshines the other models. Nevertheless, the sensitivity rates of the models are below expectations, which can be attributed to several factors: limited publicly available COVID-19 images, imbalanced sample size for the COVID-19 class and non-COVID-19 class, overfitting or underfitting of the deep neural networks, and the fact that the feature extraction of pre-trained models does not adapt well to the COVID-19 detection task. To address these factors, several enhancements are proposed, including data augmentation, adjusted class weights, early stopping, and fine-tuning, to improve the performance. Empirical results on DenseNet201 with these enhancements demonstrate outstanding performance, with an accuracy of 99.90%, precision of 98.99%, sensitivity of 98.00%, specificity of 99.97%, and F1-score of 98.49% on the COVID-Xray-5k dataset.
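The adjusted-class-weights enhancement mentioned above is commonly implemented as inverse-frequency weighting. A minimal sketch of that heuristic (the exact weighting scheme used in the paper is not specified, so this is one standard choice, with hypothetical class counts):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights, n_samples / (n_classes * count):
    rare classes receive proportionally larger loss weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical imbalance: few COVID-19 positives vs many negatives.
weights = balanced_class_weights(["covid"] * 100 + ["non_covid"] * 900)
```

With a 1:9 imbalance, the minority class here gets nine times the weight of the majority class, pushing the network to pay more attention to positive cases during training.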

11.
Sensors (Basel) ; 22(15)2022 Jul 29.
Article in English | MEDLINE | ID: mdl-35957239

ABSTRACT

Identifying a person by their behavioral biometrics has attracted much attention in the biometrics industry. Gait is a behavioral trait whereby an individual is identified based on their walking style. Over the years, gait recognition has been performed using handcrafted approaches. However, due to the effects of several covariates, the performance of these approaches has been compromised. Deep learning is an emerging approach in the biometrics field, with the capability to tackle the covariates and produce highly accurate results. In this paper, a comprehensive overview of existing deep learning-based gait recognition approaches is presented. In addition, a summary of the performance of these approaches on different gait datasets is provided.


Subject(s)
Deep Learning , Algorithms , Biometry , Gait , Humans , Walking