1.
Sensors (Basel); 23(16); 2023 Aug 10.
Article in English | MEDLINE | ID: mdl-37631602

ABSTRACT

Automatic hand gesture recognition in video sequences has widespread applications, ranging from home automation to sign language interpretation and clinical operations. The primary challenge lies in achieving real-time recognition while managing the temporal dependencies that affect performance. Existing methods employ 3D convolutional or Transformer-based architectures with hand skeleton estimation, but both have limitations. To address these challenges, a hybrid approach that combines 3D Convolutional Neural Networks (3D-CNNs) and Transformers is proposed. A 3D-CNN computes high-level semantic skeleton embeddings, capturing the local spatial and temporal characteristics of hand gestures. A Transformer network with a self-attention mechanism is then employed to efficiently capture long-range temporal dependencies in the skeleton sequence. Evaluation on the Briareo and Multimodal Hand Gesture datasets yielded accuracy scores of 95.49% and 97.25%, respectively. Notably, the approach achieves real-time performance on a standard CPU, distinguishing it from methods that require specialized GPUs. In summary, the hybrid 3D-CNN and Transformer approach effectively addresses real-time recognition and the efficient handling of temporal dependencies, outperforming existing state-of-the-art methods in both accuracy and speed.
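Below is a minimal, hypothetical PyTorch sketch of the general pattern this abstract describes: a 3D-CNN front end producing per-frame embeddings, followed by a Transformer encoder over the temporal axis. All layer sizes, module names, and the classification head are illustrative assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch: 3D-CNN feature extractor + Transformer encoder
# for gesture classification. All shapes and sizes are assumptions.
import torch
import torch.nn as nn

class CNN3DTransformer(nn.Module):
    def __init__(self, n_classes=12, embed_dim=128, n_heads=4, n_layers=2):
        super().__init__()
        # 3D-CNN: captures local spatio-temporal features from the clip
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep time
        )
        self.proj = nn.Linear(64, embed_dim)
        # Transformer encoder: long-range temporal dependencies
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, x):                      # x: (batch, 3, T, H, W)
        f = self.cnn3d(x)                      # (batch, 64, T, 1, 1)
        f = f.flatten(2).transpose(1, 2)       # (batch, T, 64)
        z = self.encoder(self.proj(f))         # (batch, T, embed_dim)
        return self.head(z.mean(dim=1))        # temporal average pooling

logits = CNN3DTransformer()(torch.randn(2, 3, 16, 64, 64))
print(logits.shape)  # torch.Size([2, 12])
```

The division of labor mirrors the abstract: convolutions handle short-range spatio-temporal structure cheaply, while self-attention relates embeddings across the whole sequence.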


Subject(s)
Electric Power Supplies , Gestures , Automation , Neural Networks, Computer , Skeleton
2.
Sensors (Basel); 23(6); 2023 Mar 22.
Article in English | MEDLINE | ID: mdl-36992039

ABSTRACT

As society has developed, transportation has become a key part of human daily life, increasing the number of vehicles on the streets. Consequently, finding free parking slots in metropolitan areas can be extremely challenging, increasing the chance of being involved in an accident, enlarging the carbon footprint, and negatively affecting the driver's health. Technological resources for parking management and real-time monitoring have therefore become key to speeding up the parking process in urban areas. This work proposes a new computer-vision-based system that detects vacant parking spaces in challenging situations from color imagery processed by a novel deep-learning algorithm. The algorithm is based on a multi-branch output neural network that maximizes the use of contextual image information to infer the occupancy of every parking space. Each output infers the occupancy of a specific parking slot using the entire input image, unlike existing approaches, which use only a neighborhood around each slot. This makes the system very robust to changing illumination conditions, different camera perspectives, and mutual occlusions between parked cars. An extensive evaluation on several public datasets shows that the proposed system outperforms existing approaches.
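A minimal sketch of the multi-branch idea the abstract outlines, assuming a shared backbone over the whole image and one small head per slot; the backbone, head sizes, and slot count are all made-up placeholders, not the paper's network.

```python
# Hypothetical sketch of a multi-branch occupancy network: a shared
# backbone sees the whole parking-lot image, and one small head per
# parking slot outputs that slot's occupancy probability.
import torch
import torch.nn as nn

class MultiBranchParkingNet(nn.Module):
    def __init__(self, n_slots=40):
        super().__init__()
        # Shared backbone: global image context available to every head
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One output branch per slot, each fed the full-image features
        self.heads = nn.ModuleList(
            [nn.Linear(64, 1) for _ in range(n_slots)])

    def forward(self, x):  # x: (batch, 3, H, W) whole-lot color image
        feats = self.backbone(x)               # (batch, 64)
        # Each head predicts its own slot's occupancy from shared context
        return torch.cat(
            [torch.sigmoid(h(feats)) for h in self.heads], dim=1)

probs = MultiBranchParkingNet(n_slots=40)(torch.randn(1, 3, 480, 640))
print(probs.shape)  # torch.Size([1, 40]): one probability per slot
```

Because every head sees global features rather than a local crop, occlusion of one slot by a neighboring car can, in principle, be resolved from surrounding context, which is the robustness claim made above.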

3.
PLoS One ; 14(10): e0223320, 2019.
Article in English | MEDLINE | ID: mdl-31581266

ABSTRACT

Visual hand gesture recognition systems are promising technologies for Human-Computer Interaction, as they allow a more immersive and intuitive interaction. Most of these systems are based on the analysis of skeleton information, which is in turn inferred from color, depth, or near-infrared imagery. However, the robust extraction of skeleton information from images is only possible for a subset of hand poses, which restricts the range of gestures that can be recognized. In this paper, a real-time hand gesture recognition system based on a near-infrared device is presented, which directly analyzes the infrared imagery to infer static and dynamic gestures, without using skeleton information. Thus, a much wider range of hand gestures can be recognized than with skeleton-based approaches. To validate the proposed system, a new dataset of near-infrared imagery has been created, on which the system obtains good results that outperform other state-of-the-art strategies.
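A toy sketch of the skeleton-free route described here: classify a short stack of raw near-infrared frames directly with a CNN, so no hand-pose estimation step limits the gesture vocabulary. Frame count, resolution, and class count are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch: classify a stack of raw near-infrared frames
# directly (no skeleton extraction). Sizes are illustrative only.
import torch
import torch.nn as nn

class NIRGestureNet(nn.Module):
    def __init__(self, n_frames=8, n_gestures=10):
        super().__init__()
        # Stack the T single-channel IR frames as input channels so a
        # plain 2D-CNN can pick up both hand shape and motion cues.
        self.net = nn.Sequential(
            nn.Conv2d(n_frames, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_gestures),
        )

    def forward(self, frames):  # frames: (batch, T, H, W) IR intensities
        return self.net(frames)

logits = NIRGestureNet()(torch.randn(2, 8, 120, 160))
print(logits.shape)  # torch.Size([2, 10])
```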


Subject(s)
Gestures , Optical Imaging , Pattern Recognition, Automated , Hand , Humans , Image Processing, Computer-Assisted , Optical Imaging/methods , Pattern Recognition, Automated/methods , Recognition, Psychology
4.
IEEE Trans Image Process ; 27(7): 3288-3299, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29641407

ABSTRACT

There has been a significant increase in the availability of 3D players and displays in recent years. Nonetheless, the amount of 3D content has not experienced a comparable increase. To alleviate this problem, many algorithms for converting images and videos from 2D to 3D have been proposed. Here, we present an automatic learning-based 2D-to-3D image conversion approach, based on the key hypothesis that color images with similar structure likely have a similar depth structure. The presented algorithm estimates the depth of a color query image using the prior knowledge provided by a repository of color + depth images. The algorithm clusters this database according to structural similarity and then creates a representative of each color-depth image cluster to be used as a prior depth map. The appropriate prior depth map for a given color query image is selected by comparing the structural similarity in the color domain between the query image and the database. The comparison is based on a K-Nearest Neighbor framework that uses a learning procedure to build an adaptive combination of image feature descriptors. The best correspondences determine the cluster and, in turn, the associated prior depth map. Finally, this prior estimate is refined through a segmentation-guided filtering that obtains the final depth map estimation. The approach has been tested on two publicly available databases and compared with several state-of-the-art algorithms, demonstrating its effectiveness.
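A rough sketch of the prior-selection pipeline under stated assumptions: cluster the repository by image descriptors, average each cluster's depth maps into a prior, and pick the query's prior by K-NN matching in descriptor space. The descriptors, cluster count, and neighbor count below are random stand-ins; the paper's learned descriptor combination and the final segmentation-guided filtering are omitted.

```python
# Hypothetical sketch of the prior-depth selection step. Descriptors
# are random placeholders for real image features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((200, 64))   # one vector per RGB image
depth_maps = rng.random((200, 48, 64))         # aligned depth maps

# 1. Cluster the repository by similarity of the color descriptors
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(descriptors)

# 2. Representative prior depth map = mean depth within each cluster
priors = np.stack([depth_maps[kmeans.labels_ == c].mean(axis=0)
                   for c in range(8)])

# 3. K-NN in the color-descriptor domain selects the query's cluster
knn = NearestNeighbors(n_neighbors=5).fit(descriptors)
query = rng.standard_normal((1, 64))           # descriptor of the query
_, idx = knn.kneighbors(query)
votes = kmeans.labels_[idx[0]]
cluster = np.bincount(votes).argmax()          # majority vote of matches
prior_depth = priors[cluster]                  # then refined by filtering
print(prior_depth.shape)  # (48, 64)
```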

5.
Sensors (Basel) ; 14(2): 1961-87, 2014 Jan 24.
Article in English | MEDLINE | ID: mdl-24469352

ABSTRACT

Low-cost systems that can obtain a high-quality foreground segmentation almost independently of the existing illumination conditions in indoor environments are very desirable, especially for security and surveillance applications. In this paper, a novel foreground segmentation algorithm that uses only a Kinect depth sensor is proposed to satisfy these system characteristics. This is achieved by combining a mixture-of-Gaussians background subtraction algorithm with a new Bayesian network that robustly predicts the foreground/background regions between consecutive time steps. The Bayesian network explicitly exploits the intrinsic characteristics of the depth data by means of two dynamic models that estimate the spatial and depth evolution of the foreground/background regions. The most remarkable contribution is the depth-based dynamic model, which predicts the changes in the foreground depth distribution between consecutive time steps. This is a key difference with respect to visible imagery, where the color/gray distribution of the foreground is typically assumed to be constant. Experiments carried out on two different depth-based databases demonstrate that the proposed combination of algorithms obtains a more accurate segmentation of the foreground/background than other state-of-the-art approaches.
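A toy illustration of the key observation above: unlike a color distribution, a moving foreground's depth distribution drifts between frames, so the next distribution can be predicted by shifting the current one by an estimated depth velocity. The mixture-of-Gaussians subtraction itself and the full Bayesian network are omitted; all numbers here are made up.

```python
# Hypothetical toy model of a depth-based dynamic prediction. Real
# systems would couple this with MoG background subtraction.
import numpy as np

def predict_depth_histogram(hist_prev, hist_curr, bin_width_mm=50):
    """Shift the current foreground depth histogram by the per-frame
    depth velocity estimated from the change in mean depth."""
    bins = np.arange(len(hist_curr)) * bin_width_mm
    mean_prev = np.average(bins, weights=hist_prev)
    mean_curr = np.average(bins, weights=hist_curr)
    shift = int(round((mean_curr - mean_prev) / bin_width_mm))
    return np.roll(hist_curr, shift)  # predicted distribution at t+1

# Foreground moving ~100 mm closer per frame (two 50 mm bins)
h_prev = np.array([0, 0, 0, 0, 5, 20, 5, 0, 0, 0], dtype=float)
h_curr = np.array([0, 0, 5, 20, 5, 0, 0, 0, 0, 0], dtype=float)
print(predict_depth_histogram(h_prev, h_curr))
# peak shifts two more bins toward the sensor
```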
