1.
Data Brief ; 54: 110299, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38524840

ABSTRACT

The dataset includes thermal videos of various hand gestures captured with the FLIR Lepton Thermal Camera. A large dataset was created to support accurate classification of hand gestures performed by eleven different individuals. The dataset consists of 9 classes corresponding to various hand gestures from different people, collected at different times against complex backgrounds: flat/leftward, flat/rightward, flat/contract, spread/leftward, spread/rightward, spread/contract, V-shape/leftward, V-shape/rightward, and V-shape/contract. There are 110 videos per gesture, for a total of 990 videos across the 9 gestures. Each video is provided at three different frame lengths (15, 10, and 5 frames).
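The nine classes are the cross product of three hand shapes and three motions. A minimal sketch of that taxonomy and the resulting video count (the label strings mirror the abstract; the variable names are illustrative, not the dataset's actual file layout):

```python
from itertools import product

# Three hand shapes crossed with three motions give the 9 gesture classes.
shapes = ["flat", "spread", "V-shape"]
motions = ["leftward", "rightward", "contract"]

classes = [f"{s}/{m}" for s, m in product(shapes, motions)]
videos_per_class = 110
total_videos = len(classes) * videos_per_class

print(classes)       # 9 gesture labels, e.g. "flat/leftward"
print(total_videos)  # 990 videos in total
```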

2.
Data Brief ; 45: 108659, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36425988

ABSTRACT

The dataset contains RGB and depth video frames of various hand movements captured with the Intel RealSense Depth Camera D435. The camera has two channels that collect RGB and depth frames simultaneously. A large dataset was created for accurate classification of hand gestures under complex backgrounds. It comprises 29,718 frames across the RGB and depth versions, corresponding to various hand gestures from different people collected at different times against complex backgrounds. The hand movements included are scroll-right, scroll-left, scroll-up, scroll-down, zoom-in, and zoom-out. Each sequence contains 40 frames, and there are a total of 662 sequences corresponding to each gesture in the dataset. To capture all the variations in the dataset, the hand was oriented in various ways during recording.
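Fixed-length sequences like these are typically built by chunking an ordered frame list into windows of 40. A hedged sketch of that grouping (the file-name pattern is an assumption for illustration, not the dataset's real naming scheme):

```python
# Group an ordered frame list into fixed-length sequences of 40 frames,
# dropping any trailing remainder shorter than a full sequence.
SEQ_LEN = 40

def to_sequences(frames, seq_len=SEQ_LEN):
    return [frames[i:i + seq_len]
            for i in range(0, len(frames) - seq_len + 1, seq_len)]

# Illustrative stand-ins for one gesture's ordered RGB frame paths.
rgb_frames = [f"rgb_{i:05d}.png" for i in range(85)]
sequences = to_sequences(rgb_frames)
print(len(sequences))  # 2 full 40-frame sequences; the last 5 frames are dropped
```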

3.
J Acoust Soc Am ; 151(4): 2773, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35461490

ABSTRACT

Recognizing background information in human speech signals is extremely useful in a wide range of practical applications, and many articles on background sound classification have been published. The task has not, however, been addressed for background sounds embedded in real-world human speech signals. This work therefore proposes a lightweight deep convolutional neural network (CNN) operating on spectrograms for efficient background sound classification in practical human speech signals. The proposed model classifies 11 background sounds, namely airplane, airport, babble, car, drone, exhibition, helicopter, restaurant, station, street, and train sounds embedded in human speech signals. The proposed deep CNN consists of four convolution layers, four max-pooling layers, and one fully connected layer. The model is tested on human speech signals with varying signal-to-noise ratios (SNRs). The proposed deep CNN model with spectrograms achieves an overall background sound classification accuracy of 95.2% on human speech signals spanning a wide range of SNRs, and it outperforms the benchmark models in both accuracy and inference time when evaluated on edge computing devices.
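The abstract fixes the layer count (four convolutions, four max-pooling layers, one fully connected layer) but not the kernel sizes, filter counts, or spectrogram dimensions; those are assumptions below. This pure-Python sketch traces the feature-map sizes such a stack would produce, assuming 3x3 "same"-padded convolutions, 2x2 pooling, and a 64x64 spectrogram input:

```python
# Trace feature-map sizes through 4 conv (3x3, stride 1, "same" padding)
# + 4 max-pool (2x2, stride 2) layers, then one fully connected layer.
# Input size and filter counts are illustrative assumptions, not from the paper.
h = w = 64                    # assumed spectrogram height/width
channels = [16, 32, 64, 128]  # assumed filters per conv layer

for c in channels:
    # A "same"-padded conv keeps h x w; 2x2 pooling halves each dimension.
    h, w = h // 2, w // 2

flattened = h * w * channels[-1]  # input width of the fully connected layer
n_classes = 11                    # background sound classes from the paper
print((h, w, channels[-1]), flattened, n_classes)
```

Four halvings take 64 down to 4, so the fully connected layer sees a 4 x 4 x 128 = 2048-element vector mapped to 11 class scores.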


Subject(s)
Neural Networks, Computer; Speech; Humans; Sound
4.
Data Brief ; 41: 107977, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35242951

ABSTRACT

The dataset contains low-resolution thermal images of various sign language digits represented by hand and captured with the Omron D6T thermal camera, whose resolution is 32 × 32 pixels. Because of the low resolution of the captured images, machine learning models for detecting and classifying sign language digits face additional challenges. Furthermore, the sensor's position and quality have a significant impact on image quality, which is also affected by external factors such as the temperature of the background surface relative to the temperature of the hand. The dataset consists of 3,200 images covering the ten sign digits 0-9, i.e., 320 images per digit collected from different persons. The hand was oriented in various ways to capture all of the variations in the dataset.
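Since the raw D6T readings depend on ambient and hand temperature, per-image min-max normalization is a common first preprocessing step for such data. A minimal sketch (this is a generic technique, not the dataset authors' stated pipeline, and the tiny frame below stands in for a real 32x32 capture):

```python
# Min-max normalize one thermal frame to [0, 1] so a model sees relative
# temperature contrast rather than absolute sensor readings.
def normalize(frame):
    flat = [v for row in frame for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat frame: avoid division by zero
        return [[0.0 for _ in row] for row in frame]
    return [[(v - lo) / (hi - lo) for v in row] for row in frame]

# Tiny synthetic "thermal" frame standing in for a real 32x32 capture.
frame = [[20.0, 25.0], [30.0, 35.0]]
norm = normalize(frame)
print(norm)  # all values scaled into [0.0, 1.0]
```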

5.
Data Brief ; 42: 108037, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35341036

ABSTRACT

An update to the previously published low-resolution thermal imaging dataset is presented in this paper. The new dataset contains high-resolution thermal images of various hand gestures captured with the FLIR Lepton 3.5 thermal camera and the Purethermal 2 breakout board. The camera's resolution is 160 × 120, a calibrated array of 19,200 pixels, and the images it captures are light-independent. The dataset consists of 14,400 images, split equally between color and grayscale versions, covering 10 different hand gestures. Each gesture has 24 images per person, with 30 persons contributing to the whole dataset. The dataset also contains images captured with the hand in different orientations and under different lighting conditions.
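The stated composition is internally consistent: 10 gestures × 24 images per person × 30 persons, duplicated across the color and grayscale versions, gives 14,400 images. A one-line arithmetic check:

```python
# Reconstruct the dataset size from the per-class figures in the description.
gestures, imgs_per_person, persons, versions = 10, 24, 30, 2  # color + grayscale
total = gestures * imgs_per_person * persons * versions
print(total)  # 14400
```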
