Results 1 - 20 of 26,931
1.
Sensors (Basel) ; 24(12)2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38931629

ABSTRACT

Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built on the Transformer architecture, which tends to be complex and ill-suited to small datasets. Hence, we propose a Nonlinear Regularization Decoding Method for Speech Recognition. Firstly, we introduce a nonlinear Transformer decoder that breaks away from traditional left-to-right or right-to-left decoding orders and enables associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Secondly, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce a tiny model to address the challenge of an excessively large parameter count. The experimental results indicate that our model performs well: compared to the baseline, it achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, Free ST Chinese Corpus, and Uyghur Common Voice 16.1 datasets, respectively.


Subjects
Algorithms; Speech Recognition Software; Humans; Speech/physiology; Nonlinear Dynamics; Pattern Recognition, Automated/methods
2.
Sensors (Basel) ; 24(12)2024 Jun 16.
Article in English | MEDLINE | ID: mdl-38931682

ABSTRACT

Monitoring activities of daily living (ADLs) plays an important role in measuring and responding to a person's ability to manage their basic physical needs. Effective recognition systems for monitoring ADLs must successfully recognize naturalistic activities that also realistically occur at infrequent intervals. However, existing systems primarily focus on either recognizing more separable, controlled activity types or are trained on balanced datasets where activities occur more frequently. In our work, we investigate the challenges associated with applying machine learning to an imbalanced dataset collected from a fully in-the-wild environment. This analysis shows that the combination of preprocessing techniques to increase recall and postprocessing techniques to increase precision can result in more desirable models for tasks such as ADL monitoring. In a user-independent evaluation using in-the-wild data, these techniques resulted in a model that achieved an event-based F1-score of over 0.9 for brushing teeth, combing hair, walking, and washing hands. This work tackles fundamental challenges in machine learning that will need to be addressed in order for these systems to be deployed and reliably work in the real world.
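The abstract does not spell out the specific pre- and postprocessing steps, but the general idea of trading recall against precision can be illustrated with a minimal event-level postprocessing sketch in Python; the minimum-duration rule and threshold below are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np

def min_duration_filter(pred, min_len):
    """Suppress predicted activity runs shorter than min_len samples.

    pred: 1-D array of 0/1 frame-level predictions for one activity class.
    Returns a copy with short positive runs zeroed out (raises precision
    at the cost of some recall).
    """
    pred = np.asarray(pred).copy()
    start = None
    for i, v in enumerate(np.append(pred, 0)):   # sentinel 0 closes the last run
        if v == 1 and start is None:
            start = i
        elif v == 0 and start is not None:
            if i - start < min_len:
                pred[start:i] = 0
            start = None
    return pred

# Example: drop predicted events shorter than 4 frames
frame_preds = np.array([0, 1, 1, 0, 1, 1, 1, 1, 1, 0])
print(min_duration_filter(frame_preds, min_len=4))   # [0 0 0 0 1 1 1 1 1 0]
```

Analogous preprocessing on the training side (e.g., class weighting or resampling of the rare activities) would raise recall, with a filter of this kind recovering precision at prediction time.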


Subjects
Activities of Daily Living; Human Activities; Machine Learning; Humans; Algorithms; Walking/physiology; Pattern Recognition, Automated/methods
3.
Sensors (Basel) ; 24(12)2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38931728

ABSTRACT

There has been a resurgence of applications focused on human activity recognition (HAR) in smart homes, especially in the field of ambient intelligence and assisted-living technologies. However, such applications present numerous significant challenges to any automated analysis system operating in the real world, such as variability, sparsity, and noise in sensor measurements. Although state-of-the-art HAR systems have made considerable strides in addressing some of these challenges, they suffer from a practical limitation: they require successful pre-segmentation of continuous sensor data streams prior to automated recognition, i.e., they assume that an oracle is present during deployment, and that it is capable of identifying time windows of interest across discrete sensor events. To overcome this limitation, we propose a novel graph-guided neural network approach that performs activity recognition by learning explicit co-firing relationships between sensors. We accomplish this by learning a more expressive graph structure representing the sensor network in a smart home in a data-driven manner. Our approach maps discrete input sensor measurements to a feature space through the application of attention mechanisms and hierarchical pooling of node embeddings. We demonstrate the effectiveness of our proposed approach by conducting several experiments on CASAS datasets, showing that the resulting graph-guided neural network outperforms the state-of-the-art method for HAR in smart homes across multiple datasets and by large margins. These results are promising because they push HAR for smart homes closer to real-world applications.


Subjects
Human Activities; Neural Networks, Computer; Humans; Algorithms; Pattern Recognition, Automated/methods
4.
Sensors (Basel) ; 24(12)2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38931754

ABSTRACT

Electromyography-based gesture recognition has become a challenging problem in the decoding of fine hand movements. Recent research has focused on improving the accuracy of gesture recognition by increasing the complexity of network models. However, training a complex model requires a significant amount of data, escalating both the user's burden and the computational cost. Moreover, owing to the considerable variability of surface electromyography (sEMG) signals across users, conventional machine learning approaches that rely on a single feature fail to meet the demand for precise gesture recognition tailored to individual users. Therefore, to address the problems of high computational cost and poor cross-user pattern recognition performance, we propose a feature selection method that combines mutual information, principal component analysis, and the Pearson correlation coefficient (MPP). This method selects the optimal subset of features for a specific user and, combined with an SVM classifier, recognizes the user's gesture movements accurately and efficiently. To validate the method, we designed an experiment comprising five gesture actions. The experimental results show that, compared to the classification accuracy obtained with a single feature, using the optimally selected feature subset as the input to any of the classifiers yields an improvement of about 5%. This study provides an effective basis for user-specific fine hand movement decoding based on sEMG signals.
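As a rough illustration of how mutual information, PCA, and the Pearson correlation might be combined for per-user feature selection ahead of an SVM, here is a hedged Python sketch; the fusion rule (summing normalized scores) and all sizes are assumptions, since the abstract does not give the exact MPP formulation:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mpp_rank(X, y, n_keep=8):
    """Toy per-user feature ranking combining mutual information, PCA loadings,
    and Pearson correlation; the fusion rule is an assumption, not the published MPP."""
    mi = mutual_info_classif(X, y)                        # relevance to gesture labels
    pca = PCA().fit(StandardScaler().fit_transform(X))
    load = np.abs(pca.components_[:2]).sum(axis=0)        # weight in leading components
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    score = (mi / (mi.max() + 1e-12)
             + load / (load.max() + 1e-12)
             + corr / (corr.max() + 1e-12))
    return np.argsort(score)[::-1][:n_keep]               # indices of the best features

# Usage sketch: X holds per-trial sEMG features for one user, y holds gesture labels
X, y = np.random.rand(200, 20), np.random.randint(0, 5, 200)
idx = mpp_rank(X, y)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X[:, idx], y)
```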


Subjects
Electromyography; Forearm; Gestures; Hand; Pattern Recognition, Automated; Humans; Electromyography/methods; Hand/physiology; Forearm/physiology; Pattern Recognition, Automated/methods; Male; Adult; Principal Component Analysis; Female; Algorithms; Movement/physiology; Young Adult; Support Vector Machine; Machine Learning
5.
PLoS One ; 19(6): e0303451, 2024.
Article in English | MEDLINE | ID: mdl-38870195

ABSTRACT

Infrared target detection is widely used in industrial fields such as environmental monitoring and automatic driving, and the detection of weak targets is one of the most challenging research topics in this field. The small size of these targets, their limited information content, and the scarcity of surrounding context all increase the difficulty of detection and recognition. To address these issues, this paper proposes YOLO-ISTD, an improved method for infrared small target detection based on the YOLOv5-S framework. Firstly, we propose a feature extraction module called SACSP, which incorporates the Shuffle Attention mechanism and makes certain adjustments to the CSP structure, enhancing the feature extraction capability and improving the performance of the detector. Secondly, we introduce a feature fusion module called NL-SPPF. By introducing an NL-Block, the network is able to capture richer long-range features and better model the correlation between background information and targets, thereby enhancing the detection capability for small targets. Lastly, we propose a modified K-means clustering algorithm based on Distance-IoU (DIoU), called K-means_DIOU, to improve clustering accuracy and generate anchors suitable for the task. Additionally, the detection heads in YOLOv5-S are modified: the original 8-, 16-, and 32-times downsampling detection heads are replaced with 4-, 8-, and 16-times downsampling detection heads, capturing more informative coarse-grained features. This enables better understanding of the overall characteristics and structure of the targets, resulting in improved representation and localization of small targets. Experimental results demonstrate significant gains for YOLO-ISTD on the NUST-SIRST dataset, with improvements of 8.568% in mAP@0.5 and 8.618% in mAP@0.95. Compared to the comparative models, the proposed approach effectively addresses missed detections and false alarms in the detection results, leading to substantial improvements in precision, recall, and model convergence speed.
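The DIoU-based clustering can be sketched as follows; this is a generic illustration of K-means with a 1 - DIoU distance over bounding boxes, not the paper's exact K-means_DIOU algorithm:

```python
import numpy as np

def diou(box_a, box_b):
    """Distance-IoU between two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))          # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))          # intersection height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    iou = inter / union if union > 0 else 0.0
    rho2 = (box_a[0] - box_b[0]) ** 2 + (box_a[1] - box_b[1]) ** 2   # squared centre distance
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2                                # squared enclosing-box diagonal
    return iou - rho2 / c2 if c2 > 0 else iou

def kmeans_diou(boxes, k, iters=50, seed=0):
    """Naive K-means over ground-truth boxes using 1 - DIoU as the distance (a sketch)."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        dist = np.array([[1.0 - diou(b, c) for c in centers] for b in boxes])
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes[assign == j].mean(axis=0)
    return centers                                        # candidate anchor boxes
```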


Subjects
Algorithms; Infrared Rays; Cluster Analysis; Pattern Recognition, Automated/methods
6.
PLoS One ; 19(6): e0300837, 2024.
Article in English | MEDLINE | ID: mdl-38870208

ABSTRACT

Recognizing postures in multi-person dance scenarios presents challenges due to mutual body part obstruction and varying distortions across different dance actions. These challenges include differences in proximity and size, demanding precision in capturing fine details to convey action expressiveness. Robustness in recognition becomes crucial in complex real-world environments. To tackle these issues, our study introduces a novel approach, i.e., Multi-Person Dance Tiered Posture Recognition with Cross Progressive Multi-Resolution Representation Integration (CPMRI) and Tiered Posture Recognition (TPR) modules. The CPMRI module seamlessly merges high-level features, rich in semantic information, with low-level features that provide precise spatial details. Leveraging a cross progressive approach, it retains semantic understanding while enhancing spatial precision, bolstering the network's feature representation capabilities. Through innovative feature concatenation techniques, it efficiently blends high-resolution and low-resolution features, forming a comprehensive multi-resolution representation. This approach significantly improves posture recognition robustness, especially in intricate dance postures involving scale variations. The TPR module classifies body key points into core torso joints and extremity joints based on distinct distortion characteristics. Employing a three-tier tiered network, it progressively refines posture recognition. By addressing the optimal matching problem between torso and extremity joints, the module ensures accurate connections, refining the precision of body key point locations. Experimental evaluations against state-of-the-art methods using MSCOCO2017 and a custom Chinese dance dataset validate our approach's effectiveness. Evaluation metrics including Object Keypoint Similarity (OKS)-based Average Precision (AP), mean Average Precision (mAP), and Average Recall (AR) underscore the efficacy of the proposed method.


Subjects
Dancing; Posture; Humans; Posture/physiology; Dancing/physiology; Algorithms; Pattern Recognition, Automated/methods
7.
Sensors (Basel) ; 24(11)2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38894423

ABSTRACT

Gesture recognition using electromyography (EMG) signals has recently prevailed in the field of human-computer interaction for controlling intelligent prosthetics. Currently, machine learning and deep learning are the two most commonly employed methods for classifying hand gestures. Although traditional machine learning methods already achieve impressive performance, manual feature extraction remains a substantial amount of work. Existing deep learning methods utilize complex neural network architectures to achieve higher accuracy, but can suffer from overfitting, insufficient adaptability, and low recognition accuracy. To improve on this situation, a novel lightweight model named the dual-stream LSTM feature fusion classifier is proposed, based on the concatenation of five time-domain features of the EMG signals and the raw data, both of which are processed with one-dimensional convolutional neural networks and LSTM layers to carry out the classification. The proposed method can effectively capture the global features of EMG signals using a simple architecture, which means a lower computational cost. An experiment is conducted on the public DB1 dataset, which contains 52 gestures, each repeated 10 times by each of the 27 subjects. The accuracy rate achieved by the model is 89.66%, comparable to that achieved by more complex deep learning neural networks, and the inference time for each gesture is 87.6 ms, which also makes it suitable for a real-time control system. The proposed model is further validated in a subject-wise experiment on 10 of the 40 subjects in the DB2 dataset, achieving a mean accuracy of 91.74%. These results illustrate the model's ability to fuse time-domain features and raw data to extract more effective information from the sEMG signal, with an appropriately efficient, lightweight network enhancing the recognition results.
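A hedged PyTorch sketch of such a dual-stream architecture is shown below; layer widths, window lengths, and the choice of time-domain features are illustrative assumptions rather than the published configuration:

```python
import torch
import torch.nn as nn

class DualStreamLSTM(nn.Module):
    """Sketch of a dual-stream 1-D CNN + LSTM classifier (all sizes are assumptions)."""
    def __init__(self, n_channels=12, n_feats=5, n_classes=52, hidden=64):
        super().__init__()
        self.raw_cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU())
        self.raw_lstm = nn.LSTM(32, hidden, batch_first=True)
        self.feat_cnn = nn.Sequential(
            nn.Conv1d(n_feats * n_channels, 32, kernel_size=3, padding=1), nn.ReLU())
        self.feat_lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, raw, feats):
        # raw:   (batch, channels, samples)      e.g. a 200-sample sEMG window
        # feats: (batch, feats*channels, frames) e.g. MAV/RMS/WL/ZC/SSC per frame
        r, _ = self.raw_lstm(self.raw_cnn(raw).transpose(1, 2))
        f, _ = self.feat_lstm(self.feat_cnn(feats).transpose(1, 2))
        fused = torch.cat([r[:, -1], f[:, -1]], dim=1)    # last time step of each stream
        return self.head(fused)

model = DualStreamLSTM()
logits = model(torch.randn(8, 12, 200), torch.randn(8, 60, 25))   # (8, 52)
```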


Subjects
Deep Learning; Electromyography; Gestures; Neural Networks, Computer; Electromyography/methods; Humans; Signal Processing, Computer-Assisted; Pattern Recognition, Automated/methods; Algorithms; Machine Learning; Hand/physiology; Memory, Short-Term/physiology
8.
Article in English | MEDLINE | ID: mdl-38833396

ABSTRACT

The global trend of population aging presents an urgent challenge in ensuring the safety and well-being of elderly individuals, especially those living alone due to various circumstances. A promising approach to this challenge involves leveraging Human Action Recognition (HAR) by integrating data from multiple sensors. However, the field of HAR has struggled to strike a balance between accuracy and response time. While technological advancements have improved recognition accuracy, complex algorithms often come at the expense of response time. To address this issue, we introduce an innovative asynchronous detection method called Rapid Response Elderly Safety Monitoring (RESAM), which relies on progressive hierarchical action recognition and multi-sensor data fusion. Through initial analysis of inertial sensor data using Kernel Principal Component Analysis (KPCA) and multi-class classifiers, we efficiently reduce processing time and lower the false-negative rate (FNR). The inertial-sensor stage serves as a pre-filter, flagging abnormal signals for subsequent analysis. Decision-level data fusion is then executed, incorporating skeleton image analysis based on ResNet and the inertial sensor data from the initial step. This integration enables accurate differentiation between normal and abnormal behaviors. The RESAM method achieves an impressive 97.4% accuracy on the UTD-MHAD database with a minimal delay of 1.22 seconds. On our internally collected database, the RESAM system attains an accuracy of 99%, ranking among the most accurate state-of-the-art methods available. These results underscore the practicality and effectiveness of our approach in meeting the critical demand for swift and precise responses in healthcare scenarios.
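The first stage can be pictured with a short scikit-learn sketch; the classifier choice, feature dimensions, and the "abnormal" label below are assumptions used only to illustrate KPCA followed by a multi-class pre-filter:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stage 1 of a RESAM-like pipeline (a sketch): project inertial-sensor feature
# vectors with KPCA, then run a cheap multi-class classifier as a pre-filter.
prefilter = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf"),   # nonlinear dimensionality reduction
    SVC(kernel="rbf", probability=True),        # multi-class pre-filter
)

X_imu = np.random.rand(300, 60)                 # e.g. windowed accel/gyro features
y = np.random.randint(0, 4, 300)                # activity classes (label 3 = "abnormal", assumed)
prefilter.fit(X_imu, y)

# Only windows the pre-filter flags as abnormal would be passed on to the heavier
# ResNet-based skeleton-image stage for decision-level fusion.
suspect = prefilter.predict(X_imu) == 3
```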


Subjects
Algorithms; Principal Component Analysis; Humans; Aged; Male; Female; Pattern Recognition, Automated/methods; Safety; Aged, 80 and over
9.
Sci Rep ; 14(1): 13156, 2024 06 07.
Article in English | MEDLINE | ID: mdl-38849454

ABSTRACT

This research investigates the recognition of basketball technique actions through the implementation of three-dimensional (3D) Convolutional Neural Networks (CNNs), aiming to enhance the accurate and automated identification of various actions in basketball games. Initially, basketball action sequences are extracted from publicly available basketball action datasets, followed by data preprocessing, including image sampling, data augmentation, and label processing. Subsequently, a novel action recognition model is proposed, combining 3D convolutions and Long Short-Term Memory (LSTM) networks to model temporal features and capture the spatiotemporal relationships and temporal information of actions. This facilitates automatic learning of the spatiotemporal features associated with basketball actions. The model's performance and robustness are further improved through the adoption of optimization algorithms, such as adaptive learning rate adjustment and regularization. The efficacy of the proposed method is verified through experiments conducted on three publicly available basketball action datasets: NTURGB + D, Basketball-Action-Dataset, and B3D Dataset. The results indicate that this approach achieves outstanding performance in basketball technique action recognition tasks across different datasets compared to two common traditional methods. Specifically, when compared to the frame difference-based method, this model exhibits a significant accuracy improvement of 15.1%. When compared to the optical flow-based method, this model demonstrates a substantial accuracy improvement of 12.4%. Moreover, this method showcases strong robustness, accurately recognizing actions under diverse lighting conditions and scenes, achieving an average accuracy of 93.1%. The research demonstrates that the method reported here effectively captures the spatiotemporal relationships of basketball actions, thereby providing reliable technical assessment tools for basketball coaches and players.


Subjects
Algorithms; Basketball; Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods
10.
Opt Express ; 32(10): 16645-16656, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38858865

ABSTRACT

Single-Photon Avalanche Diode (SPAD) direct Time-of-Flight (dToF) sensors provide depth imaging over long distances, enabling the detection of objects even in the absence of contrast in colour or texture. However, distant objects are represented by just a few pixels and are subject to noise from solar interference, limiting the applicability of existing computer vision techniques for high-level scene interpretation. We present a new SPAD-based vision system for human activity recognition, based on convolutional and recurrent neural networks, which is trained entirely on synthetic data. In tests using real data from a 64×32 pixel SPAD, captured over a distance of 40 m, the scheme successfully overcomes the limited transverse resolution (in which human limbs are approximately one pixel across), achieving an average accuracy of 89% in distinguishing between seven different activities. The approach analyses continuous streams of video-rate depth data at a maximal rate of 66 FPS when executed on a GPU, making it well-suited for real-time applications such as surveillance or situational awareness in autonomous systems.


Subjects
Photons; Humans; Human Activities; Neural Networks, Computer; Pattern Recognition, Automated/methods; Equipment Design
11.
Math Biosci Eng ; 21(4): 5007-5031, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38872524

ABSTRACT

In demanding application scenarios such as clinical psychotherapy and criminal interrogation, the accurate recognition of micro-expressions is of utmost importance but poses significant challenges. One of the main difficulties lies in effectively capturing weak and fleeting facial features and improving recognition performance. To address this fundamental issue, this paper proposed a novel architecture based on a multi-scale 3D residual convolutional neural network. The algorithm leveraged a deep 3D-ResNet50 as the skeleton model and utilized the micro-expression optical flow feature map as the input for the network model. Drawing upon the complex spatial and temporal features inherent in micro-expressions, the network incorporated multi-scale convolutional modules of varying sizes to integrate both global and local information. Furthermore, an attention mechanism feature fusion module was introduced to enhance the model's contextual awareness. Finally, to optimize the model's prediction of the optimal solution, a discriminative network structure with multiple output channels was constructed. The algorithm's performance was evaluated using the public datasets SMIC, SAMM, and CASME II. The experimental results demonstrated that the proposed algorithm achieves recognition accuracies of 74.6%, 84.77%, and 91.35% on these datasets, respectively. This substantial improvement in efficiency over existing mainstream methods for extracting subtle micro-expression features enhances recognition performance and increases the accuracy of high-precision micro-expression recognition. Consequently, this paper serves as a useful reference for researchers working on high-precision micro-expression recognition.
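The multi-scale convolutional idea, i.e., parallel 3D convolutions with different kernel sizes whose outputs are concatenated, can be sketched in PyTorch as follows; channel counts and kernel sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleConv3D(nn.Module):
    """Parallel 3-D convolutions with different kernel sizes, concatenated
    (a sketch of a multi-scale module; channel counts are assumptions)."""
    def __init__(self, in_ch, out_ch_per_branch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch_per_branch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)])          # small kernels = local detail, large = global context

    def forward(self, x):                 # x: (batch, C, T, H, W) optical-flow maps
        return torch.cat([b(x) for b in self.branches], dim=1)

# Usage: two-channel optical-flow clip of 16 frames at 64x64 resolution
feats = MultiScaleConv3D(in_ch=2)(torch.randn(2, 2, 16, 64, 64))   # (2, 48, 16, 64, 64)
```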


Subjects
Algorithms; Facial Expression; Neural Networks, Computer; Humans; Imaging, Three-Dimensional/methods; Face; Databases, Factual; Pattern Recognition, Automated/methods; Image Processing, Computer-Assisted/methods
12.
BMC Med Inform Decis Mak ; 24(1): 165, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872146

ABSTRACT

BACKGROUND: Pattern mining techniques are helpful tools when extracting new knowledge in real practice, but the overwhelming number of patterns is still a limiting factor in the health-care domain. Current efforts concerning the definition of measures of interest for patterns are focused on reducing the number of patterns and quantifying their relevance (utility/usefulness). However, although the temporal dimension plays a key role in medical records, few efforts have been made to extract temporal knowledge about the patient's evolution from multivariate sequential patterns.

METHODS: In this paper, we propose a method to extract a new type of patterns in the clinical domain called Jumping Diagnostic Odds Ratio Sequential Patterns (JDORSP). The aim of this method is to employ the odds ratio to identify a concise set of sequential patterns that represent a patient's state with a statistically significant protection factor (i.e., a pattern associated with patients that survive) and those extensions whose evolution suddenly changes the patient's clinical state, thus making the sequential patterns a statistically significant risk factor (i.e., a pattern associated with patients that do not survive), or vice versa.

RESULTS: The results of our experiments highlight that our method reduces the number of sequential patterns obtained with state-of-the-art pattern reduction methods by over 95%. Only by achieving this drastic reduction can medical experts carry out a comprehensive clinical evaluation of the patterns that might be considered medical knowledge regarding the temporal evolution of the patients. We have evaluated the surprisingness and relevance of the sequential patterns with clinicians, and the most interesting fact is the high surprisingness of the extensions of the patterns that become a protection factor, that is, the patients that recover after several days of being at high risk of dying.

CONCLUSIONS: Our proposed method with which to extract JDORSP generates a set of interpretable multivariate sequential patterns with new knowledge regarding the temporal evolution of the patients. The number of patterns is greatly reduced when compared to those generated by other methods and measures of interest. An additional advantage of this method is that it does not require any parameters or thresholds, and the reduced number of patterns allows a manual evaluation.
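The odds-ratio criterion at the heart of JDORSP can be illustrated with a small Python sketch; the continuity correction and the example counts are assumptions for illustration only:

```python
def odds_ratio(with_survived, with_died, without_survived, without_died):
    """Odds ratio of death for patients matching a sequential pattern.

    A Haldane-Anscombe 0.5 correction is added so empty cells do not
    produce division by zero (a common convention, assumed here).
    OR > 1 -> the pattern behaves as a risk factor,
    OR < 1 -> as a protection factor.
    """
    a, b = with_died + 0.5, with_survived + 0.5          # patients matching the pattern
    c, d = without_died + 0.5, without_survived + 0.5    # patients not matching it
    return (a / b) / (c / d)

# A pattern extension that "jumps" from protection to risk would show an OR well
# below 1 for the prefix and well above 1 for the extension, e.g.:
print(odds_ratio(with_survived=90, with_died=5, without_survived=60, without_died=45))   # ~0.08
print(odds_ratio(with_survived=10, with_died=40, without_survived=140, without_died=10)) # ~52
```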


Subjects
Data Mining; Humans; Odds Ratio; Data Mining/methods; Time Factors; Pattern Recognition, Automated; Delivery of Health Care; Electronic Health Records
13.
Sci Rep ; 14(1): 11744, 2024 05 23.
Article in English | MEDLINE | ID: mdl-38778042

ABSTRACT

Sensorimotor impairments, resulting from conditions like stroke and amputations, can profoundly impact an individual's functional abilities and overall quality of life. Assistive and rehabilitation devices such as prostheses, exoskeletons, and serious gaming in virtual environments can help to restore some degree of function and alleviate pain after sensorimotor impairments. Myoelectric pattern recognition (MPR) has gained popularity in the past decades as it provides superior control over said devices, and therefore efforts to facilitate and improve performance in MPR can result in better rehabilitation outcomes. One possibility to enhance MPR is to employ transcranial direct current stimulation (tDCS) to facilitate motor learning. Twelve healthy able-bodied individuals participated in this crossover study to determine the effect of tDCS on MPR performance. Baseline training was followed by two sessions of either sham or anodal tDCS using the dominant and non-dominant arms. Assignments were randomized, and the MPR task consisted of 11 different hand/wrist movements, including rest or no movement. Surface electrodes were used to record EMG, and the open-source MPR platform BioPatRec was used for decoding motor volition in real time. The motion test was used to evaluate performance. We hypothesized that using anodal tDCS to increase the excitability of the primary motor cortex associated with the non-dominant side in able-bodied individuals would improve motor learning and thus MPR performance. Overall, we found that tDCS enhanced MPR performance, particularly on the non-dominant side. We were able to reject the null hypothesis, and the improvement in the motion test's completion rate during tDCS (28% change, p-value: 0.023) indicates its potential as an adjunctive tool to enhance MPR and motor learning. tDCS therefore appears promising as a tool to enhance the learning phase of MPR-based assistive devices such as myoelectric prostheses.


Subjects
Electromyography; Transcranial Direct Current Stimulation; Humans; Transcranial Direct Current Stimulation/methods; Male; Female; Adult; Electromyography/methods; Young Adult; Cross-Over Studies; Motor Cortex/physiology; Pattern Recognition, Automated/methods
14.
Neural Netw ; 176: 106348, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38735099

ABSTRACT

Binary matrix factorization is an important tool for dimension reduction for high-dimensional datasets with binary attributes and has been successfully applied in numerous areas. This paper presents a collaborative neurodynamic optimization approach to binary matrix factorization based on the original combinatorial optimization problem formulation and quadratic unconstrained binary optimization problem reformulations. The proposed approach employs multiple discrete Hopfield networks operating concurrently in search of local optima. In addition, a particle swarm optimization rule is used to reinitialize neuronal states iteratively to escape from local minima toward better ones. Experimental results on eight benchmark datasets are elaborated to demonstrate the superior performance of the proposed approach against six baseline algorithms in terms of factorization error. Additionally, the viability of the proposed approach is demonstrated for pattern discovery on three datasets.
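The factorization objective itself is easy to state in code. The sketch below shows the Boolean reconstruction error and a greedy bit-flip local search as a rough stand-in for a single discrete network run; it omits the collaborative multi-network search and the PSO re-initialization described in the paper:

```python
import numpy as np

def bmf_error(X, U, V):
    """Reconstruction error of a binary factorization X ≈ U @ V (Boolean product)."""
    R = (U @ V > 0).astype(int)           # Boolean matrix product
    return int(np.sum(X != R))

def greedy_bit_flip(X, k, iters=200, seed=0):
    """Toy local search for binary matrix factorization (a stand-in only)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.integers(0, 2, (m, k))
    V = rng.integers(0, 2, (k, n))
    best = bmf_error(X, U, V)
    for _ in range(iters):
        # pick one entry of U or V at random and flip it
        M, i, j = (U, *rng.integers(0, U.shape)) if rng.random() < 0.5 \
                  else (V, *rng.integers(0, V.shape))
        M[i, j] ^= 1
        err = bmf_error(X, U, V)
        if err <= best:
            best = err                     # keep the flip
        else:
            M[i, j] ^= 1                   # revert
    return U, V, best
```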


Subjects
Algorithms; Neural Networks, Computer; Humans; Pattern Recognition, Automated/methods; Neurons/physiology
15.
Sensors (Basel) ; 24(9)2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38732808

ABSTRACT

Currently, surface EMG signals have a wide range of applications in human-computer interaction systems. However, selecting features for gesture recognition models based on traditional machine learning can be challenging and may not yield satisfactory results. Considering the strong nonlinear generalization ability of neural networks, this paper proposes a two-stream residual network model with an attention mechanism for gesture recognition. One branch processes surface EMG signals, while the other processes hand acceleration signals. Segmented networks are utilized to fully extract the physiological and kinematic features of the hand. To enhance the model's capacity to learn crucial information, we introduce an attention mechanism after global average pooling. This mechanism strengthens relevant features and weakens irrelevant ones. Finally, the deep features obtained from the two branches of learning are fused to further improve the accuracy of multi-gesture recognition. The experiments conducted on the NinaPro DB2 public dataset resulted in a recognition accuracy of 88.25% for 49 gestures. This demonstrates that our network model can effectively capture gesture features, enhancing accuracy and robustness across various gestures. This approach to multi-source information fusion is expected to provide more accurate and real-time commands for exoskeleton robots and myoelectric prosthetic control systems, thereby enhancing the user experience and the naturalness of robot operation.
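The "attention after global average pooling" step corresponds to a squeeze-and-excitation style channel attention block, sketched below in PyTorch; the reduction ratio and tensor shapes are assumptions:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention for 1-D feature maps
    (a sketch of attention placed after global average pooling; ratio is assumed)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, channels, time)
        w = x.mean(dim=2)                 # global average pooling -> (batch, channels)
        w = self.fc(w).unsqueeze(-1)      # per-channel weights in (0, 1)
        return x * w                      # strengthen relevant channels, weaken others

# Usage: reweight fused sEMG / accelerometer feature maps before the classifier head
feats = torch.randn(4, 64, 50)
out = ChannelAttention(64)(feats)         # same shape, channel-reweighted
```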


Subjects
Electromyography; Gestures; Neural Networks, Computer; Humans; Electromyography/methods; Signal Processing, Computer-Assisted; Pattern Recognition, Automated/methods; Acceleration; Algorithms; Hand/physiology; Machine Learning; Biomechanical Phenomena/physiology
16.
Sensors (Basel) ; 24(9)2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38732843

ABSTRACT

As the number of electronic gadgets in our daily lives increases and most of them require some kind of human interaction, innovative and convenient input methods are in demand. State-of-the-art (SotA) ultrasound-based hand gesture recognition (HGR) systems have limitations in terms of robustness and accuracy. This research presents a novel machine learning (ML)-based end-to-end solution for hand gesture recognition with low-cost micro-electromechanical systems (MEMS) ultrasonic transducers. In contrast to prior methods, our ML model processes the raw echo samples directly instead of using pre-processed data. Consequently, the processing flow presented in this work leaves it to the ML model to extract the important information from the echo data. The success of this approach is demonstrated as follows. Four MEMS ultrasonic transducers are placed in three different geometrical arrangements. For each arrangement, different types of ML models are optimized and benchmarked on datasets acquired with the presented custom hardware (HW): convolutional neural networks (CNNs), gated recurrent units (GRUs), long short-term memory (LSTM), vision transformer (ViT), and cross-attention multi-scale vision transformer (CrossViT). The three last-mentioned ML models reached more than 88% accuracy. The most important finding of this research is that little pre-processing is necessary to obtain high accuracy in ultrasonic HGR for several arrangements of cost-effective and low-power MEMS ultrasonic transducer arrays; even the computationally intensive Fourier transform can be omitted. The presented approach is further compared to HGR systems using other sensor types, such as vision, WiFi, radar, and state-of-the-art ultrasound-based HGR systems. Direct processing of the sensor signals by a compact model makes ultrasonic hand gesture recognition a true low-cost and power-efficient input method.


Subjects
Gestures; Hand; Machine Learning; Neural Networks, Computer; Humans; Hand/physiology; Pattern Recognition, Automated/methods; Ultrasonography/methods; Ultrasonography/instrumentation; Ultrasonics/instrumentation; Algorithms
17.
Sensors (Basel) ; 24(9)2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38732846

ABSTRACT

Brain-computer interfaces (BCIs) allow information to be transmitted directly from the human brain to a computer, enhancing the ability of human brain activity to interact with the environment. In particular, BCI-based control systems are highly desirable because they can control equipment used by people with disabilities, such as wheelchairs and prosthetic legs. BCIs make use of electroencephalograms (EEGs) to decode the human brain's status. This paper presents an EEG-based facial gesture recognition method based on a self-organizing map (SOM). The proposed facial gesture recognition uses α, β, and θ power bands of the EEG signals as the features of the gesture. The SOM-Hebb classifier is utilized to classify the feature vectors. We utilized the proposed method to develop an online facial gesture recognition system. The facial gestures were defined by combining facial movements that are easy to detect in EEG signals. The recognition accuracy of the system was examined through experiments. The recognition accuracy of the system ranged from 76.90% to 97.57% depending on the number of gestures recognized. The lowest accuracy (76.90%) occurred when recognizing seven gestures, though this is still quite accurate when compared to other EEG-based recognition systems. The implemented online recognition system was developed using MATLAB, and the system took 5.7 s to complete the recognition flow.
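Extracting the α, β, and θ band powers that serve as gesture features can be sketched with SciPy's Welch estimator; the band edges and sampling rate below are conventional assumptions rather than the paper's exact settings:

```python
import numpy as np
from scipy.signal import welch

def band_powers(eeg, fs=250):
    """Theta/alpha/beta band powers of one EEG channel (band edges are the usual
    conventions, assumed here; the paper's exact ranges may differ)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
    return {name: np.trapz(psd[(freqs >= lo) & (freqs < hi)],
                           freqs[(freqs >= lo) & (freqs < hi)])
            for name, (lo, hi) in bands.items()}

# Each gesture is then represented by the per-channel band-power feature vector
# that would feed a SOM-Hebb style classifier.
x = np.random.randn(5 * 250)               # 5 s of a single channel at 250 Hz
print(band_powers(x))
```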


Subjects
Brain-Computer Interfaces; Electroencephalography; Gestures; Humans; Electroencephalography/methods; Face/physiology; Algorithms; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Brain/physiology; Male
18.
Sensors (Basel) ; 24(9)2024 May 05.
Article in English | MEDLINE | ID: mdl-38733038

ABSTRACT

With the continuous advancement of autonomous driving and monitoring technologies, there is increasing attention on non-intrusive target monitoring and recognition. This paper proposes an ArcFace SE-attention model-agnostic meta-learning approach (AS-MAML) by integrating attention mechanisms into residual networks for pedestrian gait recognition using frequency-modulated continuous-wave (FMCW) millimeter-wave radar through meta-learning. We enhance the feature extraction capability of the base network using channel attention mechanisms and integrate the additive angular margin loss function (ArcFace loss) into the inner loop of MAML to constrain inner loop optimization and improve radar discrimination. Then, this network is used to classify small-sample micro-Doppler images obtained from millimeter-wave radar as the data source for pose recognition. Experimental tests were conducted on pose estimation and image classification tasks. The results demonstrate significant detection and recognition performance, with an accuracy of 94.5%, accompanied by a 95% confidence interval. Additionally, on the open-source dataset DIAT-µRadHAR, which is specially processed to increase classification difficulty, the network achieves a classification accuracy of 85.9%.
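The additive angular margin (ArcFace) loss used in the inner loop can be sketched in PyTorch as follows; the scale s and margin m values are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Additive angular margin loss (a sketch; s and m values are assumptions)."""
    def __init__(self, feat_dim, n_classes, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # cosine similarity between L2-normalised embeddings and class centres
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # add the angular margin m only to the target-class angle
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)

# In an AS-MAML-like inner loop this loss would replace plain cross-entropy on the
# micro-Doppler embeddings produced by the attention-augmented backbone.
loss = ArcFaceLoss(feat_dim=128, n_classes=6)(torch.randn(16, 128), torch.randint(0, 6, (16,)))
```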


Subjects
Pedestrians; Radar; Humans; Algorithms; Gait/physiology; Pattern Recognition, Automated/methods; Machine Learning
19.
J Neural Eng ; 21(3)2024 May 17.
Article in English | MEDLINE | ID: mdl-38757187

ABSTRACT

Objective. For brain-computer interface (BCI) research, it is crucial to design an MI-EEG recognition model that possesses high classification accuracy and strong generalization ability and does not rely on a large number of labeled training samples. Approach. In this paper, we propose a self-supervised MI-EEG recognition method based on self-supervised learning with one-dimensional multi-task convolutional neural networks and long short-term memory (1-D MTCNN-LSTM). The model is divided into two stages: a signal transform identification stage and a pattern recognition stage. In the signal transform identification stage, the signal transform dataset is recognized by the upstream 1-D MTCNN-LSTM network model. Subsequently, the backbone network from the signal transform identification stage is transferred to the pattern recognition stage. It is then fine-tuned using a trace amount of labeled data to finally obtain the motion recognition model. Main results. The upstream stage achieves more than 95% recognition accuracy for EEG signal transforms, reaching up to 100%. For MI-EEG pattern recognition, the model obtained recognition accuracies of 82.04% and 87.14%, with F1 scores of 0.7856 and 0.839, on the BCIC-IV-2b and BCIC-IV-2a datasets. Significance. The improved accuracy demonstrates the superiority of the proposed method, which is expected to serve as a method for accurate classification of MI-EEG in BCI systems.
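The upstream pretext task, i.e., recognizing which transform was applied to an unlabeled EEG window, can be sketched as follows; the four transforms listed are common choices assumed here, since the abstract does not enumerate them:

```python
import numpy as np

def make_pretext_dataset(windows, seed=0):
    """Build (transformed_signal, transform_id) pairs from unlabeled EEG windows.
    The four transforms below are common choices and only an assumption here."""
    rng = np.random.default_rng(seed)
    transforms = [
        lambda x: x,                                        # 0: identity
        lambda x: x + 0.05 * rng.standard_normal(x.shape),  # 1: additive noise
        lambda x: x[..., ::-1].copy(),                      # 2: time reversal
        lambda x: x * rng.uniform(0.7, 1.3),                # 3: amplitude scaling
    ]
    X, y = [], []
    for w in windows:
        for t_id, t in enumerate(transforms):
            X.append(t(w))
            y.append(t_id)
    return np.stack(X), np.array(y)

# The 1-D MTCNN-LSTM backbone is first trained to predict y from X, then
# fine-tuned on a small amount of labeled motor-imagery data.
windows = np.random.randn(32, 3, 500)          # 32 windows, 3 channels, 500 samples
X, y = make_pretext_dataset(windows)
```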


Subjects
Brain-Computer Interfaces; Electroencephalography; Imagination; Neural Networks, Computer; Electroencephalography/methods; Humans; Imagination/physiology; Supervised Machine Learning; Pattern Recognition, Automated/methods
20.
Sci Rep ; 14(1): 10560, 2024 05 08.
Article in English | MEDLINE | ID: mdl-38720020

ABSTRACT

Research on video analytics, especially human behavior recognition, has become increasingly popular recently. It is widely applied in virtual reality, video surveillance, and video retrieval. With the advancement of deep learning algorithms and computer hardware, the conventional two-dimensional convolution technique for training video models has been replaced by three-dimensional convolution, which enables the extraction of spatio-temporal features. In particular, the use of 3D convolution in human behavior recognition has been the subject of growing interest. However, the increased dimensionality has led to challenges such as a dramatic increase in the number of parameters, higher time complexity, and a strong dependence on GPUs for effective spatio-temporal feature extraction. Training can be considerably slow without powerful GPU hardware. To address these issues, this study proposes an Adaptive Time Compression (ATC) module. Functioning as an independent component, ATC can be seamlessly integrated into existing architectures and achieves data compression by eliminating redundant frames within the video data. The ATC module effectively reduces the GPU computing load and time complexity with negligible loss of accuracy, thereby facilitating real-time human behavior recognition.
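A minimal, non-learned stand-in for this idea is to drop frames that barely differ from the last kept frame; the sketch below illustrates the redundancy-removal principle only, not the actual ATC module, and the threshold is an assumption:

```python
import numpy as np

def compress_frames(frames, threshold=8.0):
    """Keep only frames that differ enough from the last kept frame.

    frames: (T, H, W, C) uint8 video clip.
    A simple stand-in for adaptive time compression; the real ATC module learns
    this decision, and the threshold here is an assumption.
    """
    kept = [0]
    for t in range(1, len(frames)):
        diff = np.mean(np.abs(frames[t].astype(np.int16) -
                              frames[kept[-1]].astype(np.int16)))
        if diff >= threshold:
            kept.append(t)
    return frames[kept]

clip = np.random.randint(0, 256, (64, 112, 112, 3), dtype=np.uint8)
short_clip = compress_frames(clip)         # fed to the 3-D CNN instead of all 64 frames
print(len(short_clip), "of", len(clip), "frames kept")
```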


Subjects
Algorithms; Data Compression; Video Recording; Humans; Data Compression/methods; Human Activities; Deep Learning; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods