1.
IEEE Trans Image Process ; 31: 2683-2694, 2022.
Article in English | MEDLINE | ID: mdl-35320102

ABSTRACT

Sketch recognition relies on two types of information: spatial contexts, such as the local structures in images, and temporal contexts, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial contexts and recurrent neural networks (RNNs) to model temporal contexts. However, most of them combine spatial and temporal features with late fusion or single-stage transformation, which is prone to losing the informative details in sketches. To tackle this problem, we propose a novel framework that aims at multi-stage interaction and refinement of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), a pseudo-color image that retains the temporal order of strokes, to overcome the difficulty CNNs have in leveraging temporal information. We then construct a dual-branch network in which an RNN branch processes the stroke array and a CNN branch processes the TEI. In the early stages of our network, considering the limited ability of RNNs to capture spatial structures, we use multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage, we propose a spatio-temporal enhancement module that refines stroke features and TEI features in a joint feature space. Furthermore, a bidirectional temporal-compatible unit, which adaptively merges features in opposite temporal orders, is proposed to help RNNs handle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.


Subjects
Neural Networks, Computer
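
The temporal-enriched image in the abstract above is described only as a pseudo-color image that retains stroke order. As a rough illustration of that idea, the sketch below rasterizes a stroke array and colors each segment by its position in the drawing sequence. The blue-to-red color map, the coordinate normalization to [0, 1], and the names `order_to_color` and `render_tei` are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Pixel = Tuple[int, int, int]


def order_to_color(t: float) -> Pixel:
    """Map normalized drawing time t in [0, 1] to a blue-to-red RGB color (assumed map)."""
    t = min(max(t, 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1.0 - t)))


def render_tei(strokes: List[Stroke], size: int = 32) -> List[List[Pixel]]:
    """Rasterize strokes (coordinates assumed in [0, 1]) into a size x size color image."""
    white = (255, 255, 255)
    image = [[white for _ in range(size)] for _ in range(size)]
    total_segments = sum(max(len(s) - 1, 0) for s in strokes)
    drawn = 0
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # Earlier segments are bluer, later segments are redder.
            color = order_to_color(drawn / max(total_segments - 1, 1))
            steps = 2 * size  # sample densely enough to leave no gaps
            for k in range(steps + 1):
                a = k / steps
                col = int((x0 + a * (x1 - x0)) * (size - 1) + 0.5)
                row = int((y0 + a * (y1 - y0)) * (size - 1) + 0.5)
                col = min(max(col, 0), size - 1)
                row = min(max(row, 0), size - 1)
                image[row][col] = color
            drawn += 1
    return image


if __name__ == "__main__":
    # Two horizontal strokes: the first drawn at the top, the second at the bottom.
    demo = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
    tei = render_tei(demo, size=8)
    print(tei[0][0], tei[7][7])
```

An image built this way can be fed to an ordinary CNN while still carrying a coarse record of when each pixel was drawn, which is the motivation the abstract gives for the TEI.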
2.
Int J Bioprint ; 8(1): 406, 2022.
Article in English | MEDLINE | ID: mdl-35187272

ABSTRACT

Current research on designing prosthetic robotic hands mainly focuses on improving their functionality by devising new mechanical structures and actuation systems. Most existing work relies on a single structure or system (e.g., bone-only or tissue-only) and ignores the fact that the human hand is composed of multiple functional structures (e.g., skin, bones, muscles, and tendons). This may increase the difficulty of the design process and lower the flexibility of the fabricated hand. To tackle this problem, this paper proposes a three-dimensional (3D) printable multi-layer design that models the hand with layers of skin, tissue, and bone. The proposed design first obtains the 3D surface model of a target hand via 3D scanning, and then generates the 3D bone models from the surface model using a fast template matching method. To overcome the poor deformability of the rigid bone layer, a tissue layer is introduced and represented by a concentric tube-based structure whose deformability can be explicitly controlled by a parameter. The experimental results show that the proposed design outperforms previous designs remarkably. With the proposed design, prosthetic robotic hands can be produced quickly and at low cost, and can be both customizable and deformable.

3.
IEEE Trans Image Process ; 30: 7926-7937, 2021.
Article in English | MEDLINE | ID: mdl-34534079

ABSTRACT

Recent methods including CoViAR and DMC-Net provide a new paradigm for action recognition because they directly target compressed videos (e.g., MPEG4 files). This paradigm avoids the cumbersome decoding procedure of traditional methods and leverages the pre-encoded motion vectors and residuals in compressed videos to complete recognition efficiently. However, motion vectors and residuals are noisy, sparse, and highly correlated, and therefore cannot be effectively exploited by plain, separate networks. To tackle these issues, we propose a joint feature optimization and fusion framework that better utilizes motion vectors and residuals in the following three aspects. (i) We model the feature optimization problem as a reconstruction process that represents features by a set of bases, and propose a joint feature optimization module that extracts bases in both modalities. (ii) A low-rank non-local attention module, which combines the non-local operation with a low-rank constraint, is proposed to tackle the noise and sparsity problem during the feature reconstruction process. (iii) A lightweight feature fusion module and a self-adaptive knowledge distillation method are introduced, which use motion vectors and residuals to generate predictions similar to those from networks with optical flows. With these components embedded in a baseline network, the proposed network not only achieves state-of-the-art performance on HMDB-51 and UCF-101, but also maintains its advantage in computational complexity.
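
Point (iii) of the abstract above mentions knowledge distillation from networks that use optical flow. The abstract does not specify the self-adaptive variant, so the sketch below shows only the generic temperature-softened distillation loss as an illustration of the underlying idea. The function names, the temperature value, and the framing of the optical-flow network as the "teacher" are assumptions for this example.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)  # e.g. an optical-flow network (assumed teacher)
    q = softmax(student_logits, temperature)  # e.g. a compressed-domain network (assumed student)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print(distillation_loss(student, teacher))
```

Minimizing a loss of this kind pushes the student's class probabilities toward the teacher's, so the teacher is needed only during training and not at inference time.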

4.
Vis Comput ; 37(12): 3019-3038, 2021.
Article in English | MEDLINE | ID: mdl-34345091

ABSTRACT

Social Assistive Robotics is increasingly being used in care settings to provide psychosocial support and interventions for elderly people with cognitive impairments. Most of these social robots have provided timely stimuli to the elderly at home and in care centres, including keeping them active and boosting their mood. However, previous investigations have identified shortcomings in these robots, particularly in their ability to satisfy an essential human need: the need for companionship. Reports show that the elderly tend to lose interest in these social robots after the initial excitement, as the novelty wears off and the interaction becomes monotonous. This paper presents our research facilitating conversations between a social humanoid robot, Nadine, and cognitively impaired elderly residents of a nursing home. We analysed the effectiveness of human-humanoid interactions between our robot and 14 elderly participants over 29 sessions. We used both objective tools (based on computer vision methods) and subjective tools (based on observational scales) to evaluate the recorded videos. Our findings showed that our subjects engaged positively with Nadine, suggesting that their interaction with the robot could improve their well-being by compensating for some of their emotional, cognitive, and psychosocial deficiencies. We detected emotions associated with cognitively impaired elderly people during these interactions. This study could help in understanding the expectations of the elderly and the current limitations of Social Assistive Robots. Our research is aligned with all the ethical recommendations of the NTU Institutional Review Board.

5.
IEEE Trans Pattern Anal Mach Intell ; 43(11): 3739-3753, 2021 Nov.
Article in English | MEDLINE | ID: mdl-32396073

ABSTRACT

Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to the substantial depth ambiguity and the difficulty of obtaining fully-annotated training data. Unlike existing learning-based monocular RGB-input approaches that require accurate 3D annotations for training, we propose to leverage the depth images that can be easily obtained from commodity RGB-D cameras during training, while during testing we take only RGB inputs for 3D joint predictions. In this way, we alleviate the burden of costly 3D annotations in real-world datasets. In particular, we propose a weakly-supervised method that adapts from a fully-annotated synthetic dataset to a weakly-labeled real-world single-RGB dataset with the aid of a depth regularizer, which serves as weak supervision for 3D pose prediction. To further exploit the physical structure of 3D hand pose, we present a novel CVAE-based statistical framework to embed the pose-specific subspace from RGB images, which can then be used to infer the 3D hand joint locations. Extensive experiments on benchmark datasets show that our proposed approach outperforms baselines and state-of-the-art methods, which supports the effectiveness of the proposed depth regularizer and the CVAE-based framework.

6.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 225-228, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31945883

ABSTRACT

Schizophrenia and depression are the two most common mental disorders associated with negative symptoms that contribute to poor functioning and quality of life for millions of patients globally. This study is part of a larger research project whose overall aim is to develop an automated, objective pipeline that aids clinical diagnosis and provides more insight into the symptoms of mental illnesses. In our previous work, we analyzed non-verbal and linguistic cues of individuals with schizophrenia. In this study, we extend our work to include participants with depression. Using natural language processing techniques, we extracted verbal features, both dictionary-based and vector-based, from automatically transcribed participant interviews. We also extracted conversational, phonatory, articulatory, and prosodic features from the interviews to understand the conversational and acoustic characteristics of schizophrenia and depression. Combining these features, we applied ensemble learning with leave-one-out cross-validation to classify healthy controls, patients with schizophrenia, and patients with depression, achieving an accuracy of 69%-75% in paired classification. From the same features, we also predicted the subjective Negative Symptoms Assessment 16 scores of patients with schizophrenia or depression, yielding an accuracy of 90.5% for NSA2 but lower accuracy for other NSA indices. Our analysis also revealed significant linguistic and non-verbal differences that are potentially symptomatic of schizophrenia and depression, respectively.


Subjects
Schizophrenia, Schizophrenic Psychology, Speech, Depression, Humans, Quality of Life
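
The abstract above reports ensemble learning evaluated with leave-one-out cross-validation but does not name the base learners. As a minimal sketch of that evaluation scheme, the example below combines three simple classifiers by majority vote and scores the ensemble by holding out one sample at a time. The choice of nearest-centroid and k-nearest-neighbour learners, and all function names, are assumptions for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]
Classifier = Callable[[List[Sample], Sequence[float]], str]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(train: List[Sample], x: Sequence[float]) -> str:
    """Predict the label whose class mean is closest to x."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for feats, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def knn(k: int) -> Classifier:
    """Build a k-nearest-neighbour classifier with majority voting."""
    def predict(train: List[Sample], x: Sequence[float]) -> str:
        nearest = sorted(train, key=lambda s: sq_dist(s[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def ensemble_predict(models: List[Classifier], train: List[Sample], x: Sequence[float]) -> str:
    """Majority vote over the predictions of all base classifiers."""
    votes = Counter(model(train, x) for model in models)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(models: List[Classifier], data: List[Sample]) -> float:
    """Hold out each sample once, train on the rest, and report the hit rate."""
    correct = 0
    for i, (feats, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if ensemble_predict(models, train, feats) == label:
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    # Synthetic two-class toy data, not related to the study's features.
    toy = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
           ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"), ((11, 11), "b")]
    print(leave_one_out_accuracy([nearest_centroid, knn(1), knn(3)], toy))
```

Leave-one-out evaluation suits small clinical cohorts because every participant serves as a test case exactly once while the remaining participants are used for training.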
...