1.
Neural Netw ; 178: 106406, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38838393

ABSTRACT

Low-light conditions pose significant challenges to vision tasks such as salient object detection (SOD) because of insufficient photons. Light-insensitive RGB-T SOD models mitigate these problems to some extent, but their performance is limited because they focus only on spatial feature fusion and ignore the frequency discrepancy. To address this, we propose an RGB-T SOD model for low-light scenes that mines spatial-frequency cues, called SFMNet. SFMNet consists of spatial-frequency feature exploration (SFFE) modules and spatial-frequency feature interaction (SFFI) modules. Specifically, the SFFE module separates spatial-frequency features and adaptively extracts high- and low-frequency features. The SFFI module then integrates cross-modality and cross-domain information to capture effective feature representations. By deploying both modules in a top-down pathway, our method generates high-quality saliency predictions. Furthermore, we construct the first low-light RGB-T SOD dataset as a benchmark for evaluating performance. Extensive experiments demonstrate that SFMNet achieves higher accuracy than existing models in low-light scenes.
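The abstract does not say how SFFE separates high- and low-frequency content. As a hedged illustration of the general idea only, the NumPy sketch below splits a single 2-D feature map with a fixed, ideal circular low-pass mask in the centred Fourier spectrum. The function name `split_frequency` and the `radius` cutoff are hypothetical; SFMNet is described as learning this separation adaptively, which a fixed mask does not do.

```python
import numpy as np

def split_frequency(feature_map: np.ndarray, radius: float) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D map into low- and high-frequency parts (hypothetical helper).

    Uses an ideal circular low-pass mask of the given radius (in frequency
    bins) around the zero-frequency centre of the shifted FFT spectrum.
    """
    h, w = feature_map.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    # The mask is symmetric about the centre, so the inverse FFT is real up to rounding.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = feature_map - low  # the residual holds everything outside the low band
    return low, high

x = np.random.default_rng(0).standard_normal((16, 16))
low, high = split_frequency(x, radius=3)
```

A constant map ends up entirely in `low`, while an alternating checkerboard (the highest spatial frequency) ends up entirely in `high`; a learned module would replace the fixed mask with trainable filtering.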

2.
Sci Rep ; 13(1): 17652, 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37848501

ABSTRACT

Fully convolutional neural networks have shown advantages in salient object detection using RGB or RGB-D images. However, they face an object-part dilemma: most fully convolutional neural networks inevitably produce an incomplete segmentation of the salient object. Although the capsule network can recognize a complete object, it is computationally demanding and time-consuming. In this paper, we propose a novel convolutional capsule network based on feature extraction and integration that handles the object-part relationship with lower computational demand. First, RGB features are extracted and integrated using the VGG backbone and a feature extraction module. These features are then integrated with depth images by a feature depth module and progressively upsampled to produce a feature map. Next, the feature map is fed into the feature-integrated convolutional capsule network to explore the object-part relationship. The proposed capsule network extracts object-part information using convolutional capsules with locally-connected routing and predicts the final saliency map with deconvolutional capsules. Experimental results on four RGB-D benchmark datasets show that the proposed method outperforms 23 state-of-the-art algorithms.
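The abstract names locally-connected routing but does not define it. For orientation only, the sketch below implements the original routing-by-agreement procedure with the squash nonlinearity from Sabour et al. (2017), "Dynamic Routing Between Capsules", over fully connected capsule layers. It is not the paper's implementation; one plausible reading is that locally-connected routing applies the same update only within a local spatial window. The helper names are hypothetical.

```python
import numpy as np

def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Shrink vector length into [0, 1) while keeping its direction."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat: np.ndarray, iterations: int = 3):
    """Routing by agreement (Sabour et al., 2017), fully connected version.

    u_hat: predictions of each input capsule i for each output capsule j,
           shape (num_in, num_out, dim_out).
    Returns output capsules v (num_out, dim_out) and coupling coefficients c.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(iterations):
        # Coupling coefficients: softmax of the logits over output capsules.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum per output capsule
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # raise logits where predictions agree
    return v, c
```

An output capsule whose input predictions agree receives a long output vector (close to length 1), while one whose predictions cancel out stays near zero; the length of a capsule is what encodes "this part belongs to a whole".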

3.
Front Psychiatry ; 11: 458, 2020.
Article in English | MEDLINE | ID: mdl-32528328

ABSTRACT

OBJECTIVE: Although previous studies have shown that screen time (ST) and consumption of fast foods (FFs) and sugar-sweetened beverages (SSBs) are associated with depressive symptoms in adolescents, research on these associations in Chinese adolescents is scarce. This study aimed to examine the associations between ST, FFs, SSBs and depressive symptoms in Chinese adolescents, and to explore the mediating effects of FFs and SSBs on the association between ST and depressive symptoms. METHODS: This school-based nationwide survey was carried out among 14,500 students in four provinces of China. The Children's Depression Inventory was used to assess the participants' depressive symptoms. ST and consumption of FFs and SSBs were measured with a self-reported questionnaire. A Bayesian multiple mediation model was used to analyze the mediation effect. RESULTS: After adjustment for sociodemographic variables, ST, FFs and SSBs were associated with higher odds of depressive symptoms, with ORs (95% CI) of 1.075 (1.036-1.116), 1.062 (1.046-1.078) and 1.140 (1.115-1.166), respectively. In the Bayesian multiple mediation model, the direct effect, mediating effect, total effect, and ratio of the mediating effect to the total effect were 0.125, 0.034, 0.159 and 0.214, respectively. All path coefficients of the three mediation paths were statistically significant (p < 0.05). CONCLUSIONS: Our study demonstrates that ST and consumption of FFs and SSBs are associated with depressive symptoms in Chinese adolescents. FFs and SSBs likely partially mediate the association between ST and depressive symptoms through a chain-mediating effect.
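The reported mediation figures can be cross-checked with the standard decomposition used in mediation analysis: total effect = direct effect + mediating (indirect) effect, and the proportion mediated = mediating effect / total effect. The check assumes that the abstract's "mediating effect" is the summed indirect effect through FFs and SSBs.

```python
# Effects reported in the abstract (Bayesian multiple mediation model).
direct_effect = 0.125
mediating_effect = 0.034  # indirect effect via FFs and SSBs (assumed summed)

total_effect = direct_effect + mediating_effect
proportion_mediated = mediating_effect / total_effect

print(f"total effect = {total_effect:.3f}")                # total effect = 0.159
print(f"proportion mediated = {proportion_mediated:.3f}")  # proportion mediated = 0.214
```

Both values match the abstract, so roughly a fifth of the ST association with depressive symptoms runs through the FF and SSB paths in this model.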

4.
IEEE Trans Cybern ; 49(10): 3755-3766, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30010606

ABSTRACT

Zero-shot learning (ZSL) is typically achieved by using a class semantic embedding space to transfer knowledge from seen classes to unseen ones. Capturing the common semantic characteristics between the visual modality and the class semantic modality (e.g., attributes or word vectors) is key to the success of ZSL. In this paper, we propose a novel encoder-decoder approach, latent space encoding (LSE), to connect the semantic relations of different modalities. Instead of requiring a projection function to transfer information across modalities as most previous work does, LSE lets the modalities interact through a feature-aware latent space that is learned implicitly. Specifically, the modalities are modeled separately but optimized jointly. For each modality, an encoder-decoder framework learns a feature-aware latent space by jointly maximizing the recoverability of the original space from the latent space and the predictability of the latent space from the original space. To relate the modalities, features that refer to the same concept are constrained to share the same latent codes. In this way, the common semantic characteristics of different modalities are captured by the latent representations. Another property of the proposed approach is that it extends easily to more modalities. Extensive experimental results on four benchmark datasets (Animals with Attributes, Caltech-UCSD Birds, aPY, and ImageNet) clearly demonstrate the superiority of the proposed approach on several ZSL tasks, including traditional ZSL, generalized ZSL, and zero-shot retrieval.
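The abstract leaves the encoder and decoder unspecified. As a hedged illustration only, the NumPy sketch below makes both linear and learns one set of latent codes shared by all modalities, balancing recoverability (decoder reconstructs each modality from the codes) against predictability (encoder predicts the codes from each modality). The squared-error objective, the ridge weight `lam`, the alternating least-squares solver, and the function names are assumptions, not the paper's formulation.

```python
import numpy as np

def fit_shared_latent(modalities, k, lam=1e-2, iters=20, seed=0):
    """Alternating least squares for a linear encoder-decoder per modality.

    modalities: list of arrays X_m with shape (d_m, n); column j of every X_m
    describes the same concept, so all modalities share latent codes Z (k, n).
    Assumed objective: sum_m ||X_m - Q_m Z||^2 + ||Z - P_m X_m||^2
                       + lam * (||P_m||^2 + ||Q_m||^2)
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((k, modalities[0].shape[1]))
    eye_k = np.eye(k)
    losses = []
    for _ in range(iters):
        # Encoder P_m (k, d_m): ridge regression predicting Z from X_m.
        enc = [np.linalg.solve(x @ x.T + lam * np.eye(x.shape[0]), x @ z.T).T
               for x in modalities]
        # Decoder Q_m (d_m, k): ridge regression recovering X_m from Z.
        dec = [np.linalg.solve(z @ z.T + lam * eye_k, z @ x.T).T for x in modalities]
        # Shared codes: (sum_m Q_m^T Q_m + M I) Z = sum_m (Q_m^T X_m + P_m X_m).
        lhs = sum(q.T @ q for q in dec) + len(modalities) * eye_k
        rhs = sum(q.T @ x + p @ x for p, q, x in zip(enc, dec, modalities))
        z = np.linalg.solve(lhs, rhs)
        losses.append(sum(np.sum((x - q @ z) ** 2) + np.sum((z - p @ x) ** 2)
                          + lam * (np.sum(p ** 2) + np.sum(q ** 2))
                          for p, q, x in zip(enc, dec, modalities)))
    return enc, dec, z, losses

def zero_shot_predict(p_visual, p_semantic, x_test, class_semantics):
    """Encode test images and unseen-class semantic vectors, then pick the nearest code."""
    z_img = p_visual @ x_test             # (k, n_test)
    z_cls = p_semantic @ class_semantics  # (k, n_classes)
    d2 = ((z_img[:, :, None] - z_cls[:, None, :]) ** 2).sum(axis=0)
    return d2.argmin(axis=1)              # (n_test,)

# Random toy data: 50 paired training columns, 20-D visual and 8-D semantic features.
rng = np.random.default_rng(1)
visual, semantic = rng.standard_normal((20, 50)), rng.standard_normal((8, 50))
enc, dec, z, losses = fit_shared_latent([visual, semantic], k=5)
pred = zero_shot_predict(enc[0], enc[1], rng.standard_normal((20, 4)), rng.standard_normal((8, 3)))
```

Each update solves its block exactly, so the loss never increases. Adding a third modality only means appending another matrix to `modalities`, which mirrors the abstract's remark that the approach extends easily to more modalities.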

5.
Article in English | MEDLINE | ID: mdl-30571635

ABSTRACT

Rapid development of affordable, portable consumer depth cameras facilitates the use of depth information in many computer vision tasks, such as intelligent vehicles and 3D reconstruction. However, depth maps captured by low-cost depth sensors (e.g., Kinect) usually suffer from low spatial resolution, which limits their potential applications. In this paper, we propose a novel deep network for depth map super-resolution (SR), called DepthSR-Net. DepthSR-Net automatically infers a high-resolution (HR) depth map from its low-resolution (LR) version by residual learning driven by hierarchical features. Specifically, DepthSR-Net is built on a residual U-Net architecture. Given an LR depth map, we first upsample it to the desired HR size by bicubic interpolation and then construct an input pyramid to obtain receptive fields at multiple levels. Next, we extract hierarchical features from the input pyramid, the intensity image, and the encoder-decoder structure of the U-Net. Finally, we use these rich hierarchical features to learn the residual between the interpolated depth map and the corresponding HR depth map. The final HR depth map is obtained by adding the learned residual to the interpolated depth map. An ablation study demonstrates the effectiveness of each component of the proposed network, and extensive experiments show that the proposed method outperforms state-of-the-art methods. Additionally, we discuss potential uses of the proposed network in other low-level vision problems.
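The residual formulation described above can be sketched in a few lines of NumPy. To stay dependency-free, the sketch replaces bicubic interpolation with nearest-neighbour pixel replication and builds the input pyramid by 2x2 average pooling; both choices, and all function names, are assumptions for illustration. The network itself is omitted: it would be trained to output `residual_target(...)`.

```python
import numpy as np

def upsample(lr: np.ndarray, scale: int) -> np.ndarray:
    # Stand-in for bicubic interpolation: nearest-neighbour pixel replication.
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)

def input_pyramid(depth: np.ndarray, levels: int = 3) -> list[np.ndarray]:
    """Successively halve the map with 2x2 average pooling (assumed scheme)."""
    pyramid = [depth]
    for _ in range(levels - 1):
        d = pyramid[-1]
        h, w = d.shape[0] // 2 * 2, d.shape[1] // 2 * 2
        pyramid.append(d[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid

def residual_target(lr: np.ndarray, hr: np.ndarray, scale: int) -> np.ndarray:
    """Training target for the network: HR depth minus interpolated LR depth."""
    return hr - upsample(lr, scale)

def reconstruct(lr: np.ndarray, predicted_residual: np.ndarray, scale: int) -> np.ndarray:
    """Final HR estimate: interpolated LR depth plus the learned residual."""
    return upsample(lr, scale) + predicted_residual

rng = np.random.default_rng(0)
hr = rng.uniform(0.5, 4.0, size=(16, 16))      # toy depth map in metres
lr = hr.reshape(4, 4, 4, 4).mean(axis=(1, 3))  # 4x downsampled version
interp = upsample(lr, 4)
```

Learning only the residual means the network does not have to reproduce the smooth, already-correct part of the depth map, which is the usual motivation for this design.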

6.
IEEE Trans Cybern ; 2018 Jan 30.
Article in English | MEDLINE | ID: mdl-29994569

ABSTRACT

As an important and challenging problem in computer vision, zero-shot learning (ZSL) aims to automatically recognize instances of unseen object classes without training data for those classes. ZSL is usually approached from two aspects: 1) capturing the domain distribution connections between seen-class data and unseen-class data and 2) modeling the semantic interactions between the image feature space and the label embedding space. Motivated by these observations, we propose a bidirectional mapping-based semantic relationship modeling scheme that seeks cross-modal knowledge transfer by simultaneously projecting image features and label embeddings into a common latent space. That is, the connection runs both from the image feature space to the latent space and from the label embedding space to the latent space. To deal with the domain shift problem, we further present a transductive learning approach that formulates class prediction as an iterative refining process, in which the object classification capacity is progressively reinforced through bootstrapping-based model updating over highly reliable instances. Experimental results on four benchmark datasets (Animals with Attributes, Caltech-UCSD Birds-200-2011, aPascal-aYahoo, and SUN) demonstrate the effectiveness of the proposed approach against state-of-the-art approaches.
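The prediction step of a common-latent-space scheme can be sketched as follows: both image features and label embeddings are projected into the same k-dimensional space and each image is assigned to the most similar class. The projection matrices are assumed to be already learned (the abstract does not give the training objective), and cosine similarity is an assumed scoring choice; function names are hypothetical.

```python
import numpy as np

def l2_normalize(a: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)

def latent_scores(img_feats, label_embs, w_img, w_lab):
    """Project both modalities into a shared latent space and score each
    (image, class) pair by cosine similarity.

    img_feats: (n, d_img)   w_img: (d_img, k)
    label_embs: (c, d_lab)  w_lab: (d_lab, k)
    """
    z_img = l2_normalize(img_feats @ w_img)   # image feature space -> latent
    z_lab = l2_normalize(label_embs @ w_lab)  # label embedding space -> latent
    return z_img @ z_lab.T                    # (n, c)

def predict(img_feats, label_embs, w_img, w_lab):
    return latent_scores(img_feats, label_embs, w_img, w_lab).argmax(axis=1)

# Toy example with identity projections standing in for learned ones.
feats = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
labels_emb = np.array([[1.0, 0.6], [0.6, 1.0]])
print(predict(feats, labels_emb, np.eye(2), np.eye(2)))
```

Because the unseen classes enter only through their label embeddings, new classes can be scored without retraining the projections; the transductive refinement described in the abstract would then adjust the model using the most confidently scored test instances.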

7.
IEEE Trans Neural Netw Learn Syst ; 29(9): 4116-4127, 2018 09.
Article in English | MEDLINE | ID: mdl-29035229

ABSTRACT

Zero-shot learning (ZSL) endows computer vision systems with the inferential capability to recognize new categories that have never been seen before. Its two fundamental challenges are visual-semantic embedding in the cross-modality learning step and domain adaptation in the unseen class prediction step. This paper presents two corresponding methods, Adaptive STructural Embedding (ASTE) and Self-PAced Selective Strategy (SPASS), to address these challenges. Specifically, ASTE formulates the visual-semantic interactions in a latent structural support vector machine framework, adaptively adjusting the slack variables to reflect the differing reliability of training instances. To alleviate the domain shift problem in ZSL, SPASS borrows the idea of self-paced learning: it iteratively selects unseen instances from reliable to less reliable, gradually adapting knowledge from the seen domain to the unseen domain. Combining SPASS and ASTE, we present a self-paced Transductive ASTE (TASTE) method that progressively reinforces the classification capacity. Extensive experiments on three benchmark data sets (i.e., AwA, CUB, and aPY) demonstrate the superiority of ASTE and TASTE. We also propose a fast training (FT) strategy to improve the efficiency of most existing ZSL methods. The FT strategy is simple and general, and it speeds up the training of most existing ZSL methods by 4 to 300 times while maintaining their performance.
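The reliable-to-less-reliable selection idea can be illustrated with a toy self-paced loop. This is not ASTE or TASTE: it assumes a nearest-prototype classifier, uses the margin between the two closest prototypes as the reliability measure, and admits a linearly growing fraction of unseen instances each round. All of these choices and the function name are assumptions.

```python
import numpy as np

def self_paced_transduction(x_unseen: np.ndarray, prototypes: np.ndarray, rounds: int = 4):
    """Refine unseen-class prototypes with pseudo-labelled test instances,
    admitting them from most to least reliable (illustrative sketch).

    x_unseen: (n, d) unlabelled instances; prototypes: (c, d) with c >= 2,
    e.g. class semantics mapped into the visual space (assumed).
    """
    n = x_unseen.shape[0]
    prototypes = prototypes.astype(float).copy()
    for r in range(1, rounds + 1):
        d2 = ((x_unseen[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)  # (n, c)
        labels = d2.argmin(axis=1)
        nearest_two = np.sort(d2, axis=1)[:, :2]
        reliability = nearest_two[:, 1] - nearest_two[:, 0]  # margin: best vs runner-up
        n_keep = int(np.ceil(r / rounds * n))                 # linear self-paced schedule
        keep = np.argsort(-reliability)[:n_keep]              # most reliable first
        for c in range(prototypes.shape[0]):
            members = keep[labels[keep] == c]
            if members.size:
                prototypes[c] = x_unseen[members].mean(axis=0)
    d2 = ((x_unseen[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return prototypes, d2.argmin(axis=1)
```

Starting from biased prototypes, the first rounds use only the instances the current model is surest about, so early pseudo-label errors are less likely to propagate; by the last round every unseen instance contributes.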

8.
Forensic Sci Int ; 233(1-3): 158-66, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24314516

ABSTRACT

As powerful image editing tools become widely used, the demand for verifying the authenticity of images has greatly increased. Copy-move forgery is one of the most frequently used tampering techniques. Most existing techniques for exposing this forgery lack robustness to common post-processing operations and fail to locate the tampered region precisely, especially when the image contains large similar or flat regions. In this paper, a robust method based on the DCT and SVD is proposed to detect this specific artifact. First, the suspicious image is divided into fixed-size overlapping blocks and a 2D-DCT is applied to each block; the DCT coefficients are then quantized by a quantization matrix to obtain a more robust representation of each block. Second, each quantized block is divided into non-overlapping sub-blocks and SVD is applied to each sub-block; the largest singular value of each sub-block is extracted as a feature, reducing the dimension of each block. Finally, the feature vectors are sorted lexicographically, and duplicated image blocks are matched using a predefined shift-frequency threshold. Experimental results demonstrate that the proposed method can effectively detect multiple copy-move forgeries and precisely locate the duplicated regions, even when the image has been distorted by Gaussian blurring, AWGN, JPEG compression, or combinations of these operations.
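The pipeline in this abstract maps onto a compact NumPy sketch. Several details are assumptions rather than the paper's settings: 8x8 blocks, a single uniform quantization step `q` instead of a quantization matrix, four 4x4 sub-blocks per block, exact feature equality for matching, and illustrative values for `min_shift` and `shift_threshold`. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (row k = frequency k)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_features(img: np.ndarray, block: int = 8, q: float = 16.0):
    """Largest singular value of each quantized-DCT sub-block, per overlapping block."""
    c = dct_matrix(block)
    half = block // 2
    feats, positions = [], []
    for y in range(img.shape[0] - block + 1):
        for x in range(img.shape[1] - block + 1):
            quant = np.round(c @ img[y:y + block, x:x + block] @ c.T / q)  # 2-D DCT, quantize
            subs = (quant[:half, :half], quant[:half, half:], quant[half:, :half], quant[half:, half:])
            feats.append([np.linalg.svd(s, compute_uv=False)[0] for s in subs])
            positions.append((y, x))
    return np.array(feats), np.array(positions)

def detect_copy_move(img, block=8, q=16.0, min_shift=8.0, shift_threshold=10):
    feats, pos = block_features(np.asarray(img, dtype=float), block, q)
    order = np.lexsort(feats.T[::-1])  # lexicographic row sort (column 0 is the primary key)
    feats, pos = feats[order], pos[order]
    candidates, counts = [], {}
    for a in range(len(feats) - 1):
        if not np.array_equal(feats[a], feats[a + 1]):
            continue
        dy, dx = pos[a + 1] - pos[a]
        if dy < 0 or (dy == 0 and dx < 0):  # treat shift and -shift as the same vector
            dy, dx = -dy, -dx
        if np.hypot(dy, dx) < min_shift:    # ignore near-neighbour blocks
            continue
        counts[(dy, dx)] = counts.get((dy, dx), 0) + 1
        candidates.append((pos[a], pos[a + 1], (dy, dx)))
    mask = np.zeros(np.shape(img), dtype=bool)
    for pa, pb, shift in candidates:
        if counts[shift] >= shift_threshold:  # keep only frequent shift vectors
            for y, x in (pa, pb):
                mask[y:y + block, x:x + block] = True
    return mask
```

Lexicographic sorting places identical feature vectors next to each other, so only adjacent rows need comparing; the shift-frequency threshold then discards isolated coincidental matches, because a genuinely copied region produces many block pairs sharing one shift vector.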
