Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 7 de 7
Filter
Add more filters










Database
Language
Publication year range
1.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 2722-2740, 2024 May.
Article in English | MEDLINE | ID: mdl-37988208

ABSTRACT

Neural Architecture Search (NAS), aiming at automatically designing neural architectures by machines, has been considered a key step toward automatic machine learning. One notable NAS branch is the weight-sharing NAS, which significantly improves search efficiency and allows NAS algorithms to run on ordinary computers. Despite receiving high expectations, this category of methods suffers from low search effectiveness. By employing a generalization boundedness tool, we demonstrate that the devil behind this drawback is the untrustworthy architecture rating with the oversized search space of the possible architectures. Addressing this problem, we modularize a large search space into blocks with small search spaces and develop a family of models with the distilling neural architecture (DNA) techniques. These proposed models, namely a DNA family, are capable of resolving multiple dilemmas of the weight-sharing NAS, such as scalability, efficiency, and multi-modal compatibility. Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub- search space using heuristic algorithms. Moreover, under a certain computational complexity constraint, our method can seek architectures with different depths and widths. Extensive experimental evaluations show that our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively. Additionally, we provide in-depth empirical analysis and insights into neural architecture ratings.


Subject(s)
Algorithms , Machine Learning , Plant Extracts , DNA
2.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4430-4446, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35895643

ABSTRACT

Dynamic networks have shown their promising capability in reducing theoretical computation complexity by adapting their architectures to the input during inference. However, their practical runtime usually lags behind the theoretical acceleration due to inefficient sparsity. In this paper, we explore a hardware-efficient dynamic inference regime, named dynamic weight slicing, that can generalized well on multiple dimensions in both CNNs and transformers (e.g. kernel size, embedding dimension, number of heads, etc.). Instead of adaptively selecting important weight elements in a sparse way, we pre-define dense weight slices with different importance level by nested residual learning. During inference, weights are progressively sliced beginning with the most important elements to less important ones to achieve different model capacity for inputs with diverse difficulty levels. Based on this conception, we present DS-CNN++ and DS-ViT++, by carefully designing the double headed dynamic gate and the overall network architecture. We further propose dynamic idle slicing to address the drastic reduction of embedding dimension in DS-ViT++. To ensure sub-network generality and routing fairness, we propose a disentangled two-stage optimization scheme. In Stage I, in-place bootstrapping (IB) and multi-view consistency (MvCo) are proposed to stablize and improve the training of DS-CNN++ and DS-ViT++ supernet, respectively. In Stage II, sandwich gate sparsification (SGS) is proposed to assist the gate training. Extensive experiments on 4 datasets and 3 different network architectures demonstrate our methods consistently outperform the state-of-the-art static and dynamic model compression methods by a large margin (up to 6.6%). Typically, we achieves 2-4× computation reduction and up to 61.5% real-world acceleration on MobileNet, ResNet-50 and Vision Transformer, with minimal accuracy drops on ImageNet. Code release: https://github.com/changlin31/DS-Net.

3.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7686-7695, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36409817

ABSTRACT

Controlling a non-statically bipedal robot is challenging due to the complex dynamics and multi-criterion optimization involved. Recent works have demonstrated the effectiveness of deep reinforcement learning (DRL) for simulation and physical robots. In these methods, the rewards from different criteria are normally summed to learn a scalar function. However, a scalar is less informative and may be insufficient to derive effective information for each reward channel from the complex hybrid rewards. In this work, we propose a novel reward-adaptive reinforcement learning method for biped locomotion, allowing the control policy to be simultaneously optimized by multiple criteria using a dynamic mechanism. The proposed method applies a multi-head critic to learn a separate value function for each reward component, leading to hybrid policy gradients. We further propose dynamic weight, allowing each component to optimize the policy with different priorities. This hybrid and dynamic policy gradient (HDPG) design makes the agent learn more efficiently. We show that the proposed method outperforms summed-up-reward approaches and is able to transfer to physical robots. The MuJoCo results further demonstrate the effectiveness and generalization of HDPG.

4.
IEEE Trans Neural Netw Learn Syst ; 33(10): 5401-5415, 2022 10.
Article in English | MEDLINE | ID: mdl-33872158

ABSTRACT

Current state-of-the-art visual recognition systems usually rely on the following pipeline: 1) pretraining a neural network on a large-scale data set (e.g., ImageNet) and 2) finetuning the network weights on a smaller, task-specific data set. Such a pipeline assumes that the sole weight adaptation is able to transfer the network capability from one domain to another domain based on a strong assumption that a fixed architecture is appropriate for all domains. However, each domain with a distinct recognition target may need different levels/paths of feature hierarchy, where some neurons may become redundant, and some others are reactivated to form new network structures. In this work, we prove that dynamically adapting network architectures tailored for each domain task along with weight finetuning benefits in both efficiency and effectiveness, compared to the existing image recognition pipeline that only tunes the weights regardless of the architecture. Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks. This further improves the search efficiency of our method. Moreover, we also provide principled and empirical analysis to explain why our approach works by investigating the ineffectiveness of existing neural architecture search. We find that preserving the joint distribution of the network architecture and weights is of importance. This analysis not only benefits image recognition but also provides insights for crafting neural networks. Experiments on five representative image recognition tasks, such as person re-identification, age estimation, gender recognition, image classification, and unsupervised domain adaptation, demonstrate the effectiveness of our method.


Subject(s)
Neural Networks, Computer , Recognition, Psychology , Humans , Neurons
5.
IEEE Trans Neural Netw Learn Syst ; 32(5): 2142-2156, 2021 05.
Article in English | MEDLINE | ID: mdl-32663130

ABSTRACT

Person reidentification (Re-ID) benefits greatly from the accurate annotations of existing data sets (e.g., CUHK03 and Market-1501), which are quite expensive because each image in these data sets has to be assigned with a proper label. In this work, we ease the annotation of Re-ID by replacing the accurate annotation with inaccurate annotation, i.e., we group the images into bags in terms of time and assign a bag-level label for each bag. This greatly reduces the annotation effort and leads to the creation of a large-scale Re-ID benchmark called SYSU- 30k . The new benchmark contains 30k individuals, which is about 20 times larger than CUHK03 (1.3k individuals) and Market-1501 (1.5k individuals), and 30 times larger than ImageNet (1k categories). It sums up to 29606918 images. Learning a Re-ID model with bag-level annotation is called the weakly supervised Re-ID problem. To solve this problem, we introduce a differentiable graphical model to capture the dependencies from all images in a bag and generate a reliable pseudolabel for each person's image. The pseudolabel is further used to supervise the learning of the Re-ID model. Compared with the fully supervised Re-ID models, our method achieves state-of-the-art performance on SYSU- 30k and other data sets. The code, data set, and pretrained model will be available at https://github.com/wanggrun/SYSU-30k.

6.
IEEE Trans Pattern Anal Mach Intell ; 41(3): 596-610, 2019 Mar.
Article in English | MEDLINE | ID: mdl-29993474

ABSTRACT

This paper investigates a fundamental problem of scene understanding: how to parse a scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations). We propose a deep architecture consisting of two networks: i) a convolutional neural network (CNN) extracting the image representation for pixel-wise object labeling and ii) a recursive neural network (RsNN) discovering the hierarchical object structure and the inter-object relations. Rather than relying on elaborative annotations (e.g., manually labeled semantic maps and relations), we train our deep model in a weakly-supervised learning manner by leveraging the descriptive sentences of the training images. Specifically, we decompose each sentence into a semantic tree consisting of nouns and verb phrases, and apply these tree structures to discover the configurations of the training images. Once these scene configurations are determined, then the parameters of both the CNN and RsNN are updated accordingly by back propagation. The entire model training is accomplished through an Expectation-Maximization method. Extensive experiments show that our model is capable of producing meaningful scene configurations and achieving more favorable scene labeling results on two benchmarks (i.e., PASCAL VOC 2012 and SYSU-Scenes) compared with other state-of-the-art weakly-supervised deep learning methods. In particular, SYSU-Scenes contains more than 5,000 scene images with their semantic sentence descriptions, which is created by us for advancing research on scene parsing.

7.
IEEE Trans Pattern Anal Mach Intell ; 39(6): 1089-1102, 2017 06.
Article in English | MEDLINE | ID: mdl-27187945

ABSTRACT

Cross-domain visual data matching is one of the fundamental problems in many real-world vision tasks, e.g., matching persons across ID photos and surveillance videos. Conventional approaches to this problem usually involves two steps: i) projecting samples from different domains into a common space, and ii) computing (dis-)similarity in this space based on a certain distance. In this paper, we present a novel pairwise similarity measure that advances existing models by i) expanding traditional linear projections into affine transformations and ii) fusing affine Mahalanobis distance and Cosine similarity by a data-driven combination. Moreover, we unify our similarity measure with feature representation learning via deep convolutional neural networks. Specifically, we incorporate the similarity measure matrix into the deep architecture, enabling an end-to-end way of model optimization. We extensively evaluate our generalized similarity model in several challenging cross-domain matching tasks: person re-identification under different views and face verification over different modalities (i.e., faces from still images and videos, older and younger faces, and sketch and photo portraits). The experimental results demonstrate superior performance of our model over other state-of-the-art methods.

SELECTION OF CITATIONS
SEARCH DETAIL
...