Search | BVS Regional Portal

1.

EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training.

Wang, Yulin; Yue, Yang; Lu, Rui; Han, Yizeng; Song, Shiji; Huang, Gao.

IEEE Trans Pattern Anal Mach Intell ; PP, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38743547

ABSTRACT

The superior performance of modern computer vision backbones (e.g., vision Transformers learned on ImageNet-1K/22K) usually comes with a costly training procedure. This study contributes to this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using easier-to-harder data. Specifically, we reformulate the training curriculum as a soft-selection function, which uncovers progressively more difficult patterns within each example during training, instead of performing easier-to-harder sample selection. Our work is inspired by an intriguing observation on the learning dynamics of visual backbones: during the earlier stages of training, the model predominantly learns to recognize some 'easier-to-learn' discriminative patterns in the data. These patterns, when observed through frequency and spatial domains, incorporate lower-frequency components, and the natural image contents without distortion or data augmentation. Motivated by these findings, we propose a curriculum where the model always leverages all the training data at every learning stage, yet the exposure to the 'easier-to-learn' patterns of each example is initiated first, with harder patterns gradually introduced as training progresses. To implement this idea in a computationally efficient way, we introduce a cropping operation in the Fourier spectrum of the inputs, enabling the model to learn from only the lower-frequency components. Then we show that exposing the contents of natural images can be readily achieved by modulating the intensity of data augmentation. Finally, we integrate these two aspects and design curriculum learning schedules by proposing tailored searching algorithms. Moreover, we present useful techniques for deploying our approach efficiently in challenging practical scenarios, such as large-scale parallel training, and limited input/output or data pre-processing speed. 
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective. As an off-the-shelf approach, it reduces the training time of various popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer) by [Formula: see text] on ImageNet-1K/22K without sacrificing accuracy. It also demonstrates efficacy in self-supervised learning (e.g., MAE). Code is available at: https://github.com/LeapLabTHU/EfficientTrain.
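
The low-frequency cropping step described in this abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the function name `low_freq_crop`, the centered square window, and the intensity rescaling are assumptions made for illustration only.

```python
import numpy as np

def low_freq_crop(image: np.ndarray, out_size: int) -> np.ndarray:
    """Return an out_size x out_size image built from only the lowest
    frequencies of `image` (shape (H, W) or (H, W, C)).

    Hypothetical helper: window placement and rescaling are assumptions.
    """
    h, w = image.shape[:2]
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    # Pick a window whose center coincides with the DC term.
    top = h // 2 - out_size // 2
    left = w // 2 - out_size // 2
    cropped = spectrum[top:top + out_size, left:left + out_size]
    # Undo the shift and go back to the spatial domain at lower resolution.
    small = np.fft.ifft2(np.fft.ifftshift(cropped, axes=(0, 1)), axes=(0, 1))
    # Rescale so that the mean intensity of the input is preserved
    # (an assumption of this sketch), and drop the residual imaginary part.
    return small.real * (out_size * out_size) / (h * w)
```

Because the output has fewer pixels than the input, a model trained on it does less work per example, which is consistent with the efficiency argument made in the abstract.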

2.

Latency-aware Unified Dynamic Networks for Efficient Image Recognition.

Han, Yizeng; Liu, Zeyu; Yuan, Zhihang; Pu, Yifan; Wang, Chaofei; Song, Shiji; Huang, Gao.

IEEE Trans Pattern Anal Mach Intell ; PP, 2024 Apr 25.

Article in English | MEDLINE | ID: mdl-38662565

ABSTRACT

Dynamic computation has emerged as a promising strategy to improve the inference efficiency of deep networks. It allows selective activation of various computing units, such as layers or convolution channels, or adaptive allocation of computation to highly informative spatial regions in image features, thus significantly reducing unnecessary computations conditioned on each input sample. However, the practical efficiency of dynamic models does not always correspond to theoretical outcomes. This discrepancy stems from three key challenges: 1) The absence of a unified formulation for various dynamic inference paradigms, owing to the fragmented research landscape; 2) The undue emphasis on algorithm design while neglecting scheduling strategies, which are critical for optimizing computational performance and resource utilization in CUDA-enabled GPU settings; and 3) The cumbersome process of evaluating practical latency, as most existing libraries are tailored for static operators. To address these issues, we introduce Latency-Aware Unified Dynamic Networks (LAUDNet), a comprehensive framework that amalgamates three cornerstone dynamic paradigms (spatially-adaptive computation, dynamic layer skipping, and dynamic channel skipping) under a unified formulation. To reconcile theoretical and practical efficiency, LAUDNet integrates algorithmic design with scheduling optimization, assisted by a latency predictor that accurately and efficiently gauges the inference latency of dynamic operators. This latency predictor harmonizes considerations of algorithms, scheduling strategies, and hardware attributes. We empirically validate various dynamic paradigms within the LAUDNet framework across a range of vision tasks, including image classification, object detection, and instance segmentation. Our experiments confirm that LAUDNet effectively narrows the gap between theoretical and real-world efficiency. 
For example, LAUDNet can reduce the practical latency of its static counterpart, ResNet-101, by over 50% on hardware platforms such as V100, RTX3090, and TX2 GPUs. Furthermore, LAUDNet surpasses competing methods in the trade-off between accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet.
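
Of the three paradigms named in this abstract, dynamic layer skipping is the simplest to illustrate. The sketch below is a generic example of the idea, not LAUDNet's formulation or scheduling: the function `gated_residual_block`, the linear gate, and the threshold are all hypothetical.

```python
import numpy as np

def gated_residual_block(x, weight, gate_weight, threshold=0.0):
    """Residual block whose body runs only when a cheap gate fires.

    x: (features,) input vector.
    weight: (features, features) matrix standing in for the block body.
    gate_weight: (features,) vector of a hypothetical linear gate.
    Returns (output, executed), where `executed` records the decision.
    """
    # Cheap, input-dependent decision made before the expensive body.
    score = float(gate_weight @ x)
    if score <= threshold:
        # Skip: only the identity shortcut is evaluated for this sample.
        return x, False
    # Execute: identity shortcut plus a ReLU-activated linear body.
    return x + np.maximum(weight @ x, 0.0), True
```

The saving is real only if the skipped branch is never computed, which is why the abstract stresses measuring practical latency rather than theoretical operation counts.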

3.

Fine-Grained Recognition With Learnable Semantic Data Augmentation.

Pu, Yifan; Han, Yizeng; Wang, Yulin; Feng, Junlan; Deng, Chao; Huang, Gao.

IEEE Trans Image Process ; 33: 3130-3144, 2024.

Article in English | MEDLINE | ID: mdl-38662557

ABSTRACT

Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because their random editing-region behavior is prone to destroy the discriminative visual cues residing in the subtle regions. In this paper, we propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC Aircrafts, NABirds) demonstrate that our method significantly improves the generalization performance on several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. Source code is available at https://github.com/LeapLabTHU/LearnableISDA.
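
The feature-level augmentation described in this abstract, translating each feature along directions drawn from a sample-wise covariance, can be sketched as follows. The covariance prediction network and the meta-learning loop are not reproduced here: the covariances are simply passed in, and the name `semantic_augment` and the `strength` parameter are assumptions of this sketch.

```python
import numpy as np

def semantic_augment(features, covariances, strength=0.5, seed=0):
    """Shift each feature by a draw from N(0, strength * covariance_i).

    features: (n, d) array of deep features.
    covariances: (n, d, d) array of positive-definite matrices, one per
        sample (assumed given; the paper predicts them with a network).
    strength: hypothetical scalar controlling augmentation intensity.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    # Batched Cholesky factors L_i with L_i @ L_i.T = strength * cov_i.
    chol = np.linalg.cholesky(strength * covariances)
    # Standard normal noise, mapped through L_i to get the right covariance.
    z = rng.standard_normal((n, d, 1))
    shift = (chol @ z)[..., 0]
    return features + shift
```

Because the shift is applied to features rather than pixels, no image region is erased or cropped, which is the property the abstract relies on to preserve subtle discriminative cues.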

4.

Dynamic Neural Networks: A Survey.

Han, Yizeng; Huang, Gao; Song, Shiji; Yang, Le; Wang, Honghui; Wang, Yulin.

IEEE Trans Pattern Anal Mach Intell ; 44(11): 7436-7456, 2022 Nov.

Article in English | MEDLINE | ID: mdl-34613907

ABSTRACT

Dynamic neural network is an emerging research topic in deep learning. Compared to static models which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc. In this survey, we comprehensively review this rapidly developing area by dividing dynamic networks into three main categories: 1) sample-wise dynamic models that process each sample with data-dependent architectures or parameters; 2) spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data; and 3) temporal-wise dynamic models that perform adaptive inference along the temporal dimension for sequential data such as videos and texts. The important research problems of dynamic networks, e.g., architecture design, decision making scheme, optimization technique and applications, are reviewed systematically. Finally, we discuss the open problems in this field together with interesting future research directions.

Subjects

Algorithms , Computer Neural Networks

5.

Spatially Adaptive Feature Refinement for Efficient Inference.

Han, Yizeng; Huang, Gao; Song, Shiji; Yang, Le; Zhang, Yitian; Jiang, Haojun.

IEEE Trans Image Process ; 30: 9345-9358, 2021.

Article in English | MEDLINE | ID: mdl-34752395

ABSTRACT

Spatial redundancy commonly exists in the learned representations of convolutional neural networks (CNNs), leading to unnecessary computation on high-resolution features. In this paper, we propose a novel Spatially Adaptive feature Refinement (SAR) approach to reduce such superfluous computation. It performs efficient inference by adaptively fusing information from two branches: one conducts standard convolution on input features at a lower spatial resolution, and the other one selectively refines a set of regions at the original resolution. The two branches complement each other in feature learning, and both of them evoke much less computation than standard convolution. SAR is a flexible method that can be conveniently plugged into existing CNNs to establish models with reduced spatial redundancy. Experiments on CIFAR and ImageNet classification, COCO object detection and PASCAL VOC semantic segmentation tasks validate that the proposed SAR can consistently improve the network performance and efficiency. Notably, our results show that SAR only refines less than 40% of the regions in the feature representations of a ResNet for 97% of the samples in the validation set of ImageNet to achieve comparable accuracy with the original model, revealing the high computational redundancy in the spatial dimension of CNNs.
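
The two-branch structure described in this abstract can be illustrated with a simplified NumPy sketch. It is not the SAR implementation: 1x1 channel-mixing matrices stand in for the standard convolutions, the refinement mask is given rather than learned, fusion by addition is an assumption, and the name `sar_like_fusion` is hypothetical.

```python
import numpy as np

def sar_like_fusion(x, w_base, w_refine, mask):
    """Fuse a low-resolution branch with a sparse full-resolution branch.

    x: (C, H, W) feature map with even H and W.
    w_base, w_refine: (C_out, C) matrices acting as 1x1 convolutions.
    mask: (H, W) boolean array marking the positions to refine.
    """
    c, h, w = x.shape
    # Base branch: 2x2 average pooling, channel mixing at low resolution,
    # then nearest-neighbour upsampling back to (H, W).
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    base = np.einsum("oc,chw->ohw", w_base, pooled)
    out = base.repeat(2, axis=1).repeat(2, axis=2)
    # Refine branch: computed only at the selected positions.
    ys, xs = np.nonzero(mask)
    out[:, ys, xs] += w_refine @ x[:, ys, xs]
    return out
```

In this sketch the base branch touches a quarter of the spatial positions and the refine branch touches only the masked ones, so both do less work than a dense full-resolution pass when the mask is sparse.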

Subjects

Algorithms , Computer Neural Networks , Semantics

6.

Physical mechanism of order between electric and magnetic dipoles in spoof plasmonic structures.

Wu, Hong-Wei; Han, Yi-Zeng; Chen, Hua-Jun; Zhou, Yu; Li, Xue-Chao; Gao, Juan; Sheng, Zong-Qiang.

Opt Lett ; 42(21): 4521-4524, 2017 Nov 01.

Article in English | MEDLINE | ID: mdl-29088203

ABSTRACT

It has been recently shown that a solid-textured metal cylinder can support electric and magnetic dipolar resonances simultaneously [Phys. Rev. X 4, 021003 (2014), doi:10.1103/PhysRevX.4.021003] which are almost degenerate in a two-dimensional (2-D) structure and non-degenerate in a three-dimensional (3-D) structure, and with the magnetic dipole appearing at higher frequency. They are described as spoof localized plasmonic modes analogous to localized plasmonic resonances in optical frequencies. Here, we consider a hollow metal cylinder corrugated by periodic cut-through slits. Our results indicate that the magnetic dipole can be separated from the electric dipole in a 2-D structure, and magnetic dipolar resonance appears at lower frequency, rather than electric resonance in both 2-D and 3-D structures. In order to clarify the physical mechanism behind the abnormal phenomenon, we study the influence of the core material on the electric- and magnetic-dipole modes based on theoretical analysis and numerical simulation. It is discovered that there is a threshold of an imaginary part of permittivity for switching the order between electric and magnetic dipoles. These results may provide fundamental understanding and physical insight for spoof plasmonic modes supported in designer structures.
