Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 6 de 6
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3910-3922, 2024 May.
Artigo em Inglês | MEDLINE | ID: mdl-38241113

RESUMO

Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks. However, modeling global correlations with multi-head self-attention (MSA) layers leads to two widely recognized issues: the massive computational resource consumption and the lack of intrinsic inductive bias for modeling local visual patterns. To solve both issues, we devise a simple yet effective method named Single-Path Vision Transformer pruning (SPViT), to efficiently and automatically compress the pre-trained ViTs into compact models with proper locality added. Specifically, we first propose a novel weight-sharing scheme between MSA and convolutional operations, delivering a single-path space to encode all candidate operations. In this way, we cast the operation search problem as finding which subset of parameters to use in each MSA layer, which significantly reduces the computational cost and optimization difficulty, and the convolution kernels can be well initialized using pre-trained MSA parameters. Relying on the single-path space, we introduce learnable binary gates to encode the operation choices in MSA layers. Similarly, we further employ learnable gates to encode the fine-grained MLP expansion ratios of FFN layers. In this way, our SPViT optimizes the learnable gates to automatically explore from a vast and unified search space and flexibly adjust the MSA-FFN pruning proportions for each individual dense model. We conduct extensive experiments on two representative ViTs showing that our SPViT achieves a new SOTA for pruning on ImageNet-1 k. For example, our SPViT can trim 52.0% FLOPs for DeiT-B and get an impressive 0.6% top-1 accuracy gain simultaneously.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14481-14496, 2023 12.
Artigo em Inglês | MEDLINE | ID: mdl-37535486

RESUMO

Previous human parsing methods are limited to parsing humans into pre-defined classes, which is inflexible for practical fashion applications that often have new fashion item classes. In this paper, we define a novel one-shot human parsing (OSHP) task that requires parsing humans into an open set of classes defined by any test example. During training, only base classes are exposed, which only overlap with part of the test-time classes. To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net). Firstly, an end-to-end human parsing framework is proposed to parse the query image into both coarse-grained and fine-grained human classes, which embeds rich semantic information that is shared across different granularities to identify the small-sized human classes. Then, we gradually smooth the training-time static prototypes to get robust class representations. Moreover, we employ a dynamic objective to encourage the network enhancing features' representational capability in the early training phase while improving features' transferability in the late training phase. Therefore, our method can quickly adapt to the novel classes and mitigate the testing bias issue. In addition, we add a contrastive loss at the prototype level to enforce inter-class distances, thereby discriminating the similar parts. For comprehensive evaluations on the new task, we tailor three existing popular human parsing benchmarks to the OSHP task. Experiments demonstrate that EOP-Net outperforms representative one-shot segmentation models by large margins and serves as a strong baseline for further research.


Assuntos
Algoritmos , Benchmarking , Humanos , Semântica
3.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12459-12473, 2023 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-37167046

RESUMO

Network pruning and quantization are proven to be effective ways for deep model compression. To obtain a highly compact model, most methods first perform network pruning and then conduct quantization based on the pruned model. However, this strategy may ignore that the pruning and quantization would affect each other and thus performing them separately may lead to sub-optimal performance. To address this, performing pruning and quantization jointly is essential. Nevertheless, how to make a trade-off between pruning and quantization is non-trivial. Moreover, existing compression methods often rely on some pre-defined compression configurations (i.e., pruning rates or bitwidths). Some attempts have been made to search for optimal configurations, which however may take unbearable optimization cost. To address these issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS) for automatic loss-aware model compression. To this end, we consider the network pruning as a special case of quantization and provide a unified view for model pruning and quantization. We then introduce a single-path model to encode all candidate compression configurations, where a high bitwidth value will be decomposed into the sum of a lowest bitwidth value and a series of re-assignment offsets. Relying on the single-path model, we introduce learnable binary gates to encode the choice of configurations and learn the binary gates and model parameters jointly. More importantly, the configuration search problem can be transformed into a subset selection problem, which helps to significantly reduce the optimization difficulty and computation cost. In this way, the compression configurations of each layer and the trade-off between pruning and quantization can be automatically determined. Extensive experiments on CIFAR-100 and ImageNet show that SBS significantly reduces computation cost while achieving promising performance. For example, our SBS compressed MobileNetV2 achieves 22.6× Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1 accuracy.

4.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4035-4051, 2022 08.
Artigo em Inglês | MEDLINE | ID: mdl-33755553

RESUMO

We study network pruning which aims to remove redundant channels/kernels and hence speed up the inference of deep networks. Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, while the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. In this paper, we propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power. To this end, we first introduce additional discrimination-aware losses into the network to increase the discriminative power of the intermediate layers. Next, we select the most discriminative channels for each layer by considering the discrimination-aware loss and the reconstruction error, simultaneously. We then formulate channel pruning as a sparsity-inducing optimization problem with a convex objective and propose a greedy algorithm to solve the resultant problem. Note that a channel (3D tensor) often consists of a set of kernels (each with a 2D matrix). Besides the redundancy in channels, some kernels in a channel may also be redundant and fail to contribute to the discriminative power of the network, resulting in kernel level redundancy. To solve this issue, we propose a discrimination-aware kernel pruning (DKP) method to further compress deep networks by removing redundant kernels. To avoid manually determining the pruning rate for each layer, we propose two adaptive stopping conditions to automatically determine the number of selected channels/kernels. The proposed adaptive stopping conditions tend to yield more efficient models with better performance in practice. Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our methods. For example, on ILSVRC-12, the resultant ResNet-50 model with 30 percent reduction of channels even outperforms the baseline model by 0.36 percent in terms of Top-1 accuracy. We also deploy the pruned models on a smartphone (equipped with a Qualcomm Snapdragon 845 processor). The pruned MobileNetV1 and MobileNetV2 achieve 1.93× and 1.42× inference acceleration on the mobile device, respectively, with negligible performance degradation. The source code and the pre-trained models are available at https://github.com/SCUT-AILab/DCP.


Assuntos
Algoritmos , Compressão de Dados , Compressão de Dados/métodos , Pressão
5.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6140-6152, 2022 10.
Artigo em Inglês | MEDLINE | ID: mdl-34125669

RESUMO

This paper tackles the problem of training a deep convolutional neural network of both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches, including (i) progressive quantization; (ii) stochastic precision; and (iii) joint knowledge distillation to improve the network training. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we propose to first optimize a network with quantized weights and subsequently quantize activations. This is in contrast to the traditional methods which optimize them simultaneously. Furthermore, we propose a second progressive quantization scheme which gradually decreases the bitwidth from high-precision to low-precision during training. Second, to alleviate the excessive training burden due to the multi-round training stages, we further propose a one-stage stochastic precision strategy to randomly sample and quantize sub-networks while keeping other parts in full-precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints to guide the low-precision model training and significantly improves the performance of the low-precision network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods.


Assuntos
Algoritmos , Redes Neurais de Computação
6.
Plant Methods ; 13: 79, 2017.
Artigo em Inglês | MEDLINE | ID: mdl-29118821

RESUMO

BACKGROUND: Accurately counting maize tassels is important for monitoring the growth status of maize plants. This tedious task, however, is still mainly done by manual efforts. In the context of modern plant phenotyping, automating this task is required to meet the need of large-scale analysis of genotype and phenotype. In recent years, computer vision technologies have experienced a significant breakthrough due to the emergence of large-scale datasets and increased computational resources. Naturally image-based approaches have also received much attention in plant-related studies. Yet a fact is that most image-based systems for plant phenotyping are deployed under controlled laboratory environment. When transferring the application scenario to unconstrained in-field conditions, intrinsic and extrinsic variations in the wild pose great challenges for accurate counting of maize tassels, which goes beyond the ability of conventional image processing techniques. This calls for further robust computer vision approaches to address in-field variations. RESULTS: This paper studies the in-field counting problem of maize tassels. To our knowledge, this is the first time that a plant-related counting problem is considered using computer vision technologies under unconstrained field-based environment. With 361 field images collected in four experimental fields across China between 2010 and 2015 and corresponding manually-labelled dotted annotations, a novel Maize Tassels Counting (MTC) dataset is created and will be released with this paper. To alleviate the in-field challenges, a deep convolutional neural network-based approach termed TasselNet is proposed. TasselNet can achieve good adaptability to in-field variations via modelling the local visual characteristics of field images and regressing the local counts of maize tassels. Extensive results on the MTC dataset demonstrate that TasselNet outperforms other state-of-the-art approaches by large margins and achieves the overall best counting performance, with a mean absolute error of 6.6 and a mean squared error of 9.6 averaged over 8 test sequences. CONCLUSIONS: TasselNet can achieve robust in-field counting of maize tassels with a relatively high degree of accuracy. Our experimental evaluations also suggest several good practices for practitioners working on maize-tassel-like counting problems. It is worth noting that, though the counting errors have been greatly reduced by TasselNet, in-field counting of maize tassels remains an open and unsolved problem.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...