Search | VHL Regional Portal (test)

A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation.

Sui, Xuefu; Lv, Qunbo; Zhi, Liangjie; Zhu, Baoyu; Yang, Yuanbo; Zhang, Yu; Tan, Zheng.

Sensors (Basel); 23(2), 2023 Jan 11.

Article in English | MEDLINE | ID: mdl-36679624

ABSTRACT

To address the problems of large storage requirements, computational pressure, untimely data supply of off-chip memory, and low computational efficiency during hardware deployment due to the large number of convolutional neural network (CNN) parameters, we developed an innovative hardware-friendly CNN pruning method called KRP, which prunes the convolutional kernel on a row scale. A new retraining method based on LR tracking was used to obtain a CNN model with both a high pruning rate and accuracy. Furthermore, we designed a high-performance convolutional computation module on the FPGA platform to help deploy KRP pruning models. The results of comparative experiments on CNNs such as VGG and ResNet showed that KRP has higher accuracy than most pruning methods. At the same time, the KRP method, together with the GSNQ quantization method developed in our previous study, forms a high-precision hardware-friendly network compression framework that can achieve "lossless" CNN compression with a 27× reduction in network model storage. The results of the comparative experiments on the FPGA showed that the KRP pruning method not only requires much less storage space, but also helps to reduce the on-chip hardware resource consumption by more than half and effectively improves the parallelism of the model in FPGAs with a strong hardware-friendly feature. This study provides more ideas for the application of CNNs in the field of edge computing.
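The row-scale pruning idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how KRP chooses which rows to remove, so the L1-norm ranking below is an assumption made only for illustration, and the names `prune_kernel_rows` and `prune_rate` are hypothetical.

```python
from typing import List

Kernel = List[List[float]]  # one 2-D convolution kernel, stored as rows of weights


def prune_kernel_rows(kernels: List[Kernel], prune_rate: float) -> List[Kernel]:
    """Zero out the kernel rows with the smallest L1 norm.

    Assumption: ranking rows by L1 norm is an illustrative criterion,
    not necessarily the one used by KRP.
    """
    if not 0.0 <= prune_rate <= 1.0:
        raise ValueError("prune_rate must be in [0, 1]")
    # Score every (kernel index, row index) pair by the L1 norm of that row.
    scored = [
        (sum(abs(w) for w in row), k, r)
        for k, kernel in enumerate(kernels)
        for r, row in enumerate(kernel)
    ]
    scored.sort()
    n_prune = int(len(scored) * prune_rate)
    to_prune = {(k, r) for _, k, r in scored[:n_prune]}
    # Rebuild the kernels, replacing each selected row with zeros.
    return [
        [
            [0.0] * len(row) if (k, r) in to_prune else list(row)
            for r, row in enumerate(kernel)
        ]
        for k, kernel in enumerate(kernels)
    ]


kernels = [
    [[0.1, -0.2, 0.05], [0.9, 0.8, -0.7], [0.0, 0.01, -0.02]],
    [[0.5, 0.5, 0.5], [0.02, 0.0, 0.01], [-0.3, 0.4, 0.2]],
]
pruned = prune_kernel_rows(kernels, prune_rate=0.5)
```

With a pruning rate of 0.5, three of the six rows are zeroed. Because a pruned row is entirely zero, a hardware implementation could skip storing and computing that row as a unit, which is one plausible reading of why row-scale sparsity is described as hardware-friendly.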

Subjects

Data Compression; Neural Networks, Computer; Algorithms; Computers

Adaptive Global Power-of-Two Ternary Quantization Algorithm Based on Unfixed Boundary Thresholds.

Sui, Xuefu; Lv, Qunbo; Ke, Changjun; Li, Mingshan; Zhuang, Mingjin; Yu, Haiyang; Tan, Zheng.

Sensors (Basel); 24(1), 2023 Dec 28.

Article in English | MEDLINE | ID: mdl-38203043

ABSTRACT

In the field of edge computing, quantizing convolutional neural networks (CNNs) using extremely low bit widths can significantly alleviate the associated storage and computational burdens in embedded hardware, thereby improving computational efficiency. However, such quantization also presents a challenge related to substantial decreases in detection accuracy. This paper proposes an innovative method, called Adaptive Global Power-of-Two Ternary Quantization Based on Unfixed Boundary Thresholds (APTQ). APTQ achieves adaptive quantization by quantizing each filter into two binary subfilters represented as power-of-two values, thereby addressing the accuracy degradation caused by a lack of expression ability of low-bit-width weight values and the contradiction between fixed quantization boundaries and the uneven actual weight distribution. It effectively reduces the accuracy loss while at the same time presenting strong hardware-friendly characteristics because of the power-of-two quantization. This paper extends the APTQ algorithm to propose the APQ quantization algorithm, which can adapt to arbitrary quantization bit widths. Furthermore, this paper designs dedicated edge deployment convolutional computation modules for the obtained quantized models. Through quantization comparison experiments with multiple commonly used CNN models utilized on the CIFAR10, CIFAR100, and Mini-ImageNet data sets, it is verified that the APTQ and APQ algorithms possess better accuracy performance than most state-of-the-art quantization algorithms and can achieve results with very low accuracy loss in certain CNNs (e.g., the accuracy loss of the APTQ ternary ResNet-56 model on CIFAR10 is 0.13%). The dedicated convolutional computation modules enable the corresponding quantized models to occupy fewer on-chip hardware resources in edge chips, thereby effectively improving computational efficiency. This adaptive CNN quantization method, combined with the power-of-two quantization results, strikes a balance between the quantization accuracy performance and deployment efficiency in embedded hardware. As such, valuable insights for the industrial edge computing domain can be gained.
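To make "ternary power-of-two quantization" concrete, here is a minimal sketch of a generic threshold-based ternary quantizer whose single scale is rounded to a power of two. This is not APTQ: the abstract does not give the algorithm's details, so the threshold below uses the Ternary Weight Networks heuristic (0.7 times the mean absolute weight) as a stand-in, and the function name `ternary_pow2_quantize` is hypothetical.

```python
import math
from typing import List, Tuple


def ternary_pow2_quantize(weights: List[float]) -> Tuple[List[float], int]:
    """Map weights to {-2**e, 0, +2**e} and return (quantized weights, e).

    Assumption: the threshold rule (0.7 * mean |w|) is borrowed from
    Ternary Weight Networks for illustration; it is not the adaptive,
    unfixed-boundary rule that APTQ describes.
    """
    if not weights:
        raise ValueError("weights must not be empty")
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = 0.7 * mean_abs  # weights with |w| <= delta are set to zero
    kept = [abs(w) for w in weights if abs(w) > delta]
    if not kept:
        return [0.0] * len(weights), 0
    alpha = sum(kept) / len(kept)  # real-valued scale of the surviving weights
    exponent = round(math.log2(alpha))  # nearest power of two in the log domain
    scale = 2.0 ** exponent
    quantized = [
        0.0 if abs(w) <= delta else math.copysign(scale, w) for w in weights
    ]
    return quantized, exponent


weights = [0.30, -0.22, 0.01, -0.28, 0.02, 0.26]
quantized, exponent = ternary_pow2_quantize(weights)
print(quantized, exponent)  # [0.25, -0.25, 0.0, -0.25, 0.0, 0.25] -2
```

Because every nonzero weight is plus or minus a power of two, multiplying by it reduces to a bit shift and an optional negation, which is the property the abstract calls hardware-friendly.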

Blind Deblurring of Remote-Sensing Single Images Based on Feature Alignment.

Zhu, Baoyu; Lv, Qunbo; Yang, Yuanbo; Sui, Xuefu; Zhang, Yu; Tang, Yinhui; Tan, Zheng.

Sensors (Basel); 22(20), 2022 Oct 17.

Article in English | MEDLINE | ID: mdl-36298241

ABSTRACT

Motion blur recovery is a common method in the field of remote sensing image processing that can effectively improve the accuracy of detection and recognition. Among the existing motion blur recovery methods, the algorithms based on deep learning do not rely on a priori knowledge and, thus, have better generalizability. However, the existing deep learning algorithms usually suffer from feature misalignment, resulting in a high probability of missing details or errors in the recovered images. This paper proposes an end-to-end generative adversarial network (SDD-GAN) for single-image motion deblurring to address this problem and to optimize the recovery of blurred remote sensing images. Firstly, this paper applies a feature alignment module (FAFM) in the generator to learn the offset between feature maps to adjust the position of each sample in the convolution kernel and to align the feature maps according to the context; secondly, a feature importance selection module is introduced in the generator to adaptively filter the feature maps in the spatial and channel domains, preserving reliable details in the feature maps and improving the performance of the algorithm. In addition, this paper constructs a self-constructed remote sensing dataset (RSDATA) based on the mechanism of image blurring caused by the high-speed orbital motion of satellites. Comparative experiments are conducted on self-built remote sensing datasets and public datasets as well as on real remote sensing blurred images taken by an in-orbit satellite (CX-6(02)). The results show that the algorithm in this paper outperforms the comparison algorithm in terms of both quantitative evaluation and visual effects.
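The abstract does not describe how the blur in RSDATA is synthesized. As a rough illustration of the general idea that motion blur averages image content along the direction of motion, the sketch below applies a uniform horizontal blur. The function name `horizontal_motion_blur`, the uniform kernel, and the edge-replication border handling are all assumptions for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def horizontal_motion_blur(image: Image, length: int) -> Image:
    """Blur each row with a uniform kernel of `length` pixels.

    Assumption: a uniform linear kernel is a textbook simplification and
    is not claimed to match the paper's satellite blur model.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    blurred: Image = []
    for row in image:
        out_row = []
        for x in range(len(row)):
            # Average the pixel with the length-1 pixels to its left,
            # replicating the leftmost pixel at the border.
            window = [row[max(x - k, 0)] for k in range(length)]
            out_row.append(sum(window) / length)
        blurred.append(out_row)
    return blurred


sharp = [[0.0, 0.0, 9.0, 0.0, 0.0]]
print(horizontal_motion_blur(sharp, length=3))  # [[0.0, 0.0, 3.0, 3.0, 3.0]]
```

A single bright pixel is smeared into a streak of the blur length, which is the kind of degradation a deblurring network is trained to invert.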

Subjects

Algorithms; Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods; Motion

A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation.

Sui, Xuefu; Lv, Qunbo; Bai, Yang; Zhu, Baoyu; Zhi, Liangjie; Yang, Yuanbo; Tan, Zheng.

Sensors (Basel); 22(17), 2022 Sep 01.

Article in English | MEDLINE | ID: mdl-36081072

ABSTRACT

To address the problems of convolutional neural networks (CNNs) consuming more hardware resources (such as DSPs and RAMs on FPGAs) and their accuracy, efficiency, and resources being difficult to balance, meaning they cannot meet the requirements of industrial applications, we proposed an innovative low-bit power-of-two quantization method: the global sign-based network quantization (GSNQ). This method involves designing different quantization ranges according to the sign of the weights, which can provide a larger quantization-value range. Combined with the fine-grained and multi-scale global retraining method proposed in this paper, the accuracy loss of low-bit quantization can be effectively reduced. We also proposed a novel convolutional algorithm using shift operations to replace multiplication to help to deploy the GSNQ quantized models on FPGAs. Quantization comparison experiments performed on LeNet-5, AlexNet, VGG-Net, ResNet, and GoogLeNet showed that GSNQ has higher accuracy than most existing methods and achieves "lossless" quantization (i.e., the accuracy of the quantized CNN model is higher than the baseline) at low-bit quantization in most cases. FPGA comparison experiments showed that our convolutional algorithm does not occupy on-chip DSPs, and it also has a low comprehensive occupancy in terms of on-chip LUTs and FFs, which can effectively improve the computational parallelism, and this proves that GSNQ has good hardware-adaptation capability. This study provides theoretical and experimental support for the industrial application of CNNs.
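The idea of replacing multiplication with shift operations can be sketched as follows, assuming integer (fixed-point) activations and weights restricted to zero or a signed power of two. The `(sign, exponent)` weight encoding and the names `shift_mul` and `shift_dot` are hypothetical; the abstract does not specify how GSNQ stores weights or how its FPGA module is organized.

```python
from typing import List, Tuple

# Assumed encoding: a weight (sign, exponent) means sign * 2**exponent,
# with sign in {-1, 0, +1}.
Pow2Weight = Tuple[int, int]


def shift_mul(x: int, weight: Pow2Weight) -> int:
    """Multiply integer x by a power-of-two weight using only shifts."""
    sign, exponent = weight
    if sign == 0:
        return 0
    # A non-negative exponent is a left shift; a negative one is an
    # arithmetic right shift, which discards the fractional bits.
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return shifted if sign > 0 else -shifted


def shift_dot(activations: List[int], weights: List[Pow2Weight]) -> int:
    """Dot product of one convolution window, with no multiplications."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have equal length")
    return sum(shift_mul(x, w) for x, w in zip(activations, weights))


activations = [8, 16, 4, 12]
weights = [(1, 1), (-1, -2), (0, 0), (1, -1)]  # +2, -0.25, 0, +0.5
print(shift_dot(activations, weights))  # 18
```

The result matches 8·2 − 16·0.25 + 0 + 12·0.5 = 18. In hardware, a shifter and an adder can be built from LUTs alone, which is consistent with the abstract's report that the convolution module occupies no on-chip DSPs.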

Subjects

Algorithms; Neural Networks, Computer; Computers
