Results 1 - 2 of 2
1.
IEEE Micro; 42(5): 89-98, 2022.
Article in English | MEDLINE | ID: mdl-37008678

ABSTRACT

FPGA accelerators offer performance and efficiency gains by narrowing the scope of acceleration to one algorithmic domain. However, real-life applications are often not limited to a single domain, which naturally makes cross-domain multi-acceleration a crucial next step. The challenge is that existing FPGA accelerators are built upon their own vertically specialized stacks, which prevents utilizing multiple accelerators from different domains. To that end, we propose a pair of dual abstractions, called Yin-Yang, which work in tandem and enable programmers to develop cross-domain applications using multiple accelerators on an FPGA. The Yin abstraction enables cross-domain algorithmic specification, while the Yang abstraction captures the accelerator capabilities. We also develop a dataflow virtual machine, dubbed XLVM, that transparently maps domain functions (Yin) to best-fit accelerator capabilities (Yang). With six real-world cross-domain applications, our evaluations show that Yin-Yang unlocks a 29.4× speedup, while the best single-domain acceleration achieves 12.0×.
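The abstract only names the mapping idea, so here is a minimal Python sketch of how a scheduler in the spirit of XLVM might pair domain functions (Yin) with best-fit accelerator capabilities (Yang). The DomainFunction, Capability, and schedule names are hypothetical stand-ins for illustration, not the paper's actual interfaces.

```python
# Hypothetical sketch: domain functions are declared independently of hardware,
# accelerator capabilities are declared independently of applications, and a
# scheduler pairs them. Names are illustrative, not the paper's API.
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainFunction:          # "Yin": what the application needs
    domain: str                # e.g. "dnn", "dsp", "graph"
    op: str                    # e.g. "conv2d", "fft", "bfs"

@dataclass(frozen=True)
class Capability:              # "Yang": what an accelerator on the FPGA offers
    domain: str
    op: str
    throughput: float          # relative figure of merit (higher is better)

def schedule(funcs, capabilities):
    """Map each domain function to the best-fit capability, falling back
    to None (i.e. host execution) when no accelerator covers it."""
    mapping = {}
    for f in funcs:
        candidates = [c for c in capabilities
                      if c.domain == f.domain and c.op == f.op]
        mapping[f] = max(candidates, key=lambda c: c.throughput, default=None)
    return mapping

if __name__ == "__main__":
    app = [DomainFunction("dnn", "conv2d"), DomainFunction("dsp", "fft")]
    fpga = [Capability("dnn", "conv2d", 8.0), Capability("dsp", "fft", 4.5)]
    for f, c in schedule(app, fpga).items():
        print(f.domain, f.op, "->", c)
```

The point of the sketch is only the separation of concerns: the application side never names an accelerator, and the accelerator side never names an application.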

2.
IEEE Micro; 40(5): 37-45, 2020.
Article in English | MEDLINE | ID: mdl-34413565

ABSTRACT

Deep quantization (below eight bits) can significantly reduce DNN computation and storage by decreasing the bitwidth of network encodings. However, without arduous manual effort, deep quantization can lead to significant accuracy loss, leaving its utility questionable. We propose a systematic approach to tackle this problem by automating the discovery of bitwidths through an end-to-end deep reinforcement learning framework (ReLeQ). This framework utilizes the sample efficiency of proximal policy optimization to explore the exponentially large space of possible bitwidth assignments to the layers. We show how ReLeQ can balance speed and quality, and provide a heterogeneous bitwidth assignment for quantization of a large variety of deep networks with minimal accuracy loss (≤ 0.3%) while minimizing the computation and storage costs. With these DNNs, ReLeQ enables conventional hardware and a custom DNN accelerator to achieve a 2.2× speedup over 8-bit execution.
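To make the search problem concrete, below is a small Python sketch of the per-layer bitwidth search that ReLeQ automates. Exhaustive search stands in for proximal policy optimization so the reward shaping (keep accuracy within a small loss budget, then minimize compute and storage cost) stays visible; the layer sizes, the evaluate_accuracy stand-in, and the thresholds are illustrative assumptions, not the paper's setup.

```python
# Toy bitwidth-assignment search. The real framework trains an RL agent (PPO);
# here we enumerate a tiny space so the objective itself is easy to read.
import itertools

LAYERS = {"conv1": 1_000_000, "conv2": 4_000_000, "fc": 500_000}  # MAC counts (made up)
BITWIDTHS = (2, 4, 8)

def evaluate_accuracy(assignment):
    # Stand-in for retraining/validation: deeper quantization costs accuracy.
    penalty = sum(0.0004 * (8 - b) for b in assignment.values())
    return 0.93 - penalty

def cost(assignment):
    # Relative compute/storage cost versus uniform 8-bit execution.
    total = sum(LAYERS.values()) * 8
    used = sum(macs * assignment[name] for name, macs in LAYERS.items())
    return used / total

def reward(assignment, baseline_acc=0.93, max_loss=0.003):
    acc = evaluate_accuracy(assignment)
    if baseline_acc - acc > max_loss:      # outside the allowed accuracy loss
        return -1.0
    return 1.0 - cost(assignment)          # favour cheaper configurations

best = max(
    (dict(zip(LAYERS, bits))
     for bits in itertools.product(BITWIDTHS, repeat=len(LAYERS))),
    key=reward,
)
print("chosen bitwidths:", best, "reward:", round(reward(best), 3))
```

The space grows exponentially with the number of layers (3^3 here, but 8^50 or more for a real network and a realistic bitwidth menu), which is why the paper replaces enumeration with a sample-efficient policy-gradient agent.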
