A High-Throughput FPGA Accelerator for Lightweight CNNs With Balanced Dataflow - ArXiv:2


Title: A High-Throughput FPGA Accelerator for Lightweight CNNs With Balanced Dataflow

Authors: Zhiyuan Zhao, Yihao Chen, Pengcheng Feng, Jixing Li, Gang Chen, Rongxuan Shen, Huaxiang Lu

Abstract:
FPGA accelerators for lightweight convolutional neural networks (LWCNNs) have recently attracted significant attention. Most existing LWCNN accelerators focus on single-computing-engine (CE) architectures with local optimization. However, these designs typically suffer from high on-chip/off-chip memory overhead and low computational efficiency because of their layer-by-layer dataflow and unified resource-mapping mechanisms. To tackle these issues, a novel multi-CE accelerator with balanced dataflow is proposed to efficiently accelerate LWCNNs through memory-oriented and computing-oriented optimizations. First, a streaming architecture with hybrid CEs is designed to minimize off-chip memory access while keeping the on-chip buffer size low. Second, a balanced dataflow strategy is introduced for streaming architectures to enhance computational efficiency by improving resource-mapping efficiency and mitigating data congestion. Furthermore, a resource-aware memory and parallelism allocation methodology, based on a performance model, is proposed to achieve better performance and scalability. The proposed accelerator is evaluated on the Xilinx ZC706 platform using MobileNetV2 and ShuffleNetV2. Implementation results demonstrate that the proposed accelerator can save up to 68.3% of on-chip memory size with reduced off-chip memory access compared to the reference design. It achieves a throughput of up to 2092.4 FPS and a state-of-the-art MAC efficiency of up to 94.58%, while maintaining a high DSP utilization of 95%, thus significantly outperforming current LWCNN accelerators.
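The abstract's core argument is that in a streaming (layer-pipelined) design, frame throughput is set by the slowest stage, so compute resources should be distributed according to each layer's workload rather than mapped uniformly. The sketch below illustrates only that idea and is not the authors' allocation methodology: it ignores memory allocation and data congestion, and it assumes each processing element (PE) completes one MAC per cycle without stalls. The names `Layer`, `stage_cycles`, `allocate_parallelism` and `pipeline_metrics`, the per-layer MAC counts, the 42-PE budget and the 200 MHz clock are all hypothetical. MAC efficiency is computed with a common definition (useful MACs divided by the MACs all PEs could issue per frame interval), which may differ in detail from the paper's.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    name: str
    macs: int  # multiply-accumulate operations per frame (hypothetical figures)


def stage_cycles(layers: List[Layer], par: List[int]) -> List[int]:
    # Assumption: each PE completes one MAC per cycle with no stalls,
    # so a stage needs ceil(MACs / PEs) cycles per frame.
    return [math.ceil(layer.macs / p) for layer, p in zip(layers, par)]


def allocate_parallelism(layers: List[Layer], total_pes: int) -> List[int]:
    """Greedy workload-balanced allocation: each spare PE goes to the current bottleneck stage."""
    if total_pes < len(layers):
        raise ValueError("need at least one PE per pipeline stage")
    par = [1] * len(layers)
    for _ in range(total_pes - len(layers)):
        cycles = stage_cycles(layers, par)
        bottleneck = max(range(len(layers)), key=lambda i: cycles[i])
        par[bottleneck] += 1
    return par


def pipeline_metrics(layers: List[Layer], par: List[int], freq_hz: float) -> Tuple[float, float]:
    """Return (frames per second, MAC efficiency) for a layer-pipelined design."""
    ii = max(stage_cycles(layers, par))  # initiation interval is set by the slowest stage
    fps = freq_hz / ii
    # Useful MACs divided by the MACs all PEs could issue during one interval
    mac_eff = sum(layer.macs for layer in layers) / (sum(par) * ii)
    return fps, mac_eff


if __name__ == "__main__":
    net = [Layer("conv", 1_000_000), Layer("dwconv", 200_000), Layer("pwconv", 3_000_000)]
    balanced = allocate_parallelism(net, 42)
    print("balanced", balanced, pipeline_metrics(net, balanced, 200e6))
    uniform = [14, 14, 14]
    print("uniform ", uniform, pipeline_metrics(net, uniform, 200e6))
```

With these toy numbers, the greedy allocation gives [10, 2, 30], so every stage takes 100,000 cycles per frame (2000 FPS at 200 MHz, 100% MAC efficiency under the stated assumptions). Splitting the same 42 PEs uniformly leaves the pointwise stage as the bottleneck, at roughly 933 FPS and about 47% MAC efficiency. Greedily feeding the bottleneck is a simple choice that minimizes the slowest stage for a fixed PE budget; a real design must also respect divisibility of channel dimensions and on-chip buffer limits.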

Academia Accelerated
