Efficient implementation of a neural network on hardware using compression techniques

5-min ML Paper Challenge

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Abstract—State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps with the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations and dominates the required power.

The previously proposed ‘Deep Compression’ makes it possible to fit large DNNs (AlexNet and VGGNet) entirely in on-chip SRAM. This compression is achieved by pruning redundant connections and having multiple connections share the same weight. We propose an energy-efficient inference engine (EIE) that performs inference directly on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE a 120× energy saving; exploiting sparsity saves 10×; weight sharing gives 8×; skipping zero activations from ReLU saves another 3×.

Evaluated on nine DNN benchmarks, EIE is 189× and 13× faster than CPU and GPU implementations of the same DNN without compression. EIE has a processing power of 102 GOPS/s working directly on the compressed network, corresponding to 3 TOPS/s on the uncompressed network, and processes the FC layers of AlexNet at 1.88×10⁴ frames/sec with a power dissipation of only 600 mW. It is 24,000× and 3,400× more energy efficient than a CPU and a GPU, respectively. Compared with DaDianNao, EIE has 2.9×, 19×, and 3× better throughput, energy efficiency, and area efficiency.
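
For intuition about what EIE actually computes, below is a minimal Python sketch of the arithmetic described in the abstract: a pruned matrix whose nonzeros are stored column by column as a row index plus an index into a small codebook of shared weights, and a matrix-vector product that decodes each weight from the codebook and skips zero activations. All names here (compress_matrix, spmv_compressed, codebook, col_ptr, row_idx, weight_idx) are illustrative, not from the paper; the actual hardware operates on a compressed-sparse-column variant with relative indexing and splits the work across parallel processing elements whose weights live in on-chip SRAM.

import numpy as np

def compress_matrix(W, codebook):
    """Encode a pruned matrix column by column: for each nonzero entry,
    keep its row index and the index of the nearest shared weight."""
    col_ptr, row_idx, weight_idx = [0], [], []
    for j in range(W.shape[1]):
        for i in np.nonzero(W[:, j])[0]:
            row_idx.append(int(i))
            # weight sharing: replace the exact value with the closest codebook entry
            weight_idx.append(int(np.argmin(np.abs(codebook - W[i, j]))))
        col_ptr.append(len(row_idx))
    return col_ptr, row_idx, weight_idx

def spmv_compressed(n_rows, col_ptr, row_idx, weight_idx, codebook, a):
    """y = W @ a on the compressed representation, skipping zero activations."""
    y = np.zeros(n_rows)
    for j, aj in enumerate(a):
        if aj == 0.0:                      # zero activation from ReLU: no work done
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += codebook[weight_idx[k]] * aj
    return y

# Toy usage: a pruned 4x4 matrix and a 4-entry codebook (2-bit weight indices).
codebook = np.array([-0.5, -0.1, 0.1, 0.5])
W = np.array([[0.0,  0.48, 0.0,  -0.09],
              [0.12, 0.0,  0.0,   0.0],
              [0.0,  0.0,  -0.52, 0.0],
              [0.0,  0.11, 0.0,   0.49]])
a = np.array([1.0, 0.0, 2.0, -1.0])        # the zero activation is skipped entirely
col_ptr, row_idx, weight_idx = compress_matrix(W, codebook)
print(spmv_compressed(W.shape[0], col_ptr, row_idx, weight_idx, codebook, a))

Keeping these compressed arrays small enough to sit in on-chip SRAM is what buys the 120× DRAM-to-SRAM energy saving quoted above, while the sparse storage and the zero-activation check correspond to the 10× and 3× factors.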