BitWave: Exploiting Column-Based Bit-Level Sparsity for Deep Learning Acceleration

Man Shi, Marian Verhelst
Hardware-efficient AI and ML

Research goal: Deep neural networks (DNNs) keep growing in model size and complexity to improve accuracy. Deep learning accelerators must therefore keep up with rising performance demands while also facing increasing pressure for power-efficient deployment. In embedded edge devices such as digital watches, smartphones, and autonomous vehicles, this pressure stems from limited hardware resources, power budgets, and cost. There is thus a strong need to further improve the efficiency of these accelerators so that they serve both high-performance and power-efficient use cases.

Gap in the SotA: Bit-serial computation processes data one bit at a time, which offers benefits such as a reduced area footprint and dynamically adaptable computational precision. It has emerged as a prominent approach to leveraging bit-level sparsity in DNNs. Existing bit-serial accelerators exploit this sparsity to reduce computations by skipping zero bits, but they suffer from inefficient memory accesses because the indices of the non-zero bits are irregular.
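To make the zero-bit skipping and the irregular-index problem concrete, here is a minimal Python sketch. It is an illustration only, not the design of any particular accelerator: it assumes sign-magnitude integer weights of at most 8 bits, and the function names `nonzero_bit_positions` and `bit_serial_dot` are hypothetical.

```python
def nonzero_bit_positions(magnitude, bits=8):
    """Return the positions of the set bits in an unsigned magnitude."""
    return [b for b in range(bits) if (magnitude >> b) & 1]


def bit_serial_dot(weights, activations, bits=8):
    """Dot product that handles one weight bit per step and skips zero bits.

    Assumes sign-magnitude weights (an assumption of this sketch).
    Returns (result, number_of_shift_add_steps).
    """
    total = 0
    steps = 0
    for w, a in zip(weights, activations):
        sign = -1 if w < 0 else 1
        # Only the non-zero bits of |w| contribute a shift-and-add step.
        for b in nonzero_bit_positions(abs(w), bits):
            total += sign * (a << b)
            steps += 1
    return total, steps


weights = [3, -4, 0, 17]
activations = [2, 5, 7, 1]
result, steps = bit_serial_dot(weights, activations)
print(result, steps)
print([nonzero_bit_positions(abs(w)) for w in weights])
```

For these values the sketch needs 5 shift-and-add steps instead of the 32 a dense 8-bit bit-serial pass would take. The set-bit positions differ per weight (`[0, 1]`, `[2]`, `[]`, `[0, 4]`), which is the irregularity that complicates memory access when each weight is skipped independently.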

Recent results: This work presents "BitWave," which uses a novel "bit-column-serial" approach. By combining structured bit-level sparsity with dynamic dataflow techniques, BitWave reduces computation and memory footprint through optimized computation skipping and weight compression. It mitigates the performance drops and the retraining requirements associated with sparsity-enhancing techniques, achieving 13.25× higher speedup and 7.71× higher efficiency than existing accelerators. BitWave occupies 1.138 mm² and consumes 17.56 mW in a 16 nm FinFET process node.
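One way to picture column-based bit-level sparsity is sketched below in Python. This is an interpretation for illustration, not BitWave's actual hardware or compression format: it assumes sign-magnitude weights, treats a "bit column" as one bit position taken across a small group of weights, and uses the hypothetical helper names `column_mask`, `compress_group`, and `bit_column_dot`.

```python
def column_mask(group, bits=8):
    """Bit b of the mask is set if any weight magnitude in the group has bit b set."""
    mask = 0
    for w in group:
        mask |= abs(w)
    return mask & ((1 << bits) - 1)


def compress_group(group, bits=8):
    """Store only the non-zero bit columns; each column packs one bit per weight."""
    mask = column_mask(group, bits)
    columns = []
    for b in range(bits):
        if (mask >> b) & 1:
            col = 0
            for i, w in enumerate(group):
                col |= ((abs(w) >> b) & 1) << i
            columns.append(col)
    signs = [w < 0 for w in group]
    return mask, columns, signs


def bit_column_dot(group, activations, bits=8):
    """Dot product that processes one stored bit column per step.

    Returns (result, number_of_columns_processed).
    """
    mask, columns, signs = compress_group(group, bits)
    stored = iter(columns)
    total = 0
    for b in range(bits):
        if not (mask >> b) & 1:
            continue  # the whole column is zero, so it is skipped for every weight
        col = next(stored)
        partial = 0
        for i, a in enumerate(activations):
            if (col >> i) & 1:
                partial += -a if signs[i] else a
        total += partial << b  # weight the column sum by its bit position
    return total, len(columns)


group = [5, -4, 1, 4]
activations = [2, 5, 7, 1]
print(compress_group(group))
print(bit_column_dot(group, activations))
```

In this example only bit positions 0 and 2 are non-zero across the group, so two columns are stored and processed instead of eight. Because skipping is decided per column rather than per weight, a single mask per group describes which columns exist, which keeps the index structure regular.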

 

Get in touch
Man Shi
PhD student
Marian Verhelst
Academic staff

Publications about this research topic

M. Shi, V. Jain, A. Joseph, M. Meijer, M. Verhelst. “BitWave: Exploiting Column-Based Bit-Level Sparsity for Deep Learning Acceleration,” in 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). Accepted.

Other research topics in Hardware-efficient AI and ML

Automated Causal CNN Scheduling Optimizer for Real-Time Edge Accelerators
Hardware-efficient AI and ML
Jun Yin | Marian Verhelst
A Scalable Heterogeneous Multi-accelerator Platform for AI and ML
Hardware-efficient AI and ML
Ryan Antonio | Marian Verhelst
Uncertainty-Aware Design Space Exploration for AI Accelerators
Hardware-efficient AI and ML
Jiacong Sun | Georges Gielen and Marian Verhelst
Integer GEMM Accelerator for SNAX
Hardware-efficient AI and ML
Xiaoling Yi | Marian Verhelst
Improving GPGPU micro architecture for future AI workloads
Hardware-efficient AI and ML
Giuseppe Sarda | Marian Verhelst
SRAM based digital in memory compute macro in 16nm
Hardware-efficient AI and ML
Weijie Jiang | Wim Dehaene
Scalable large array nanopore readouts for proteomics and next-generation sequencing
Analog and power management circuits, Hardware-efficient AI and ML, Biomedical circuits and sensor interfaces
Sander Crols | Filip Tavernier and Marian Verhelst
Design space exploration of in-memory computing DNN accelerators
Hardware-efficient AI and ML
Pouya Houshmand and Jiacong Sun | Marian Verhelst
Multi-core architecture exploration for layer-fused deep learning acceleration
Hardware-efficient AI and ML
Pouya Houshmand and Arne Symons | Marian Verhelst
HW-algorithm co-design for Bayesian inference of probabilistic machine learning
Ultra-low power digital SoCs and memories, Hardware-efficient AI and ML
Shirui Zhao | Marian Verhelst
Design space exploration for machine learning acceleration
Hardware-efficient AI and ML
Arne Symons | Marian Verhelst
Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators
Hardware-efficient AI and ML
Arne Symons | Marian Verhelst
Optimized deployment of AI algorithms on rapidly-changing heterogeneous multi-core compute platforms
Ultra-low power digital SoCs and memories, Hardware-efficient AI and ML
Josse Van Delm | Marian Verhelst
High-throughput high-efficiency SRAM for neural networks
Ultra-low power digital SoCs and memories, Hardware-efficient AI and ML
Wim Dehaene and Marian Verhelst
Heterogeneous Multi-core System-on-Chips for Ultra Low Power Machine Learning Application at the Edge
Hardware-efficient AI and ML
Pouya Houshmand, Giuseppe Sarda, and Ryan Antonio | Marian Verhelst
