Research goal: Deep neural networks (DNNs) are constructed with ever-increasing model size and complexity to enhance accuracy. As a result, deep learning accelerators must simultaneously keep up with growing performance demands and increasing pressure for power-efficient deployment. Particularly in embedded edge devices such as smartwatches, smartphones, or autonomous vehicles, the latter is driven by limited hardware resources, power budget, and/or cost. There is therefore a strong need to further improve the efficiency of these accelerators to serve both high-performance and power-constrained use cases.
Gap in the SotA: Bit-serial computation processes data sequentially, bit by bit, offering benefits such as a reduced area footprint and dynamically adaptive computational precision. It has emerged as a prominent approach for leveraging bit-level sparsity in DNNs. However, while existing bit-serial accelerators exploit bit-level sparsity to reduce computation by skipping zero bits, they suffer from inefficient memory accesses due to the irregular indices of the non-zero bits.
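To make this concrete, the minimal Python sketch below (purely illustrative, not taken from any cited accelerator) shows a bit-serial dot product that skips zero weight bits; weights are treated as unsigned 8-bit values for simplicity. Note how the positions of the non-zero bits differ from weight to weight, which is exactly the irregularity that complicates memory access in hardware.

```python
# Illustrative sketch of bit-serial computation with zero-bit skipping.
# Unsigned 8-bit weights are assumed here; real designs handle signed data.

def bit_serial_dot(weights, activations, bits=8):
    """Accumulate sum(w * a) one weight bit at a time, skipping zero bits."""
    acc = 0
    for w, a in zip(weights, activations):
        for b in range(bits):
            if (w >> b) & 1:          # only non-zero bits trigger work
                acc += a << b         # shift-add replaces a full multiply
    return acc

weights     = [0b00000101, 0b00010000, 0b00000000]   # sparse at the bit level
activations = [3, 7, 11]
assert bit_serial_dot(weights, activations) == sum(w * a for w, a in zip(weights, activations))
```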
Recent results: Hence, this work presents BitWave, which utilizes a novel "bit-column-serial" approach. By combining structured bit-level sparsity with dynamic dataflow techniques, BitWave reduces both computation and memory footprint through optimized computation skipping and weight compression, while mitigating the performance drops and retraining needs associated with sparsity-enhancing techniques. BitWave achieves up to 13.25× higher speedup and 7.71× improved efficiency compared to existing accelerators, occupying 1.138 mm² and consuming 17.56 mW in a 16 nm FinFET process node.
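As a rough intuition for the column-based idea (the group size, function names, and unsigned-weight assumption below are illustrative choices, not BitWave's exact scheme), a bit column that is zero across an entire weight group can be skipped as a unit, which makes the skip pattern regular and lets whole columns be dropped from storage:

```python
# Hedged sketch of column-based bit-level sparsity (illustrative assumptions,
# not the published BitWave microarchitecture).

GROUP_SIZE = 4
BITS = 8

def nonzero_bit_columns(group):
    """Return bit positions where at least one weight in the group has a 1."""
    union = 0
    for w in group:
        union |= w
    return [b for b in range(BITS) if (union >> b) & 1]

def column_serial_dot(group_weights, group_acts):
    """Dot product processed column by column, skipping all-zero bit columns."""
    acc = 0
    for b in nonzero_bit_columns(group_weights):          # structured skip
        for w, a in zip(group_weights, group_acts):
            if (w >> b) & 1:
                acc += a << b
    return acc

group_w = [0b00100100, 0b00000100, 0b00100000, 0b00000000]
group_a = [5, 2, 9, 4]
assert column_serial_dot(group_w, group_a) == sum(w * a for w, a in zip(group_w, group_a))
print("bit columns kept:", nonzero_bit_columns(group_w))  # only 2 of 8 columns remain
```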
M. Shi, V. Jain, A. Joseph, M. Meijer, and M. Verhelst, "BitWave: Exploiting Column-Based Bit-Level Sparsity for Deep Learning Acceleration," in 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024. Accepted.