Research Goal: The aim of this research project is to design efficient and scalable hardware systems for probabilistic machine learning models. Unlike "black box" deep learning methods, probabilistic models are gaining popularity due to their ability to integrate domain knowledge, deal with uncertainty, and produce interpretable results. However, inference on probabilistic models is computationally intensive and requires a large memory footprint. Designing dedicated hardware for probabilistic models allows for the optimization of computation and memory utilization, leading to faster and more energy-efficient processing. Additionally, dedicated hardware can enable the use of probabilistic models in resource-constrained environments, such as mobile devices and Internet of Things (IoT) devices.
Gap in SotA: The current state-of-the-art (SotA) in ML processors is primarily focused on accelerating deep learning workloads, with little emphasis on Bayesian or probabilistic inference acceleration. The major challenges in accelerating Bayesian inference are its need for sequential data processing and its frequent updates of large, irregular data structures, which make the computation difficult to map onto widely parallel hardware platforms. The result is a lack of energy-efficient compute platforms for compute-intensive probabilistic inference algorithms, preventing their application in edge devices. There is hence a clear need for flexible hardware solutions that can handle the dynamic and changing requirements of Bayesian inference algorithms in real-world applications.
Results: This research project started with the development of a basic hardware block for probabilistic inference: the Knuth-Yao sampler. Generating random variables is the fundamental operation in this field, as typical approximate Bayesian inference involves the generation of billions of probabilistic values; hardware samplers therefore become the bottleneck for both overall energy consumption and performance. The novel reconfigurable Knuth-Yao sampling architecture, which supports both a flexible range and dynamic precision, provides up to 13x energy efficiency and 11x area efficiency improvements over the traditional linear CDT-based samplers used in SotA accelerators, evaluated on workloads representative of ours.
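For intuition, the Knuth-Yao sampler walks a discrete distribution generating (DDG) tree, consuming one uniform random bit per level until a leaf (an output symbol) is reached; on average it needs close to the entropy of the distribution in random bits. The following is a minimal software model of the classic algorithm (the probability-bit matrix layout and the `to_bit_matrix` helper are illustrative, not the reconfigurable hardware design described above):

```python
import random

def to_bit_matrix(probs, k):
    """Expand each probability to its first k binary-fraction bits.
    prob_bits[i][j] is bit j of p_i (MSB first)."""
    return [[(int(p * (1 << k)) >> (k - 1 - j)) & 1 for j in range(k)]
            for p in probs]

def knuth_yao_sample(prob_bits, rng=random):
    """Draw one sample from the discrete distribution encoded in prob_bits
    by traversing the DDG tree, one random bit per column (tree level)."""
    n = len(prob_bits)        # number of outcomes
    k = len(prob_bits[0])     # bit precision
    d = 0                     # distance to the rightmost internal node
    for col in range(k):
        d = 2 * d + rng.getrandbits(1)
        for row in range(n - 1, -1, -1):
            d -= prob_bits[row][col]
            if d < 0:         # landed on a leaf labelled `row`
                return row
    # Precision exhausted (truncation error <= 2^-k): restart.
    return knuth_yao_sample(prob_bits, rng)
```

A quick usage check: sampling from probabilities [0.75, 0.25] should return outcome 0 about three quarters of the time.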
This sampler was subsequently integrated into a 16-core approximate inference accelerator, in which each core's ISA is enhanced with probabilistic-inference instructions. The added instructions include a sampling operation, LUT-based special-function approximations, and the ability of cores to access the register files of neighboring cores. The latter allows cores operating on different variables in parallel to quickly exchange information and achieve high inference throughput. The chip has been published at ESSERC 2024.
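As background on one of these ISA extensions: LUT-based special-function approximation replaces an expensive transcendental evaluation (e.g., exp, which appears throughout probabilistic inference) with a table lookup plus cheap interpolation. A minimal software sketch of the idea, assuming a uniformly spaced table with linear interpolation (the table size, input range, and interpolation scheme here are illustrative and not the chip's actual parameters):

```python
import math

def build_exp_lut(lo, hi, entries):
    """Precompute exp() at `entries` uniformly spaced points in [lo, hi]."""
    step = (hi - lo) / (entries - 1)
    table = [math.exp(lo + i * step) for i in range(entries)]
    return table, lo, step

def lut_exp(x, table, lo, step):
    """Approximate exp(x): index into the table, then linearly
    interpolate between the two neighboring entries."""
    idx = (x - lo) / step
    i = min(max(int(idx), 0), len(table) - 2)  # clamp to valid segment
    frac = idx - i
    return table[i] + frac * (table[i + 1] - table[i])
```

With a 256-entry table over [0, 4], the relative error of the interpolated result stays well below 0.1% inside the covered range, which is the kind of accuracy/area trade-off a hardware LUT unit exposes.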