To meet the ever-present demand for smarter and more intelligent machines, increasing research efforts are focused on developing novel artificial intelligence (AI) models. However, despite their promising algorithmic properties, many novel models do not map well onto existing hardware architectures such as GPUs and neural network processors. A salient example of such a class of models is Probabilistic Circuits (PCs), used for neuro-symbolic AI, which exhibit challenging computational patterns based on sparse and irregular graphs. This project takes on this challenge by developing a hardware/software co-optimized computation stack, enabling energy-constrained edge applications.
To address the execution bottlenecks of PCs (and of similar irregular data-flow graphs in general), several contributions are made across the hardware/software stack:
• Application: The most suitable data representation is identified by developing analytical error and energy models of customized fixed- and floating-point formats. A novel representation based on the posit format is also investigated.
• Compilation: Optimized mapping algorithms are developed to parallelize the workloads on general-purpose multithreaded CPUs and dedicated hardware architectures by minimizing synchronization and communication overheads.
• Hardware: Two versions of a dedicated DAG Processing Unit (DPU) are developed, incorporating a dedicated spatial datapath, a targeted interconnection network, a precision-scalable arithmetic unit, and a custom memory hierarchy.
• Implementation: The hardware innovations are realized and validated through an optimized physical implementation of the first DPU version on a chip in 28nm CMOS technology.
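A common baseline for the compilation step above is levelized (breadth-first) scheduling of the DAG: nodes within a level have no mutual dependencies and can execute concurrently, with one synchronization barrier per level. The optimized mapping algorithms developed in this project go beyond this baseline, but the following sketch (with hypothetical node names) illustrates the underlying scheduling problem:

```python
from collections import defaultdict, deque

def levelize(nodes, edges):
    """Partition DAG nodes into levels via Kahn's algorithm.

    Nodes within one level are mutually independent, so they can be
    evaluated in parallel; a barrier is needed only between levels.
    """
    indeg = {v: 0 for v in nodes}
    succ = defaultdict(list)
    for u, v in edges:          # edge u -> v: v consumes the output of u
        succ[u].append(v)
        indeg[v] += 1
    level = {v: 0 for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)  # graph inputs
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    levels = defaultdict(list)
    for v in order:
        levels[level[v]].append(v)
    return [levels[i] for i in sorted(levels)]

# Tiny example DAG: two inputs feeding a sum/product-style fan-in.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
# levelize(nodes, edges) -> [['a', 'b'], ['c', 'd'], ['e']]
```

The per-level barrier is exactly the synchronization overhead that the optimized mapping algorithms aim to minimize, e.g. by assigning dependent chains to the same thread or processing element.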
The cohesive hardware/software optimizations achieve higher throughput than CPUs and GPUs, while operating at an order of magnitude higher energy efficiency. The main findings can be summarized as follows:
• An 8b posit representation can be customized to reach the same accuracy as 32b floating point for PCs.
• Optimized mapping algorithms achieve a speedup of 2× for multithreaded CPU execution.
• The 28nm DPU prototype achieves speedups of 5× and 20× over CPU and GPU, respectively, while operating below 0.25W.
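To give a sense of how a posit packs dynamic range and precision into 8 bits, the sketch below decodes an 8-bit posit into a float, assuming the standard es = 2 configuration (the customized format developed in this project may use different parameters):

```python
def posit8_to_float(bits, es=2):
    """Decode an 8-bit posit (sign | regime | exponent | fraction).

    The regime is a run of identical bits whose length sets a scale
    factor useed^k with useed = 2^(2^es); remaining bits hold the
    exponent and fraction. This tapered layout gives posits high
    accuracy near 1.0 and a wide dynamic range at the extremes.
    """
    n = 8
    mask = (1 << n) - 1
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")          # NaR: Not a Real
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0:
        bits = (-bits) & mask        # two's complement of negative posits
    rest = f"{bits & (mask >> 1):0{n - 1}b}"   # bits after the sign
    run = len(rest) - len(rest.lstrip(rest[0]))
    k = run - 1 if rest[0] == "1" else -run    # regime value
    tail = rest[run + 1:]                      # skip the terminating bit
    exp_bits = tail[:es]
    exp = int(exp_bits, 2) << (es - len(exp_bits)) if exp_bits else 0
    frac_bits = tail[es:]
    frac = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0
    useed = 1 << (1 << es)                     # 2^(2^es) = 16 for es = 2
    return sign * (useed ** k) * (2 ** exp) * (1.0 + frac)

# Examples: 0x40 -> 1.0, 0x48 -> 2.0, 0xC0 -> -1.0, 0x7F -> 2^24 (maxpos)
```

Because PC inference multiplies many probabilities in [0, 1], this tapered precision profile is one plausible reason a well-tuned 8b posit can match 32b float accuracy.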
These results demonstrate that the project contributes important pieces enabling the efficient execution of PCs and similar workloads based on irregular data-flow graphs.