A Scalable Heterogeneous Multi-accelerator Platform for AI and ML

Ryan Antonio, Marian Verhelst | Hardware-efficient AI and ML

Context: State-of-the-art AI algorithms exhibit highly varying workloads that are difficult to process efficiently, even on existing high-performance multicore architectures. A scalable heterogeneous multi-accelerator architecture offers both the efficiency and the reconfigurability needed to handle these varying workload demands. However, managing compute resources (such as CPU cores and accelerators), data (layout and transfers), and scalability for arbitrary AI algorithms is challenging. To achieve good performance, it is crucial to couple the accelerators tightly to memory, without giving up scalability or the flexibility to map a wide variety of workloads.

Research goal: This research aims to develop a standardized accelerator shell with hardware features that support (1) tight coupling between accelerator and memory for maximum compute-memory efficiency, and (2) the ability to tie many accelerators together for scalability. Each accelerator supports specific kernels of AI algorithms and targets compute-bound rather than memory-bound throughput. A standardized shell makes it easy to connect multiple cores, improving reconfigurability towards new algorithms as they appear. In this work, we propose the Snitch Accelerator Extension (SNAX) shell, which consists of a lightweight management core, a tightly coupled data memory with streaming ports towards an accelerator, a smart DMA that can transform data layouts during transfers, and a task buffer that acts as a hardware loop, running a set of tasks repetitively.
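To make the smart-DMA idea concrete, the sketch below models in software the kind of layout transformation such a DMA could apply while moving data: copying a row-major matrix into a buffer where each tile is stored contiguously, as an accelerator with a fixed compute-array shape might require. This is an illustrative sketch only; the function name and tile parameters are hypothetical and do not reflect the actual SNAX register interface.

```python
def dma_transfer_tiled(src, rows, cols, tile_r, tile_c):
    """Copy a row-major (rows x cols) matrix from `src` into a flat
    buffer where each (tile_r x tile_c) tile is stored contiguously,
    tile by tile (a software stand-in for a layout-transforming DMA)."""
    assert rows % tile_r == 0 and cols % tile_c == 0
    dst = []
    for tr in range(0, rows, tile_r):       # step over tile rows
        for tc in range(0, cols, tile_c):   # step over tile columns
            for r in range(tile_r):         # emit one tile contiguously
                for c in range(tile_c):
                    dst.append(src[(tr + r) * cols + (tc + c)])
    return dst

# Example: a 4x4 row-major matrix rearranged into 2x2 tiles "in flight".
matrix = list(range(16))
tiled = dma_transfer_tiled(matrix, 4, 4, 2, 2)
# tiled == [0, 1, 4, 5,  2, 3, 6, 7,  8, 9, 12, 13,  10, 11, 14, 15]
```

Doing this rearrangement during the transfer, rather than with extra CPU passes over memory, is what lets the streaming ports feed the accelerator data in its preferred layout without stalling on address arithmetic.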

SNAX is currently combined with the first set of accelerators for neural network processing: a GeMM accelerator and an activation accelerator.

Figure: SNAX Cluster Architecture

