Hardware-algorithm Co-design and Accelerator Architecture Exploration for hybrid DNN and DSP Workloads

Jun Yin and Marian Verhelst | Hardware-efficient AI and ML

Research Goal: Hybrid algorithm structures that combine digital signal processing (DSP) pre-/post-processing with light-weight deep neural network (DNN) models have proved beneficial in many application domains. Such hybrid ML models typically give up model regularity for a reduced memory/computational footprint. To facilitate the hardware-algorithm co-design of these hybrid DSP/DNN algorithms targeting resource-constrained embedded platforms, early-design-stage modeling and design space exploration are necessary to find optimal hardware architectures for efficient deployment.
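As an illustration of such a structure, a hybrid DSP/DNN pipeline can be sketched in a few lines of NumPy: a DSP front-end extracts log-spectral features, a light-weight DNN back-end scores them per frame, and a DSP post-processing step aggregates the scores. All dimensions, weights, and function names below are hypothetical placeholders, not the project's actual model:

```python
import numpy as np

def dsp_frontend(x, frame_len=256, hop=128):
    """DSP pre-processing: frame the signal and take log-magnitude spectra."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=-1))
    return np.log1p(spec)  # shape: (n_frames, frame_len // 2 + 1)

def tiny_dnn(features, w1, w2):
    """Light-weight DNN back-end: one hidden layer, ReLU, linear output."""
    h = np.maximum(features @ w1, 0.0)
    return h @ w2  # per-frame class scores

def dsp_postprocess(scores):
    """DSP post-processing: average scores over time, pick the best class."""
    return int(np.argmax(scores.mean(axis=0)))

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)             # 1 s of synthetic audio at 16 kHz
w1 = rng.standard_normal((129, 32)) * 0.1  # hypothetical trained weights
w2 = rng.standard_normal((32, 4)) * 0.1
label = dsp_postprocess(tiny_dnn(dsp_frontend(x), w1, w2))
```

The DSP stages carry fixed, regular computation, while the DNN stage concentrates the learned parameters; it is exactly this split that trades model regularity for footprint.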

This research is being carried out in a joint project with Bosch within the EU Marie-Curie Project I-SPOT, within the application domain of automotive acoustic perception.

Gap in SotA: Although many dedicated homogeneous or heterogeneous accelerator systems have been proposed in the adjacent domains of natural language processing and indoor acoustic reasoning, there is little research on hardware and algorithmic solutions for outdoor automotive scenarios. Rapid prototyping and modeling are therefore required to enable iterative optimization across the fast-moving algorithmic and hardware design domains. Several domain-specific toolchains exist that target rapid reconfigurable hardware generation and for-loop-based dataflow optimization. Yet, end-to-end support of these tools for hybrid DSP/DNN workloads still requires heavy manual tweaking.

Progress Updates: The first stage of the research focused on exploring the typical complexity and workflow diagrams of outdoor acoustic applications in a hybrid DSP+DNN setting. We started from a CNN-based model using SRP-PHAT features to perform robust sound source localization in noisy and reverberant environments. This allowed us to jointly evaluate algorithm accuracy and hardware overhead in a combined DSP-DNN design space. Across multiple system parameter cases, we compressed the hybrid algorithm to reduce computational complexity by 10.32∼73.71% and DNN weights by 59.77∼94.66% relative to the baseline, while remaining competitive with the state of the art in accuracy.
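The SRP-PHAT features used here build on GCC-PHAT, the phase-transform-whitened cross-correlation between microphone pairs, whose peak locates the inter-microphone time delay. A minimal GCC-PHAT sketch (assuming two microphones and a synthetic integer-sample delay; not the project's implementation):

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """GCC-PHAT: cross-power spectrum whitened by its own magnitude."""
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    S = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cc = np.fft.irfft(S / (np.abs(S) + 1e-12), n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs  # delay in seconds
    return tau, cc

fs = 16000
rng = np.random.default_rng(1)
src = rng.standard_normal(4096)
delay = 8                                        # true inter-mic delay, samples
mic1 = src
mic2 = np.concatenate((np.zeros(delay), src[:-delay]))
tau, cc = gcc_phat(mic2, mic1, fs)               # peak recovers the delay
```

In full SRP-PHAT, these whitened correlations are summed over all microphone pairs for each candidate source direction, and the direction with maximal steered response power is selected.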

As a next step, we are integrating finer-grained algorithmic scheduling into the workflow, together with hardware overhead estimation, to enable a search for the optimal accelerator architecture configuration for hybrid DSP/DNN systems.
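Such an architecture search can be pictured as a sweep over candidate accelerator configurations, each scored by a fast analytical cost model. The cost model, layer shapes, and design space below are toy assumptions for illustration only, not the project's estimator:

```python
import itertools

# Hypothetical workload: (MACs, weight bytes) per layer.
LAYERS = [(2e6, 50e3), (8e6, 200e3), (1e6, 20e3)]

def estimate(pes, sram_bytes, dram_penalty=4.0):
    """First-order cost model: latency is MACs / PEs, with a fixed
    slowdown whenever a layer's weights do not fit in on-chip SRAM."""
    latency = 0.0
    for macs, weights in LAYERS:
        cycles = macs / pes
        if weights > sram_bytes:      # weights spill to off-chip DRAM
            cycles *= dram_penalty
        latency += cycles
    area = pes * 1.0 + sram_bytes / 1024  # toy area proxy
    return latency, area

# Exhaustive sweep over a small design space, minimizing latency * area.
space = itertools.product([64, 256, 1024], [32e3, 128e3, 512e3])
best = min(space, key=lambda cfg: estimate(*cfg)[0] * estimate(*cfg)[1])
```

Real frameworks replace the toy model with detailed memory-hierarchy and dataflow cost estimates and prune the (much larger) space instead of enumerating it, but the evaluate-and-select loop is the same.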

 

Get in touch
Jun Yin
PhD student
Marian Verhelst
Academic staff

Publications about this research topic

Jun Yin and Marian Verhelst, "CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge," accepted to ACM Transactions on Embedded Computing Systems.

Other research topics in Hardware-efficient AI and ML

- Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Man Shi, Arne Symons, Robin Geens, and Chao Fang | Marian Verhelst
- Massive parallelism for combinatorial optimisation problems | Toon Bettens and Sofie De Weer | Wim Dehaene and Marian Verhelst
- Carbon-aware Design Space Exploration for AI Accelerators | Jiacong Sun | Georges Gielen and Marian Verhelst
- Decoupled Control Flow and Memory Orchestration in the Vortex GPGPU | Giuseppe Sarda | Marian Verhelst
- Automated Causal CNN Scheduling Optimizer for Real-Time Edge Accelerators | Jun Yin | Marian Verhelst
- A Scalable Heterogenous Multi-accelerator Platform for AI and ML | Ryan Antonio | Marian Verhelst
- Uncertainty-Aware Design Space Exploration for AI Accelerators | Jiacong Sun | Georges Gielen and Marian Verhelst
- Integer GEMM Accelerator for SNAX | Xiaoling Yi | Marian Verhelst
- Improving GPGPU micro architecture for future AI workloads | Giuseppe Sarda | Marian Verhelst
- SRAM based digital in memory compute macro in 16nm | Weijie Jiang | Wim Dehaene
- Scalable large array nanopore readouts for proteomics and next-generation sequencing | Sander Crols | Filip Tavernier and Marian Verhelst (also: Analog and power management circuits, Biomedical circuits and sensor interfaces)
- Design space exploration of in-memory computing DNN accelerators | Pouya Houshmand and Jiacong Sun | Marian Verhelst
- Multi-core architecture exploration for layer-fused deep learning acceleration | Arne Symons | Marian Verhelst
- HW-algorithm co-design for Bayesian inference of probabilistic machine learning | Shirui Zhao | Marian Verhelst (also: Ultra-low power digital SoCs and memories)
- Design space exploration for machine learning acceleration | Arne Symons | Marian Verhelst
- Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators | Arne Symons | Marian Verhelst
- Optimized deployment of AI algorithms on rapidly-changing heterogeneous multi-core compute platforms | Josse Van Delm | Marian Verhelst (also: Ultra-low power digital SoCs and memories)
- High-throughput high-efficiency SRAM for neural networks | Wim Dehaene and Marian Verhelst (also: Ultra-low power digital SoCs and memories)
- Heterogeneous Multi-core System-on-Chips for Ultra Low Power Machine Learning Application at the Edge | Pouya Houshmand, Giuseppe Sarda, and Ryan Antonio | Marian Verhelst

Want to work with us?

Get in touch or discover the ways we can collaborate.