Event - 20 March 2026

Precision-Scalable Microscaling Hardware for Continual Learning

Lecture by Stef Cuyckens

What

Autonomous robots increasingly require on-device continual learning to adapt to dynamic environments in real time, eliminating the latency and connectivity constraints of cloud-based training. However, edge training introduces a fundamental tension: while inference tasks run efficiently on narrow integer formats, training requires the high dynamic range of floating-point precision to accurately capture gradient updates. Microscaling (MX) formats resolve this tension by combining integer and floating-point elements under shared exponents, providing the dynamic range needed for training while minimizing memory footprint and energy consumption. Despite this potential, current accelerator architectures are bottlenecked by inefficient memory handling that forces weight duplication between the forward and backward passes. Furthermore, traditional MX designs suffer from energy-heavy accumulation overhead and memory bandwidth contention when scaling down precision.

This presentation introduces a precision-scalable MX hardware architecture designed to address these limitations. We present a unified datapath supporting all standardized MX data types, leveraging a block-based data organization that makes the forward and backward passes symmetric without duplicating memory or requiring on-the-fly requantization. Additionally, we optimize the dominant MAC reduction tree and integrate the MACs into a general-purpose NPU platform with bandwidth-aware streaming. By addressing these core system bottlenecks, our architecture demonstrates a substantial reduction in memory footprint, providing a functional, precision-scalable compute fabric for continual learning at the edge.
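To make the shared-exponent idea concrete, here is a minimal, hypothetical sketch of MX-style block quantization: a block of floats is encoded as narrow signed integers that share one power-of-two scale. The function names and the 8-bit integer element choice are illustrative assumptions, not the speaker's implementation; standardized MX formats (per the OCP Microscaling spec) use 32-element blocks and an E8M0 shared scale, and also define floating-point element types (FP8/FP6/FP4).

```python
import math

def quantize_mx_block(block, elem_bits=8):
    """Illustrative MX-style quantization: encode a block of floats as
    signed (elem_bits)-bit integers sharing one power-of-two scale.
    Simplified sketch, not the talk's hardware datapath."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)          # all-zero block
    qmax = 2 ** (elem_bits - 1) - 1         # e.g. 127 for 8-bit elements
    # Shared exponent: smallest power of two whose scale keeps the
    # largest element within the integer range (no clipping).
    shared_exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** shared_exp
    ints = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return shared_exp, ints

def dequantize_mx_block(shared_exp, ints):
    """Recover approximate floats from the shared exponent and elements."""
    scale = 2.0 ** shared_exp
    return [i * scale for i in ints]
```

Because the scale is a power of two, dequantization in hardware is just an exponent addition rather than a full multiply, which is part of why MX blocks keep energy and memory costs low while still covering a wide dynamic range.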

When

20/3/2026 11:00 - 12:00

Where

ESAT Aula R