Event - 16 January 2026

XDMA: A Distributed DMA for Flexible and Efficient Data Movement in Heterogeneous Multi-Accelerator SoCs

Presented by Fanchen Kong

What

Heterogeneous multi-accelerator SoCs promise higher performance and energy efficiency, yet data movement between accelerators remains a critical bottleneck. Modern workloads, particularly large language models, are memory-bound, exhibit irregular access patterns, and frequently require point-to-multipoint (P2MP) transfers that conventional DMA engines cannot support efficiently. Existing solutions either introduce high hardware overhead or require costly software intervention, severely limiting interconnect utilization and energy efficiency.

In this talk, we present XDMA, a distributed DMA architecture for efficient data movement in heterogeneous SoCs. We first demonstrate how hardware-based address generators eliminate software control overhead while sustaining high bandwidth utilization for complex, accelerator-specific memory layouts. We then identify critical scalability challenges in broadcasting scenarios and propose Chainwrite, an application-layer mechanism that delivers efficient P2MP transfers by relocating multicast operations from network routers to DMA endpoints. Finally, we validate these techniques on heterogeneous SoC designs, showing substantial improvements in energy efficiency and inference latency while maintaining compatibility with standard interconnect protocols.

When

16 January 2026, 11:00 - 12:00

Where

ESAT Aula C