Context: Deep learning has improved performance in fields such as computer vision, natural language processing, speech recognition, and signal analysis, but the growing size of models and their intermediate data poses challenges for the energy, latency, and memory-footprint requirements of edge deployment. Specialized hardware architectures have evolved to accelerate deep learning, with recent designs shifting toward multi-core / multi-accelerator systems. However, previous scheduling approaches such as layer-by-layer processing and pipelining fall short for latency-critical edge applications and incur a large memory footprint.
Recent results for the Stream framework: To overcome these challenges, this project presents Stream, the first general exploration framework for heterogeneous multi-core hardware architectures with fine-grained scheduling of layer-fused DNNs. The framework combines a unified modeling representation, a rapid fine-grained data-dependency generator, a genetic-algorithm-based layer-core allocator, and a heuristics-based scheduler to enable the exploration of fine-grained schedules (a minimal sketch of such an allocator is given below). Stream thus allows us to co-explore the effect of combining fine-grained layer-fused scheduling with multi-core architectures.
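To give an intuition for the genetic-algorithm-based layer-core allocation step, the sketch below evolves a layer-to-core assignment under a toy cost model. This is a minimal illustration only: the layer costs, core speeds, fitness function, and all names are assumptions made for this example, not Stream's actual API or cost model.

```python
import random

# Illustrative sketch of a GA-based layer-core allocator (not Stream's API).
# A chromosome assigns each DNN layer to one core; fitness is the makespan
# of the most-loaded core under an assumed, simplified cost model.

NUM_LAYERS = 8
NUM_CORES = 3

LAYER_COST = [4, 8, 2, 6, 5, 3, 7, 1]   # assumed per-layer compute cost
CORE_SPEED = [1.0, 1.5, 0.8]            # assumed relative core speeds

def fitness(allocation):
    """Lower is better: latency of the most-loaded core (makespan)."""
    load = [0.0] * NUM_CORES
    for layer, core in enumerate(allocation):
        load[core] += LAYER_COST[layer] / CORE_SPEED[core]
    return max(load)

def crossover(a, b):
    """Single-point crossover of two parent allocations."""
    point = random.randrange(1, NUM_LAYERS)
    return a[:point] + b[point:]

def mutate(allocation, rate=0.1):
    """Randomly reassign some layers to a different core."""
    return [random.randrange(NUM_CORES) if random.random() < rate else c
            for c in allocation]

def evolve(generations=50, pop_size=20):
    pop = [[random.randrange(NUM_CORES) for _ in range(NUM_LAYERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]  # elitist selection
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print("Best layer-to-core allocation:", best, "| makespan:", fitness(best))
```

In the real framework, the fitness of an allocation would be obtained from the fine-grained data-dependency generator and the heuristics-based scheduler rather than from a static load sum, so the GA searches over allocations whose quality reflects the actual fused, multi-core schedule.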