SRAM is a key component for storing weights and activations in neural-network chips. To exploit parallelism, especially when feeding a large-scale PE array, multiple words must be addressed from the memory in the same cycle. Since memory operations consume a significant share of latency and energy, high SRAM bandwidth benefits both the latency and the energy efficiency of the network. Our goal is therefore to provide neural networks with a high-bandwidth SRAM that supports multi-port access while retaining competitive density, power, and frequency.
SotA: Multi-port SRAMs provided by foundries make use of multi-port bitcells and individual peripherals for every port. Although such designs offer high bandwidth, they suffer from large bitcell area, i.e., low density, and serious voltage/frequency constraints, since multi-port bitcells are vulnerable to access failures. Banked single-port SRAM is another approach, yet its low addressing flexibility degrades throughput and complicates the mapping between compute units and memory banks.
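The throughput loss of the banked single-port approach can be illustrated with a small Monte-Carlo sketch. This model, including the port/bank counts and the assumption of uniformly random addresses, is purely illustrative and not taken from the work itself: each bank serves at most one request per cycle, so colliding requests are serialized.

```python
# Hypothetical model: N ports issue random addresses to a banked
# single-port SRAM each cycle. A bank can serve only one request per
# cycle, so the number of serviced requests equals the number of
# distinct banks hit. Parameters are illustrative assumptions.
import random

def serviced_per_cycle(n_ports: int, n_banks: int, trials: int = 100_000) -> float:
    """Average requests served per cycle (ideal = n_ports)."""
    total = 0
    for _ in range(trials):
        # Set of distinct banks addressed this cycle.
        banks = {random.randrange(n_banks) for _ in range(n_ports)}
        total += len(banks)  # one request per distinct bank is served
    return total / trials

if __name__ == "__main__":
    for banks in (4, 8, 16):
        rate = serviced_per_cycle(n_ports=4, n_banks=banks)
        print(f"4 ports, {banks} banks: {rate:.2f} requests/cycle")
```

Even with 16 banks, 4 ports average well below the ideal 4 requests per cycle under random addressing, which is why banked designs push the conflict problem into the compute-to-memory mapping.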
Our approach: This research explores a hierarchical multi-port SRAM design in which multiple addressing is implemented over small bitcell matrices. Inside each bitcell matrix, high-density single-port bitcells cooperate with pitch-matched local peripherals. Different voltage swings are then assigned to global and local signals to save energy while maintaining accessibility.
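The hierarchical access rule can be sketched as follows. This is a behavioral model under my own assumptions (the `MATRIX_WORDS` size, the `schedule` helper, and the defer-on-conflict policy are all illustrative, not details of the actual design): ports proceed in parallel whenever they target distinct single-port bitcell matrices, and a matrix serves at most one port per cycle.

```python
# Hypothetical behavioral sketch of hierarchical multi-port access:
# the macro is split into small single-port bitcell matrices, and
# multiple ports are granted in the same cycle as long as each targets
# a distinct matrix. All names and sizes here are assumptions.

MATRIX_WORDS = 64  # assumed words per local bitcell matrix

def schedule(addresses: list[int]) -> tuple[list[int], list[int]]:
    """Split one cycle's port requests into granted and deferred lists.

    Each matrix holds single-port bitcells, so it can serve only one
    port per cycle; later requests to a busy matrix are deferred.
    """
    granted: list[int] = []
    deferred: list[int] = []
    busy: set[int] = set()
    for addr in addresses:
        matrix = addr // MATRIX_WORDS  # global decode selects a matrix
        if matrix in busy:
            deferred.append(addr)      # local single port already claimed
        else:
            busy.add(matrix)
            granted.append(addr)
    return granted, deferred
```

For example, `schedule([3, 70, 130, 65])` grants the first three requests (matrices 0, 1, 2) and defers address 65, which collides with 70 in matrix 1. The point of the hierarchy is that conflicts only arise within a small matrix, so dense single-port bitcells still deliver multi-port bandwidth at the macro level.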
Recent result: The first prototype is planned for fabrication in the GlobalFoundries 12nm LP process in late 2023, targeting 4-port access at the expected efficiency.