SRAM-based digital in-memory-compute macro in 16nm

Weijie Jiang , Wim Dehaene Hardware-efficient AI and ML

Efficient inference of CNN models requires a large amount of data movement, which is energy hungry. In traditional digital accelerators, the memory is built with compiled SRAM: at run time, the model weights, inputs, and outputs all need to be moved to and from that memory. In-memory compute (IMC) is proposed to reduce the amount of data movement by placing the data-path much closer to the memory cells. This becomes even more effective if the whole model, of moderate size, can be stored on the chip at once, which is very often the case for edge applications.

Researchers first focused on analog in-memory compute (AIMC), which performs well at low precision. The main drawback of AIMC is that the bit-cell usually needs to be modified, and sophisticated analog circuits are needed to mitigate the impact of variability. This results in low area efficiency for most published AIMC macros. Digital in-memory compute (DIMC), on the other hand, is more robust to variability and can therefore achieve a potentially much higher density. We propose a 128kB DIMC macro in 16nm FinFET. The macro's area is 0.5mm2, achieving a maximum of 23.8TOPs/W for 8b MAC operations. Its storage density is 256kB/mm2 and its compute density is 0.364TOPs/mm2. Among all IMC designs with >10kB of storage, we achieve the highest figure of merit (FoM), defined as storage density (kB/mm2) x compute density (TOPs/mm2). DIMC opens up a realm of options for efficient computation of AI algorithms at the edge.
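The FoM above is a simple product of the two reported densities. As a minimal sketch, the snippet below recomputes it from the numbers given in the text (128kB in 0.5mm2, 0.364TOPs/mm2); the function name `imc_fom` is illustrative, not from the paper.

```python
def imc_fom(storage_density_kb_mm2: float, compute_density_tops_mm2: float) -> float:
    """FoM as defined in the text: storage density (kB/mm^2) * compute density (TOPs/mm^2)."""
    return storage_density_kb_mm2 * compute_density_tops_mm2

# Reported macro figures: 128kB of SRAM in a 0.5mm^2 macro.
storage_density = 128 / 0.5   # = 256 kB/mm^2
fom = imc_fom(storage_density, 0.364)
print(f"FoM = {fom:.1f}")     # ~93.2
```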

Get in touch
Weijie Jiang
PhD student
Wim Dehaene
Academic staff
Idea of DIMC and chip photo.

Publications about this research topic

W. Jiang, P. Houshmand, M. Verhelst and W. Dehaene, "A 16nm 128kB high-density fully digital In Memory Compute macro with reverse SRAM pre-charge achieving 0.36TOPs/mm2, 256kB/mm2 and 23.8TOPs/W," ESSCIRC 2023 - IEEE 49th European Solid State Circuits Conference (ESSCIRC), Lisbon, Portugal, 2023, pp. 409-412, doi: 10.1109/ESSCIRC59616.2023.10268774.

Other research topics in Hardware-efficient AI and ML

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Hardware-efficient AI and ML
Man Shi, Arne Symons, Robin Geens, and Chao Fang | Marian Verhelst
Massive parallelism for combinatorial optimisation problems
Hardware-efficient AI and ML
Toon Bettens and Sofie De Weer | Wim Dehaene and Marian Verhelst
Carbon-aware Design Space Exploration for AI Accelerators
Hardware-efficient AI and ML
Jiacong Sun | Georges Gielen and Marian Verhelst
Decoupled Control Flow and Memory Orchestration in the Vortex GPGPU
Hardware-efficient AI and ML
Giuseppe Sarda | Marian Verhelst
Automated Causal CNN Scheduling Optimizer for Real-Time Edge Accelerators
Hardware-efficient AI and ML
Jun Yin | Marian Verhelst
A Scalable Heterogeneous Multi-accelerator Platform for AI and ML
Hardware-efficient AI and ML
Ryan Antonio | Marian Verhelst
Uncertainty-Aware Design Space Exploration for AI Accelerators
Hardware-efficient AI and ML
Jiacong Sun | Georges Gielen and Marian Verhelst
Integer GEMM Accelerator for SNAX
Hardware-efficient AI and ML
Xiaoling Yi | Marian Verhelst
Improving GPGPU micro architecture for future AI workloads
Hardware-efficient AI and ML
Giuseppe Sarda | Marian Verhelst
Scalable large array nanopore readouts for proteomics and next-generation sequencing
Analog and power management circuits, Hardware-efficient AI and ML, Biomedical circuits and sensor interfaces
Sander Crols | Filip Tavernier and Marian Verhelst
Design space exploration of in-memory computing DNN accelerators
Hardware-efficient AI and ML
Pouya Houshmand and Jiacong Sun | Marian Verhelst
Multi-core architecture exploration for layer-fused deep learning acceleration
Hardware-efficient AI and ML
Arne Symons | Marian Verhelst
HW-algorithm co-design for Bayesian inference of probabilistic machine learning
Ultra-low power digital SoCs and memories, Hardware-efficient AI and ML
Shirui Zhao | Marian Verhelst
Design space exploration for machine learning acceleration
Hardware-efficient AI and ML
Arne Symons | Marian Verhelst
Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators
Hardware-efficient AI and ML
Arne Symons | Marian Verhelst
Optimized deployment of AI algorithms on rapidly-changing heterogeneous multi-core compute platforms
Ultra-low power digital SoCs and memories, Hardware-efficient AI and ML
Josse Van Delm | Marian Verhelst
High-throughput high-efficiency SRAM for neural networks
Ultra-low power digital SoCs and memories, Hardware-efficient AI and ML
Wim Dehaene and Marian Verhelst
Heterogeneous Multi-core System-on-Chips for Ultra Low Power Machine Learning Application at the Edge
Hardware-efficient AI and ML
Pouya Houshmand, Giuseppe Sarda, and Ryan Antonio | Marian Verhelst

Want to work with us?

Get in touch or discover the way we can collaborate.