Efficient inference of CNN models requires a large amount of data movement, and this data movement is energy hungry. In a traditional digital accelerator, the on-chip memory is built from compiled SRAM, and at run time the model weights, inputs, and outputs all have to be moved to and from that memory. In-memory compute (IMC) has been proposed to reduce this data movement by placing the datapath much closer to the memory cells. The approach becomes even more effective when a model of moderate size can be stored on the chip in its entirety, which is very often the case for edge applications.
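To make the data-movement argument concrete, the toy model below compares the energy of a layer's MAC loop when every operand must be fetched from a distant SRAM against an IMC-style case where the weights stay resident next to the compute. All per-access energies are hypothetical placeholders chosen only to illustrate the trend, not measured values from any design:

```python
# Toy energy model for one layer's multiply-accumulates (MACs).
# Every energy number below is a hypothetical placeholder, in arbitrary units.
E_MAC = 1.0         # one 8b multiply-accumulate
E_SRAM_READ = 5.0   # fetching one operand from a distant compiled SRAM
E_LOCAL_READ = 0.5  # reading an operand held right next to the datapath

num_macs = 10**6    # MACs in the layer

# Conventional accelerator: each MAC fetches a weight and an input from SRAM.
e_conventional = num_macs * (E_MAC + 2 * E_SRAM_READ)

# IMC-style: weights are read in place; only the inputs travel from SRAM.
e_imc = num_macs * (E_MAC + E_LOCAL_READ + E_SRAM_READ)

print(f"conventional: {e_conventional:.2e}, IMC: {e_imc:.2e}, "
      f"saving: {e_conventional / e_imc:.2f}x")
```

The gap widens further once the whole model fits on chip, since off-chip weight traffic disappears entirely.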
Researchers initially focused on analog in-memory compute (AIMC), which performs well at low precision. The main drawbacks of AIMC are that the bit-cell usually has to be modified and that sophisticated analog circuits are needed to limit the impact of variability; as a result, most published AIMC macros have low area efficiency. Digital in-memory compute (DIMC), on the other hand, is robust to variability and can therefore reach a potentially much higher density. We proposed a 128 kB DIMC macro in 16 nm FinFET. The macro occupies 0.5 mm2 and achieves a peak of 23.8 TOPs/W for 8b MAC operations, a storage density of 256 kB/mm2, and a compute density of 0.364 TOPs/mm2. Among all IMC designs with more than 10 kB of capacity, it achieves the highest figure of merit (FoM), defined as storage density (kB/mm2) * compute density (TOPs/mm2). DIMC thus opens a realm of options for efficient computation of AI algorithms on the edge.
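As a quick sanity check of the reported figures, the following sketch recomputes the densities and the FoM from the numbers above (Python is used purely as a calculator; the variable names are ours, not from the paper):

```python
# Reported macro figures (Jiang et al., ESSCIRC 2023).
capacity_kB = 128    # SRAM capacity of the macro
area_mm2 = 0.5       # macro area

storage_density = capacity_kB / area_mm2  # -> 256 kB/mm2, as reported
compute_density = 0.364                   # TOPs/mm2, as reported

# FoM as defined in the text: storage density * compute density.
fom = storage_density * compute_density   # -> ~93.2 (kB/mm2 * TOPs/mm2)
print(f"storage: {storage_density} kB/mm2, FoM: {fom:.1f}")
```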
W. Jiang, P. Houshmand, M. Verhelst and W. Dehaene, "A 16nm 128kB high-density fully digital In Memory Compute macro with reverse SRAM pre-charge achieving 0.36TOPs/mm2, 256kB/mm2 and 23.8TOPs/W," ESSCIRC 2023 - IEEE 49th European Solid State Circuits Conference (ESSCIRC), Lisbon, Portugal, 2023, pp. 409-412, doi: 10.1109/ESSCIRC59616.2023.10268774.