Further evolution of applications in robotics, autonomous vehicles, biomedical wearables and similar domains relies on ultra-low energy consumption of the electronic circuits they contain. Ever-increasing leakage and technological variability are the critical phenomena here. In addition, the way the design abstraction layers, which are needed to cope with ever-growing complexity, are structured causes energy overhead. Designers are therefore forced to rethink their design strategies. The classical strategy of adding margins so that the design targets stay within the required windows leads to prohibitive oversizing, with all the extra energy and area cost that comes with it. Over the last decade, MICAS has been exploring design strategies for digital processors and memories that deal with new technology properties on the one hand and ever more demanding application requirements on the other.
Today, timing closure is based on (statistical) static timing analysis. In essence, the delay of the critical path must stay below the clock period minus the sequencing overhead in the worst case. When all timing derates (process, supply voltage, temperature, aging and local variability) are added up, the result is an oversized design with large numbers of hold buffers and a supply voltage that is higher than necessary. This can be partially mitigated by adding replica timing detectors to the circuit that drive a voltage and frequency adaptation system. Replicas, however, are ineffective against time-varying or intra-die variability, because they do not experience the same local conditions as the actual critical paths. In-situ error detection and correction (EDAC) can push the effectiveness of voltage and frequency scaling much further, as our recent publications have shown. At MICAS we investigate several EDAC techniques. A first focus is the effectiveness of these techniques; for this, different kinds of timing detection are considered, e.g., endpoint-based or activity-based detection. A second focus is the automatic insertion of EDAC circuits, with the goal of making EDAC insertion a fully automated step in the design flow.
Overview of completion-detection-based timing error detection: a late signal transition from D0 to D2 causes a detected error.
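The contrast between margin-based signoff and in-situ detection can be illustrated with a small Python model. This is a minimal sketch under stated assumptions: the derate factors, delays and detection window are hypothetical illustrative numbers, derates are assumed to combine multiplicatively, and `worst_case_period` and `classify_endpoint` are illustrative helpers, not a MICAS tool or flow. The endpoint check loosely follows the caption above: a transition that arrives late, but within a detection window after the capturing edge, is flagged as a detected error.

```python
from math import prod


def worst_case_period(nominal_delay_ps, derates, seq_overhead_ps):
    """Clock period needed when all timing derates are stacked (margin-based signoff).

    Each derate is a hypothetical multiplicative factor (>= 1.0) on the nominal
    critical-path delay, e.g. for process, voltage, temperature, aging, local variability.
    """
    return nominal_delay_ps * prod(derates) + seq_overhead_ps


def classify_endpoint(arrival_ps, required_ps, detection_window_ps):
    """Hypothetical in-situ timing check at one path endpoint.

    required_ps is the clock period minus the sequencing overhead. Data that
    settles before it is captured correctly; data that settles inside the
    detection window after it is flagged so the error can be corrected
    (e.g. by replaying the operation); anything later escapes detection.
    """
    if arrival_ps <= required_ps:
        return "ok"
    if arrival_ps <= required_ps + detection_window_ps:
        return "detected"
    return "undetected"


# Margin-based signoff: every derate applied at once (illustrative numbers).
derates = [1.10, 1.08, 1.05, 1.05, 1.07]
print(f"worst-case period: {worst_case_period(800, derates, 60):.1f} ps")

# EDAC-based operation (assumed): run at a tighter 950 ps period, i.e. a
# required time of 890 ps, and rely on detection for rare late transitions.
for arrival in (820, 920, 1000):
    print(arrival, classify_endpoint(arrival, 890, 100))
```

Running at 950 ps instead of roughly 1181 ps is only safe in this model if every late arrival lands inside the detection window. A real detector also constrains short paths, because a fast transition launched in the next cycle must not land inside the detection window either.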
In many modern systems, the memories are the dominant energy consumers. This is further aggravated by leakage and technological variability, which drives the quest for ever more efficient memory circuits. At MICAS we focus on SRAM. We conceive advanced memory matrix architectures accompanied by ultra-low-leakage periphery. Contrary to popular belief, these memory matrices are not most efficient at the lowest supply voltage: SRAMs are leakage-dominated circuits, and as the voltage drops, each access takes longer, so more leakage energy accumulates per access. This quest will continue in the coming years: pushing down leakage with circuit optimizations that mitigate variability.
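The minimum-energy-point argument can be sketched numerically. The model below is hypothetical and normalized, not derived from a MICAS design: dynamic energy scales as C*V^2, access delay follows the alpha-power law t ~ V / (V - Vth)^alpha, and the leakage current is treated as constant, so the leakage energy per access is I_leak * V * t. The functions `access_delay`, `energy_per_access` and `minimum_energy_point` and all parameter values are illustrative assumptions.

```python
def access_delay(v, vth=0.3, alpha=1.5):
    """Normalized alpha-power-law delay model (assumed): t ~ V / (V - Vth)^alpha."""
    return v / (v - vth) ** alpha


def energy_per_access(v, c_dyn=1.0, i_leak=0.2, vth=0.3, alpha=1.5):
    """Normalized energy of one access: switching energy plus leakage over the access time."""
    dynamic = c_dyn * v ** 2
    # Leakage power (I_leak * V) integrated over the access time, which grows as V drops.
    leakage = i_leak * v * access_delay(v, vth, alpha)
    return dynamic + leakage


def minimum_energy_point(v_min_mv=350, v_max_mv=1200, step_mv=10, **model):
    """Sweep the supply (in mV to avoid float stepping) and return (V_opt, E_opt)."""
    voltages = [mv / 1000 for mv in range(v_min_mv, v_max_mv + 1, step_mv)]
    best_v = min(voltages, key=lambda v: energy_per_access(v, **model))
    return best_v, energy_per_access(best_v, **model)


v_opt, e_opt = minimum_energy_point()
print(f"minimum energy at {v_opt:.2f} V (normalized energy {e_opt:.3f})")
```

With the default parameters the optimum falls between roughly 0.5 V and 0.6 V, well above the lowest voltage in the sweep. Raising `i_leak`, i.e. modelling a more leakage-dominated array as in `minimum_energy_point(i_leak=1.0)`, shifts the optimum to a higher voltage.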
Machine learning is becoming more ubiquitous every day. Neural-network-based algorithms run in server farms as well as on smartphones; they can even be found in the most advanced hearing aids. Unsurprisingly, the call for energy efficiency is growing ever louder. To reduce the energy of ML, a closer marriage between compute logic and local storage is an attractive option. However, the trade-off between area, energy and performance in these emerging in-memory compute systems is, and will long remain, a subject of research. Should we go for digital or analog in-memory compute? How does this choice map onto a precision axis? Higher accuracy will require digital precision, but how much memory should actually be implemented? The answer to all these research questions depends on the neural network, and thus on the application at hand. Furthermore, not only neural network inference must be considered: real-time learning will also enter the game. Advanced circuit research, such as that carried out at MICAS, will have to provide the answers.
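The precision question for analog in-memory compute can be made concrete with a toy column model. This is a minimal sketch, assuming binary weights stored in the bit cells, multi-bit inputs, an ideal analog summation along the column, and an ADC that resolves only its `adc_bits` most significant bits; `digital_mac` and `analog_imc_mac` are hypothetical helpers, and real analog arrays also suffer from noise and non-linearity that are ignored here.

```python
def digital_mac(x, w):
    """Exact integer multiply-accumulate of one memory column."""
    return sum(xi * wi for xi, wi in zip(x, w))


def analog_imc_mac(x, w, x_bits, adc_bits):
    """Hypothetical analog IMC column read out by a truncating ADC.

    Partial products are assumed to sum ideally in the analog domain; the ADC
    keeps only the `adc_bits` most significant bits of the full-scale range.
    """
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    if any(wi not in (0, 1) for wi in w):
        raise ValueError("this sketch assumes binary weights")
    full_scale = (2 ** x_bits - 1) * len(x)    # largest possible column sum
    needed_bits = full_scale.bit_length()      # ADC bits for a lossless readout
    dropped = max(0, needed_bits - adc_bits)   # LSBs the ADC cannot resolve
    column_sum = digital_mac(x, w)             # ideal analog sum
    return (column_sum >> dropped) << dropped  # truncate the unresolved LSBs


x = [3, 7, 12, 5]  # 4-bit input activations
w = [1, 0, 1, 1]   # binary weights in one column
print(digital_mac(x, w), analog_imc_mac(x, w, 4, 6), analog_imc_mac(x, w, 4, 3))
```

With four rows of 4-bit inputs the column sum never exceeds 60, so a 6-bit ADC is lossless and returns the exact sum of 20, while a 3-bit ADC returns 16. Because the lossless ADC resolution grows with the logarithm of the number of rows, taller columns share one ADC over more storage but demand more readout precision, which is one face of the trade-off raised above.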
Embedded systems need a high degree of programmability while keeping their energy footprint low. This calls for embedded processors customized towards specific classes of workloads. MICAS is active both in extending RISC-V cores with custom accelerators and in adapting processors and their periphery towards ultra-low-power (ULP) operation. Unconventional compute architectures, such as coarse-grained reconfigurable arrays (CGRAs) and other non-von Neumann structures, are also part of the exploration landscape.