Research Goal: This research project aims to design efficient and scalable hardware systems for probabilistic machine learning models. In contrast to "black box" deep learning methods, probabilistic models are gaining popularity because they can integrate domain knowledge, handle uncertainty, and produce interpretable results. However, inference on probabilistic models is computationally intensive and requires a large memory footprint. Dedicated hardware allows computation and memory utilization to be optimized, leading to faster and more energy-efficient processing. It can also enable the use of probabilistic models in resource-constrained environments, such as mobile and Internet of Things (IoT) devices.
Gap in SotA: Current state-of-the-art (SotA) ML processors focus primarily on accelerating deep learning workloads, with little emphasis on Bayesian or probabilistic inference. The major challenges in accelerating Bayesian inference are its need for sequential data processing and its frequent updates of large, irregular data structures, which make the computation difficult to map onto highly parallel hardware platforms. As a result, energy-efficient compute platforms for compute-intensive probabilistic inference algorithms are lacking, which prevents their deployment on edge devices. There is hence a clear need for flexible hardware solutions that can handle the dynamic requirements of Bayesian inference algorithms in real-world applications.
Results: This research project started with the development of a basic hardware building block for probabilistic inference: the Knuth-Yao sampler. Generating random samples is the fundamental operation in this field, as typical approximate Bayesian inference involves drawing billions of samples. Hardware samplers are therefore the bottleneck for reducing overall energy consumption and increasing performance. We evaluated a wide variety of sampler architectures for discrete probability distributions in terms of speed, area, and energy consumption. Our novel reconfigurable Knuth-Yao sampling architecture supports both flexible range and dynamic precision, which makes it well suited to our workloads, and provides up to 13x better energy efficiency and 11x better area efficiency than the traditional linear cumulative distribution table (CDT) based samplers used in SotA accelerators.
Next, we are integrating this sampler into a complete probabilistic ML accelerator.
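To make the two sampling schemes compared above concrete, the following is a minimal software sketch of textbook Knuth-Yao sampling next to a linear CDT scan. It is not the reconfigurable hardware architecture itself. The function names (`probability_matrix`, `knuth_yao_sample`, `cdt_sample`) and the example distribution are hypothetical, and the sketch assumes the probabilities are given as integer weights that sum to exactly 2**precision.

```python
import random
from typing import Callable, List


def probability_matrix(weights: List[int], precision: int) -> List[List[int]]:
    """Row i is the `precision`-bit binary expansion (MSB first) of
    weights[i] / 2**precision. Assumption: the weights sum to 2**precision."""
    if sum(weights) != 1 << precision:
        raise ValueError("weights must sum to 2**precision")
    if any(w < 0 or w >= 1 << precision for w in weights):
        raise ValueError("each weight must lie in [0, 2**precision)")
    return [[(w >> (precision - 1 - col)) & 1 for col in range(precision)]
            for w in weights]


def knuth_yao_sample(pmat: List[List[int]], next_bit: Callable[[], int]) -> int:
    """Walk the implicit discrete distribution generating (DDG) tree,
    one column of the probability matrix per random bit."""
    d = 0  # index of the current node within its tree level
    for col in range(len(pmat[0])):
        d = 2 * d + next_bit()
        # Scan the column; each 1-bit is a terminal node at this level.
        for row in range(len(pmat) - 1, -1, -1):
            d -= pmat[row][col]
            if d < 0:
                return row
    raise RuntimeError("not reached when the rows sum to exactly 1")


def cdt_sample(weights: List[int], precision: int,
               next_bit: Callable[[], int]) -> int:
    """Baseline: draw a full-precision uniform integer, then scan the
    cumulative distribution table linearly."""
    u = 0
    for _ in range(precision):
        u = 2 * u + next_bit()
    acc = 0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    raise RuntimeError("not reached when the weights sum to 2**precision")


if __name__ == "__main__":
    # Hypothetical distribution: P = [1/2, 1/4, 1/8, 1/8] at 3-bit precision.
    weights, precision = [4, 2, 1, 1], 3
    pmat = probability_matrix(weights, precision)
    coin = lambda: random.getrandbits(1)
    counts = [0] * len(weights)
    for _ in range(10_000):
        counts[knuth_yao_sample(pmat, coin)] += 1
    print(counts)
```

In this sketch the CDT sampler always consumes `precision` random bits per sample, whereas the Knuth-Yao walk stops as soon as it reaches a terminal node, so likely outcomes are resolved after only a few bits. The energy and area figures reported above come from the hardware evaluation and cannot be inferred from this software model.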