As the demand for intelligent devices continues to rise, the need for specialized programmable hardware for machine learning becomes increasingly apparent. However, the highly specific nature of this hardware also poses a significant challenge for programming and deployment. To address this issue, multi-level machine learning (ML) compilers such as Apache TVM and MLIR-based compilation flows have emerged as effective solutions, capable of automatically generating optimized ML compute kernels for a wide range of hardware, including GPUs, CPUs, and custom NPUs.
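As an illustration (not part of the proposed work), the sketch below uses Apache TVM's tensor-expression (te) API, assuming a classic TVM 0.x-style installation; the kernel name vecadd and the chosen targets are illustrative. It shows the core idea behind such compilers: one high-level kernel description, retargeted to different hardware by changing the schedule and target rather than the kernel itself.

```python
# Minimal sketch: a single declarative kernel description compiled for
# a CPU target; the GPU variant differs only in schedule and target.
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
# Declarative compute rule: C[i] = A[i] + B[i]; no loop order or
# hardware-specific detail is fixed at this level.
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)  # default schedule; real flows tune this
cpu_mod = tvm.build(s, [A, B, C], target="llvm", name="vecadd")

# A GPU target would additionally bind loops to thread axes, e.g.:
# bx, tx = s[C].split(C.op.axis[0], factor=64)
# s[C].bind(bx, te.thread_axis("blockIdx.x"))
# s[C].bind(tx, te.thread_axis("threadIdx.x"))
# gpu_mod = tvm.build(s, [A, B, C], target="cuda", name="vecadd")
```

The retargeting cost in such a flow lies not in rewriting kernels but in implementing the backend behind each target string, which is precisely the porting effort discussed next.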
Despite their efficacy, porting these compilers to new hardware remains a complex and time-consuming task, requiring extensive knowledge of both the target hardware and the compiler's software algorithms. This research aims to leverage multi-level compiler technology, such as MLIR, to better integrate hardware and software development flows, enabling a tighter coupling between compilers and the ML compute hardware they target.
The ultimate goal of this research is to improve the utilization of new compute hardware, making machine learning more efficient and more widely accessible. Through tighter integration of hardware and software development flows, we aim to achieve more efficient hardware, more efficient compilers, and ultimately more efficient ML compute, with the potential to expand the use of machine learning to a far wider range of scenarios.