2023 EuroLLVM - ML-on-CPU: should vectorization happen in the LLVM backend or higher up the stack?

2023 European LLVM Developers' Meeting
------
ML-on-CPU: should vectorization happen in the LLVM backend or higher up the stack?
Speaker: Elen Kalda
------
This talk is about how TVM, one of the most mature machine learning compilation stacks, interacts with LLVM. TVM is a domain-specific compiler that consumes a machine learning model expressed in a high-level ML framework such as TensorFlow or PyTorch and compiles it for a chosen target, such as the Arm(R) architecture. For CPU targets, it does this by using LLVM as a backend, directly translating TVM's IR into LLVM IR.
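As a minimal sketch of that flow (not code from the talk), the snippet below uses TVM's Python API to import a model and build it for an AArch64 CPU via the LLVM backend; "model.onnx" is a hypothetical pre-exported model file, and the exact API surface varies between TVM releases.

    import onnx
    import tvm
    from tvm import relay

    # Hypothetical model exported from an ML framework to ONNX.
    onnx_model = onnx.load("model.onnx")
    mod, params = relay.frontend.from_onnx(onnx_model)

    # An LLVM CPU target; the triple selects AArch64 code generation.
    target = "llvm -mtriple=aarch64-linux-gnu"

    # relay.build lowers TVM's IR and hands code generation to LLVM.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)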

In TVM, just like in other machine learning stacks that use LLVM as a backend for CPU code generation, one needs to decide where optimizations like vectorization should happen: in the LLVM backend, or higher up in the ML stack. This is further complicated by the emergence of scalable vectors, like the Scalable Vector Extension (SVE). While generating code for fixed-length vectors can mostly be left to LLVM, there is a case to be made for representing variable-length vectors in the TVM stack, to use the capabilities of SVE more effectively. In this talk, we're going to present our experiences and insights on the trade-offs of targeting SVE in the TVM+LLVM stack.
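To illustrate the "vectorize higher up the stack" option, here is a rough sketch using TVM's tensor expression API (as it existed around the time of the talk): the schedule splits a loop by a fixed factor of 4 and marks the inner loop as vectorized. The names (A, B, C, n) and the factor are illustrative only; the alternative the abstract mentions is to emit scalar loops and rely on LLVM's own vectorizer. Note that a fixed split factor like this expresses NEON-style fixed-width vectors but not SVE's vector-length-agnostic types, which is the crux of the trade-off.

    import tvm
    from tvm import te

    # Element-wise add over a symbolic length n.
    n = te.var("n")
    A = te.placeholder((n,), name="A", dtype="float32")
    B = te.placeholder((n,), name="B", dtype="float32")
    C = te.compute((n,), lambda i: A[i] + B[i], name="C")

    s = te.create_schedule(C.op)
    outer, inner = s[C].split(C.op.axis[0], factor=4)  # fixed vector length
    s[C].vectorize(inner)  # inner loop becomes a 4-wide vector operation

    # Inspect the lowered TIR to see the explicit vectorization.
    print(tvm.lower(s, [A, B, C], simple_mode=True))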
-----