tinyML EMEA 2022 - Andrew Reusch: Whole-model optimization with Apache TVM

tinyML EMEA 2022
Algorithms, Software & Tools session
Whole-model optimization with Apache TVM
Andrew REUSCH, Software Engineer, OctoML

Optimized deep learning kernels are crucial for achieving good performance in deployed ML models. Increasingly, developers are turning to deployment tools to assemble these optimized kernels into full models. However, per-kernel optimizations have limited impact on a full model’s throughput, especially when heterogeneous compute platforms are in use or when the underlying hardware is designed to execute operators concurrently.
In this talk, I’ll describe how Relax, a new model-level language in Apache TVM, enables hardware vendors to easily apply common hardware optimization techniques such as striping and global memory planning across the full program. With Relax, these techniques can be easily tailored to both individual accelerators and heterogeneous compute environments. Finally, I’ll discuss our plans to integrate Relax with Apache TVM’s Ahead-of-Time compilation flow, making it available in a low-overhead runtime targeted at bare-metal environments.