LVC21-113 TensorFlow Lite Delegates on Arm based Devices

Показать описание

TensorFlow Lite is a popular open-source framework that enables Machine Learning on mobile, embedded, and IoT devices by default using the Arm NEON instruction set. The delegate mechanism takes TensorFlow Lite one step further and provides a mechanism to use on-device hardware accelerators such as the GPU, NPU, or Digital Signal Processor.

Session Detail:
The session focuses on TensorFlow Lite, and especially its delegate mechanism. Tensorflow Lite is a widely used open-source framework developed by Google for mobile, embedded, and IoT machine learning. By default, TensorFlow Lite kernel implementation is optimized using Arm NEON instruction set for Arm Cortex-A cores. The delegate mechanism provides the means to leverage the computational power of on-device accelerators such as the GPU, NPU, or DSP and offloads the execution of machine learning kernels to another framework, or an optimized kernel library. Such examples are the Arm NN delegate developed by Arm for the Linaro Machine Learning initiative, NN API delegate, or XNNPACK delegate.

We will briefly cover what is TensorFlow Lite and then dive deeper into the delegate mechanism. We will go over several examples of TF Lite delegates. The new Arm NN delegate, which was released in Arm NN 20.11 (an inference engine for machine learning developed by Linaro Machine Learning Initiative). The NN API delegate, which is one of the default delegates designed for Android and used for acceleration on the Arm-based NXP i.MX8 platform. We will also go over why should a developer consider implementing a custom delegate, how to do that. In the end, we will discuss delegate performance and operation support.

Flow:
1) Introduction to TF Lite
A brief introduction to Google’s embedded machine learning framework.

2) Introduction to Delegates
An introduction to the delegate mechanism, which offloads the execution of TF Lite machine learning kernels to another framework, driver, hardware, or other mechanisms, which enable acceleration.

3) NN API delegate
A default delegate integrated into TF Lite, which offloads the execution using Android NN API.

4) Arm NN delegate (new in Arm NN 20.11)
A newly implemented delegate in Arm NN 20.11, which offloads the execution to the Arm NN framework.

5) Performance benchmarking and operation support discussion
Options to benchmark the performance of the delegate. Handling of delegate unsupported operators and graph partitioning.

6) Implementing a custom delegate
Considerations why to implement a custom delegate, how to implement such a delegate, and a simple example.