Accelerate Transformer inference on GPU with Optimum and Better Transformer
In this video, I show you how to accelerate Transformer inference with Optimum, an open-source library by Hugging Face, and Better Transformer, a PyTorch extension available since PyTorch 1.12.
Using an AWS instance equipped with an NVIDIA V100 GPU, I start from a couple of models that I previously fine-tuned: a DistilBERT model for text classification and a Vision Transformer model for image classification. I first benchmark the original models, then I use Optimum and Better Transformer to optimize them with a single line of code, and I benchmark them again. This simple process delivers a 20-30% speedup with no accuracy drop!
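For reference, here is a minimal sketch of the one-line conversion described above, using Optimum's BetterTransformer API. The checkpoint name is a publicly available DistilBERT sentiment model standing in for the fine-tuned model from the video, which may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.bettertransformer import BetterTransformer

# Placeholder checkpoint; the video uses the author's own fine-tuned models
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to("cuda")

# The single line: swap the vanilla attention layers for BetterTransformer kernels
model = BetterTransformer.transform(model)

inputs = tokenizer("Optimum makes inference faster!", return_tensors="pt").to("cuda")
with torch.inference_mode():
    logits = model(**inputs).logits
print(logits)
```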
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
Accelerate Transformer inference on GPU with Optimum and Better Transformer
Accelerate Big Model Inference: How Does it Work?
Supercharge your PyTorch training loop with Accelerate
How to run Large AI Models from Hugging Face on Single GPU without OOM
Accelerate Transformer inference on CPU with Optimum and ONNX
Nvidia CUDA in 100 Seconds
Handling Heavy-tailed Input of Transformer Inference on GPUs
'High-Performance Training and Inference on GPUs for NLP Models' - Lei Li
Better Transformer: Accelerating Transformer Inference in PyTorch at PyTorch Conference 2022
Hugging Face Infinity - GPU Walkthrough
Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing - Dan Sun
How to setup NVIDIA GPU for PyTorch on Windows 10/11
Accelerate Transformer inference on CPU with Optimum and Intel OpenVINO
Accelerating Stable Diffusion Inference on Intel CPUs with Hugging Face (part 1) 🚀 🚀 🚀
Hardware acceleration for on-device Machine Learning
PyTorch 2.0 Q&A: Optimizing Transformers for Inference
Getting Started with NVIDIA Torch-TensorRT
Supercharge your PyTorch training loop with 🤗 Accelerate
Run 70Bn Llama 3 Inference on a Single 4GB GPU
How to Load Large Hugging Face Models on Low-End Hardware | CoLab | HF | Karndeep Singh
Nvidia H100 GPU Explained in 60 Seconds | CUDA | Tensor | HPC | HBM3 #new #ai #technology #shorts
Introducing Accelerate & PEFT to Democratize LLM:Training & Inference LLM With Less Hardware
Herbie Bradley – EleutherAI – Speeding up inference of LLMs with Triton and FasterTransformer
Pipeline parallel inference with Hugging Face Accelerate