Double Your Stable Diffusion Inference Speed with RTX Acceleration and TensorRT: A Comprehensive Guide

Stable Diffusion gets a major boost with RTX acceleration. One of the most common ways to use Stable Diffusion, the popular generative AI tool that produces images from simple text descriptions, is through the Stable Diffusion Web UI by Automatic1111. In today's Game Ready Driver, NVIDIA added TensorRT acceleration for the Stable Diffusion Web UI, which boosts GeForce RTX performance by up to 2x. In this tutorial video I will show you everything about this new speed-up, from installing the extension to generating and using a TensorRT SD UNet.

#TensorRT #StableDiffusion #NVIDIA

Automatic Installer Of Tutorial ⤵️

Tutorial GitHub Readme File ⤵️

0:00 Introduction: how to utilize RTX Acceleration / TensorRT for 2x inference speed
2:15 How to do a fresh installation of Automatic1111 SD Web UI
3:32 How to enable quick SD VAE and SD UNET selection in the settings of Automatic1111 SD Web UI
4:38 How to install TensorRT extension to hugely speed up Stable Diffusion image generation
6:35 How to start / run Automatic1111 SD Web UI
7:19 How to install the TensorRT extension manually via URL install
7:58 How to install the TensorRT extension via the git clone method (see the command sketch after this chapter list)
8:57 How to download and upgrade cuDNN files
11:23 Speed test of SD 1.5 model without TensorRT
11:56 How to generate a TensorRT for a model
12:47 Explanation of min, optimal, max settings when generating a TensorRT model
14:00 Where the ONNX file is exported
15:48 How to set command line arguments to avoid errors during TensorRT generation
16:55 How to get maximum performance when generating and using TensorRT
17:41 How to start using generated TensorRT for almost double speed
18:08 How to switch to dev branch of Automatic1111 SD Web UI for SDXL TensorRT usage
20:33 Comparison of image differences with TensorRT on and off
20:45 Speed test of TensorRT with multiple resolutions
21:32 Generating a TensorRT for Stable Diffusion XL (SDXL)
23:24 How to verify you have switched to dev branch of Automatic1111 Web UI to make SDXL TensorRT work
24:32 Generating images with SDXL TensorRT
25:00 How to generate TensorRT for your DreamBooth trained model
25:49 How to install the After Detailer (ADetailer) extension and an explanation of what it does
27:23 Starting generation of TensorRT for SDXL
28:06 Batch size vs batch count difference
29:00 How to train an amazing SDXL DreamBooth model
29:10 How to get an amazing prompt list for DreamBooth models and use it
30:25 The dataset I used for DreamBooth training myself and why it is deliberately low quality
30:46 How to generate TensorRT for LoRA models
33:30 Where and how to see TensorRT profiles you have for each model
36:57 Generating LoRA TensorRT for SD 1.5 and testing it
39:54 How to fix the bug where a TensorRT LoRA has no effect
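
For convenience, here is a minimal command sketch of the installation flow covered in the chapters above, assuming a standard Windows setup with Git and Python on PATH. The repository URLs are the public Automatic1111 and NVIDIA repos; everything else follows the defaults shown in the video.

    :: Fresh Automatic1111 install (chapter 2:15)
    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
    cd stable-diffusion-webui

    :: TensorRT extension via the git clone method (chapter 7:58);
    :: the URL-install method (chapter 7:19) uses the same repo URL
    :: pasted into Extensions > Install from URL in the Web UI
    git clone https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT extensions/Stable-Diffusion-WebUI-TensorRT

    :: Switch to the dev branch for SDXL TensorRT (chapter 18:08)
    git checkout dev
    git pull

    :: Launch the Web UI (chapter 6:35)
    webui-user.bat

After the first launch, add sd_vae and sd_unet to the quicksettings list under Settings > User Interface (chapter 3:32) so you can switch to the generated TensorRT UNet from the main screen.
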
Comments

If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 ⤵

Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews ⤵

Playlist of StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img ⤵

SECourses

Holy smokes! I would have never thought to look into this without your tutorials. I cannot believe I have had this RTX for months and I did not think to do this. Maestro!! ('.')7

reapicus

So much value in this video, thank you for sharing this for free!
And it's amazing that NVIDIA made this repo. In a couple of years Auto1111 will probably be considered like Photoshop, and Stable Diffusion skills will be valuable.

ArtificialBeauties

Very interesting, thanks! It works well on the non-dev Automatic1111 version.

VooDooEf

Thank you! I hope there will be something like this for ComfyUI.

captainoctonion

RTX 3060 Ti: 4.28 it/s > 6.63 it/s. New sub, thanks. (I use the default engine)

RANDOM-ixpn

Performance is quite promising. I installed it on A1111 v1.6 without any problem, and generation is quite fast: 4 seconds compared to 6 seconds at 768x768 resolution.
However, the time it takes to export an engine each time you switch resolution, checkpoint, or LoRA is very long; sometimes more than 30 minutes for higher resolutions on my RTX 3060 with 12 GB VRAM.

kenrock

Now I guess you can install it and get to work. The initial errors are gone, but there will be others :)
Thanks, Furkan

michail_

Very informative video! Much appreciated.

covninja

Quite some huge packages to download :))

JackTorcello

If you've done a recent installation, the CUDA DLL files will already be up to date and TensorRT will work right away.

絵空事-oe

In img2img at 1152x1152 with 0.55 denoising, a render that took 56 s on my RTX 3080 Ti takes 15 s with TensorRT. Thanks for the walkthrough. In addition, the min and optimal prompt tokens should be kept at 75 and only the max tokens should be raised (whatever you set, min and optimal get equalized and it bugs out). Also, even if you delete TensorRT profile models, they still show as present in the system and cause a not-working bug; to remove them you have to edit models.json manually.

Imquorra
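
If you run into the stale-profile bug described in the comment above, a rough cleanup sketch follows. The folder and file names here are assumptions based on the extension's default layout and may differ between versions, so verify them in your own install before deleting anything.

    :: Run from the stable-diffusion-webui folder
    dir models\Unet-trt       :: compiled TensorRT engine files
    dir models\Unet-onnx      :: exported ONNX files (chapter 14:00)
    :: Open the metadata file the commenter mentions (named model.json in some
    :: extension versions), remove the entries for engines you have already
    :: deleted, then restart the Web UI
    notepad models\Unet-trt\models.json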

Thanks for the video. How do I convert the newly created JSON file to a TRT file?

erickevinz

I don't have an nvidia folder in venv\Lib\site-packages, although I did install the TensorRT extension from the Extensions tab.

vfbotgl

Like someone mentioned, it's not worth it for a 4090. And having to generate an engine every time makes no sense..

pastuh

Right now this is more of a proof of concept. It has some uses when running fixed pipelines with larger volumes of images, but there is not much benefit for the average A1111 user. In fact, most of the time it will just mess up your workflow due to all the limitations.

lennylein

Hey! Found you on GitHub. A question about the min/max prompt token count in TensorRT: did you try >75? There is a 0.3 beta on GitHub, but it looks like there is no fix for that problem; the issue is still open.

chf

This is really useful for speeding up SDXL image generation. However, it requires much more VRAM: you need at least an NVIDIA GPU with 12 GB VRAM and Sysmem Fallback enabled. During the process, you should not do anything else (e.g. browsing the internet) to avoid the process being interrupted abnormally.

Also, TensorRT will not work when the --medvram or --lowvram flags are enabled.

AmirZaimMohdZaini
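
Consistent with the warning above (and with chapter 15:48), here is a minimal webui-user.bat sketch that avoids the incompatible flags. The --xformers flag is optional and shown only as a commonly used companion flag, not something the extension requires.

    @echo off
    :: webui-user.bat -- make sure --medvram and --lowvram are NOT present,
    :: since TensorRT does not work with them
    set COMMANDLINE_ARGS=--xformers
    call webui.bat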

I keep getting ERROR:root:Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
No clue how to solve this; I've been looking everywhere! I only have one RTX 4090, I don't have more than one GPU, and I'm on the dev branch too. I came to your video hoping to find a solution.

TechMDYoutube

Hey!! At minute 32:32 you skipped over the issue with the LoRAs not appearing in the list. I have this issue as well, and they're not showing up after a restart. Any solution? Thank you, amazing video!

pablo.montero