2X PERFORMANCE PLUGIN 🤯 OFFICIAL A1111 STABLE DIFFUSION UPDATE GUIDE

TensorRT, ONNX, Olive and other tech. Things are very early in terms of development, but we already have our hands on an EXTENSION that should DOUBLE your performance with AUTOMATIC1111's Stable Diffusion WebUI using TensorRT. This speed boost should be supported by all Nvidia RTX graphics cards.

Timestamps:
0:00 - New TensorRT, Olive, DirectML and other speed news
1:40 - Double performance extension for SDUI (TensorRT)
2:00 - Limitations
2:30 - Breakdown of what we need to do
3:08 - Downloading TensorRT from Nvidia
4:00 - Installing TensorRT plugin
4:36 - Installing TensorRT from Nvidia
5:41 - Use AUTOMATIC1111 Dev Branch (Fix sd_unet error)
6:07 - TensorRT tab in AUTOMATIC1111
6:32 - Preparing Model to ONNX
7:12 - Fix "RuntimeError: invalid unordered_map<K, T> key"
8:00 - Converting model to ONNX
8:12 - Converting ONNX to TensorRT
9:40 - Speeding up ONNX to TensorRT
10:50 - What to do after converting model
11:50 - Before and after (2x PERFORMANCE!)
13:10 - Do LORAs, TIs and other models work?
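
The timestamped steps above can be sketched roughly as the shell session below. The extension repo URL, the zip file name, and the paths are placeholders (assumptions, not the exact ones from the video) — follow the video and Nvidia's download page for the real ones.

```shell
# Sketch of the workflow from the timestamps above. Repo URL, zip name,
# and paths are placeholders, not the exact ones from the video.

# 1. Switch the WebUI to the dev branch (fixes the sd_unet error, 5:41)
cd stable-diffusion-webui
git fetch origin
git checkout dev

# 2. Install the TensorRT extension into the WebUI (4:00)
#    (placeholder URL -- use the repo linked by the extension's author)
git clone https://github.com/<author>/<tensorrt-extension>.git \
    extensions/tensorrt-extension

# 3. Unpack Nvidia's "TensorRT 8.6 GA" ZIP (downloaded manually, 3:08)
#    and put its lib/ directory on the PATH so the DLLs are found
unzip TensorRT-8.6.*.zip -d TensorRT
export PATH="$PWD/TensorRT/lib:$PATH"

# 4. Restart the WebUI; then, in the new TensorRT tab (6:07):
#    checkpoint -> ONNX (8:00), ONNX -> .trt engine (8:12), and finally
#    select the TRT model under Settings -> sd_unet (10:50).
```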

#AI #StableDiffusion #Performance
-----------------------------
🖥️ My Current Hardware (Links here are affiliate links. If you click one, I'll receive a small commission at no extra cost to you):
🎙️ My Current Mic/Recording Gear:

Everything in this video is my personal opinion and experience and should not be considered professional advice. Always do your own research and ensure what you're doing is safe.
Comments

I'll wait until ControlNet is supported too, and then see how it actually increases performance on bigger images, especially for latent upscaling. Creating tons of low-resolution images even faster doesn't add much for me. Video sequences might be a use case, but for that ControlNet is a must-have too.

testales

That's an amazing video; looking forward to seeing how far we go from here. Subscribed.

bomar

You're a legend!! Great video. The best thing about this is you've got a 3080 Ti like me 🤣

Nithproxi

Using the same GPU, a 3080 Ti, with xformers, I achieved a speed of approximately 19 iterations per second. We could expect even better performance, around 40 iterations per second, if xformers and TensorRT could be combined and used simultaneously. I have no idea if things work that way.

MaxPayne-of

Unfortunately it did not work with the RTX 2060 6 GB (hopefully just for now). After several hours of trying to install PyCUDA and getting the extension to work, with many OUT OF MEMORY warnings, the .trt was compiled, but when loading it and trying to generate an image, everything crashes.

allendimetri

Amazing! Since it can't support ControlNet yet, I'll wait until it does to test it.

ianhmoll

However I run the checkout command, the sd_unet error doesn't disappear and the extension tab doesn't appear. Please help.

franckparkinson

Very interesting video. I hope they add LoRA support that doesn't need to be baked in. Is the file size the same as the original, though? That would double the hard drive space we use.

GamingDaveUK

I was unable to get a speed boost on an RTX 2060 with 6 GB VRAM; one image takes 4.5 seconds with and without using a TensorRT-converted model.

appolonius

I can sometimes get it working if I create images with increased batch count and batch size, but it is very finicky: it bugs out if I try to generate a single image, multiple images sometimes don't work unless I set the batch count and batch size high enough, and the hires fix also doesn't work.
I get the error "bad shape for TensorRT input x" a lot if I try to do single images, use too few batches, or add the hires fix.

Adohleas

It is really fast, but the results have nothing to do with the original model used. Sometimes they can be nice, but in general, if you are using LoRAs, it loses a lot of detail...

leandrozanardo

I need help with one of your previous videos about installing Stable Diffusion on Mac. Do you mind checking there and helping those people in need?

JC-jnjz

I look forward to trying this. For some reason my Stable Diffusion install always hits an error loading xtensors and I can't figure out why, but it runs, so I haven't worried. It looks like that might be a problem for this model conversion, though, so I'll have to hope for the best.

carnacthemagnificent

Does the Windows 10 version of "TensorRT 8.6 GA for Windows 10 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7 and 11.8 ZIP Package" work on Windows 11?

LibertyRecordsFree

Hello! Nice video! Anyway, I'm converting ONNX to TensorRT, but when I change the setting to a max of 1024 instead of 512 it always errors and crashes. Why? Does it depend on the model?

Spindonesia

The speed increase isn't enough to justify breaking pretty much every LoRA you have and going through all those issues.

TPCDAZ

Why did you censor the model selected in the A1111 WebUI? 🤨

ruzanmuhammedasher

But the total generation time is the same, ~7.2 sec. What's the point?

stavsap

Looks crap, though the speed is phenomenal. Waiting for more benchmarks and reviews; the conversion of lots of files is holding me back.

blind

Wait, so if I bake a LoRA in, could I still use it by using the right words?

Backtitrationfan