Testing Stable Diffusion Inference Performance with Latest NVIDIA Driver including TensorRT ONNX

🚀 UNLOCK INSANE SPEED BOOSTS with NVIDIA's Latest Driver Update, or not? 🚀 Are you ready to turbocharge your AI performance? Watch me compare the brand-new NVIDIA 555 driver against the older 552 driver on an RTX 3090 Ti for #StableDiffusion. Find out how much TensorRT and ONNX models can speed up image generation, and whether the new driver makes any difference.

1-Click fresh Automatic1111 SD Web UI Installer Script with TensorRT and more ⤵️

0:00 Introduction to the performance boost claims for NVIDIA's newest driver update
0:25 What I am going to test and compare in this video
1:11 How to install latest version of Automatic1111 Web UI
1:40 The very best sampler of Automatic1111 for Stable Diffusion image generation / inference
1:57 Automatic1111 SD Web UI default installation versions
2:12 RTX 3090 Ti image generation / inference speed for SDXL model with default Automatic1111 SD Web UI installation
2:22 How to see your NVIDIA driver version and much more info with the nvitop library
2:40 Default installation speed for NVIDIA 551.23 driver
2:53 How to update Automatic1111 SD Web UI to the latest Torch and xFormers
3:05 Which CPU and RAM were used to conduct these speed tests (CPU-Z results)
3:54 nvitop status while generating an image with Stable Diffusion XL (SDXL) on Automatic1111 Web UI
4:10 The new generation speed after updating Torch (2.3.0) and xFormers (0.0.26) to the latest versions
4:20 How to install TensorRT extension on Automatic1111 SD Web UI
5:28 How to generate a TensorRT ONNX model for huge speed up during image generation / inference
6:39 How to enable SD Unet selection to be able to use TensorRT generated model
7:13 TensorRT pros and cons
7:38 TensorRT image generation / inference speed results
8:09 How to download and install the latest NVIDIA driver properly and cleanly on Windows
9:03 Repeating all the testing again on the newest NVIDIA driver (555.85)
10:06 Comparison of other optimizations such as SDP attention or Doggettx
10:35 Conclusion of the tutorial
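
The chapters above name specific Torch (2.3.0) and xFormers (0.0.26) versions. A minimal sketch for checking what is actually installed, using only the Python standard library. The helper name `installed_versions` and the default package list are illustrative assumptions, not something shown in the video or included in the installer script. Run it with the same Python interpreter the Web UI uses (for example, its virtual environment); otherwise it reports on a different environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "xformers")):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is absent from the environment running this script.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Recording these versions next to each speed measurement makes it easier to tell a driver effect apart from a library effect.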

NVIDIA's Latest Driver: Does It Really Deliver?

In this video, we dive deep into NVIDIA's newest driver update, comparing the performance of driver versions 552 and 555 on an RTX 3090 Ti running Windows 10. We'll explore the claims of speed improvements, particularly with #ONNX Runtime and TensorRT integration, using the popular Automatic1111 Web UI.

What You'll Learn:

Driver Comparison: Direct performance comparison between NVIDIA drivers 552 and 555.
Setup and Installation: Step-by-step guide on setting up a fresh #Automatic1111 Web UI installation, including the latest versions of Torch and xFormers.
ONNX and TensorRT Models: Detailed testing of default and TensorRT-generated models to measure speed differences.
Hardware Specifications: Insights into the hardware used for testing, including CPU and memory configurations.

Testing Procedure:

Initial Setup:
Fresh installation using a custom installer script which includes necessary models and styles.
Initial speed test with default settings and configurations.

Driver 552 Performance:
Speed testing on driver 552 with default models and configurations.
Detailed performance metrics and image generation speed analysis.

Upgrading to Latest Torch and xFormers:
Updating to the latest versions of Torch (2.3.0) and xFormers (0.0.26).
Performance testing after updates and comparison with initial setup.

TensorRT Installation and Testing:
Installing TensorRT extension and generating TensorRT models.
Overcoming common installation errors and optimizations.
Speed testing with TensorRT models and analysis of performance improvements.

Upgrading to Driver 555:
Step-by-step guide on downloading and installing NVIDIA driver 555.
Performance comparison between driver 552 and 555.
Analyzing the impact on speed and efficiency.

Results and Conclusions:

Performance Metrics: Detailed analysis of speed improvements (or lack thereof) with the newest NVIDIA driver.
TensorRT Benefits: How TensorRT models significantly boost performance.
Driver Update Impact: Understanding the real-world impact of updating to the latest NVIDIA driver.
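
When comparing drivers as described above, it helps to record which driver is actually active before each run. The video uses nvitop for this; the sketch below instead calls `nvidia-smi` with its `--query-gpu` option, which ships with the NVIDIA driver. The function names `parse_gpu_csv` and `query_gpus` are hypothetical helpers introduced here for illustration.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, driver_version' CSV lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name is preserved.
        name, driver = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append({"name": name, "driver_version": driver})
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU name and driver version; return [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    print(query_gpus() or "nvidia-smi not found or returned an error")
```

Keeping the parsing separate from the subprocess call means the parser can be checked on a machine without an NVIDIA GPU.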
Comments

1-Click fresh Automatic1111 SD Web UI Installer Script with TensorRT and more ⤵

SECourses

I spent 11 minutes on the video, but saved 1 hour on the tests, thank you for the time you've saved

Artazar

Great vid! I got the best performance with an older install and the latest NVIDIA Studio driver 555.99 (up from 546).
Parameters for best performance (on Win 10, RTX 4090):
v1.8.0 python3.10.11 torch 2.2.0CU121 xform 0.0.24 gradio 3.41.2
Speed w/o TensorRT: 6.84 it/s
Speed w/ TensorRT: 10.9 it/s

mtnmecca_ej
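
As a worked example of what those two figures mean, here is a small sketch that turns iterations per second into a speedup ratio and an approximate sampling time per image. The 20-step count is an assumption chosen only for illustration; the comment does not say how many steps were used, and the estimate ignores model loading, VAE decoding and other overhead.

```python
def speedup(baseline_its, optimized_its):
    """Ratio of optimized to baseline throughput (iterations per second)."""
    return optimized_its / baseline_its


def seconds_per_image(its, steps):
    """Sampling time for one image at a given step count, ignoring overhead."""
    return steps / its


baseline, tensorrt = 6.84, 10.9  # it/s, as reported in the comment above
ratio = speedup(baseline, tensorrt)
print(f"speedup: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% faster)")

steps = 20  # assumed step count, for illustration only
print(
    f"{seconds_per_image(baseline, steps):.2f}s vs "
    f"{seconds_per_image(tensorrt, steps):.2f}s per image"
)
```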

9:47 NVIDIA said that the speedup with the new driver is only for LLMs (Large Language Models). It's not for image generation. :) That's why you don't see a speedup with it when generating images. BTW, as far as I know, you can't use ControlNet with TensorRT, which makes TensorRT pretty much useless for me and, I assume, for a lot of other people. Can you kindly test and confirm whether this is true? Also, is there any progress on restoring settings from PNG images generated with SUPIR? That would be really handy.

bgtubber

I particularly don't like this testing scheme. On my PC, for example, after some generations or after switching models, memory leaks or gets stuck somewhere, and even though it shows 0 GPU usage, all subsequent performance is negatively affected. I suggest completely rebooting the system after any change to A1111 before testing new settings; otherwise the test results can be affected.

YakaBita

I wonder if the speed impact varies depending on the GPU. Would it have a bigger impact on a 4090, or on a 3060?

pn

4:47 Do you happen to know how we can solve this problem in Forge?

clemenwine

Can you help me with a problem I have with TensorRT and A1111?

DezorianGuy

This is the error message I'm receiving when using any SDXL model with TensorRT engines. I was able to generate the engines, but I cannot use them without encountering this error message:

"Warning Enabling Pytorch Fallback as no engine was found"

Do I need to change A1111 Web UI's "cross attention optimization" setting? Is my RTX 3070 with 8 GB of VRAM causing this issue? I'm able to use SD 1.5 checkpoint TensorRT engines, and previously I was able to use RealVisXL, but I have had absolutely no success with the original SDXL Base 1.0 for text-to-image generation. Hopefully you have a fix, as I really appreciate the speed of TensorRT with xformers... it's like a night and day difference, especially when using your Incantations Extension with the adetailer extension (After Detailer)!

markschrader

TensorRT doesn't support LoRAs, so you can't use a LoRA with it. Useless for now.

xuzygex

All of this and no speed increase! lol

lucianodaluz