Testing Stable Diffusion Inference Performance with Latest NVIDIA Driver including TensorRT ONNX


🚀 UNLOCK INSANE SPEED BOOSTS with NVIDIA's Latest Driver Update... or Not? 🚀 Are you ready to turbocharge your AI performance? Watch me compare the brand-new NVIDIA 555 driver against the older 552 driver on an RTX 3090 Ti for #StableDiffusion, and see how much TensorRT and ONNX models can speed things up. Don't miss these results!

1-Click fresh Automatic1111 SD Web UI Installer Script with TensorRT and more ⤵️

0:00 Introduction to the NVIDIA newest driver update performance boost claims
0:25 What I am going to test and compare in this video
1:11 How to install the latest version of Automatic1111 Web UI
1:40 The very best Automatic1111 sampler for Stable Diffusion image generation / inference
1:57 Automatic1111 SD Web UI default installation versions
2:12 RTX 3090 Ti image generation / inference speed for an SDXL model with the default Automatic1111 SD Web UI installation
2:22 How to see your NVIDIA driver version and much more info with the nvitop library
2:40 Default installation speed with the NVIDIA 551.23 driver
2:53 How to update Automatic1111 SD Web UI to the latest Torch and xFormers
3:05 Which CPU and RAM were used to conduct these speed tests (CPU-Z results)
3:54 nvitop status while generating an image with Stable Diffusion XL (SDXL) on Automatic1111 Web UI
4:10 The new generation speed after updating Torch (2.3.0) and xFormers (0.0.26) to the latest version
4:20 How to install the TensorRT extension on Automatic1111 SD Web UI
5:28 How to generate a TensorRT ONNX model for a huge speedup during image generation / inference
6:39 How to enable SD Unet selection so you can use the TensorRT-generated model
7:13 TensorRT pros and cons
7:38 TensorRT image generation / inference speed results
8:09 How to download and install the latest NVIDIA driver properly and cleanly on Windows
9:03 Repeating all the testing again on the newest NVIDIA driver (555.85)
10:06 Comparison with other optimizations such as SDP attention and Doggettx
10:35 Conclusion of the tutorial
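The 2:22 chapter reads the driver version with nvitop, and the 2:53 and 4:10 chapters update Torch and xFormers. Before each test run it helps to record exactly which versions are active. Below is a minimal sketch that swaps nvitop for a direct `nvidia-smi --query-gpu=driver_version --format=csv,noheader` call and reads the Torch and xFormers versions if they are importable (for example when run inside the Web UI's venv). The helper names `parse_driver_version`, `query_driver_version` and `report_versions` are hypothetical, chosen for illustration only.

```python
import shutil
import subprocess
from typing import Dict, Optional


def parse_driver_version(smi_output: str) -> Optional[str]:
    """Return the first non-empty line of nvidia-smi CSV output (one line per GPU)."""
    for line in smi_output.splitlines():
        line = line.strip()
        if line:
            return line
    return None


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the installed driver version; None if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tools on PATH (e.g. a machine without the driver)
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_driver_version(result.stdout)


def report_versions() -> Dict[str, Optional[str]]:
    """Collect driver, Torch, Torch's CUDA build and xFormers versions."""
    info: Dict[str, Optional[str]] = {
        "driver": query_driver_version(),
        "torch": None,
        "torch_cuda": None,
        "xformers": None,
    }
    try:
        import torch  # usually only present inside the Web UI's venv

        info["torch"] = torch.__version__
        # CUDA toolkit version Torch was built against, not the driver version
        info["torch_cuda"] = torch.version.cuda
    except ImportError:
        pass
    try:
        import xformers

        info["xformers"] = xformers.__version__
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name:>10}: {version}")
```

Saving this output alongside each it/s measurement makes it clear which driver, Torch and xFormers combination produced which number.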

NVIDIA's Latest Driver: Does It Really Deliver?

In this video, we dive deep into NVIDIA's newest driver update, comparing the performance of driver versions 552 and 555 on an RTX 3090 Ti running Windows 10. We'll explore the claimed speed improvements, particularly with #ONNX Runtime and TensorRT integration, using the popular Automatic1111 Web UI.

What You'll Learn:

Driver Comparison: Direct performance comparison between NVIDIA drivers 552 and 555.
Setup and Installation: Step-by-step guide on setting up a fresh #Automatic1111 Web UI installation, including the latest versions of Torch and xFormers.
ONNX and TensorRT Models: Detailed testing of default and TensorRT-generated models to measure speed differences.
Hardware Specifications: Insights into the hardware used for testing, including CPU and memory configurations.

Testing Procedure:

Initial Setup:
Fresh installation using a custom installer script that includes the necessary models and styles.
Initial speed test with default settings and configurations.

Driver 552 Performance:
Speed testing on driver 552 with default models and configurations.
Detailed performance metrics and image generation speed analysis.

Upgrading to Latest Torch and xFormers:
Updating to the latest versions of Torch (2.3.0) and xFormers (0.0.26).
Performance testing after the updates and comparison with the initial setup.

TensorRT Installation and Testing:
Installing the TensorRT extension and generating TensorRT models.
Overcoming common installation errors and applying optimizations.
Speed testing with TensorRT models and analysis of the performance improvements.

Upgrading to Driver 555:
Step-by-step guide to downloading and installing NVIDIA driver 555.
Performance comparison between drivers 552 and 555.
Analysis of the impact on speed and efficiency.
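Each stage of the procedure above ends in an it/s figure, so the numbers are only comparable if they are measured the same way every time. A minimal sketch of one common approach: discard warm-up runs (which absorb one-time costs such as model or engine loading) and report the median of several timed runs. `measure_its` and its `step_fn` argument are hypothetical; `step_fn` stands in for one sampler step, which in a real PyTorch pipeline would have to finish its GPU work (for example via `torch.cuda.synchronize()`) before returning, because CUDA kernels run asynchronously.

```python
import statistics
import time
from typing import Callable, List


def measure_its(step_fn: Callable[[], None], steps: int = 20,
                warmup_runs: int = 1, timed_runs: int = 5) -> float:
    """Median iterations per second of `step_fn` over several timed runs.

    `step_fn` is a hypothetical stand-in for one sampler step. If it launches
    GPU work asynchronously, it must wait for that work to finish, or the
    timing will be too optimistic.
    """
    if steps <= 0 or timed_runs <= 0:
        raise ValueError("steps and timed_runs must be positive")
    # Warm-up runs absorb one-time costs and are excluded from the result.
    for _ in range(warmup_runs):
        for _ in range(steps):
            step_fn()
    rates: List[float] = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        for _ in range(steps):
            step_fn()
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against 0
        rates.append(steps / elapsed)
    # The median is less sensitive than the mean to one slow outlier run.
    return statistics.median(rates)


if __name__ == "__main__":
    # Demo with a fake 10 ms "step" in place of real image generation.
    print(f"{measure_its(lambda: time.sleep(0.01), steps=10, timed_runs=3):.1f} it/s")
```

Running the same harness with the same settings before and after each change (driver, Torch/xFormers, TensorRT) keeps the comparison fair; restarting the Web UI between configurations also avoids carrying over cached state.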

Results and Conclusions:

Performance Metrics: Detailed analysis of speed improvements (or lack thereof) with the newest NVIDIA driver.
TensorRT Benefits: How TensorRT models significantly boost performance.
Driver Update Impact: Understanding the real-world impact of updating to the latest NVIDIA driver.

Comments

1-Click fresh Automatic1111 SD Web UI Installer Script with TensorRT and more ⤵

SECourses

I spent 11 minutes on the video but saved an hour of testing. Thank you for the time you've saved me!

Artazar

Great vid! I got the best performance with an older install and the latest NVIDIA Studio driver 555.99 (upgraded from 546).
Configuration for best performance (Windows 10, RTX 4090):
version: v1.8.0 • python: 3.10.11 • torch: 2.2.0+cu121 • xformers: 0.0.24 • gradio: 3.41.2
Speed without TensorRT: 6.84 it/s
Speed with TensorRT: 10.9 it/s

mtnmecca_ej
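The comment above reports 6.84 it/s without TensorRT and 10.9 it/s with it on an RTX 4090. A quick sketch of what those figures mean: a throughput ratio of about 1.59x, i.e. roughly 59% more iterations per second. The 30-step count below is an assumption for illustration (the comment does not state one), and `speedup` and `denoise_seconds` are hypothetical helper names; the time estimate covers only the sampling loop, not VAE decoding or other overhead.

```python
def speedup(baseline_its: float, new_its: float) -> float:
    """Throughput ratio: 1.5 means the new setup runs 50% more it/s."""
    if baseline_its <= 0:
        raise ValueError("baseline_its must be positive")
    return new_its / baseline_its


def denoise_seconds(its: float, steps: int) -> float:
    """Time for the sampling loop only; VAE decode and other overhead are ignored."""
    if its <= 0:
        raise ValueError("its must be positive")
    return steps / its


if __name__ == "__main__":
    without_trt, with_trt = 6.84, 10.9  # it/s figures from the comment above
    steps = 30  # assumed step count; the comment does not state one
    print(f"speedup: {speedup(without_trt, with_trt):.2f}x")  # speedup: 1.59x
    print(f"{denoise_seconds(without_trt, steps):.2f} s vs "
          f"{denoise_seconds(with_trt, steps):.2f} s")  # 4.39 s vs 2.75 s
```

Because it/s is a rate, the time saved per image grows with the step count, while the ratio stays the same.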

9:47 NVIDIA said that the speedup with the new driver is only for LLMs (Large Language Models), not for image generation. :) That's why you don't see a speedup when generating images. BTW, as far as I know, you can't use ControlNet with TensorRT, which makes TensorRT pretty much useless for me and, I assume, for a lot of other people. Could you kindly test and confirm whether this is true? Also, is there any progress on restoring settings from PNG images generated with SUPIR? That would be really handy.

bgtubber

I particularly don't like this testing scheme. On my PC, for example, after some generations or model switches, memory leaks or something gets stuck, and even though it shows 0% GPU usage, all subsequent performance is negatively affected. I suggest completely rebooting the system after any change to A1111 before testing new settings; otherwise the test results can be affected.

YakaBita

I wonder if the speed impact varies by GPU. Would it have a bigger impact on a 4090 or on a 3060?

pn

4:47 Teacher, do you know how we could solve this problem in Forge?

clemenwine

Can you help me with a problem I have with TensorRT and A1111?

DezorianGuy

This is the error message I get when using any SDXL model with TensorRT engines. I was able to generate the engines, but I can't use them without encountering this error:

"Warning Enabling Pytorch Fallback as no engine was found"

Do I need to change the A1111 Web UI "Cross attention optimization" setting? Is my RTX 3070 with 8 GB of VRAM causing this issue? I can use SD 1.5 checkpoint TensorRT engines, and previously could with RealVisXL, but I've had absolutely no success with the original SDXL Base 1.0 for text-to-image generation. Hopefully you have a fix, as I really appreciate the speed of TensorRT with xFormers... it's a night-and-day difference, especially when using your Incantations extension with the ADetailer (After Detailer) extension!

markschrader

TensorRT doesn't support LoRAs, so you can't use LoRAs with it. Useless for now.

xuzygex

All of this and no speed increase! lol

lucianodaluz
