FAST Flux GGUF for low VRAM GPUs with Highest Quality. Installation, Tips & Performance Comparison.

We install the new GGUF node on ComfyUI locally for NVIDIA or AMD GPUs.
The image generation examples demonstrate both the output quality and the detailed performance, followed by tips & tricks for Flux.1 DEV and Flux.1 SCHNELL.
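
For reference, a minimal install sketch, assuming the widely used city96/ComfyUI-GGUF custom node and a git-based ComfyUI install (portable and ZLUDA setups, as covered in the video, may differ):

    # clone the GGUF custom node into ComfyUI's custom_nodes folder
    cd ComfyUI/custom_nodes
    git clone https://github.com/city96/ComfyUI-GGUF
    # install the gguf Python package into the same environment ComfyUI runs in
    pip install --upgrade gguf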

Videos:

Links:

The GGUF Models:
About Quantization:

PLEASE CHECK THE PINNED COMMENT FOR UPDATES!

Chapters:
0:00 About Flux and GGUF
1:02 GGUF Installation
2:54 ZLUDA Update
3:46 Adding GGUF Loader
4:33 GGUF Models and Test
6:43 Example Generation
7:45 Result Comparison
9:47 Performance Details
13:16 Key findings

#comfyui #flux #gguf #stablediffusion
Comments

How is your performance with the low-VRAM GGUF quantized models?
UPDATE 2: Speed increased 2x with flux1-DEV/SCHNELL-Q5_K_S.gguf compared to the original models (tested on an AMD GPU). Important: you have to start with the runtime parameter '--force-fp32', and although this parameter speeds up the quantized models, it slows down the original ones! The T5 text encoder model currently has zero influence on my machine's performance, so I choose T5xxl_fp16.
Many different sizes are available; choose Q5_K_M or larger, and place it in your 'clip' folder.
Update the GGUF node (do a 'git pull' in the node's directory); most probably you will have to update ComfyUI as described in my video, too.
Replace the CLIP loader in your workflow with the new 'DualCLIPLoader (GGUF)', found under 'Add Node -> bootleg'. A command sketch of these steps follows below.
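
A minimal sketch of the update and launch steps above, assuming the node was cloned as ComfyUI/custom_nodes/ComfyUI-GGUF and that you start ComfyUI via main.py (adjust paths and launcher for portable or ZLUDA installs):

    # update the GGUF custom node in place
    cd ComfyUI/custom_nodes/ComfyUI-GGUF
    git pull
    # update ComfyUI itself, as described in the video
    cd ../..
    git pull
    # relaunch with fp32 forced; per the update above, this speeds up the
    # quantized models but slows down the original ones
    python main.py --force-fp32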

NextTechandAI

I get "RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3072 and 1320x18432)
mat1 and mat2 shapes cannot be multiplied (1x3072 and 1320x18432)" when trying to use these new GGUF models with Forge UI. Does it even work with Forge?

TheGalacticIndian

This uses slightly less VRAM (Q8 vs FP8, F16 vs FP16), but it's not faster; as long as you don't exceed your VRAM pool, the speed stays the same. Same thing for Schnell: Q4 renders in 4 seconds on a 4090, just like FP16. The unified or "baked" BNB NF4 Flux models were much faster to load, but they are not compatible with LoRAs and are now considered deprecated.

crypt_exe

I have an NVIDIA 3060 with 6GB VRAM and am able to run the Flux dev model easily; it takes roughly 1 min 15 sec to generate with the 6.2GB GGUF file.

vivekkarumudi

How much do you "lose" using these models? My GPU can handle the full dev model, but it's slow. Would using these models be faster at the same quality, or do you lose noticeable quality? Also, what do the different Q* files mean?

JohnVanderbeck

Great video, bravo. I wanted to ask if you could do an update using the new t5_v1.1-xxl GGUF.
Thank you

Giorgio_Venturini

For a 12 GB 3080 Ti, what models do you recommend?

LX

I'm very new to this world and I'm learning quite a bit from your videos. One thing I don't quite understand, maybe you can help:
My system: Radeon RX 7700 XT (12GB VRAM) GPU / Ryzen 7800X3D / 64GB DDR5.
I'm running ComfyUI on Ubuntu; I followed one of your great tutorials.

I just can't run any fp16 versions of the models because it says it's running out of memory.

So, does the 16 in fp16 mean 16GB of VRAM is required?
Can I somehow leverage my PC's large RAM for something?
Is the CPU useful for anything in these scenarios?

BrunoOrsolon

3060 Ti. Took 10 minutes to finish 8 images on Flux. 😂😂😂😢😢

ericcheah

NF4 > GGUF. GGUF is slower due to being compressed, while NF4 was optimized for speed. As a trainer, I wish I could use either of them to train with; it is hideously slow being forced to batch size 1 on a 4090.

generalawareness

Hi! I'm lost at the ZLUDA step. I'm not using ComfyUI portable; I have cloned comfyui-zluda from patientx. In which folder do I need to 'pip install gguf'? Thanks for your videos!

luxiland

But I only get a black image as a result. Did I do anything wrong? Please help me, thanks in advance.

tamizhanVanmam

To uninstall or remove NF4, do I just delete the ComfyUI_bitsandbytes_NF4 folder?

DarwinsGreatestHits

I'm on AMD, using ZLUDA, and ComfyUI is up to date; I can see Flux support in the patch notes inside ComfyUI Manager, but I cannot get the DualCLIP loader into flux mode. Is there an extra step required that I could have missed?

Gwaboo

Which model do you recommend for an RTX 4070 Ti with 12GB VRAM?

lowaura

Hi, I'm new to all this. I don't see 'bootleg' in the Add Node dropdown? Little help... Ah, I got it: the command didn't install it on the first try for some reason :P
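
A quick check for that failure mode, assuming you install into the same Python environment ComfyUI runs in:

    # prints package details if the gguf dependency installed correctly
    pip show gguf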

SamiHD

Hi! I have a 6700 XT with 12GB VRAM, running ComfyUI with ZLUDA and Stable Diffusion XL 1.0. When I set 1024x1024 the app crashes, and the terminal shows the message 'CUDA out of memory, tried to allocate 2.50 GiB'. Sorry, my question is not about this video T_T. Thanks for all your excellent videos.

luxiland

Do you also need the 23GB F16 files for Schnell and dev?

boltr

About 355 seconds running "flux1-dev-Q4_K_S" in ComfyUI on a Mac Studio (96GB / 38-core GPU). So, still unusable for me, but par for the course, because Apple doesn't care about MPS and open source.

rmeta

I am struggling a bit with Flux. I have a GeForce 3080 Ti, which is nothing to be scoffed at, and driver version 560 installed on Windows. I tried a bunch of different workflows with dev FP8, and all of them are super slow. I only have 64 GB of DDR5 RAM, but I haven't read anywhere that that should be a problem.

shushens