ComfyUI SDXL Basic Setup Part 3

A barebones basic way of setting up SDXL
Requires:
ComfyUI Manager (the easiest way to find custom nodes)
WAS suite
Derfuu's Math Nodes
Quality of life suite
Comments
Author

I thought I'd add something - as of December 27, 2023, there is still no 'real' understanding of the 'focus' function. It can, however, be said that CLIP_G and CLIP_L are contentious at best, to the point where most extensions for Comfy just concatenate the two strings and call it a day. I still divide my prompts the way you do, because I learned from the SDXL whitepaper, but I'll be the first to admit that finding determinism has been REALLY hard and fraught with frustration. If anyone has a DEFINITIVE link beyond "G is for subject, understands natural language, is 'weighted' stronger than L" and "L is for tokenized input more related to context", I'd love to know. Apparently someone (this is just grapevine, of course) managed to reach the devs, and they said something along the lines of "Even we don't really use this".
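The "concatenate two strings and call it a day" fallback mentioned above can be sketched like this (a hypothetical helper for illustration, not the actual code of any ComfyUI extension):

```python
def encode_sdxl_prompt(text_g: str, text_l: str) -> str:
    """Fallback many ComfyUI extensions use: instead of routing
    text_g and text_l to their separate CLIP encoders, just join
    the two strings and feed the same result to both encoders."""
    if text_g.strip() == text_l.strip():
        # identical prompts collapse to a single string
        return text_g.strip()
    return f"{text_g.strip()}, {text_l.strip()}"
```

Under this fallback, any intended G/L split is lost before encoding, which is one plausible reason split-prompt experiments feel non-deterministic.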

As for sizing post creation of the latent in the CLIP encoder, I found the following info:

"Image Size - instead of discarding a significant portion of the dataset below a certain resolution threshold, they decided to use smaller images. However, image size (height and width of the image) is fed into the model.

Cropping Parameters - during the training process, some cropping happens, as not all aspect ratios are supported. When an image has an unsupported aspect ratio, the longer side of the image is randomly cropped to fit the desired size. In this process, some important image parts can be cropped out (the top part of a face, for example). To address this, during training the cropping values were fed to the model, and it's mentioned in the text that, during inference, setting the crop to (0, 0) fixes most of the issues that cropping typically introduces.

Aspect Ratio (bucket size) - finally, SDXL was fine-tuned on images of various aspect ratios. The images were bucketed into some number of supported aspect ratios, and during training the values of these buckets were fed into the model. So, in theory, we should match the aspect ratio of the target image and this conditioning value during image generation. Still, there is a chance that we will discover some interesting combinations that violate this rule.
"

stephantual

These are amazingly helpful. Thank you so much!

lennygordon

Love your tutorials mate and I hope you continue them!

patarikihale

Thanks for these tutorials. I assume your workflow file is on Reddit, but I have been following your tutorial videos just on YouTube. It would be good for you to link them in the video's About section. I'll have a look on Reddit as well after I finish your videos.

Kryptonic

Derfuu conflicts with Float [ComfyLiterals] & Float [ComfyUI-Logic]. I tried to ignore the conflict, installed Derfuu anyway, called the 'integer' node, and blue-screened my computer with an auto-reboot. Not sure why that's happening...

EnzoLuo

How are you creating your group node boxes that contain the model busses and latent approach logic?

ezrapaulekas

Enjoying the series of tutorials. I just can't find your Text Box node. It's not in my Comfy even after installing the WAS suite. Help, please!

simonmcdonald

Why did you set 2048 for width and height? Also, can you explain, theoretically, how CLIP is related to height and width, and how they are connected to each other? I always thought that CLIP is only a model that helps with understanding and converting prompts into embeddings as tensors. Is it the dimension of the tensor?

avinashu

Please, where can I get (Text Concatenate) from?

zraieee

OK bro, where is the integer node? Last time I just downloaded everything and BSOD'd my rig. Much obliged for the heads up, Imma go smokestack Lightning for a

pandemicye

I made a series of images based on the albums of a band, by putting the titles in the prompt and always using the same stylistic settings (oil painting, etc.). I tried to do it by splitting it up into text_g (album title) and text_l (stylistic settings) and got the impression that the results were rather arbitrary, no matter what the title was. I ran some more tests without any titles, just the style, and got similar results. As it turns out, if you don't want to put the same prompt in both inputs but split it up, it's better to put the style prompt into text_g and the subject into text_l. But even then... I simply got the best results by plugging the same prompt into both inputs... or maybe I just don't understand how to use it the best way.
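For what it's worth, one reason "same prompt in both inputs" behaves sensibly is that the two encoders' per-token outputs are not averaged but concatenated along the channel axis (768 channels from CLIP ViT-L plus 1280 from OpenCLIP ViT-bigG, 2048 total). A toy numpy sketch with zero tensors standing in for real encoder outputs:

```python
import numpy as np

def combine_sdxl_embeddings(emb_l: np.ndarray, emb_g: np.ndarray) -> np.ndarray:
    """Concatenate per-token hidden states from SDXL's two text
    encoders along the channel axis, as the SDXL UNet expects.
    emb_l: (tokens, 768) from CLIP ViT-L
    emb_g: (tokens, 1280) from OpenCLIP ViT-bigG"""
    return np.concatenate([emb_l, emb_g], axis=-1)

# stand-in tensors for a 77-token prompt
tokens = 77
emb_l = np.zeros((tokens, 768))
emb_g = np.zeros((tokens, 1280))
cond = combine_sdxl_embeddings(emb_l, emb_g)  # shape (77, 2048)
```

So each encoder contributes its own slice of every token's conditioning vector, and feeding both encoders the same text just fills both slices consistently.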

gordonbrinkmann