ComfyUI: Flux with LLM, 5x Upscale (Workflow Tutorial)

The video focuses on FLUX.1 [dev] usage and workflow in ComfyUI. The workflow is semi-automatic, with logical processing applied to reduce VRAM usage. It covers image reference, image-to-image, text-to-image, and consistent upscaling techniques. Preserving text during upscaling was challenging; the workflow achieves upscaling with text retention up to 5.04x the original generation.

------------------------

Links for Models:

Upscaler:

GitHub:

CivitAI LoRA Used:

Ollama:

------------------------

Disable Smart Memory in ComfyUI:
- Add “ --disable-smart-memory” (note the leading space) at the end of the first line of your ComfyUI launch script, then save and start ComfyUI (see the example line below).
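
For reference, the edited launch line would look like this (a sketch assuming the Windows portable build, where run_nvidia_gpu.bat is the launcher; adjust for your install):

    .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-smart-memory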

------------------------
System Instructions For Image LLM:
You are an advanced AI assistant equipped with visual and language understanding capabilities. Your primary goal is to meticulously analyze the given image and generate a comprehensive prompt suitable for recreating or expanding upon the image using various generative models.

Analyze the image and classify it as either a vector illustration, digital painting, traditional art painting, drawing, sketch, photograph, 3D rendering, graphic design, street art, folk art, conceptual art, texture, pattern, cartoon comic, etc. Subsequently, describe the image in extreme detail with classification, encompassing its composition, style, mood, atmosphere, colors, lighting, shadows, and overall theme. Ensure your response is structured in paragraphs and is free of additional commentary.
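
If you want to test this system instruction outside ComfyUI, a minimal sketch using the official ollama Python client looks like this (the model tag, image path, and variable names are placeholders; SYSTEM_PROMPT stands for the full instruction above):

    import ollama  # pip install ollama

    # Full "System Instructions For Image LLM" text from above.
    SYSTEM_PROMPT = "You are an advanced AI assistant equipped with visual and language understanding capabilities. ..."

    response = ollama.chat(
        model="llava",  # any vision model you have pulled into Ollama
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Analyze and describe this image.", "images": ["input.png"]},
        ],
    )
    print(response["message"]["content"])  # the generated description prompt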

Modify Image LLM:
Objective: Modify the given text prompt based on specific change instructions while preserving all other details.

Read the Change Instruction: Identify the specific change(s) to be made as provided in the next line.

"Reimagine the room decor as a kids room who loves space and astronomy"

Implement the Change: Apply the specified modification(s) to the text.

Preserve Original Context: Ensure that all other aspects of the text, including descriptions, mood, style, and composition, remain unchanged.

Maintain Consistency: Keep the language style and tone consistent with the original text.

Review Changes: Verify that the modifications are limited to the specified change and that the overall meaning and intent of the text are preserved.

Provide Response: Just output the modified text. Ensure your response is free of additional commentary.
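
A rough sketch of how the modify step chains onto the describe step when driving Ollama directly (not the exact node wiring; MODIFY_INSTRUCTIONS stands for the text above and description for the Image LLM's output):

    import ollama  # pip install ollama

    MODIFY_INSTRUCTIONS = "Objective: Modify the given text prompt ..."  # full text above
    CHANGE = '"Reimagine the room decor as a kids room who loves space and astronomy"'
    description = "..."  # output of the Image LLM describe step

    response = ollama.chat(
        model="llama3.1",  # a text-only model is enough for the rewrite step
        messages=[
            {"role": "system", "content": MODIFY_INSTRUCTIONS + "\n" + CHANGE},
            {"role": "user", "content": description},
        ],
    )
    print(response["message"]["content"])  # modified prompt, all other details preserved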

Text LLM:
Based on the user prompt, generate a detailed response explaining the composition, style, mood, atmosphere, colors, lighting, shadows, and overall theme. Ensure your response is structured in paragraphs, less than 512 tokens, and is free of additional commentary.

Image/Text Summary LLM:
Objective: Summarize the user's prompt to a concise version of no more than 80 tokens. It is critical that you do not exceed the 80 token limit under any circumstances.

Provide Response: Just output the modified text. Ensure your response is free of additional commentary.
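
Models routinely overshoot limits stated only in the prompt, so when driving Ollama directly you can also enforce the caps with a hard generation limit; num_predict is Ollama's standard option for this, and the values below simply mirror the 512- and 80-token limits above. Note that a hard cap truncates rather than summarizes, so keep the prompt instruction too:

    import ollama

    # Text LLM: at most 512 tokens.
    text_out = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": "..."}],
        options={"num_predict": 512},
    )
    # Summary LLM: at most 80 tokens.
    summary_out = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": "Summarize: ..."}],
        options={"num_predict": 80},
    )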

------------------------

TimeStamps:

0:00 Intro.
01:48 Requirements.
05:58 Flux Resolution, Input & Logic.
18:16 Text LLM Conditioning.
24:36 Control Bridge Update.
25:17 Image LLM Conditioning & Modify Logic.
37:10 Switch & Final Conditioning.
39:40 Flux Core.
44:32 Img2Img, Settings.
49:43 Img2Img Manipulation & Style Transfer.
53:21 Understanding max_shift.
54:44 InPainting, SAM2.
01:00:37 InPainting Test.
01:02:21 Flux Add Details, Upscale & Post Processing.
01:13:56 Upscale Full Run, VRAM Tips, Schnell.
------------------------

Comments:


Update (Sep 5, 2024): Flux Inpainting: the Preview Bridge node in the latest update has a new option called "Block". Ensure that it is set to "never" and not "if_empty_mask"; this allows the Preview Bridge node to pass on the mask. If set to "if_empty_mask", the node will block the inpaint model conditioning input, as the switch default is set to SAM + Custom. I asked the dev to update the node so that the default behavior is always "never", and he has done so. Update the node to the latest version again.

Update (Sept 2, 2024): The LLMs used are very small models; higher-parameter variants exist as well. The "llava:34b" vision model performs the best but requires 24 GB VRAM; there is a 13B version for lower VRAM. Llama 3.1 also has an 8B instruct model in fp16, "llama3.1:8b-instruct-fp16", which requires about 17 GB VRAM. I have tested it and there are no issues with the workflow, as we unload the LLM as soon as it's done. Llama 3.1 also comes in 70B and 405B parameter versions, which cannot be run on consumer-grade hardware. If you require such models, you can run them via API; GPT-4o or Gemini Pro (non-humans) perform best for API usage. To use an API, just replace the node and ensure the connections are the same; the workflow will maintain the logic.
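For reference, pulling the variants mentioned above is one Ollama command each (these tags exist on the Ollama library; check your VRAM before pulling the larger ones):

    ollama pull llava:13b                    # vision, lower-VRAM option
    ollama pull llava:34b                    # vision, best quality, ~24 GB VRAM
    ollama pull llama3.1:8b-instruct-fp16    # text, fp16, ~17 GB VRAM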

The video is quite long, and I had initially planned to include GGUF, but couldn't. ControlNet support for Flux was added to ComfyUI but was not stable at the time of recording, and more tests are needed anyway. I will probably make another video with ControlNet and GGUF if they warrant a tutorial on the channel.

controlaltai

I'm new to playing around with ComfyUI and image generation, and this is the first demo that really showed me the potential of the entire infrastructure. It helped my learning immensely to build this workflow out step by step, following along as you explained each piece. Bravo, well done, and I hope you continue to produce more such great content!

mikedoug

I just wanted to express my gratitude for your tutorials! Instead of simply providing workflows to download, you're taking the time to explain how each node functions. Thanks to your guidance, I'm starting to grasp how ComfyUI works, and it's making a big difference. Keep up the amazing work! Thank you!

dankazama

00:06 Flux simplifies image manipulation and creative generation with custom nodes and workflows.
02:31 Flux with LLM for AI image generation
08:11 ComfyUI workflow enables flexibility and compatibility
11:26 Workflow control and logic explanation
18:05 Configuring text generation with LLM for quality outputs
20:37 Workflow tutorial on flux conditioning with LLM for text summarization process
26:28 Adding switch logic control to summary system node
30:22 Using LLM in a workflow for flexible modifications
37:02 Workflow tutorial highlights for ComfyUI: Flux with LLM
41:20 Flux model sampling and node adjustments
48:00 Using max shift for stable and consistent results
51:48 Custom conditioning and style transfer with Flux and LLM
59:27 Detailed workflow for in-painting and upscaling process
1:03:08 Choosing specific upscalers is crucial for maintaining image consistency and text retention.
1:09:29 Use specific settings for different upscale groups
1:14:19 Flux model causing VRAM issues and slowing down generation

quickcinemarecap

Could you help me? I can't find the .json file for the giant workflow you showed at 2:00.

RodrigoSantos-gwmw

I managed to get the workflow fully assembled this weekend, and everything works great. This is the most interesting video I've ever seen on this topic. And thank you for helping me find the lowercase-letter bug in my workflow. Looking forward to more videos.

iArthurA

Exactly what I was looking for. Good work!

Novalis

I tried the workflow and it works well, except that the image description is extremely censored. That's fine if you intend to show a cat or a woman in everyday clothes, but the moment you have to describe something normal but with less textile, Ollama totally ignores it. I had a picture of a barbarian woman riding a horse, dressed in a long leather skirt and a leather top, but the AI insisted on telling me that she was wearing full armor; and when I tried to modify the prompt to say that she is wearing a leather bikini top, the AI just said yes but insisted on adding a leather jacket! That was funny and ridiculous. I guess this AI has been designed by a group of Amish and Mormons under the guidance of a very concerned preacher!

kukipett

There is so much to unpack here, for both experienced users and newcomers! I'm midway through building it, but I hit a brick wall with Layer Styles. I noticed it while adding the Purge VRAM node: it was not coming up, and in the Manager it says "Import Failed". I tried everything to get it running: I uninstalled and reinstalled it through the Manager, tried "Fix" through the Manager, uninstalled it and did a git clone into the custom nodes folder, and installed the pip dependencies inside the node folder. I'm just not sure what else to try... Any ideas?

dkamhaji

It seems like every time I get a little more comfortable in Comfy, a new development comes along and makes us start over. T_T
Thank you so much for this; learning about all the brand-new things in AI image generation is not so easy!

xandervera

Wow! This gives me a headache, but a good one; I've learned so much in just this video. So many things seem so much simpler with just some basic knowledge that many other YouTubers seem to ignore in their tutorials. All that logic is great for simplifying the workflow.
It was great to be able to understand most of it even though I've been learning ComfyUI for less than a week.
The video was really fast, so I slowed it down to 50% to have more time to follow what you were doing.
Thanks a lot for all this knowledge, which surely took a lot of time to prepare and put into a video.

kukipett

I've been able to build it, but I'm getting some issues. Is it possible to post screenshots of the node groups so we can compare and see what went wrong (like the model sampling update in the comment)?

Willer

There is a cut at 1:04:36 where you left something out. You added a ModelSamplingFlux node and some other nodes, but you don't go back and explain it. Can you tell us what happened? Which model do you drop in there?

Novalis

At 22:16 you tell us to add a GetNode "Text Boolean", but there is no such thing in my dropdown menu. What am I doing wrong? :(

ernatogalvao

Please include ControlNet, like Union or the best-performing ones, and also the role of MistoLine in Flux. Thanks!

eveekiviblog

Hey controlaltai, could you make an updated animation video on the best way to make an AI animation? There are tons of better software options and workflows now!

iseahosbourne

Great, I was looking for a good workflow for Flux. I appreciate that you go through the whole thing.

christofferbersau

39:32 you "select input 3" to disable LLM. but your TEXT LLM is enabled. Why?
i set my workflow like yours (i hope), but my disable LLM do disable all LLM. (thats what it should do, right?)

hoylogefkal

Once again, thanks so much for this; I am learning so much. I still have a few questions, and if you would extend your generosity, I would like to pick your brain: if the 2.52x step renders too soft an image, what would be a reasonable replacement for the realvizxlv40 upscaler? The 1.25x renders consistently perfect results, but I find the 2.5x (and consequently the 5x) a bit too soft. Thanks once more.

RafaPolit

Finally got to implementing all the bells and whistles. I added a small section with the main Flux configurations, like Guidance, Steps, Sampler, Scheduler, and seed, extracted from the complex flow. Also, I have a 12 GB 3080 Ti GPU, so GGUF was a must for me. I'm able to use Q8 with both of your recommended LoRAs at about 3 to 4.5 sec/it on Euler/Beta, so I'm really happy with it. As you said, I don't think the GGUF switch merits another video; it's just replacing the UNet Model Loader and the Dual CLIP Loader with the GGUF versions, and that's it. At least I hope so. I'm really happy with the results. Thanks a bunch!

RafaPolit