Training Midjourney Level Style And Yourself Into The SD 1.5 Model via DreamBooth Stable Diffusion


Playlist of #StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, #DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img:

Training dataset used in the video:

Style-trained model .safetensors (does not include myself - based on the SD 1.5 pruned ckpt):

2400 Photo Of Man classification images:

0:00 Midjourney level style for free
0:35 Most important part of teaching a style into Stable Diffusion
1:23 I trained myself into the model and got amazing stylized results of myself
1:30 First part of animation generation just like Corridor Crew did in their anime video
1:37 Used DreamBooth extension of Web UI on RunPod for training
1:57 Why the training dataset is 1024x1024 pixels
2:25 Which rare token and class token are chosen and why
2:54 Which dataset I have used to train myself
3:05 What kind of training dataset you need to generate consistent animation like Corridor Crew
3:27 A better way to connect your RunPod web UI instance
3:43 Which DreamBooth settings I have used to train myself into the base model
4:10 A good explanation of max resolution settings
4:31 Advanced tab settings of DreamBooth extension
5:15 Concepts tab of DreamBooth training
5:35 FileWords - image captions explanation
7:31 Where to see source checkpoint used for training
7:49 Why to do separate training instead of multi-concept training
8:08 Settings used for style training
9:20 Analysis of the model after style training on top of my self-trained model
10:25 x/y/z plot testing for Brad Pitt face to see overtraining effect
11:03 Castle in a forest test to verify one more time that the model is not overtrained
11:32 I had to do another training of my face
12:03 Final x/y/z plot comparison to decide best checkpoint
13:05 Analysis of final x/y/z plot
14:48 What you can do by using this methodology I explained
15:05 How to generate good quality distant shots with good faces
15:10 Very important points for selecting a good face training dataset
15:25 Why Stable Diffusion can't produce good faces in distant shots
15:33 How to do inpainting to fix your face in distant shots
15:48 What settings to use for inpainting to fix faces
16:17 How to upscale your image
16:30 GFPGAN to further improve faces

The Midjourney level style provides an excellent starting point for creators looking to develop unique AI-generated animations. The most crucial part of teaching a style to Stable Diffusion is a comprehensive training dataset. To that end, the creator used a training dataset of 1024x1024 pixel images, which offers sufficient resolution for high-quality animation generation.
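
As an illustration of that preparation step, the sketch below center-crops and resizes a folder of source photos to 1024x1024 with Pillow. The folder names are hypothetical and this is only an assumed workflow, not necessarily the exact script used in the video.

# Minimal dataset-preparation sketch (assumed workflow):
# center-crop each source photo to a square, then resize it to 1024x1024 for DreamBooth training.
from pathlib import Path
from PIL import Image

SRC = Path("raw_photos")      # hypothetical input folder
DST = Path("dataset_1024")    # hypothetical output folder
DST.mkdir(exist_ok=True)

for img_path in sorted(SRC.glob("*.jpg")):
    img = Image.open(img_path).convert("RGB")
    side = min(img.size)                      # largest centered square that fits
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    square = img.crop((left, top, left + side, top + side))
    square.resize((1024, 1024), Image.LANCZOS).save(DST / img_path.name, quality=95)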

The choice of dataset is critical for generating consistent animations like Corridor Crew's. To ensure success, the creator selected a rare token and a class token that best suited their needs, as illustrated in the snippet below. The training dataset should be carefully curated to include a diverse range of images and styles to produce the desired outcome.
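
For illustration only, here is how a rare instance token and a class token typically pair up in a DreamBooth concept definition. The field names follow the common diffusers-style concepts layout and are an assumption; the Web UI DreamBooth extension labels these fields slightly differently in its Concepts tab.

# Illustrative DreamBooth concept definition (assumed field names, diffusers-style).
concept = {
    "instance_prompt": "photo of ohwx man",  # "ohwx" = rare token, "man" = class token
    "class_prompt": "photo of man",          # prior-preservation / regularization prompt
    "instance_data_dir": "dataset_1024",     # the 1024x1024 training photos (hypothetical path)
    "class_data_dir": "reg_photo_of_man",    # e.g. the 2400 "photo of man" classification images
}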

DreamBooth Extension Settings and Features

The DreamBooth extension offers a variety of settings to optimize the training process. The creator adjusted the max resolution and the Advanced tab settings to fine-tune training, and used the Concepts tab together with the FileWords feature to attach image captions and improve output quality.
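
FileWords reads a caption from a .txt file with the same name as each training image. As a small, hedged illustration of the caption cleanup described in the comments below, this sketch rewrites generic captions so that "a man" becomes the "ohwx" instance token; the folder name is an assumption.

# Caption cleanup sketch for FileWords (assumed layout: one .txt caption per image).
# Mirrors the sed one-liner quoted in the comments: replace "a man" with the rare token "ohwx".
from pathlib import Path

caption_dir = Path("dataset_1024")  # hypothetical folder holding images and their .txt captions

for txt in caption_dir.glob("*.txt"):
    text = txt.read_text(encoding="utf-8")
    txt.write_text(text.replace("a man", "ohwx"), encoding="utf-8")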

Training and Analysis

Separate training runs for different concepts are recommended to achieve the best results. After style training, the creator analyzed the model and ran x/y/z plot tests on a Brad Pitt face prompt to detect any overtraining effects. They also ran a "castle in a forest" prompt to verify once more that the model was not overtrained.
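
The x/y/z plot script is built into the Web UI, but a similar checkpoint-versus-prompt comparison can also be scripted against the Web UI API. The sketch below is an assumed approach using the /sdapi/v1/options and /sdapi/v1/txt2img endpoints (the Web UI must be started with --api); the checkpoint names are placeholders.

# Sketch of an x/y-style comparison over checkpoints and test prompts via the
# AUTOMATIC1111 Web UI API (assumed local instance at the default port).
import base64
import requests

URL = "http://127.0.0.1:7860"
checkpoints = ["style_epoch_50.safetensors",   # hypothetical checkpoint names
               "style_epoch_100.safetensors"]
prompts = ["photo of brad pitt, sharp focus",  # overtraining probe used in the video
           "a castle in a forest, epic details"]

for ckpt in checkpoints:
    requests.post(f"{URL}/sdapi/v1/options", json={"sd_model_checkpoint": ckpt})
    for i, prompt in enumerate(prompts):
        r = requests.post(f"{URL}/sdapi/v1/txt2img",
                          json={"prompt": prompt, "steps": 25, "seed": 42})
        with open(f"{ckpt}_{i}.png", "wb") as f:
            f.write(base64.b64decode(r.json()["images"][0]))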

Improving Quality and Fixing Issues

One challenge in AI-generated imagery is producing high-quality shots where the face is far from the camera. Stable Diffusion often struggles with these distant face shots, but inpainting can be used to fix the faces. The creator used specific inpainting settings, upscaled the image, and then applied GFPGAN to further improve the faces.
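
In the video the face fix and the GFPGAN pass are done inside the Web UI; for readers who prefer scripting, the sketch below applies GFPGAN's Python API to a finished render. The file paths and upscale factor are assumptions, and the call follows the usage documented in the GFPGAN repository rather than the video's exact settings.

# Standalone GFPGAN face-restoration sketch (assumed paths).
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # hypothetical local path to the pretrained weights
    upscale=2,                    # also upscales the surrounding image 2x
    arch="clean",
    channel_multiplier=2,
)

img = cv2.imread("distant_shot.png", cv2.IMREAD_COLOR)  # render with a blurry distant face
_, _, restored = restorer.enhance(img, has_aligned=False, only_center_face=False, paste_back=True)
cv2.imwrite("distant_shot_restored.png", restored)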

Conclusion

The methodology outlined in this article offers a comprehensive approach to mastering Midjourney level style and Stable Diffusion for AI-generated animation. By carefully selecting the training dataset, optimizing settings in DreamBooth, and employing advanced techniques such as inpainting and GFPGAN, creators can generate high-quality animations and images that captivate audiences.
Comments

Please join the Discord, mention me, and ask me any questions. Thank you for the likes, subscribes, shares, and Patreon support. I am open to private consulting with a Patreon subscription.

Cat:
fantasy rpg a cat amazing intricate vivid colors neon epic details ultra detailed artstation bbuk aesthetic cinematic lightning
Negative prompt: low bad blurry amateur sketch

Myself:
photo of (ohwx man) by bbuk aesthetic, intricate, cinematic, hd, hdr, 8k, 4k, sharp focus, canon, photoshoot, fit, athletic

Negative prompt: low, bad, blurry, grainy, worst, deformed, mutilated, fat, ugly, amateur

SECourses

Don't mind the comments about the way you speak or your accent. I absolutely appreciate your content and your dedication to teaching, but also your willingness to learn.
Keep up the amazing content!

eyoo

Nice work, the best content about Stable Diffusion 🔥🔥🔥🔥🔥🔥

OficialMag

This series is great. Thanks! Became a Patreon yesterday.

disco.volante

Thanks a lot, trying now with a green snake anime style. Also just became a Patreon supporter, thanks for the datasets.

brofessor

Sorry if this is a stupid question, I’m fairly new to this and still putting things together

Once you train your 20-50 face/person images with regularization images

And do the same for style/aesthetic

How do you combine both tokens to run both checkpoints when training separately?

Following the Corridor video, I saw that they trained some photos of Niko with 10,000 reg images

Then trained an anime style with 10,000 aesthetic images

Then somehow crossed the two tokens together to create images of their subject and style, like you did in this video, then retrained the new images back into the style + aesthetic reg images to further isolate the training

I'm confused about how you guys got the tokens combined in the first place without merging and losing data

I’ll keep watching your videos because you do an amazing job explaining, I’m just a slow learner! Thanks :)

funnyknowledge

Thanks for your tutorial, please do a video about how to train LoRA in Google Colab

omarch

Regarding captions: I had great results with filewords for the instance prompt, but only when also using filewords for the class prompt. It takes a lot of time, though, as you can’t share class images between different trainings, but it was worth it in my case.

disco.volante

Thank you for this informative tutorial video. I would like to ask how long it takes to train the style model.

momensree

6:05 For filewords:
The captions you get will be generic, like "a man with glasses is smiling for the camera while holding a cell phone in his hand and wearing a blue shirt".
You need to replace all occurrences of "a man" or "a woman" with the token you are trying to train. For this example, I used "ohwx" as the token. This should also be unique and not a common word, to avoid clashes with tokens already within the model. To perform this replacement, run this terminal command within the caption output directory:
sed -i 's/a man/ohwx/g' *.txt (linux command)

Windows command:
Get-ChildItem -Filter *.txt | ForEach-Object { (Get-Content $_.FullName) -replace "a man", "ohwx" | Set-Content $_.FullName }


This replaces every occurrence of "a man" in the caption text files with "ohwx".

Sergiosvm

Maybe an idea is to first train a dataset with limited poses like the one you provided with photos of yourself. Then create some shots with distant faces and fix them through inpainting. Finally, create another dataset adding the augmented images that contain distant shots of your face and fine-tune the model or train from scratch. Would love to see the results of that.

eyoo

Hello! Assuming someone wants to generate a style dataset for a different model, for example 3d art style, would there be a way to batch generate a bunch of different photos using the 2000+ words you used, or would you need to generate them all one by one? And if they have to be done one by one, do you recommend generating one photo of each item/word (one cup, one vase, one potted plant), or generating multiple photos of each item (20 cups, 20 vases, 20 plants)?

davii.d

Does your patreon contain an already trained model without your face? (sorry watched the vid an hour ago and forgot)

EricGormly

A question: will you make a video in the future on how to use the new version of Kohya on RunPod? They updated the main repository to support Linux, so the old Linux repository is deprecated, and I would love to know how to use it while avoiding errors.

norianorlan

Where did you obtain a list of thousands of different things for your style dataset? Did you find a large list of unique things? I am interested in generating my own style dataset, but I use a method that is very different from yours. Instead, I use a combination of one-word prompts, photographs, LoRAs, ControlNet, and a Python script I wrote that works with the Automatic1111 webui's API. I've curated a list of about 700 useful nouns that exist in the SD v1.5 vocabulary of 50,000 words. Where did you find a large list of thousands of things for your dataset?

gunnarquist

I think yesterday's update broke the classification image ingest; now every time I train, it refuses to load the classification images. Is this a bug?

cfernandomoran

Do you realize how annoying it is to cut everything you say into the sentence ever spoken. Leave some space between subjects or something, I have no idea what I’m seeing or hearing and I’m only 20 seconds in.

beatsbywoods

What's the thing with accelerating the audio or the video? Your editing is just awful. It makes it really not enjoyable to watch your videos, even with captions auto-enabled; besides, your accent is so hard to get. Man, I appreciate your content but it's hard to watch a 6 min video, not even thinking of watching an hour tutorial. It's kinda painful, was that intended? I don't believe so but then...

JaySolomonK