Stable Diffusion Dreambooth Tutorial - Just like Corridor Crew!

Ever wanted to add things to Stable Diffusion, just like in that Corridor Crew video? Well, now you can with this Textual Inversion-based software called Dreambooth SD Optimised! Quick and easy to run on Linux and even Microsoft Windows. No coding required! Local install!

Update: Checkpoint loading now seems to be fixed in AUTOMATIC1111's repo, so no restarts required :)

Update 2: The diffusers Dreambooth is closer to the Google Dreambooth paper. Looks like I may have to start looking for a diffusers-based WebUI ;)

Chapters:
0:00 Introduction
2:43 Setup
5:18 Preparation
9:53 Training
14:28 Inference
19:20 Q&A
27:20 Less than 24GB VRAM?
28:39 Next steps

Links:

Update!
CPU version also available - no GPU required!
Takes just 8 hours and needs at least 32 GB RAM (with a 10 GB swapfile, as memory usage apparently goes to 35 GB)
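
If you want to check up front whether a machine meets that RAM + swap budget, here is a quick sanity check. This is not part of the tutorial itself, just a small sketch using the psutil library (assumed installed):

```python
import psutil  # pip install psutil

GiB = 1024 ** 3
ram_gib = psutil.virtual_memory().total / GiB
swap_gib = psutil.swap_memory().total / GiB

print(f"RAM:  {ram_gib:.1f} GiB")
print(f"Swap: {swap_gib:.1f} GiB")

# CPU training reportedly peaks around 35 GB, so check RAM + swap together
if ram_gib + swap_gib < 35:
    print("Warning: less than ~35 GiB of RAM + swap; training may be killed by the OOM handler.")
```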

Update 2!
Diffusers to ckpt converters:
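As a rough idea of how such a converter is typically run from Python: the script name, flags, and paths below are assumptions based on the huggingface/diffusers repo and may differ between versions, so treat this as a sketch rather than the exact linked tool.

```python
import subprocess

# Convert a diffusers-format Dreambooth output folder into a single .ckpt that
# AUTOMATIC1111's webui can load. Paths are placeholders; the script name and
# flags are assumptions and may vary between diffusers releases.
subprocess.run(
    [
        "python", "scripts/convert_diffusers_to_original_stable_diffusion.py",
        "--model_path", "path/to/dreambooth-output",      # diffusers folder with the .bin weights
        "--checkpoint_path", "path/to/my-subject.ckpt",   # single-file checkpoint for the webui
        "--half",                                          # optional: save fp16 to halve the size
    ],
    check=True,
)
```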
Comments

Hi! Joe Penna here from the original repo.

I just trained myself on the class names "dog" and "§°ø".

The images look equally good.

I'm starting to get the feeling that regularization does nothing at all.

MysteryGuitarMan

if you plan on daisy chaining, it's good practice to back up your checkpoint before training your next set of images, so you can always "go back" to that backup checkpoint and won't have to re-train everything in case you mess up a training set for some reason, which I'm already predicting for myself :)

Conrox
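
For the daisy-chaining tip above, a timestamped copy before each training run is enough. A minimal sketch (file and folder names are placeholders):

```python
import shutil
import time
from pathlib import Path

def backup_checkpoint(ckpt_path: str, backup_dir: str = "ckpt_backups") -> Path:
    """Copy a checkpoint to a timestamped backup before the next training run."""
    src = Path(ckpt_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest

# e.g. backup_checkpoint("models/Stable-diffusion/my-subject.ckpt")
```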

always fascinating how you make it look so easy, and here I am having just spent 2 full days to finally get the "regular" textual inversion to work :)

Conrox

Double clicking on a file to open it? Wow, you lost me there, it's getting too complicated

sergentboucherie

I design movie posters (you know, for platforms too) for a living. Whether this becomes a superpower or the end of my career, this should become my primary work tool from now on.

carlosfernandez

Thanks so much for this! After reading the back & forth here between you and Joe Penna in the pinned comment, and reading his README notes, I couldn't wait till the weekend to give this a try & wound up pulling an all-nighter to get it (messily, with lots of just typing away in the terminal like a savage) working in Colab. I used the process the two of you describe, but I used a thousand generated images that varied from "face" closeups to "person" full body portraits for my regularization images, and I used nonsense words for *both* the token *and* the class. The result is a checkpoint that is *highly* editable (I can make things like dolls, stuffed animals, cross-stitch patterns of my subject without any placement tricks or inpainting or prompt editing) but also doesn't seem to have skewed anything else towards the subject (i.e. I can use the customized checkpoint with a prompt like "face" or "woman" and there's no resemblance to my token, even though that was the type of image I used for training). This is a *huge* improvement over embeddings for most use cases. Thanks again, your videos are great, love the "speedy but with all the details" format and the courtesy of a text file I can pause on when I just want to look at all your steps.

hntrssthmpsn

Thank you so much for sharing these amazing videos!
I managed to get an older i9 system of mine up and running, so I am going to see if I can get this working (using your other video for the Windows installation).
I geek out every time you share a new video and this one is no exception!
BTW, your video avatars are freaky - in a good way.

carledwards

I've been getting into AI art recently and it's pretty cool, but now it's clear that my setup can't possibly handle it. Great videos.

jhonm

Hey bud, can you tell us about the AI narrator and video?

bilalalam

You make great videos, with good explanations! It'd be really handy if the text you showed on screen was in a file on github, with a link pasted in the description! Would you consider making a video on a nice workflow to fine tune stable diffusion on 50 to 1000 labeled images in a variety of classes? I want to take stills from movies like Tron Legacy, or TV shows like The Expanse or Altered Carbon, hand label them, then tune it for them! Maybe using CLIP to initially generate a description of each still, like the interrogate button in webui? And maybe cropping them manually to 512x512 first, so that the descriptions are more accurate to what's actually left in the frame? Just a few ideas! 🙂

luke.perkin.inventor
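
On the 512x512 cropping idea in the comment above, a minimal batch resize-and-center-crop with Pillow could look like this (folder names are placeholders):

```python
from pathlib import Path
from PIL import Image  # pip install Pillow

SRC, DST, SIZE = Path("stills"), Path("stills_512"), 512   # placeholder folders
DST.mkdir(exist_ok=True)

for p in list(SRC.glob("*.jpg")) + list(SRC.glob("*.png")):
    img = Image.open(p).convert("RGB")
    # Scale the short side to 512, then center-crop the long side
    scale = SIZE / min(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    left = (img.width - SIZE) // 2
    top = (img.height - SIZE) // 2
    img.crop((left, top, left + SIZE, top + SIZE)).save(DST / p.name)
```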

How do you convert diffusers .bin to .ckpt for automatic 1111?

Weatwagon

Btw, I personally changed the resolution used for training in the yaml file and set it to 384.
Bumped the batch size to 4 (it's right on the edge of my 3090 lol) and it trains way faster (and better imo, because it finishes more of the "epoch" rounds).

At 384 I get perfect results, and it's way faster to train.

Granted, my training images include many close-ups of faces. I even use cropped faces with just the eyes, nose and mouth. Even reduced to 384 internally, the results are phenomenal. I get the most realistic images ever.

TransformXRED
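
If you'd rather locate those settings programmatically than hunt through the yaml by hand, a small finder sketch follows. The key names "size" and "batch_size" and the config filename are assumptions; the repo's actual config may name them differently.

```python
import yaml  # pip install pyyaml

def find_keys(node, targets, path=""):
    """Recursively print the path and value of every key whose name is in `targets`."""
    if isinstance(node, dict):
        for k, v in node.items():
            p = f"{path}.{k}" if path else k
            if k in targets:
                print(f"{p}: {v}")
            find_keys(v, targets, p)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            find_keys(v, targets, f"{path}[{i}]")

with open("v1-finetune_unfrozen.yaml") as f:   # hypothetical config filename
    cfg = yaml.safe_load(f)

find_keys(cfg, {"size", "batch_size"})
```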

This is a great help!! Btw, I was also just thinking of using GaGAN for some talking-head stuff!!

swannschilling

You might want to recommend using mamba as a replacement for conda (you can alias the name, so if you have a conda typing habit you don't have to change it). The benefit is drastically faster dependency solves (from 5-10 minutes down to 1-5 seconds).

tomm

Does it work the same with the SD 1.5 version?

dan

Thanks again for this one.

I got funny results at first when I was loading the models in AUTOMATIC1111; it mixed the two different people I had in two different checkpoints.

Some questions.

I tested one with Tom Segura, I didn't use any identifier (like sks, I removed it from the personalization script), and just used his full name as the class. It still worked fine. I still don't fully understand the reason for the identifier. Is it for when we train a specific "object" that can have many different forms? Like a duck: we don't want only ducks looking like those in our training images, so the identifier in the prompt is there to really target this particular one, right?

So Tom Segura is unique, and using the full name as the class should be sufficient and/or still a correct way to do the training? I'm asking because I don't want to mess it up; even if it has worked so far, I don't know if I introduced some potential problems by doing it like that.

Is the pruning thing throwing away useful info? Or is it lossless (in the sense that the results are visually similar, like a good JPEG compression for an image, for example)? Lol, I have 7 of these 11 GB files now, so the script you shared will help me for sure.

I used faces + head and shoulders + sometimes full body for my training images for one person. And I used 300 images of "men" in various situations downloaded from Google, everything untouched. I saw someone mentioning that these pictures shouldn't be resized or reframed, and to get 600 to 1200 of them.

It worked, and I got some very good images of the person with half of his body (same shirt) on a bike with his hands on the handlebars. Granted, I trained it for 8000 steps. But again, was I lucky? Or did these random images help me get better results like this?

Last one: can the checkpoint merging in AUTOMATIC1111 be used to merge different ones? Like two containing a different trained face in each. I saw an interpolation slider there; wouldn't it mess with both of them when merged?

TransformXRED
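
On the pruning question above: the usual prune scripts keep only the model weights stored under "state_dict" (dropping optimizer and other training-only state at the top level) and optionally cast them to fp16, so generated images are essentially unchanged. A minimal sketch of that idea (file names are placeholders, not the exact script from the video):

```python
import torch

ckpt = torch.load("model.ckpt", map_location="cpu")   # full training checkpoint (~11 GB)
sd = ckpt["state_dict"]                                # model weights only; optimizer/EMA extras
                                                       # at the top level are simply not copied

# Cast fp32 tensors to fp16; non-float entries (e.g. int buffers) are kept as-is
pruned = {
    k: (v.half() if isinstance(v, torch.Tensor) and v.dtype == torch.float32 else v)
    for k, v in sd.items()
}

torch.save({"state_dict": pruned}, "model-pruned.ckpt")
```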

For your Q&A
The first point doesn't work (at least for me).

I did that, and I used different identifiers and class names for two people, but it mixes both of the faces when I prompt the one that was trained first. It's like the first one got replaced (not completely) during the second training, even with totally different identifiers and class names used. I spent so many hours trying to figure out if I was doing something wrong (maybe I did). But it doesn't work for me.

I'm starting to think that these checkpoints are like the ones made with Textual Inversion, the .pt ones. They just override the previous ones in some capacity.

What would be great with the AUTOMATIC1111 fork is to have the same system as the Textual Inversion one: you load different ckpt files, from different people (or styles), and you use the name of the file itself to invoke it in a prompt.

TransformXRED
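
On the checkpoint-merging question a couple of comments back: AUTOMATIC1111's weighted-sum merge is essentially a per-tensor interpolation between the two state dicts, which is exactly why both trained faces get diluted rather than coexisting. A rough sketch of that operation (file names are placeholders):

```python
import torch

# Weighted-sum merge of two checkpoints: alpha=0.0 keeps model A, alpha=1.0 keeps model B.
alpha = 0.5
a = torch.load("person_a.ckpt", map_location="cpu")["state_dict"]
b = torch.load("person_b.ckpt", map_location="cpu")["state_dict"]

merged = {}
for k in a:
    if k in b and torch.is_tensor(a[k]) and a[k].dtype.is_floating_point:
        merged[k] = (1 - alpha) * a[k] + alpha * b[k]  # interpolate float weights
    else:
        merged[k] = a[k]  # keep A's value for non-float buffers or keys missing in B

torch.save({"state_dict": merged}, "merged.ckpt")
```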

I feel a bit dumb atm (bear with me, I've only been playing with this for two days so far). These are two separate gits, yeah?
I've downloaded and used stable-diffusion-webui, which I can get running. But how do I tie Dreambooth into this? I'm missing that.
I can't find it anywhere in your video or pastebin where you actually say what we're supposed to do with the two separate projects to make them work together.

bryancorringham

2.1 GB!!! That's massive! If you just do the textual inversion repo you get a ~5 KB file that you can put in your embeddings folder and call and use normally.
Great vid on the process tho! Thank you for making this. I was very interested in seeing if this was any better than using the textual inversion I've been using, and it doesn't seem like it is for me.

xdeathknightx

Hello brother, how do you do the deepfake?

UstedEstaAqui