How to use TemporalNet (Stable Diffusion IMG2IMG VIDEO TO VIDEO ANIMATION)

A quick guide on how to use TemporalNet without the Python script. I made this video because I got many questions on how to use TemporalNet, and there aren't really any proper guides for it online.

I'm not home for a few months, so I only had access to a laptop with low VRAM. Even just using the 3 ControlNet units pretty much maxed out my VRAM, so I couldn't get better-looking results for you. Generating a single frame also took over 5 minutes, so... yeah. The settings were not fine-tuned; I just went with the first ones I tried.

I would have made a more video-like guide, but due to laptop hardware and time limitations, it's mostly pictures.

I cropped the video beforehand using DaVinci Resolve Studio 18, then extracted frames with ffmpeg at 15 fps (the original was 30 fps). Then I just ran the img2img batch with the settings listed below.
For post-processing of the final clip, I used DaVinci's Deflicker and Flowframes to bring the frame rate back up to 30 fps.
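
If you want to script that extraction step, here is a minimal Python sketch wrapping ffmpeg. The file names (input.mp4, frames/) are placeholders, not the exact files from the video:

```python
# Rough sketch of the frame-extraction step; assumes ffmpeg is on PATH.
# "input.mp4" and "frames/" are placeholder names.
import pathlib
import subprocess

src = "input.mp4"                 # the pre-cropped clip exported from DaVinci Resolve
out_dir = pathlib.Path("frames")
out_dir.mkdir(exist_ok=True)

# Drop the 30 fps source to 15 fps and dump numbered PNGs for the img2img batch.
subprocess.run(
    ["ffmpeg", "-i", src, "-vf", "fps=15", str(out_dir / "frame_%05d.png")],
    check=True,
)
```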

As you can see, this method allows for quite drastic style changes while staying fairly consistent. I believe this is currently the best method I have seen for denoising strength 1 (except maybe Tokyo_Jab's method, but this one allows for longer videos and easier frame-by-frame editing, in my opinion). Another option that might help is reference_only, but I couldn't get satisfactory results from the little testing I did combining it with the TemporalNet method.

I personally always use two TemporalNet units: current and loopback.

General Tips for making videos:
- Background masking/removal helps a lot when trying to keep things consistent; the actor/actress in the video is usually fairly simple to keep consistent (see the background-removal sketch after this list)
- Recommended (optional) post-processing methods: DaVinci's Deflicker, Flowframes, (EbSynth)
- It's also recommended to go back and fix select frames afterwards; sometimes hands turn into cloth, etc.
- Use Multi-ControlNet. Some combos that I have found great are:
-- normal_bae (or depth) + softedge_hed (or lineart_realistic) + canny (thresholds around 35-45) + openpose_face
-- openpose_full, normal_bae, canny
-- normal_bae, openpose_full
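
For the background masking/removal tip, one possible way to batch it is with the third-party rembg package; this is just a sketch under that assumption, not the tool used in the video:

```python
# Sketch only: batch background removal with the rembg package (pip install rembg).
# This is one option for the masking tip above, not necessarily what was used in the video.
from pathlib import Path

from PIL import Image
from rembg import remove

src_dir = Path("frames")         # extracted input frames
dst_dir = Path("frames_masked")  # frames with the background removed
dst_dir.mkdir(exist_ok=True)

for frame in sorted(src_dir.glob("frame_*.png")):
    img = Image.open(frame)
    cut = remove(img)            # returns an RGBA image with only the subject kept
    cut.save(dst_dir / frame.name)
```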

As for the weights, they vary quite a lot, but I usually start with normal_bae around 0.55, softedge_hed around 0.45, canny around 0.4 and openpose_face around 0.35. Then TemporalNet (current) at 0.7-1.0 and TemporalNet (loopback) at half of current.
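
Written out as a starting point (illustrative values only, taken from the paragraph above; the exact weights still need tuning per clip):

```python
# Typical starting weights from the description above (illustrative only, tune per clip).
controlnet_weights = {
    "normal_bae": 0.55,      # or depth
    "softedge_hed": 0.45,    # or lineart_realistic
    "canny": 0.40,
    "openpose_face": 0.35,
}

temporalnet_current = 0.85                      # usually somewhere in the 0.7-1.0 range
temporalnet_loopback = temporalnet_current / 2  # loopback at half of current
```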

Since someone will always ask anyway, here is the generation data with the ControlNets used:

Positive:
(masterpiece:1.4, best quality), (intricate details), unity 8k wallpaper, ultra detailed, beautiful and aesthetic, Korean girl dancing, half body, brunette, brown hair, white clothing, white sports bra, white top, navel, belly,

Negative:
(worst quality, low quality:1.4), (zombie, sketch, interlocked fingers, comic), (mask:1.2), blurry, high contrast

Steps: 24,
Sampler: Euler,
CFG scale: 7,
Seed: 4174458460,
Size: 576x1024,
Model hash: 77b7dc4ef0, Model: meinamix_meinaV10,
Denoising strength: 1,
Clip skip: 2,

ControlNet 0: "preprocessor: openpose_full, model: control_v11p_sd15_openpose [cab727d4], weight: 0.7, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: True, control mode: Balanced, preprocessor params: (512, -1, -1)",

ControlNet 1: "preprocessor: none, model: diff_control_sd15_temporalnet_fp16 [adc6bd97], weight: 0.6, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: True, control mode: Balanced, preprocessor params: (-1, -1, -1)",

ControlNet 2: "preprocessor: none, model: diff_control_sd15_temporalnet_fp16 [adc6bd97], weight: 0.35, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: True, control mode: Balanced, preprocessor params: (-1, -1, -1)"
Version: v1.3.2
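
If you'd rather drive this from a script than from the WebUI batch tab, here is a rough outline of the same settings sent to AUTOMATIC1111's img2img API. Treat it as a sketch: the ControlNet "args" keys (and whether the image field is called "image" or "input_image") vary between extension versions, and which frame each TemporalNet unit receives (current input vs. previous output) follows my reading of the workflow above.

```python
# Rough outline of sending the settings above to AUTOMATIC1111's /sdapi/v1/img2img API.
# The ControlNet "args" keys vary between extension versions; verify against your install.
import base64

import requests

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

frame = "frames/frame_00001.png"        # current input frame (placeholder path)
prev_output = "output/frame_00000.png"  # previously generated frame for the loopback unit

payload = {
    "init_images": [b64(frame)],
    "prompt": "(masterpiece:1.4, best quality), (intricate details), unity 8k wallpaper, "
              "ultra detailed, beautiful and aesthetic, Korean girl dancing, half body, "
              "brunette, brown hair, white clothing, white sports bra, white top, navel, belly",
    "negative_prompt": "(worst quality, low quality:1.4), (zombie, sketch, interlocked fingers, "
                       "comic), (mask:1.2), blurry, high contrast",
    "steps": 24,
    "sampler_name": "Euler",
    "cfg_scale": 7,
    "seed": 4174458460,
    "width": 576,
    "height": 1024,
    "denoising_strength": 1,
    # Checkpoint and clip skip from the generation data above.
    "override_settings": {"sd_model_checkpoint": "meinamix_meinaV10",
                          "CLIP_stop_at_last_layers": 2},
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {"module": "openpose_full",
                 "model": "control_v11p_sd15_openpose [cab727d4]",
                 "weight": 0.7, "pixel_perfect": True, "image": b64(frame)},
                # TemporalNet "current" unit (assumption: fed the current input frame).
                {"module": "none",
                 "model": "diff_control_sd15_temporalnet_fp16 [adc6bd97]",
                 "weight": 0.6, "pixel_perfect": True, "image": b64(frame)},
                # TemporalNet "loopback" unit (assumption: fed the previous generated frame).
                {"module": "none",
                 "model": "diff_control_sd15_temporalnet_fp16 [adc6bd97]",
                 "weight": 0.35, "pixel_perfect": True, "image": b64(prev_output)},
            ]
        }
    },
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload, timeout=600)
r.raise_for_status()
images = r.json()["images"]  # base64-encoded results
```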
Comments

Thank you for this workflow. It really helped me with the consistency of my generated videos. It would be great if you could post more videos on generating video-to-animation with perfect consistency in faces, hands, and clothing! Looking forward to it.

AniDanceAI

I have no idea why it works, but it does :D

skyrimguy

Thanks for the guide. It's very nice.

linghuxiangjiang

If you don't mind, can I ask where you got the original video from, and where I can find videos to try this on?

SajidHussain-fuph

Thank you, teacher. I'll try it the way you explained; if I have questions or something doesn't work, may I ask in the comments?

kongvage

Thanks for the guide. I tried the exact settings in AUTOMATIC1111 but can't get good results with denoise = 1. The only difference between my settings and the ones you shared is the checkpoint model. Any hints on what else I should be checking?

hunzaXpress

Finally some decent study around TemporalNet =D Thanks.
For the current unit (0:13), do you put in any single-image input? Or is it the "generated input image" you refer to at 0:26?
For loopback (0:22), I understand we can leave it empty or set a copy of the img2img input folder. But are you using img2img in batch mode in the first place? What if we use img2img without batch, with only 1 picture as input? Any idea what we should put as the loopback input folder in that case?

susiealm

Why do you need anime if you already have anime? :)

x_mouzzer_x