How To Use Mochi 1 Open Source Video Generation Model On Your Windows PC, RunPod and Massed Compute

Mochi 1 from Genmo is the newest state-of-the-art open-source video generation model, and you can use it for free on your own computer. It is a breakthrough comparable to the very first Stable Diffusion model, but this time for video generation. In this tutorial, I show you how to run the Genmo Mochi 1 video generation model locally on Windows with SwarmUI, the most advanced and easiest-to-use interface. SwarmUI is as fast as ComfyUI but as easy to use as the Automatic1111 Stable Diffusion web UI. Moreover, if you don't have a powerful GPU to run this model locally, I show you how to use it on the best cloud providers, RunPod and Massed Compute.

🔗 Public Open Access Article Used in Video ⤵️

Amazing Ultra Important Tutorials with Chapters and Manually Written Subtitles / Captions

Main Windows SwarmUI Tutorial (Watch To Learn How to Use)

How to install and use SwarmUI. You have to watch this to learn how to use it

Cloud Tutorial (Massed Compute - RunPod - Kaggle)

If you don't have a powerful GPU, or you want to rent a more powerful one, this is the tutorial you need

Free Kaggle Account Notebook for GPU-Poor

Installs the latest version of SwarmUI on a free Kaggle account

Works with Kaggle's dual T4 GPUs at the same time

Supports SD 1.5, SDXL, SD3, FLUX, Stable Cascade, and more

0:00 Introduction to the tutorial
1:44 How to download, install and use Mochi 1 on Windows
3:59 How to update SwarmUI to the latest version to be able to use Mochi 1
4:17 How to start SwarmUI on Windows
4:27 How to choose which GPU SwarmUI uses for generation
4:55 How to generate a video with Mochi 1 and what the best configurations are
6:45 Where I have shared all the prompts I used to generate the intro demo AI videos
7:30 Where to see the step speed of your video generation, and the speeds of an RTX 3060 and RTX 3090
8:04 How to activate my primary GPU while also generating on my secondary GPU
8:25 Why the queue system may not immediately start using your multiple GPUs and how to fix it
9:45 How to solve out of memory error by enabling VAE tiling
10:02 Which parameters are best for VAE tile size and VAE tile overlap
10:53 How to use Mochi 1 and SwarmUI on Massed Compute cloud service - you don't need a GPU for this
11:13 How to apply our SECourses coupon to get a genuine 50% discount on RTX A6000 GPUs
11:37 How to connect initialized Massed Compute and start using Mochi 1
12:23 How to update SwarmUI to the latest version on Massed Compute
12:51 How to start SwarmUI with a public share so you can access and use it directly in your own computer's browser
14:10 How to install and use Mochi 1 on RunPod with SwarmUI
16:45 How to monitor SwarmUI's backend loading on RunPod
17:35 How to properly terminate your RunPod pod and Massed Compute instance so you don't lose any money
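The chapters at 9:45 and 10:02 cover solving out-of-memory errors with VAE tiling. The idea is that the VAE decodes the latent video in small overlapping tiles instead of all at once, trading peak VRAM for extra compute and tile-blending. A minimal sketch of the tile layout arithmetic (the helper name and parameter names here are hypothetical, not SwarmUI's actual implementation):

```python
def tile_starts(size, tile, overlap):
    """Start offsets of overlapping tiles covering a dimension of `size`.

    Each tile is decoded separately; the overlapping borders are blended
    to hide seams. Larger overlap means more tiles, hence slower decoding.
    """
    if tile >= size:
        return [0]  # one tile covers everything, no tiling needed
    step = tile - overlap
    starts = list(range(0, size - tile + 1, step))
    if starts[-1] != size - tile:
        starts.append(size - tile)  # final tile flush with the edge
    return starts

# e.g. a latent frame 60 units wide, tile size 32, overlap 8:
print(tile_starts(60, 32, 8))  # -> [0, 24, 28]
```

This is why the video recommends tuning both tile size and tile overlap together: too small a tile or too large an overlap multiplies the number of decode passes.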

Model Architecture

Mochi 1 represents a significant advancement in open-source video generation, featuring a 10 billion parameter diffusion model built on our novel Asymmetric Diffusion Transformer (AsymmDiT) architecture. Trained entirely from scratch, it is the largest video generative model ever openly released. And best of all, it’s a simple, hackable architecture.

Alongside Mochi, we are open-sourcing our video VAE. Our VAE causally compresses videos to a 128x smaller size, with an 8x8 spatial and a 6x temporal compression to a 12-channel latent space.
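The compression factors above determine the latent tensor shape the diffusion model actually works on. A rough sketch of that arithmetic (the function name is hypothetical, and the exact rounding / causal padding of the first frame is an assumption of this sketch):

```python
def mochi_latent_shape(frames, height, width):
    # 8x8 spatial and 6x temporal compression into a 12-channel latent
    # space, per the VAE description above.
    return (12, frames // 6, height // 8, width // 8)

# A 480p (848x480) clip of 162 frames:
print(mochi_latent_shape(162, 480, 848))  # -> (12, 27, 60, 106)
```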

An AsymmDiT efficiently processes user prompts alongside compressed video tokens by streamlining text processing and focusing neural network capacity on visual reasoning. AsymmDiT jointly attends to text and visual tokens with multi-modal self-attention and learns separate MLP layers for each modality, similar to Stable Diffusion 3. However, our visual stream has nearly 4 times as many parameters as the text stream via a larger hidden dimension. To unify the modalities in self-attention, we use non-square QKV and output projection layers. This asymmetric design reduces inference memory requirements. Many modern diffusion models use multiple pretrained language models to represent user prompts. In contrast, Mochi 1 simply encodes prompts with a single T5-XXL language model.
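To make the non-square projection idea concrete, here is a parameter-count sketch. The dimensions are illustrative assumptions, not the released checkpoint's actual values; the point is only that both streams project into one shared attention space while the text side stays small:

```python
# Hypothetical dimensions for illustration only: the visual stream uses a
# larger hidden size than the text stream, and both project into a shared
# attention dimension so they can attend jointly.
visual_dim, text_dim, attn_dim = 3072, 1536, 3072

def linear_weights(in_dim, out_dim):
    # weight count of one linear projection (bias omitted)
    return in_dim * out_dim

# Square Q/K/V projections for the visual stream...
visual_qkv = 3 * linear_weights(visual_dim, attn_dim)
# ...and non-square ones for the text stream: this is what unifies the two
# modalities in one self-attention while keeping the text side cheap.
text_qkv = 3 * linear_weights(text_dim, attn_dim)

print(visual_qkv // text_qkv)  # -> 2: half the QKV weights on the text side
```

With these toy numbers the text stream's QKV weights are half the visual stream's; the nearly 4x overall parameter gap the text describes comes from the per-modality MLP layers as well, not the attention projections alone.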
Comments

Blogger, I was wondering if this model could use API calls to generate videos?

KuiDong-bu

I am on his Patreon; it's incredible how many one-click installers he makes. The value is insane, and you are supporting someone who produces some of the best tutorials in this field. Worth every penny. I'm sure an open-source image-to-video model will come soon; it's great to see the pace at which this field is moving forward.

marksutherland

I really tried to make this work on Massed Compute, but I get this error when I try to generate, and I have no idea what to do:
All available backends failed to load the model

PS: I used the model loader and loaded a model from Civitai.

metasaman

Thank you sir!
going to try to run this on my 8gig video card 😂

neokortexproductions

Didn't work for me. Installed SwarmUI v0.9.3.1 (2024-11-07 19:14:16).
Downloaded GB) in
Load model and applied settings as shown in video.
In Log > Debug (2024-11-10 15:48:53.281 [Info] User local requested 1 image with model
and nothing happens after that, no VRAM increase; I waited for 20 mins.

indecomsh

Nice, now we're back to the roots... open public posts... thanks for the video

solomslls

Thanks. Is it possible to generate a video using an existing image (img2img)?

sternkrieger

Thank you so much for your videos, Doc. We need to keep open source going to prevent eventual corporate monopoly.

gawni

I'm not sure what I'm missing, but the Text To Video options aren't visible in my SwarmUI. I'm on the newest version (0.9.3.1) and I chose the Mochi model in the Models tab.

Gothdir

Strange, I get only a black screen after video generation, no matter whether I use Tile Size and Overlap or not. What could be the problem?

arder_D

I tried it... I wouldn't call this "state of the art"... the output looks like AI videos from 2023

RikkTheGaijin