It’s over…my new LLM Rig

This runs faster than a Thunderbolt eGPU

#machinelearning #llm #softwaredevelopment

CHAPTERS
0:00 Unboxing
1:14 Installing RTX 4090
2:00 Setting Up Power Supply
2:46 Assembling GPU Dock
5:49 Software Installation
7:13 Running LLMs
9:59 Testing Larger Models
12:25 Testing Stable Diffusion
Comments

If you are running a GGUF model, Ollama will split the work, putting as many layers as it can on the GPU and the rest on the CPU. It will run slower, but still faster than CPU only.
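
For illustration, a hedged sketch of steering that split through Ollama's HTTP API (assuming the default endpoint on localhost:11434; num_gpu is the option that controls how many layers get offloaded, and support for it can vary by version):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "options": { "num_gpu": 20 }
}'

Setting num_gpu to 0 should force a CPU-only run.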

irrelevantdata

Some points here, Alex:
1. The power cable that splits into three plugs is a "dongle" (adapter): the 12VHPWR connector is too recent for most power supplies out there, so they supply an adapter fed by 3 or 4 of the 8-pin PCIe connectors.
2. The drivers for your GPU are provided by Nvidia themselves (just Google the Game Ready driver for the RTX 4090), as the AIB's (Gigabyte's) bundled drivers are outdated.
3. All modern GPUs (from 2010 onwards) are set to keep their fans at zero RPM below about 60 °C.

blackhorseteck

Mini PCs have revolutionized the boring PC market. The power they are able to squeeze inside these small boxes gives me hope for the future of computing.

blackhorseteck

A small piece of advice regarding Ollama: use --verbose.

Example: ollama run llama3.1:8b --verbose

Technically these commands, including what you ran, keep the model loaded. You have to unload it manually, or you can tell Ollama to unload the model shortly after you exit with /bye:

ollama run llama3.1:8b --verbose --keepalive 10s

--verbose will report the tokens per second generated.
--keepalive 10s will drop the model from memory after 10 seconds.
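
In the same spirit (again assuming the default API endpoint on localhost:11434), the Ollama FAQ notes you can also unload a model immediately by sending a request with keep_alive set to 0:

curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "keep_alive": 0}'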

serikazero

Chinese modders have transplanted the chip from an RTX 4090D onto a custom board (or a 3090 board) and soldered on 48 GB of memory. A real beast for an AI rig. However, I'm not sure about the warranty on such a Frankenstein card.

eternalnightmare

Serious question: why not just run off wall power directly if your UPS isn't big enough? This isn't a mission-critical server with important information; it doesn't need 24/7 operation during a power outage.

harryhall

Haven't seen anyone do a video using multiple video cards in parallel to run a large model. So that is my humble request. Love the content.

Krath

10:10 The GPU spikes while running Ollama could be due to:
Batch processing: The AI might be processing data in phases, causing short bursts of high GPU usage.
Resource optimization: The GPU is only used for certain tasks, leading to inconsistent usage.
Power management: The GPU adjusts its power consumption based on demand, resulting in spikes.
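
One way to watch those spikes outside of Ollama, as a minimal sketch (nvidia-smi ships with the NVIDIA driver):

nvidia-smi --query-gpu=utilization.gpu,power.draw,memory.used --format=csv -l 1

This prints GPU utilization, power draw, and VRAM use once per second, so you can see whether the bursts line up with token generation.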

TechGameDev

I experienced the same VRAM problems. I have a 32-thread i9 and 128 GB of system RAM; it runs the large models in slow motion, but it works. Small models run fast enough to use on the i9, but if a model fits in the GPU's 16 GB it's really fast, and enough to use as a service for a few clients. I'm using a mobile 4090, which is roughly a desktop 4080. For large models these days, it seems the Mac's unified RAM is the way to go: slower, but at least it runs and the wait is not too long.

autoboto

I think you meant 40 gigabits/s for Thunderbolt 4, not gigabytes.
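
For the arithmetic: Thunderbolt 4 carries 40 Gbit/s, and 40 Gbit/s ÷ 8 ≈ 5 GB/s, which is where the bits-versus-bytes mix-up comes from.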

monsterbeast

I've got a 1080 Ti from 2017 in a PC I built in 2016, overclocked to almost 5 GHz, 6 cores / 12 threads, with 64 GB of RAM.
I am currently running Llama 3.2 7B models lightning fast on this PC.

ldandco

I have a 3090 with 24 GB, and yes, you can run 13B models. Nice setup.

Heythisismychannel

Ollama is smart enough to use two GPUs simultaneously, so for that 40 GB LLM you really have to use two GPUs with 24 GB of VRAM each.
Once you go over GPU VRAM capacity, things spill into system RAM and through the CPU, which is terribly slow. At that point Apple Silicon Macs have the advantage of shared RAM, so something like a 64 GB Mac Studio "outperforms" a PC that lacks GPU VRAM.
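
Rough numbers behind that (an estimate, assuming a ~4-bit quantization at about half a byte per parameter): a 70B model needs roughly 70 × 0.5 ≈ 35 GB for the weights alone, plus KV cache and runtime overhead, which is why it overflows a single 24 GB card but can fit across two.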

TazzSmk

The white video cards are usually purchased by people building a "snow blind" PC - white case, white video card, white power supply, white cables, etc. These white video cards can be difficult to source and during periods of short supply they command a premium price with no other benefit than matching the color of the build.

Larger LLMs yield very poor performance when spilling over from the VRAM of the NVIDIA RTX 4090 into the 128 GB of RAM in my tower PC. I get much better performance running LLMs of up to 70 billion parameters on my MacBook Pro M3 Max. This is why I will be purchasing a Mac Studio M4 Ultra with maximum RAM installed when it is available.

gaiustacitus

I love this man, never bored watching.

jukiy

Thanks for the demo. Now I understand how the LLM works, especially the part about how it consumes power and memory. With this info, I can manage the usage properly.

albertjeremy

running "ollama ps" will show you how much of the model is loaded on system ram vs GPU ram. You want 2x4090s for enough VRAM to run a 70b at a good speed.

Tarbard

You can see additional stats on model performance, like tokens per second, by using the --verbose flag with ollama run.

So: ollama run llama3.1 --verbose

Love the videos!

jake-epwq

This was a crazy video!! One of your best!!!

itiswhatitis-yes

Also, ISTA-DASlab (on Hugging Face) managed to squeeze the original 140 GB Llama 70B model into 22 GB while keeping 90+% of its quality, so it can run on one 3090 card. They've also made it possible to run the 8B model on smartphones.
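
The arithmetic is roughly consistent (assuming an extreme ~2-2.5 bits-per-weight quantization): 70 × 10⁹ parameters × 2.5 bits ÷ 8 ≈ 22 GB, versus 70 × 10⁹ × 2 bytes ≈ 140 GB for the original FP16 weights.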

fontenbleau