LocalAI LLM Testing: i9 CPU vs Tesla M40 vs 4060Ti vs A4500

Sitting down to run some tests with an i9-9820X, a Tesla M40 (24GB), a 4060 Ti (16GB), and an A4500 (20GB)

Rough edit in lab session

GPUs being tested: (These are affiliate-based links that help the channel if you purchase from them!)

GPU Bench Node Components: (These are affiliate-based links that help the channel if you purchase from them!)

Recorded and best viewed in 4K
Your results may vary due to hardware, software, model used, context size, weather, wallet, and more!
Comments

Thank you. It would be interesting to see some evaluation of multiple consumer GPUs working on the same LLM.

andrewowens

I want to run big models cheaply. I use a 1080 Ti now on 8B Llama, which is fast enough, but I would like a reliable code assistant with a bigger model. Suggestions? Can you test multiple 3060s in parallel on a big model?

C

Great content, and relevant to me since I recently bought a 4060 Ti 16GB for AI.

fooboomoo

Great breakdown. Since Ollama support for AMD has become decent, a good bang for the buck is the MI50 16GB. I did a similar test for comparison and it comes in at about the 4060 Ti for output, with prompt processing faster due to the sheer memory speed (HBM2). ~20 tok/s out. Not bad for a card that can be had on eBay for $150-$200 USD.

DarrenReidAu

1. Is it possible to run the LLM on both the CPU and GPU at the same time? 2. And how come AMD GPUs aren't used that much in AI? 3. What do you believe is the minimum Nvidia GPU for AI? 4. How important is the amount of RAM?

Matlockization
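
On the first question: llama.cpp-based backends (LocalAI and Ollama both build on llama.cpp) can split a model between CPU and GPU by offloading only part of its layers to VRAM. A minimal sketch using llama-cpp-python, assuming a CUDA-enabled build; the model path and layer count are placeholders:

    # Partial CPU/GPU offload with llama-cpp-python (placeholder path and layer count).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder model file
        n_gpu_layers=20,  # layers that fit in VRAM run on the GPU; the rest stay on the CPU
        n_ctx=4096,
    )
    out = llm("Explain partial GPU offload in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

The trade-off is that the layers left on the CPU set the pace, so generation speed drops quickly as the GPU share shrinks.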

Thanks for comparing the different GPU hardware.

Can you run a test with, say, a 6k-token input and a 1k-token output?
That way we can see how a large LLM performs with 6k input and 1k output tokens.

nithinbhandari
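
A run like that can be scripted against LocalAI's OpenAI-compatible endpoint so prompt-heavy workloads are measured the same way on every card. A rough sketch, assuming the default localhost:8080 address and a placeholder model name:

    # Time a ~6k-token-in / 1k-token-out request against an OpenAI-compatible endpoint.
    import time, requests

    URL = "http://localhost:8080/v1/chat/completions"  # assumed LocalAI address
    prompt = "word " * 6000                            # crude ~6k-token filler prompt

    start = time.time()
    resp = requests.post(URL, json={
        "model": "llama-3-8b-instruct",                # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }, timeout=600).json()
    elapsed = time.time() - start

    out_tokens = resp.get("usage", {}).get("completion_tokens", 0)
    print(f"{out_tokens} output tokens in {elapsed:.1f}s (~{out_tokens / elapsed:.1f} tok/s end to end)")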

P40 vs 3090 Ti, just because there is so much of a price difference. And what can you get in loading speeds if your files are on a P900 Optane (280GB), assuming one is setting up batch processing?

tsclly

Can you try to run the llama3.1 405B model on the CPU and see what kind of response we can get?

ZIaIqbal

So I swung a 4060 laptop and a 4070 Ti Super and have spent the last couple of days migrating my PC into an AI server. I haven't yet gotten to the AI, but in the meanwhile I'm putting the warranties to the test with some hardcore mining; almost nostalgic for when Bitcoin was $10/BTC.

I am realizing the 16GB of VRAM is a bit of a bottleneck, though. Do you think adding an M40 or two would help? Will the GPUs be able to use each other's VRAM?

sixfree
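
On the VRAM question: the cards do not merge their memory into one pool, but llama.cpp-based runners can split a model's layers across several GPUs so each card holds part of the weights. A minimal sketch with llama-cpp-python, with a placeholder path and split ratios (an M40 alongside a newer card may also need a build that supports its older compute capability):

    # Hypothetical two-GPU layer split; ratios are weighted roughly by each card's VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/codestral-22b.Q4_K_M.gguf",  # placeholder model file
        n_gpu_layers=-1,            # offload all layers
        tensor_split=[0.45, 0.55],  # e.g. a 16GB card next to a 24GB M40
        n_ctx=8192,
    )
    print(llm("Hello", max_tokens=16)["choices"][0]["text"])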

Great content. My problem is choosing an AM5 motherboard. I have three that I've got my eye on, but I don't know which one is more future-proof:
MSI MEG X670E ACE
ASUS ProArt X670E
ASUS ROG Strix X670E-E Gaming


Can you help?
I want it mostly for AI art and such. The MSI costs more; the ROG and ProArt are the same price (but I still don't know which of those two is better: the ProArt runs two PCIe slots at x8/x8, while the ROG is x8/x4). Is the MSI better than the ProArt?

fulldivemedia

I wish someone would test those X99 motherboards with two Xeon processors, 64 threads, and up to 256 gigabytes of RAM. Would that run 70B models at at least 3 tokens per second?

delightfulThoughs
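
A back-of-the-envelope check, since CPU-only generation is usually limited by memory bandwidth rather than core count: tokens per second is roughly bandwidth divided by the bytes read per token. The figures below are assumptions, not measurements:

    # Rough upper bound for CPU-only 70B generation speed (assumed, not measured, figures).
    model_bytes = 40e9   # ~40 GB for a 70B model at Q4 quantization
    bandwidth   = 68e9   # ~68 GB/s for quad-channel DDR4-2133 on one socket
    print(f"~{bandwidth / model_bytes:.1f} tok/s upper bound per socket")  # ~1.7 tok/s

Hitting 3 tok/s would therefore need both sockets to scale almost perfectly, which NUMA effects often prevent, so it is likely an optimistic target on that class of hardware.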

It would be interesting to compare a 4070 Ti Super to the 4060 Ti, to see if the scaling is proportional to cost.

georgepongracz

What software is this? The GUI you use, I mean; where can I download it?

PedroBackard

If I run Codestral 22B Q4_K_M on my P5000 (Pascal architecture), I get 11 t/s evaluation, which means the P5000 performs at around 75% of a 4060 Ti. But when I open Nvidia Power Management I can see it only consumes 140W under load, while it should be able to go up to 180W. BTW, both these cards have 288GB/s memory bandwidth. I must have a bottleneck in my system, which is an Intel 11th-gen i7 laptop (4-core CPU) with the eGPU over Thunderbolt 3.

jeroenadamdevenijn
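
One hedged way to narrow that down is to sample power draw and PCIe traffic while a generation is running: if the card sits well under its power limit while the link stays busy, the Thunderbolt 3 connection is the likelier cap. A sketch using pynvml (pip install nvidia-ml-py), assuming the eGPU is device index 0:

    # Sample GPU power, utilization, and PCIe receive traffic once per second for ~30s.
    import time
    import pynvml

    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumed device index
    for _ in range(30):
        watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0
        util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
        rx_kb = pynvml.nvmlDeviceGetPcieThroughput(gpu, pynvml.NVML_PCIE_UTIL_RX_BYTES)
        print(f"{watts:5.1f} W  gpu {util.gpu:3d}%  pcie rx {rx_kb / 1024:6.1f} MB/s")
        time.sleep(1)
    pynvml.nvmlShutdown()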

I'm planning to buy a GPU and I have two choices: a P100 or an M40 24GB. I want to run an 8B model; is either enough for that? Currently I have a Ryzen 5 3600, 16GB DDR4, and a 1TB NVMe.

mohammdmodan

What application are you using to run this?

donaldrudquist

Is it possible to use an RX 6800 to do this task?

STEELFOX

Llama 3 7B runs in near real-time on an Apple M1 processor, and presumably faster on an M2 or M3.

marsrocket

Running an OptiPlex 7040 SFF with 24 GB DDR4, an i5-6700, 3.4 GHz, 4 cores, no GPU. I get 5 tokens per second with "ollama run llama3.1 8b --verbose", and 9 tps on the new 3.2 3B, on the single test "write a 4000 word lesson on the basics of python". It's usable. "ollama run codestral" (22B) pulled a 12 GB file. Same test: it used 99% CPU, 0% GPU, 13 GB RAM. It crawled for 7 minutes at 1.8 tps, but it ran.
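
The same figures --verbose prints can also be pulled from Ollama's REST API, which makes it easy to script an identical test on different machines. A small sketch, assuming the default localhost:11434 address and the model tag above:

    # Compute prompt and generation tok/s from Ollama's /api/generate response fields.
    import requests

    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.1:8b",   # assumed model tag
        "prompt": "Write a 4000 word lesson on the basics of Python.",
        "stream": False,
    }, timeout=3600).json()

    gen_tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)  # durations are in ns
    prompt_tps = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    print(f"prompt: {prompt_tps:.1f} tok/s   generation: {gen_tps:.1f} tok/s")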


With these kinds of tests, 2 x 4060 Ti 16GB must be included, and how it performs. 24GB is not enough, and 32GB on a Quadro-type card is around 2700 euros, so it seems like a sweet spot that you should cover. Know your audience and know the sweet spots; those are the videos people want to see.

Johan-rmec