LocalAI LLM Testing: i9 CPU vs Tesla M40 vs 4060Ti vs A4500

Sitting down to run some tests with an i9-9820X, a Tesla M40 (24GB), a 4060 Ti (16GB), and an A4500 (20GB)

Rough edit in lab session

GPUs being tested: (These are affiliate-based links that help the channel if you purchase from them!)

GPU Bench Node Components: (These are affiliate-based links that help the channel if you purchase from them!)

Recorded and best viewed in 4K
Your results may vary due to hardware, software, model used, context size, weather, wallet, and more!
Comments

Thank you. It would be interesting to see some evaluation of multiple consumer GPUs working on the same LLM.

andrewowens

I want to run big models cheaply. I use a 1080 Ti now on 8B Llama, which is fast enough, but I would like a reliable code assistant with a bigger model. Suggestions? Can you test multiple 3060s in parallel on a big model?

C

Great content, and relevant to me since I recently bought a 4060 Ti 16GB for AI.

fooboomoo

Great breakdown. Since Ollama support for AMD has become decent, a good bang for the buck is the MI50 16GB. I did a similar test for comparison and it comes in at about the 4060 Ti for output, with prompt processing faster due to the sheer memory speed (HBM2). ~20 tok/s out. Not bad for a card that can be had on eBay for $150-$200 USD.

DarrenReidAu

1. Is it possible to run the LLM on both the CPU and GPU at the same time? 2. And how come AMD GPUs aren't used that much in AI? 3. What do you believe is the minimum Nvidia GPU for AI? 4. How important is the amount of RAM?

Matlockization
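
On the first question: llama.cpp-based backends (LocalAI and Ollama both build on llama.cpp) can split a model between CPU and GPU by offloading only part of its layers to VRAM. A minimal sketch using llama-cpp-python, assuming a CUDA-enabled build; the model path and layer count are placeholders:

    # Partial CPU/GPU offload with llama-cpp-python (placeholder path and layer count).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder model file
        n_gpu_layers=20,  # layers that fit in VRAM run on the GPU; the rest stay on the CPU
        n_ctx=4096,
    )
    out = llm("Explain partial GPU offload in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

The trade-off is that the layers left on the CPU set the pace, so generation speed drops quickly as the GPU share shrinks.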

Thanks for comparing the different GPU hardware.

Can you run a test with, say, a 6k-token input and a 1k-token output?
That way we can see how a large LLM performs with 6k input and 1k output tokens.

nithinbhandari
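
A run like that can be scripted against LocalAI's OpenAI-compatible endpoint so prompt-heavy workloads are measured the same way on every card. A rough sketch, assuming the default localhost:8080 address and a placeholder model name:

    # Time a ~6k-token-in / 1k-token-out request against an OpenAI-compatible endpoint.
    import time, requests

    URL = "http://localhost:8080/v1/chat/completions"  # assumed LocalAI address
    prompt = "word " * 6000                            # crude ~6k-token filler prompt

    start = time.time()
    resp = requests.post(URL, json={
        "model": "llama-3-8b-instruct",                # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }, timeout=600).json()
    elapsed = time.time() - start

    out_tokens = resp.get("usage", {}).get("completion_tokens", 0)
    print(f"{out_tokens} output tokens in {elapsed:.1f}s (~{out_tokens / elapsed:.1f} tok/s end to end)")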

P40 vs 3090 Ti, just because there is so much of a price difference. And what can you get in loading speeds if your files are on a P900 Optane (280GB), assuming one is setting up batch processing?

tsclly

Can you try to run the llama3.1 405B model on the CPU and see what kind of response we can get?

ZIaIqbal

So I swung a 4060 laptop and a 4070 Ti Super and have spent the last couple of days migrating my PC into an AI server. I haven't yet gotten to the AI, but in the meanwhile I'm putting the warranties to the test with some hardcore mining; almost nostalgic for when Bitcoin was $10/BTC.

I am realizing the 16GB of VRAM is a bit of a bottleneck, though. Do you think adding an M40 or two would help? Will the GPUs be able to use each other's VRAM?

sixfree
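
On the VRAM question: the cards do not merge their memory into one pool, but llama.cpp-based runners can split a model's layers across several GPUs so each card holds part of the weights. A minimal sketch with llama-cpp-python, with a placeholder path and split ratios (an M40 alongside a newer card may also need a build that supports its older compute capability):

    # Hypothetical two-GPU layer split; ratios are weighted roughly by each card's VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/codestral-22b.Q4_K_M.gguf",  # placeholder model file
        n_gpu_layers=-1,            # offload all layers
        tensor_split=[0.45, 0.55],  # e.g. a 16GB card next to a 24GB M40
        n_ctx=8192,
    )
    print(llm("Hello", max_tokens=16)["choices"][0]["text"])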

Great content. My problem is choosing an AM5 motherboard. I have three that I've got my eye on, but I don't know which one is more future-proof:
MSI MEG X670E ACE
ASUS ProArt X670E
ASUS ROG Strix X670E-E Gaming


Can you help?
I want it mostly for AI art and such. The MSI costs more; the ROG and ProArt are the same price (but I still don't know which of those two is better: the ProArt runs two PCIe slots at x8/x8, while the ROG is x8/x4). Is the MSI better than the ProArt?

fulldivemedia

I wish someone would test those X99 motherboards with two Xeon processors, 64 threads, and up to 256 gigabytes of RAM. Would that run 70B models at at least 3 tokens per second?

delightfulThoughs
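
A back-of-the-envelope check, since CPU-only generation is usually limited by memory bandwidth rather than core count: tokens per second is roughly bandwidth divided by the bytes read per token. The figures below are assumptions, not measurements:

    # Rough upper bound for CPU-only 70B generation speed (assumed, not measured, figures).
    model_bytes = 40e9   # ~40 GB for a 70B model at Q4 quantization
    bandwidth   = 68e9   # ~68 GB/s for quad-channel DDR4-2133 on one socket
    print(f"~{bandwidth / model_bytes:.1f} tok/s upper bound per socket")  # ~1.7 tok/s

Hitting 3 tok/s would therefore need both sockets to scale almost perfectly, which NUMA effects often prevent, so it is likely an optimistic target on that class of hardware.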

It would be interesting to compare a 4070 Ti Super to the 4060 Ti, to see if the scaling is proportional to cost.

georgepongracz

What software is this? The GUI you use, I mean; where can I download it?

PedroBackard

If I run Codestral 22B Q4_K_M on my P5000 (Pascal architecture), I get 11 t/s evaluation, which means the P5000 performs at around 75% of a 4060 Ti. But when I open Nvidia Power Management I can see it only consumes 140W under load, while it should be able to go up to 180W. BTW, both these cards have 288GB/s memory bandwidth. I must have a bottleneck in my system, which is an Intel 11th-gen i7 laptop (4-core CPU) with the eGPU over Thunderbolt 3.

jeroenadamdevenijn
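
One hedged way to narrow that down is to sample power draw and PCIe traffic while a generation is running: if the card sits well under its power limit while the link stays busy, the Thunderbolt 3 connection is the likelier cap. A sketch using pynvml (pip install nvidia-ml-py), assuming the eGPU is device index 0:

    # Sample GPU power, utilization, and PCIe receive traffic once per second for ~30s.
    import time
    import pynvml

    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumed device index
    for _ in range(30):
        watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0
        util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
        rx_kb = pynvml.nvmlDeviceGetPcieThroughput(gpu, pynvml.NVML_PCIE_UTIL_RX_BYTES)
        print(f"{watts:5.1f} W  gpu {util.gpu:3d}%  pcie rx {rx_kb / 1024:6.1f} MB/s")
        time.sleep(1)
    pynvml.nvmlShutdown()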

I'm planning to buy a GPU and I have two choices: a P100 or an M40 24GB. I want to run an 8B model; is either enough for that? Currently I have a Ryzen 5 3600, 16GB DDR4, and a 1TB NVMe.

mohammdmodan

What application are you using to run this?

donaldrudquist

Is it possible to use an RX 6800 to do this task?

STEELFOX

Llama 3 7B runs in near real-time on an Apple M1 processor, and presumably faster on an M2 or M3.

marsrocket

Running an OptiPlex 7040 SFF with 24 GB DDR4, an i5-6700, 3.4 GHz, 4 cores, no GPU. I get 5 tokens per second with "ollama run llama3.1 8b --verbose", and 9 tps on the new 3.2 3B, on the single test "write a 4000 word lesson on the basics of python". It's usable. "ollama run codestral" (22B) pulled a 12 GB file. Same test: it used 99% CPU, 0% GPU, 13 GB RAM. It crawled for 7 minutes at 1.8 tps, but it ran.
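
The same figures --verbose prints can also be pulled from Ollama's REST API, which makes it easy to script an identical test on different machines. A small sketch, assuming the default localhost:11434 address and the model tag above:

    # Compute prompt and generation tok/s from Ollama's /api/generate response fields.
    import requests

    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.1:8b",   # assumed model tag
        "prompt": "Write a 4000 word lesson on the basics of Python.",
        "stream": False,
    }, timeout=3600).json()

    gen_tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)  # durations are in ns
    prompt_tps = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    print(f"prompt: {prompt_tps:.1f} tok/s   generation: {gen_tps:.1f} tok/s")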


With these kinds of tests, 2 x 4060 Ti 16GB must be included, and how it performs. 24GB is not enough, and 32GB on a Quadro-type card is around 2700 euros, so it seems like a sweet spot that you should cover. Know your audience and know the sweet spots; those are the videos people want to see.

Johan-rmec