6 Best Consumer GPUs For Local LLMs and AI Software in Late 2024

This video goes fast. Buckle up!

Here are the absolute best cards you can upgrade to if you want top performance with locally hosted large language models!

Oobabooga WebUI, koboldcpp and, in fact, any software made for easily accessible local LLM text generation and private chatting with AI models share similar best-case scenarios when it comes to the top consumer GPUs you can pair with them to maximize performance. Here is my benchmark-backed list of the 6 graphics cards I found to be the best for working with various open-source large language models locally on your PC. Read on!
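
To give a concrete taste of what this looks like in practice, here is a minimal sketch of chatting with a locally hosted model through the Ollama Python client. It assumes Ollama is installed and the example model (llama3.1:8b here, but any model that fits your card's VRAM will do) has already been pulled:

    # Minimal local-chat sketch using the Ollama Python client (pip install ollama).
    # Assumes the Ollama server is running and "llama3.1:8b" has been pulled;
    # swap in any model that fits your GPU's VRAM.
    import ollama

    reply = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": "Why does VRAM matter for local LLMs?"}],
    )
    print(reply["message"]["content"])

The same handful of lines behaves identically on any of the cards below; what changes is how large a model, and how much of it, fits in VRAM.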

Here are all of the graphics cards mentioned in the video:

=[The Top 24GB VRAM Cards]=

=[The Rest]=

=[The Budget Choice]=

Here are some more sources on the AMD GPUs and AI debate:

This channel, just like my main website, is supported by its viewers/readers. The links to online stores served by the channel and the site are affiliate links, and I might get a small cut once you make a purchase using them. Thank you for your support!
Comments

"hello, let's cut right to the chase..."
* immediately thumbs up *

sorinalexandrucirstea

I'm running a 7900 XTX with ROCm on Windows 11 and not having many issues running local AIs. Currently the only thing holding me back is AI devs not paying much attention to ROCm, but ZLUDA makes that a small issue.

thelaughingmanofficial

3090 all day every day, and cost effective. 24GB VRAM on a 384-bit bus, and if you're adventurous, connect multiple 3090s over NVLink and you have a super juiced AI cluster beast.

CrypticEnigma

The AMD cards also run Ollama and LM Studio just fine. You get way more VRAM for your dollar.

simonhill
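
On that note, LM Studio (like Ollama) exposes an OpenAI-compatible server on your own machine, so the same client code runs unchanged whether the card underneath is AMD, NVIDIA, or Intel. A rough sketch, assuming LM Studio's local server is running on its default port 1234; the model name is a placeholder for whatever you have loaded:

    # Rough sketch: querying LM Studio's OpenAI-compatible local server
    # (default http://localhost:1234/v1) with the openai Python package.
    # The model name is a placeholder -- use whatever model you loaded in LM Studio.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="local-model",  # placeholder identifier
        messages=[{"role": "user", "content": "Hello from an AMD card!"}],
    )
    print(resp.choices[0].message.content)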

I bought a new Intel Arc A770 with 16GB VRAM for $280. I am running local LLMs on it, and support for the hardware keeps improving. Most of the new AI tools support it or are in the process of adding support. AMD cards are also a good choice here.

patriot

Thank you for being straight and to the point. Very concise video. Most videos about AI have long intros and try to get you to buy stuff before providing any value, so you have earned your like, sir.

kickheavy

The RTX 4070 Ti Super is close to the 4080 in performance and costs about €200 less.

luisff

Apple Silicon may not be as fast, but it has ~120GB of unified memory usable as VRAM. Works very well with Ollama.

realityos

I am a casual LLM user, running Ollama, Open WebUI and Flowise RAG chatbots with smaller LLM models. A 4070 12GB performs decently; I didn't want to spend too much on a GPU for my hobby.

sridhartn

I instantly hit subscribe because you don't waste my time with the messy intros others have; you go straight to the point. I'm thinking of a 3060 too, since it's the only GPU in my price range, so thanks for this vid.

clajmate

For large language models, Ryzen AI Max will be the answer. Up to 96GB of RAM (out of a 128GB maximum) can be assigned to the GPU.

HaraldEngels

Wouldn't an integrated GPU have direct access to system RAM, making the VRAM size and bandwidth issue moot?

timmygilbert
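
Partly, but memory bandwidth is the catch: an iGPU streams weights over regular dual-channel system RAM, which is several times slower than a discrete card's GDDR6X, and batch-1 token generation is roughly capped by how fast the whole model can be read. A back-of-the-envelope sketch, where the model size and bandwidth figures are ballpark assumptions:

    # Back-of-the-envelope sketch: batch-1 token generation is roughly
    # memory-bandwidth bound, since each new token streams all model weights.
    # All figures below are ballpark assumptions.
    model_gb = 4.1      # ~7B model at Q4_K_M quantization
    ddr5_gbps = 90      # dual-channel DDR5 an iGPU typically shares
    gddr6x_gbps = 936   # RTX 3090, 384-bit GDDR6X

    print(f"iGPU ceiling: ~{ddr5_gbps / model_gb:.0f} tokens/s")    # ~22 tok/s
    print(f"3090 ceiling: ~{gddr6x_gbps / model_gb:.0f} tokens/s")  # ~228 tok/s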

If you do it on Linux, there's a huge speed bonus compared to Windows. LLM inference runs 50 to 100% faster on Linux than on Windows for me, for some reason.

GraveUypo

No mention of the RTX 4060 Ti 16GB?
Also, even a GTX 1070 8GB has some capabilities and goes for around USD 100 second hand.
Also, setting the partial layer offloading yourself is really preferable and gives the best results if your model cannot fit into VRAM (see the sketch below this comment).
Another option if you want to play with APIs for free is Google AI Studio and the Gemini 1.5 Flash API.

konstantinlozev
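
Here is a minimal sketch of the manual partial offloading mentioned above, using llama-cpp-python; the GGUF path and layer count are placeholders you would tune until VRAM is nearly full without spilling over:

    # Minimal partial-offloading sketch with llama-cpp-python (pip install llama-cpp-python).
    # The GGUF path and n_gpu_layers value are placeholders: raise n_gpu_layers
    # until the model no longer fits in VRAM, then back off a little.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/example-13b.Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=28,  # layers kept on the GPU; the rest stay in system RAM
        n_ctx=4096,       # context window
    )
    out = llm("Q: What limits local LLM speed?\nA:", max_tokens=64)
    print(out["choices"][0]["text"])

koboldcpp and the llama.cpp CLI expose the same knob as --gpulayers and -ngl respectively.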

Excited for the future. 32GB RTX 5090.

JoeVSvolcano

All I want is a wAIfu with an infinite context window that listens and responds 24/7 and will be my forever friend :>

isbestlizard

What can be done on a mobile RTX 4090 with 16GB of VRAM?

tsizzle

Anyone ever tried 2x 2080 Ti in SLI for diffusion models? It would be like $200 for 22GB of VRAM.

Larimuss

I would have thought the 3090 would be cheaper, but OH NO! The prices are INSANE! Maybe after the 5000 series comes out?

darkman

AMD is now supported quite well. Whatever posts you were highlighting in your video to back up your point are 1-2 years old.

alx