How Important Is VRAM Bandwidth?


The width of your graphics card's memory bus is an oft-discussed topic - but how much does it matter? Is there a magic number on the spec sheet you should be looking for, or should you be approaching your next GPU purchase differently?

Leave a reply with your requests for future episodes.

Comments

Thank you for watching! We're writing all the time at work, whether it's emails, drafting video scripts, etc., but having a tool like Grammarly will help improve your productivity and save time! It's FREE, why not? Sign up for a FREE account and get 20% off Grammarly Premium: grammarly.com/techquickie

techquickie

For most GPU compute applications, VRAM bandwidth is everything. I'm developing the computational fluid dynamics (CFD) software FluidX3D, and it is purely limited by bandwidth. The A100 80GB with 2 TB/s is 2x faster than the 3080 Ti with 912 GB/s, and the 3080 Ti is >2x as fast as the RX 6900 XT with its poor 512 GB/s. The 2060 Super is as fast as the 3070, both at 448 GB/s, even though the 3070 has 3x the TFLOPS.
TFLOPS don't matter here, and neither does a large Infinity Cache or branding. Only VRAM bandwidth does.
The problem with newer GPUs is that their TFLOPS go through the roof, but VRAM bandwidth does not increase, and sometimes even decreases (that "4080" 12GB has half the bandwidth of the 3080 12GB). This means almost any algorithm will be bandwidth-bound in the future, with the GPU cores idle most of the time, waiting for new data.
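The bandwidth-only scaling this comment describes can be sketched as a toy model: for a purely memory-bound kernel, runtime is just bytes moved divided by VRAM bandwidth, so relative speed reduces to the bandwidth ratio and TFLOPS never enter. The GPU bandwidth figures below are taken from the comment; the 10 GB workload size is purely illustrative.

```python
def bandwidth_bound_time(bytes_moved: float, bandwidth_gb_s: float) -> float:
    """Estimated kernel time in seconds for a purely memory-bound workload."""
    return bytes_moved / (bandwidth_gb_s * 1e9)

# One hypothetical simulation timestep touching 10 GB of VRAM (read + write):
workload = 10e9  # bytes, illustrative

t_a100   = bandwidth_bound_time(workload, 2000)  # A100 80GB: ~2 TB/s
t_3080ti = bandwidth_bound_time(workload, 912)   # 3080 Ti: 912 GB/s
t_6900xt = bandwidth_bound_time(workload, 512)   # RX 6900 XT: 512 GB/s

# Speedup is exactly the bandwidth ratio, regardless of TFLOPS:
print(f"A100 vs 3080 Ti:    {t_3080ti / t_a100:.2f}x")   # prints 2.19x
print(f"3080 Ti vs 6900 XT: {t_6900xt / t_3080ti:.2f}x")  # prints 1.78x
```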

ProjectPhysX

We often see that memory bandwidth is important for maintaining good FPS at higher resolutions like 4K (2160p), or if you are supersampling. Cards with a smaller bus width and lower overall bandwidth will see reduced framerates at 4K and above. This is one of the reasons a number of AMD 6000-series cards with lower memory bandwidth don't scale well to 4K compared to the competition.
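A rough, hypothetical model of why resolution stresses bandwidth: per-frame VRAM traffic scales with pixel count, and 4K has 2.25x the pixels of 1440p. The bytes-per-pixel and touches-per-pixel figures below are made-up round numbers (a real frame reads and writes each pixel many times across render passes), so only the ratio is meaningful.

```python
def frame_traffic_gb_s(width: int, height: int, bytes_per_pixel: int,
                       touches_per_pixel: int, fps: int) -> float:
    """Illustrative per-second render-target traffic at a given resolution."""
    return width * height * bytes_per_pixel * touches_per_pixel * fps / 1e9

# Assume 8 bytes/pixel of traffic and ~30 touches per pixel at 60 FPS:
for name, (w, h) in {"1440p": (2560, 1440), "4K": (3840, 2160)}.items():
    print(f"{name}: ~{frame_traffic_gb_s(w, h, 8, 30, 60):.1f} GB/s")
```

Whatever the absolute numbers, 4K demands 2.25x the bandwidth of 1440p under this model, which is why narrow-bus cards fall off hardest there.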

micbrd

The bus width usually corresponds to the number of VRAM chips on the board (32 bits of I/O per chip). 40-series cards appear to use VRAM with a density of 2GB per chip, so while it would have been possible to give both 4080s the same bus width, you'd have needed to compensate with 1GB chips if you were committed to the memory difference. Doing that would mean ordering two different VRAM chips, which could be much more expensive at volume than using the same chip everywhere. From an architecture standpoint, it may also be simpler to assume every memory channel has the same address range and just adjust the bus size to match the available memory.
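The chip-count arithmetic above can be sketched in a few lines. The chip counts and per-pin data rates below are assumptions for illustration (six and eight 2GB chips roughly matching the two 4080 configurations the comment discusses):

```python
CHIP_IO_BITS = 32  # each GDDR6/6X chip exposes a 32-bit interface

def board_layout(num_chips: int, gb_per_chip: int, gbps_per_pin: float):
    """Bus width, capacity, and peak bandwidth implied by a chip count."""
    bus_bits = CHIP_IO_BITS * num_chips
    capacity_gb = gb_per_chip * num_chips
    bandwidth_gb_s = bus_bits * gbps_per_pin / 8  # bits/s -> bytes/s
    return bus_bits, capacity_gb, bandwidth_gb_s

# Hypothetical 12GB board: six 2GB chips at 21 Gbps per pin
print(board_layout(6, 2, 21.0))   # (192, 12, 504.0)
# Hypothetical 16GB board: eight 2GB chips at 22.4 Gbps per pin
print(board_layout(8, 2, 22.4))   # (256, 16, 716.8)
```

Note how capacity and bus width move together once the chip density is fixed, which is exactly the coupling the comment describes.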

MegajsX

But the problem with the 4080 16GB vs 12GB models was not only VRAM-related; they even have different GPUs inside, AD103 and AD104 respectively! That's the bigger reason NVIDIA cancelled it and renamed it properly: there are far more differences behind the scenes than just memory amount and bus width.

ly_bt

Long story short: it depends on what you are using that VRAM for. In MOST typical consumer use cases, even the lower end of the bandwidth specs on the market won't run into any real issues at typical resolutions. The cards that CAN drive the resolutions where bandwidth starts to become an issue typically have that higher bandwidth anyway.

jtnachos

Proposal to GPU manufacturers: since they're smaller chunks of 1s and 0s that need to be loaded fast and often, the cache should get rebranded as *quickbits*

GSBarlev

If you're into AI, it's super important.

jtjames

Upgraded my 2060S (256-bit bus) to an RX 6750 (192-bit bus)... it's like we're going backwards.

rolieg

No mention of Infinity cache? That changes the bus width requirements drastically.

juzujuzu

Oh Riley, Nvidia didn't cancel the RTX 4080 12 GB, they unlaunched it! Huge difference!

ddthegreat

Just a friendly reminder to everyone.
The 12GB RTX 4080 has a smaller die than the RTX 2060 and is about the same size as the RTX 3050/3060/1660.
The memory bus is the same as on the RTX 3060 and the GTX 1660.
The price was going to be $900.
The card only used ~70W more power than the 2060.
This should have been called a 4060 OC for $450, as it is the same size as or smaller than a traditional 60-class card, and is the first non-60-class card to have a 192-bit bus. The main upside is that it was clocked much higher than you'd expect, hence the "OC" and a reasonably higher price for the added board and cooling costs. But $900? What a joke.

Before you say "well, it's faster than last gen":
I'd like to remind you that GPUs seem to be the only product where this argument is accepted.
The Ford Mustang doesn't shoot from $40k to $90k overnight because it has more power than the previous models; it goes up in price because it costs more to make.
AMD has a larger die based on the same node tech (both 5nm-class, with NVIDIA using 5nm+), as well as more dies in total, a much wider memory bus, and nearly double the RAM,

and they're only asking $100 more.

Now that's not to say I think the 7900 XTX isn't itself overpriced, just not as much as the 12GB 4080.

denverag

They lowered its specs all around so that it didn't need the 16GB version's 256-bit bus, so complaining about the 192-bit bus on an 80-series card is valid.

Derelict_Doug

Thank you for the Adventure Time reference. Much appreciated.

sylvesteruchia

What I know is that when the VRAM is full, the GPU with less bandwidth stutters more.

abcdefgh

Wow, at 4:53 you guys finally did a shout-out for Earnest Lee!! <3

LoveBbyJay

I feel like being disagreeable, so my nitpick is this: one reason VRAM bus width isn't that important is that transfers are all split up into units of 16 bits, because that's how the texture compression algorithm works. The chips themselves are 32-bit per chip; the reason you never see a 32-bit bus listed is that the spec is the accumulated number across all the chips, so the total will always divide evenly by 32. The reason for 16-bit texture compression on 32-bit hardware is that FP16 works faster than FP32 and FP64. Once fetched, the data is stored as a 32-bit chunk, since most code is written for 32-bit systems, not 16-bit.

Now, HBM works differently: it is 1024-bit wide because that physical RAM works in chunks of 1024, not chunks of 32. Realistically, HBM is good but too expensive to get enough of for it to be economical on a gaming card. As the 4090 isn't economical to begin with, I am hoping the 5090 will just go to the better HBM3 instead of GDDR7, since it won't be affordable to the average consumer anyway.

yumri

I have a 4060 Ti, good with 16GB of VRAM but only a 128-bit bus, paired with a 10700K, and it works great on ALL AAA games in 2025 at 1440p with settings on high or ultra. ZERO complaints with the performance of the card, tested on some of the most demanding games such as Cyberpunk, Indiana Jones and the Great Circle, Control, The Forever Winter, Avatar: Frontiers of Pandora, etc.

slcgtx

It's just segmentation of the product lineup. Cards with a 192-bit bus are basically broken on purpose. The same goes for the VRAM shortage compared to consoles, which is ridiculous.

mnemonic

On PC, VRAM bandwidth will matter for DirectStorage "memory-to-memory", particularly useful in the extremely demanding 8K PC ports forthcoming on RTX 5000 / RX 8000. The DirectStorage memory-to-memory feature is not well publicized, but it is far more relevant than SSD asset streaming. Why? Because even cheap DDR4-3200 is way faster than a PCIe 5.0 SSD maxed out on a x4 interface pushing just over 15 GB/s, and doing it at a very, very high price, I might point out. To step through the process: memory-to-memory capability means the relevant nearby compressed assets around the player get copied directly from the SSD to RAM by the CPU, free of decompression load. Effectively, this means the game developers are creating their own managed DDR4/5 DRAM-drive. Next, those still-compressed assets stream from that managed DRAM-drive directly to the GPU asset decoder, which then places them in GPU VRAM.
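The SSD-vs-DRAM gap this comment leans on can be put in rough numbers. The bandwidth figures below are illustrative round values (a PCIe 5.0 x4 SSD near its ~15 GB/s ceiling vs. dual-channel DDR4-3200 at ~50 GB/s), and the 2 GB burst size is a made-up example:

```python
def stream_time_ms(asset_mb: float, bandwidth_gb_s: float) -> float:
    """Time to move a burst of compressed assets at a given bandwidth."""
    return asset_mb / 1000 / bandwidth_gb_s * 1000

burst = 2000  # MB of compressed assets in one streaming burst (illustrative)

ssd  = stream_time_ms(burst, 15)  # PCIe 5.0 x4 SSD, ~15 GB/s peak
dram = stream_time_ms(burst, 50)  # dual-channel DDR4-3200, ~50 GB/s
print(f"from SSD:  {ssd:.0f} ms")   # prints 133 ms
print(f"from DRAM: {dram:.0f} ms")  # prints 40 ms
```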

Mr.Morden