Secret GPU: RTX 2080 in the RTX 2060 KO, Up to +47% Workstation Performance

We discovered something that slipped past NVIDIA and EVGA alike: The RTX 2060 KO performs 26% to 47% better than a normal 2060 in some applications.

It's not just that TU104-150 is now in the RTX 2060 KO, but that NVIDIA might have forgotten to disable some of its performance in the new RTX 2060 SKU. The TU104 GPU came from an RTX 2080 -- or RTX 2070 Super -- and it's supposed to be a salvage chip with features disabled, resulting in a simple RTX 2060 on a bigger die. In reality, it's an RTX 2060 with a lot more performance in professional applications like Blender, SolidWorks, 3ds Max, Maya, CATIA, Siemens NX, and more. If you're looking for one of the best budget workstation video cards, this may be the new winner.

** Please like, comment, and subscribe for more! **

Links to Amazon and Newegg are typically monetized on our channel (affiliate links) and may return a commission of sales to us from the retailer. This is unrelated to the product manufacturer. Any advertisements or sponsorships are disclosed within the video ("this video is brought to you by") and above the fold in the description. We do not ever produce paid content or "sponsored content" (meaning that the content is our idea and is not funded externally aside from whatever ad placement is in the beginning) and we do not ever charge manufacturers for coverage.

Follow us in these locations for more gaming and hardware updates:

Editorial, Testing: Steve Burke
Video: Keegan Gallick, Andrew Coleman
Comments

This was extremely fun and I hope it gets some attention online: We spent days on this and it was really intriguing to work on and a great break from the usual reviews, sort of like a mystery. My hope is that some people deeper at NVIDIA see it and contact me to educate us on what's really happening here. We have confirmed the results with NVIDIA and EVGA, and now it's time to understand them. I've also reached out to David Kanter for assistance in learning why this happened.

GamersNexus

This is the type of investigative technology journalism that brought me to this channel. Nice work surprising Nvidia about its own product!

piers

GN: So it's a cheaper 2060 that performs the same as a regular 2060, but faster in some non-gaming cases.
Nvidia: What?
GN: What?

hasnihossainsami

Now watch them disappear from everywhere.

PradhumanRehal

I wonder why they put huge wads of pre-chewed gum on this card

godtiermedic

evga engineer 1: let's play tetris with thermal pads
evga engineer 2: dope, brah!

tomunterwegs

thicc pads yo, looks like enough C4 to open a bank vault.

SilkMilkJilk

"AMD got the stupid prize today" That made me laugh for far too long.

infinitelyexplosive

Always nice to find a pleasant surprise like this, even if you have absolutely no idea why. These are an amazing value for people who exclusively need to use Blender but are budget-bound.

genius

What's up with those HUGE thermal pads? O.o

SuperUltimateLP

I don't expect to ever find this card for a reasonable price on the used market now.

Marc_Wolfe

According to AnandTech: "TU106 packs 12 SMs to a GPC, versus 8 to a GPC in TU104." Each GPC has a raster engine, so the TU104 ends up with more raster engines for a given CUDA core count. Oddly, TU102 is also 12 SMs per GPC, probably to save die space.

C_C-

Reminds me of that one guy on reddit who got an 8-core Ryzen 1600... but this time it's not a one-time thing :D

Collin

Good job Steve. While everyone is distracted by the 5600 reviews, Steve strikes gold in a boring low-end card and it is super effective!!!

First big tech story of 2020

andrewjatz

9:32 lol what's going on, is your GPU throwing a fit rendering that chart lol

charlesballiet

Gotta get one of those, those thermal pads look _DELICIOUS_

budgetbajur

There was a period of time when not only was this card delivering insane workstation performance for the cost, but the 1600 AF was a 6-core, 12-thread Ryzen for dirt cheap. And you'd be able to build a really good workstation with a 1600 AF, 2060 KO, and 32GB of RAM -- something that could beast away at workstation tasks without breaking the bank, and game well. And I'm irritated I didn't build a PC during that time.

bluesy

Would you be willing to test the 2060 KO for video transcode performance? This could potentially be a great card for hardware transcoding in media server applications.

shadowtheimpure

@Gamers Nexus Just speculation based on my job as a CUDA programmer: could it be that what Nvidia did with these "faulty" dies is fuse off failed areas, resulting in some SMs (streaming multiprocessors) that contain fewer than the full allotment of 64 INT + 64 FP cores, but with more SMs enabled than the usual 2060, so that the total core count for the entire die still reaches the 1920 cores needed to classify it as a 2060 product?
Normally a Turing SM consists of four processing blocks, each with 16 INT + 16 FP cores (plus a warp scheduler, a warp dispatcher, a 16K 32-bit register file, 2 tensor cores, 4 load/store units, and 1 special function unit). These four processing blocks are combined with 4 texture units, 1 ray tracing core, and 96KB of combined L1 cache/shared memory, which together make up one Turing SM.
Now, what if in some of the SMs of a 2060 KO die they had to fuse off some of the processing blocks due to faults, leaving some SMs with fewer than four processing blocks, but with the die as a whole having more SMs enabled than the standard 2060? In those extra SMs they would also have to fuse off the extra texture units and ray tracing cores so that the total count of texture units and RT cores for the entire die still matches what a regular 2060 requires.
They cannot fuse off any of the L1 cache/shared memory, because under Turing's CUDA compute capability 7.5 they must be able to support 64KB of shared memory per SM plus a 32KB L1 cache, so the entire 96KB must be retained to satisfy the CUDA programming specification for Turing.
The result is that you have some SMs consisting of fewer than four processing blocks, but with the full allotment of 96KB of L1 cache/shared memory.
Now, because one CUDA thread block is actively running per SM, if you size the thread blocks in your program so that they just happen to fit into those SMs with reduced processing blocks (i.e. each processing block has a 32-thread-per-clock scheduler and dispatcher, so, for example, a "reduced" SM with 2 processing blocks can schedule and dispatch 64 threads per clock), then you effectively have more SMs, each with its own full L1 cache/shared memory, which effectively gives your thread blocks more shared memory to use. And if your thread block size happens to fit the reduced SM's scheduler and dispatcher capacity, you get fewer warp context switches as the warps in your thread block are executed.
This would be a significant compute boost for a CUDA kernel if you happen to be able to fit your block sizes to the "reduced" SMs' capacities.
It wouldn't help tasks that need texture units, since the only extra resource retained (i.e. not fused off) is the L1 cache/shared memory; tasks that require a lot of fast graphics rendering would still be limited by the total number of texture units in the die (because the texture units in the extra SMs had to be fused off).
The end result is better CUDA compute performance, while still retaining the same gaming performance.
Just my .02 speculation.
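
A minimal CUDA sketch of that block-sizing idea, treated strictly as speculation: the 64-thread block size and the notion of a two-processing-block "reduced" SM are assumptions for illustration, not confirmed details of the 2060 KO die.

// Hypothetical illustration of the speculation above: launch with a block
// size that matches the scheduling width of an SM with only two of its four
// Turing processing blocks enabled, i.e. 2 warps x 32 threads = 64 threads.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK_THREADS = 64;   // 2 warps: one per (assumed) surviving processing block

__global__ void scaleKernel(const float *in, float *out, int n)
{
    // Each resident block still draws its shared-memory allocation from the
    // SM's full 96KB L1/shared pool, however many processing blocks are live.
    __shared__ float tile[BLOCK_THREADS];

    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    if (tid < n)
        tile[threadIdx.x] = in[tid];        // stage through shared memory
    __syncthreads();
    if (tid < n)
        out[tid] = tile[threadIdx.x] * 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    // Launch with the "reduced-SM-friendly" block size of 64 threads.
    int blocks = (n + BLOCK_THREADS - 1) / BLOCK_THREADS;
    scaleKernel<<<blocks, BLOCK_THREADS>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[100] = %.1f\n", out[100]);  // expect 200.0
    cudaFree(in);
    cudaFree(out);
    return 0;
}

If the speculation held, the same kernel launched with larger blocks (say 256 threads) would not map as cleanly onto a "reduced" SM's 64-threads-per-clock scheduling width, which is the whole point of the block-sizing argument above.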

xenoaltrax

Nvidia: “When life gives you lemons, sell them as 2060s.”

brandonyoung