comparing GPUs to CPUs isn't fair

In my previous video, I talked about why CPUs cannot have thousands of cores. While this is true due to thermal, electrical, and memory limitations, a lot of the comments on that video pointed out that GPUs have thousands of cores. In this video, we discuss the subtle differences in GPU microarchitecture that make CUDA "cores" and CPU cores significantly different.

CPU cores are heavy computing machines that can process arbitrary input from users using arbitrary programs. Because of this, CPUs are more generalized. GPUs, on the other hand, are good at one thing: bulk processing of bulk data.

Comments

Don't forget that a CPU core also implements the entire x86/x64 instruction set while a shader core is only going to implement a much smaller and simpler instruction set. This is how they fit so many more cores on a GPU die in the first place.

CharlesVanNoland

I remember when NVIDIA did this Tegra presentation, and I had to cringe when they claimed they had the first 200-core (or something like that) mobile processor. They really just had a generic ARM design and a GPU and added those cores up like they were equivalent.

CjqNslXUcM

Basically, CPUs are optimized to minimize latency, while GPUs are optimized to maximize throughput (bandwidth).
At first glance those seem to imply the same thing, but they don't. You could get a result from the CPU in 1 ms but only process 10 items, while a GPU can process 10,000 items in 100 ms. You would expect that to mean 10,000/100 = 100 items per 1 ms, but that's not how GPUs work: you pay for the high throughput with latency.
It is nuanced, but once you understand it, the difference is actually night and day.

GPUs also aren't as flexible. The programs you write are inherently parallel; there is no std::thread kind of stuff. You write a scalar-like program that is "automagically" parallelized, so you have to think about parallel access from the get-go.
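
A minimal CUDA sketch of that "scalar-like program" idea (kernel name, sizes, and launch configuration are illustrative, not from the video): the kernel body reads like scalar code for a single element, and the launch configuration fans it out across thousands of threads.

```cuda
#include <cuda_runtime.h>

// The kernel body is written for one element; parallelism comes from the
// launch configuration, not from explicit std::thread-style code.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n)
        data[i] *= factor;                          // scalar-looking work
}

int main() {
    const int n = 1 << 20;                          // 1M elements (illustrative)
    float *d = nullptr;
    cudaMalloc((void **)&d, n * sizeof(float));     // (a real program would copy input in)
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);    // one thread per element
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```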

dexterman

The 4090 has 83 TFLOPS. It's the 4080 that has the 49.

Maxim_Espada

You should have shown the physical layout of a CPU core vs. a GPU core. The difference is clear that way: way more parts in the CPU core. They are very different and not even in the same realm.

BentonL

4:38 Former maintainer of Intel's OpenCL driver for Linux here: on Intel, the Y-branch threads would execute after the X branch has finished (reached the "else" statement) and block the X-branch threads until the end of the if/else. I'm not familiar with Nvidia, but I think they do the same.

Also, with AVX-512 the line seems to be blurring somewhat: AVX-512 has the same lane-masking capability as Intel's GPU ISA.
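
A toy CUDA kernel (identifiers invented for illustration) showing the divergence being described: when lanes of a warp disagree on the condition, the hardware executes one path with the disagreeing lanes masked off, then the other path, and the warp reconverges after the if/else.

```cuda
__global__ void divergent(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] > 0.0f) {
        out[i] = in[i] * 2.0f;  // "X" path: lanes with in[i] <= 0 sit masked off
    } else {
        out[i] = 0.0f;          // "Y" path: now the other lanes are masked off
    }
    // lanes reconverge here and the whole warp executes together again
}
```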

linnaea_lavia

This kind of parallelism is actually called Single Instruction, Multiple Threads (SIMT), as it is slightly different from Single Instruction, Multiple Data (SIMD). In fact, a warp can be SIMT, as explained in the video, and process multiple pixels at once (following every branch in unison), while each core can be SIMD and process a vec4 at a time, not just a float.
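
A rough illustration of those two levels (invented names; note that on NVIDIA hardware a float4 mainly buys a wider, vectorized memory access rather than a true SIMD ALU operation): the warp supplies the "multiple threads", and each thread carries a small vector of data.

```cuda
// Each thread owns one float4 (e.g. one RGBA pixel) while the warp as a
// whole still executes in SIMT lock-step across 32 such threads.
__global__ void scale_pixels(float4 *pixels, float gain, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float4 p = pixels[i];                      // one 128-bit vectorized load
        p.x *= gain; p.y *= gain; p.z *= gain; p.w *= gain;
        pixels[i] = p;                             // one 128-bit vectorized store
    }
}
```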

naturallyinterested

A great explanation. Thank you!

As a basic analogy, you could put it like this: CPUs are like three architects/builders. They can do a large amount of complex work well, but they're limited in number, and therefore in efficiency when that complexity isn't required.

GPUs are like ant colonies: not smart enough to build wonders, but numerous enough to work quickly and efficiently on singular tasks.

aeureus

Good video. I'd like to add that programming GPUs in a way that approaches the advertised performance is rather difficult. You mentioned branches, but they also lack features (no call stack, no dynamic memory allocation), ideally need specific memory access patterns (look up memory coalescing and bank conflicts), have manually managed in-core caches, and important technical information is a trade secret (like their instruction sets).
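
A hedged sketch of two of those points (invented kernel, assumes a 256-thread block): adjacent threads read adjacent addresses so the loads coalesce, and the block stages data in shared memory, the manually managed in-core cache.

```cuda
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[256];                  // manually managed on-chip cache
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Coalesced: thread k of a block reads element base+k, so a warp's
    // 32 loads fall into a few wide memory transactions.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction entirely inside shared memory (no further DRAM traffic).
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];               // one partial sum per block
}
```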

soonts

Would really appreciate more videos in this style explaining these kinds of concepts.

danielray

You just shouldn't call GPU FP units cores; that's just a marketing term from NVIDIA. Shaders or FP32 units would be better names for what they are. The closest thing in an NVIDIA GPU to a CPU core would be something like an SM, and there are only about 128 SMs even in the highest-end GPUs.

lukas_ls

GPU cores also run at a lower clock speed, which allows packing more of them into a small chip.

aaron

Love the video. Really interesting and pretty simple to understand.

no-one

Great presentation!

Size / yield / energy: a big, clever CPU core is harder to manufacture. Scale a CPU out to the same number of cores as a GPU while keeping the complexity of the CPU, and you get insane power draw, low yield, and insane prices, like supercomputers.

One thing cut for time here: the memory architectures are very different because they are built for such different purposes. GPUs have specialized memory that shuffles a lot of data over very wide buses and reads a lot of closely aligned memory. CPUs have very narrow buses (with DDR5, 2x32 bits per stick). So a CPU can shuffle a lot of different data at the same time, while GPUs are good at shuffling a lot more of the same data. The GPU memory model is therefore bad for running multiple different programs at the same time. The literal hardware interfaces of the chips are built for extremely different purposes, an entirely different programming idea :)

randomgeocacher

I think it's useful to mention that GPUs will frequently and deliberately block on memory, as the memory subsystem is geared towards throughput, with little caching in the way of reducing access latency. Hence, an SM may theoretically switch context after every warp instruction.

michaelprantl

One thing to note: starting with Ampere, NVIDIA's CUDA cores come as an FPU plus a combined INT32/FP unit; they aren't completely split anymore. They did this to increase the max theoretical performance; however, there's almost never a case where no integer calculations are being run alongside, so that peak is rarely reached. It's really quite interesting actually. AMD has done the same thing with Navi 31, in the Radeon 7000 series. I guess it's a way to squeeze out some extra performance without increasing die size.

hufthenerd

Saying a GPU is faster than a CPU is like saying a rock is a much better car than a tennis racket.
Unless you have an explicit context and an exact specification of _what_ it is supposedly faster at, there is absolutely no point in even trying to reason about what such a statement means.

Finkelfunk

That branch blocking fits a GPU perfectly, because it short-circuits the computation path if the view of the object is blocked or the object is not in view.

patrickvolk

Nice explanation of warp scheduling and stuff, I used those ideas a lot in my path tracer

kylebowles

That's some deep level of knowledge.
Thank you, sir, for the info ❤️

oussamakhlif