CPU vs GPU (What's the Difference?) - Computerphile

What does a GPU do differently to a CPU and why don't we use them for everything? First of a series from Jem Davies, VP of Technology at ARM.

This video was filmed and edited by Sean Riley.

Comments

One thing worth mentioning: GPUs actually do 4x4 matrix calculations rather than 3x3, because with 3 dimensions, rotation and magnification (scaling) require matrix multiplication while translation requires matrix addition. By adding an extra, seemingly arbitrary dimension (homogeneous coordinates), the GPU is able to unify all three key transformations under a single multiplicative operation.
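
A minimal sketch in plain C of what that extra coordinate buys (illustrative only, not anyone's actual implementation): with a homogeneous w component, a rotation and a translation collapse into one 4x4 multiply.

    #include <math.h>
    #include <stdio.h>

    /* Multiply a 4x4 matrix by a 4-component (homogeneous) vector. */
    static void mat4_mul_vec4(const float m[4][4], const float v[4], float out[4]) {
        for (int r = 0; r < 4; r++) {
            out[r] = 0.0f;
            for (int c = 0; c < 4; c++)
                out[r] += m[r][c] * v[c];
        }
    }

    int main(void) {
        float angle = 3.14159265f / 2.0f;        /* rotate 90 degrees about Z */
        float tx = 5.0f, ty = 0.0f, tz = 0.0f;   /* then translate by (5,0,0) */
        float m[4][4] = {
            { cosf(angle), -sinf(angle), 0.0f, tx },
            { sinf(angle),  cosf(angle), 0.0f, ty },
            { 0.0f,         0.0f,        1.0f, tz },
            { 0.0f,         0.0f,        0.0f, 1.0f },  /* the extra dimension */
        };
        float p[4] = { 1.0f, 0.0f, 0.0f, 1.0f };  /* point (1,0,0), w = 1 */
        float q[4];
        mat4_mul_vec4(m, p, q);
        printf("(%.1f, %.1f, %.1f)\n", q[0], q[1], q[2]);  /* -> (5.0, 1.0, 0.0) */
        return 0;
    }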

scheimong

CPU - 10 Ph.D guys sitting in a room trying to solve one super hard problem.
GPU - 1000 preschoolers drawing between lines.

tomasotto

Here’s a key acronym to remember about GPUs: “SIMD”. That‘s “Single-Instruction, Multiple-Data”. It has to do with the fact that a GPU can operate on a hundred or a thousand vertices or pixels at once in parallel, but it has to perform exactly the same calculation on all of them.

Whereas a single CPU core can be described as “SISD” -- “Single-Instruction, Single-Data”. With multiple CPU cores, you get “MIMD” -- “Multiple-Instruction, Multiple-Data”, where each instruction sequence can be doing entirely different things to different data. Or in other words, multithreading.

So even with all their massive parallelism, GPUs are still effectively single-threaded.
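
To make SIMD vs SISD concrete, here's a minimal sketch in C (assuming an x86 machine with SSE, compiled with something like gcc -msse; the 4-wide registers stand in for the much wider lanes a GPU has). The scalar loop is the SISD view; the intrinsics loop applies one instruction to four floats in lockstep.

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics: __m128 holds four floats */

    int main(void) {
        float a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        float b[8] = { 10, 20, 30, 40, 50, 60, 70, 80 };
        float c[8];

        /* SISD: one instruction works on one data element per step. */
        for (int i = 0; i < 8; i++)
            c[i] = a[i] * b[i];

        /* SIMD: one instruction multiplies four floats at once, and all
           four lanes are forced to perform exactly the same operation. */
        for (int i = 0; i < 8; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&c[i], _mm_mul_ps(va, vb));
        }

        for (int i = 0; i < 8; i++)
            printf("%g ", c[i]);
        printf("\n");
        return 0;
    }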

lawrencedoliveiro

"you've never seen a triangle that isn't flat."

I just came from the non-euclidean geometry video.

magikarpusedsplash

He missed one of the most important things: why they are more efficient.

Running the GPU load in parallel doesn't by itself make it more power efficient, and it doesn't use less die area either.

What is done is that the GPU has just one very narrow set of instructions, all the same width and length. Typically an instruction takes 128-bit operands that can be treated as 2x64-bit, 4x32-bit or 8x16-bit lanes, and only a very limited number of instructions can be run.

Also, there are no branches and no branch prediction, which cuts away a load of pipelines. In a GPU every pipeline handles, say, 128 bits per clock regardless of the instruction. If you only need a 16-bit-by-16-bit operation, tough luck: you still have to fill up the whole 128-bit registers.
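
As a rough illustration of those fixed-width lanes (using the CPU's own SSE2 registers as a stand-in, since GPU shader code isn't written this way): the same 128-bit register can be carved into 4x32-bit or 8x16-bit lanes, and one instruction always works on the full width.

    #include <stdio.h>
    #include <emmintrin.h>   /* SSE2: __m128i is one 128-bit register */

    int main(void) {
        /* The register treated as 4 x 32-bit lanes... */
        __m128i a32 = _mm_set_epi32(4, 3, 2, 1);
        __m128i b32 = _mm_set_epi32(40, 30, 20, 10);
        __m128i r32 = _mm_add_epi32(a32, b32);   /* one instruction, 4 adds */

        /* ...or as 8 x 16-bit lanes; the hardware still processes all 128 bits. */
        __m128i a16 = _mm_set_epi16(8, 7, 6, 5, 4, 3, 2, 1);
        __m128i b16 = _mm_set_epi16(80, 70, 60, 50, 40, 30, 20, 10);
        __m128i r16 = _mm_add_epi16(a16, b16);   /* one instruction, 8 adds */

        int out32[4];
        short out16[8];
        _mm_storeu_si128((__m128i *)out32, r32);
        _mm_storeu_si128((__m128i *)out16, r16);

        printf("4x32-bit: %d %d %d %d\n", out32[0], out32[1], out32[2], out32[3]);
        printf("8x16-bit: %d %d %d %d %d %d %d %d\n",
               out16[0], out16[1], out16[2], out16[3],
               out16[4], out16[5], out16[6], out16[7]);
        return 0;
    }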

Also, a CPU has an FPU, an integer unit, general logic and SIMD pipes, and the pipes are duplicated to support branch prediction. Some designs have four-way branch prediction, so they have four pipelines for one instruction. So in a normal CPU you might have 4 pipes for integer, 4 for floating point, 4 for SIMD and 4 for general logic (often combined with integer): a total of 16 pipes just to calculate one value. Why? To get the speed (the clock speed, that is) up and to keep the cost of branch mispredictions down.

A GPU has just one type of pipeline (in the old days there were two, one for transform and one for rendering, but in most modern designs they are integrated into the same pipeline). Also, there is one instruction for every 128- or 256-bit set of data, so if you use 32-bit values on a 128-bit pipe you get roughly a 4x reduction in instruction count, because most of the data uses the same instruction.

A typical 128-bit piece of data could be from an HDR rendering scene where two colours are mixed, one RGBA value with another RGBA value. Instead of processing first the R, then the G, then the B and so on, the whole load goes into the pipe and all the data is calculated with the same instruction. Almost everything in graphics comes in sets of four: a polygon (well, triangle) has three corners plus an additional value for the polygon, so it still has four values.
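
For example, blending two RGBA colours channel by channel would take four multiplies and four adds scalar-wise; packed into 4-wide registers it's a couple of wide instructions (sketch below in C with SSE as a stand-in for a GPU pipe, illustrative only).

    #include <stdio.h>
    #include <xmmintrin.h>

    int main(void) {
        /* Two RGBA colours, one float per channel (set_ps lists A, B, G, R). */
        __m128 src = _mm_set_ps(0.8f, 0.2f, 0.4f, 1.0f);
        __m128 dst = _mm_set_ps(1.0f, 0.9f, 0.1f, 0.0f);
        float t = 0.25f;                                  /* mix factor */

        /* result = src*(1-t) + dst*t, all four channels at once. */
        __m128 res = _mm_add_ps(_mm_mul_ps(src, _mm_set1_ps(1.0f - t)),
                                _mm_mul_ps(dst, _mm_set1_ps(t)));

        float out[4];
        _mm_storeu_ps(out, res);
        printf("R=%.3f G=%.3f B=%.3f A=%.3f\n", out[0], out[1], out[2], out[3]);
        return 0;
    }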

Some graphics loads use half precision, so two pixel or two polygon calculations can be done at the same time in the same pipe. CPUs have actually had that capability since the Pentium 3, but a CPU still has just one output per core (even if it's a 128-bit SIMD output; most modern CPUs even have 256-bit SIMD).

The other part is the absence of branches, which removes a ton of problems for the chip designer. Firstly, you don't need any branch prediction. Secondly, you don't need branching instructions and pipes, which removes the whole logic pipe. You can still do branch-like calculations, for example by multiplying by values that come out as 0 or 1 in the result, but the GPU never makes any decision about it.

This is also the main drawback of the GPU: it will keep grinding through the same set of work orders over and over, and it can only execute given work orders, not create them. That makes the load very predictable (the GPU knows what it will be doing tens of thousands of clocks in advance), which helps parallelism enormously, but it can't generate an instruction itself.
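
That "branch-like calculation" is usually a select or mask: compute both outcomes for every element and blend them with 0-or-1 weights, so no actual branch is taken. A rough plain-C sketch of the idea:

    #include <stdio.h>

    int main(void) {
        float x[4] = { -2.0f, 3.0f, -0.5f, 7.0f };
        float y[4];

        for (int i = 0; i < 4; i++) {
            /* CPU style would be: if (x[i] < 0) y[i] = 0; else y[i] = x[i];
               GPU style: build a 0/1 mask and blend -- every lane executes
               exactly the same instructions and no decision is ever made. */
            float mask = (x[i] >= 0.0f) ? 1.0f : 0.0f;
            y[i] = mask * x[i] + (1.0f - mask) * 0.0f;
        }

        for (int i = 0; i < 4; i++)
            printf("%g ", y[i]);
        printf("\n");
        return 0;
    }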

So the CPU still has to generate the instructions for the GPU, but the CPU's load can be very low. For example, the CPU can tell the GPU to render a tree of data: the tree is a given list of objects, each with a given list of sub-objects, which have given lists of vertices, textures and so on. This way the CPU can hand the GPU a very small amount of information that produces quite a lot of work. That has not always been the case: GPUs prior to 2000 had to be fed specific lists of textures and vertices directly by the CPU, which gave the CPU quite a lot of work.

The problem nowadays is that game developers want the world to be "living", so they want as little data as possible pre-baked into a fixed tree of objects. As a result the CPU can end up feeding the GPU thousands upon thousands of objects every frame. In DX11 this is a problem, because only one CPU core can do the feeding of the GPU; someone just never thought this would be a problem, and it has been that way in DirectX since the first version. Finally, in DX12, this is being fixed.
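
A hypothetical sketch of that division of labour (the Mesh/Node types and submit_draw() are made up for illustration, not a real API): the CPU walks a small tree and issues one compact command per object, and the GPU does the heavy per-vertex and per-pixel work behind each command.

    #include <stdio.h>

    /* Hypothetical scene-tree types -- not a real graphics API. */
    typedef struct Mesh { const char *name; int vertex_count; } Mesh;
    typedef struct Node {
        Mesh *mesh;                 /* geometry at this node (may be NULL) */
        struct Node *children[4];   /* sub-objects                         */
        int child_count;
    } Node;

    /* Stand-in for handing one work order (one draw call) to the GPU. */
    static void submit_draw(const Mesh *m) {
        printf("draw call: %s (%d vertices)\n", m->name, m->vertex_count);
    }

    /* The CPU's per-frame job: traverse the tree and issue small commands. */
    static void render_tree(const Node *n) {
        if (n->mesh)
            submit_draw(n->mesh);
        for (int i = 0; i < n->child_count; i++)
            render_tree(n->children[i]);
    }

    int main(void) {
        Mesh body = { "body", 5000 }, wheel = { "wheel", 800 };
        Node w1 = { &wheel, { 0 }, 0 }, w2 = { &wheel, { 0 }, 0 };
        Node car = { &body, { &w1, &w2 }, 2 };
        render_tree(&car);   /* with DX11-era APIs this walk sits on one core */
        return 0;
    }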

matsv

2:45 - He nearly brings up the 'w' coordinate, then quick-fixes it by calling it "transparency" :P

michaelgeorgoulopoulos

Wobbly Computerphile camera XDDD nice one hahaha.

fz

One of the features of OS X a few versions back was the ability to off-load repetitive, non-graphical tasks to the GPU rather than the CPU so they run faster.

HaniiPuppy

This is one of the better simplified explanations of 3D graphics I've seen. Particularly the part about why triangles are used.

MacBeach

It's amazing that he didn't explain SIMD or vector computing, because that's exactly what the difference is - it's the real answer to why CPUs and GPUs are different. Instead of looping over an expression of 256 array elements, you create huge registers that are a gang of 256 floating-point elements. A CPU would have 256 threads progressing at different rates on each array element; a GPU forces all 256 items to be on the same instruction because it's working on these huge registers. This is why you can't run all CPU algorithms efficiently on a GPU. I.e. "for each i in [0, 256]: c[i] = a[i] * b[i]" is CPU thinking, where each thread progresses at its own rate. GPU thinking is "float32_times256 c, b, a; c = b * a;", where c = b * a is one instruction with three huge operands.

robfielding

You should have asked him exactly what role a CPU plays vs. a GPU in something like a game. What exactly does the GPU need from the CPU, etc.

richmahogany

We've learned all sorts of things about CPUs. Some more videos on GPUs would be really nice.

hr

Can you guys eventually do a video on how a CPU works? Thanks

MystycCheez

An entire episode with no explanations on how stuff actually works. Well done.

MrDajdawg

1:27 That's a suboptimal way of tessellating circles. All the triangles meeting in the center will increase the chance of visible artifacts occurring.

(YouTube smartass answer mitigation disclaimer: I know it's not a circle and I saw that the triangles don't meet. Leaving a smaller copy of the same polygon for the center doesn't solve the problem though, does it?)
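
For reference, the fan layout being criticised looks roughly like this (a small C sketch that just prints the triangles; every triangle shares the centre vertex, which is where the slivers pile up):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const int n = 8;                 /* number of outer vertices */
        const float r = 1.0f;
        const float two_pi = 6.2831853f;

        /* Triangle fan: centre vertex plus two neighbouring rim vertices. */
        for (int i = 0; i < n; i++) {
            float a0 = two_pi * i / n;
            float a1 = two_pi * (i + 1) / n;
            printf("triangle %d: (0.00, 0.00) (%.2f, %.2f) (%.2f, %.2f)\n",
                   i,
                   r * cosf(a0), r * sinf(a0),
                   r * cosf(a1), r * sinf(a1));
        }
        return 0;
    }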

unvergebeneid

Am I the only one who finds the way he says "pixel" weird? The stress seems to be in completely the wrong place.

FoxDren

Watching this video on one monitor, seeing my GTX970 busy folding away on the other monitor. How appropriate...

rjfaber

It is extremely easy to complain about a tiny glitch in a game while sitting in front of a console without any clue how everything works. When you think about the maths involved in making it, it is mind-blowing what a human mind is capable of.

smaklilu

*GPU:* Graphic objects can generally be broken into pieces and rendered in *parallel* at the machine level.

*CPU:* Generally sequential by design, to take care of problems which have many *_dependencies_* and *cannot be solved in parallel* by their very nature.

Ex.
Seq1: *A = b + c;*
Seq2: *D = A + E;*
Seq3: *y = D - A;*

Obviously, you have to do the thing in *sequence* in order to get the value of *y.*

However, the *task* of *_deciding_* which processor handles which problem (i.e. gaming or any other app) at *execution time* is mainly a *function of the OS.* This is because a *gaming app* is not *all* a *_graphics rendering_* task but also a combination of *logic, rules, and some AI.* Those are the reasons why you need *both.* Kinda like *_specialization_* of tasks.
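
Put as code (a tiny C sketch, purely illustrative), the dependency chain above forces one order of execution, while an elementwise job has no such chain and could be split across many workers:

    #include <stdio.h>

    int main(void) {
        /* Sequential by nature: each line needs the previous result. */
        float b = 2.0f, c = 3.0f, e = 4.0f;
        float a = b + c;   /* Seq1 */
        float d = a + e;   /* Seq2: needs a */
        float y = d - a;   /* Seq3: needs a and d */
        printf("y = %g\n", y);

        /* Parallel by nature: every element is independent of the others,
           so these iterations could all run at the same time on a GPU. */
        float in[4] = { 1, 2, 3, 4 }, out[4];
        for (int i = 0; i < 4; i++)
            out[i] = in[i] * 2.0f;
        printf("out = %g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }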

zegzezon

I've been told that for computer modelling, GPUs aren't always a good solution. For certain kinds of Finite Element Analysis, you can generally only calculate one step at a time, one node at a time, as the result at one node affects the calculation required for all the other nodes it influences, and the same is true for those nodes at each time step. So these still need high-speed CPUs to do the grunt work, as the task isn't easy to parallelize.
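
A tiny illustration of that kind of dependency (a Gauss-Seidel-style sweep over a 1D grid, used here only as a stand-in for an FEA-like step): each node's new value uses the already-updated left neighbour, so the nodes cannot all be computed at once.

    #include <stdio.h>

    int main(void) {
        /* Fixed boundary values at u[0] and u[5]; interior nodes relax. */
        float u[6] = { 100.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };

        for (int step = 0; step < 3; step++) {
            /* u[i] reads u[i-1], which was updated THIS sweep, so node i
               cannot be computed until node i-1 is finished. */
            for (int i = 1; i < 5; i++)
                u[i] = 0.5f * (u[i - 1] + u[i + 1]);

            printf("after sweep %d:", step);
            for (int i = 0; i < 6; i++)
                printf(" %6.2f", u[i]);
            printf("\n");
        }
        return 0;
    }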

Croz