Apple M3, M3 Pro & M3 Max — Chip Analysis

In-depth analysis of Apple's new 3nm chips: M3, M3 Pro and M3 Max. Silicon deep-dive, die-shot analysis and a closer look at CPU, GPU, NPU and the TSMC N3B process node.

0:00 Intro
0:47 M3 Silicon Analysis
4:17 M3 Pro Silicon Analysis
6:26 M3 Max Silicon Analysis
8:25 Why is the M3 Pro a downgrade?
10:47 NPU deep-dive
12:16 CPU deep-dive
13:17 GPU deep-dive
14:44 GPU architecture / Apple family 9 GPU
16:22 TSMC N3B Process Node
18:44 Wrap-up
Comments

I watched the entire 20 minutes. As a long-retired chip designer, this world of 90 billion active devices is incomprehensible, but I enjoy following along anyway.

silvercs

As an embedded and FPGA engineer, CPU design like this has always felt like the major leagues. Watching this video feels to me the way watching a sports game with good color commentary must feel to a typical American. Thank you for producing this.

DeadCatX

Still watching, and man, these deep dives are such a fascinating way to learn more about silicon design and engineering in our current era. Absolutely amazing work!

zkeltonETH

Easily the most informative M3 breakdown. Kudos.

cp

Sorry for the long wait; the video got longer and longer the more I worked on it... Let me know if you enjoy these (very) deep dives, or if they're too long/detailed for you.

PS: Dynamic Caching doesn't have anything to do with the system memory; it's about the on-chip GPU memory. The whole GPU seems to be a complete game changer, something a lot of people seem to have missed. This might very well be the most advanced GPU architecture right now, and it will take a while until we see its full potential.

HighYield

Who wouldn't watch your entire breakdown of Apple's silicon? Personally I enjoy how this channel focuses on the less talked-about features of hardware design; it really makes you understand how much a company does (or doesn't) care about a product it is launching into the market. Keep up the great work, I cannot wait to watch more of these breakdowns in the future!

Fractal_

Great video, I was really looking forward to this one! On the "Dynamic Caching" in the new shader core (a.k.a. register file + image block + group shared memory = L1): you've watched Apple's video already, so I'll try to add some additional practical context on why it's important.

It doesn't require new shaders to be written; old shaders are forward-compatible and can take advantage of this feature. However, most shaders were indeed written around the limitations that came before it, so the big advantage will mostly be felt on shaders that previously had low occupancy and can now maybe run at higher occupancy.

A lot of shaders read a bunch of buffers and a bunch of textures at some point, typically early, and at that point they greatly benefit from high occupancy to hide latency and avoid stalling. But later in the shader you typically do a bunch of math that needs a lot of registers for a short time, and in the old model that spike in register count forced the whole shader to demand that many registers the whole time, even though the buffer and texture fetches only need enough registers to store the read results.

So the benefit here is that you get low register pressure early in the shader, when you need high occupancy to hide memory latency, and later, during the "just math" part where you don't need occupancy to saturate the ALUs, you can go nuts with registers. Having the freedom to use many registers allows better algorithms that exploit large register counts without worrying about hurting latency hiding in another part of the shader.
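To make that two-phase pattern concrete, here is a minimal, purely hypothetical Metal Shading Language sketch (the kernel name, buffer layout, 1024-wide grid and the math are all made up for illustration): an early, latency-bound fetch phase that keeps only a few values live, followed by a short register-hungry math phase.

```cpp
#include <metal_stdlib>
using namespace metal;

// Hypothetical compute kernel: phase 1 issues latency-bound reads with few
// live values, phase 2 briefly needs many registers for math.
kernel void shade_sketch(texture2d<float> albedoTex          [[texture(0)]],
                         sampler smp                         [[sampler(0)]],
                         device const float4 *materialParams [[buffer(0)]],
                         device float4 *output               [[buffer(1)]],
                         uint2 gid                           [[thread_position_in_grid]])
{
    // Phase 1: memory reads. Only a handful of registers are live here, but
    // the GPU wants many threads in flight to hide the fetch latency.
    float2 uv     = (float2(gid) + 0.5f) / 1024.0f;
    float4 albedo = albedoTex.sample(smp, uv);
    float4 params = materialParams[gid.y * 1024 + gid.x];

    // Phase 2: register-heavy math. With a statically allocated register file
    // the whole shader is sized for this peak, lowering occupancy during
    // phase 1; with dynamic allocation the peak is only paid here.
    float4 acc = float4(0.0f);
    for (int i = 0; i < 16; ++i) {
        float4 t = albedo * params + float(i);
        acc += t * t - sin(t);
    }
    output[gid.y * 1024 + gid.x] = acc;
}
```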

It also provides freedom: you don't have to spend a lot of optimization time hitting a magical register count, the shader core does it for you (almost: you still need to make sure you don't need many registers at the time you issue those memory reads), and most importantly, you can now write dynamically branching uber shaders that don't trash your register file usage! Previously we always had to generate many shader variants for specialized cases and compile them at build time or run time, because a huge shader with tons of branches would have register pressure as bad as the worst-case "everything is on" scenario. Now the register pressure is dynamic, based on what's actually enabled!
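And a similarly hypothetical uber-shader sketch (again with made-up names and flags): with a statically sized register file, the compiler has to reserve registers for the worst-case branch in every invocation, so even threads taking the cheap path pay for the expensive one; with dynamic allocation the heavy branch only consumes registers when it actually runs.

```cpp
#include <metal_stdlib>
using namespace metal;

// Feature flag for the hypothetical "detail layers" path.
constant uint kUseDetailLayers = 1u << 0;

kernel void uber_sketch(device const uint   *featureFlags [[buffer(0)]],
                        device const float4 *input        [[buffer(1)]],
                        device float4       *output       [[buffer(2)]],
                        uint gid                           [[thread_position_in_grid]])
{
    float4 color = input[gid];
    uint   flags = featureFlags[0];

    if (flags & kUseDetailLayers) {
        // Register-hungry path: lots of live temporaries, but only when the
        // feature is actually enabled at run time.
        float4 l0 = color * 0.50f, l1 = color * 0.25f, l2 = color * 0.125f;
        float4 l3 = l0 + l1,       l4 = l1 + l2,       l5 = l0 - l2;
        color = (l3 * l4 + l5) * (1.0f / 3.0f);
    } else {
        // Cheap path: barely any extra registers needed.
        color *= 0.5f;
    }
    output[gid] = color;
}
```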

I probably got some parts wrong, but I think it's really interesting how much having an L1 cache changes for shaders.

ikarosav

Love these long deep dive videos. When executed well they provide extraordinary value. Time is valuable and this video did not disappoint. Keep up the great work!

gimmedaloot

I’m a high school computer teacher and I played it for my students. My students love to keep up with the latest chip news. Thanks for sharing!

schwartn

I'm certainly watching every one of your videos to the end! I just recently discovered your channel, and it's a godsend in terms of amazing in-depth explanations of how exactly all that performance and all those features are achieved and realized at the silicon level!
I've always wanted someone to explain things like that at a truly low level, in terms of hardware: literally talking about transistor counts and how it's all allocated on a chip, designed, interconnected, etc.
Thank you so, so much for what you're doing on this channel! Keep these amazing videos coming!

Frytech

I watched the whole thing and subscribed. This was a very nice level of analysis for me, and I think you did a great job of overviewing the changes.

It seems to me that this generation takes to heart one of the original RISC tenets: spending transistors on caches (vs. CPU logic, etc.) is a huge win. The tricky part, also from the RISC heritage, is that you need compilers that can take advantage of the opportunities for caching (and of the exposed opportunities for parallelism).
I enjoyed your video a lot. Thanks.

jimgolab

Thanks for such a detailed analysis. I find these deep dives really interesting, and I'm pretty sure many others would agree. Would love to see more of these in the future. Cheers!

cyan_aura

You have a gift for articulating these subjects. I have zero chip background but was easily able to follow through to the end.

mrfin

Just finished watching this video. As a current chip designer, I absolutely love your content, and this video in particular was very well done. I'd like to see more deep dives like this one.

m.s.psrikar

I'm in a somewhat similar industry, trying to rebalance our product line portfolio and create distinct segmentation, and I know how many meetings it takes and how difficult it is. I'm sure there was a ton of stress for the folks at Apple (and thus a ton of meetings) when they repositioned the M3 Pro in a way people are calling a "downgrade". I can see the product planners and engineers arguing in my head. Watched the whole thing and subscribed. Thanks for doing this.

stefanbuscaylet

I watched the entire video, and I don't think 20 minutes is particularly long for this kind of content. You did a great job 👌👌

HardwareScience

I always watch your videos from beginning to end, since your content is excellent. Thank you again for this one.

Cofenotthatone

I really wish Apple would give more details about the Dynamic Caching stuff. I read the patent filing and it looks interesting. I was hoping the new GPU design would be optimized for ML training. Watched the whole video. Hopefully, as more people analyze the chip, you can update and identify where the Dynamic Caching logic sits on the Max chip.

woolfel

It’s remarkable how much effort you’ve put into producing and researching this, keep it up! 👏

papsaebus

It doesn't matter how the transistors change; I just know that the product-segmentation knife skills are superb and the chips are getting expensive again.

ancientsword