M4 Deep Dive - Apple's First Armv9 Processor

Here is my look at the new Apple M4 processor. Apple's next-gen chip has been redesigned to use Armv9 and extensions like the Scalable Matrix Extension 2 (SME2). This will be the foundation for the next several years of processors from Cupertino.
---

#garyexplains
Comments

Love that we live in a world where '28 billion transistors' is a throw-away line! Let's take a moment to admire that: three transistors for each person on the planet in an area smaller than a postage stamp. Mind, boggled!
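The back-of-the-envelope arithmetic above holds up, assuming a world population of roughly 8 billion:

```python
# Quick sanity check of the transistors-per-person figure.
transistors = 28e9   # 28 billion transistors in the M4
population = 8e9     # assumed world population, roughly

print(transistors / population)  # 3.5 transistors per person
```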

john_hind

Well done video. I’ve owned almost every model of iPad and have a household of M-series laptop/desktop machines. The M4 iPad that I picked up 5 days ago has wildly changed my workflow. Doing very high-resolution wildlife photography (and some videography), speed is 100% the most important variable in whatever equipment I use, besides the display. I’m now spending more time working on my M4 iPad than I am on my Mac Studio or 16” MacBook Pro. Its performance running Lightroom, Photoshop, DaVinci Resolve, Affinity Photo 2, and Final Cut Pro is absolutely amazing. Now, I shoot, I hook up my Thunderbolt CFexpress card reader, dump hundreds of 50MP Sony .ARW images, and get down to processing my photos (and some 8K/30 and 4K/120 video) all without touching my desktop machines. My Mac Studio M1 Ultra 128GB is faster when doing huge batches of very intensive image processing, but that really just comes down to having more than double the CPU and GPU cores, in addition to still very strong NPU performance, even though that machine is several generations behind.

Anyhow, it’s amazing. I care very little about benchmarks because the only thing I care about is the results I can accomplish and how effectively I can get that work done. There are still VERY strong use cases for each of my machines in different scenarios. I won’t be one of those people ignorantly stating that it can do everything. It can’t. Neither can my MacBook Pro or Mac Studio; they all work together, excelling in their own ways.

Also, being a gamer, I love playing more and more games on my iPad including FINALLY being able to load up my retro games via Delta Emulator.

If all you care about is AAA gaming, go buy a Windows PC or Steam Deck. If you troll YouTube, Twitch, and barely use any real capabilities of any of your machines, buy whatever you want and still pretend to be an expert online to make yourself feel good. What it really comes down to is buy the technology that lets you be creative, productive, or have fun in the manner that is best suited to your budget and personal preferences! 😊 I use all major operating systems daily (yes that includes Linux), and all manner of hardware (yes, that even includes an Android device and Windows on ARM). I have stuff I love and hate personally…but at the end of the day it is ALL pretty darn amazing technology and it’s a good time to be alive if you enjoy this sort of thing. 👍

BrockGunterSmith

Excellent video; great idea to not just repeat the press kit and instead come with some additional information 😊 I liked the idea of dividing by the GHz to understand the change better.
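Dividing a benchmark score by the clock speed gives a rough per-clock (IPC-like) comparison, which separates architectural gains from frequency gains. A minimal sketch with entirely hypothetical scores and clocks, purely for illustration:

```python
# Hypothetical single-core scores and clocks -- illustration only,
# not real benchmark results for any chip.
chips = {
    "Chip A": {"score": 3100, "ghz": 4.05},
    "Chip B": {"score": 3700, "ghz": 4.40},
}

for name, d in chips.items():
    # Points per GHz approximates per-clock performance (an IPC proxy).
    print(f"{name}: {d['score'] / d['ghz']:.0f} points/GHz")
```

If the per-GHz figures differ even after normalizing for clock, the remaining gap is due to the microarchitecture rather than frequency.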

BenjaminDirgo

I was wondering when they’d switch to Armv9. I always thought it was a bit unfortunate that the early M chips were stuck on the older ISA with the less robust vector support. Glad to see they’re still trying to compete and not just sitting back.

octagonPerfectionist

This is why I subscribe to this channel and even turned on the bell. Gary goes deeper than the usual YouTube tech reviewer, and I really want to know these details of computer tech!

therealmarv

Thanks Gary for this information and clarification video on the new M4.
Have a great day!

Kw

Thank you for being super honest when you are just guessing at something and you do not have actual information. That is extremely helpful.

There’s somebody on YouTube who talks about Apple stuff all the time, and it’s obvious he’s just taking wild guesses and stabs in the dark; his videos are a complete waste of time.

You give the good information when you have it and you admit when you are just pontificating and guessing at things. Thank you!

juddrizzo

It's impressive: yes, on paper the Qualcomm Elite-X should be faster than the M4, but there isn't a single device with it, even half a year after it was announced, yet the M4 is already here just a few days after its announcement, ON AN IPAD. If they take more time, there will be a desktop M4 with a lot more performance cores, and the Elite-X won't be as disruptive as they thought it was going to be, only in the PC area, not as competition for Apple silicon.

smokeduv

Hi Gary, nested virtualization will be available in macOS Sequoia, but only for M3-series chips and higher, even though M2-series chips already had hardware support for it. Please test both cases when the OS update drops; it would be great info for devs. Thanks and good luck with the channel.

qwe

Dynamic Caching has nothing at all to do with system memory.

What it is talking about is the on-die local cache within the GPU (think of it like L1/L2 cache, but for the GPU) and how this is divided into cache, threadgroup memory, and registers.

On all other GPUs, when you run a task, the GPU will look at the maximum amount of local memory and registers that the task will need throughout its runtime (this includes optional branches it might never take but could; you can't know before you run it, after all). It will then look at how many registers and how much local memory each core has and, from that, figure out how many copies of that shader it can run at once. However, most real-world shaders have very non-uniform memory/register usage, where 95% of the shader's runtime uses only a tiny fraction but the other 5% uses a huge amount. What this means in practice is that the GPU is still limited in how many copies it can run, even though 95% of the time there is spare capacity it could use; it can't do anything about that, since the registers and/or local memory are reserved for that high-demand point (which might not even occur in every instance of the shader, as it is likely behind some optional branch).

Dynamic Caching makes two key changes to this:
1) At runtime, the GPU can dynamically repartition the local per-core (L1) memory, reallocating how much is used for registers, threadgroup memory, or cache, whereas on other GPUs the vendor has to fix this ratio in advance. This is a big deal, as different tasks place different demands on that ratio, so the hardware can now make better use of it across more use cases.
2) Because it can dynamically allocate more registers or local memory at runtime, the GPU can run more instances of a task at once: if an instance hits that high-demand point, it can get more registers (or threadgroup memory) by evicting something from cache.


These two things combined have a huge impact on performance for branching code paths, where you can't predict before running which paths the code will take, so on other GPUs the hardware must assume the worst-case scenario, leaving lots of the GPU underused because of optional paths that perhaps none of the threads ever hit. The biggest culprit is ray-tracing-style work, where you send a load of rays out to intersect objects in the scene and then need to do shading computation for each intersection; with many different object types in the scene, very few threads can be run at once, just in case all the rays end up hitting the most costly (in threadgroup memory or registers) material function.

But the key takeaway is that it has nothing at all to do with your system memory; it is all about the small amount of local memory (registers, threadgroup memory, and cache) within each GPU core.
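A toy model makes the occupancy difference concrete. This is a sketch with made-up numbers (the register-file size and per-shader register counts are invented for illustration), not Apple's actual scheduler:

```python
# Toy model of GPU occupancy: how many shader instances fit per core
# given a fixed register file. All numbers are illustrative.

REGISTERS_PER_CORE = 4096  # assumed register-file size per core

def static_occupancy(worst_case_regs: int) -> int:
    # Conventional GPUs reserve the worst-case register count per
    # instance for the whole run, even if that peak is rarely reached.
    return REGISTERS_PER_CORE // worst_case_regs

def dynamic_occupancy(typical_regs: int) -> int:
    # With dynamic reallocation, occupancy can be sized to typical
    # demand; rare spikes borrow capacity from cache at runtime.
    return REGISTERS_PER_CORE // typical_regs

# A shader that needs 256 registers on a rare branch but 32 typically:
print(static_occupancy(256))   # 16 instances in flight
print(dynamic_occupancy(32))   # 128 instances in flight
```

The 8x gap in instances-in-flight is exactly the kind of headroom the comment describes: the worst-case reservation, not average demand, is what throttles conventional designs.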

hishnash

How much do you think Qualcomm is leaving on the table in terms of performance by not being able to implement the Armv9 instruction set in their Elite processors? I am guessing that must have been part of the bump in Apple's numbers, along with GHz, the improved process node, and all that you mentioned.

akarimsiddiqui

It is v9?! Hell yeah! So my new device isn't a total waste of money ;) Joking aside, I'm mostly curious about the power draw in pure watts.

EyesOfByes

For most of us, M4 is like a 2000mph car, but we live in a 60mph world. Hard to tell M1 from M4 in everyday use. The YouTube content generators won't be happy until they can render a 30 minute 8k video file in 10 seconds.

eddiegardner

This is an incredible synopsis of the M4. I will share the URL to your video.

softwaremaniacpsm

I’ve been getting Scientific American since I was a kid in 1962. In the early 1970s, I believe 1973, they began a series of articles that came out once a year, keeping track of the progress of integrated circuits, which were just coming out. The writer, an engineer, was amazed that with an incredible several dozen transistors, he “could barely see the transistors with my bare eyes”. Compare that to almost 100 billion on an M2 Ultra. I’d like to bring up an important point about clock speed. In order to run at a higher clock without using a good deal more power (because the power increase is not linear), the entire chip needs to be redesigned to run at that higher speed. Apple has been very good at doing that while adding functions. It’s also amusing that many people still believe that pure CPU or GPU scores tell the entire story. In reality, current SoCs use a combination of the different parts of the chip to do a complex task, so the rating of the CPU or GPU doesn’t always tell us what the actual performance will be in real-world software that is optimized for the system.
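The "power increase is not linear" point can be sketched with the standard dynamic-power relation for CMOS logic, P ≈ C·V²·f. The figures below are illustrative, not measured M4 numbers:

```python
# Dynamic power of CMOS logic scales roughly as P ~ C * V^2 * f.
# Higher clocks usually require higher voltage, so power grows much
# faster than frequency alone would suggest.

def dynamic_power(c_eff: float, voltage: float, freq_ghz: float) -> float:
    """Relative dynamic power in arbitrary units."""
    return c_eff * voltage ** 2 * freq_ghz

base = dynamic_power(c_eff=1.0, voltage=1.0, freq_ghz=4.0)
# A 10% higher clock that also needs ~10% more voltage:
boosted = dynamic_power(c_eff=1.0, voltage=1.1, freq_ghz=4.4)

print(f"{boosted / base:.3f}x power for 1.1x clock")
```

Under this simplification a 10% clock bump costs roughly 33% more dynamic power, which is why chasing frequency alone, without a redesign or a better process node, burns the power budget quickly.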

melgross

As I always say, for consumers single-core performance is the best metric for a computer's snappiness; most consumers who have a ton of cores just spend their time with most cores idle.

If you're a creative or graphics designer or do a lot of transcoding, you probably use software that is multi-threaded, in which case you can make use of that multi-core performance.

Really though for most folks, something like Speedometer 3.0 is probably the best benchmark you can use to compare computer performance.

vernearase

People keep getting distracted by the iPad when the big story here is the impact the M4 / Pro / Max / Ultra will have on Macs.

mranalog

At 7:00: the instruction set is Armv9.2-A, and at 13:07: even the M3 chip from Apple supports hardware GPU acceleration.

Ronin-frwm

You received a new sub by the 4-minute mark of the video. Thanks.

MrKeedaX

When comparing the M4 and Snapdragon X, the M4 (currently) is passively cooled in a sheet of glass, whereas the Snapdragon Xs are actively cooled, with much lower single-core performance.

I recently purchased an 11" iPad Pro M4 512 GB with cellular, a Magic Keyboard, and a Pencil Pro and this sucker flies.

It's got a tandem OLED display, which is for all practical purposes a reference display, and it's the fastest browsing experience I've yet seen on _any_ compute platform, in a thin and light magical sheet of glass.

The engineering is magnificent, and with the Magic Keyboard this thing can be navigated from the keyboard, but you can pop off the iPad and you have this futuristic, bright (1000 nits for SDR content, 1600 peak HDR) ProMotion display, with practically instantaneous performance due to the speed of the P-cores. I can pop the whole kit and caboodle into a sleeve and be off for portable compute on the go, anywhere, whether there's Wi-Fi or not. It makes my 16” M1 Max MacBook Pro feel like a boat anchor.

I got it with 512 GB not because I expected to need the storage, but to ensure that both NAND slots are populated to maximize SSD speed.

Add a Paperlike screen protector and a MagEasy graphene case (compatible with the Magic Keyboard), and the iPad is protected from minor trauma when separated from the Magic Keyboard, with a docking port for the Pencil (so it's not just magnetically attached).

I _love_ this thing.

My previous iPad was a 6th gen, which I bought with a cramped Logitech Bluetooth keyboard and a 1st-gen Apple Pencil, which left me so unimpressed I gave it all to my wife (who still uses it intermittently to this very day).

vernearase