This Server CPU is so FAST it Boots without DDR5

This server CPU has 64GB of HBM2e memory onboard, like a GPU or AI accelerator (e.g. the NVIDIA A100 or Habana Gaudi2), which lets it do some very cool things. We take a look at this supercomputer CPU and find that it can be used for a number of other use cases as well. The Intel Xeon Max 9480 is a really cool server processor.

Note: Intel loaned us not just the CPUs, but also the system we used for this piece. The system has already been returned. Because of that, we are saying Intel is sponsoring this video.

----------------------------------------------------------------------
Timestamps
----------------------------------------------------------------------
00:00 Introduction
01:47 Explaining Intel Xeon Max and HBM2e Memory
05:23 Using Intel Xeon Max
09:53 Performance
14:42 Power Consumption
17:00 Key Lessons Learned
19:05 Wrap-up
Comments

I worked on this CPU! Specifically the bridge dies between the CPU tiles. I figured I'd share some fun facts about those CPU tiles here for you guys:

Each CPU tile has 15 cores. Yes, 15. The room that the 16th would occupy is instead taken up by the combined memory controllers and HBM PHYs.

There is not one continuous interposer. Instead, each CPU tile sits on top of EMIB "bridge" dies, as I call them. This strategy is more similar to Apple's than AMD's, or even Meteor Lake's. This is because Sapphire Rapids is so enormous that it exceeds the reticle limit of the machines that make normal interposers.

There are 4 CPU tiles and 10 bridges. The tiles each have 5 connections, 3 on one edge and then 2 on the neighboring edge. 2 of the tiles are mirror images of the other 2. You can get a diagonal pair by rotating one about the center axis 180 degrees, but the other 2 have to be mirrored to keep the connections in the right place.

DigitalJedi
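
A quick sanity check of those tile/bridge counts, as a minimal Python sketch (my own illustration, not something from the video or the commenter): if every EMIB bridge joins exactly two tiles, then 4 tiles with 5 bridge connections each do indeed need 10 bridges.

```python
# Counting the EMIB bridges described above: 4 tiles, 5 bridge connections per
# tile (3 on one edge, 2 on the neighboring edge), and each bridge terminates
# on exactly 2 tiles.
TILES = 4
CONNECTIONS_PER_TILE = 5  # 3 + 2

endpoints = TILES * CONNECTIONS_PER_TILE
bridges = endpoints // 2  # each bridge is shared by two tiles

print(f"{endpoints} connection endpoints -> {bridges} EMIB bridges")  # 20 -> 10
```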

One of the nice things about "Serve the HOME" (emphasis on HOME) is that we get a glimpse of what we'll be running in our HOMES as low-end servers in 30 years....

I'm 5 minutes in and I can't imagine the cost of those things when they come to market, not to mention the REST of the hardware costs.

MrPontiac

Considering the new AMX instructions and all that bandwidth afforded by HBM, it would be very interesting to see benchmarks for AI tasks, like running Stable Diffusion or LLaMA models. How would they stack up against GPUs performance-wise, or in power and cost efficiency? That would be very relevant in the current datacenter GPU shortage!

LSNSDeus
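
On the AMX point: as a rough, hedged illustration (my sketch, not anything shown in the video), one way to check whether a Linux box exposes the AMX extensions that AI frameworks can target on these CPUs is to look for the amx_* flags in /proc/cpuinfo.

```python
# Minimal sketch: report whether the AMX feature flags used for AI workloads
# (tile registers, BF16, INT8) are advertised in /proc/cpuinfo (Linux only).
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")

def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for flag in AMX_FLAGS:
    print(f"{flag}: {'present' if flag in flags else 'absent'}")
```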

Great content Patrick!! Been waiting to hear about these for a while... And you always get the cool stuff first. 😉

shammyh

I hope they evolve this and bring it to the workstation Xeons. I would love to have an unlocked Xeon with built-in memory.

stefannilsson

Without the RAM slots taking up width, you could pack an HBM-only server incredibly densely - maybe 3 dual-socket modules across a 19" rack? Not many data centres could handle that power density, but it would be pretty neat to see.

maxhammick
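
To put rough numbers on that power density, here is a back-of-the-envelope Python sketch. The 3-nodes-per-U layout, the 300W of per-node overhead, and the 42U rack are purely illustrative assumptions; only the 350W TDP of the Xeon Max 9480 is a published figure.

```python
# Back-of-the-envelope rack power for the dense HBM-only idea above.
# Assumptions (not measurements): 3 dual-socket nodes per U, 300 W per node
# for everything that is not the CPUs, 42U rack.
NODES_PER_U = 3
SOCKETS_PER_NODE = 2
CPU_TDP_W = 350          # Intel Xeon Max 9480 TDP
NODE_OVERHEAD_W = 300    # assumed fans, NICs, storage, VRM losses
RACK_UNITS = 42

per_u_w = NODES_PER_U * (SOCKETS_PER_NODE * CPU_TDP_W + NODE_OVERHEAD_W)
rack_kw = per_u_w * RACK_UNITS / 1000

print(f"~{per_u_w} W per U, ~{rack_kw:.0f} kW per 42U rack")  # ~3000 W/U, ~126 kW/rack
```

Even with generous derating, that lands far above what a typical data centre rack is provisioned for, which is the commenter's point.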

I can't wait for 5-10 years from now, when we'll see this come to high-end gaming machines.

edplat

Love a good datacenter CPU discussion!

Strykenine

Damn, that localized memory is incredible for SQL instances/shards, web server caches, and so much more.
HBM runs at lower wattage than DDR memory, with a significantly wider bus and a lower frequency needed to achieve high bandwidth (AFAIK).

p.s. Didn't show the bottom of it even once =\

Gastell
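
To put numbers on the bus-width point above (my arithmetic; the per-pin rates are illustrative assumptions, not Xeon Max specifications): an HBM2e stack has a 1024-bit interface, so even at a modest per-pin rate it dwarfs a 64-bit DDR5 channel running at a much higher clock.

```python
# Wide-and-slow HBM2e vs narrow-and-fast DDR5. Per-pin data rates below are
# assumptions for illustration, not the exact rates Intel runs on Xeon Max.
def peak_gbps(bus_bits: int, giga_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * data rate (GT/s) / 8."""
    return bus_bits * giga_transfers_per_s / 8

hbm2e_stack = peak_gbps(1024, 3.2)  # one HBM2e stack, assumed 3.2 GT/s per pin
ddr5_chan = peak_gbps(64, 4.8)      # one DDR5-4800 channel

print(f"one HBM2e stack:  ~{hbm2e_stack:.0f} GB/s")   # ~410 GB/s
print(f"one DDR5 channel: ~{ddr5_chan:.1f} GB/s")     # ~38 GB/s
print(f"4 HBM2e stacks:   ~{4 * hbm2e_stack / 1000:.2f} TB/s")
print(f"8 DDR5 channels:  ~{8 * ddr5_chan:.0f} GB/s")
```

The wide bus is also why HBM can hit that bandwidth at a lower clock, which is where the power-per-bit advantage the comment mentions comes from.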

Yes! Waited a long time for this monster CPU

BusAlexey

Some of us remember when CPUs had L2 cache external to the CPU. Then Slot 1 CPUs had the cache integrated onto the same card as the CPU, and when the Pentium III came out, L2 cache was completely internal to the CPU die. I don't see external RAM going away any time soon, just because of how useful it can be to just add more RAM, but this seems to be following the same evolution, and the performance gains that came with it. Perhaps one day we'll see internal RAM on consumer CPUs as well!

BlackEpyon

Can't wait to buy these 5 years from now and use them for my homelab 🤣

cy

Amazing content. Thank you, Intel, for sponsoring this.

thatLion

While I work with virtualisation a lot rather than specific high-performance workloads, this has always raised a question for me, even when playing around with a legacy Xeon Phi 5110P coprocessor: how would a chip like this handle memory failure? Nowadays, whenever we have a memory failure, ECC kicks in as a first resort, and then you have options such as memory mirroring so your workloads can continue with a reduced amount of available memory.

How would a chip like this handle it if, say, one of the HBM packages was defective or outright didn't work? Does the BIOS of the system have any form of mirroring? Considering this is four separate packages working as one, would this prevent the chip from booting up at all?

Great coverage though, always fun to see what new products in the HPC sector bring to the table.

CobsTech
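
Not an answer to the RAS question, but related to how the HBM is presented to software: in the flat memory mode Intel documents for Xeon Max, the HBM shows up as additional CPU-less NUMA nodes next to the DDR5 nodes. A hedged Linux sketch (standard sysfs paths, nothing Xeon Max specific) for listing node sizes and spotting those memory-only nodes:

```python
# Sketch: list NUMA node sizes and whether each node has CPUs attached.
# In flat-mode configurations on HBM-equipped parts, the HBM typically appears
# as separate CPU-less nodes. Assumes the standard Linux sysfs layout.
import glob
import re

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node_id = node_dir.rsplit("node", 1)[-1]
    meminfo = open(f"{node_dir}/meminfo").read()
    total_kb = int(re.search(r"MemTotal:\s+(\d+)\s+kB", meminfo).group(1))
    cpulist = open(f"{node_dir}/cpulist").read().strip()
    kind = f"CPUs {cpulist}" if cpulist else "no CPUs (memory-only node)"
    print(f"node {node_id}: {total_kb / 2**20:.1f} GiB, {kind}")
```

How the platform degrades when one of those stacks fails is a firmware/RAS question this sketch cannot answer, of course.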

Honestly, it feels like once AMD did their monster-sized CPU chip, everyone stopped caring about keeping things conventional - like how it took one couple to get everyone dancing at the school dance.

jmd

Pfff the cpu in my server is so fast it boots with ddr3

shiba

On the topic of 1000W power draw, I believe these use the same CPU power delivery topology that Intel showed a while back during some of the lab tours (e.g. I believe one of der8auer's videos in the extreme OC labs showed this off). A relatively small number of VRM phases on the motherboard provides an intermediate package voltage, followed by a massive number of on-die power stages (100+) parallelised into a huge segmented polyphase buck converter, which helps reduce ohmic losses and PDN impedance by moving the regulation closer to the point of load on the die. The combined continuous output current of the on-package converters appears to be 1023A, logically limited by the number of bits in the relevant power management control register. This kind of current delivery would be unworkable with a traditional VRM, but since the phases are physically distributed around the package, the average current density is heavily reduced.

gsuberland
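
On the 1023A figure: that ceiling is exactly what a 10-bit unsigned field gives you, 2^10 - 1 = 1023, which fits the register-width explanation above. A trivial illustration (my sketch, not anything from Intel documentation):

```python
# The 1023 A limit mentioned above matches the largest value a 10-bit
# unsigned register field can encode (assuming 1 A resolution).
FIELD_BITS = 10
max_encodable = (1 << FIELD_BITS) - 1
print(max_encodable)  # 1023
```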

I remember you telling me this episode was coming a few weeks ago! The idea of memory-on-a-chip would be sweet for the consumer audience. It was worth the wait. :)

hermanwooster

8:53 I always figured HBM was the endgame for the entire Optane thing. Too bad it never really panned out, since it had mad potential and could have changed how we think about, for example, database servers altogether. Intel is sometimes so far ahead of itself that even they can't catch up (and then something like Arc happens 🤦‍♀)

sehichanders

What are the Cinebench results, single and multi? That is all that counts at the end of the day....

MYNAME_ABC