Cerebras Co-Founder Deconstructs Blackwell GPU Delay

Cerebras Chief System Architect and Co-Founder J.P. Fricker explains the technical challenges with Nvidia's Blackwell.

00:12 Introduction to Interposers
02:54 Differences between Blackwell and previous GPUs
04:12 Silicon Alignment challenges
05:42 Thermal Expansion challenges
08:28 Cerebras design decisions that solve for alignment, expansion, and core failures
17:37 Package Analysis: Diving deeper into CTE mismatches and how Cerebras' Wafer "slides"
21:50 Conclusion: What happens when AI reaches scale
Comments

Great video honestly, kudos to the marketing guy. I know it's a marketing video, with every setup an easy layup for the founder, but it almost felt like an actual interview that's educational.

That being said, going into the challenges Nvidia faced only makes me more impressed by them, not less. Blackwell is an amazing feat, and I'm excited to see the improvements in Rubin.

MiraPloy

Fault-tolerant design without cutting something apart only to reconnect it later is simple and genius. The Internet was also built on the assumption that it keeps working even when parts fail, and look how well that scaled! Happy that they are succeeding with this truly different approach to chip manufacturing.

tristanwegner

Nailed it! With thousands of connections between these chips, alignment is not easy! Cerebras is clever to take the WSI route to circumvent this issue!

MarathiPremi

This was very informative, thanks. Looking forward to trying out your chips.

zalins

Sometimes you have to see the forest for the trees. Ingenious solution to latency, thermal, and manufacturing issues.

ebodshojaei

Flexible connectors in chips to compensate for different thermal expansion is interesting. I wonder if a future approach could also be to never let the chip cool down much after initial installation, i.e., keep an idle load applied so it does not cool off. Accepting a limited number of big thermal cycles might be a worthwhile tradeoff if it buys wiggle room in other areas. It would not work for smartphones or end-user devices with peak loads and many off periods, but on a server farm it sounds reasonable. (A rough sketch of such a control loop is below.)
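
A minimal sketch of that keep-warm idea, assuming hypothetical telemetry hooks (`read_die_temp_c`, `set_idle_load`); no real vendor API is implied:

```python
import time

TEMP_FLOOR_C = 60.0   # assumed floor: keep the die near operating temperature
TEMP_BAND_C = 5.0     # hysteresis band to avoid oscillating around the floor

def hold_thermal_floor(read_die_temp_c, set_idle_load):
    """Apply synthetic idle load whenever the die starts to cool,
    so the package never swings through a full thermal cycle.

    read_die_temp_c: callable returning die temperature in Celsius.
    set_idle_load: callable taking a load fraction in [0.0, 1.0].
    Both are hypothetical hooks for this sketch.
    """
    idle = 0.0
    while True:
        t = read_die_temp_c()
        if t < TEMP_FLOOR_C:
            idle = min(1.0, idle + 0.05)   # ramp synthetic work up
        elif t > TEMP_FLOOR_C + TEMP_BAND_C:
            idle = max(0.0, idle - 0.05)   # real work is keeping it warm
        set_idle_load(idle)
        time.sleep(1.0)
```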

tristanwegner

You could try using the Nvidia Broadcast feature to get rid of room noise and echo.

zenwheat

Very interesting presentation, though I am sure Nvidia's experts would tell a different story. I also think that software support for AI processors is critical going forward; Nvidia has CUDA behind it, which is a huge bonus for them. In fact, the reason RISC stayed in the shadow of Intel's x86 architecture is precisely the software stack. Nevertheless, one can do nothing but admire the Cerebras team for their vision and innovation.
If anyone knows a good book or review article about modern GPU/NPU/TPU architecture (not CPU), please post it in reply to my comment; I'd really appreciate it!

dennissinitsky

Here are the Nvidia solutions as I see them. 1) Load balancing: this will ease the differential thermal expansion issues; an IR camera can be used, and the software and microcode guys can act on it. 2) Shut down any cores that get too hot and let them cool to ambient (i.e., active thermal management at the package level). 3) Redundant compute at the rack level: this guarantees that the results are correct if anything goes wrong. TMR can be done, but RAID-style redundant compute is cheaper; use whichever costs less compute. You sacrifice a bit of performance for reliability. These are all tractable problems that can be solved. New technology always has stuff like this, and engineers find creative solutions. That's what we do!! (A toy sketch of option 3 follows below.)
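
A toy sketch of that third option in Python, contrasting TMR's 2x overhead with a roughly 1/N parity-style check; `run` and `run_checksum` are hypothetical callables (integer results assumed), not any real scheduler API:

```python
from collections import Counter

def tmr(run, job):
    """Triple modular redundancy: run the same job on three replicas
    and majority-vote the result, so any single bad replica is outvoted."""
    results = [run(job) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: all three replicas disagree")
    return value

def raid_style(run, run_checksum, jobs):
    """RAID-style alternative: run each of the N jobs once, plus one
    checksum job on separate hardware; overhead is ~1/N instead of
    TMR's 2x. `run_checksum` is a hypothetical hook that computes the
    expected aggregate (here a simple sum) independently."""
    results = [run(job) for job in jobs]
    if sum(results) != run_checksum(jobs):      # mismatch: a silent fault
        return [tmr(run, job) for job in jobs]  # fall back to voted re-run
    return results
```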

GodzillaGoesGaga

Incredible explanation. Please do another asap!

IakobusAtreides

My knowledge of the complexities involved in chip manufacturing has grown tremendously over the past 5 days. This video is the icing on the cake: I believe Nvidia's engineers may end up drowning trying to drastically change the architecture of the processed wafer, which, as I recall, is a game of chance. Meanwhile, in the gaming GPU market, Nvidia has cherry-picked the best dies for its top-of-the-line cards. So for every point made in the video, I HOPE the engineers at Nvidia are listening!

TheRobertChannel

14:12 - From my understanding, Cerebras is a new semiconductor company, like Intel or ARM, building computers from scratch and not on a common architecture. A great way to be innovative, and it would require a lot of testing and validation since it is a completely new process for making computers. Basically an SoC on a wafer, 100% custom made.

sto

Thanks for the great explanation, J.P.!

Capeau

20:20 "in the center .. you have direct connection", and assuming meaning the (I/O) pins there only (not on/near the edges, because of expansion). You must send in data (and calculated output from) to all the million cores from there then. Since the cores are small, then this is very special purpose (likely no caches or global address space), though you have good bandwidth no next code, but different latency depending on how far.

pallharaldsson

Very impressive explanation... definitely the way forward for large scale AI.

stachowi

Please make a Cerebras-brand wafer cookie! Preferably wafer-scale too.

nemodelingat

Thanks for the video. Transformers are now HBM-bound due to the KV cache; what's the per-core bandwidth of Cerebras compared to HBM3e or beyond? (Rough KV-cache arithmetic below.)
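
For context on why the KV cache dominates bandwidth, a quick back-of-the-envelope in Python; all model shapes and rates below are illustrative assumptions, not vendor figures:

```python
# KV-cache footprint: 2 (K and V) * layers * kv_heads * head_dim
# * seq_len * bytes_per_element. Shapes are illustrative assumptions.
layers, kv_heads, head_dim = 80, 8, 128
seq_len, bytes_fp16 = 8192, 2

kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16
kv_total = kv_bytes_per_token * seq_len
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token, "
      f"{kv_total / 2**30:.1f} GiB at {seq_len} tokens")

# Decoding one token streams the whole cache once, so at an assumed
# 200 tokens/s per user the cache alone demands:
tok_per_s = 200
print(f"~{kv_total * tok_per_s / 2**30:.0f} GiB/s of read bandwidth")
```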

ericchang

Nice video. Curious about one part: you mention having logic and memory (L/M, at 12:00) right next to each other as a great innovation. Yet how is this any different from NV shared memory, i.e., fast memory local to a group of cores? Could you elaborate on why your logic-memory design is better?

ramakarl

The people at TSMC are working 12-by-7 right now. Miracle workers. Can they save the launch?

xiangli

Great innovation. Would love to explore the software optimizations to run an LLM on this...

MrGss