First in Class AI GPU? | AMD Instinct MI300X Runs Llama 65B FP16 Inference on Single GPU

AMD has officially unveiled its beefy datacenter GPU, the Instinct MI300X, with up to 192 GB of HBM3 memory.

This is part of the race to capture the promise of a giant AI market.
The biggest chip manufacturers, AMD, Intel, and NVIDIA, are currently in a sprint to offer the world's first CPU and GPU combo.
Intel's offering was going to be Falcon Shores, which was going to combine an x86 CPU with a leading-edge GPU, but it seems they took a time-out, and those plans are paused for now.

I want to focus on datacenter GPUs and the darling of high-performance computing, AMD.

AMD just announced what could be their biggest announcement yet for AI: the MI300X datacenter GPU.

Now, what is surprising is that the MI300X is actually a slightly simpler chip than the CPU-GPU chip from which it was derived, the MI300A.

What AMD did was replace the MI300A's three CPU chiplets with just two more CDNA 3 GPU chiplets. The result is a 12-chiplet GPU design:
8 GPU chiplets and another 4 I/O memory chiplets.

Given the large language model boom, or craze, depending on who you ask, having a whopping 192 GB of VRAM on one GPU is a big deal.
GPU memory capacity is the constraining factor for the current generation of large language models (LLMs).
AI startups are snapping up GPUs and other accelerators as quickly as they can get them,
all the while demanding more memory to run even larger models.
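
To put that 192 GB in perspective, here is a rough back-of-the-envelope sketch of the FP16 memory footprint of the Llama family (the parameter counts are the published model sizes; everything else, like ignoring KV-cache overhead, is a simplifying assumption):

```python
# Rough FP16 weight-memory estimate for LLM inference.
# Weights need 2 bytes per parameter in FP16; the KV cache and
# activations add more on top, depending on batch size and sequence
# length, so this sketch deliberately counts weights only.

GB = 1e9  # using decimal gigabytes, as GPU marketing does

models = {
    "Llama 7B": 7e9,
    "Llama 13B": 13e9,
    "Llama 33B": 33e9,
    "Llama 65B": 65e9,
}

MI300X_HBM_GB = 192  # on-package HBM3 capacity

for name, params in models.items():
    weights_gb = params * 2 / GB  # 2 bytes per FP16 parameter
    verdict = "fits" if weights_gb < MI300X_HBM_GB else "does not fit"
    print(f"{name}: ~{weights_gb:.0f} GB of weights -> {verdict} in {MI300X_HBM_GB} GB")
```

At FP16, Llama 65B needs roughly 130 GB for the weights alone, more than any single 80 GB card can hold, which is exactly why running it on one 192 GB MI300X, with headroom left over for the KV cache, is the headline demo.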

Even with $10B from Microsoft, Sam Altman says OpenAI cannot get enough GPUs.

So clearly, being able to offer a massive 192 GB GPU that uses 8 stacks of HBM3 memory will be a sizable advantage for
AMD in the current market, especially if the MI300X starts shipping on time.
It is projected for the 3rd quarter of 2023.

AMD did not stop at the GPU. They had one more thing to announce: the AMD Infinity Architecture Platform.

This is an 8-way MI300X design that allows up to 8 of AMD's top-end GPUs to be linked together to work on larger workloads.

This setup is similar to NVIDIA's 8-way HGX boards, Intel's own x8 UBB for Ponte Vecchio, and even Google's TPU server motherboards.
So the 8-way processor topology is currently a sweet spot for high-end servers.
This is both for physical design reasons, allowing room to place the chips and room to route cooling through them, and because it allows the best available topologies to link up a large number of chips without putting too many hops between them.
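
To illustrate the hop-count point, here is a minimal sketch assuming the 8 accelerators are fully connected, with one direct link between every pair (an assumption made for illustration, not a confirmed detail of any specific board):

```python
# Links needed to fully connect N accelerators in a mesh.
# A full mesh keeps every GPU exactly one hop from every other,
# but the link count grows quadratically: N * (N - 1) / 2.

def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2

for n in (4, 8, 16):
    print(f"{n}-way full mesh: {full_mesh_links(n)} links, worst case 1 hop")
```

At 8 GPUs a full mesh needs 28 links, still routable on a single board; at 16 it balloons to 120, which is part of why 8-way has settled in as the sweet spot.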

It is clear that if AMD wants to close in on NVIDIA and capture more than just the crumbs of the HPC GPU market, then turnkey server hardware is one more area where they're going to need to match NVIDIA's offerings.

#amdai #nvidiaai #amdgpu #nvidiagpu
#aitools #ainews #deeplearning #metaai #ai
#googleai #openai #transformer #aisota
#amd #nvidia #jensenhuang
#mlinfrastructure #gpuserver #llmtraining #llmfinetuning #llminference
#NLP #naturalLanguageProcessing #ML #machinelearning #aiassistant
#softwaredeveloper #coding
#aibot #aibenchmark #samaltman
#LLM
#Largelanguagemodel
#chatgpt
#ArtificialIntelligence
#NeuralNetworks
#Robotics
#DataScience
#IntelligentSystems
#Automation
#TechInnovation