Building a GPU cluster for AI

Learn, from start to finish, how to build a GPU cluster for deep learning. We'll cover the entire process, including cluster-level design, rack-level design, node-level design, CPU and GPU selection, power distribution, storage, and networking.
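As a rough illustration of the cluster- and rack-level sizing the talk walks through, here is a back-of-the-envelope calculation relating GPU count, rack power budget, and facility power. All numbers below (node power, rack budget, PUE) are illustrative assumptions, not figures from the talk or the whitepaper:

```python
# Back-of-the-envelope cluster sizing sketch.
# All input values are illustrative assumptions, not whitepaper figures.

def size_cluster(num_gpus, gpus_per_node=8, node_power_w=6500,
                 rack_power_budget_w=20000, pue=1.4):
    """Estimate node count, rack count, and facility power draw."""
    num_nodes = -(-num_gpus // gpus_per_node)           # ceiling division
    nodes_per_rack = rack_power_budget_w // node_power_w
    num_racks = -(-num_nodes // nodes_per_rack)
    it_power_w = num_nodes * node_power_w
    facility_power_w = it_power_w * pue                 # PUE covers cooling, etc.
    return {
        "nodes": num_nodes,
        "racks": num_racks,
        "it_power_kw": it_power_w / 1000,
        "facility_power_kw": facility_power_w / 1000,
    }

if __name__ == "__main__":
    # A hypothetical 64-GPU cluster under the assumed power envelope.
    print(size_cluster(num_gpus=64))
```

Note how the rack power budget, not physical rack units, is often what limits nodes per rack; with the assumed 6.5 kW nodes and a 20 kW budget, only three nodes fit per rack.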

This talk is based on the Lambda Echelon GPU Cluster whitepaper. The whitepaper can be found above.

Slides for the talk can be found here:

Errata:
- Slide 46 contains an erroneous diagram showing a connection from the storage server to the compute fabric network; the storage server does not connect to the compute fabric network. The correct diagram is available in the whitepaper.
Comments

Thanks. I’m planning on building a “massive” 2 GPU system for home use.

peterxyz

Extraordinary presentation. Covered all the important topics in depth and with real teaching talent. Many thanks!!

randahan

One of the best presentations on GPU cluster design, even at 3 years old. Great teaching skills!

fundyourhustle

Ground-level details with all the critical aspects of a GPU cluster covered nicely, down to the last cable-length calculation.

onlooker

It's nice to see a holistic explanation of designing, building, and installing a complex multi-rack system. As someone who has spent years working on both sides of the "analog/digital divide" (the physical data center world and the digital world's various segments), the un-sexy physical aspects (available rack space, power, cooling, floor loading, network uplink bandwidth) are often overlooked, or simply assumed. A semi arrives with a pallet: "Hey Carl, you can have this online in a couple days, right?"

carlschumacher

Thank you for highlighting an underrated topic that companies should reconsider within their compute infrastructure.

yassinebouchoucha

Best of the best among presentations on server clusters. The author shows such a deep understanding of server clusters that he can explain things in an easy way. Thank you!!!

lovanda

Thank you. You got me started years ago with your lambda stack -- the only way I could get TensorFlow installed on Linux.

dr.mikeybee

Lots and lots of A100 GPUs. Every single one of them is a monster, with almost 2x faster memory than the next best GPU. An entire room full of A100 racks... holy cow.

ProjectPhysX

What an amazing presentation - one of the better videos I have watched. Great breadth and depth.

AjaySimha-sy

The most professional and holistic explanation I have heard on this topic.
Thank you so much!!

randahan

Highly appreciated... YouTube should have a separate category called "Founder's videos."

cyberspider

Very expert suggestions for HPC and compute sizing.

NSPK-

Hey Stephen, this is highly informative. I work on this clustering; now I am able to connect the dots and get the bigger picture.
Where can I read about the relationship between NUMA topology and GPU peering capability?

HarishN.J
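For readers with the same question about NUMA topology and GPU peering: on a Linux GPU node, the standard tools are `nvidia-smi topo -m` (which prints the inter-GPU link matrix and each GPU's NUMA affinity) and `numactl --hardware`. The sketch below uses a made-up affinity table to show the kind of grouping logic you might apply to that output; the data and function are illustrative, not from the talk:

```python
# Illustrative sketch: grouping GPUs by NUMA node to reason about peering.
# The affinity table below is made-up example data; on a real node it would
# come from `nvidia-smi topo -m` (NUMA Affinity column) or from
# /sys/bus/pci/devices/<bdf>/numa_node.
from collections import defaultdict

# Hypothetical 8-GPU node: GPU index -> NUMA node (assumed data).
gpu_numa_affinity = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

def gpus_by_numa(affinity):
    """Group GPU indices by the NUMA node their PCIe root sits under.

    GPUs under the same NUMA node can typically peer over PCIe or NVLink
    without crossing the CPU socket interconnect, which matters for
    peer-to-peer bandwidth.
    """
    groups = defaultdict(list)
    for gpu, node in affinity.items():
        groups[node].append(gpu)
    return dict(groups)

print(gpus_by_numa(gpu_numa_affinity))
# prints {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

A reasonable rule of thumb, under these assumptions: pin each training process to the NUMA node of the GPU it drives (e.g. with `numactl --cpunodebind`), so host-to-device copies stay off the socket interconnect.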

I want to build a multi-node system based on dual EPYC 7742s, for goofing around and learning this stuff.

loadmastergod

My machine learning team consists of me, baby

HankGallows

Really good analysis and presentation!

ilyboc

Still most relevant today, 2 years later. Thanks.

julianfiacconi

I have three computers, a NAS, and an external hub. I don't think I need another server, because of the NAS. As far as my architecture goes, is there anything else you can advise?

glennisholcomb

A "tell me how difficult it is so I can buy your solution" kind of talk

rosenangelow