First Principles: building a high performance network in the public cloud

In this episode of Oracle's video blog First Principles, we learn how architects at Oracle Cloud Infrastructure (OCI) use Remote Direct Memory Access (RDMA) to deliver high-performance, very low-latency networking for customers' most demanding workloads.

For more architectural breakdowns, catch up on other First Principles videos.

00:00 Introduction to OCI Cluster Networks
01:56 What is RDMA?
03:35 History of RDMA at OCI
04:56 Why is RDMA Challenging?
07:15 Importance of RoCE
09:18 Pitfalls of RoCE
12:34 Overcoming Pitfalls of RoCE
15:40 Limited use of PFC
16:46 Tailored QoS for multiple workloads
18:10 How to use ECN in RDMA networks
19:11 Tuning ECN to HPC workloads
20:05 Tuning ECN to GPU and DB workloads
21:25 Are OCI Cluster Networks in the same network?
22:36 Why do we need a separate RDMA network?
27:16 Performance optimizations for workloads
28:31 Flow aware traffic distribution
31:53 Traffic locality optimization
33:33 Traffic topology information vending service
35:54 Why OCI RDMA network is better, differentiated
36:24 Balancing scale and latency
Comments

The engineering excellence adopted by OCI appears unmatched. Great work, Jag & Pradeep.

StevenDake

Great rapport between the presenters makes this video fun to watch. Love it.

JillMcHale

Right to the point and chock full of examples - nice work.

LordOfTheThreeWorlds