filmov.tv
All publications
0:08:39
Serving Very Large Models on K8s with Leader Worker Set
0:07:39
Access GCP Resources Securely with Workload Identity Federation for GKE
0:04:09
Serve Llama 3.1 405B on Kubernetes on Multi Host GPUs
0:04:39
GKE Time Sharing for GPUs
0:05:38
GPU Sharing in GKE with NVIDIA MPS
0:08:52
Improve Infrastructure Autoscaling with Custom Compute Classes in GKE
0:05:16
GPU Sharing on GKE with Multi Instance GPU
0:05:43
Different ways of Running RayJob on Kubernetes
0:03:48
Simplify Kuberay with Ray Operator on GKE
0:14:34
GKE Multi Tenancy with Teams
0:11:30
Fleet Level Feature Management with Feature Manager
0:12:07
Build Internal Developer Platforms on GKE using GKE Enterprise
0:08:37
Tips for Securing your Ray Cluster on GKE
0:05:35
Effective GPU Sharing Strategies in GKE
0:04:32
Serving Gemma on GKE on TPU using Jetstream
0:08:04
Improve Resource Obtainability (GPUs, TPUs) with Dynamic Workload Scheduler on GCP
0:08:59
Reducing data pre-processing time by 95% using Ray
0:05:24
Serving Gemma on GKE using Nvidia TRT LLM and Triton Server
0:05:43
Serving Gemma on GKE using Text Generation Inference (TGI)
0:04:56
Serving Gemma on GKE using vLLM
0:19:50
Improve LLM accuracy and performance with Retrieval Augmented Generation
0:06:53
Monitoring ML Training Platform using Kueue Metrics and Cloud Monitoring
0:08:18
AI/ML on GKE: 2023 A Year in Review
0:16:41
Architecture of a ML Platform with Resource Sharing on Kubernetes