All Publications

Serving Very Large Models on K8s with LeaderWorkerSet

Access GCP Resources Securely with Workload Identity Federation for GKE

Serve Llama 3.1 405B on Kubernetes on Multi-Host GPUs

GKE Time-Sharing for GPUs

GPU Sharing in GKE with NVIDIA MPS

Improve Infrastructure Autoscaling with Custom Compute Classes in GKE

GPU Sharing on GKE with Multi-Instance GPU

Different ways of Running RayJob on Kubernetes

Simplify KubeRay with the Ray Operator on GKE

GKE Multi-Tenancy with Teams

Fleet-Level Feature Management with Feature Manager

Build Internal Developer Platforms on GKE using GKE Enterprise

Tips for Securing your Ray Cluster on GKE

Effective GPU Sharing Strategies in GKE

Serving Gemma on GKE on TPU using JetStream

Improve Resource Obtainability (GPUs, TPUs) with Dynamic Workload Scheduler on GCP

Reducing data pre-processing time by 95% using Ray

Serving Gemma on GKE using NVIDIA TensorRT-LLM and Triton Server

Serving Gemma on GKE using Text Generation Inference (TGI)

Serving Gemma on GKE using vLLM

Improve LLM accuracy and performance with Retrieval Augmented Generation

Monitoring ML Training Platform using Kueue Metrics and Cloud Monitoring

AI/ML on GKE: 2023, A Year in Review

Architecture of an ML Platform with Resource Sharing on Kubernetes