Scaling LLM Inference: AWS Inferentia Meets Ray Serve on EKS | Ray Summit 2024

The race for efficient, scalable AI inference is on, and AWS is at the forefront with innovative solutions. This session showcases how to achieve high-performance, cost-effective inference for large language models like Llama 2 and Mistral-7B using Ray Serve and AWS Inferentia on Amazon EKS.
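For context, here is a minimal sketch of what single-node inference on Inferentia looks like with the AWS Neuron SDK's transformers-neuronx library. The checkpoint path, tensor-parallel degree, and sequence length are illustrative assumptions, not details from the session.

```python
# Sketch: compile and sample from Llama 2 on Inferentia2 with transformers-neuronx.
# The checkpoint path and parameters below are illustrative assumptions.
import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

model_dir = "/models/llama-2-7b"  # hypothetical local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Shard the model across 2 NeuronCores in fp16 and compile it for the device.
model = LlamaForSampling.from_pretrained(model_dir, tp_degree=2, amp="f16")
model.to_neuron()

prompt = "Explain Ray Serve in one sentence."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.inference_mode():
    output = model.sample(input_ids, sequence_length=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```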

Vara Bonthu and Ratnopam Chakrabarti will guide you through the intricacies of building a scalable inference infrastructure that bypasses GPU availability constraints. They'll demonstrate how the synergy between Ray Serve, AWS Neuron SDK, and Karpenter autoscaler on Amazon EKS creates a powerful, flexible environment for AI workloads. Attendees will explore strategies for optimizing costs while maintaining high performance, opening new possibilities for deploying and scaling advanced language models in production environments.
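As a rough illustration of how those pieces fit together, the hedged sketch below wraps a Neuron-compiled model in a Ray Serve deployment that requests Ray's "neuron_cores" resource. The replica counts and the load_neuron_model helper are hypothetical; on EKS, Karpenter would provision Inferentia (inf2) nodes when pending replicas request NeuronCores that no existing node can satisfy.

```python
# Sketch: a Ray Serve deployment targeting Inferentia via Ray's "neuron_cores"
# resource. Replica counts are illustrative; load_neuron_model is a hypothetical
# helper that compiles/loads the model as in the earlier sketch.
from ray import serve
from starlette.requests import Request


@serve.deployment(
    ray_actor_options={"resources": {"neuron_cores": 2}},  # schedule onto Inferentia
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},
)
class LlamaService:
    def __init__(self):
        # Load and compile the Neuron model once per replica.
        self.model, self.tokenizer = load_neuron_model("/models/llama-2-7b")

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids
        output = self.model.sample(input_ids, sequence_length=256)
        return self.tokenizer.decode(output[0], skip_special_tokens=True)


app = LlamaService.bind()
# Deploy with serve.run(app), typically wired up through a RayService resource on EKS.
```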
