Strategies for Efficient LLM Deployments in Any Cluster - Angel M De Miguel Meana, VMware & Francisco Cabrera, Microsoft

Undoubtedly, Large Language Models (LLMs) are the technological advancement of 2023. These models offer a wide range of capabilities, from chatting in the voice of a historical figure to converting unstructured data into JSON. However, their substantial size (GBs), resource demands, and management complexity present considerable challenges. At the same time, Kubernetes has emerged as the de facto technology for orchestrating workloads, and LLMs are no exception. In this talk, we will explore multiple strategies to reduce the footprint of these models in your cluster, making it possible to move them from the cloud to the edge. We will answer questions such as how to select the right model, how to reduce its size, and how to optimize resource utilization by running it in a lightweight environment provided by WebAssembly. The end goal is to strike a balance between resource usage and quality. It is a challenge, but this ecosystem is moving fast, and new technologies, projects, and models are emerging.
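As a rough illustration of the size-reduction strategies the abstract mentions, the sketch below estimates how an LLM's memory footprint shrinks as weight precision drops (e.g. through quantization). The 7B parameter count and the overhead factor are illustrative assumptions, not figures from the talk:

```python
def model_size_gb(n_params: float, bits_per_param: float, overhead: float = 1.2) -> float:
    """Approximate resident size in GB; `overhead` is a rough fudge
    factor for runtime structures (an assumption, not a measurement)."""
    return n_params * bits_per_param / 8 / 1e9 * overhead

n = 7e9  # hypothetical 7B-parameter model
for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: ~{model_size_gb(n, bits):.1f} GB")
```

Back-of-the-envelope numbers like these explain why quantized models are attractive for edge clusters: a model that needs ~17 GB at fp16 can fit in roughly a quarter of that at 4-bit precision, at some cost in output quality.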