Strategies for Efficient LLM Deployments in Any Cluster - Angel M De Miguel Meana, VMware & Francisco Cabrera, Microsoft

Undoubtedly, Large Language Models (LLMs) are the technological advancement of 2023. These models offer a wide range of capabilities, from chatting in the voice of a historical figure to converting unstructured data into JSON. However, their substantial size (GBs), resource demands, and management complexity present considerable challenges. At the same time, Kubernetes has emerged as the de facto technology for orchestrating workloads, and LLMs are no exception. In this talk, we will explore multiple strategies to reduce the footprint of these models in your cluster, making it possible to move them from the cloud to the edge. We will answer questions such as how to select the right model, how to reduce its size, and how to optimize resource utilization by running it in a lightweight environment provided by WebAssembly. The end goal is to strike a balance between resource usage and quality. It is a challenge, but this ecosystem is moving fast, and new technologies, projects, and models are emerging.
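As a rough illustration of the size-reduction strategies the abstract mentions, the sketch below estimates how an LLM's memory footprint shrinks as weight precision drops (e.g. through quantization). The 7B parameter count and the overhead factor are illustrative assumptions, not figures from the talk:

```python
def model_size_gb(n_params: float, bits_per_param: float, overhead: float = 1.2) -> float:
    """Approximate resident size in GB; `overhead` is a rough fudge
    factor for runtime structures (an assumption, not a measurement)."""
    return n_params * bits_per_param / 8 / 1e9 * overhead

n = 7e9  # hypothetical 7B-parameter model
for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: ~{model_size_gb(n, bits):.1f} GB")
```

Back-of-the-envelope numbers like these explain why quantized models are attractive for edge clusters: a model that needs ~17 GB at fp16 can fit in roughly a quarter of that at 4-bit precision, at some cost in output quality.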