Parallel inferencing with KServe Ray integration

KServe is an open-source, production-ready model inference framework on Kubernetes that builds on many of Knative's features, such as canary traffic routing and payload logging. However, the one-model-per-container paradigm limits concurrency and throughput when handling multiple inference requests. With the Ray Serve integration, a model can be deployed across individual Python workers, allowing inference requests to be processed in parallel and improving overall efficiency. In this talk, we will share how you can configure, run, and scale machine learning models on Kubernetes using KServe and Ray.
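As an illustration of the pattern described in the talk, the sketch below wraps a KServe custom model as a Ray Serve deployment so that requests are served by multiple parallel Python worker replicas instead of a single process. This is a minimal, hedged example: the deployment name `custom-model`, `num_replicas=2`, and the no-op model body are placeholders, and exact signatures may differ across KServe and Ray versions.

```python
# Minimal sketch: a KServe custom model served as a Ray Serve deployment,
# so inference requests are handled by multiple parallel Python workers.
# Assumes `pip install kserve ray[serve]`; APIs may vary by version.
from kserve import Model, ModelServer
from ray import serve


@serve.deployment(name="custom-model", num_replicas=2)  # placeholder replica count
class CustomModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None
        self.load()

    def load(self):
        # Load real model weights here; this no-op model is a placeholder.
        self.model = lambda x: x
        self.ready = True

    def predict(self, payload: dict, headers: dict = None) -> dict:
        # Each Ray Serve replica processes requests concurrently with the others.
        instances = payload["instances"]
        return {"predictions": [self.model(i) for i in instances]}


if __name__ == "__main__":
    # Start the KServe model server; with the Ray Serve decorator applied,
    # traffic to "custom-model" is spread across the worker replicas.
    ModelServer().start({"custom-model": CustomModel})
```

Packaged into a container and referenced from an InferenceService, this lets KServe keep handling routing and autoscaling while Ray Serve provides in-pod request parallelism.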

About Anyscale
---
Anyscale is the AI Application Platform for developing, running, and scaling AI.

If you're interested in a managed Ray service, check out:

About Ray
---
Ray is the most popular open source framework for scaling and productionizing AI workloads. From Generative AI and LLMs to computer vision, Ray powers the world’s most ambitious AI workloads.

#llm #machinelearning #ray #deeplearning #distributedsystems #python #genai
Comments

I thought KServe only uses Ray Serve as a package, which would behave more like the multiprocessing package? I didn't notice a Ray cluster being installed when installing the KServe manifest.

mmoya