ML-at-Scale '23 - LLM Batch Inference with Determined
Speaker: Corey Staten
In this talk we used Determined's Core API and Hugging Face Transformers to build and optimize batch inference workflows. We also discussed some advanced parallelization techniques, and showed how to achieve them using Determined's DeepSpeed integration. Warning: This session is code-heavy!
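As a rough illustration of the batch inference pattern the talk is about, here is a minimal sketch in plain Python. It does not use Determined's Core API, Hugging Face Transformers, or DeepSpeed. The names `shard`, `batches`, `run_worker`, and `fake_generate` are hypothetical, and `fake_generate` is only a stand-in for a real model call. The sketch shows one common approach: each worker takes a strided slice of the inputs based on its rank, then processes that slice in fixed-size batches.

```python
from typing import Callable, Iterator, List, Sequence


def shard(items: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Give each worker every world_size-th item, starting at its rank."""
    return list(items[rank::world_size])


def batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches; the last one may be shorter."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def run_worker(
    prompts: Sequence[str],
    rank: int,
    world_size: int,
    batch_size: int,
    generate: Callable[[List[str]], List[str]],
) -> List[str]:
    """Run inference over this worker's shard, one batch at a time."""
    outputs: List[str] = []
    for batch in batches(shard(prompts, rank, world_size), batch_size):
        outputs.extend(generate(batch))
    return outputs


def fake_generate(batch: List[str]) -> List[str]:
    # Hypothetical placeholder for a model call (assumption, not a real API).
    return [prompt.upper() for prompt in batch]


if __name__ == "__main__":
    prompts = ["a", "b", "c", "d", "e"]
    for rank in range(2):
        print(rank, run_worker(prompts, rank, 2, 2, fake_generate))
```

In a real setup the rank and world size would come from the cluster or launcher, and each worker would typically write its outputs, along with its progress, to shared storage so an interrupted job can resume.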