ML-at-Scale '23 - LLM Batch Inference with Determined

Speaker: Corey Staten
In this talk, we used Determined's Core API and Hugging Face Transformers to build and optimize batch inference workflows for LLMs. We also discussed some advanced parallelization techniques and showed how to implement them with Determined's DeepSpeed integration. Warning: this session is code-heavy!