Optimizing FastAPI for Concurrent Users when Running Hugging Face ML Models

To serve multiple concurrent users on a FastAPI endpoint that runs a Hugging Face model, start the FastAPI app with several workers. This ensures that one user's request is not blocked while another request is still being processed. I show and explain this in the video.
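A minimal sketch of starting the app with multiple workers; the module path "main:app" and the worker count of 4 are illustrative assumptions, not taken from the video:

```python
# run.py -- launch the FastAPI app with several uvicorn worker processes.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",   # import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,    # each worker is a separate process handling requests in parallel
    )
```

Note that with more than one worker, uvicorn needs the import-string form ("main:app") rather than the app object itself, and each worker process loads its own copy of the model, so memory usage grows with the worker count.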

Sparrow - data extraction from documents with ML:

0:00 Introduction
0:30 Concurrency
2:50 Problem Example
4:10 Code and Solution
6:10 Summary

CONNECT:
- Subscribe to this YouTube channel

#python #fastapi #machinelearning
Comments

Great video - how do you scale this to handle 500 requests per second with only 4 workers?

marka
Автор

FastAPI is multi-threaded by default: endpoints declared with plain "def" run in a threadpool. If you change your endpoints from "async def" to plain "def", then while one request is running inference (the Hugging Face call), the get-stats endpoint should return instantly.

juvewan
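A small runnable sketch of the behaviour juvewan describes; the endpoint paths and the time.sleep stand-in for the Hugging Face call are assumptions for illustration:

```python
import time
from fastapi import FastAPI

app = FastAPI()

def run_inference(text: str) -> str:
    # Stand-in for a blocking Hugging Face call; sleep simulates model latency.
    time.sleep(10)
    return f"processed: {text}"

@app.post("/inference")
def inference(text: str):
    # Plain "def": FastAPI executes this in a threadpool worker,
    # so the event loop stays free to serve other requests meanwhile.
    return {"result": run_inference(text)}

@app.get("/stats")
def stats():
    # Responds immediately even while /inference is busy.
    return {"status": "ok"}
```

Declaring the same endpoints with "async def" and calling the blocking function directly would instead stall the event loop for the full inference time.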

Hello,
what about running another Python subprocess that extracts the data, and awaiting its response? That shouldn't block the current thread. Or is that a bad idea?

hodiks
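One standard way to realize the idea hodiks raises is a ProcessPoolExecutor awaited from an async endpoint, so the extraction runs in a separate process and the event loop never blocks. This is a hypothetical sketch; the function names and pool size are illustrative assumptions, not from the video:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from fastapi import FastAPI

app = FastAPI()
pool = ProcessPoolExecutor(max_workers=2)  # child processes sidestep the GIL

def extract_data(text: str) -> str:
    # Stand-in for the CPU-bound extraction; runs in a child process,
    # so it must be a picklable module-level function.
    return text.upper()

@app.post("/extract")
async def extract(text: str):
    loop = asyncio.get_running_loop()
    # Await the result without blocking the event loop.
    return {"result": await loop.run_in_executor(pool, extract_data, text)}
```

The trade-off is memory: each child process would need its own copy of the model, so for large models a fixed pool of uvicorn workers, as shown in the video, is usually the simpler choice.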