Optimizing FastAPI for Concurrent Users when Running Hugging Face ML Models

To serve multiple concurrent users on a FastAPI endpoint that runs a Hugging Face model, start the FastAPI app with several workers. This ensures that one user's request is not blocked while another request is still being processed. I show and explain this in the video.
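A minimal sketch of starting the app with multiple workers; the module path "main:app" and the worker count of 4 are illustrative assumptions, not taken from the video:

```python
# run.py -- launch the FastAPI app with several uvicorn worker processes.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",   # import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,    # each worker is a separate process handling requests in parallel
    )
```

Note that with more than one worker, uvicorn needs the import-string form ("main:app") rather than the app object itself, and each worker process loads its own copy of the model, so memory usage grows with the worker count.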

Sparrow - data extraction from documents with ML:

0:00 Introduction
0:30 Concurrency
2:50 Problem Example
4:10 Code and Solution
6:10 Summary

CONNECT:
- Subscribe to this YouTube channel

#python #fastapi #machinelearning
Comments

Great video - how do you scale this to handle 500 requests per second with only 4 workers?

marka
Автор

FastAPI is multi-threaded by default: endpoints declared with plain "def" run in a threadpool. If you change your endpoints from "async def" to plain "def", then while one request is running inference (the Hugging Face call), the get-stats endpoint should return instantly.

juvewan
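A small runnable sketch of the behaviour juvewan describes; the endpoint paths and the time.sleep stand-in for the Hugging Face call are assumptions for illustration:

```python
import time
from fastapi import FastAPI

app = FastAPI()

def run_inference(text: str) -> str:
    # Stand-in for a blocking Hugging Face call; sleep simulates model latency.
    time.sleep(10)
    return f"processed: {text}"

@app.post("/inference")
def inference(text: str):
    # Plain "def": FastAPI executes this in a threadpool worker,
    # so the event loop stays free to serve other requests meanwhile.
    return {"result": run_inference(text)}

@app.get("/stats")
def stats():
    # Responds immediately even while /inference is busy.
    return {"status": "ok"}
```

Declaring the same endpoints with "async def" and calling the blocking function directly would instead stall the event loop for the full inference time.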

Hello,
what about running another Python subprocess that extracts the data, and awaiting its response? That shouldn't block the current thread. Or is that a bad idea?

hodiks
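One standard way to realize the idea hodiks raises is a ProcessPoolExecutor awaited from an async endpoint, so the extraction runs in a separate process and the event loop never blocks. This is a hypothetical sketch; the function names and pool size are illustrative assumptions, not from the video:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from fastapi import FastAPI

app = FastAPI()
pool = ProcessPoolExecutor(max_workers=2)  # child processes sidestep the GIL

def extract_data(text: str) -> str:
    # Stand-in for the CPU-bound extraction; runs in a child process,
    # so it must be a picklable module-level function.
    return text.upper()

@app.post("/extract")
async def extract(text: str):
    loop = asyncio.get_running_loop()
    # Await the result without blocking the event loop.
    return {"result": await loop.run_in_executor(pool, extract_data, text)}
```

The trade-off is memory: each child process would need its own copy of the model, so for large models a fixed pool of uvicorn workers, as shown in the video, is usually the simpler choice.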