Web Deployment (5) - Testing & Deployment - Full Stack Deep Learning

New course announcement ✨

We're teaching an in-person LLM bootcamp in the SF Bay Area on November 14, 2023. Come join us if you want to see the most up-to-date materials for building LLM-powered products and learn in a hands-on environment.

Hope to see some of you there!

---------------------------------------------------------------------------------------------
How do you deploy your models to the web?

Summary
- For web deployment, you need to be familiar with the concept of REST API.
- You can deploy the code to Virtual Machines, and then scale by adding instances.
- You can deploy the code as containers, and then scale via orchestration.
- You can deploy the code as a “server-less function.”
- You can deploy the code via a model serving solution.
- If you are doing CPU inference, you can get away with scaling horizontally by launching more servers (e.g., Docker containers) or by going serverless (e.g., AWS Lambda).
- If you are doing GPU inference, model serving solutions like TF Serving and Clipper become useful, with features such as adaptive batching.
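To make the REST API bullet concrete, here is a minimal sketch of a prediction endpoint using only the Python standard library. The `/predict` route, the JSON request shape, and the placeholder `predict()` function are all illustrative assumptions, not the course's actual code; in practice you would use a framework like Flask or FastAPI and load a real model.

```python
# Minimal REST prediction endpoint sketch (stdlib only; names are illustrative).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(inputs):
    # Placeholder standing in for a real model's forward pass:
    # returns the length of each input string.
    return [len(x) for x in inputs]

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read and parse the JSON request body: {"inputs": [...]}
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"predictions": predict(payload["inputs"])}).encode()
        # Respond with JSON: {"predictions": [...]}
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()
```

A client would then POST `{"inputs": ["abc", "de"]}` to `/predict` and get back `{"predictions": [3, 2]}`; the same request/response contract works unchanged whether the server runs on a VM, in a container, or behind a load balancer.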
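The "serverless function" bullet can be sketched as an AWS Lambda-style handler: the platform invokes a single function per request, so you only write the prediction logic. The API Gateway-style event shape and the `predict()` placeholder below are assumptions for illustration, not a definitive implementation.

```python
# Hedged sketch of a serverless prediction function (AWS Lambda-style handler).
import json

def predict(inputs):
    # Placeholder standing in for a real model: length of each input string.
    return [len(x) for x in inputs]

def handler(event, context):
    # API Gateway-style events carry the HTTP request body as a JSON string.
    body = json.loads(event["body"])
    predictions = predict(body["inputs"])
    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": predictions}),
    }
```

The platform handles scaling by running as many copies of `handler` as traffic requires, which is why this route pairs well with CPU inference.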
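Adaptive batching, the feature that makes serving systems like TF Serving worthwhile for GPU inference, can be sketched as follows: incoming requests wait briefly in a queue so the model can process them as one batch, trading a little latency for much better GPU utilization. This is a simplified illustration under assumed names (`AdaptiveBatcher`, `predict_batch`), not TF Serving's actual implementation.

```python
# Hedged sketch of adaptive batching: requests are queued and run through
# the model together, up to a batch-size cap or a small wait budget.
import queue
import threading

def predict_batch(inputs):
    # Placeholder for one batched model forward pass over many inputs.
    return [x * 2 for x in inputs]

class AdaptiveBatcher:
    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, x):
        # Each caller gets an Event plus a one-slot result holder,
        # and blocks until the batching loop fills it in.
        done, result = threading.Event(), []
        self.requests.put((x, done, result))
        done.wait()
        return result[0]

    def _loop(self):
        while True:
            # Block for the first request, then gather more until the
            # batch is full or the wait budget runs out.
            batch = [self.requests.get()]
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self.requests.get(timeout=self.max_wait_s))
                except queue.Empty:
                    break
            outputs = predict_batch([x for x, _, _ in batch])
            for (_, done, result), y in zip(batch, outputs):
                result.append(y)
                done.set()
```

Concurrent callers of `submit` that arrive within the wait window share a single `predict_batch` call, which is exactly the behavior that keeps a GPU busy under bursty request traffic.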
Comments

There's a GCP service called Cloud Run that lets you run a container in a serverless fashion. I think you could use GPUs with it as well.

robertocatapang