AWS re:Invent 2021 - Serverless Inference on SageMaker! FOR REAL!

At long last, Amazon SageMaker supports serverless endpoints. In this video, I demo this newly launched capability, named Serverless Inference.

Starting from a pre-trained DistilBERT model on the Hugging Face model hub, I fine-tune it for sentiment analysis on the IMDB movie review dataset. Then, I deploy the model to a serverless endpoint, and I run multi-threaded benchmarks with short and long token sequences. Finally, I plot latency numbers and compute latency quantiles.
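For reference, here is a minimal sketch of the deployment step, assuming a fine-tuned model artifact already sitting in S3 (the role ARN, bucket path, and framework versions below are illustrative, not from the video):

```python
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

# Hypothetical role and S3 path; replace with your own.
role = "arn:aws:iam::123456789012:role/MySageMakerRole"

model = HuggingFaceModel(
    model_data="s3://my-bucket/distilbert-imdb/model.tar.gz",
    role=role,
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py38",
)

# Memory can be 1024-6144 MB in 1 GB increments; max_concurrency caps at 50.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=6144,
        max_concurrency=50,
    )
)

print(predictor.predict({"inputs": "A surprisingly good movie!"}))
```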

*** Erratum: the maximum concurrency setting is 50, not 40.
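And a minimal sketch of the kind of multi-threaded latency benchmark shown in the video, assuming the `predictor` object from the sketch above; the thread count and request count are arbitrary:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def timed_call(payload):
    # Measure end-to-end client-side latency for one invocation.
    start = time.perf_counter()
    predictor.predict(payload)
    return time.perf_counter() - start

payloads = [{"inputs": "I really enjoyed this film."}] * 200

with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = list(pool.map(timed_call, payloads))

# Latency quantiles, in milliseconds.
for q in (50, 90, 95, 99):
    print(f"p{q}: {np.percentile(latencies, q) * 1000:.0f} ms")
```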

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️

Comments

You literally saved the day. I was super stuck with this serverless deployment. All the vids out there are pretty confusing, as I am new to this cloud-based work. Your instructions are super straightforward. Thanks

saniazahan

UnexpectedStatusException: Error hosting endpoint Failed. Reason: Ping failed due to insufficient memory. (I can provide only up to 3GB of memory.)

calendr
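A possible fix, assuming the SageMaker Python SDK: raise `memory_size_in_mb` in the serverless config, since serverless endpoints go up to 6 GB:

```python
from sagemaker.serverless import ServerlessInferenceConfig

# Valid sizes: 1024, 2048, 3072, 4096, 5120 or 6144 MB.
serverless_config = ServerlessInferenceConfig(memory_size_in_mb=6144)

# `model` is whatever Model object you are deploying.
predictor = model.deploy(serverless_inference_config=serverless_config)
```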

Hey, thanks for the video. I have a query: if I delete my endpoint, can I still use my model for predictions? (The point of using serverless is to pay based on usage, whereas a regular endpoint costs money for as long as it is deployed.) I am very new to this, so it would be great to hear from you.
Thanks

dhiraj
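One way to look at it: deleting the endpoint removes the serving side, while the model artifact in S3 and the registered SageMaker Model survive, so you can redeploy later (and a serverless endpoint bills per invocation rather than per hour deployed). A sketch, reusing the `model`, `predictor`, and `serverless_config` objects from the sketches above:

```python
# Tear down the endpoint; the model artifact in S3 is untouched.
predictor.delete_endpoint()

# Later, recreate a serverless endpoint from the same Model object
# (or rebuild the Model from the S3 artifact) and predict again.
predictor = model.deploy(serverless_inference_config=serverless_config)
```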

Dear Julien, I deploy my ML models in Lambda directly with a container image. It works just fine. I am researching whether I should opt into Serverless Inference. Did you notice any significant advantages of using Serverless Inference on SageMaker?

uunuu

I have a question: if I want to deploy a model untouched (a pre-trained one, for instance), do I still need to put it in S3 to create a serverless inference API?

learningprogram
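For a Hugging Face Hub model, you can skip S3 entirely: the Hugging Face inference containers accept `HF_MODEL_ID`/`HF_TASK` environment variables and pull the model at startup. A sketch (model ID, role, and versions are illustrative):

```python
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
        "HF_TASK": "text-classification",
    },
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py38",
)

# No model_data needed; the container downloads the model at cold start,
# which adds to cold-start latency on a serverless endpoint.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=4096)
)
```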

Hi Julien, thanks a lot for your videos. I am pretty new to this world, and my task is to understand how to do serverless deployment with SageMaker. Will you do a new video about serverless on SageMaker? Is there something new that we can use? In the video you mention we needed to use boto3; is this still the case? Thanks a lot!

SantiagoBima
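Since this video was made, the SageMaker Python SDK has gained native support for serverless endpoints (the `ServerlessInferenceConfig` used in the sketches above), so raw boto3 is no longer required for deployment. It still works fine for invocation, though; a sketch with a hypothetical endpoint name:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-serverless-endpoint",  # hypothetical name
    ContentType="application/json",
    Body=json.dumps({"inputs": "What a great movie!"}),
)
print(json.loads(response["Body"].read()))
```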

Do you have a reference for using serverless inference with Detectron2, please? Thanks for your support.

deidy

How do you use a SentenceTransformer with the HuggingFace estimator and predictor on an AWS serverless endpoint?

laser
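One possible approach, mirroring the Hub-model sketch above but with the `feature-extraction` task: deploy the sentence-transformers checkpoint by model ID and mean-pool the token embeddings client-side (model ID and pooling strategy are illustrative; check how your specific checkpoint expects to be pooled):

```python
import numpy as np
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "sentence-transformers/all-MiniLM-L6-v2",
        "HF_TASK": "feature-extraction",
    },
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py38",
)
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=3072)
)

# feature-extraction returns per-token embeddings; mean-pool them
# to get a single sentence vector.
out = predictor.predict({"inputs": "A sentence to embed."})
sentence_vector = np.array(out).squeeze().mean(axis=0)
```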

awesome content as always, thanks! I have a few questions though, I wonder if you could help me understand them better:

- is it possible to use your own customized container (instead of one of the predefined frameworks)?
- what does the “MaxConcurrency” number mean? The maximum number of parallel requests? What happens when this number is reached? Does SageMaker queue requests in that case, and can we configure this?
- maybe a bit more general: what would be the advantage of serverless SageMaker over, for example, AWS Batch or Fargate?

luiztauffer
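On the first two points: custom containers are supported (any SageMaker-compatible serving image), and `MaxConcurrency` is the per-endpoint cap on concurrent invocations; to my knowledge, requests beyond it are throttled rather than queued, so retries are the client's job. A boto3-level sketch with hypothetical names:

```python
import boto3

sm = boto3.client("sagemaker")
role_arn = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical

# A model backed by your own container image.
sm.create_model(
    ModelName="my-custom-model",
    ExecutionRoleArn=role_arn,
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
        "ModelDataUrl": "s3://my-bucket/model.tar.gz",
    },
)

sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-custom-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 4096,
            # Invocations above MaxConcurrency are throttled, not queued.
            "MaxConcurrency": 50,
        },
    }],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```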

Is it possible to run a large model in a serverless way? Namely, GPT-J-6B. You need something like 12 or 16 GB of GPU RAM to run it, if I'm not mistaken.
I think SageMaker caps you at 6GB of memory in Serverless Inference. Is there another way? Thanks for your answer!

MrSchweppes

Thanks for the video, Iron Maiden 4ever

BenjiBaret

Thanks for the nice video! Will GPU inference be supported?

Mrluongduy

How long do serverless SageMaker endpoints stay active (warm) once they've been invoked?

JD-nzri