AWS re:Invent 2021 - Serverless Inference on SageMaker! FOR REAL!

At long last, Amazon SageMaker supports serverless endpoints. In this video, I demo this newly launched capability, named Serverless Inference.

Starting from a pre-trained DistilBERT model on the Hugging Face model hub, I fine-tune it for sentiment analysis on the IMDB movie review dataset. Then, I deploy the model to a serverless endpoint, and I run multi-threaded benchmarks with short and long token sequences. Finally, I plot latency numbers and compute latency quantiles.
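For reference, here is a minimal sketch of the deployment step, assuming a fine-tuned model artifact already sitting in S3 (the role ARN, bucket path, and framework versions below are illustrative, not from the video):

```python
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

# Hypothetical role and S3 path; replace with your own.
role = "arn:aws:iam::123456789012:role/MySageMakerRole"

model = HuggingFaceModel(
    model_data="s3://my-bucket/distilbert-imdb/model.tar.gz",
    role=role,
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py38",
)

# Memory can be 1024-6144 MB in 1 GB increments; max_concurrency caps at 50.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=6144,
        max_concurrency=50,
    )
)

print(predictor.predict({"inputs": "A surprisingly good movie!"}))
```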

*** Erratum: the maximum concurrency setting is 50, not 40.
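And a minimal sketch of the kind of multi-threaded latency benchmark shown in the video, assuming the `predictor` object from the sketch above; the thread count and request count are arbitrary:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def timed_call(payload):
    # Measure end-to-end client-side latency for one invocation.
    start = time.perf_counter()
    predictor.predict(payload)
    return time.perf_counter() - start

payloads = [{"inputs": "I really enjoyed this film."}] * 200

with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = list(pool.map(timed_call, payloads))

# Latency quantiles, in milliseconds.
for q in (50, 90, 95, 99):
    print(f"p{q}: {np.percentile(latencies, q) * 1000:.0f} ms")
```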

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️

Comments

You literally saved the day. I was super stuck with this serverless deployment. All the vids out there are pretty confusing, as I am new to this cloud-based work. Your instructions are super straightforward. Thanks

saniazahan

UnexpectedStatusException: Error hosting endpoint Failed. Reason: Ping failed due to insufficient memory. (I can provide only up to 3GB of memory.)

calendr
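A possible fix, assuming the SageMaker Python SDK: raise `memory_size_in_mb` in the serverless config, since serverless endpoints go up to 6 GB:

```python
from sagemaker.serverless import ServerlessInferenceConfig

# Valid sizes: 1024, 2048, 3072, 4096, 5120 or 6144 MB.
serverless_config = ServerlessInferenceConfig(memory_size_in_mb=6144)

# `model` is whatever Model object you are deploying.
predictor = model.deploy(serverless_inference_config=serverless_config)
```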

Hey, thanks for the video. I have a query: if I delete my endpoint, can I still use my model for predictions? (The point of using serverless is to pay based on usage, whereas a regular endpoint costs money for as long as it is deployed.) I am very new to this, so it would be great to hear from you.
Thanks

dhiraj
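One way to look at it: deleting the endpoint removes the serving side, while the model artifact in S3 and the registered SageMaker Model survive, so you can redeploy later (and a serverless endpoint bills per invocation rather than per hour deployed). A sketch, reusing the `model`, `predictor`, and `serverless_config` objects from the sketches above:

```python
# Tear down the endpoint; the model artifact in S3 is untouched.
predictor.delete_endpoint()

# Later, recreate a serverless endpoint from the same Model object
# (or rebuild the Model from the S3 artifact) and predict again.
predictor = model.deploy(serverless_inference_config=serverless_config)
```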

Dear Julien, I deploy my ML models in Lambda directly with a container image. It works just fine. I am researching whether I should opt into Serverless Inference. Did you notice any significant advantages of using Serverless Inference on SageMaker?

uunuu

I have a question: if I want to deploy a model untouched (a pre-trained one, for instance), do I still need to put it in S3 to create a serverless inference API?

learningprogram
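For a Hugging Face Hub model, you can skip S3 entirely: the Hugging Face inference containers accept `HF_MODEL_ID`/`HF_TASK` environment variables and pull the model at startup. A sketch (model ID, role, and versions are illustrative):

```python
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
        "HF_TASK": "text-classification",
    },
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py38",
)

# No model_data needed; the container downloads the model at cold start,
# which adds to cold-start latency on a serverless endpoint.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=4096)
)
```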

Hi Julien, thanks a lot for your videos. I am pretty new to this world, and my task is to understand how to do serverless deployment with SageMaker. Will you do a new video about serverless on SageMaker? Is there something new that we can use? In the video you mention we needed to use boto3; is this still the case? Thanks a lot!

SantiagoBima
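Since this video was made, the SageMaker Python SDK has gained native support for serverless endpoints (the `ServerlessInferenceConfig` used in the sketches above), so raw boto3 is no longer required for deployment. It still works fine for invocation, though; a sketch with a hypothetical endpoint name:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-serverless-endpoint",  # hypothetical name
    ContentType="application/json",
    Body=json.dumps({"inputs": "What a great movie!"}),
)
print(json.loads(response["Body"].read()))
```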

Do you have a reference for using serverless inference with Detectron2, please? Thanks for your support.

deidy

How do you use a SentenceTransformer with the HuggingFace estimator and predictor on an AWS serverless endpoint?

laser
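One possible approach, mirroring the Hub-model sketch above but with the `feature-extraction` task: deploy the sentence-transformers checkpoint by model ID and mean-pool the token embeddings client-side (model ID and pooling strategy are illustrative; check how your specific checkpoint expects to be pooled):

```python
import numpy as np
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "sentence-transformers/all-MiniLM-L6-v2",
        "HF_TASK": "feature-extraction",
    },
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py38",
)
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=3072)
)

# feature-extraction returns per-token embeddings; mean-pool them
# to get a single sentence vector.
out = predictor.predict({"inputs": "A sentence to embed."})
sentence_vector = np.array(out).squeeze().mean(axis=0)
```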

awesome content as always, thanks! I have a few questions though, I wonder if you could help me understand them better:

- is it possible to use your own customized container (instead of one of the predefined frameworks)?
- what does the “MaxConcurrency” number mean? The maximum number of parallel requests? What happens when this number is reached? Does SageMaker queue requests in that case, and can we configure this?
- maybe a bit more general: what would be the advantage of serverless SageMaker over, for example, AWS Batch or Fargate?

luiztauffer
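On the first two points: custom containers are supported (any SageMaker-compatible serving image), and `MaxConcurrency` is the per-endpoint cap on concurrent invocations; to my knowledge, requests beyond it are throttled rather than queued, so retries are the client's job. A boto3-level sketch with hypothetical names:

```python
import boto3

sm = boto3.client("sagemaker")
role_arn = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical

# A model backed by your own container image.
sm.create_model(
    ModelName="my-custom-model",
    ExecutionRoleArn=role_arn,
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
        "ModelDataUrl": "s3://my-bucket/model.tar.gz",
    },
)

sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-custom-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 4096,
            # Invocations above MaxConcurrency are throttled, not queued.
            "MaxConcurrency": 50,
        },
    }],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```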

Is it possible to run a large model in a serverless way? Namely, GPT-J-6B. You need something like 12 or 16 GB of GPU RAM to run it, if I'm not mistaken.
I think SageMaker caps you at 6GB of memory in Serverless Inference. Is there another way? Thanks for your answer!

MrSchweppes

Thanks for the video, Iron Maiden 4ever

BenjiBaret

Thanks for the nice video! Will GPU inference be supported?

Mrluongduy

How long do serverless SageMaker endpoints stay active (warm) once they've been invoked?

JD-nzri