Deploying Serverless Inference Endpoints


VIDEO RESOURCES:

OTHER TRELIS LINKS:

TIMESTAMPS:
0:00 Deploying a Serverless API Endpoint
0:17 Serverless Demo
1:19 Video Overview
1:44 Serverless Use Cases
3:31 Setting up a Serverless API
13:31 Inferencing a Serverless Endpoint (see the sketch after this list)
17:55 Serverless Costs versus GPU Rental
20:02 Accessing Instructions and Scripts
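
For the 3:31 and 13:31 chapters, here is a minimal sketch of calling a deployed RunPod serverless endpoint over its synchronous runsync route. The endpoint ID, the RUNPOD_API_KEY environment variable, and the shape of the input payload are assumptions; the real payload depends on the handler your worker runs.

```python
import os
import requests

ENDPOINT_ID = "your-endpoint-id"        # placeholder: your endpoint's ID from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]  # assumes the API key is exported as an env var

url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {"Authorization": f"Bearer {API_KEY}"}

# The "input" schema is whatever the worker's handler expects;
# a plain prompt field is an illustrative guess.
payload = {"input": {"prompt": "Explain serverless GPU inference in one sentence."}}

response = requests.post(url, headers=headers, json=payload, timeout=120)
response.raise_for_status()
print(response.json())  # job status plus, once finished, the handler's output
```

The synchronous route blocks until the handler returns; for long-running jobs the asynchronous route sketched later in the comments is a better fit.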
COMMENTS:

Your videos are a treasure in this space.

AbdennacerAyeb

Brother I scoured the whole internet for this explanation, you just rock

acidrain

Hi Ronan. Honestly, I'd never heard of you, but YouTube recommended this, I assume because I've been looking into this recently and have had a RunPod account for about 4 months now. Great info; I just subbed and I'll check out your repo. This is a perfect way to create a testing ground for my new SaaS while keeping the initial costs down. Cheers!

truthwillout

Clear and straightforward tutorial, once again. Thanks!

walterpark

Amen, this is such valuable content! Thanks for sharing, please continue and KUTGW!

gamawoodev

Very helpful and informative video. Thank you!

ArdeniusYT

Request: please provide a student discount code for buying repo access. Thanks!

abhisheksingh

Great video, as always! This seems like it could work for 'offline' inference, i.e. if we want to expose an endpoint for mass text summarisation, extraction, etc. as a batch job? (Latency isn't important.)
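
Batch workloads like the one described above are a natural fit for the asynchronous route: RunPod serverless endpoints also expose /run, which queues a job and returns an ID you can poll via /status. A rough sketch under the same assumptions as before (hypothetical endpoint ID, a handler that accepts a prompt field):

```python
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

documents = ["first document ...", "second document ..."]

# Submit each document as its own async job; /run returns a job ID immediately.
job_ids = []
for doc in documents:
    r = requests.post(f"{BASE}/run", headers=HEADERS,
                      json={"input": {"prompt": f"Summarise:\n{doc}"}})
    r.raise_for_status()
    job_ids.append(r.json()["id"])

# Poll until every job reaches a terminal state; latency is irrelevant for batch work.
for job_id in job_ids:
    while True:
        status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            print(job_id, status["status"], status.get("output"))
            break
        time.sleep(5)
```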

joshuaswords

Thanks for the video! I just spoke to RunPod support and they said that you pay for an active worker whether it is running (i.e. processing API endpoint requests) or not. But in your video at 18:07 you mention that you only pay when the worker is actually running (processing requests).

I'm a bit confused now, as this means a single worker GPU costing $0.00019 p/s would be approx. $345 p/m! For a new web app without much traffic or revenue this would be too expensive to run.

Would you mind clarifying the serverless costs again please? And in particular, if you have run serverless endpoints with a single active worker, were you charged only when the active worker was processing requests, or 24/7 even when it was idle?

This is a direct quote from RunPod support: "To clarify, the cost is incurred for each active worker, and it applies while the worker is active, even if it's not actively processing requests. The worker is dedicated to your endpoint and cannot be shared with others, so you incur a cost as long as the worker is running, not just when it is processing requests."
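
For reference, the arithmetic behind the two billing models being discussed, assuming the quoted $0.00019/s rate: an always-on active worker works out to roughly $492 for a 30-day month, while a configuration that scales to zero is billed only for the seconds it actually executes. The support quote and the video are most likely describing these two different configurations.

```python
rate = 0.00019             # $/s for one worker, as quoted above
month = 60 * 60 * 24 * 30  # seconds in a 30-day month

# An always-on ("active") worker is billed around the clock:
print(f"active worker, billed 24/7: ${rate * month:,.2f}/month")  # ≈ $492.48

# A worker that scales to zero is billed only while it runs; assume
# (hypothetically) ten minutes of request processing per day:
busy_per_day = 10 * 60
print(f"scale-to-zero, 10 min/day: ${rate * busy_per_day * 30:,.2f}/month")  # ≈ $3.42
```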

davidgwyer

What VS Code theme are you using? The light green color at 14:06 looks good to me.

xtu

In the "Container Disk" setting, you mentioned that it needed to be "as large as the largest weight". In the case of Mistral Instruct, why does the Storage Volume size need to be 15GB, but the Container Disk size is 10GB?

timlee

What if I have my own GPUs and already run vLLM? What would be the best way to build a serverless inference endpoint on top of that?
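
On hardware you already own there is no per-second billing layer to rent, so the closest analogue is to expose vLLM's OpenAI-compatible server yourself and add queueing or autoscaling in front of it. A minimal sketch of querying such a server; the host, port, and model name are assumptions:

```python
# Assumes vLLM's OpenAI-compatible server is already running on the GPU box, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mistral-7B-Instruct-v0.2 --port 8000
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed host and port
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # assumed model name
        "prompt": "In one sentence, what does 'scale to zero' mean?",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```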

Wanderlust