Deploying Serverless Inference Endpoints


VIDEO RESOURCES:

OTHER TRELIS LINKS:

TIMESTAMPS:
0:00 Deploying a Serverless API Endpoint
0:17 Serverless Demo
1:19 Video Overview
1:44 Serverless Use Cases
3:31 Setting up a Serverless API
13:31 Inferencing a Serverless Endpoint (see the sketch after this list)
17:55 Serverless Costs versus GPU Rental
20:02 Accessing Instructions and Scripts
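
For the 3:31 and 13:31 chapters, here is a minimal sketch of calling a deployed RunPod serverless endpoint over its synchronous runsync route. The endpoint ID, the RUNPOD_API_KEY environment variable, and the shape of the input payload are assumptions; the real payload depends on the handler your worker runs.

```python
import os
import requests

ENDPOINT_ID = "your-endpoint-id"        # placeholder: your endpoint's ID from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]  # assumes the API key is exported as an env var

url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {"Authorization": f"Bearer {API_KEY}"}

# The "input" schema is whatever the worker's handler expects;
# a plain prompt field is an illustrative guess.
payload = {"input": {"prompt": "Explain serverless GPU inference in one sentence."}}

response = requests.post(url, headers=headers, json=payload, timeout=120)
response.raise_for_status()
print(response.json())  # job status plus, once finished, the handler's output
```

The synchronous route blocks until the handler returns; for long-running jobs the asynchronous route sketched later in the comments is a better fit.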
COMMENTS:

Your videos are a treasure in this space.

AbdennacerAyeb

Brother I scoured the whole internet for this explanation, you just rock

acidrain

Hi Ronan. Honestly, I'd never heard of you, but YouTube recommended this, I assume because I've been looking into this recently and have had a RunPod account for about 4 months now. Great info; I just subbed and I'll check out your repo. This is a perfect way to create a testing ground for my new SaaS while keeping the initial costs down. Cheers!

truthwillout

Clear and straightforward tutorial, once again. Thanks!

walterpark

Amen, this is such valuable content! Thanks for sharing, please continue and KUTGW!

gamawoodev

Very helpful and informative video. Thank you!

ArdeniusYT

Request: please provide a student discount code for buying repo access. Thanks!

abhisheksingh

Great video, as always! This seems like it could work for 'offline' inference, i.e. if we want to expose an endpoint for mass text summarisation, extraction, etc. as a batch job? (Latency isn't important.)
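
Batch workloads like the one described above are a natural fit for the asynchronous route: RunPod serverless endpoints also expose /run, which queues a job and returns an ID you can poll via /status. A rough sketch under the same assumptions as before (hypothetical endpoint ID, a handler that accepts a prompt field):

```python
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

documents = ["first document ...", "second document ..."]

# Submit each document as its own async job; /run returns a job ID immediately.
job_ids = []
for doc in documents:
    r = requests.post(f"{BASE}/run", headers=HEADERS,
                      json={"input": {"prompt": f"Summarise:\n{doc}"}})
    r.raise_for_status()
    job_ids.append(r.json()["id"])

# Poll until every job reaches a terminal state; latency is irrelevant for batch work.
for job_id in job_ids:
    while True:
        status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            print(job_id, status["status"], status.get("output"))
            break
        time.sleep(5)
```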

joshuaswords

Thanks for the video! I just spoke to RunPod support and they said that you pay for an active worker whether it is running (i.e. processing API endpoint requests) or not. But in your video at 18:07 you mention that you only pay when the worker is actually running (processing requests).

I'm a bit confused now, as this means a single worker GPU costing $0.00019 p/s would be approx. $345 p/m! For a new web app without much traffic or revenue this would be too expensive to run.

Would you mind clarifying the serverless costs again please? And in particular, if you have run serverless endpoints with a single active worker, were you charged only when the active worker was processing requests, or 24/7 even when it was idle?

This is a direct quote from RunPod support: "To clarify, the cost is incurred for each active worker, and it applies while the worker is active, even if it's not actively processing requests. The worker is dedicated to your endpoint and cannot be shared with others, so you incur a cost as long as the worker is running, not just when it is processing requests."
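
For reference, the arithmetic behind the two billing models being discussed, assuming the quoted $0.00019/s rate: an always-on active worker works out to roughly $492 for a 30-day month, while a configuration that scales to zero is billed only for the seconds it actually executes. The support quote and the video are most likely describing these two different configurations.

```python
rate = 0.00019             # $/s for one worker, as quoted above
month = 60 * 60 * 24 * 30  # seconds in a 30-day month

# An always-on ("active") worker is billed around the clock:
print(f"active worker, billed 24/7: ${rate * month:,.2f}/month")  # ≈ $492.48

# A worker that scales to zero is billed only while it runs; assume
# (hypothetically) ten minutes of request processing per day:
busy_per_day = 10 * 60
print(f"scale-to-zero, 10 min/day: ${rate * busy_per_day * 30:,.2f}/month")  # ≈ $3.42
```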

davidgwyer

What VS Code theme are you using? The light green color at 14:06 looks good to me.

xtu

In the "Container Disk" setting, you mentioned that it needed to be "as large as the largest weight". In the case of Mistral Instruct, why does the Storage Volume size need to be 15GB, but the Container Disk size is 10GB?

timlee

What if I have my own GPUs and already run vLLM? What would be the best way to build a serverless inference endpoint on top of that?
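
On hardware you already own there is no per-second billing layer to rent, so the closest analogue is to expose vLLM's OpenAI-compatible server yourself and add queueing or autoscaling in front of it. A minimal sketch of querying such a server; the host, port, and model name are assumptions:

```python
# Assumes vLLM's OpenAI-compatible server is already running on the GPU box, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mistral-7B-Instruct-v0.2 --port 8000
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed host and port
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # assumed model name
        "prompt": "In one sentence, what does 'scale to zero' mean?",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```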

Wanderlust