How to deploy LLMs (Large Language Models) as APIs using Hugging Face + AWS

Open-source LLMs are all the rage, along with concerns about data privacy with closed-source LLM APIs. This tutorial goes through how to deploy your own open-source LLM API using Hugging Face + AWS.
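For readers who want a concrete starting point, a minimal deployment sketch with the `sagemaker` Python SDK might look like the following. The model ID, role ARN, and version pins are illustrative assumptions, not taken from the video:

```python
def hub_config(model_id, task="text-generation"):
    """Environment variables the Hugging Face inference container reads to fetch the model."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}

def deploy_llm(role_arn, instance_type="ml.m5.2xlarge"):
    """Sketch only: requires the `sagemaker` SDK, AWS credentials, and instance quota."""
    from sagemaker.huggingface import HuggingFaceModel  # lazy import: SDK not needed above

    model = HuggingFaceModel(
        env=hub_config("tiiuae/falcon-7b-instruct"),  # hypothetical model choice
        role=role_arn,
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )
    # Returns a Predictor wrapping a real-time HTTPS endpoint
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

Once deployed, `predictor.predict({"inputs": "..."})` sends a JSON payload to the endpoint; note the endpoint keeps billing until you call `predictor.delete_endpoint()`.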

Comments

Thank you. Better than the Amazon videos prepared by 100 people. The only question is: how cheap is SageMaker? 😅

nat.serrano

Great tutorial that focuses on practical deployment. I've come across multiple tutorials that involve Google Colab, which are great for testing things out, but API access to the LLM is what we need for building practical applications.

I have a few questions:
1. What's the estimated cost per day, assuming no change is made to the underlying infrastructure?
2. Do the steps remain the same for deploying other LLMs?
3. How can we maintain the context (of the previous responses) in a conversation?
4. How can we provide custom information through RAG?

Thanks again for making this tutorial.

ajith_e

Great video!

I noticed that you’ve set the Lambda function timeout to three minutes. However, it’s triggered by the AWS API Gateway, which has a maximum timeout of 30 seconds. Therefore, if the Lambda function execution exceeds 30 seconds, its response will never be sent unless you’ve configured the response to be asynchronous. Just an observation I thought I’d share.
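For anyone hitting that limit, one common workaround is a thin front Lambda that acknowledges the request immediately and hands the slow SageMaker call to a worker Lambda via an asynchronous invocation. This is a sketch under assumed names, not code from the video:

```python
import json

def async_invoke_params(worker_name, payload):
    """Parameters for boto3's lambda client.invoke() to fire a worker asynchronously."""
    return {
        "FunctionName": worker_name,
        "InvocationType": "Event",  # async: the call returns 202 immediately
        "Payload": json.dumps(payload),
    }

def front_handler(event, context):
    """Front Lambda behind API Gateway; returns well within the gateway timeout."""
    import boto3  # bundled with the AWS Lambda Python runtime

    boto3.client("lambda").invoke(
        **async_invoke_params("llm-worker", {"prompt": "..."})  # hypothetical worker name
    )
    return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}
```

The worker then needs somewhere to put its result (e.g. S3 or DynamoDB) that the client can poll, since the original HTTP response has already been sent.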

justcreate

Informative video!
I have requested the ml.m5.2xlarge instance; after 2 working days I will be given permission!

VIVEKKUMAR-kxup

Why does everyone skip the most important part of the AWS setup for automation, which is how to create the Lambda code? Is there a resource on how to write the Lambda code?
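Since this comes up repeatedly in the thread, here is a minimal sketch of what such a Lambda handler can look like. The endpoint name `my-llm-endpoint` and the payload shape are assumptions based on the Hugging Face text-generation containers, not code from the video:

```python
import json

try:
    import boto3  # bundled with the AWS Lambda Python runtime
except ImportError:  # lets the pure helper below run locally without boto3
    boto3 = None

def build_payload(prompt, max_new_tokens=256):
    """Request shape accepted by the Hugging Face text-generation container."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def lambda_handler(event, context):
    """Triggered by API Gateway; forwards the prompt to the SageMaker endpoint."""
    body = json.loads(event.get("body") or "{}")
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="my-llm-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps(build_payload(body.get("prompt", ""))),
    )
    result = json.loads(response["Body"].read())  # e.g. [{"generated_text": "..."}]
    return {"statusCode": 200, "body": json.dumps(result)}
```

The Lambda's execution role also needs `sagemaker:InvokeEndpoint` permission on the endpoint for this call to succeed.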

MoThoughNo

Great video. I'm pretty unfamiliar with the cloud; I just want to make sure that I can get an LLM to serve multiple endpoints for multiple users. If so, how do I find out the number of users that can be served?

georgekuttysebastian

Is there an easier way to do this with Ollama?

thecryptobeard

Can you make a video tutorial on using Next.js with it?

brayancuenca

I'm deploying a LLaVA model (image + text); how can I invoke that?

CsabaTothMr

I received an error running the code above: UnexpectedStatusException: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check.

MichaelMcCrae

It would really help if you could provide an approximate cost of trying out this tutorial on AWS. Is there any info someone can share?

njrvyrs

Yes, what about costs? Any easier platforms you have tried?

davidthiwa

Great video, thank you.
I cannot find the source code deploying.ipynb in the repository.
I would appreciate a link to it here and in the description.

mohammadkhair

The response generated by the model is limited to very few characters. How can I adjust the model to generate more text? Can anyone please help?
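If the endpoint runs one of the Hugging Face text-generation containers, output length is usually capped by `max_new_tokens` in the request's `parameters` object. A sketch of a longer-output payload; the parameter names assume that container, not necessarily the video's exact setup:

```python
import json

def generation_payload(prompt, max_new_tokens=512):
    """Payload asking the endpoint for a longer completion."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # raise this for longer answers
            "temperature": 0.7,
            "return_full_text": False,  # don't echo the prompt back in the output
        },
    }

# The JSON string you would pass as the request body
body = json.dumps(generation_payload("Explain SageMaker pricing in detail."))
```

Longer generations also take longer, so raising this value makes the Lambda/API Gateway timeout issue discussed above more likely to bite.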

Vijayakumar-nkfr

I am setting this up for the first time; can you share the role configuration?

arvindshelke

Only helpful if you already know what you are doing.

smokyboy

Amazon SageMaker is pretty complex and the UI is horrible; are there other ways to deploy? The service tried to charge me something like 1,000 dollars for what I thought was free usage, because it spins up around 5 instances, and those instances don't show up in the console directly; you have to open the separate SageMaker instance viewer.

prestonmccauley

I'm having a bit of trouble understanding the following: I believe you're saying the Lambda function calls the SageMaker endpoint (the place where we host the LLM). But then who calls the Lambda function? When is that function triggered? Does it need another endpoint?
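In this kind of setup, API Gateway is what triggers the Lambda, and any HTTP client (a browser, script, or app) calls the API Gateway invoke URL. A hypothetical client-side sketch, with a made-up URL:

```python
import json
import urllib.request

def build_api_request(api_url, prompt):
    """Build (but do not send) a POST to the API Gateway URL that triggers the Lambda."""
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        api_url,  # e.g. a hypothetical https://abc123.execute-api.us-east-1.amazonaws.com/dev/generate
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it would be: urllib.request.urlopen(build_api_request(url, "Hello"))
```

So the chain is: client -> API Gateway URL -> Lambda -> SageMaker endpoint; only the API Gateway URL is exposed publicly.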

buksa

Many thanks for the great video! Just wondering if you share your code/notebooks anywhere, e.g. on GitHub?

MichealAngeloArts

Bro, I have a question; kindly reply, please. Could you suggest a channel that describes NLP in detail from scratch, i.e. one that is totally beginner-friendly?

indianhub