How to deploy LLMs (Large Language Models) as APIs using Hugging Face + AWS

Open-source LLMs are all the rage, along with concerns about data privacy with closed-source LLM APIs. This tutorial goes through how to deploy your own open-source LLM API using Hugging Face + AWS.
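For readers who want a concrete starting point, a minimal deployment sketch with the `sagemaker` Python SDK might look like the following. The model ID, role ARN, and version pins are illustrative assumptions, not taken from the video:

```python
def hub_config(model_id, task="text-generation"):
    """Environment variables the Hugging Face inference container reads to fetch the model."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}

def deploy_llm(role_arn, instance_type="ml.m5.2xlarge"):
    """Sketch only: requires the `sagemaker` SDK, AWS credentials, and instance quota."""
    from sagemaker.huggingface import HuggingFaceModel  # lazy import: SDK not needed above

    model = HuggingFaceModel(
        env=hub_config("tiiuae/falcon-7b-instruct"),  # hypothetical model choice
        role=role_arn,
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )
    # Returns a Predictor wrapping a real-time HTTPS endpoint
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

Once deployed, `predictor.predict({"inputs": "..."})` sends a JSON payload to the endpoint; note the endpoint keeps billing until you call `predictor.delete_endpoint()`.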

Comments

Thank you. Better than the Amazon videos prepared by 100 people. The only question is: how cheap is SageMaker? 😅

nat.serrano

Great tutorial that focuses on practical deployment. I've come across multiple tutorials that involve Google Colab, which are great for testing things out, but API access to the LLM is what we need for building practical applications.

I have a few questions:
1. What's the estimated cost per day, assuming no change is made to the underlying infrastructure?
2. Do the steps remain the same for deploying other LLMs?
3. How can we maintain the context (of the previous responses) in a conversation?
4. How can we provide custom information through RAG?

Thanks again for making this tutorial.

ajith_e

Great video!

I noticed that you’ve set the Lambda function timeout to three minutes. However, it’s triggered by the AWS API Gateway, which has a maximum timeout of 30 seconds. Therefore, if the Lambda function execution exceeds 30 seconds, its response will never be sent unless you’ve configured the response to be asynchronous. Just an observation I thought I’d share.
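For anyone hitting that limit, one common workaround is a thin front Lambda that acknowledges the request immediately and hands the slow SageMaker call to a worker Lambda via an asynchronous invocation. This is a sketch under assumed names, not code from the video:

```python
import json

def async_invoke_params(worker_name, payload):
    """Parameters for boto3's lambda client.invoke() to fire a worker asynchronously."""
    return {
        "FunctionName": worker_name,
        "InvocationType": "Event",  # async: the call returns 202 immediately
        "Payload": json.dumps(payload),
    }

def front_handler(event, context):
    """Front Lambda behind API Gateway; returns well within the gateway timeout."""
    import boto3  # bundled with the AWS Lambda Python runtime

    boto3.client("lambda").invoke(
        **async_invoke_params("llm-worker", {"prompt": "..."})  # hypothetical worker name
    )
    return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}
```

The worker then needs somewhere to put its result (e.g. S3 or DynamoDB) that the client can poll, since the original HTTP response has already been sent.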

justcreate

Informative video!
I have requested the ml.m5.2xlarge instance; after 2 working days I will be given permission!

VIVEKKUMAR-kxup

Why does everyone skip the most important part of the AWS setup for automation, which is how to create the Lambda code? Is there a resource on how to write the Lambda code?
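Since this comes up repeatedly in the thread, here is a minimal sketch of what such a Lambda handler can look like. The endpoint name `my-llm-endpoint` and the payload shape are assumptions based on the Hugging Face text-generation containers, not code from the video:

```python
import json

try:
    import boto3  # bundled with the AWS Lambda Python runtime
except ImportError:  # lets the pure helper below run locally without boto3
    boto3 = None

def build_payload(prompt, max_new_tokens=256):
    """Request shape accepted by the Hugging Face text-generation container."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def lambda_handler(event, context):
    """Triggered by API Gateway; forwards the prompt to the SageMaker endpoint."""
    body = json.loads(event.get("body") or "{}")
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="my-llm-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps(build_payload(body.get("prompt", ""))),
    )
    result = json.loads(response["Body"].read())  # e.g. [{"generated_text": "..."}]
    return {"statusCode": 200, "body": json.dumps(result)}
```

The Lambda's execution role also needs `sagemaker:InvokeEndpoint` permission on the endpoint for this call to succeed.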

MoThoughNo

Great video. I'm pretty unfamiliar with the cloud; I just want to make sure that I can get an LLM to serve multiple endpoints for multiple users. If so, how do I find out the number of users that can be served?

georgekuttysebastian

Is there an easier way to do this with Ollama?

thecryptobeard

Can you make a video tutorial on using Next.js with it?

brayancuenca

I'm deploying a LLaVA model (image + text); how can I invoke that?

CsabaTothMr

I received an error running the code above: UnexpectedStatusException: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check.

MichaelMcCrae

It would really help if you could provide an approximate cost of trying out this tutorial on AWS. Is there any info someone can share?

njrvyrs

Yes, what about costs? Any easier platforms you have tried?

davidthiwa

Great video, thank you.
I cannot find the source code deploying.ipynb in the repository.
I would appreciate a link to it here and in the description.

mohammadkhair

The response generated by the model is limited to very few characters. How can I adjust the model to generate more text? Can anyone please help?
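If the endpoint runs one of the Hugging Face text-generation containers, output length is usually capped by `max_new_tokens` in the request's `parameters` object. A sketch of a longer-output payload; the parameter names assume that container, not necessarily the video's exact setup:

```python
import json

def generation_payload(prompt, max_new_tokens=512):
    """Payload asking the endpoint for a longer completion."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # raise this for longer answers
            "temperature": 0.7,
            "return_full_text": False,  # don't echo the prompt back in the output
        },
    }

# The JSON string you would pass as the request body
body = json.dumps(generation_payload("Explain SageMaker pricing in detail."))
```

Longer generations also take longer, so raising this value makes the Lambda/API Gateway timeout issue discussed above more likely to bite.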

Vijayakumar-nkfr

I am setting this up for the first time; can you share the role configuration?

arvindshelke

Only helpful if you already know what you are doing.

smokyboy

Amazon SageMaker is pretty complex and the UI is horrible; are there other ways to deploy? The service tried to charge me something like 1,000 dollars for what I thought was free usage, because it spins up around 5 instances, and those instances don't show up in the console directly; you have to open the separate SageMaker instance viewer.

prestonmccauley

I'm having a bit of trouble understanding the following: I believe you're saying the Lambda function calls the SageMaker endpoint (the place where we host the LLM). But then who calls the Lambda function? When is that function triggered? Does it need another endpoint?
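In this kind of setup, API Gateway is what triggers the Lambda, and any HTTP client (a browser, script, or app) calls the API Gateway invoke URL. A hypothetical client-side sketch, with a made-up URL:

```python
import json
import urllib.request

def build_api_request(api_url, prompt):
    """Build (but do not send) a POST to the API Gateway URL that triggers the Lambda."""
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        api_url,  # e.g. a hypothetical https://abc123.execute-api.us-east-1.amazonaws.com/dev/generate
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it would be: urllib.request.urlopen(build_api_request(url, "Hello"))
```

So the chain is: client -> API Gateway URL -> Lambda -> SageMaker endpoint; only the API Gateway URL is exposed publicly.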

buksa

Many thanks for the great video! Just wondering if you share your code/notebooks anywhere, e.g. on GitHub?

MichealAngeloArts

Bro, I have a question; kindly reply, please. Could you suggest a channel that describes NLP in detail from scratch, i.e. one that is totally beginner-friendly?

indianhub