Deploy LLM to Production on Single GPU: REST API for Falcon 7B (with QLoRA) on Inference Endpoints

How do you deploy a fine-tuned LLM (Falcon 7B trained with QLoRA) to production?

After training Falcon 7B with QLoRA on a custom dataset, the next step is deploying the model to production. In this tutorial, we'll use HuggingFace Inference Endpoints to build and deploy our model behind a REST API.

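A minimal sketch of the merge-and-push steps covered in the video, assuming the QLoRA adapter was trained with PEFT; the repo ids below are placeholders, not the ones used in the tutorial:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "tiiuae/falcon-7b"
ADAPTER_REPO = "your-username/falcon-7b-qlora-adapter"  # placeholder adapter repo
MERGED_REPO = "your-username/falcon-7b-qlora-merged"    # placeholder target repo

# Load the base model in half precision (the merge is not done in 4-bit)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# Attach the trained LoRA adapter and fold its weights into the base model
model = PeftModel.from_pretrained(base_model, ADAPTER_REPO)
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Push the merged weights and the tokenizer to the Hugging Face Hub
model.push_to_hub(MERGED_REPO)
tokenizer.push_to_hub(MERGED_REPO)
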
00:00 - Introduction
01:42 - Google Colab Setup
02:35 - Merge QLoRA adapter with Falcon 7B
05:22 - Push Model to HuggingFace Hub
09:20 - Inference with the Merged Model
11:31 - HuggingFace Inference Endpoints with Custom Handler
15:55 - Create Endpoint for the Deployment
18:20 - Test the REST API
21:03 - Conclusion
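
Inference Endpoints picks up a custom handler from a handler.py file at the root of the model repo that defines an EndpointHandler class (the 11:31 chapter). A minimal sketch of such a handler, assuming the merged model is loaded in fp16; the generation defaults are illustrative, not the exact values from the video:

# handler.py
from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the model files inside the endpoint container
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.float16,
            trust_remote_code=True,
            device_map="auto",
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        prompt = data["inputs"]
        params = data.get("parameters", {})

        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.inference_mode():
            output_ids = self.model.generate(
                **inputs,
                max_new_tokens=params.get("max_new_tokens", 128),
                temperature=params.get("temperature", 0.7),
                do_sample=True,
            )
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]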

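Once the endpoint is running (the 18:20 chapter), it can be tested with a plain HTTP request. A quick sketch; the endpoint URL and token are placeholders:

import requests

API_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder URL
headers = {
    "Authorization": "Bearer <HF_API_TOKEN>",  # placeholder token
    "Content-Type": "application/json",
}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "What is QLoRA?", "parameters": {"max_new_tokens": 64}},
)
print(response.json())
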
Cloud image by macrovector-official

#chatgpt #gpt4 #llms #artificialintelligence #promptengineering #chatbot #transformers #python #pytorch
Comments

Shout out to him 🎉 he's always bringing us practical tutorials

DawnWillTurn

Please try deploying this with Chainlit or LangChain locally

shivamkapoor

When you enable scale down to 0, is there a cold start? How long does it take to warm up? In my experience with Google Cloud Run it takes 20s to start the Docker container :D

marwentrabelsi

Can you make a video on how to deploy the Falcon model with Hugging Face's Text Generation Inference server?

manojpatilm

Do you need those custom config file changes for Llama 2? Is the process the same for Llama 2?

Ryan-yjsd

Considering your expertise in LLMs, I want to consult you about fine-tuning my own Falcon LLM on my corpus. Any idea how to go about this?

wilfredomartel

Does it cost more to deploy via Hugging Face than directly via AWS?

RonanMcGovern

Can we deploy this model using LangChain and Chainlit for free? The version you showed is paid?

shivamkapoor

I have created and merged the model, but it gives the below error when loading from Hugging Face:

Error:

HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.'; '--' and '..' are forbidden; '-' and '.' cannot start or end the name; max length is 96:

Please help
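
(Note: this HFValidationError usually means a malformed repo id was passed, e.g. a full URL or a local path instead of the bare "namespace/repo-name" id. A minimal sketch with a placeholder repo id:)

from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the plain "namespace/repo-name" id, not the full https://huggingface.co/... URL
repo_id = "your-username/falcon-7b-qlora-merged"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)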

rwskfvs

Do you need Google Colab after you deploy the model?

dabbler

Hey, thanks for the tutorial... What's the generation speed (tokens/sec)?

mrtipxm

Is there any way to load the Falcon 7B model locally after training it on a custom dataset?
Thanks in advance
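
(Note: one option is to load the merged checkpoint from the Hub with 4-bit quantization so the 7B model fits on a single consumer GPU. A sketch, with a placeholder repo id:)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MERGED_REPO = "your-username/falcon-7b-qlora-merged"  # placeholder

# 4-bit NF4 quantization keeps the 7B weights within a single consumer GPU's memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MERGED_REPO)
model = AutoModelForCausalLM.from_pretrained(
    MERGED_REPO,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)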

ommblqm

My man looks like Frenchie from The Boys

moqysqu

I can't find the text tutorial on MLExpert, and I subscribed :( , any ideas why?

DawnWillTurn

But OpenAI's embedding model ada-002 costs $0.0001 / 1K tokens, which looks better

skarloti