Deploy LLM to Production on Single GPU: REST API for Falcon 7B (with QLoRA) on Inference Endpoints

How do you deploy a fine-tuned LLM (Falcon 7B trained with QLoRA) to production?

After training Falcon 7B with QLoRA on a custom dataset, the next step is deploying the model to production. In this tutorial, we'll use HuggingFace Inference Endpoints to build and deploy our model behind a REST API.

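A minimal sketch of the merge-and-push steps covered in the video, assuming the QLoRA adapter was trained with PEFT; the repo ids below are placeholders, not the ones used in the tutorial:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "tiiuae/falcon-7b"
ADAPTER_REPO = "your-username/falcon-7b-qlora-adapter"  # placeholder adapter repo
MERGED_REPO = "your-username/falcon-7b-qlora-merged"    # placeholder target repo

# Load the base model in half precision (the merge is not done in 4-bit)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# Attach the trained LoRA adapter and fold its weights into the base model
model = PeftModel.from_pretrained(base_model, ADAPTER_REPO)
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Push the merged weights and the tokenizer to the Hugging Face Hub
model.push_to_hub(MERGED_REPO)
tokenizer.push_to_hub(MERGED_REPO)
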
00:00 - Introduction
01:42 - Google Colab Setup
02:35 - Merge QLoRA adapter with Falcon 7B
05:22 - Push Model to HuggingFace Hub
09:20 - Inference with the Merged Model
11:31 - HuggingFace Inference Endpoints with Custom Handler
15:55 - Create Endpoint for the Deployment
18:20 - Test the REST API
21:03 - Conclusion
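
Inference Endpoints picks up a custom handler from a handler.py file at the root of the model repo that defines an EndpointHandler class (the 11:31 chapter). A minimal sketch of such a handler, assuming the merged model is loaded in fp16; the generation defaults are illustrative, not the exact values from the video:

# handler.py
from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the model files inside the endpoint container
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.float16,
            trust_remote_code=True,
            device_map="auto",
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        prompt = data["inputs"]
        params = data.get("parameters", {})

        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.inference_mode():
            output_ids = self.model.generate(
                **inputs,
                max_new_tokens=params.get("max_new_tokens", 128),
                temperature=params.get("temperature", 0.7),
                do_sample=True,
            )
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]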

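Once the endpoint is running (the 18:20 chapter), it can be tested with a plain HTTP request. A quick sketch; the endpoint URL and token are placeholders:

import requests

API_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder URL
headers = {
    "Authorization": "Bearer <HF_API_TOKEN>",  # placeholder token
    "Content-Type": "application/json",
}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "What is QLoRA?", "parameters": {"max_new_tokens": 64}},
)
print(response.json())
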
Cloud image by macrovector-official

#chatgpt #gpt4 #llms #artificialintelligence #promptengineering #chatbot #transformers #python #pytorch
Comments

Shout out to him 🎉 he's always bringing us practical tutorials

DawnWillTurn

Please try deploying this with Chainlit or LangChain locally

shivamkapoor

When you enable scale down to 0, is there a cold start? How long does it take to warm up? In my experience with Google Cloud Run it takes 20s to start the Docker container :D

marwentrabelsi

Can you make a video on how to deploy the Falcon model with Hugging Face's Text Generation Inference server?

manojpatilm

Do you need those custom config file changes for Llama 2? Is the process the same for Llama 2?

Ryan-yjsd

Considering your expertise in LLMs, I want to consult you about fine-tuning my own Falcon LLM on my corpus. Any idea how to go about this?

wilfredomartel

Does it cost more to deploy via Hugging Face than directly via AWS?

RonanMcGovern

Can we deploy this model using LangChain and Chainlit for free? The version you showed is paid?

shivamkapoor

I have created and merged the model, but it gives the below error when loading from Hugging Face:

Error:

HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.'; '--' and '..' are forbidden; '-' and '.' cannot start or end the name; max length is 96:

Please help
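
(Note: this HFValidationError usually means a malformed repo id was passed, e.g. a full URL or a local path instead of the bare "namespace/repo-name" id. A minimal sketch with a placeholder repo id:)

from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the plain "namespace/repo-name" id, not the full https://huggingface.co/... URL
repo_id = "your-username/falcon-7b-qlora-merged"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)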

rwskfvs

Do you need Google Colab after you deploy the model?

dabbler

Hey, thanks for the tutorial... What's the generation speed (tokens/sec)?

mrtipxm

Is there any way to load the Falcon 7B model locally after training it on a custom dataset?
Thanks in advance
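
(Note: one option is to load the merged checkpoint from the Hub with 4-bit quantization so the 7B model fits on a single consumer GPU. A sketch, with a placeholder repo id:)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MERGED_REPO = "your-username/falcon-7b-qlora-merged"  # placeholder

# 4-bit NF4 quantization keeps the 7B weights within a single consumer GPU's memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MERGED_REPO)
model = AutoModelForCausalLM.from_pretrained(
    MERGED_REPO,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)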

ommblqm

My man looks like Frenchie from The Boys

moqysqu

I can't find the text tutorial on MLExpert, and I subscribed :( , any ideas why?

DawnWillTurn

But OpenAI's embedding model ada-002 costs $0.0001 / 1K tokens, which looks better

skarloti