Deploy ANY Open-Source LLM with Ollama on an AWS EC2 GPU Instance in 10 Min (Llama-3.1, Gemma-2, etc.)

In this video, I demonstrate how to deploy a Llama 3.1, Phi, Mistral, or Gemma 2 model using Ollama on an AWS EC2 instance with a GPU. Starting from scratch, I walk through the entire process on AWS: launching the instance, selecting an appropriate AMI, configuring the instance and storage, and setting up the environment with CUDA drivers. We also cover installing Go, cloning a simple Go server, configuring API keys, and securing the server for persistent deployment. Steps include choosing the instance type, connecting over SSH, installing dependencies, running Ollama, and locking down the web service. By the end, you'll have a functional, customizable setup for running your own AI models efficiently and economically. Whether you're a developer looking to integrate AI or just getting started, this tutorial will help you achieve a smooth deployment.
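
The Ollama portion condenses to just a few commands. A minimal sketch, assuming an Ubuntu AMI with the NVIDIA/CUDA drivers already in place (the model tag is interchangeable with any model in the Ollama library):

    # Install Ollama; the official script also registers a systemd service
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull a model (swap in gemma2, mistral, phi3, etc.)
    ollama pull llama3.1

    # Smoke-test the local API on its default port
    curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "prompt": "Hello"}'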

00:00 Introduction to Deploying Llama 3.1, Phi, Mistral, and Gemma 2
00:52 Setting Up Your EC2 Instance
02:25 Configuring Your Instance and Storage
03:28 Connecting to Your Instance via SSH
04:08 Installing Dependencies and Cloning the Repository
05:05 Running the Model and Setting Up the Server
05:58 Configuring Security and Testing the Endpoint
07:33 Ensuring Server Persistence
08:53 Conclusion and Final Thoughts
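
For the persistence step at 07:33, a common pattern is a systemd unit for the Go API server so it survives reboots and SSH disconnects. This is a sketch only; the unit name, user, binary path, and API_KEY value below are placeholders, not the actual layout of the repo from the video:

    # /etc/systemd/system/llm-server.service
    [Unit]
    Description=Go API server in front of Ollama
    After=network.target ollama.service

    [Service]
    User=ubuntu
    ExecStart=/home/ubuntu/llm-server/server
    Restart=always
    Environment=API_KEY=changeme

    [Install]
    WantedBy=multi-user.target

Then reload systemd and enable the service so it starts on boot:

    sudo systemctl daemon-reload
    sudo systemctl enable --now llm-server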
Comments

The best way to support this channel? Comment, like, and subscribe!

DevelopersDigest

Great concise presentation. Thank you so much!

hpongpong

For models at ~70B, I am getting timeout issues using vanilla Ollama. It works with the first pull/run, but times out when I need to reload the model. Do you have any recommendations for persistently keeping the same model running?

alejandrogallardo
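
By default, Ollama unloads an idle model after about five minutes, so the next request pays the full reload cost, which for a ~70B model can easily exceed client timeouts. One fix is to pin the model in memory with keep_alive; a sketch, assuming the default port and a llama3.1:70b tag:

    # Per request: a keep_alive of -1 keeps the model loaded indefinitely
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.1:70b",
      "prompt": "warm up",
      "keep_alive": -1
    }'

    # Or server-wide, if Ollama runs under systemd:
    sudo systemctl edit ollama   # add: Environment="OLLAMA_KEEP_ALIVE=-1"
    sudo systemctl restart ollama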

Maybe a dumb question: how do you turn the stream data you receive into readable sentences?

dylanv
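
For anyone else wondering: the /api/generate endpoint streams newline-delimited JSON, one object per chunk, each carrying a short "response" fragment; concatenating the fragments in order yields the full text. A minimal sketch using jq, assuming the default port (alternatively, send "stream": false to get one complete JSON object instead):

    curl -s http://localhost:11434/api/generate -d '{
      "model": "llama3.1",
      "prompt": "Why is the sky blue?"
    }' | jq -j '.response'   # -j joins the fragments with no newlines between them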

This is very informative! Thanks :)

Curious why you used a g4dn.xlarge GPU ($300/month) instead of a t3.medium CPU ($30/month)? I assumed the 8-billion-parameter model was out of reach on regular hardware. What max model size works with the g4dn.xlarge GPU? To put it in perspective, I have a $4K MacBook (16 GB RAM) that can really only run the large (150 million) or medium (100 million) parameter models, which I think means the t3.medium CPU on AWS can only run the 50-million-param (small) model.

danielgannage
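
For what it's worth on the sizing question: the g4dn.xlarge carries a single NVIDIA T4 with 16 GB of VRAM, which comfortably holds 4-bit-quantized models up to roughly the 8B-13B range; a 70B model will not fit and falls back to much slower CPU or partial offload. A quick way to check what actually fits on the instance from the video:

    ollama pull llama3.1:8b
    ollama run llama3.1:8b "Say hi" --verbose   # prints token-per-second stats
    ollama ps      # shows loaded models and whether they sit on GPU or CPU
    nvidia-smi     # shows actual VRAM in use on the T4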