Self-Host and Deploy Llama 3 Locally with NIMs

In this video, I walk you through deploying Llama models using NVIDIA NIM. NVIDIA NIM packages AI models as microservices to simplify deployment, offering up to a three-fold performance improvement. I demonstrate how to set up an NVIDIA LaunchPad instance, deploy the Llama 3 8B Instruct model, and stress test it to measure throughput. I also show how to use the OpenAI-compatible API server that NVIDIA NIM exposes.
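Once the NIM container is up, it serves a standard OpenAI-style chat-completions endpoint. A minimal sketch of talking to it with only the standard library; the local port (8000) and the model id `meta/llama3-8b-instruct` follow NVIDIA's documented defaults and may differ in your deployment:

```python
import json
import urllib.request

# Default local NIM endpoint and model id per NVIDIA's docs; adjust to
# match your own deployment if they differ.
NIM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "meta/llama3-8b-instruct"

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST the payload to the local NIM and return the reply text."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running NIM container):
# print(chat("Give me one sentence about NVIDIA NIM."))
```

Because the endpoint is OpenAI-compatible, the official `openai` client also works by pointing `base_url` at the local server.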

LINKS:

💻 RAG Beyond Basics Course:

Let's Connect:

Sign up for the newsletter, localgpt:

TIMESTAMPS
00:00 Introduction to Deploying Large Language Models
00:13 Overview of NVIDIA NIM
01:02 Setting Up and Deploying a NIM
01:51 Accessing and Monitoring the GPU
03:39 Generating API Keys and Running Docker
05:36 Interacting with the Deployed Model
07:16 Stress Testing the API Endpoint
09:53 Using OpenAI Compatible API with NVIDIA NIM
12:32 Conclusion and Next Steps
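The stress-testing step (07:16) boils down to firing concurrent requests at the endpoint and dividing total generated tokens by wall-clock time. A minimal sketch, where `send_request` is a hypothetical stand-in for a real call to the NIM endpoint that returns the number of completion tokens received:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def throughput(total_tokens: int, elapsed_s: float) -> float:
    """Aggregate generation throughput in tokens per second."""
    return total_tokens / elapsed_s

def stress_test(send_request, n_requests: int = 32, workers: int = 8) -> float:
    """Fire n_requests concurrently and return aggregate tokens/sec.
    send_request() must return the completion-token count for one request
    (e.g. from the response's usage.completion_tokens field)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        token_counts = list(pool.map(lambda _: send_request(), range(n_requests)))
    return throughput(sum(token_counts), time.perf_counter() - start)

# Example with a stand-in worker (replace with a real endpoint call):
# tps = stress_test(lambda: 128)  # pretend each request returned 128 tokens
```

Raising `workers` shows how batched serving scales: aggregate tokens/sec typically climbs well past the single-request rate until the GPU saturates.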

All Interesting Videos:

Comments

It's not clear whether I can run NIM locally and get the 5x performance improvement or not.

DearGeorge

I don't have a friend kind enough to give me access to an H100.

zikwin

Hi, are you sure the inference speed on the H100 is correct? On my RTX 4090 with Llama 3 Instruct 8B Q8_0, inference speed is about 72 t/s, so you're getting lower speed than me.

petergasparik

How do you get/build the Grafana dashboard?

chirwatra

Is it possible to deploy Llama 3 in SageMaker? I mean, download it as a NIM and use it within SageMaker. Let me know if this works out.

AnilKumar-imur

Is this correct?
For production use, NIM is part of NVIDIA AI Enterprise, which has different pricing models:
- On Microsoft Azure, there's a promotional price of $1 per GPU per hour, though this is subject to change.
- For on-premises or other cloud deployments, NVIDIA AI Enterprise is priced at $4,500 per year per GPU.

rousabout
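Taking the figures in rousabout's comment at face value (they are quoted prices, not verified), the comparison between the two models is simple arithmetic:

```python
# Figures quoted in the comment above, not verified pricing.
azure_hourly = 1.0          # $ per GPU per hour (promotional Azure rate)
on_prem_annual = 4500.0     # $ per GPU per year (NVIDIA AI Enterprise)

# Running one GPU around the clock on the hourly plan:
hours_per_year = 24 * 365                      # 8760
azure_annual = azure_hourly * hours_per_year   # $8,760 per year

# Usage level at which the hourly plan starts costing more than the
# annual license:
break_even_hours = on_prem_annual / azure_hourly  # 4,500 hours/year
```

So at the quoted rates, the annual license wins only if the GPU is busy more than about 4,500 hours a year (roughly half the time).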

Can you provide a benchmark comparison against the Ollama server? I really want to see if the claimed performance improvement is actually there.

orlingueorguiev

Thanks! What do you actually pay for when buying NIM?

Nihilvs

How does NVIDIA optimize it for their software only? I'm curious what the difference is when both use the same CUDA.

RickySupriyadi

So I'll say this because evidently other people are too polite, but this is absolute garbage. Who has an H100 hanging around to do this? Don't post stuff that 99% of the people can't do. If you want to post stuff that only people with tens of thousands of dollars and access to this type of hardware can use, go work for one of those companies. Otherwise, you're wasting everybody's time.

eod