Self-Host and Deploy Llama 3 Locally with NIMs

In this video, I walk you through deploying Llama models using NVIDIA NIM. NVIDIA NIM packages AI models as microservices to simplify deployment, offering up to a three-fold performance improvement. I demonstrate how to set up an NVIDIA LaunchPad instance, deploy the Llama 3 8B Instruct model, and stress test it to measure throughput. I also show how to use the OpenAI-compatible API server that NVIDIA NIM exposes.
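Once the NIM container is up, it serves a standard OpenAI-style chat-completions endpoint. A minimal sketch of talking to it with only the standard library; the local port (8000) and the model id `meta/llama3-8b-instruct` follow NVIDIA's documented defaults and may differ in your deployment:

```python
import json
import urllib.request

# Default local NIM endpoint and model id per NVIDIA's docs; adjust to
# match your own deployment if they differ.
NIM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "meta/llama3-8b-instruct"

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST the payload to the local NIM and return the reply text."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running NIM container):
# print(chat("Give me one sentence about NVIDIA NIM."))
```

Because the endpoint is OpenAI-compatible, the official `openai` client also works by pointing `base_url` at the local server.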

LINKS:

💻 RAG Beyond Basics Course:

Let's Connect:

Sign up for the newsletter, localgpt:

TIMESTAMPS
00:00 Introduction to Deploying Large Language Models
00:13 Overview of NVIDIA NIM
01:02 Setting Up and Deploying a NIM
01:51 Accessing and Monitoring the GPU
03:39 Generating API Keys and Running Docker
05:36 Interacting with the Deployed Model
07:16 Stress Testing the API Endpoint
09:53 Using OpenAI Compatible API with NVIDIA NIM
12:32 Conclusion and Next Steps
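The stress-testing step (07:16) boils down to firing concurrent requests at the endpoint and dividing total generated tokens by wall-clock time. A minimal sketch, where `send_request` is a hypothetical stand-in for a real call to the NIM endpoint that returns the number of completion tokens received:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def throughput(total_tokens: int, elapsed_s: float) -> float:
    """Aggregate generation throughput in tokens per second."""
    return total_tokens / elapsed_s

def stress_test(send_request, n_requests: int = 32, workers: int = 8) -> float:
    """Fire n_requests concurrently and return aggregate tokens/sec.
    send_request() must return the completion-token count for one request
    (e.g. from the response's usage.completion_tokens field)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        token_counts = list(pool.map(lambda _: send_request(), range(n_requests)))
    return throughput(sum(token_counts), time.perf_counter() - start)

# Example with a stand-in worker (replace with a real endpoint call):
# tps = stress_test(lambda: 128)  # pretend each request returned 128 tokens
```

Raising `workers` shows how batched serving scales: aggregate tokens/sec typically climbs well past the single-request rate until the GPU saturates.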

All Interesting Videos:

Comments

It's not clear whether I can run NIM locally and get the 5x performance improvement or not.

DearGeorge

I don't have a friend kind enough to give me access to an H100.

zikwin

Hi, are you sure the inference speed on the H100 is correct? On my RTX 4090 with Llama 3 Instruct 8B Q8_0, inference speed is about 72 t/s, so you're getting lower speed than me.

petergasparik

How do you get/build the Grafana dashboard?

chirwatra

Is it possible to deploy Llama 3 in SageMaker? I mean, download it as a NIM and use it within SageMaker. Let me know if this works out.

AnilKumar-imur

Is this correct?
For production use, NIM is part of NVIDIA AI Enterprise, which has different pricing models:
- On Microsoft Azure, there's a promotional price of $1 per GPU per hour, though this is subject to change.
- For on-premises or other cloud deployments, NVIDIA AI Enterprise is priced at $4,500 per year per GPU.

rousabout
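Taking the figures in rousabout's comment at face value (they are quoted prices, not verified), the comparison between the two models is simple arithmetic:

```python
# Figures quoted in the comment above, not verified pricing.
azure_hourly = 1.0          # $ per GPU per hour (promotional Azure rate)
on_prem_annual = 4500.0     # $ per GPU per year (NVIDIA AI Enterprise)

# Running one GPU around the clock on the hourly plan:
hours_per_year = 24 * 365                      # 8760
azure_annual = azure_hourly * hours_per_year   # $8,760 per year

# Usage level at which the hourly plan starts costing more than the
# annual license:
break_even_hours = on_prem_annual / azure_hourly  # 4,500 hours/year
```

So at the quoted rates, the annual license wins only if the GPU is busy more than about 4,500 hours a year (roughly half the time).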

Can you provide a benchmark comparison against the Ollama server? I really want to see if the claimed performance improvement is actually there.

orlingueorguiev

Thanks! What do you actually pay for when buying NIM?

Nihilvs

How does NVIDIA optimize it for their software only? I'm curious what the difference is when both use the same CUDA.

RickySupriyadi

So I'll say this because evidently other people are too polite, but this is absolute garbage. Who has an H100 hanging around to do this? Don't post stuff that 99% of the people can't do. If you want to post stuff that only people with tens of thousands of dollars and access to this type of hardware can use, go work for one of those companies. Otherwise, you're wasting everybody's time.

eod