Part 1: Self-Service Deployments of the NVIDIA RAG Application for Data Scientists and Developers

In this video, we will see how Gen AI developers and data scientists can easily deploy and use NVIDIA’s RAG application on infrastructure from cloud providers such as AWS, Azure, GCP, and OCI.

The NVIDIA team develops and maintains a set of examples showcasing how developers can implement RAG for Generative AI in this Git repo.

These examples show how users can perform RAG efficiently. They are heavily optimized by the NVIDIA team to run on the high-performance NVIDIA CUDA-X software stack and NVIDIA GPUs.

The Rafay team has collaborated with NVIDIA to develop a reference template, based on Rafay Environment Manager, for our mutual customers.

With this template, developers and data scientists can deploy and experiment with the Linux-based RAG application with a single click on the top four public clouds: AWS, Azure, GCP, and OCI.

Developer Experience

Let’s start by looking at the user experience for a developer or data scientist who wishes to try this out. Once the developer logs into Rafay, they are presented with a catalog of environments they can explore.

Let us search for the NVIDIA RAG application. The user is presented with four options, each optimized for one of the top four public clouds. For now, let’s pick AWS and proceed.

Let’s review what the developer can customize in this environment, and how. Clicking on the readme provides a quick summary for the developer. In this example, the platform team has configured the template so that the developer has to provide a few basic inputs and can override some defaults. For example, they may wish to test the application on a specific instance type that has a particular type of NVIDIA GPU.

Clicking on the source code button takes the developer to the Git repository where they can optionally look at the Infrastructure as Code backing the environment template.

Let’s now provide some inputs and launch an environment on AWS. This can take a few minutes to complete. Behind the scenes, the user-provided overrides are applied on top of the template to provision the environment.
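
To make the override mechanism concrete, here is a minimal sketch of how user-provided values might be layered on top of template defaults. The keys and values below are hypothetical illustrations, not the actual Rafay template schema.

```python
# Hypothetical template defaults and user overrides; these keys are
# illustrative only, not the actual Rafay template schema.
template_defaults = {
    "region": "us-west-2",
    "instance_type": "g5.xlarge",   # EC2 instance with an NVIDIA A10G GPU
    "model": "llama-2-13b-chat",
}

user_overrides = {
    "instance_type": "g5.2xlarge",  # developer picks a larger GPU instance
}

# Later keys win: the effective environment config is defaults plus overrides.
effective_config = {**template_defaults, **user_overrides}
print(effective_config)
```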

Let’s follow the provisioning process:

First, an NVIDIA GPU-based EC2 instance is provisioned (see the sketch after these steps)

Next, the Llama LLM is downloaded and staged on the EC2 instance. Note that this is a very large, multi-gigabyte file and can take several minutes to download

Next, the NVIDIA RAG application artifacts are downloaded from the NGC registry. Docker Compose is then used to build the required container images, and the RAG application is deployed and made operational on the EC2 instance.
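
For readers curious what step 1 might look like in code, here is a rough Python sketch using boto3 to launch a GPU instance. The AMI ID, key pair, and instance type below are placeholders; the actual template drives this through its Infrastructure as Code, not this script.

```python
import boto3

# Sketch of step 1: provision an NVIDIA GPU EC2 instance.
# ImageId and KeyName are placeholders, not values from the Rafay template.
ec2 = boto3.resource("ec2", region_name="us-west-2")
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a GPU-ready Linux AMI
    InstanceType="g5.2xlarge",        # instance type with an NVIDIA A10G GPU
    MinCount=1,
    MaxCount=1,
    KeyName="rag-demo-key",           # placeholder key pair
)
instances[0].wait_until_running()
print("GPU instance running:", instances[0].id)

# Steps 2 and 3 then run on the instance itself, roughly equivalent to
# staging the model weights on disk and running `docker compose up -d`
# to build and start the RAG containers.
```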

As you can see, the environment has been successfully created. Let’s now figure out how the developer can access the application. Conveniently, Rafay provides the user with the exact URLs for this in the newly created environment. Let’s copy the URL for the RAG application and access it using a web browser.

The web application presents a Gen AI chat interface and is powered by the Llama 2-based LLM that we selected when we configured the environment. Let’s test whether the application works by asking it a simple question; it promptly provides a response.
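
If you prefer to script this instead of using the browser, the same question can be posed over HTTP. The endpoint path and payload below are hypothetical stand-ins rather than the documented API of the NVIDIA RAG example.

```python
import requests

# Replace with the environment URL surfaced by Rafay; the port and path are
# hypothetical stand-ins for the RAG application's query endpoint.
BASE_URL = "http://<environment-url>:8081"

resp = requests.post(
    f"{BASE_URL}/generate",
    json={"question": "What is retrieval-augmented generation?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```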

Now, let’s perform RAG. Let’s provide the LLM with new data by uploading a proprietary PDF file. The RAG application processes the uploaded PDF, creates embeddings, and stores them in a vector database. Now, when we ask the Gen AI application a question based on the PDF, notice that it answers accurately.
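
The same ingest-then-ask flow can be sketched over HTTP. As above, the endpoint paths, file field, and the use_knowledge_base flag are hypothetical placeholders, not the documented API.

```python
import requests

BASE_URL = "http://<environment-url>:8081"  # environment URL surfaced by Rafay

# Hypothetical ingestion call: upload a PDF so the application can chunk it,
# create embeddings, and index them in its vector database.
with open("proprietary-report.pdf", "rb") as f:
    requests.post(f"{BASE_URL}/documents", files={"file": f}, timeout=300)

# Ask a question that can only be answered from the uploaded PDF.
resp = requests.post(
    f"{BASE_URL}/generate",
    json={
        "question": "What does the uploaded report say about Q3 revenue?",
        "use_knowledge_base": True,  # hypothetical flag to enable retrieval
    },
    timeout=120,
)
print(resp.json())
```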

In summary, in this video we saw how easily a developer or data scientist can configure and provision the NVIDIA RAG application and the associated infrastructure using Rafay Environment Manager.