Deploying and Scaling AI Applications with the NVIDIA TensorRT Inference Server on Kubernetes
The open source NVIDIA TensorRT Inference Server (TRTIS) is production‑ready software that simplifies deploying AI models for speech recognition, natural language processing, recommendation systems, object detection, and more. It integrates with NGINX, Kubernetes, and Kubeflow to provide a complete solution for real‑time and offline data‑center AI inference, and it can run inference on both GPUs and CPUs. It supports all popular AI frameworks and maximizes GPU utilization by serving multiple models per GPU and dynamically batching client requests, which is crucial for avoiding under‑ or over‑provisioning and for managing costs.
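Before a Kubernetes deployment routes traffic to a server replica, it is common to poll the server's HTTP health/status endpoint (for example, from a readiness probe sidecar or a smoke test). The sketch below is a minimal, stdlib-only readiness check. The `/api/status` path and default HTTP port 8000 are assumptions based on the TensorRT Inference Server's HTTP API; later Triton releases use `/v2/health/ready` instead, so adjust the path to match your server version.

```python
import urllib.error
import urllib.request


def server_ready(host="localhost", port=8000, path="/api/status", timeout=2.0):
    """Return True if the inference server answers its status endpoint with 200.

    host/port/path are deployment-specific assumptions; any connection
    failure or non-200 response is treated as "not ready".
    """
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

In a Kubernetes manifest the same check would typically be expressed as an `httpGet` readiness probe rather than client code; the function is useful for CI smoke tests against a freshly rolled-out replica.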
In this session, Davide:
Shows how TRTIS simplifies AI deployment in production environments in the data center, in the cloud, or at the edge.
Shares best practices and a sample deployment.
Explores integration with Kubernetes, Kubeflow, Prometheus, Kubernetes autoscaling, gRPC, and the NGINX load balancer.
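The Prometheus integration mentioned above works because the server exposes metrics in the Prometheus text exposition format; metrics port 8002 is the server's default, and the metric name in the test below (`nv_gpu_utilization`) is illustrative of the GPU metrics the server publishes, both of which may differ by version. A hedged, stdlib-only sketch for scraping and parsing that endpoint, for example to drive autoscaling decisions or dashboards:

```python
import urllib.request


def parse_prometheus_metrics(text):
    """Parse Prometheus text-format exposition into {metric-with-labels: value}.

    Comment lines (starting with '#') and malformed lines are skipped;
    the value after the last space on each line is taken as the sample.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue
    return metrics


def fetch_metrics(host="localhost", port=8002, timeout=2.0):
    """Scrape the server's /metrics endpoint and return parsed samples."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_metrics(resp.read().decode("utf-8"))
```

In practice Prometheus itself scrapes this endpoint via a scrape config or a Kubernetes `ServiceMonitor`, and an autoscaler consumes the stored series; the parser here is only for quick ad-hoc inspection.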