Boost Your AI System: Dockerized Ollama Servers on 4x Nvidia 3090 GPUs

Ready to scale your AI infrastructure? In this video, I’ll walk you through a step-by-step guide to setting up and load balancing multiple Ollama servers using Nginx and Docker. We’ll cover everything from configuring Docker containers to setting up Nginx as a robust load balancer. Whether you're an AI enthusiast or a seasoned developer, this tutorial will help you optimize your AI services for better performance and reliability.

🔧 **What You'll Learn:**
- Removing existing Ollama server setups
- Creating a dedicated Docker network
- Running multiple Ollama server instances with GPU allocation
- Configuring Nginx as a load balancer
- Testing the configuration to ensure seamless operation

📂 **Resources:**

**Complete Nginx Configuration File:**

First, create a directory on the host to hold the configuration:

```bash
mkdir -p ~/nginx-conf
```

```nginx
events { }

http {
    # Round-robin across the two Ollama containers; Docker's embedded DNS
    # resolves the container names on the shared network.
    upstream ollama_backend {
        server ollama1:11434;
        server ollama2:11434;
    }

    server {
        listen 80;

        # Forward /ollama/... to /... on a backend (the trailing slash on
        # proxy_pass strips the /ollama/ prefix).
        location /ollama/ {
            proxy_pass http://ollama_backend/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}
```
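
The description doesn't show the command that starts Nginx itself. Once the Docker network and the two Ollama containers from the next section are in place, something like this minimal sketch should work (the container name `nginx-lb`, host port 8080, and the `~/nginx-conf/nginx.conf` file path are assumptions, not from the video):

```bash
# Run Nginx on the same Docker network so it can resolve ollama1/ollama2
# by name. Assumes the config above was saved as ~/nginx-conf/nginx.conf.
docker run -d --name nginx-lb \
  --network ollama-network \
  -p 8080:80 \
  -v ~/nginx-conf/nginx.conf:/etc/nginx/nginx.conf:ro \
  --restart always \
  nginx
```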

**Docker Commands and Setup Instructions:**

```bash
# Stop and remove any existing single-instance Ollama container
docker stop ollama
docker rm ollama

# Create a dedicated network so Nginx can reach the containers by name
docker network create ollama-network

# Instance 1 on GPUs 0 and 2 (host port 11435 is only for direct access;
# Nginx reaches the container on 11434 over the Docker network)
docker run -d --gpus '"device=0,2"' -v ollama:/root/.ollama -p 11435:11434 --restart always --name ollama1 --network ollama-network ollama/ollama

# Instance 2 on GPUs 1 and 3; both instances share the same "ollama"
# volume, so models only need to be pulled once
docker run -d --gpus '"device=1,3"' -v ollama:/root/.ollama -p 11436:11434 --restart always --name ollama2 --network ollama-network ollama/ollama
```
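
To verify the setup end to end, you can send a few requests through the proxy. This sketch assumes Nginx is listening on host port 8080 as above and that a model (here `llama3`, as an example) has already been pulled:

```bash
# List available models through the proxy; the trailing slash on proxy_pass
# strips /ollama/, so this reaches /api/tags on one of the backends.
curl http://localhost:8080/ollama/api/tags

# Send a test generation request through the load balancer
# (assumes a model such as llama3 has been pulled into the shared volume).
curl http://localhost:8080/ollama/api/generate \
  -d '{"model": "llama3", "prompt": "Hello", "stream": false}'
```

Repeating these requests should distribute them across ollama1 and ollama2 in round-robin order, which you can confirm with `docker logs ollama1` and `docker logs ollama2`.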

Don't forget to like, comment, and subscribe for more tech tutorials and AI insights!
Comments

tharunbhaskar:
Very helpful video, clearly explained. Dang, seeing 4 RTX 3090s is giving me tears.

SamJoshua-pr:
Can this be expanded across servers? That is, can several server racks, each with several GPUs, handle this?