GPU Setup and Configuration

This guide covers GPU setup for the AI Service, including both Docker and local development configurations.

Overview

The AI Service supports GPU acceleration for faster inference:

CPU mode: Works everywhere, ~200-500ms per prediction
GPU mode: Requires NVIDIA GPU, ~50-150ms per prediction

GPU support is optional: the service automatically falls back to CPU if no GPU is available.
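The fallback described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the service's actual code: `detect_gpus` and `select_device` are hypothetical names, and the only TensorFlow call used is `tf.config.list_physical_devices`.

```python
from typing import List, Sequence


def detect_gpus() -> List[str]:
    """Return the names of GPUs TensorFlow can see, or [] if there are none."""
    try:
        # Imported lazily so hosts without TensorFlow still get a clean answer.
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def select_device(use_gpu: bool, gpus: Sequence[str]) -> str:
    """Pick a TensorFlow device string, falling back to CPU when no GPU is usable."""
    if use_gpu and gpus:
        return "/GPU:0"
    # Either GPU use was disabled or no GPU was detected: run on CPU.
    return "/CPU:0"
```

A caller could then wrap inference in `with tf.device(select_device(True, detect_gpus())):`, so the same code path runs on either device.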

Quick GPU Status Check


# Check if GPU is available
just test-ai-setup
 
# Or manually
cd services/ai_service
uv run python -c "from src.core.models import check_gpu_available; check_gpu_available()"

Docker GPU Setup (Recommended)

Prerequisites

NVIDIA GPU with compute capability 5.0+ (CUDA 12.x no longer supports older Kepler-class GPUs)
NVIDIA Driver installed on host
NVIDIA Container Toolkit (install instructions below)

Install NVIDIA Container Toolkit

Ubuntu/Debian:


# Add NVIDIA package repositories
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
 
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
 
# Install toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
 
# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify installation:


# Test GPU access from Docker
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi

Running AI Service with GPU


# Using justfile
just up-ai-gpu
 
# Or with docker compose directly
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile ai-only up -d
 
# Check GPU is detected
curl -s http://localhost:8080/health | python3 -m json.tool
# Look for "gpu_available": true
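The same health check can be scripted, which is handy in smoke tests. The sketch below assumes only what the step above states: the endpoint returns a JSON object containing a `gpu_available` field. The function names and the 5-second timeout are illustrative choices.

```python
import json
import urllib.request


def gpu_available_from_health(body: str) -> bool:
    """Return True only if the health payload reports "gpu_available": true."""
    payload = json.loads(body)
    # A missing field is treated the same as false.
    return payload.get("gpu_available") is True


def fetch_health(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> str:
    """Fetch the raw health response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running, `gpu_available_from_health(fetch_health())` gives a boolean that a script can assert on.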

Verify GPU Detection


# Check service logs for GPU messages
docker compose logs ai_service | grep -i gpu
 
# Expected output:
# INFO:__main__:GPU available: True
# INFO:__main__:  GPU 0: /physical_device:GPU:0

Local Development GPU Setup

Requirements

NVIDIA GPU (GeForce, Quadro, Tesla)
CUDA Toolkit 12.3
cuDNN 8.9

Installation Steps

1. Check GPU and Driver


# Verify GPU is detected
nvidia-smi
 
# Should show GPU name, driver version, and CUDA version

2. Install CUDA Toolkit

Ubuntu 24.04:


# Remove old CUDA versions (if any)
sudo apt-get remove --purge 'cuda-*' nvidia-cuda-toolkit
 
# Add NVIDIA package repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
 
# Install CUDA 12.3
# (if the package is not found, list what your release offers: apt-cache search cuda-toolkit)
sudo apt-get install -y cuda-toolkit-12-3
 
# Add to PATH
echo 'export PATH=/usr/local/cuda-12.3/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc

Verify CUDA:


nvcc --version
# Should show CUDA 12.3

3. Install cuDNN

Download cuDNN 8.9.x for CUDA 12.x from the NVIDIA Developer site (requires a free account).

Extract and install:


tar -xvf cudnn-linux-x86_64-8.9.x.x_cuda12-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda-12.3/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda-12.3/lib64
sudo chmod a+r /usr/local/cuda-12.3/include/cudnn*.h /usr/local/cuda-12.3/lib64/libcudnn*

4. Install TensorFlow with GPU Support


cd services/ai_service
 
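# Install TensorFlow with CUDA support
# (quote the extra so the shell does not try to expand the brackets)
uv add "tensorflow[and-cuda]"
 
# Do not install the separate tensorflow-gpu package: it is deprecated and no longer installs.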

5. Verify GPU Works


cd services/ai_service
 
# Test GPU detection
just test-ai-setup
 
# Should output:
# GPU available: True
# GPU devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

WSL2 GPU Support

Special Considerations

WSL2 supports GPU passthrough, but has specific requirements:

Windows 11 or Windows 10 (version 21H2+)
NVIDIA Driver installed on Windows (not in WSL)
WSL2 with GPU support enabled

Setup for WSL2


# Inside WSL2, verify GPU is visible
nvidia-smi
 
# Install CUDA toolkit (WSL2-specific)
sudo apt-get install -y nvidia-cuda-toolkit
 
# Verify (the distribution package may provide a CUDA version other than 12.3)
nvcc --version

Known Issues

Issue: TensorFlow reports CUDA_ERROR_NO_DEVICE even though nvidia-smi works

Cause: Version mismatch between:

NVIDIA Driver (Windows-side)
CUDA Toolkit (WSL2-side)
TensorFlow requirements

Solutions:

Use Docker (recommended for WSL2):


docker run --gpus all -it tensorflow/tensorflow:latest-gpu python -c \
  "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Or use CPU mode:
```
export USE_GPU=false
just dev-ai
```

Configuration

Environment Variables


# Enable/disable GPU
export USE_GPU=true   # Use GPU if available
export USE_GPU=false  # Force CPU mode
 
# GPU memory management
export TF_FORCE_GPU_ALLOW_GROWTH=true  # Allow dynamic memory allocation
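For illustration, a service could read a boolean variable such as `USE_GPU` along the following lines. This is a hedged sketch: `env_flag` is a hypothetical helper, the default value is an assumption, and accepting spellings other than `true` and `false` goes beyond what is documented above.

```python
import os
from typing import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_flag(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Interpret an environment variable as a boolean, using default when unset."""
    value = environ.get(name)
    if value is None or not value.strip():
        return default
    normalized = value.strip().lower()
    if normalized in _TRUE:
        return True
    if normalized in _FALSE:
        return False
    # Fail loudly on typos instead of silently picking a device.
    raise ValueError(f"{name} must be true or false, got {value!r}")
```

For example, `env_flag("USE_GPU", default=True)` returns `False` after `export USE_GPU=false`. Note that TensorFlow reads `TF_FORCE_GPU_ALLOW_GROWTH` itself, so that variable needs no parsing in application code.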

GPU Memory Configuration

By default, TensorFlow reserves nearly all memory on every visible GPU at startup. The AI Service enables memory growth instead, so memory is allocated on demand, which helps prevent OOM errors:


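# Automatically configured in src/core/models/__init__.py (excerpt)
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    for gpu in gpus:
        # Must run before TensorFlow initializes the GPU
        tf.config.experimental.set_memory_growth(gpu, True)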
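Memory growth can only be changed before TensorFlow initializes a GPU; after that, `set_memory_growth` raises `RuntimeError`. The sketch below shows one way to tolerate that case. `enable_memory_growth` is a hypothetical helper, not the service's code, and it takes the `tf.config` module as an argument so it can be exercised without a GPU.

```python
from typing import Any, List


def enable_memory_growth(tf_config: Any) -> List[str]:
    """Enable memory growth on every visible GPU; return the names configured."""
    configured: List[str] = []
    for gpu in tf_config.list_physical_devices("GPU"):
        try:
            tf_config.experimental.set_memory_growth(gpu, True)
        except RuntimeError:
            # The GPU was already initialized, so the setting can no longer change.
            continue
        configured.append(gpu.name)
    return configured
```

In application code this would be called once at startup, before any model is loaded, as `enable_memory_growth(tf.config)`.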

Troubleshooting

GPU Not Detected in Docker

Check:


# Verify NVIDIA runtime is installed
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
 
# Check docker-compose GPU configuration
cat docker-compose.gpu.yml

Fix:


# Reinstall NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

GPU Not Detected Locally

Check CUDA installation:


# Verify CUDA toolkit
nvcc --version
 
# Check library paths
echo $LD_LIBRARY_PATH
# Should include /usr/local/cuda-12.3/lib64
 
# Test CUDA directly (deviceQuery is only present if the optional CUDA demo suite is installed)
cd /usr/local/cuda-12.3/extras/demo_suite
./deviceQuery

Check TensorFlow:


python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
 
# If empty list, check:
python -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"
# Look for cuda_version and cudnn_version
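To compare the two checks above mechanically, the CUDA version TensorFlow was built against can be matched with the release reported by `nvcc --version`. This is a rough sketch: the helper names are hypothetical, and requiring an exact major.minor match is a simplifying assumption that may be stricter than necessary.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_major_minor(version: str) -> Optional[Version]:
    """Parse a string such as '12.3' into (12, 3); return None if it does not fit."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_nvcc_release(nvcc_output: str) -> Optional[Version]:
    """Extract (major, minor) from the 'release X.Y' part of nvcc --version output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_versions_match(built_against: str, nvcc_output: str) -> bool:
    """True when TensorFlow's build-time CUDA version equals the installed toolkit's."""
    built = parse_major_minor(built_against)
    installed = parse_nvcc_release(nvcc_output)
    return built is not None and built == installed
```

On a GPU build of TensorFlow, `built_against` would come from `tf.sysconfig.get_build_info().get("cuda_version", "")` and `nvcc_output` from running `nvcc --version`, for example through `subprocess.run`.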

Out of Memory Errors

Symptoms:

Service crashes during inference
TensorFlow OOM errors in logs

Solutions:

Enable memory growth (already configured):


tf.config.experimental.set_memory_growth(gpu, True)

Limit GPU memory:


# `gpu` is one entry from tf.config.list_physical_devices("GPU")
tf.config.set_logical_device_configuration(
    gpu,
    [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]  # limit in MB (4 GB)
)

Use CPU for large batches:
```
export USE_GPU=false
```
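For the "Limit GPU memory" option above, the `memory_limit` value (in MB) can be derived from the card's total memory instead of being hard-coded. The sketch below assumes the totals come from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints one number per GPU. Both helper names and the use of a fraction are illustrative.

```python
from typing import List


def parse_total_memory_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one total-memory value per line, skipping blank lines."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def memory_limit_mib(total_mib: int, fraction: float) -> int:
    """Return the share of total memory TensorFlow may use, rounded down."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    return int(total_mib * fraction)
```

For an 8192 MiB card, `memory_limit_mib(8192, 0.5)` yields 4096, the value used in the example above.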

Slow Inference on GPU