GPU Setup and Configuration

This guide covers GPU setup for the AI Service, including both Docker and local development configurations.

Overview

The AI Service supports GPU acceleration for faster inference:

  • CPU mode: Works everywhere, ~200-500ms per prediction
  • GPU mode: Requires NVIDIA GPU, ~50-150ms per prediction

GPU support is optional; the service automatically falls back to CPU when no GPU is available.
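
As a rough sketch, the fallback decision reduces to "use the GPU only when it is both allowed and present". The helper below is hypothetical (the real check lives in src.core.models as check_gpu_available); the USE_GPU variable is documented under Configuration.

import os
import tensorflow as tf

# Hypothetical sketch of the fallback check; the actual helper is
# check_gpu_available in src.core.models.
def select_device() -> str:
    """Return 'GPU' when allowed and available, otherwise 'CPU'."""
    use_gpu = os.environ.get("USE_GPU", "true").lower() == "true"
    gpus = tf.config.list_physical_devices("GPU")
    return "GPU" if use_gpu and gpus else "CPU"

print(f"Running inference on: {select_device()}")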

Quick GPU Status Check

# Check if GPU is available
just test-ai-setup

# Or manually
cd services/ai_service
uv run python -c "from src.core.models import check_gpu_available; check_gpu_available()"

Prerequisites

  1. NVIDIA GPU with compute capability 3.5+
  2. NVIDIA Driver installed on host
  3. NVIDIA Container Toolkit (install instructions below)

Install NVIDIA Container Toolkit

Ubuntu/Debian:

# Add NVIDIA package repositories
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify installation:

# Test GPU access from Docker
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi

Running AI Service with GPU

# Using justfile
just up-ai-gpu

# Or with docker compose directly
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile ai-only up -d

# Check GPU is detected
curl http://localhost:8080/health | python3 -m json.tool
# Look for "gpu_available": true
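
To script this check, the health endpoint can be queried directly; a minimal sketch, assuming the /health response exposes the gpu_available field shown above:

import requests

# Query the service health endpoint and report the GPU status.
health = requests.get("http://localhost:8080/health", timeout=5).json()
if health.get("gpu_available"):
    print("GPU detected by the AI Service")
else:
    print("Running in CPU fallback mode")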

Verify GPU Detection

# Check service logs for GPU messages
docker compose logs ai_service | grep -i gpu

# Expected output:
# INFO:__main__:GPU available: True
# INFO:__main__:  GPU 0: /physical_device:GPU:0

Local Development GPU Setup

Requirements

  • NVIDIA GPU (GeForce, Quadro, Tesla)
  • CUDA Toolkit 12.3
  • cuDNN 8.9

Installation Steps

1. Check GPU and Driver

# Verify GPU is detected
nvidia-smi
# Should show GPU name, driver version, and CUDA version

2. Install CUDA Toolkit

Ubuntu 24.04:

# Remove old CUDA versions (if any)
sudo apt-get remove --purge 'cuda-*' nvidia-cuda-toolkit

# Add NVIDIA package repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install CUDA 12.3
sudo apt-get install -y cuda-toolkit-12-3

# Add to PATH
echo 'export PATH=/usr/local/cuda-12.3/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

Verify CUDA:

nvcc --version
# Should show CUDA 12.3

3. Install cuDNN

  1. Download cuDNN from the NVIDIA Developer site (requires a free account)

    • Version: 8.9.x for CUDA 12.3
  2. Extract and install:

    tar -xvf cudnn-linux-x86_64-8.9.x.x_cuda12-archive.tar.xz
    sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda-12.3/include
    sudo cp cudnn-*-archive/lib/libcudnn* /usr/local/cuda-12.3/lib64
    sudo chmod a+r /usr/local/cuda-12.3/include/cudnn*.h /usr/local/cuda-12.3/lib64/libcudnn*

4. Install TensorFlow with GPU Support

cd services/ai_service

# Install TensorFlow with CUDA support
uv add tensorflow[and-cuda]
# Note: the legacy tensorflow-gpu package is deprecated;
# use the tensorflow[and-cuda] extra above instead

5. Verify GPU Works

cd services/ai_service

# Test GPU detection
just test-ai-setup

# Should output:
# GPU available: True
# GPU devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

WSL2 GPU Support

Special Considerations

WSL2 supports GPU passthrough, but has specific requirements:

  1. Windows 11 or Windows 10 (version 21H2+)
  2. NVIDIA Driver installed on Windows (not in WSL)
  3. WSL2 with GPU support enabled

Setup for WSL2

# Inside WSL2, verify GPU is visible
nvidia-smi

# Install the CUDA toolkit inside WSL2
sudo apt-get install -y nvidia-cuda-toolkit

# Verify
nvcc --version

Known Issues

Issue: TensorFlow reports CUDA_ERROR_NO_DEVICE even though nvidia-smi works

Cause: Version mismatch between:

  • NVIDIA Driver (Windows-side)
  • CUDA Toolkit (WSL2-side)
  • TensorFlow requirements
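
To confirm a mismatch, compare the CUDA/cuDNN versions TensorFlow was built against with what nvidia-smi and nvcc report; a quick check:

import tensorflow as tf

# TensorFlow records the CUDA/cuDNN versions it was built against;
# compare these with the driver (nvidia-smi) and toolkit (nvcc --version).
info = tf.sysconfig.get_build_info()
print("CUDA:", info.get("cuda_version"))
print("cuDNN:", info.get("cudnn_version"))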

Solutions:

  1. Use Docker (recommended for WSL2):

    docker run --gpus all -it tensorflow/tensorflow:latest-gpu python -c \
      "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
  2. Or use CPU mode:

    export USE_GPU=false
    just dev-ai

Configuration

Environment Variables

# Enable/disable GPU
export USE_GPU=true   # Use GPU if available
export USE_GPU=false  # Force CPU mode

# GPU memory management
export TF_FORCE_GPU_ALLOW_GROWTH=true  # Allow dynamic memory allocation
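
How USE_GPU takes effect is service-internal, but forcing CPU mode in TensorFlow typically amounts to hiding the GPUs from the runtime; a minimal sketch, assuming an environment-variable check (the service's actual handling may differ):

import os
import tensorflow as tf

# Sketch: hide all GPUs from TensorFlow when USE_GPU=false.
# The AI Service's actual handling of USE_GPU may differ.
if os.environ.get("USE_GPU", "true").lower() == "false":
    tf.config.set_visible_devices([], "GPU")

print("Visible GPUs:", tf.config.get_visible_devices("GPU"))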

GPU Memory Configuration

By default, TensorFlow tries to allocate all GPU memory. The AI Service configures memory growth to prevent OOM errors:

# Automatically configured in src/core/models/__init__.py
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

Troubleshooting

GPU Not Detected in Docker

Check:

# Verify NVIDIA runtime is installed
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi

# Check docker-compose GPU configuration
cat docker-compose.gpu.yml

Fix:

# Reinstall NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

GPU Not Detected Locally

Check CUDA installation:

# Verify CUDA toolkit
nvcc --version

# Check library paths
echo $LD_LIBRARY_PATH
# Should include /usr/local/cuda-12.3/lib64

# Test CUDA directly
cd /usr/local/cuda-12.3/extras/demo_suite
./deviceQuery

Check TensorFlow:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" # If empty list, check: python -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())" # Look for cuda_version and cudnn_version

Out of Memory Errors

Symptoms:

  • Service crashes during inference
  • TensorFlow OOM errors in logs

Solutions:

  1. Enable memory growth (already configured):

    tf.config.experimental.set_memory_growth(gpu, True)
  2. Limit GPU memory:

    tf.config.set_logical_device_configuration(
        gpu,
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]  # 4 GB
    )
  3. Use CPU for large batches:

    export USE_GPU=false

Slow Inference on GPU

Check:

# Monitor GPU utilization
nvidia-smi -l 1

# Should show:
# - GPU utilization > 0%
# - Memory usage increasing during inference

Common issues:

  • Data transfer overhead (minimize CPU↔GPU copies)
  • Small batch sizes (GPU works best with batches)
  • Mixed precision not enabled (see the sketch below)
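
Mixed precision is a one-line global policy in Keras; a minimal sketch (whether it helps depends on the hardware: it mainly pays off on GPUs with compute capability 7.0+, such as the T4 or RTX 30xx):

from tensorflow.keras import mixed_precision

# Run compute in float16 while keeping variables in float32.
# Mainly beneficial on GPUs with compute capability 7.0+.
mixed_precision.set_global_policy("mixed_float16")
print(mixed_precision.global_policy())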

Performance Benchmarks

Expected Performance

| Configuration  | Startup Time | Inference Time | Throughput |
|----------------|--------------|----------------|------------|
| CPU (4 vCPU)   | 60-90s       | 200-500ms      | 2-5 req/s  |
| GPU (RTX 3080) | 60-90s       | 50-150ms       | 6-20 req/s |
| GPU (T4)       | 60-90s       | 100-200ms      | 5-10 req/s |

Benchmark Script

import time
import requests

URL = "http://localhost:8080/api/v1/diagnosis/predict"

# Warm up (request payload elided here; fill in as needed)
for _ in range(5):
    requests.post(URL, ...)

# Benchmark
times = []
for _ in range(100):
    start = time.time()
    response = requests.post(URL, ...)
    times.append(time.time() - start)

times.sort()
print(f"Mean: {sum(times)/len(times)*1000:.1f}ms")
print(f"P50: {times[len(times)//2]*1000:.1f}ms")
print(f"P95: {times[int(len(times)*0.95)]*1000:.1f}ms")

Production Recommendations

For AWS Fargate

  • Use CPU-only images (cheaper, more reliable)
  • GPU not supported on Fargate
  • 2 vCPU / 4GB RAM sufficient for most workloads

For EC2 GPU Instances

  • g4dn.xlarge or larger (NVIDIA T4 GPU)
  • Use Docker with nvidia-runtime
  • Configure autoscaling based on GPU utilization
  • Monitor GPU temperature and throttling (see the polling sketch below)
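
For the monitoring bullet above, nvidia-smi's query mode is easy to poll from a script; a small sketch using its CSV output:

import subprocess

# Poll GPU utilization and temperature via nvidia-smi's query interface.
out = subprocess.check_output(
    [
        "nvidia-smi",
        "--query-gpu=utilization.gpu,temperature.gpu",
        "--format=csv,noheader,nounits",
    ],
    text=True,
)
for line in out.strip().splitlines():
    util, temp = (field.strip() for field in line.split(","))
    print(f"GPU utilization: {util}%  temperature: {temp}C")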

For Local Development

  • Use Docker for GPU (avoids CUDA/cuDNN version issues)
  • Keep model files on fast SSD
  • Use CPU mode for testing, GPU for performance validation
