GPU Setup and Configuration
This guide covers GPU setup for the AI Service, including both Docker and local development configurations.
Overview
The AI Service supports GPU acceleration for faster inference:
- CPU mode: Works everywhere, ~200-500ms per prediction
- GPU mode: Requires NVIDIA GPU, ~50-150ms per prediction
GPU support is optional: the service automatically falls back to CPU when no GPU is available.
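The fallback rule can be expressed as a small decision function. The sketch below is a hypothetical helper, not the service's actual code. It assumes that `USE_GPU` (see Configuration) is read as a case-insensitive boolean, that an unset variable means "use the GPU if available", and that the caller supplies the number of visible GPUs.

```python
from typing import Optional

TRUTHY = {"1", "true", "yes"}


def resolve_device(use_gpu_env: Optional[str], gpu_count: int) -> str:
    """Return "gpu" or "cpu" from a USE_GPU-style flag and the visible GPU count.

    Hypothetical helper illustrating the fallback rule; not the AI Service's code.
    """
    # Assumption: an unset or empty USE_GPU means "use the GPU if one is available"
    wants_gpu = (use_gpu_env or "true").strip().lower() in TRUTHY
    if wants_gpu and gpu_count > 0:
        return "gpu"
    # GPU disabled explicitly, or none visible: fall back to CPU
    return "cpu"
```

In practice the GPU count could come from TensorFlow, for example `resolve_device(os.environ.get("USE_GPU"), len(tf.config.list_physical_devices("GPU")))`.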
Quick GPU Status Check
```bash
# Check if a GPU is available
just test-ai-setup

# Or check manually
cd services/ai_service
uv run python -c "from src.core.models import check_gpu_available; check_gpu_available()"
```
Docker GPU Setup (Recommended)
Prerequisites
- NVIDIA GPU with compute capability 3.5+
- NVIDIA Driver installed on host
- NVIDIA Container Toolkit (install instructions below)
Install NVIDIA Container Toolkit
Ubuntu/Debian:
```bash
# Add NVIDIA package repositories
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
Verify installation:
```bash
# Test GPU access from Docker
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
```
Running AI Service with GPU
```bash
# Using the justfile
just up-ai-gpu

# Or with docker compose directly
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile ai-only up -d

# Check that the GPU is detected
curl http://localhost:8080/health | python3 -m json.tool
# Look for "gpu_available": true
```
Verify GPU Detection
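The `gpu_available` check against the health endpoint can also be scripted, for example in CI. The sketch below uses only the standard library. It assumes the `/health` response is a JSON object with a top-level boolean `gpu_available` field; the rest of that payload's schema is not documented here.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # endpoint used by the curl command above


def gpu_available(payload: str) -> bool:
    """Return True if a /health JSON payload reports "gpu_available": true."""
    data = json.loads(payload)
    # Only an explicit JSON true counts; a missing key is treated as "no GPU"
    return data.get("gpu_available") is True


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> str:
    """Fetch the raw /health response body (requires the service to be running)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the service running, `gpu_available(fetch_health())` returns the same answer as inspecting the curl output by eye.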
```bash
# Check service logs for GPU messages
docker compose logs ai_service | grep -i gpu

# Expected output:
# INFO:__main__:GPU available: True
# INFO:__main__: GPU 0: /physical_device:GPU:0
```
Local Development GPU Setup
Requirements
- NVIDIA GPU (GeForce, Quadro, Tesla)
- CUDA Toolkit 12.3
- cuDNN 8.9
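To confirm that an existing installation matches the CUDA requirement, a script can read the release number that `nvcc --version` prints. The output contains a line such as `Cuda compilation tools, release 12.3, ...`. The sketch below is a minimal sketch that assumes `nvcc` is on `PATH`. It compares only major.minor.

```python
import re
import subprocess
from typing import Optional, Tuple

REQUIRED_CUDA = (12, 3)  # CUDA Toolkit version from the requirements list


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output ("... release 12.3, ...")."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda() -> Optional[Tuple[int, int]]:
    """Run nvcc and return its CUDA release, or None if nvcc is missing or fails."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)
```

`installed_cuda() == REQUIRED_CUDA` is then a quick pass/fail check for the toolkit step below.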
Installation Steps
1. Check GPU and Driver
```bash
# Verify the GPU is detected
nvidia-smi
# Should show the GPU name, driver version, and the highest CUDA version the driver supports
```
2. Install CUDA Toolkit
Ubuntu 24.04:
```bash
# Remove old CUDA versions (if any); quote the pattern so the shell does not expand it
sudo apt-get remove --purge "cuda-*" nvidia-cuda-toolkit

# Add the NVIDIA package repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install CUDA 12.3
sudo apt-get install -y cuda-toolkit-12-3

# Add CUDA to PATH and the library search path
echo 'export PATH=/usr/local/cuda-12.3/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```
Verify CUDA:
```bash
nvcc --version
# Should show CUDA 12.3
```
3. Install cuDNN
- Download cuDNN from NVIDIA Developer (requires a free account)
  - Version: 8.9.x for CUDA 12.3
- Extract and install:

```bash
tar -xvf cudnn-linux-x86_64-8.9.x.x_cuda12-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda-12.3/include
# -P copies the libcudnn symlinks as symlinks rather than as duplicate files
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda-12.3/lib64
sudo chmod a+r /usr/local/cuda-12.3/include/cudnn*.h /usr/local/cuda-12.3/lib64/libcudnn*
```
4. Install TensorFlow with GPU Support
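```bash
cd services/ai_service

# Install TensorFlow with CUDA support (quoted so the brackets are not treated as a shell glob)
uv add "tensorflow[and-cuda]"
```
The standalone `tensorflow-gpu` package is deprecated and no longer installs; use `tensorflow[and-cuda]` instead.
5. Verify GPU Works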
```bash
cd services/ai_service

# Test GPU detection
just test-ai-setup

# Should output:
# GPU available: True
# GPU devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```
WSL2 GPU Support
Special Considerations
WSL2 supports GPU passthrough, but has specific requirements:
- Windows 11 or Windows 10 (version 21H2+)
- NVIDIA Driver installed on Windows (not in WSL)
- WSL2 with GPU support enabled
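Setup scripts sometimes need to know whether they are running inside WSL2. For example, a script could warn that the NVIDIA driver belongs on the Windows side. A common heuristic is that WSL kernels include `microsoft` in `/proc/version`. The sketch below relies on that heuristic and is not an official detection API.

```python
from pathlib import Path


def is_wsl(proc_version: str) -> bool:
    """Heuristic: WSL kernels report 'microsoft' in /proc/version."""
    return "microsoft" in proc_version.lower()


def running_in_wsl(path: str = "/proc/version") -> bool:
    """Read /proc/version; returns False where the file does not exist (e.g. macOS)."""
    try:
        return is_wsl(Path(path).read_text())
    except OSError:
        return False
```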
Setup for WSL2
```bash
# Inside WSL2, verify the GPU is visible
nvidia-smi

# Install the CUDA toolkit inside WSL2 (Ubuntu's nvidia-cuda-toolkit package; its CUDA version may differ from 12.3)
sudo apt-get install -y nvidia-cuda-toolkit

# Verify
nvcc --version
```
Known Issues
Issue: TensorFlow reports CUDA_ERROR_NO_DEVICE even though nvidia-smi works
Cause: Version mismatch between:
- NVIDIA Driver (Windows-side)
- CUDA Toolkit (WSL2-side)
- TensorFlow requirements
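One way to spot this mismatch is to compare the CUDA version TensorFlow was built against with the installed toolkit. The sketch below takes a dict shaped like the output of `tf.sysconfig.get_build_info()`, from which it reads `cuda_version`. It treats a missing `cuda_version` as a likely CPU-only build; that is an assumption. Requiring an exact major.minor match is a conservative heuristic, since CUDA's minor-version compatibility means some mismatches still work.

```python
from typing import Mapping, Optional


def major_minor(version: str) -> str:
    """Reduce '12.3.107' or '12.3' to '12.3'."""
    return ".".join(version.split(".")[:2])


def cuda_mismatch(build_info: Mapping[str, str], installed_cuda: str) -> Optional[str]:
    """Describe a CUDA version mismatch, or return None if major.minor agree.

    `build_info` is shaped like tf.sysconfig.get_build_info().
    """
    built = build_info.get("cuda_version")
    if built is None:
        # Assumption: no cuda_version key suggests a CPU-only TensorFlow build
        return "TensorFlow build reports no cuda_version (possibly a CPU-only build)"
    if major_minor(built) != major_minor(installed_cuda):
        return f"TensorFlow was built for CUDA {built}, but CUDA {installed_cuda} is installed"
    return None
```

Usage: `cuda_mismatch(tf.sysconfig.get_build_info(), "12.3")`, where `"12.3"` is the toolkit version reported by `nvcc --version`.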
Solutions:
- Use Docker (recommended for WSL2):

```bash
docker run --gpus all -it tensorflow/tensorflow:latest-gpu python -c \
  "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

- Or use CPU mode:

```bash
export USE_GPU=false
just dev-ai
```
Configuration
Environment Variables
```bash
# Enable/disable GPU (set one of these)
export USE_GPU=true   # Use GPU if available
export USE_GPU=false  # Force CPU mode

# GPU memory management
export TF_FORCE_GPU_ALLOW_GROWTH=true  # Allocate GPU memory on demand instead of all at startup
```
GPU Memory Configuration
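By default, TensorFlow maps nearly all of the GPU's memory into its process when it first uses the GPU. The AI Service enables memory growth instead, so TensorFlow allocates GPU memory only as it is needed. This helps avoid out-of-memory (OOM) errors when other processes share the GPU: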
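```python
# Automatically configured in src/core/models/__init__.py (shown here in equivalent form)
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    for gpu in gpus:
        # Must run before the GPUs are initialized, i.e. before any op is placed on them
        tf.config.experimental.set_memory_growth(gpu, True)
```
Troubleshooting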
GPU Not Detected in Docker
Check:
```bash
# Verify the NVIDIA runtime is installed
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi

# Check the docker compose GPU configuration
cat docker-compose.gpu.yml
```
Fix:
```bash
# Reinstall the NVIDIA Container Toolkit (--reinstall is needed if it is already installed)
sudo apt-get install --reinstall -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
GPU Not Detected Locally
Check CUDA installation:
```bash
# Verify the CUDA toolkit
nvcc --version

# Check library paths
echo $LD_LIBRARY_PATH
# Should include /usr/local/cuda-12.3/lib64

# Test CUDA directly with the bundled deviceQuery sample
cd /usr/local/cuda-12.3/extras/demo_suite
./deviceQuery
```
Check TensorFlow:
```bash
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

# If the list is empty, check which CUDA/cuDNN versions TensorFlow was built against:
python -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"
# Look for cuda_version and cudnn_version
```
Out of Memory Errors
Symptoms:
- Service crashes during inference
- TensorFlow OOM errors in logs
Solutions:
- Enable memory growth (already configured):

```python
tf.config.experimental.set_memory_growth(gpu, True)
```

- Limit GPU memory (an alternative to memory growth; TensorFlow does not allow both on the same GPU):

```python
tf.config.set_logical_device_configuration(
    gpu,
    [tf.config.LogicalDeviceConfiguration(memory_limit=4096)],  # limit in MB (4 GB)
)
```

- Use CPU for large batches:

```bash
export USE_GPU=false
```
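Instead of hard-coding 4096, the memory limit can be derived from the card's total memory, as reported for example by `nvidia-smi --query-gpu=memory.total --format=csv`. The helper below is hypothetical. Its 50% default and 512 MB reserve are illustrative assumptions, not values the AI Service uses.

```python
def memory_limit_mb(total_mb: int, fraction: float = 0.5, reserve_mb: int = 512) -> int:
    """Choose a memory_limit (in MB) for tf.config.LogicalDeviceConfiguration.

    Hypothetical policy: take `fraction` of the card's memory, but always leave
    `reserve_mb` free for the driver and other processes.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    limit = min(int(total_mb * fraction), total_mb - reserve_mb)
    if limit <= 0:
        raise ValueError("not enough GPU memory for the requested reserve")
    return limit
```

For example, `memory_limit_mb(16384, fraction=0.25)` returns 4096, the 4 GB limit shown above for a 16 GB card.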
Slow Inference on GPU
Check:
```bash
# Monitor GPU utilization (refreshes every second)
nvidia-smi -l 1

# Should show:
# - GPU utilization > 0%
# - Memory usage increasing during inference
```
Common issues:
- Data transfer overhead (minimize CPU↔GPU copies)
- Small batch sizes (GPUs are most efficient when each call processes many inputs at once)
- Mixed precision not enabled
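The batching point can be illustrated with a generic helper that groups inputs so each model call processes several at once. This is a sketch: it does not reflect how the AI Service actually schedules inference.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

A hypothetical usage is `for batch in batched(images, 32): model.predict(np.stack(batch))`. In that line, `model`, `images`, and the batch size of 32 are placeholders. On Python 3.12+, `itertools.batched` provides the same grouping and yields tuples.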
Performance Benchmarks
Expected Performance
| Configuration | Startup Time | Inference Time | Throughput |
|---|---|---|---|
| CPU (4 vCPU) | 60-90s | 200-500ms | 2-5 req/s |
| GPU (RTX 3080) | 60-90s | 50-150ms | 6-20 req/s |
| GPU (T4) | 60-90s | 100-200ms | 5-10 req/s |
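The Throughput column is roughly the reciprocal of the inference time. That is what a single client sending requests one after another would observe. For example, 200 ms per prediction gives 1000 / 200 = 5 req/s, and 500 ms gives 2 req/s. The sketch below reproduces that arithmetic. It assumes sequential requests and ignores concurrency and batching, either of which may raise throughput on the same hardware.

```python
from typing import Tuple


def single_stream_throughput(latency_ms: float) -> float:
    """Requests per second for one client sending requests back to back."""
    return 1000.0 / latency_ms


def throughput_range(latency_range_ms: Tuple[float, float]) -> Tuple[float, float]:
    """Convert a (fast, slow) latency range in ms into a (low, high) req/s range."""
    fast, slow = latency_range_ms
    # The slowest latency bounds throughput from below, the fastest from above
    return single_stream_throughput(slow), single_stream_throughput(fast)
```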
Benchmark Script
```python
import time

import requests

URL = "http://localhost:8080/api/v1/diagnosis/predict"

# `...` is a placeholder for the request payload, which is not shown here

# Warm up (the first requests include one-time model and GPU initialization)
for _ in range(5):
    requests.post(URL, ...)

# Benchmark
times = []
for _ in range(100):
    start = time.perf_counter()
    response = requests.post(URL, ...)
    times.append(time.perf_counter() - start)

times.sort()
print(f"Mean: {sum(times) / len(times) * 1000:.1f}ms")
print(f"P50: {times[len(times) // 2] * 1000:.1f}ms")
print(f"P95: {times[int(len(times) * 0.95)] * 1000:.1f}ms")
```
Production Recommendations
For AWS Fargate
- Use CPU-only images (cheaper, more reliable)
- GPU not supported on Fargate
- 2 vCPU / 4GB RAM sufficient for most workloads
For EC2 GPU Instances
- g4dn.xlarge or larger (NVIDIA T4 GPU)
- Use Docker with nvidia-runtime
- Configure autoscaling based on GPU utilization
- Monitor GPU temperature and throttling
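For monitoring, and for feeding an autoscaling metric, `nvidia-smi` can report utilization, temperature, and memory as machine-readable CSV. The sketch below parses that output. The choice of fields is an assumption. Publishing the samples as custom metrics is outside its scope. Fields that some GPUs report as `[N/A]` would make this simple parser raise `ValueError`.

```python
import csv
import io
import subprocess
from dataclasses import dataclass
from typing import List

QUERY = "index,utilization.gpu,temperature.gpu,memory.used,memory.total"


@dataclass
class GpuSample:
    index: int
    utilization_pct: int
    temperature_c: int
    memory_used_mib: int
    memory_total_mib: int


def parse_nvidia_smi_csv(text: str) -> List[GpuSample]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = [int(field.strip()) for field in row]
        samples.append(GpuSample(*values))
    return samples


def sample_gpus() -> List[GpuSample]:
    """Query all GPUs once (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_nvidia_smi_csv(out)
```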
For Local Development
- Use Docker for GPU (avoids CUDA/cuDNN version issues)
- Keep model files on fast SSD
- Use CPU mode for testing, GPU for performance validation