# Docker Deployment Guide
This guide covers building and deploying the AI Service using Docker, both locally and on AWS Fargate.
## Overview
The AI Service uses a flexible Docker setup that supports:
- CPU-only inference (AWS Fargate, production)
- GPU-accelerated inference (local development, EC2 GPU instances)
- Official TensorFlow base images
- Multi-stage builds for optimization
- Non-root user for security
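The CPU/GPU flexibility comes from parameterizing the base image at build time. A minimal sketch of that pattern (illustrative only; the repository's actual Dockerfile is authoritative, and the `requirements.txt` install step is an assumption):

```dockerfile
# Sketch: the base image is selected via a build argument, so the same
# Dockerfile produces CPU-only and GPU-enabled images.
ARG TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0
FROM ${TENSORFLOW_IMAGE}

WORKDIR /app
# Illustrative dependency install; the actual file layout may differ.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```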
## Quick Start

### Local Development (CPU)

```bash
# From repository root
cd services/ai_service
# Build
docker build -t ai-service:latest .
# Run (mount your models directory)
docker run -p 8080:8080 \
-v $(pwd)/../../models:/models:ro \
-e MODEL_DIRECTORY=/models \
-e DEFAULT_MODEL=gen2a \
-e USE_GPU=false \
ai-service:latest
# Test
curl http://localhost:8080/health
```

### Docker Compose (Recommended)

```bash
# From repository root
mkdir -p models
# CPU-only mode (matches AWS Fargate)
docker compose up -d
# GPU-enabled mode (requires NVIDIA Container Toolkit)
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
# View logs
docker compose logs -f ai_service
# Stop
docker compose down
```
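For reference, a minimal `docker-compose.yml` consistent with the commands above (a sketch; the file checked into the repository is authoritative):

```yaml
# Sketch of docker-compose.yml; the service name matches the log command above.
services:
  ai_service:
    build: services/ai_service
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models:ro
    environment:
      MODEL_DIRECTORY: /models
      DEFAULT_MODEL: gen2a
      USE_GPU: "false"
```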
## Building Images

### CPU-Only (AWS Fargate Compatible)

```bash
docker build -t ai-service:cpu \
--build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 \
services/ai_service/
```

This is the default and is recommended for production deployment on AWS Fargate.
**Base Image**: `tensorflow/tensorflow:2.18.0`
- Official TensorFlow CPU-only image
- Smaller image size (~1.5GB vs ~6GB for GPU)
- Works on any Docker host (no GPU required)
- Compatible with AWS Fargate
### GPU-Enabled (Local Development)

```bash
docker build -t ai-service:gpu \
--build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0-gpu \
services/ai_service/
```

**Base Image**: `tensorflow/tensorflow:2.18.0-gpu`
- Official TensorFlow GPU image with CUDA support
- Requires NVIDIA GPU + NVIDIA Container Toolkit
- Larger image size (~6GB)
- For local development or EC2 GPU instances
## AWS Fargate Deployment
### Task Definition Configuration

```json
{
  "family": "ai-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "4096",
  "containerDefinitions": [
    {
      "name": "ai-service",
      "image": "<ECR_REPO_URL>/ai-service:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "MODEL_DIRECTORY",
          "value": "/models"
        },
        {
          "name": "DEFAULT_MODEL",
          "value": "gen2a"
        },
        {
          "name": "USE_GPU",
          "value": "false"
        },
        {
          "name": "LOG_LEVEL",
          "value": "INFO"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "models",
          "containerPath": "/models",
          "readOnly": true
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8080/health').read()\" || exit 1"],
        "interval": 30,
        "timeout": 10,
        "retries": 3,
        "startPeriod": 90
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ai-service",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "volumes": [
    {
      "name": "models",
      "efsVolumeConfiguration": {
        "fileSystemId": "<EFS_FILE_SYSTEM_ID>",
        "rootDirectory": "/models",
        "transitEncryption": "ENABLED"
      }
    }
  ]
}
```
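To register the task definition, assuming it is saved as `task-definition.json`:

```bash
# Register the task definition above with ECS
aws ecs register-task-definition --cli-input-json file://task-definition.json
```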
### ECR Push Workflow

```bash
# Authenticate to ECR
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com
# Build for production
docker build -t ai-service:latest \
--build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 \
--platform linux/amd64 \
services/ai_service/
# Tag
docker tag ai-service:latest <ECR_REPO_URL>/ai-service:latest
docker tag ai-service:latest <ECR_REPO_URL>/ai-service:$(git rev-parse --short HEAD)
# Push
docker push <ECR_REPO_URL>/ai-service:latest
docker push <ECR_REPO_URL>/ai-service:$(git rev-parse --short HEAD)
```

## Model Management
### Development (Volume Mount)

```bash
# Mount local models directory
docker run -v $(pwd)/models:/models:ro ai-service:latest
```

**Directory Structure:**

```
models/
├── gen2a/
│   ├── model/
│   │   └── best_model.h5
│   ├── mlb.pkl
│   ├── model_params.pkl
│   ├── continuous_mean.pkl
│   ├── scaler.pkl
│   └── raw_list_of_field.pkl
├── gen2i/
└── ...
```
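Before mounting a model directory, it can be worth checking that every file the service expects is present. A small bash sketch over the layout above (`gen2a` used as an example):

```bash
# Verify a model directory contains all expected artifacts
MODEL_DIR=models/gen2a
for f in model/best_model.h5 mlb.pkl model_params.pkl \
         continuous_mean.pkl scaler.pkl raw_list_of_field.pkl; do
  [ -f "$MODEL_DIR/$f" ] || echo "MISSING: $MODEL_DIR/$f"
done
```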
### Production (AWS EFS)

1. **Create EFS File System:**

   ```bash
   aws efs create-file-system \
     --performance-mode generalPurpose \
     --throughput-mode bursting \
     --encrypted \
     --tags Key=Name,Value=ai-service-models
   ```

2. **Upload Models to EFS** (run from an EC2 instance with the EFS file system mounted; see the mount sketch after this list):

   ```bash
   aws s3 sync s3://your-models-bucket/models/ /mnt/efs/models/
   ```

3. **Configure Task Definition:** See the EFS volume configuration above.
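The mount referenced in step 2 looks roughly like this, assuming `amazon-efs-utils` is installed on the instance and `<EFS_FILE_SYSTEM_ID>` comes from step 1 (`-o tls` matches the `transitEncryption: ENABLED` setting in the task definition):

```bash
# Mount the EFS file system on the EC2 instance used for uploads
sudo mkdir -p /mnt/efs
sudo mount -t efs -o tls <EFS_FILE_SYSTEM_ID>:/ /mnt/efs
```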
### Alternative: S3 Sync on Startup
Create a custom entrypoint script:

```bash
#!/bin/bash
# entrypoint.sh
# Sync models from S3
if [ -n "$S3_MODELS_BUCKET" ]; then
echo "Syncing models from S3..."
aws s3 sync s3://${S3_MODELS_BUCKET}/models/ /models/
fi
# Start application
exec uvicorn src.main:app --host 0.0.0.0 --port 8080
```

Then update the Dockerfile:

```dockerfile
COPY entrypoint.sh /app/
RUN chmod +x /app/entrypoint.sh
CMD ["/app/entrypoint.sh"]
```
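Note that the entrypoint's `aws s3 sync` requires the AWS CLI inside the image, which the official TensorFlow images do not include. One way to add it (a sketch; pin the version as appropriate):

```dockerfile
# The AWS CLI is needed at runtime for `aws s3 sync` in entrypoint.sh
RUN pip install --no-cache-dir awscli
```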
## GPU Development (Local)

See the GPU Setup Guide for detailed instructions on setting up GPU support for local development.
### Quick Start

```bash
# Install NVIDIA Container Toolkit (one-time setup)
# See GPU Setup Guide for detailed instructions
# Run with GPU
docker run --gpus all -p 8080:8080 \
-v $(pwd)/models:/models:ro \
-e USE_GPU=true \
ai-service:gpu
# Or with Docker Compose
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MODEL_DIRECTORY` | `/models` | Path to model files |
| `DEFAULT_MODEL` | `gen2a` | Model subdirectory name |
| `USE_GPU` | `true` | Enable GPU inference |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `API_PREFIX` | `/api/v1` | API route prefix |
## Troubleshooting
### Container Starts but Health Check Fails

```bash
# Check logs
docker compose logs ai_service
# Common issues:
# 1. Models not mounted correctly
docker compose exec ai_service ls -la /models
# 2. Model loading failure (check for all required pickle files)
docker compose exec ai_service ls -la /models/gen2a/
# 3. Service not listening on correct port
docker compose exec ai_service netstat -tlnp   # or `ss -tlnp` if netstat is unavailable
```

### Large Image Size
The GPU image is large (~6GB) due to CUDA dependencies. For CPU-only:
```bash
# Build CPU-only image (smaller)
docker build --build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 -t ai-service:cpu .
# Check size
docker images ai-service
```

### Slow Startup on Fargate
TensorFlow model loading takes 60-90 seconds. Configure:
- Health check `startPeriod: 90` (allows time for model loading)
- Use EFS with provisioned throughput for faster model loading
- Consider model caching or warm containers
### Out of Memory
Fargate minimum requirements:
- CPU: 2 vCPU (2048)
- Memory: 4 GB (4096)
Adjust based on model size and inference batch size.
## Performance Tuning
### CPU Optimization

```dockerfile
# Set TensorFlow thread count
ENV OMP_NUM_THREADS=4
ENV TF_NUM_INTRAOP_THREADS=4
ENV TF_NUM_INTEROP_THREADS=2
```
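The same knobs can be set at runtime instead of being baked into the image, which makes per-host tuning easier. For example:

```bash
# Override thread settings at run time without rebuilding the image
docker run -p 8080:8080 \
  -e OMP_NUM_THREADS=8 \
  -e TF_NUM_INTRAOP_THREADS=8 \
  -e TF_NUM_INTEROP_THREADS=2 \
  ai-service:cpu
```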
### Memory Limits

```yaml
# docker-compose.yml
services:
  ai_service:
    deploy:
      resources:
        limits:
          memory: 4G
```

## Security Considerations
- **Non-root User**: Container runs as `appuser` (UID 1000)
- **Read-only Models**: Models mounted as `:ro` (read-only)
- **No Secrets in Image**: Use environment variables or AWS Secrets Manager
- **Minimal Base Image**: Official TensorFlow images are security-scanned
- **Health Checks**: Ensures the container is responsive
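For reference, the non-root user setup typically looks like the following in a Dockerfile (a sketch consistent with the `appuser`/UID 1000 convention above, not the verbatim file):

```dockerfile
# Create an unprivileged user (UID 1000) and drop root before runtime
RUN useradd --create-home --uid 1000 appuser
USER appuser
```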
## CI/CD Integration
Example GitHub Actions workflow:
```yaml
name: Build and Push AI Service

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build and push
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: ai-service
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \
            --build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 \
            --platform linux/amd64 \
            services/ai_service/
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
```