# Docker Deployment Guide
This guide covers building and deploying the AI Service using Docker, both locally and on AWS Fargate.
## Overview
The AI Service uses a flexible Docker setup that supports:
- CPU-only inference (AWS Fargate, production)
- GPU-accelerated inference (local development, EC2 GPU instances)
- Official TensorFlow base images
- Multi-stage builds for optimization
- Non-root user for security
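The CPU/GPU flexibility comes from parameterizing the base image at build time. A minimal sketch of that pattern (illustrative only; the repository's actual Dockerfile is authoritative, and the `requirements.txt` install step is an assumption):

```dockerfile
# Sketch: the base image is selected via a build argument, so the same
# Dockerfile produces CPU-only and GPU-enabled images.
ARG TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0
FROM ${TENSORFLOW_IMAGE}

WORKDIR /app
# Illustrative dependency install; the actual file layout may differ.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```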
## Quick Start

### Local Development (CPU)

```bash
# From repository root
cd services/ai_service
# Build
docker build -t ai-service:latest .
# Run (mount your models directory)
docker run -p 8080:8080 \
-v $(pwd)/../../models:/models:ro \
-e MODEL_DIRECTORY=/models \
-e DEFAULT_MODEL=gen2a \
-e USE_GPU=false \
ai-service:latest
# Test
curl http://localhost:8080/health
```

### Docker Compose (Recommended)

```bash
# From repository root
mkdir -p models
# CPU-only mode (matches AWS Fargate)
docker compose up -d
# GPU-enabled mode (requires NVIDIA Container Toolkit)
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
# View logs
docker compose logs -f ai_service
# Stop
docker compose down
```
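For reference, a minimal `docker-compose.yml` consistent with the commands above (a sketch; the file checked into the repository is authoritative):

```yaml
# Sketch of docker-compose.yml; the service name matches the log command above.
services:
  ai_service:
    build: services/ai_service
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models:ro
    environment:
      MODEL_DIRECTORY: /models
      DEFAULT_MODEL: gen2a
      USE_GPU: "false"
```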
## Building Images

### CPU-Only (AWS Fargate Compatible)

```bash
docker build -t ai-service:cpu \
--build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 \
services/ai_service/
```

This is the default and is recommended for production deployment on AWS Fargate.
**Base Image**: `tensorflow/tensorflow:2.18.0`
- Official TensorFlow CPU-only image
- Smaller image size (~1.5GB vs ~6GB for GPU)
- Works on any Docker host (no GPU required)
- Compatible with AWS Fargate
### GPU-Enabled (Local Development)

```bash
docker build -t ai-service:gpu \
--build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0-gpu \
services/ai_service/
```

**Base Image**: `tensorflow/tensorflow:2.18.0-gpu`
- Official TensorFlow GPU image with CUDA support
- Requires NVIDIA GPU + NVIDIA Container Toolkit
- Larger image size (~6GB)
- For local development or EC2 GPU instances
## AWS Fargate Deployment
### Task Definition Configuration

```json
{
  "family": "ai-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "4096",
  "containerDefinitions": [
    {
      "name": "ai-service",
      "image": "<ECR_REPO_URL>/ai-service:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "MODEL_DIRECTORY",
          "value": "/models"
        },
        {
          "name": "DEFAULT_MODEL",
          "value": "gen2a"
        },
        {
          "name": "USE_GPU",
          "value": "false"
        },
        {
          "name": "LOG_LEVEL",
          "value": "INFO"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "models",
          "containerPath": "/models",
          "readOnly": true
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8080/health').read()\" || exit 1"],
        "interval": 30,
        "timeout": 10,
        "retries": 3,
        "startPeriod": 90
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ai-service",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "volumes": [
    {
      "name": "models",
      "efsVolumeConfiguration": {
        "fileSystemId": "<EFS_FILE_SYSTEM_ID>",
        "rootDirectory": "/models",
        "transitEncryption": "ENABLED"
      }
    }
  ]
}
```
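To register the task definition, assuming it is saved as `task-definition.json`:

```bash
# Register the task definition above with ECS
aws ecs register-task-definition --cli-input-json file://task-definition.json
```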
### ECR Push Workflow

```bash
# Authenticate to ECR
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com
# Build for production
docker build -t ai-service:latest \
--build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 \
--platform linux/amd64 \
services/ai_service/
# Tag
docker tag ai-service:latest <ECR_REPO_URL>/ai-service:latest
docker tag ai-service:latest <ECR_REPO_URL>/ai-service:$(git rev-parse --short HEAD)
# Push
docker push <ECR_REPO_URL>/ai-service:latest
docker push <ECR_REPO_URL>/ai-service:$(git rev-parse --short HEAD)
```

## Model Management
### Development (Volume Mount)

```bash
# Mount local models directory
docker run -v $(pwd)/models:/models:ro ai-service:latest
```

**Directory Structure:**

```
models/
├── gen2a/
│   ├── model/
│   │   └── best_model.h5
│   ├── mlb.pkl
│   ├── model_params.pkl
│   ├── continuous_mean.pkl
│   ├── scaler.pkl
│   └── raw_list_of_field.pkl
├── gen2i/
└── ...
```
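Before mounting a model directory, it can be worth checking that every file the service expects is present. A small bash sketch over the layout above (`gen2a` used as an example):

```bash
# Verify a model directory contains all expected artifacts
MODEL_DIR=models/gen2a
for f in model/best_model.h5 mlb.pkl model_params.pkl \
         continuous_mean.pkl scaler.pkl raw_list_of_field.pkl; do
  [ -f "$MODEL_DIR/$f" ] || echo "MISSING: $MODEL_DIR/$f"
done
```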
### Production (AWS EFS)

1. **Create EFS File System:**

   ```bash
   aws efs create-file-system \
     --performance-mode generalPurpose \
     --throughput-mode bursting \
     --encrypted \
     --tags Key=Name,Value=ai-service-models
   ```

2. **Upload Models to EFS** (run from an EC2 instance with the EFS file system mounted; see the mount sketch after this list):

   ```bash
   aws s3 sync s3://your-models-bucket/models/ /mnt/efs/models/
   ```

3. **Configure Task Definition:** See the EFS volume configuration above.
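The mount referenced in step 2 looks roughly like this, assuming `amazon-efs-utils` is installed on the instance and `<EFS_FILE_SYSTEM_ID>` comes from step 1 (`-o tls` matches the `transitEncryption: ENABLED` setting in the task definition):

```bash
# Mount the EFS file system on the EC2 instance used for uploads
sudo mkdir -p /mnt/efs
sudo mount -t efs -o tls <EFS_FILE_SYSTEM_ID>:/ /mnt/efs
```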
### Alternative: S3 Sync on Startup
Create a custom entrypoint script:

```bash
#!/bin/bash
# entrypoint.sh
# Sync models from S3
if [ -n "$S3_MODELS_BUCKET" ]; then
echo "Syncing models from S3..."
aws s3 sync s3://${S3_MODELS_BUCKET}/models/ /models/
fi
# Start application
exec uvicorn src.main:app --host 0.0.0.0 --port 8080
```

Then update the Dockerfile:

```dockerfile
COPY entrypoint.sh /app/
RUN chmod +x /app/entrypoint.sh
CMD ["/app/entrypoint.sh"]
```
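Note that the entrypoint's `aws s3 sync` requires the AWS CLI inside the image, which the official TensorFlow images do not include. One way to add it (a sketch; pin the version as appropriate):

```dockerfile
# The AWS CLI is needed at runtime for `aws s3 sync` in entrypoint.sh
RUN pip install --no-cache-dir awscli
```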
## GPU Development (Local)

See the GPU Setup Guide for detailed instructions on setting up GPU support for local development.
### Quick Start

```bash
# Install NVIDIA Container Toolkit (one-time setup)
# See GPU Setup Guide for detailed instructions
# Run with GPU
docker run --gpus all -p 8080:8080 \
-v $(pwd)/models:/models:ro \
-e USE_GPU=true \
ai-service:gpu
# Or with Docker Compose
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MODEL_DIRECTORY` | `/models` | Path to model files |
| `DEFAULT_MODEL` | `gen2a` | Model subdirectory name |
| `USE_GPU` | `true` | Enable GPU inference |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `API_PREFIX` | `/api/v1` | API route prefix |
## Troubleshooting
### Container Starts but Health Check Fails

```bash
# Check logs
docker compose logs ai_service
# Common issues:
# 1. Models not mounted correctly
docker compose exec ai_service ls -la /models
# 2. Model loading failure (check for all required pickle files)
docker compose exec ai_service ls -la /models/gen2a/
# 3. Service not listening on correct port
docker compose exec ai_service netstat -tlnp   # or `ss -tlnp` if netstat is unavailable
```

### Large Image Size
The GPU image is large (~6GB) due to CUDA dependencies. For CPU-only:
```bash
# Build CPU-only image (smaller)
docker build --build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 -t ai-service:cpu .
# Check size
docker images ai-service
```

### Slow Startup on Fargate
TensorFlow model loading takes 60-90 seconds. Configure:
- Health check `startPeriod: 90` (allows time for model loading)
- Use EFS with provisioned throughput for faster model loading
- Consider model caching or warm containers
### Out of Memory
Fargate minimum requirements:
- CPU: 2 vCPU (2048)
- Memory: 4 GB (4096)
Adjust based on model size and inference batch size.
## Performance Tuning
### CPU Optimization

```dockerfile
# Set TensorFlow thread count
ENV OMP_NUM_THREADS=4
ENV TF_NUM_INTRAOP_THREADS=4
ENV TF_NUM_INTEROP_THREADS=2
```
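The same knobs can be set at runtime instead of being baked into the image, which makes per-host tuning easier. For example:

```bash
# Override thread settings at run time without rebuilding the image
docker run -p 8080:8080 \
  -e OMP_NUM_THREADS=8 \
  -e TF_NUM_INTRAOP_THREADS=8 \
  -e TF_NUM_INTEROP_THREADS=2 \
  ai-service:cpu
```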
### Memory Limits

```yaml
# docker-compose.yml
services:
  ai_service:
    deploy:
      resources:
        limits:
          memory: 4G
```

## Security Considerations
- **Non-root User**: Container runs as `appuser` (UID 1000)
- **Read-only Models**: Models mounted as `:ro` (read-only)
- **No Secrets in Image**: Use environment variables or AWS Secrets Manager
- **Minimal Base Image**: Official TensorFlow images are security-scanned
- **Health Checks**: Ensures the container is responsive
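For reference, the non-root user setup typically looks like the following in a Dockerfile (a sketch consistent with the `appuser`/UID 1000 convention above, not the verbatim file):

```dockerfile
# Create an unprivileged user (UID 1000) and drop root before runtime
RUN useradd --create-home --uid 1000 appuser
USER appuser
```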
## CI/CD Integration
Example GitHub Actions workflow:
```yaml
name: Build and Push AI Service

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build and push
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: ai-service
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \
            --build-arg TENSORFLOW_IMAGE=tensorflow/tensorflow:2.18.0 \
            --platform linux/amd64 \
            services/ai_service/
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
```