Deep Technical Report & Architecture Documentation

Prepared for: Internship Reference
Date: May 2026
Codebase: Amazon Internal, Brazil Workspace Package


Table of Contents

  1. What is EPIC?
  2. High-Level Architecture
  3. System Components (5 Packages)
  4. Data Models & Domain Objects
  5. Database Architecture
  6. REST API Reference
  7. Event Readiness Workflow (Milestones)
  8. Trigger System (Java Lambdas)
  9. Notification & Messaging System
  10. Infrastructure (CDK Stacks)
  11. Frontend Architecture (React)
  12. Traffic & Throttling System
  13. Deployment Pipeline
  14. Key Business Concepts Glossary
  15. Developer Setup Cheatsheet

1. What is EPIC?

EPIC (Everyday Peak In Charge) is an Amazon-internal tool that helps engineering teams plan, manage, and execute capacity scaling for peak traffic events such as Prime Day, Black Friday, and the holiday season, as well as for BAU (Business As Usual) scaling.

Core Problem It Solves

Amazon services need to handle massive traffic spikes during events. Without coordination:

  • Services under-order hardware → crash during peak
  • Services over-order hardware → wasteful costs
  • Upstream/downstream service teams don’t communicate TPM (traffic) needs
  • No single view of readiness across hundreds of services

What EPIC Does

Without EPIC                          With EPIC
─────────────────────────────         ──────────────────────────────────────
❌ Manual spreadsheets                ✅ Central database of all fleets
❌ Email chains for TPM numbers       ✅ Automated gather/communicate TPM
❌ No hardware order tracking         ✅ Milestone tracking with deadlines
❌ Manual throttling updates          ✅ Automated throttle config push
❌ No readiness dashboard             ✅ Leadership dashboards + HOTW
❌ Services forget about descaling    ✅ Descale milestones & automation

Key Events EPIC Manages (Examples from Code)

Event ID         Event Name           Type
───────────────  ───────────────────  ────
PrimeDay21       Prime Day 2021       Peak
NewYearSale2025  New Year Sale 2025   Peak
NewYearSale2026  New Year Sale 2026   Peak
SPRINGSALE26     Spring Sale 2026     Peak
EUSPRINGSALE24   EU Spring Sale 2024  Peak
BAU              Business As Usual    BAU

2. High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         EPIC SYSTEM ARCHITECTURE                            │
└─────────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────┐
  │        SERVICE OWNER / USER     │ (Browser)
  └────────────────┬────────────────┘
                   │  HTTPS
                   ▼
  ┌─────────────────────────────────┐
  │         EPICFrontend            │  React.js + AWS CloudScape UI
  │    Hosted on Amazon Harmony     │  (Beta/Gamma/Prod via CodePipeline)
  │    https://console.harmony.     │
  │    a2z.com/epic/                │
  └────────────────┬────────────────┘
                   │  REST API (IAM Auth via Harmony)
                   ▼
  ┌─────────────────────────────────┐
  │       AWS API Gateway           │  REST API
  │  (EPICApiStack, CDK deployed)   │  ~60+ routes across 12 domains
  └────────────────┬────────────────┘
                   │  Lambda Proxy Integration
                   ▼
  ┌─────────────────────────────────────────────────────────────────────────┐
  │                    EPICBackend: Node.js Lambda Functions                │
  │                                                                         │
  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐ ┌───────────────┐     │
  │  │ Fleet   │ │ Service │ │  Event  │ │EventPlan │ │  Projection   │     │
  │  │ Lambda  │ │ Lambda  │ │ Lambda  │ │  Lambda  │ │    Lambda     │     │
  │  └────┬────┘ └────┬────┘ └────┬────┘ └─────┬────┘ └───────┬───────┘     │
  │       │           │           │            │              │             │
  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐ ┌───────────────┐     │
  │  │Throttle │ │Exception│ │  HOTW   │ │  Ticket  │ │  BulkJobs     │     │
  │  │ Lambda  │ │ Lambda  │ │ Lambda  │ │  Lambda  │ │    Lambda     │     │
  │  └────┬────┘ └────┬────┘ └────┬────┘ └─────┬────┘ └───────────────┘     │
  │       │           │           │            │                            │
  └───────┼───────────┼───────────┼────────────┼────────────────────────────┘
          │           │           │            │
          ▼           ▼           ▼            ▼
  ┌────────────────────────────────────────────────┐
  │              AWS DynamoDB Tables               │
  │  FleetTable  ServiceTable  EventTable          │
  │  EventPlanTable  ProjectionsTable  SchemaTable │
  │  ExceptionTable  ThrottlingTable  HOTWTable    │
  └───────────────┬────────────────────────────────┘
                  │ DynamoDB Streams
                  ▼
  ┌─────────────────────────────────────────────────────────────────────────┐
  │               EPICBackendTriggers: Java Lambda Functions                │
  │                                                                         │
  │  ApolloHandler  AxonHandler  ThrottlingExecutor  BAUScalingHandler      │
  │  FloTriggerHandler  ConsensusHandler  MilestoneWorkflowHandler          │
  │  ScalingPlannerHandler  PmetHandler  VarianceExceededHandler            │
  └──────────┬──────────────┬────────────────┬────────────────┬─────────────┘
             │              │                │                │
             ▼              ▼                ▼                ▼
     ┌──────────────┐ ┌──────────┐ ┌──────────────┐ ┌──────────────┐
     │    Apollo    │ │   Axon   │ │ SDC / Gizmo  │ │   SIM / FLO  │
     │(config push) │ │(traffic) │ │ (throttling) │ │  (ticketing) │
     └──────────────┘ └──────────┘ └──────────────┘ └──────────────┘

  ┌─────────────────────────────────────────────────────────────────────────┐
  │                    MESSAGING & NOTIFICATIONS                            │
  │                                                                         │
  │  SNS (NotificationSNS) ──► SQS (emailQueue) ──► Email Lambda            │
  │  SQS (EventFleetCreation) ──► Fleet trigger Lambda                      │
  │  SQS (EventTicketCreation) ──► Ticket creation Lambda                   │
  └─────────────────────────────────────────────────────────────────────────┘

  ┌────────────────────────────┐
  │  AWS RDS (MySQL/Aurora)    │  Ticketing, SQL analytics, HOTW data
  └────────────────────────────┘

3. System Components (5 Packages)

Package Map

EPIC/ (Brazil Workspace)
├── EPICFrontend/          ← React.js web application
│   ├── src/pages/         ← 25+ page views
│   ├── src/components/    ← Reusable UI components
│   ├── src/client/        ← API Gateway client
│   └── src/store/         ← Redux state management
│
├── EPICBackend/           ← Node.js Lambda business logic
│   ├── src/epiclambda/api/       ← 20+ domain API handlers
│   ├── src/epiclambda/operations/ ← DB operations layer
│   ├── src/epiclambda/common/    ← Constants & utilities
│   ├── src/epiclambda/notification/ ← SNS notifications
│   └── src/epiclambda/sqs/       ← SQS message sending
│
├── EPICBackendCDK/        ← TypeScript CDK infrastructure
│   ├── lib/apiStack.ts    ← DynamoDB tables, SQS, SNS
│   ├── lib/EPICApiStack.ts ← API Gateway routes
│   ├── lib/Fleet/         ← Fleet Lambda stack
│   ├── lib/Event/         ← Event Lambda stack
│   ├── lib/Service/       ← Service Lambda stack
│   ├── lib/HOTW/          ← HOTW Lambda stack
│   └── lib/rdsStack.ts    ← RDS MySQL cluster
│
├── EPICBackendTriggers/   ← Java Lambda event processors
│   └── src/com/amazon/epicbackendtriggers/lambda/
│       ├── handler/       ← 30+ trigger handlers
│       ├── apollo/        ← Apollo config integration
│       ├── throttling/    ← SDC/Gizmo throttle updates
│       ├── bau/           ← BAU scaling automation
│       ├── milestone/     ← Milestone workflows
│       └── scalingplanner/ ← Auto scaling planner
│
└── EPICBackendTriggersIntegrationTests/ ← Integration test suite

3.1 EPICFrontend: React Application

Attribute         Value
────────────────  ──────────────────────────────────────────────
Framework         React.js
UI Library        AWS CloudScape (@amzn/awsui-components-react)
State Management  Redux Toolkit (@reduxjs/toolkit)
Hosting           Amazon Harmony
Node version      v16.0.0
API Auth          Harmony IAM Role (HarmonyAPIGatewayAccessRole)
Env configs       .env.development / .env.production

Pages Overview:

Page File Description
events.jsx Browse all peak events
createEvent.jsx Create new peak/BAU event
fleetConfigurations.jsx View & configure fleet scaling
serviceDetails.jsx Per-service configuration detail
service.jsx List all services
serviceOnboarding.jsx Onboard a new service to EPIC
serviceReadinessDashboard.jsx Readiness status per event
serviceOwnerDashboardDetail.jsx Service owner’s view of milestones
hotwDashboard.jsx Head of the Week run dashboard
hotwRunHistory.jsx Historical HOTW run data
hotwAsgRunDetails.jsx ASG (Auto Scaling Group) details
actionDashboard.jsx All outstanding action items
createException.jsx Submit a capacity exception
approveException.jsx Review & approve exceptions
serviceThrottling.jsx Throttle config per service
descaleFleetConfigurations.jsx Post-event descale config
descaleServiceThrottling.jsx Post-event descale throttling
dimensionView.jsx Traffic dimension metrics view
configureDimension.jsx Configure metric dimensions
bauServiceOwnerDashboard.jsx BAU scaling dashboard
hostMigration.jsx Host migration tracking
syncSettings.jsx Configuration sync settings
upstreamDetails.jsx Upstream service dependencies
serviceDescaleReadinessDashboard.jsx Descale readiness view
onboardingChecklist.jsx Service onboarding checklist

3.2 EPICBackend: Node.js Lambda Handlers

Each API handler is a class with static async methods:

Handler File Responsibility
Fleet.js CRUD for fleet objects, TPM updates, traffic config, approvals
Service.js CRUD for services, upstream/downstream links, notifications
Event.js CRUD for peak/BAU events, leadership dashboards
EventPlan.js Milestone list management per fleet per event
Projection.js Traffic projections for capacity planning
Schema.js Fleet downstream schema (how traffic is counted)
Throttling.js SDC/Gizmo throttle config data management
Exception.js Capacity exception creation, approval, propagation
Ticket.js Upstream↔Downstream coordination tickets (MySQL)
HOTW.js Head of the Week run & execution details
BulkJobs.js Async bulk job processing (bulk PMET upload etc.)
Calendar.js Excluded dates management for events
EventProfile.js Event profiles for configuration templates
Organization.js Org-level grouping of services
Philosophy.js Scaling philosophy rules per service
CustomInputSF.js Custom input scaling factors
CustomFormula.js Custom TPM computation formulas
Employee.js Employee/user lookup for ownership
Dimension.js Traffic dimension configurations
Traffic.js Input/output traffic management
BAUHostJob.js BAU host ordering job management
Pmet.js PMET (Peak Metric) link management
SIM.js SIM (Amazon ticketing) integration
VarianceExceeded.js Variance detection & alerts
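
As a rough illustration of the "class with static async methods" shape described above, here is a minimal sketch of a fleet handler, written in TypeScript for readability. `FleetHandler`, `fleetStore`, `FleetRecord`, and the response shape are assumptions made for this example; the real handlers are backed by DynamoDB, and their actual method names are not shown in this report.

```typescript
// Hypothetical record and response shapes, for illustration only.
interface FleetRecord {
  FleetId: string;
  EventId: string;
  HostThroughputTPM: number;
}

interface ApiResponse {
  statusCode: number;
  body: string;
}

// In-memory stand-in for the operations (DB) layer, keyed by "FleetId#EventId".
const fleetStore = new Map<string, FleetRecord>();

class FleetHandler {
  // Creates a fleet; answers 409 if the same fleet already exists for the event.
  static async createFleet(fleet: FleetRecord): Promise<ApiResponse> {
    const key = `${fleet.FleetId}#${fleet.EventId}`;
    if (fleetStore.has(key)) {
      return { statusCode: 409, body: JSON.stringify({ message: "Fleet already exists" }) };
    }
    fleetStore.set(key, fleet);
    return { statusCode: 201, body: JSON.stringify(fleet) };
  }

  // Looks up one fleet for one event; answers 404 when it is missing.
  static async getFleet(fleetId: string, eventId: string): Promise<ApiResponse> {
    const fleet = fleetStore.get(`${fleetId}#${eventId}`);
    if (!fleet) {
      return { statusCode: 404, body: JSON.stringify({ message: "Fleet not found" }) };
    }
    return { statusCode: 200, body: JSON.stringify(fleet) };
  }
}
```

Because the methods are static, a Lambda entry point can route a request to `FleetHandler.getFleet(...)` without constructing a handler instance.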

3.3 EPICBackendCDK: AWS Infrastructure

Written in TypeScript and deployed with AWS CDK through the Brazil build system.

Build commands:

brazil-build           # Compile TypeScript
brazil-build release   # Build for deployment
brazil-build cdk list  # List available stacks
brazil-build cdk deploy <StackName>

3.4 EPICBackendTriggers: Java Lambda Handlers

These handlers are event-driven. Most are triggered by DynamoDB Streams, SQS messages, or scheduled/CloudWatch events; a few are invoked through the API or by S3 events.

Key Handlers:

Handler                                          Trigger              What it does
───────────────────────────────────────────────  ───────────────────  ────────────────────────────────────────────────────────────
ApolloHandler.java                               Schedule/DDB Stream  Pushes capacity configs to Apollo (Amazon config system)
ApolloTriggerHandler.java                        SQS                  Executes Apollo config push for a fleet
BAUScalingHandler.java                           Schedule             Runs BAU scaling recommendations
ThrottlingExecutor.java                          DDB Stream           Pushes throttle changes to SDC/Gizmo systems
FloTriggerHandler.java                           SQS                  Runs FLO (Fleet Light Operations) one-box scaling
FloExecutionHandler.java                         Schedule             Executes FLO scaling decisions
MilestoneWorkflowHandler.java (WorkflowHandler)  API/SQS              Updates milestone completion statuses
ScalingPlannerHandler.java                       Schedule             Generates scaling plan recommendations
ConsensusHandler.java                            Schedule             Runs consensus algorithm for host counts
AxonHandler.java                                 Schedule/Event       Integrates with Axon traffic management
GatherEmailTriggerHandler.java                   SQS                  Sends TPM gather request emails
PmetHandler.java                                 Schedule             Refreshes PMET (Peak Metric) links
VarianceExceededHandler.java                     CloudWatch           Detects TPM variance and alerts
HotwHandler.java                                 Schedule             Runs HOTW automation (ASG management)
OnboardingHandler.java                           DDB Stream           Processes new service onboarding steps
DescaleHostsHandler.java                         Schedule             Automates post-event descaling
EAPDetailsHandler.java                           Event                Updates EAP (Emergency Adjustment Process) details
TicketServiceReadinessTriggerHandler.java        DDB Stream           Creates tickets for service readiness
TotalPeakProjectionHandler.java                  Schedule             Calculates total peak projection across services
ValidateUserHandler.java                         API                  Validates user permissions
FmbiHandler.java                                 S3                   Processes FMBI (Fleet Management Business Intelligence) data
CapacityInventoryHandler.java                    Schedule             Tracks hardware capacity inventory
UpdateFleetTrafficHandler.java                   DDB Stream           Cascades traffic updates to downstream fleets
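
To make the "DDB Stream" trigger type more concrete, the following sketch scans a DynamoDB Streams batch for throttle-limit changes, which is roughly the kind of input a handler such as ThrottlingExecutor would react to. The real triggers are Java; TypeScript is used here only to keep the examples in one language. The `Records`, `eventName`, `NewImage`, and `OldImage` fields follow the standard DynamoDB Streams event format, while `collectThrottleChanges`, `ThrottleChange`, and the assumption that the stream carries `RecordId` and `CurrentLimit` attributes are hypothetical.

```typescript
// Minimal subset of the DynamoDB Streams event shape.
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: {
    NewImage?: Record<string, AttributeValue>;
    OldImage?: Record<string, AttributeValue>;
  };
}

interface StreamEvent {
  Records: StreamRecord[];
}

// Hypothetical result type: one entry per record whose limit changed.
interface ThrottleChange {
  recordId: string;
  oldLimit: number | null; // null when the record was newly inserted
  newLimit: number;
}

function collectThrottleChanges(event: StreamEvent): ThrottleChange[] {
  const changes: ThrottleChange[] = [];
  for (const record of event.Records) {
    if (record.eventName === "REMOVE") continue; // deletions carry no new limit
    const newImage = record.dynamodb.NewImage;
    if (!newImage) continue;
    const recordId = newImage["RecordId"]?.S;
    const newLimit = newImage["CurrentLimit"]?.N; // numbers arrive as strings
    if (recordId === undefined || newLimit === undefined) continue;
    const oldLimit = record.dynamodb.OldImage?.["CurrentLimit"]?.N;
    if (oldLimit === newLimit) continue; // limit unchanged, nothing to push
    changes.push({
      recordId,
      oldLimit: oldLimit === undefined ? null : Number(oldLimit),
      newLimit: Number(newLimit),
    });
  }
  return changes;
}
```

Filtering on the old and new images keeps the downstream push idempotent in the common case where a record is rewritten without a limit change.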

4. Data Models & Domain Objects

4.1 Service Object

{
  "ServiceId": "FORTRESSService",
  "ServiceIndexId": 42,
  "VersionId": 3,
  "Email": "fortress-dev@amazon.com",
  "Ldap": "fortress-dev",
  "Owner": "johndoe",
  "PointOfContact": "janedoe",
  "OrganizationId": 1,
  "ServiceType": "Registered",
  "Api": [{ "Name": "EvaluateInternalTransaction", "UsedForScaling": true }],
  "CTI": { "Category": "...", "Type": "...", "Item": "..." },
  "Upstreams": ["VCS-NA", "ARMService-NA"],
  "DownStreams": ["OrderService-NA"],
  "Fleet": ["FORTRESSService-X1-NA", "FORTRESSService-X2-EU"],
  "OnboardingStatus": {
    "FinalStatusComplete": false,
    "CustomerChecklist": {
      "ServiceDetailsVerified": false,
      "UpstreamsAudited": false,
      "DownstreamsAudited": false,
      "PermissionsGiven": false,
      "HostThroughputTPMUpdated": false,
      "PMETLinksGiven": false,
      "CloudTuneDriver": null,
      "CustomerChecklistSignOff": false
    }
  },
  "AuditMetadata": { "User": "johndoe", "Timestamp": "06/01/2024 12:00:00", "Message": "..." }
}

4.2 Fleet Object

{
  "FleetId": "FORTRESSService-X1-NA",
  "ServiceId": "FORTRESSService",
  "EventId": "PrimeDay26",
  "FleetIndexId": 123,
  "VersionId": 2,
  "FleetType": "Registered",
  "FleetConfiguration": {
    "ApolloName": "FORTRESSService/NA/X1/Prod",
    "Region": "us-east-1",
    "AzFactor": 1.125,
    "MaxHostCount": 500,
    "HostThroughputTPM": 570,
    "IsFLORunAutomated": true,
    "ApolloNameForFLO": "FORTRESSService/NA/X1/OneBox"
  },
  "InputTraffic": [
    {
      "Type": "Self",
      "ApiName": "EvaluateInternalTransaction",
      "FleetId": "FortressSILService-NA",
      "InputTPM": 100
    },
    {
      "Type": "Upstream",
      "ApiName": "ComputeRiskProfile",
      "FleetId": "VCS-NA",
      "ScalingFactor": 1.2,
      "ScalingProperties": { "BufferFactor": 0.3 }
    },
    {
      "Type": "CloudTune",
      "ApiName": "EvaluateInternalTransaction",
      "FleetId": "ARMService-NA",
      "CloudTuneProjection": { "ProjectionId": "Physical-Order-Rate-NA", "VersionId": 1 }
    }
  ],
  "OutputTraffic": [
    { "Type": "Auto", "ApiName": "ComputeRiskProfile", "FleetId": "VCS-NA", "ScalingFactor": 1.2 }
  ],
  "HostOrderStatuses": {
    "HostOrdersNeeded": 200,
    "HostsPendingDelivery": 50,
    "HostsPendingApproval": 10
  },
  "ScalingStatus": "Completed",
  "BauMetadata": { ... },
  "AuditMetadata": { "User": "johndoe", "Timestamp": "...", "Message": "..." }
}
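
The `FleetConfiguration` fields above suggest how a host count could be derived from a TPM figure. The sketch below is only an illustration under stated assumptions: it assumes required hosts = ceil(total input TPM / `HostThroughputTPM` × `AzFactor`), capped at `MaxHostCount`. This report does not state EPIC's actual formula, and `estimateRequiredHosts` is a hypothetical name.

```typescript
// Subset of the FleetConfiguration fields shown in the Fleet object.
interface FleetConfiguration {
  AzFactor: number;
  MaxHostCount: number;
  HostThroughputTPM: number;
}

interface HostEstimate {
  hosts: number;
  capped: boolean; // true when the raw estimate exceeded MaxHostCount
}

// Assumed formula (not confirmed by the source):
//   hosts = ceil(totalInputTPM / HostThroughputTPM * AzFactor), capped at MaxHostCount.
function estimateRequiredHosts(totalInputTPM: number, config: FleetConfiguration): HostEstimate {
  if (config.HostThroughputTPM <= 0) {
    throw new Error("HostThroughputTPM must be positive");
  }
  const raw = Math.ceil((totalInputTPM / config.HostThroughputTPM) * config.AzFactor);
  const capped = raw > config.MaxHostCount;
  return { hosts: capped ? config.MaxHostCount : raw, capped };
}

// With the sample configuration, 100,000 TPM needs 198 hosts under this assumption.
const sampleEstimate = estimateRequiredHosts(100000, {
  AzFactor: 1.125,
  MaxHostCount: 500,
  HostThroughputTPM: 570,
});
```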

4.3 Event Object

{
  "EventId": "PrimeDay26",
  "EventName": "Prime Day 2026",
  "EventType": "Peak",
  "VersionId": 1,
  "LatestVersionId": 1,
  "RegionList": ["NA", "FE", "EU", "CN"],
  "EventStartDate": "07/14/2026 00:00:00",
  "EventEndDate": "07/15/2026 23:59:59",
  "EventInitialHardwareOrderDate": "02/01/2026 12:00:00",
  "EventHardwareReadinessDate": "06/01/2026 12:00:00",
  "BAUMonth": "06/2026",
  "CloudtunePeakFactor": { "NA": 2.1, "EU": 1.8, "FE": 1.6, "CN": 1.4 },
  "SPCOEventDatesByRegion": {
    "NA": { "SPCOEventStartDate": "07/01/2026 00:00:00", "SPCOEventEndDate": "07/31/2026 00:00:00" }
  },
  "AuditMetadata": { "User": "johndoe", "Timestamp": "...", "Message": "Creating PrimeDay26" }
}
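
The event dates above use a `MM/DD/YYYY HH:mm:ss` string format. A minimal sketch of parsing them and checking that the four key dates fall in chronological order might look like this. `parseEpicDate` and `validateEventTimeline` are hypothetical helpers; treating the timestamps as UTC and requiring order date ≤ readiness date ≤ start ≤ end are assumptions, since the report does not state a time zone or validation rule.

```typescript
// Parses "MM/DD/YYYY HH:mm:ss" into epoch milliseconds, treating the value as UTC (assumption).
// Only the layout is checked; out-of-range fields such as month 13 are not rejected.
function parseEpicDate(value: string): number {
  const match = /^(\d{2})\/(\d{2})\/(\d{4}) (\d{2}):(\d{2}):(\d{2})$/.exec(value);
  if (!match) {
    throw new Error(`Unexpected date format: ${value}`);
  }
  const [month, day, year, hour, minute, second] = match.slice(1).map(Number);
  return Date.UTC(year, month - 1, day, hour, minute, second);
}

// The four date fields from the Event object that this sketch compares.
interface EventDates {
  EventInitialHardwareOrderDate: string;
  EventHardwareReadinessDate: string;
  EventStartDate: string;
  EventEndDate: string;
}

// Returns a list of ordering problems; an empty list means the timeline is consistent.
function validateEventTimeline(event: EventDates): string[] {
  const ordered: Array<[keyof EventDates, number]> = [
    ["EventInitialHardwareOrderDate", parseEpicDate(event.EventInitialHardwareOrderDate)],
    ["EventHardwareReadinessDate", parseEpicDate(event.EventHardwareReadinessDate)],
    ["EventStartDate", parseEpicDate(event.EventStartDate)],
    ["EventEndDate", parseEpicDate(event.EventEndDate)],
  ];
  const problems: string[] = [];
  for (let i = 1; i < ordered.length; i++) {
    const [previousName, previousTime] = ordered[i - 1];
    const [name, time] = ordered[i];
    if (time < previousTime) {
      problems.push(`${name} is earlier than ${previousName}`);
    }
  }
  return problems;
}
```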

4.4 EventPlan (Milestone Tracking)

{
  "EventPlanId": "PrimeDay26#FORTRESSService-X1-NA",
  "EventId": "PrimeDay26",
  "FleetId": "FORTRESSService-X1-NA",
  "ServiceId": "FORTRESSService",
  "VersionId": 3,
  "EventReadinessStatus": false,
  "EventMilestone": [
    {
      "MilestoneId": "GatherProjectionFromUpstream",
      "MilestoneCompletionStatus": "Completed",
      "ETA": "03/01/2026",
      "MilestoneMessage": "Projections gathered"
    },
    {
      "MilestoneId": "HardwareOrder",
      "MilestoneCompletionStatus": "Pending",
      "SubMilestones": [
        { "MilestoneId": "PlaceHardwareOrder", "MilestoneCompletionStatus": "Completed" },
        { "MilestoneId": "HardwareOrderApproval", "MilestoneCompletionStatus": "Pending" }
      ]
    },
    { "MilestoneId": "HardwareFulfillment", "MilestoneCompletionStatus": "NotStarted" },
    { "MilestoneId": "CommunicateTPMToDownstream", "MilestoneCompletionStatus": "NotStarted" },
    { "MilestoneId": "ThrottlingUpdateBeforeEvent", "MilestoneCompletionStatus": "NotStarted" }
  ],
  "EventDescaleMilestone": [
    { "MilestoneId": "DescaleCompletionMilestone" },
    { "MilestoneId": "GatherDescaleProjectionFromUpstream" },
    { "MilestoneId": "CommunicateDescaleTPMToDownstream" },
    { "MilestoneId": "DescaleThrottlingUpdate" }
  ]
}
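
One way to read the EventPlan structure is that `EventReadinessStatus` summarizes the milestone list. The sketch below derives such a flag from the milestones, under the assumption that a milestone counts as done when it is `Completed` or `NotApplicable`, and that a milestone with `SubMilestones` is done only when all of its sub-milestones are. The report does not spell out this rule, and `computeEventReadiness` and `blockingMilestones` are hypothetical names.

```typescript
type MilestoneStatus = "NotStarted" | "Pending" | "Completed" | "NotApplicable" | "NotAvailable";

interface Milestone {
  MilestoneId: string;
  MilestoneCompletionStatus?: MilestoneStatus;
  SubMilestones?: Milestone[];
}

// Assumed rule: sub-milestones, when present, decide the parent's state.
function isMilestoneDone(milestone: Milestone): boolean {
  if (milestone.SubMilestones && milestone.SubMilestones.length > 0) {
    return milestone.SubMilestones.every(isMilestoneDone);
  }
  return (
    milestone.MilestoneCompletionStatus === "Completed" ||
    milestone.MilestoneCompletionStatus === "NotApplicable"
  );
}

// A plan with no milestones is treated as not ready.
function computeEventReadiness(milestones: Milestone[]): boolean {
  return milestones.length > 0 && milestones.every(isMilestoneDone);
}

// IDs of the top-level milestones that still block readiness.
function blockingMilestones(milestones: Milestone[]): string[] {
  return milestones.filter((m) => !isMilestoneDone(m)).map((m) => m.MilestoneId);
}
```

Applied to the sample plan above, this rule reports the plan as not ready, with `HardwareOrder` and the three later milestones still blocking.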

4.5 Throttling Object

{
  "RecordId": "FORTRESSService-X1-NA#PrimeDay26",
  "FleetIndexId": 123,
  "ServiceIndexId": 42,
  "EventIndexId": 7,
  "Region": "us-east-1",
  "EPICUpstream": "VCS-NA",
  "Operation": "EvaluateInternalTransaction",
  "CurrentLimit": 5000,
  "UpscalingLimit": 10000,
  "DescalingLimit": 2000,
  "IsDisabled": false
}
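
The sample objects use `#`-joined composite identifiers in two different orders: `RecordId` is `FleetId#EventId`, while `EventPlanId` is `EventId#FleetId`. The helpers below sketch how such keys could be built and split, and how a target throttle limit might be chosen. All function names are hypothetical, the split assumes the first component contains no `#`, and mapping an "upscale" phase to `UpscalingLimit` and a "descale" phase to `DescalingLimit` is an assumption about what those fields mean.

```typescript
// Key order follows the sample Throttling object: FleetId first.
function buildThrottlingRecordId(fleetId: string, eventId: string): string {
  return `${fleetId}#${eventId}`;
}

// Key order follows the sample EventPlan object: EventId first.
function buildEventPlanId(eventId: string, fleetId: string): string {
  return `${eventId}#${fleetId}`;
}

// Splits on the first "#"; throws when either side would be empty.
function splitCompositeKey(key: string): [string, string] {
  const index = key.indexOf("#");
  if (index <= 0 || index === key.length - 1) {
    throw new Error(`Not a composite key: ${key}`);
  }
  return [key.slice(0, index), key.slice(index + 1)];
}

interface ThrottlingRecord {
  RecordId: string;
  CurrentLimit: number;
  UpscalingLimit: number;
  DescalingLimit: number;
  IsDisabled: boolean;
}

type Phase = "upscale" | "descale";

// Assumed semantics: disabled records are skipped (null), otherwise pick the limit for the phase.
function targetLimit(record: ThrottlingRecord, phase: Phase): number | null {
  if (record.IsDisabled) return null;
  return phase === "upscale" ? record.UpscalingLimit : record.DescalingLimit;
}
```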

5. Database Architecture

DynamoDB Tables

┌────────────────────────────────────────────────────────────────────────────────┐
│                             DYNAMODB TABLE LAYOUT                              │
├─────────────────────────┬─────────────────────────┬────────────────────────────┤
│ Table Name              │ Primary Key             │ Purpose                    │
├─────────────────────────┼─────────────────────────┼────────────────────────────┤
│ FleetTable              │ FleetId + EventId       │ All fleet scaling data     │
│ FleetIndexTable         │ FleetIndexId            │ Auto-increment fleet IDs   │
│ FleetLockTable          │ FleetId                 │ Optimistic locking         │
│ ServiceTable            │ ServiceId               │ Service configurations     │
│ ServiceIndexTable       │ ServiceIndexId          │ Auto-increment service IDs │
│ EventTable              │ EventId + VersionId     │ Peak event metadata        │
│ EventIndexTable         │ EventIndexId            │ Auto-increment event IDs   │
│ EventPlanTable          │ EventPlanId + VersionId │ Milestone tracking         │
│ ProjectionsTable        │ ProjectionId            │ Traffic projections        │
│ SchemaTable             │ FleetId                 │ Fleet downstream schemas   │
│ EventProfileTable       │ EventProfileId          │ Event profile templates    │
│ ExceptionTable          │ ExceptionId             │ Capacity exceptions        │
│ JobDetailsTable         │ JobId                   │ Async job status           │
│ ThrottlingTable         │ RecordId                │ Throttle data per fleet    │
│ ThrottlingConfigTable   │ ConfigId                │ Throttle config templates  │
│ BAUServiceDashboard     │ ServiceId               │ BAU scaling dashboard      │
└─────────────────────────┴─────────────────────────┴────────────────────────────┘

DynamoDB Key Design Pattern (Versioning)

EPIC uses a dual-table versioning pattern for most entities:

Main Table (FleetTable):        Stores ALL versions
  PK: FleetId + VersionId

Latest Version Index (GSI):     Quick lookup for current data
  GSI: FleetId-LatestVersionId-index

This allows:

  • Complete history of every change
  • Fast latest-version reads
  • Audit trail with AuditMetadata on every record
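
The versioning pattern above can be sketched with two in-memory maps: one holding every version (standing in for the main table) and one holding the latest version number per entity (standing in for the latest-version index). `VersionedStore` and its method names are hypothetical, and the sketch ignores concurrency, which a real DynamoDB implementation would have to handle, for example with conditional writes.

```typescript
interface AuditMetadata {
  User: string;
  Timestamp: string;
  Message: string;
}

interface VersionedRecord<T> {
  Id: string;
  VersionId: number;
  Data: T;
  AuditMetadata: AuditMetadata;
}

class VersionedStore<T> {
  // Every version ever written, keyed by "Id#VersionId" (stand-in for the main table).
  private readonly allVersions = new Map<string, VersionedRecord<T>>();
  // Latest version number per Id (stand-in for the latest-version index).
  private readonly latestVersion = new Map<string, number>();

  // Writes never overwrite: each put appends a new version.
  put(id: string, data: T, audit: AuditMetadata): VersionedRecord<T> {
    const versionId = (this.latestVersion.get(id) ?? 0) + 1;
    const record: VersionedRecord<T> = { Id: id, VersionId: versionId, Data: data, AuditMetadata: audit };
    this.allVersions.set(`${id}#${versionId}`, record);
    this.latestVersion.set(id, versionId);
    return record;
  }

  getVersion(id: string, versionId: number): VersionedRecord<T> | undefined {
    return this.allVersions.get(`${id}#${versionId}`);
  }

  // Fast path for current data: one index lookup, then one keyed read.
  getLatest(id: string): VersionedRecord<T> | undefined {
    const versionId = this.latestVersion.get(id);
    return versionId === undefined ? undefined : this.getVersion(id, versionId);
  }

  // Full change history, oldest first.
  history(id: string): VersionedRecord<T>[] {
    const latest = this.latestVersion.get(id) ?? 0;
    const records: VersionedRecord<T>[] = [];
    for (let v = 1; v <= latest; v++) {
      const record = this.getVersion(id, v);
      if (record) records.push(record);
    }
    return records;
  }
}
```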

SQS Queues

Queue                            Purpose                           DLQ
───────────────────────────────  ────────────────────────────────  ─────────────────────────────
emailQueue                       Email notifications via SES       emailDLQ
EventFleetCreationQueue          Async fleet creation on event     EventFleetCreationDLQ
EventTicketCreationQueue         Auto-create coordination tickets  EventTicketCreationDLQ
DetectFleetInconsistenciesQueue  Background fleet validation       DetectFleetInconsistenciesDLQ
PmetLinksRfrshQueue              Refresh PMET links periodically   PmetLinksRfrshDLQ
CustomFormulaRefreshQueue        Refresh formula calculations      CustomFormulaRefreshDLQ

All DLQs retain messages for 14 days.

RDS (MySQL)

Used for:

  • Ticketing data (relational upstream↔downstream tickets)
  • SQL analytics (ExecuteSQL endpoint)
  • HOTW execution data (ASG run history)
  • Host ordering details (OrderDetails)

6. REST API Reference

Base URL: https://<api-gateway-id>.execute-api.<region>.amazonaws.com/Prod

Fleet APIs

Method Path Description
POST /fleet Create a new fleet
GET /fleet/{FleetId}/{EventId} Get fleet data
PUT /fleet/{FleetId}/{EventId} Update fleet
GET /fleet/{FleetId}/{EventId}/configuration Get fleet config
PUT /fleet/{FleetId}/{EventId}/configuration/host_throughput Update HostTPM
PUT /fleet/{FleetId}/{EventId}/configuration/apollo_properties Update Apollo config
PUT /fleet/{FleetId}/{EventId}/configuration/AZ_Factor Update AZ Factor
PUT /fleet/{FleetId}/{EventId}/configuration/custom_thresholds Update thresholds
PUT /fleet/{FleetId}/{EventId}/configuration/region Update region
GET /fleet/{FleetId}/{EventId}/Traffic Get traffic data
PUT /fleet/{FleetId}/{EventId}/Traffic Update traffic
PUT /fleet/{FleetId}/{EventId}/Traffic/disable Disable fleet traffic
PUT /fleet/{FleetId}/{EventId}/updateFleetTrigger Trigger scaling
PUT /fleet/{FleetId}/{EventId}/overrideTotalInputTPM Override input TPM
PUT /fleet/{FleetId}/{EventId}/updateOutputTpm Update output TPM
PUT /fleet/{FleetId}/{EventId}/updateBAUTPM Update BAU TPM
PUT /fleet/{FleetId}/{EventId}/updateDescaleTPM Update descale TPM
PUT /fleet/{FleetId}/{EventId}/hostOrderStatuses Update host orders
PUT /fleet/{FleetId}/{EventId}/approval Submit approval
GET /fleet/{FleetId}/{EventId}/fleetVersionList Get version history
GET /fleet/{FleetId}/{EventId}/inputTrafficSnapshot Input traffic snapshot
GET /fleet/{FleetId}/Version/{VersionId} Get specific version
GET /fleet/batch Get batch of fleet IDs
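
Most fleet routes above share the `/fleet/{FleetId}/{EventId}` prefix, so a client can build them from a small helper. The sketch below only assembles the method and URL; it does not send a request or handle the IAM authentication mentioned earlier. `fleetPath`, `buildRequest`, and `RequestSpec` are hypothetical names, and the base URL in the usage line is a placeholder.

```typescript
type HttpMethod = "GET" | "POST" | "PUT";

interface RequestSpec {
  method: HttpMethod;
  url: string;
}

// Builds "/fleet/{FleetId}/{EventId}[/suffix...]", URL-encoding each path segment.
function fleetPath(fleetId: string, eventId: string, ...suffix: string[]): string {
  const segments = ["fleet", fleetId, eventId, ...suffix];
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

// Joins the base URL and path, tolerating a trailing slash on the base URL.
function buildRequest(baseUrl: string, method: HttpMethod, path: string): RequestSpec {
  return { method, url: baseUrl.replace(/\/+$/, "") + path };
}

// Example: the "Update HostTPM" route for the sample fleet.
const updateHostTpmRequest = buildRequest(
  "https://example.invalid/Prod", // placeholder, not a real endpoint
  "PUT",
  fleetPath("FORTRESSService-X1-NA", "PrimeDay26", "configuration", "host_throughput"),
);
```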

Service APIs

Method Path Description
POST /service Create service
GET /service List all services
GET /service/{ServiceId} Get service
PUT /service/{ServiceId} Update service
GET /service/{ServiceId}/upstreams/{EventId} Get upstream services
GET /service/{ServiceId}/downstreams/{EventId} Get downstream readiness
PUT /service/{ServiceId}/upstreams/{EventId} Send gather TPM email
PUT /service/{ServiceId}/throttling/config Update SDC throttle config
PUT /service/{ServiceId}/throttling/gizmo Update Gizmo config
PUT /service/{ServiceId}/throttling/fleetStatus Update fleet throttle status
GET /service/{ServiceId}/throttling/{EventId} Check throttle readiness
PUT /service/{ServiceId}/onboarding Update onboarding status
GET /service/{ServiceId}/preference Get service preferences
PUT /service/{ServiceId}/preference Update service preferences
GET /service/dashboard BAU service dashboard

Event APIs

Method Path Description
GET /event List all events
POST /event Create event
GET /event/{EventId} Get event
PUT /event/{EventId} Update event
GET /event/{EventId}/fleets Get all fleets for event
PUT /event/{EventId}/dashboard Leadership dashboard data
PUT /event/{EventId}/dashboard/descaling Descale dashboard data
GET /event/{EventId}/automatedMetricPercentage Metric automation %

EventPlan (Milestones) APIs

Method Path Description
POST /eventPlan Create event plan
GET /eventPlan/{EventId}/{FleetId} Get event plan
PUT /eventPlan/{EventId}/{FleetId}/eventMilestoneList Bulk update milestones
PUT /eventPlan/{EventId}/{FleetId}/eventMilestoneDetail Update milestone detail
PUT /eventPlan/{EventId}/{FleetId}/milestoneStatusUpdate Update milestone status
GET /eventPlan/{EventId}/{FleetId}/version/{VersionId} Get versioned plan

Other Domain APIs

Domain      Routes include
──────────  ──────────────────────────────────────────────────────────────────────
Projection  GET/POST /projection, GET/PUT /projection/{ProjectionId}
Schema      PUT /fleet/{FleetId}/{EventId}/schema/downstream, GET /fleet/{FleetId}/schema
Exception   POST/PUT /exception, GET /exception/{ExceptionId}
Ticket      POST /ticket, GET/PUT /ticket/...
HOTW        POST/PUT /hotw/run, POST /hotw/execution, POST/GET /hotw/dashboard
Calendar    GET/PUT /calendar
BulkJobs    POST/PUT/GET /jobs, GET /jobs/{JobId}

7. Event Readiness Workflow (Milestones)

This is the core operational workflow that EPIC manages for each fleet per peak event.

╔════════════════════════════════════════════════════════════════════════╗
β•‘              PEAK EVENT READINESS WORKFLOW (Per Fleet)                β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

  EVENT CREATED
       β”‚
       β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ MILESTONE 1: Gather Projection From      β”‚
  β”‚             Upstream                     β”‚
  β”‚                                          β”‚
  β”‚  β€’ Send email to all upstream services   β”‚
  β”‚  β€’ Upstreams provide expected TPM        β”‚
  β”‚  β€’ EPIC auto-calculates required hosts   β”‚
  β”‚  Status: NotStarted β†’ Pending β†’ Completedβ”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
                    β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ MILESTONE 2: Hardware Order              β”‚
  │  ┌─────────────────────────────────┐     │
  │  │ Sub-milestone 2a:               │     │
  │  │   Place Hardware Order          │     │
  │  │   (SPCO override submitted)     │     │
  │  └──────────────┬──────────────────┘     │
  │                 │                        │
  │  ┌──────────────▼──────────────────┐     │
  │  │ Sub-milestone 2b:               │     │
  │  │   Hardware Order Approval       │     │
  │  │   (Business/Regional/Financial) │     │
  │  └─────────────────────────────────┘     │
  └─────────────────┬────────────────────────┘
                    │
                    ▼
  ┌──────────────────────────────────────────┐
  │ MILESTONE 3: Hardware Fulfillment        │
  │                                          │
  │  • Hardware physically delivered         │
  │  • Hosts come online in datacenter       │
  │  • Fleet host count verified             │
  └─────────────────┬────────────────────────┘
                    │
                    ▼
  ┌──────────────────────────────────────────┐
  │ MILESTONE 4: Communicate TPM             │
  │             To Downstream                │
  │                                          │
  │  • Send peak TPM numbers to downstream   │
  │  • Downstream updates their scaling too  │
  │  • Tickets created for coordination      │
  └─────────────────┬────────────────────────┘
                    │
                    ▼
  ┌──────────────────────────────────────────┐
  │ MILESTONE 5: Throttling Update           │
  │             Before Event                 │
  │                                          │
  │  • SDC/Gizmo throttle limits pushed      │
  │  • Limits set to peak capacity           │
  │  • Throttling marked ready               │
  └─────────────────┬────────────────────────┘
                    │
                    ▼
         ✅ EVENT READINESS STATUS = TRUE
                    │
          ══════════════════════
             PEAK EVENT RUNS
          ══════════════════════
                    │
                    ▼
  ┌──────────────────────────────────────────┐
  │ DESCALE MILESTONE 1: Descale Completion  │
  │ DESCALE MILESTONE 2: Gather Descale TPM  │
  │ DESCALE MILESTONE 3: Communicate         │
  │              Descale TPM To Downstream   │
  │ DESCALE MILESTONE 4: Descale Throttling  │
  └──────────────────────────────────────────┘

Milestone Statuses: NotStarted → Pending → Completed
                    (Also: NotApplicable, NotAvailable)
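
The way milestone statuses roll up into the event readiness flag can be sketched as below. This is a minimal illustration, not EPIC's actual rule: the `Milestone` shape and both helper functions are hypothetical, and it assumes that a milestone stops blocking readiness once it is Completed or NotApplicable.

```typescript
// Statuses as listed in the milestone legend above.
type MilestoneStatus =
  | "NotStarted"
  | "Pending"
  | "Completed"
  | "NotApplicable"
  | "NotAvailable";

// Hypothetical shape: one entry per scale-up milestone in a fleet's plan.
interface Milestone {
  name: string;
  status: MilestoneStatus;
}

// Assumption: Completed and NotApplicable no longer block readiness;
// every other status still does.
function isSatisfied(m: Milestone): boolean {
  return m.status === "Completed" || m.status === "NotApplicable";
}

// True only when the plan is non-empty and every milestone is satisfied.
function isEventReady(milestones: Milestone[]): boolean {
  return milestones.length > 0 && milestones.every(isSatisfied);
}

// Names of the milestones that still block readiness, in plan order.
function blockingMilestones(milestones: Milestone[]): string[] {
  return milestones
    .filter(function (m) { return !isSatisfied(m); })
    .map(function (m) { return m.name; });
}

const plan: Milestone[] = [
  { name: "Hardware Fulfillment", status: "Completed" },
  { name: "Communicate TPM To Downstream", status: "Pending" },
  { name: "Throttling Update Before Event", status: "NotStarted" },
];

console.log(isEventReady(plan)); // false
console.log(blockingMilestones(plan)); // the two unfinished milestones
```

Listing the blocking milestones alongside the boolean keeps a dashboard able to show why a fleet is not ready, not only that it is not.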

8. Trigger System (Java Lambdas)

How Triggers Work

  DynamoDB FleetTable
       │ (Stream)
       ▼
  ┌─────────────────────────────────┐
  │   DynamoDB Stream Processor     │
  │   (EPICBackendTriggers)         │
  │                                 │
  │   INSERT/MODIFY/REMOVE event    │
  │   ──► Route to correct handler  │
  └────────────┬────────────────────┘
               │
       ┌───────┼────────────────┬─────────────────┐
       ▼       ▼                ▼                 ▼
  ┌─────────┐ ┌─────────────┐ ┌───────────────┐ ┌──────────────┐
  │ Apollo  │ │ Throttling  │ │  Milestone    │ │ Traffic      │
  │ Handler │ │ Executor    │ │  Workflow     │ │ Update       │
  │         │ │             │ │  Handler      │ │ Handler      │
  │ Pushes  │ │ SDC + Gizmo │ │               │ │              │
  │ config  │ │ throttle    │ │ Auto-complete │ │ Cascades TPM │
  │ to      │ │ limit       │ │ milestones    │ │ changes to   │
  │ Apollo  │ │ update      │ │               │ │ downstream   │
  └─────────┘ └─────────────┘ └───────────────┘ └──────────────┘
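
The routing step in the diagram above can be sketched as a lookup from changed attributes to handlers. The real triggers are Java Lambdas; this TypeScript sketch only illustrates the idea. The record shape, the attribute names other than `InputTraffic`, the routing table, and the choice to skip REMOVE events are all assumptions made for the example.

```typescript
// The three DynamoDB Streams event names shown in the diagram.
type StreamEventName = "INSERT" | "MODIFY" | "REMOVE";

// Simplified, hypothetical record: real stream records carry old and new
// item images; here only the names of the changed attributes are kept.
interface FleetStreamRecord {
  eventName: StreamEventName;
  fleetId: string;
  changedAttributes: string[];
}

type HandlerName =
  | "ApolloHandler"
  | "ThrottlingExecutor"
  | "MilestoneWorkflowHandler"
  | "TrafficUpdateHandler";

// Hypothetical routing table: which attribute change wakes which handler.
const ROUTES: Array<{ attribute: string; handler: HandlerName }> = [
  { attribute: "CapacityConfig", handler: "ApolloHandler" },
  { attribute: "ThrottleLimits", handler: "ThrottlingExecutor" },
  { attribute: "Milestones", handler: "MilestoneWorkflowHandler" },
  { attribute: "InputTraffic", handler: "TrafficUpdateHandler" },
];

// Returns the handlers to invoke for one record, in routing-table order.
function route(record: FleetStreamRecord): HandlerName[] {
  // Assumption: a removed fleet has nothing left to push downstream.
  if (record.eventName === "REMOVE") {
    return [];
  }
  const handlers: HandlerName[] = [];
  for (const entry of ROUTES) {
    if (record.changedAttributes.indexOf(entry.attribute) !== -1) {
      handlers.push(entry.handler);
    }
  }
  return handlers;
}

console.log(
  route({
    eventName: "MODIFY",
    fleetId: "FleetA",
    changedAttributes: ["InputTraffic", "ThrottleLimits"],
  })
); // [ 'ThrottlingExecutor', 'TrafficUpdateHandler' ]
```

Keeping the mapping in a table rather than in branching code means a new handler is one added row, which suits a system with many independent handlers.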

Amazon Internal Systems Integrated

System     What it is                                         EPIC’s Integration
─────────  ─────────────────────────────────────────────────  ──────────────────────────────────────────────
Apollo     Amazon’s internal configuration deployment system  Pushes fleet capacity configs (SPCO overrides)
Axon       Amazon’s traffic management system                 Reads/writes traffic shaping rules
SDC        Service Dependency Control (throttling)            Updates max TPM throttle limits
Gizmo      Another throttling framework                       Alternative throttle config push
FLO        Fleet Light Operations (one-box scaling)           Automated one-box scale tests
SIM        Amazon internal ticketing system                   Creates SIM tickets for fleet actions
CloudTune  Amazon’s ML-based capacity recommendation          Source of scaling factor projections
Conduit    Amazon’s credential management                     AWS credential provisioning
Harmony    Amazon’s frontend app hosting                      Hosts the EPIC web UI
Brazil     Amazon’s build/package management                  Used to build and deploy all packages
PMET       Peak Metric tracking system                        Links to metrics for each fleet
HOTW       Head of the Week                                   Weekly operational automation
FMBI       Fleet Management Business Intelligence             Fleet analytics data source
Superstar  Amazon’s CDK pipeline framework                    Used for deploying CDK stacks

9. Notification & Messaging System

  Any Lambda (Fleet update, Service create, etc.)
       │
       │  publish(TopicArn: SNS_TOPIC_ARN)
       ▼
  ┌──────────────────────┐
  │    SNS Topic         │
  │  (notificationSNS)   │
  └──────────┬───────────┘
             │  SqsSubscription
             ▼
  ┌──────────────────────┐       ┌──────────────────────┐
  │   emailQueue (SQS)   │──────▶│   Email Lambda       │──▶ Amazon SES → Email
  └──────────────────────┘       └──────────────────────┘
             │
             │ maxReceiveCount: 2
             ▼
  ┌──────────────────────┐
  │      emailDLQ        │ (14-day retention)
  └──────────────────────┘

  Message Types (MessageAttributes → NotificationName):
  ┌──────────────────────────────────────────────────────┐
  │ • UPDATE         - Fleet/Service object updated      │
  │ • CREATE_SERVICE - New service created               │
  │ • GATHER_EMAIL   - Request TPM from upstream         │
  │ • PEAK_READINESS - Readiness status change           │
  └──────────────────────────────────────────────────────┘
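
A notification published by any of these Lambdas might look like the sketch below. The request mirrors the field names of the SNS Publish API (`TopicArn`, `Message`, `MessageAttributes`), but no AWS SDK is imported, and the helper names, the payload contents, and the example topic ARN are hypothetical. The second helper illustrates the `maxReceiveCount: 2` redrive rule: SQS moves a message to the dead-letter queue once its receive count exceeds that value.

```typescript
// Notification names listed in the message-type box above.
type NotificationName =
  | "UPDATE"
  | "CREATE_SERVICE"
  | "GATHER_EMAIL"
  | "PEAK_READINESS";

// Mirrors the SNS Publish request fields used here; declared locally so
// the sketch has no SDK dependency.
interface PublishRequest {
  TopicArn: string;
  Message: string;
  MessageAttributes: {
    NotificationName: { DataType: "String"; StringValue: NotificationName };
  };
}

// Hypothetical helper: wraps a payload and tags it with its notification
// name so the email Lambda can pick a template from the attribute alone.
function buildNotification(
  topicArn: string,
  name: NotificationName,
  payload: object
): PublishRequest {
  return {
    TopicArn: topicArn,
    Message: JSON.stringify(payload),
    MessageAttributes: {
      NotificationName: { DataType: "String", StringValue: name },
    },
  };
}

// Redrive rule: the message goes to the DLQ when its receive count
// exceeds maxReceiveCount (2 for emailQueue in the diagram).
function movesToDlq(receiveCount: number, maxReceiveCount: number = 2): boolean {
  return receiveCount > maxReceiveCount;
}

const request = buildNotification(
  "arn:aws:sns:us-west-2:123456789012:notificationSNS", // placeholder ARN
  "GATHER_EMAIL",
  { fleetId: "FleetA" }
);
console.log(request.MessageAttributes.NotificationName.StringValue); // GATHER_EMAIL
console.log(movesToDlq(2), movesToDlq(3)); // false true
```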

10. Infrastructure (CDK Stacks)

Stack Dependency Tree

SuperStarPersonalBootstrap
└── VpcStack (Virtual Network)
    ├── ApiStack (DynamoDB + SQS + SNS + SecurityGroup)
    ├── FleetLambdaStack
    ├── ServiceLambdaStack
    ├── EventLambdaStack
    ├── EventPlanLambdaStack
    ├── ProjectionsLambdaStack
    ├── SchemaLambdaStack
    ├── EventProfileLambdaStack
    ├── ExceptionLambdaStack ◄─ uses ExceptionTable from ApiStack
    ├── BulkJobsLambdaStack
    ├── TicketLambdaStack
    ├── CalendarLambdaStack
    ├── HOTWLambdaStack
    ├── ThrottlingLambdaStack
    ├── CommonStack
    ├── OrganizationStack
    ├── PhilosophyStack
    ├── CustomInputSFStack
    ├── CustomFormulaStack
    ├── ExceptionStack
    ├── RdsStack (MySQL + Lambda integrations)
    ├── EPICApiStack (API Gateway: ALL routes)
    └── TriggersStack (CloudWatch / SNS triggers)

Optional:
    ├── SIMStack
    ├── MilestoneWorkflowStack
    ├── AxonStack
    ├── ApprovalStack
    ├── ScalingPlannerStack
    ├── PmetStack
    └── TicketingStack

Deployment Stages (CI/CD)

  Commit to mainline
       │
       ▼
  ┌─────────────────────────────────────────────────┐
  │           AWS CodePipeline (EPIC-Prod account)  │
  │                  us-west-2 region               │
  └──────────┬──────────────────────────────────────┘
             │
     ┌───────▼────────┐
     │   Beta Stage   │ ◄── First deployment, automated tests
     └───────┬────────┘
             │
     ┌───────▼────────┐
     │  Gamma Stage   │ ◄── Staging environment
     └───────┬────────┘
             │
     ┌───────▼────────┐
     │   Prod Stage   │ ◄── Live at console.harmony.a2z.com/epic/
     └────────────────┘

11. Frontend Architecture (React)

src/
├── Epic.js                    ← Root component, routing setup
├── index.js                   ← Entry point, Redux store setup
├── store/                     ← Redux Toolkit state slices
├── client/
│   └── getApigClient.js       ← API Gateway client factory
│                                 (uses aws-api-gateway-client)
├── pages/                     ← 25+ page components (see table above)
├── components/
│   ├── cards/                 ← Card layout components
│   ├── tables/                ← Data table components
│   ├── Tabs/                  ← Tab navigation
│   ├── Headers/               ← Page header components
│   ├── Flashbar/              ← Alert/notification bar
│   ├── Container/             ← Layout containers
│   ├── crumbs/                ← Breadcrumb navigation
│   ├── VersionHistory/        ← Version history viewer
│   ├── sideNav.jsx            ← Left navigation sidebar
│   ├── topNavigation.jsx      ← Top navigation bar
│   └── withEventTypeRoutes.jsx ← HOC for event routing
├── common/                    ← Shared utilities
├── configuration/             ← App configuration
├── featureutils/              ← Feature flag utilities
├── styles/                    ← Global CSS
└── tutorials/                 ← Onboarding tutorials

Key Libraries:
  @amzn/awsui-components-react  ← AWS Cloudscape Design System
  @reduxjs/toolkit              ← State management
  aws-api-gateway-client        ← API calls to backend
  moment                        ← Date formatting
  lodash                        ← Utility functions
  csv-string / json2csv         ← Export to CSV
  react-scripts                 ← Build tooling (CRA)

12. Traffic & Throttling System

TPM Flow: How Traffic Numbers Flow

  CloudTune (ML predictions)
       │  CloudTune Peak Factor
       ▼
  ┌────────────────────────────────────────────────────┐
  │              EPIC Projection                       │
  │   ProjectionId: "Physical-Order-Rate-NA"           │
  │   BAU TPM × CloudtunePeakFactor → Peak TPM         │
  └──────────────────────┬─────────────────────────────┘
                         │
                         ▼
  ┌────────────────────────────────────────────────────┐
  │              Fleet InputTraffic                    │
  │                                                    │
  │  Type: "Self"      → Direct traffic measurement    │
  │  Type: "Upstream"  → Driven by upstream fleet TPM  │
  │  Type: "CloudTune" → ML model driven traffic       │
  │                                                    │
  │  Total InputTPM = Σ(InputTraffic sources)          │
  │  Required Hosts = InputTPM / HostThroughputTPM     │
  │                 × AzFactor  (AZ redundancy)        │
  └──────────────────────┬─────────────────────────────┘
                         │
                         ▼
  ┌────────────────────────────────────────────────────┐
  │              Throttling Update                     │
  │                                                    │
  │  SDC Throttle Config:                              │
  │    CurrentLimit: BAU TPM                           │
  │    UpscalingLimit: Peak TPM                        │
  │    DescalingLimit: Post-peak TPM                   │
  │                                                    │
  │  Gizmo Throttle Config:                            │
  │    Alternative throttle system with revisions      │
  └────────────────────────────────────────────────────┘
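
The three SDC limits in the last box can be read as one limit per phase of an event. The sketch below shows that reading under stated assumptions: the field names come from the diagram, while the `Phase` type, the lookup table, and the sample numbers are hypothetical.

```typescript
// Field names taken from the SDC Throttle Config box above.
interface SdcThrottleConfig {
  CurrentLimit: number;   // BAU TPM
  UpscalingLimit: number; // Peak TPM
  DescalingLimit: number; // Post-peak TPM
}

// Hypothetical phase names for the life cycle of one event.
type Phase = "BAU" | "Peak" | "PostPeak";

// Which config field applies in which phase.
const FIELD_FOR_PHASE: Record<Phase, keyof SdcThrottleConfig> = {
  BAU: "CurrentLimit",
  Peak: "UpscalingLimit",
  PostPeak: "DescalingLimit",
};

// Returns the TPM limit that should be active for the given phase.
function limitForPhase(config: SdcThrottleConfig, phase: Phase): number {
  return config[FIELD_FOR_PHASE[phase]];
}

// Illustrative numbers only.
const config: SdcThrottleConfig = {
  CurrentLimit: 100000,
  UpscalingLimit: 200000,
  DescalingLimit: 120000,
};

console.log(limitForPhase(config, "Peak")); // 200000
```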

  AZ Factor Values (by region):
  ┌─────────────────────────────┐
  │ us-east-1 (NA)  → 1.125     │ (need 12.5% extra for AZ redundancy)
  │ eu-west-1 (EU)  → 1.35      │
  │ eu-south-2      → 1.35      │
  │ us-west-2 (FE)  → 1.35      │
  │ cn-north-1 (CN) → 2.0       │
  │ eu-central-1    → 1.35      │
  └─────────────────────────────┘
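
The formulas in the TPM flow and the AZ factors above combine into a short worked calculation. This is a sketch under assumptions: the function names are hypothetical, the BAU TPM and peak factor are illustrative, the 570 TPM/host figure is the example value from the glossary, and rounding the host count up to a whole host is an assumption rather than a documented rule.

```typescript
// AZ factors copied from the table above.
const AZ_FACTOR = {
  "us-east-1": 1.125,
  "eu-west-1": 1.35,
  "eu-south-2": 1.35,
  "us-west-2": 1.35,
  "cn-north-1": 2.0,
  "eu-central-1": 1.35,
};

// One InputTraffic source of a fleet; types as listed in the diagram.
interface InputTraffic {
  type: "Self" | "Upstream" | "CloudTune";
  tpm: number;
}

// Peak TPM = BAU TPM x CloudtunePeakFactor.
function peakTpm(bauTpm: number, cloudtunePeakFactor: number): number {
  return bauTpm * cloudtunePeakFactor;
}

// Total InputTPM = sum over all InputTraffic sources.
function totalInputTpm(sources: InputTraffic[]): number {
  return sources.reduce(function (sum, s) { return sum + s.tpm; }, 0);
}

// Required Hosts = InputTPM / HostThroughputTPM x AzFactor,
// rounded up because a fraction of a host cannot be ordered (assumption).
function requiredHosts(
  inputTpm: number,
  hostThroughputTpm: number,
  azFactor: number
): number {
  if (hostThroughputTpm <= 0) {
    throw new Error("hostThroughputTpm must be positive");
  }
  return Math.ceil((inputTpm / hostThroughputTpm) * azFactor);
}

// Illustrative inputs: 100,000 BAU TPM and a peak factor of 2.
const upstreamPeak = peakTpm(25000, 2);            // 50,000 TPM from upstream
const total = totalInputTpm([
  { type: "Self", tpm: peakTpm(75000, 2) },        // 150,000 TPM direct
  { type: "Upstream", tpm: upstreamPeak },
]);

console.log(total);                                              // 200000
console.log(requiredHosts(total, 570, AZ_FACTOR["us-east-1"]));  // 395
console.log(requiredHosts(total, 570, AZ_FACTOR["cn-north-1"])); // 702
```

The same traffic needs 395 hosts in us-east-1 but 702 in cn-north-1, which shows how strongly the AZ factor drives the hardware order.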

13. Deployment Pipeline

Frontend Deployment (EPICFrontend)

# Local Development
npm install
npm start              # → localhost:3000

# Testing
npm run build_test     # build + test combined
npm test               # Jest tests only

# Deploy to Harmony
git push origin mainline   # assumes the remote is named origin
# → CodePipeline auto-deploys: Beta → Gamma → Prod

Backend Deployment (EPICBackend + CDK)

# Step 1: Build backend packages (run from the directory that contains them)
(cd EPICBackend && brazil-build release)
(cd EPICBackendTriggers && brazil-build release)
cd EPICBackendCDK && brazil-build release

# Step 2: Bootstrap CDK (from EPICBackendCDK)
brazil-build bootstrap

# Step 3: Deploy stacks (VpcStack first, since the other stacks sit under it)
brazil-build cdk deploy Personal-VpcStack          # VPC networking
brazil-build cdk deploy Personal-ApiStack          # DynamoDB, SQS, SNS
brazil-build cdk deploy Personal-FleetLambdaStack  # Fleet Lambdas
brazil-build cdk deploy Personal-EPICApiStack      # API Gateway + all routes

# Quick Lambda code update (no full CloudFormation deploy)
bb cdk deploy --hotswap Personal-FleetLambdaStack

14. Key Business Concepts Glossary

Term               Definition
─────────────────  ──────────────────────────────────────────────────────────────────────────────
Fleet              A group of Amazon servers (hosts) running a specific service in a region (e.g., FORTRESSService-X1-NA)
TPM                Transactions Per Minute: the traffic volume metric used for all scaling decisions
HostThroughputTPM  How many TPM a single host can handle (e.g., 570 TPM/host)
Peak Event         A scheduled high-traffic period requiring extra capacity (Prime Day, Black Friday, etc.)
BAU                Business As Usual: normal (non-peak) operations, also managed in EPIC
EventPlan          The per-fleet milestone tracking plan for a peak event
Milestone          A specific readiness gate each fleet must pass before peak (hardware order, throttling, etc.)
AZ Factor          Availability Zone redundancy multiplier: how much extra capacity to add for multi-AZ redundancy
SPCO Override      Service Provider Capacity Override: a request to AWS to provision extra hardware
Throttling         Rate-limiting traffic to protect a service from overload
SDC                Service Dependency Control: Amazon’s internal throttling system
Gizmo              Another Amazon throttling framework
Apollo             Amazon’s internal configuration deployment system
CloudTune          Amazon’s ML-based capacity recommendation system
HOTW               Head of the Week: weekly operational run that automates scaling decisions
FLO                Fleet Light Operations: one-box (single host) automated scaling tests
Axon               Amazon’s traffic management/routing system
SIM                Amazon’s internal ticketing system (like Jira)
Upstream           A service that sends traffic TO the current service
Downstream         A service that receives traffic FROM the current service
Projection         An estimated traffic forecast for a future event
Exception          A request for capacity outside the normal scaling plan
Harmony            Amazon’s internal frontend app hosting platform
Brazil             Amazon’s internal build, package, and dependency management system
Bindle             Amazon’s resource ownership/permission tracking system
PMET               Peak Metric: a pre-defined performance metric used to validate scaling
FMBI               Fleet Management Business Intelligence: data source for fleet analytics

15. Developer Setup Cheatsheet

Prerequisites

✓ Amazon dev-desk (dev environment)
✓ Brazil CLI installed
✓ AWS credentials (mwinit)
✓ Node.js v16.0.0 + npm 8
✓ Java 11 (JDK)
✓ BATS CLI: toolbox install batscli
✓ Harmony CLI installed

Workspace Setup

# Create workspace and pull all packages
brazil ws create --name EPIC
cd EPIC
brazil ws use --versionset EPICBackend/development
brazil ws use --package EPICBackendCDK
brazil ws use --package EPICBackend
brazil ws use --package EPICBackendTriggers
brazil ws use --package EPICFrontend

Running Frontend Locally

cd src/EPICFrontend
npm install
npm start          # → localhost:3000 (connected to EPIC-Devo backend)

Running Backend Tests

cd src/EPICBackend
npm test           # Runs Jest tests with ≥70% coverage requirement

Common Troubleshooting

Error                                             Fix
────────────────────────────────────────────────  ──────────────────────────────────────────────────
sh: react-scripts: command not found              rm -rf node_modules && npm install
harmony command not found                         Run harmony npm to point to Amazon npm registry
Error: Integrity check failed                     rm package-lock.json && harmony npm && npm install
NOT Found - GET https://registry.npmjs.org/@amzn  Run harmony npm first
CDK token expired                                 Run mwinit -o or ada credentials update ...
npm ERR! ERR_STRING_TOO_LONG                      rm -rf aws_lambda.bundle.primary.* then re-run
JAVA_HOME not found                               Check echo $JAVA_HOME; if empty, install the JDK and export JAVA_HOME

Summary

┌────────────────────────────────────────────────────────────────────┐
│                          EPIC AT A GLANCE                          │
├─────────────────────┬──────────────────────────────────────────────┤
│ Purpose             │ Amazon peak capacity planning & execution    │
│ Users               │ Service owners, EPIC team, leadership        │
│ Scale               │ Hundreds of services, thousands of fleets    │
│ Events Managed      │ Prime Day, Black Friday, Holiday, BAU        │
│ Frontend            │ React + Cloudscape UI on Amazon Harmony      │
│ Backend             │ Node.js Lambda functions (~20 domains)       │
│ Infrastructure      │ AWS CDK TypeScript (~25 stacks)              │
│ Triggers            │ Java Lambdas (~30 handlers)                  │
│ Primary Database    │ DynamoDB (15+ tables with versioning)        │
│ Secondary Database  │ RDS MySQL (tickets, analytics)               │
│ Messaging           │ SNS → SQS (6 queues + DLQs)                  │
│ Auth                │ AWS IAM via Harmony proxy                    │
│ CI/CD               │ AWS CodePipeline: Beta → Gamma → Prod        │
│ Regions             │ NA (us-east-1), EU, FE (us-west-2), CN       │
│ Build System        │ Amazon Brazil                                │
└─────────────────────┴──────────────────────────────────────────────┘

Report generated from deep codebase analysis of EPIC/EPIC workspace.
Internal Amazon project; not for external distribution.