What you’ll learn: What AWS resources every CDK stack in EPICBackendCDK creates, how the stacks connect, and why.
Prerequisite: Read Guide 03 (TypeScript and CDK) first.


Section 1: Infrastructure Overview

The EPICBackendCDK defines ALL the AWS infrastructure that EPIC runs on. Think of it as blueprints for the entire cloud system.

What gets created:

  • 30+ DynamoDB tables
  • 40+ SQS queues (including Dead Letter Queues)
  • 5 SNS topics
  • 100+ Lambda functions
  • Multiple API Gateways (REST APIs)
  • 1 VPC (Virtual Private Cloud)
  • 1 RDS/Aurora MySQL cluster
  • Multiple IAM roles and policies
  • CloudWatch alarms and metrics
  • S3 buckets (for Axon traffic data, FMBI data)

Section 2: The Root (app.ts)

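app.ts is the entry point. It instantiates every stack and passes shared resources from one stack to the next, so foundation stacks are constructed before the stacks that consume them. CDK derives the deployment order from these cross-stack references: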

// Simplified app.ts
const app = new App();

// Core infrastructure (created first, others depend on these)
const vpcStack = new VpcStack(app, 'VPC', { stage, env });
const apiStack = new ApiStack(app, 'API', { 
    stage, env, 
    vpc: vpcStack.vpc 
});

// Feature stacks (use resources from apiStack)
const hotwStack = new HOTWLambdaStack(app, 'HOTW', {
    stage, env,
    vpc: vpcStack.vpc,
    fleetTable: apiStack.fleetTable,
    serviceTable: apiStack.serviceTable,
    eventTable: apiStack.eventTable,
    // ...
});

const fleetStack = new FleetLambdaStack(app, 'Fleet', {
    vpc: vpcStack.vpc,
    fleetTable: apiStack.fleetTable,
    // ...
});

// ... many more stacks

Why split into multiple stacks?

  • Smaller stacks deploy faster
  • Failures affect only one stack
  • Teams can own different stacks
  • Parallel deployment possible

Section 3: Foundation Stacks

vpcStack.ts: Networking

Creates the VPC (Virtual Private Cloud) that isolates EPIC’s resources:

// All Lambda functions run inside this VPC for security
// No direct internet access β€” all traffic goes through NAT gateway
// Private subnets protect the database

What’s in the VPC:

  • Public subnets (with NAT gateway for outbound internet)
  • Private subnets (where Lambda functions and database live)
  • Security groups (firewall rules)

rdsStack.ts: MySQL Database

Creates the Aurora MySQL cluster:

// MySQL database for HOTW operational data:
// - hotw_run
// - hotw_execution  
// - asg_details
// - capacity_override_details
// - fulfillment_details
// - preferred_asg

Why MySQL instead of DynamoDB for HOTW?

  • HOTW data has complex relationships (runs → fleets → ASGs)
  • SQL JOIN queries are easier to write than modeling the same relationships in a DynamoDB single-table design
  • Aggregate queries (SUM, COUNT) are native in SQL

apiStack.ts: Core DynamoDB Tables and Queues ⭐

The most important stack: it creates all DynamoDB tables and SQS queues.

DynamoDB Tables Created:

Table Name            Partition Key       Sort Key        Purpose
FleetTable            FleetId (S)         VersionId (N)   Fleet configurations
ServiceTable          ServiceId (S)       VersionId (N)   Service configurations
EventTable            EventId (S)         VersionId (N)   Peak event configs
EventPlanTable        EventPlanId (S)     VersionId (N)   Milestone tracking
ProjectionsTable      ProjectionId (S)    VersionId (N)   Traffic projections
SchemaTable           SchemaId (S)        -               Data schemas
EventProfileTable     EventProfileId (S)  -               Event profile templates
ExceptionTable        ExceptionId (S)     -               Buffer factor exceptions
JobDetailsTable       JobId (S)           -               Background job tracking
FleetLockTable        FleetId (S)         -               Concurrent update locking
BAUServiceDashboard   ServiceId (S)       -               BAU capacity dashboard
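
Several tables above pair an ID key with a numeric VersionId sort key, which lets one ID hold multiple versions of a configuration. The sketch below shows, on plain in-memory data, how a reader might pick the newest version for one fleet. It assumes that a higher VersionId means a newer version, which this guide does not state; the config attribute, the sample items, and the latestVersion helper are hypothetical.

```typescript
// Illustration only: plain objects standing in for FleetTable items.
interface VersionedItem {
  FleetId: string;   // key attribute (S)
  VersionId: number; // sort key (N)
  config: string;    // hypothetical attribute, not from the real schema
}

// Assumption: a higher VersionId means a newer version.
function latestVersion(items: VersionedItem[], fleetId: string): VersionedItem | undefined {
  let latest: VersionedItem | undefined;
  for (const item of items) {
    if (item.FleetId !== fleetId) continue;
    if (latest === undefined || item.VersionId > latest.VersionId) {
      latest = item;
    }
  }
  return latest;
}

// Hypothetical sample data, deliberately stored out of version order.
const fleetItems: VersionedItem[] = [
  { FleetId: "fleet-a", VersionId: 1, config: "10 hosts" },
  { FleetId: "fleet-a", VersionId: 3, config: "14 hosts" },
  { FleetId: "fleet-b", VersionId: 2, config: "6 hosts" },
  { FleetId: "fleet-a", VersionId: 2, config: "12 hosts" },
];

const newest = latestVersion(fleetItems, "fleet-a");
console.log(newest ? newest.VersionId : "not found"); // 3
```

Against the real table, the equivalent read would typically be a DynamoDB Query on the FleetId key in descending sort-key order with a limit of 1, rather than a scan over all items.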

SQS Queues Created (30+ queues!):

Queue Name                            FIFO?  DLQ?  Used For
atomicHOTWSQSQueue.fifo               ✅     ✅    Single-fleet HOTW runs
validateAndUpdateSPCOSQSQueue.fifo    ✅     ✅    SPCO validation + update
HotwAuditSQSQueue.fifo                ✅     ✅    HOTW audit runs
ApolloSQSQueue                        ❌     ❌    Apollo capacity sync
FmcSQSQueue.fifo                      ✅     ✅    FMC order fetching
MilestoneSQSQueue.fifo                ✅     ✅    Milestone updates
EventFleetCreationQueue               ❌     ✅    Auto-create event plans
EventTicketCreationQueue              ❌     ✅    Auto-create SIM tickets
GizmoThrottlingUpdateQueue.fifo       ✅     ✅    Throttling updates
BAUScaling                            ❌     ❌    BAU capacity scaling
CutOrUpdateTicketSQSQueue.fifo        ✅     ✅    SIM ticket management
ApproveRejectQueue                    ❌     ✅    Exception approvals
PmetTPMScalingFactorQueue             ❌     ✅    PMET metric refresh
BulkUploadPmetQueue.fifo              ✅     ✅    Bulk PMET upload
TotalBAUTPMRefreshQueue               ❌     ✅    Total BAU TPM refresh
AxonSQSQueue.fifo                     ✅     ✅    Axon traffic sync
AxonS3SQSQueue                        ❌     ❌    Axon S3 data processing
AAAQueue                              ❌     ✅    AAA dependency analysis
GatherEmailQueue                      ❌     ❌    Email sending
FloSQSQueue                           ❌     ❌    FLO host operations
ResilienceDataUpdateQueue             ❌     ✅    Resilience data sync
TESEventQueue                         ❌     ✅    TES event processing
ServiceMetaUpdateQueue.fifo           ✅     ✅    Service metadata sync
TicketServiceReadinessQueue           ❌     ✅    Ticket readiness check

Why FIFO queues?

  • FIFO = First In, First Out (order is guaranteed within each message group)
  • Also: exactly-once processing (a duplicate sent within the 5-minute deduplication interval is not delivered again)
  • Used for operations where order and uniqueness matter (hardware orders, milestone updates)
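
The two FIFO properties can be pictured with a small in-memory model. This is a sketch of the semantics only, not the AWS SDK and not EPIC code: the FifoQueueModel class and the sample message values are hypothetical. It keeps send order within a message group and ignores a message whose deduplication ID was already accepted less than five minutes earlier.

```typescript
// In-memory model of SQS FIFO semantics (illustration only, not the AWS SDK).
interface FifoMessage {
  groupId: string;         // messages in one group are delivered in send order
  deduplicationId: string; // a repeated ID inside the window is not delivered again
  body: string;
}

const DEDUPLICATION_WINDOW_MS = 5 * 60 * 1000; // 5-minute deduplication interval

class FifoQueueModel {
  private messages: FifoMessage[] = [];
  // deduplicationId -> time (ms) at which that ID was last accepted
  private acceptedAtMs: Record<string, number> = {};

  // Returns true if the message was enqueued, false if it was dropped as a duplicate.
  send(message: FifoMessage, nowMs: number): boolean {
    const acceptedAt = this.acceptedAtMs[message.deduplicationId];
    if (acceptedAt !== undefined && nowMs - acceptedAt < DEDUPLICATION_WINDOW_MS) {
      return false;
    }
    this.acceptedAtMs[message.deduplicationId] = nowMs;
    this.messages.push(message);
    return true;
  }

  // Removes and returns the oldest message of a group, or undefined if none is left.
  receive(groupId: string): FifoMessage | undefined {
    for (let i = 0; i < this.messages.length; i++) {
      if (this.messages[i].groupId === groupId) {
        return this.messages.splice(i, 1)[0];
      }
    }
    return undefined;
  }
}

const hardwareOrders = new FifoQueueModel();
hardwareOrders.send({ groupId: "fleet-a", deduplicationId: "order-1", body: "place order 1" }, 0);
hardwareOrders.send({ groupId: "fleet-a", deduplicationId: "order-1", body: "place order 1" }, 1000); // dropped
hardwareOrders.send({ groupId: "fleet-a", deduplicationId: "order-2", body: "place order 2" }, 2000);
```

In this model a retried send of "order-1" does not place a second hardware order, which is the reason the guide gives for using FIFO queues on order and milestone traffic.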

Why Dead Letter Queues (DLQ)?

  • If a message fails processing 3 times (the limit set by maxReceiveCount in the queue’s redrive policy), it is moved to the DLQ
  • The team gets alerted
  • Can inspect the failed message and retry manually
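
The retry-then-park behavior can be sketched as a loop. This is an illustration of the redrive rule only, not the AWS SDK and not EPIC code; drainWithRedrive, exampleHandler, and the message IDs are hypothetical, and the limit of 3 mirrors the number quoted above.

```typescript
// Illustration of an SQS redrive policy; not the AWS SDK.
interface QueuedMessage {
  id: string;
  receiveCount: number; // how many times a consumer has received the message
}

interface ProcessingResult {
  processed: string[];    // message IDs handled successfully
  deadLettered: string[]; // message IDs moved to the DLQ
}

// Retries each message until the handler succeeds or maxReceiveCount is reached.
function drainWithRedrive(
  messages: QueuedMessage[],
  handler: (id: string) => boolean, // returns false to signal a failed attempt
  maxReceiveCount: number
): ProcessingResult {
  const result: ProcessingResult = { processed: [], deadLettered: [] };
  for (const message of messages) {
    let done = false;
    while (!done && message.receiveCount < maxReceiveCount) {
      message.receiveCount += 1;
      done = handler(message.id);
    }
    if (done) {
      result.processed.push(message.id);
    } else {
      result.deadLettered.push(message.id);
    }
  }
  return result;
}

const attempts: Record<string, number> = {};

// Hypothetical handler: "flaky" succeeds on its second attempt, "poison" never succeeds.
function exampleHandler(id: string): boolean {
  attempts[id] = (attempts[id] || 0) + 1;
  if (id === "poison") return false;
  return attempts[id] >= 2;
}

const outcome = drainWithRedrive(
  [{ id: "flaky", receiveCount: 0 }, { id: "poison", receiveCount: 0 }],
  exampleHandler,
  3
);
console.log(outcome); // { processed: [ 'flaky' ], deadLettered: [ 'poison' ] }
```

A transient failure is absorbed by the retries, while a message that can never succeed stops consuming capacity and waits in the DLQ for inspection.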

Section 4: Lambda Feature Stacks

Fleet/fleetLambdaStack.ts

Creates Lambda functions for Fleet CRUD:

  • createFleetLambda → handler.Fleet.createFleet
  • getFleetLambda → handler.Fleet.getFleet
  • updateFleetConfigurationLambda → handler.Fleet.updateFleetConfiguration
  • etc.

Each function gets access to FleetTable, ServiceTable, and EventTable, scoped to what it needs (read-only or read/write).

Service/serviceLambdaStack.ts

Creates Lambda functions for Service CRUD.

Event/eventLambdaStack.ts

Creates Lambda functions for Event CRUD.

EventPlan/eventPlanLambdaStack.ts

Creates Lambda functions for EventPlan milestone management.

Projections/projectionsLambdaStack.ts

Creates Lambda functions for traffic projections.

HOTW/HOTWLambdaStack.ts ⭐

Creates all HOTW-related Lambda functions and the HOTW API Gateway.

Lambda functions in HOTWLambdaStack:

createOrUpdateHotwRunDetailsLambda     → creates/updates HOTW run record
updateHotwRunDetailsLambda             → updates run with order type
createHotwExecutionDetailsLambda       → stores execution details
createOrUpdateDashboardDetailsLambda   → updates dashboard + ASG details
getHotwDashboardDetailsLambda          → fetches dashboard data
getAsgDetailsTableLambda               → gets ASG config for fleet
getHotwExecutionHistoryLambda          → gets run history
getAsgDetailsTableForRunIdLambda       → gets ASG details for specific run
storePreferredASGSLmda                 → saves preferred ASG list
getPreferredASGSLmda                   → gets preferred ASG list

API Gateway Routes in HOTWLambdaStack:

/hotw/asgsPreference/{EventId}/{FleetId}
    PUT → storePreferredASGSLmda
    GET → getPreferredASGSLmda

EPICApiStack.ts: Main API Gateway

The biggest API Gateway. Creates ALL the REST routes for the main EPIC API:

/service                        GET → getAllServices
/service/{ServiceId}            GET → getService, PUT → updateService, POST → createService
/fleet                          POST → createFleet
/fleet/{FleetId}/{EventId}      GET → getFleet, PUT → updateFleet
/event                          GET → getAllEvents, POST → createEvent
/event/{EventId}                GET → getEvent
/event/{EventId}/dashboard      PUT → getServiceReadinessDashboard
/eventplan/{EventId}/{FleetId}  GET/POST/PUT → EventPlan operations
/projection/{...}               GET/POST/PUT → Projection operations
/hotw/dashboard/{EventId}/{Purpose}  GET → getHotwDashboardDetails
/hotw/runDetail                 POST/PUT → run detail management
/hotw/executionDetail/...       POST/GET → execution detail management
/hotw/dashboardDetail           POST → dashboard detail management
/hotw/asgDetail/{FleetId}       GET → getAsgDetails
/hotw/asgsPreference/{EventId}/{FleetId}  GET/PUT → preferred ASGs
/throttling/{...}               GET/POST/PUT → throttling config
/ticket/{...}                   GET/POST/PUT → SIM ticket management
/calendar/{...}                 GET/POST/PUT → EPIC calendar
/pmet/{...}                     GET/POST/PUT → PMET metric links
/schema/{...}                   GET/POST/PUT → schema management
/exception/{...}                GET/POST/PUT → exception management
/bulkjobs/{...}                 POST → bulk operations
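
The {Name} segments in these routes are path parameters. API Gateway does the matching itself; the sketch below only illustrates how a template such as /fleet/{FleetId}/{EventId} maps a concrete request path to named values. The matchRoute function and the sample IDs are hypothetical.

```typescript
// Matches a request path against an API Gateway style template and
// returns the extracted path parameters, or null when the path does not match.
function matchRoute(template: string, path: string): Record<string, string> | null {
  const templateParts = template.split("/");
  const pathParts = path.split("/");
  if (templateParts.length !== pathParts.length) return null;

  const params: Record<string, string> = {};
  for (let i = 0; i < templateParts.length; i++) {
    const part = templateParts[i];
    const isParam = part.charAt(0) === "{" && part.charAt(part.length - 1) === "}";
    if (isParam) {
      if (pathParts[i] === "") return null; // a parameter cannot be empty
      params[part.slice(1, -1)] = pathParts[i];
    } else if (part !== pathParts[i]) {
      return null; // literal segment differs
    }
  }
  return params;
}

console.log(matchRoute("/fleet/{FleetId}/{EventId}", "/fleet/fleet-a/event-1"));
// { FleetId: 'fleet-a', EventId: 'event-1' }
console.log(matchRoute("/fleet/{FleetId}/{EventId}", "/fleet/fleet-a"));
// null
```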

Section 5: Trigger Stacks (Java Lambdas)

triggersStack.ts: Java Lambda Triggers ⭐

This is the most complex stack. Creates all the Java Lambda functions:

HOTW Triggers:

// Main HOTW handler
HotwHandler (SNS trigger from Milestone SNS)
    handleRequest(SNSEvent) → triggers HotwUpscalingHelper.handle()

// SPCO validation scheduler
ScheduleHotwForSpcoValidation (cron: weekly)
    handleSQSRequestForUpdateSpco() → queues all services+events

// SPCO executor (SQS consumer)
validateAndUpdateSPCOSQSQueue → HotwHandler.validateAndUpdateSpco()

// Atomic HOTW (API trigger)
atomicHOTWSQSQueue → HotwHandler.atomicHotwExecutor()

// HOTW Audit (SQS consumer)
HotwAuditSQSQueue → HotwHandler.auditHotw()
    → HotwUpscalingHelper.printDetailTable()

Apollo Triggers:

// Apollo data sync
ApolloSQSQueue → ApolloHandler
    → fetches latest host counts from Apollo API
    → updates FleetTable in DynamoDB
    → triggers milestone status updates

// Cron: daily Apollo sync
DailyApolloSync (cron: daily) → queues all fleets for Apollo refresh

FMC Triggers:

// FMC order status fetcher
FmcSQSQueue → FmcHandler
    → fetches latest order status from FMC API
    → updates fulfillment details in MySQL
    → triggers notifications

// Cron: periodic FMC sync
FmcTrigger (cron: every few hours)

BAU Scaling:

// BAU capacity scaling
BAUScaling → BAUScalingHandler
    → computes BAU host requirements
    → places BAU SPCO orders if needed

Milestone Handlers:

// Each milestone has a handler:
MilestoneSQSQueue → WorkflowHandler
    → routes to appropriate milestone handler based on milestone ID
    → GatherProjectionsMilestoneHandler
    → HardwareOrderMilestoneHandler
    → HardwareFulfillmentMilestoneHandler
    → CommunicateTPMMilestoneHandler
    → UpdateThrottlingMilestoneHandler

Section 6: Other Important Stacks

apolloStack.ts

  • ApolloExecutionHandler: fetches Apollo data per fleet
  • ApolloTriggerHandler: scheduler for Apollo syncs

fmcStack.ts

  • FmcHandler: fetches FMC order status
  • FmcTriggerHandler: periodic FMC sync

milestoneWorkflowStack.ts

  • WorkflowHandler: processes milestone SQS messages
  • Routes to specific milestone handlers
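
The routing step can be pictured as a lookup from milestone ID to handler. The real handlers are Java Lambdas, so the TypeScript sketch below only illustrates the dispatch idea. The milestone ID strings, the routeMilestone function, and the handler bodies are hypothetical; only the handler class names come from this guide.

```typescript
// Illustration only: the real milestone handlers are Java classes.
type MilestoneHandler = (eventPlanId: string) => string;

// Hypothetical milestone IDs mapped to the handler names listed in this guide.
const milestoneHandlers: Record<string, MilestoneHandler> = {
  GATHER_PROJECTIONS: (id) => "GatherProjectionsMilestoneHandler processed " + id,
  HARDWARE_ORDER: (id) => "HardwareOrderMilestoneHandler processed " + id,
  HARDWARE_FULFILLMENT: (id) => "HardwareFulfillmentMilestoneHandler processed " + id,
  COMMUNICATE_TPM: (id) => "CommunicateTPMMilestoneHandler processed " + id,
  UPDATE_THROTTLING: (id) => "UpdateThrottlingMilestoneHandler processed " + id,
};

// Looks up the handler for a milestone ID and fails loudly on unknown IDs,
// so a bad message surfaces as an error instead of being silently skipped.
function routeMilestone(milestoneId: string, eventPlanId: string): string {
  if (!Object.prototype.hasOwnProperty.call(milestoneHandlers, milestoneId)) {
    throw new Error("No handler registered for milestone " + milestoneId);
  }
  return milestoneHandlers[milestoneId](eventPlanId);
}

console.log(routeMilestone("HARDWARE_ORDER", "plan-42"));
// HardwareOrderMilestoneHandler processed plan-42
```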

bauScalingStack.ts

  • BAUScalingHandler: processes BAU capacity needs
  • BAUScalingTriggerLambda: scheduled trigger

scalingPlannerStack.ts

  • ScalingPlannerHandler: manages EAP enrollment
  • Creates capacity overrides in ScalingPlanner

throttlingStack.ts

  • ThrottlingExecutor: applies Gizmo throttling configs
  • Processes throttling SQS messages

simStack.ts

  • SIM ticket creation and management
  • CutOrUpdateTicketSQSQueue → SIM Lambda

axonStack.ts

  • Axon traffic data ingestion
  • AxonS3Handler: processes Axon data from S3
  • AxonDependencyHandler: builds service dependency graph

pmetStack.ts

  • PMET metric link management
  • APIPMETHandler: fetches PMET data

resilienceStack.ts

  • ResilienceDataHandler: collects resilience scores
  • ResilienceGatherServicesLambda: gets service list

consensusStack.ts

  • Consensus (approval) workflow
  • ApprovalsHandler: processes approvals/rejections

athenaS3BucketStack.ts

  • S3 bucket for Athena queries
  • Used for analytics and reporting

fmbiS3BucketStack.ts

  • S3 bucket for FMBI (Fleet Management Business Intelligence)
  • Stores hardware order analytics data

Section 7: Email Templates

Located in lib/EmailTemplates/, these are HTML email templates:

Template                                  When Sent
hardwareOrderDetails.html.ts              After HOTW places hardware orders
atomicHardwareOrderDetails.html.ts        After single-fleet HOTW run
gatherProjections.html.ts                 Reminder to submit traffic projections
peakReadinessReport.html.ts               Weekly readiness status for service owners
peakReadinessReportForLeader.html.ts      Leader-level readiness summary
bauReminderForService.html.ts             BAU capacity reminder
bauFailureReportForEpic.html.ts           BAU automation failure alert
createService.html.ts                     New service registered notification
descalingSpcoDetails.html.ts              Descale order placed notification
gatherDescaleProjections.html.ts          Descale projection request
update.html.ts                            General update notification
bulkUploadPMETLinksReport.html.ts         Bulk PMET upload results
allSuccessfulDescaleSpcoReport.html.ts    Descale complete notification

Section 8: CloudWatch Monitoring

Monitors/Constants/alarmConstants.ts

Defines alarm names and ARNs for all critical metrics.

Monitors/Constants/metricConstants.ts

Defines custom CloudWatch metric names:

export const HOTW_SUCCESS_COUNT = "HOTWSuccessCount";
export const HOTW_FAILURE_COUNT = "HOTWFailureCount";
export const SPCO_ORDERS_PLACED = "SPCOOrdersPlaced";
export const MILESTONE_UPDATES = "MilestoneUpdates";

Monitors/monitorsShared.ts

Creates CloudWatch alarms:

// Lambda error alarm: metricErrors() reports a count of failed invocations, not a rate
new Alarm(this, 'HotwErrorAlarm', {
    metric: hotwLambda.metricErrors(),
    threshold: 5,
    evaluationPeriods: 1,
    alarmDescription: 'HOTW Lambda errors exceeded threshold',
});
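
To make the threshold and evaluationPeriods settings concrete, here is a simplified model of how such an alarm decides its state. It is a sketch, not CloudWatch's actual algorithm: it ignores missing data and datapointsToAlarm, and it assumes the "greater than or equal to threshold" comparison, which is the CDK default when no comparisonOperator is given. The isInAlarm function and the sample counts are hypothetical.

```typescript
// Simplified alarm evaluation: ignores missing data and datapointsToAlarm.
// Assumes the "greater than or equal to threshold" comparison.
function isInAlarm(errorsPerPeriod: number[], threshold: number, evaluationPeriods: number): boolean {
  if (evaluationPeriods < 1 || errorsPerPeriod.length < evaluationPeriods) {
    return false; // not enough data to evaluate
  }
  // Only the most recent evaluationPeriods datapoints are considered.
  const recent = errorsPerPeriod.slice(errorsPerPeriod.length - evaluationPeriods);
  return recent.every((count) => count >= threshold);
}

// With threshold 5 and evaluationPeriods 1, a single bad period is enough.
console.log(isInAlarm([0, 2, 5], 5, 1)); // true
console.log(isInAlarm([5, 5, 4], 5, 1)); // false
```

Raising evaluationPeriods makes the alarm less sensitive to a single noisy period, at the cost of reacting later to a real failure.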

Section 9: ARC (Amazon Application Recovery Controller)

// lib/ARC/arcConfig.ts
// lib/ARC/arcStack.ts

// ARC routing controls allow EPIC to redirect traffic between regions
// during outages or planned maintenance

Section 10: Deployment Process

Developer commits code
    ↓
Brazil build system builds the package
    ↓
Brazil Pipelines (CDK Pipelines) deploys:
    ↓
Stage 1: BETA environment
    ↓ (automated tests pass)
Stage 2: GAMMA environment
    ↓ (manual approval)
Stage 3: PROD environment (IAD, PDX, DUB)

CDK commands:

# Synthesize (generate CloudFormation templates)
cdk synth

# Deploy a specific stack (use the exact stack name printed by cdk ls)
cdk deploy HOTWStack

# Compare changes before deploying
cdk diff

# List all stacks
cdk ls

Section 11: How Resources Connect

app.ts creates all stacks
│
├── VpcStack creates VPC
│   └── All Lambda stacks receive vpc: vpcStack.vpc
│
├── ApiStack creates DynamoDB tables and SQS queues
│   └── All feature stacks receive tables and queues as props
│
├── HOTWLambdaStack creates HOTW Lambdas
│   ├── receives: fleetTable, serviceTable, eventTable, eventPlanTable
│   └── creates: HOTW API Gateway + Lambda functions
│
├── TriggersStack creates Java Lambda Triggers
│   ├── receives: SNS topics, SQS queues
│   └── creates: HotwHandler, ApolloHandler, etc.
│
└── EPICApiStack creates main API Gateway
    ├── receives: all Lambda functions from feature stacks
    └── creates: REST API with all routes
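
The tree above implies an ordering: a stack that receives resources from another stack has to be deployed after it. CDK works this out on its own from cross-stack references; the sketch below only illustrates the idea with a depth-first topological sort. The dependency edges are read from this guide rather than from the real app.ts, and the deploymentOrder function is hypothetical.

```typescript
// Hypothetical model: each stack lists the stacks whose resources it receives.
// Edges are taken from the tree above, not from the real app.ts.
type DependencyGraph = Record<string, string[]>;

const stackDependencies: DependencyGraph = {
  VpcStack: [],
  ApiStack: ["VpcStack"],
  HOTWLambdaStack: ["VpcStack", "ApiStack"],
  TriggersStack: ["ApiStack"],
  EPICApiStack: ["HOTWLambdaStack"],
};

// Depth-first topological sort: a stack is emitted only after all of its dependencies.
function deploymentOrder(graph: DependencyGraph): string[] {
  const sorted: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  function visit(stackName: string): void {
    if (state[stackName] === "done") return;
    if (state[stackName] === "visiting") {
      throw new Error("Circular dependency involving " + stackName);
    }
    state[stackName] = "visiting";
    for (const dependency of graph[stackName] || []) {
      visit(dependency);
    }
    state[stackName] = "done";
    sorted.push(stackName);
  }

  Object.keys(graph).forEach((stackName) => visit(stackName));
  return sorted;
}

console.log(deploymentOrder(stackDependencies));
// [ 'VpcStack', 'ApiStack', 'HOTWLambdaStack', 'TriggersStack', 'EPICApiStack' ]
```

The circular-dependency check matters in practice: two stacks that each consume a resource from the other cannot be ordered, and CDK rejects such a setup at synthesis time.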

Section 12: Security Model

How Lambdas are secured:

  1. VPC: Lambdas run in private subnets, no direct internet access
  2. IAM roles: Each Lambda has minimum required permissions only
  3. Security groups: Firewall rules between services
  4. API Gateway IAM Auth: Frontend must assume a role to call APIs
  5. Harmony account principal: Frontend accounts are explicitly trusted

IAM permission example:

// Only grant what's needed
props.fleetTable.grantReadData(getFleetLambda);        // read only
props.fleetTable.grantReadWriteData(createFleetLambda); // read+write
// NOT: grantFullAccess(), which is too permissive!
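
The effect of choosing the narrower grant can be shown with a tiny permission model. This is a sketch, not IAM's real evaluation logic: the isAllowed function and the fleetTableGrants map are hypothetical, and the action lists are only a partial subset of what the two CDK grant methods allow.

```typescript
// Illustration only: a tiny permission model, not the real IAM evaluation logic.
// The action lists are a partial subset of what each CDK grant method allows.
const READ_ACTIONS = ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:Scan"];
const WRITE_ACTIONS = ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:DeleteItem"];

type GrantLevel = "read" | "readWrite";

// Maps each Lambda to the grant level it received on FleetTable in the example above.
const fleetTableGrants: Record<string, GrantLevel> = {
  getFleetLambda: "read",
  createFleetLambda: "readWrite",
};

function isAllowed(lambdaName: string, action: string): boolean {
  if (!Object.prototype.hasOwnProperty.call(fleetTableGrants, lambdaName)) {
    return false; // no grant at all: deny by default
  }
  if (READ_ACTIONS.indexOf(action) !== -1) return true;
  return fleetTableGrants[lambdaName] === "readWrite" && WRITE_ACTIONS.indexOf(action) !== -1;
}

console.log(isAllowed("getFleetLambda", "dynamodb:PutItem"));    // false
console.log(isAllowed("createFleetLambda", "dynamodb:PutItem")); // true
```

If getFleetLambda were ever compromised or buggy, the read-only grant means it still could not modify or delete fleet records.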

Section 13: Key CDK Concepts Summary

Concept        What It Is                                          EPIC Example
Stack          Group of AWS resources deployed together            HOTWLambdaStack
Construct      Building block for one or more AWS resources        new Function(...)
Props          Input to a stack                                    HOTWLambdaStackProps
L1 Construct   Raw CloudFormation resource                         CfnTable
L2 Construct   CDK-wrapped resource with sensible defaults         Table, Function, Queue
L3 Construct   High-level pattern                                  BrazilPackage.fromString(...)
Token          Placeholder for a value known only at deploy time   props.table.tableName
Aspects        Apply rules to all constructs in a scope            Tags.of(app).add(...)
Synthesize     CDK → CloudFormation                                cdk synth
Deploy         Apply to AWS                                        cdk deploy