What you'll learn: every CDK stack in EPICBackendCDK, including what AWS resources it creates, how the stacks connect, and why.
Prerequisite: Read Guide 03 (TypeScript and CDK) first.
Section 1 - Infrastructure Overview
The EPICBackendCDK defines ALL the AWS infrastructure that EPIC runs on. Think of it as blueprints for the entire cloud system.
What gets created:
- 30+ DynamoDB tables
- 40+ SQS queues (including Dead Letter Queues)
- 5 SNS topics
- 100+ Lambda functions
- Multiple API Gateways (REST APIs)
- 1 VPC (Virtual Private Cloud)
- 1 RDS/Aurora MySQL cluster
- Multiple IAM roles and policies
- CloudWatch alarms and metrics
- S3 buckets (for Axon traffic data, FMBI data)
Section 2 - The Root: app.ts
app.ts is the entry point. It instantiates all stacks in the right order:
// Simplified app.ts
import { App } from 'aws-cdk-lib'; // CDK v2 import path; CDK v1 used '@aws-cdk/core'
// The stack classes, stage, and env are assumed to be imported or defined elsewhere in the real file.
const app = new App();
// Core infrastructure (created first, others depend on these)
const vpcStack = new VpcStack(app, 'VPC', { stage, env });
const apiStack = new ApiStack(app, 'API', {
  stage, env,
  vpc: vpcStack.vpc,
});
// Feature stacks (use resources from apiStack)
const hotwStack = new HOTWLambdaStack(app, 'HOTW', {
  stage, env,
  vpc: vpcStack.vpc,
  fleetTable: apiStack.fleetTable,
  serviceTable: apiStack.serviceTable,
  eventTable: apiStack.eventTable,
  // ...
});
const fleetStack = new FleetLambdaStack(app, 'Fleet', {
  vpc: vpcStack.vpc,
  fleetTable: apiStack.fleetTable,
  // ...
});
// ... many more stacks
Why split into multiple stacks?
- Smaller stacks deploy faster
- Failures affect only one stack
- Teams can own different stacks
- Parallel deployment possible
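The order in which stacks can be deployed follows from which stacks consume which resources. As a rough sketch in plain TypeScript (no CDK dependency), the snippet below derives a valid deployment order from a dependency list. The dependency map is an assumption inferred from the simplified app.ts, and `deploymentOrder` is a hypothetical helper, not part of EPICBackendCDK or the CDK API.

```typescript
// Hypothetical dependency map, inferred from the simplified app.ts:
// each stack lists the stacks whose resources it receives as props.
const stackDependsOn: Record<string, string[]> = {
  VPC: [],
  API: ["VPC"],
  HOTW: ["VPC", "API"],
  Fleet: ["VPC", "API"],
};

// Depth-first topological sort: a stack is emitted only after
// everything it depends on has been emitted.
function deploymentOrder(graph: Record<string, string[]>): string[] {
  const sorted: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  function visit(stack: string): void {
    if (state[stack] === "done") return;
    if (state[stack] === "visiting") {
      throw new Error("Circular dependency involving " + stack);
    }
    state[stack] = "visiting";
    for (const dep of graph[stack] || []) {
      visit(dep);
    }
    state[stack] = "done";
    sorted.push(stack);
  }

  for (const stack of Object.keys(graph)) {
    visit(stack);
  }
  return sorted;
}

const stackOrder = deploymentOrder(stackDependsOn);
console.log(stackOrder.join(" -> ")); // VPC -> API -> HOTW -> Fleet
```

In this model HOTW and Fleet do not depend on each other, which is the property that makes parallel deployment of feature stacks possible.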
Section 3 - Foundation Stacks
vpcStack.ts - Networking
Creates the VPC (Virtual Private Cloud) that isolates EPIC's resources:
// All Lambda functions run inside this VPC for security
// No direct internet access: all traffic goes through NAT gateway
// Private subnets protect the database
What's in the VPC:
- Public subnets (with NAT gateway for outbound internet)
- Private subnets (where Lambda functions and database live)
- Security groups (firewall rules)
rdsStack.ts - MySQL Database
Creates the Aurora MySQL cluster:
// MySQL database for HOTW operational data:
// - hotw_run
// - hotw_execution
// - asg_details
// - capacity_override_details
// - fulfillment_details
// - preferred_asg
Why MySQL instead of DynamoDB for HOTW?
- HOTW data has complex relationships (runs → fleets → ASGs)
- SQL JOIN queries are easier than DynamoDB's single-table design
- Aggregate queries (SUM, COUNT) are native in SQL
apiStack.ts - Core DynamoDB Tables and Queues
The most important stack: it creates all DynamoDB tables and SQS queues.
DynamoDB Tables Created:
| Table Name | Partition Key | Sort Key | Purpose |
|---|---|---|---|
| FleetTable | FleetId (S) | VersionId (N) | Fleet configurations |
| ServiceTable | ServiceId (S) | VersionId (N) | Service configurations |
| EventTable | EventId (S) | VersionId (N) | Peak event configs |
| EventPlanTable | EventPlanId (S) | VersionId (N) | Milestone tracking |
| ProjectionsTable | ProjectionId (S) | VersionId (N) | Traffic projections |
| SchemaTable | SchemaId (S) | - | Data schemas |
| EventProfileTable | EventProfileId (S) | - | Event profile templates |
| ExceptionTable | ExceptionId (S) | - | Buffer factor exceptions |
| JobDetailsTable | JobId (S) | - | Background job tracking |
| FleetLockTable | FleetId (S) | - | Concurrent update locking |
| BAUServiceDashboard | ServiceId (S) | - | BAU capacity dashboard |
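Five of these tables pair an entity ID with a numeric VersionId sort key. The sketch below models how such a key could be read, assuming (the source does not confirm this) that each update writes a new item with a higher VersionId and that the current configuration is the item with the highest one. `FleetItem`, `latestVersion`, the `hostCount` attribute, and the sample data are all hypothetical; against a real table the same result would typically come from a DynamoDB Query on the partition key in descending sort-key order with a limit of 1.

```typescript
// Hypothetical shape of an item in a table keyed by (FleetId, VersionId).
interface FleetItem {
  FleetId: string;   // partition key (S)
  VersionId: number; // sort key (N)
  hostCount: number; // illustrative attribute, not taken from the real schema
}

// In-memory stand-in for the table contents.
const fleetItems: FleetItem[] = [
  { FleetId: "fleet-a", VersionId: 1, hostCount: 10 },
  { FleetId: "fleet-a", VersionId: 2, hostCount: 14 },
  { FleetId: "fleet-b", VersionId: 1, hostCount: 3 },
];

// Return the item with the highest VersionId for one partition key,
// or undefined when no item has that key.
function latestVersion(items: FleetItem[], fleetId: string): FleetItem | undefined {
  let latest: FleetItem | undefined = undefined;
  for (const item of items) {
    if (item.FleetId !== fleetId) continue;
    if (latest === undefined || item.VersionId > latest.VersionId) {
      latest = item;
    }
  }
  return latest;
}

const currentFleetA = latestVersion(fleetItems, "fleet-a");
```

Keeping older versions as separate items means earlier configurations stay readable, at the cost of one extra step to find the current one.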
SQS Queues Created (30+ queues!):
| Queue Name | FIFO? | DLQ? | Used For |
|---|---|---|---|
| atomicHOTWSQSQueue.fifo | Yes | unknown | Single-fleet HOTW runs |
| validateAndUpdateSPCOSQSQueue.fifo | Yes | unknown | SPCO validation + update |
| HotwAuditSQSQueue.fifo | Yes | unknown | HOTW audit runs |
| ApolloSQSQueue | No | unknown | Apollo capacity sync |
| FmcSQSQueue.fifo | Yes | unknown | FMC order fetching |
| MilestoneSQSQueue.fifo | Yes | unknown | Milestone updates |
| EventFleetCreationQueue | No | unknown | Auto-create event plans |
| EventTicketCreationQueue | No | unknown | Auto-create SIM tickets |
| GizmoThrottlingUpdateQueue.fifo | Yes | unknown | Throttling updates |
| BAUScaling | No | unknown | BAU capacity scaling |
| CutOrUpdateTicketSQSQueue.fifo | Yes | unknown | SIM ticket management |
| ApproveRejectQueue | No | unknown | Exception approvals |
| PmetTPMScalingFactorQueue | No | unknown | PMET metric refresh |
| BulkUploadPmetQueue.fifo | Yes | unknown | Bulk PMET upload |
| TotalBAUTPMRefreshQueue | No | unknown | Total BAU TPM refresh |
| AxonSQSQueue.fifo | Yes | unknown | Axon traffic sync |
| AxonS3SQSQueue | No | unknown | Axon S3 data processing |
| AAAQueue | No | unknown | AAA dependency analysis |
| GatherEmailQueue | No | unknown | Email sending |
| FloSQSQueue | No | unknown | FLO host operations |
| ResilienceDataUpdateQueue | No | unknown | Resilience data sync |
| TESEventQueue | No | unknown | TES event processing |
| ServiceMetaUpdateQueue.fifo | Yes | unknown | Service metadata sync |
| TicketServiceReadinessQueue | No | unknown | Ticket readiness check |
Why FIFO queues?
- FIFO = First In, First Out (order is guaranteed within a message group)
- Also: exactly-once processing (a duplicate sent within the 5-minute deduplication interval is dropped)
- Used for operations where order and uniqueness matter (hardware orders, milestone updates)
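To make the deduplication behaviour concrete, here is a small in-memory model (not the SQS API) of a FIFO queue that drops any message whose deduplication ID was already accepted within the last five minutes, the deduplication interval SQS documents for FIFO queues. `FifoQueueModel`, its methods, and the sample IDs are hypothetical.

```typescript
const DEDUP_WINDOW_MS = 5 * 60 * 1000; // SQS FIFO deduplication interval

interface QueuedMessage {
  dedupId: string;
  body: string;
}

class FifoQueueModel {
  private messages: QueuedMessage[] = [];
  // dedupId -> time (ms) at which that ID was last accepted
  private lastAccepted: Record<string, number> = {};

  // Returns true if the message was enqueued, false if dropped as a duplicate.
  send(dedupId: string, body: string, nowMs: number): boolean {
    const seenBefore = Object.prototype.hasOwnProperty.call(this.lastAccepted, dedupId);
    if (seenBefore && nowMs - this.lastAccepted[dedupId] < DEDUP_WINDOW_MS) {
      return false;
    }
    this.lastAccepted[dedupId] = nowMs;
    this.messages.push({ dedupId: dedupId, body: body });
    return true;
  }

  // Messages leave in the order in which they were accepted.
  receive(): QueuedMessage | undefined {
    return this.messages.shift();
  }
}

const fifoModel = new FifoQueueModel();
const firstSend = fifoModel.send("order-1", "place hardware order", 0);
const retrySend = fifoModel.send("order-1", "place hardware order", 60 * 1000);    // inside the window: dropped
const otherSend = fifoModel.send("order-2", "update milestone", 2 * 60 * 1000);
const lateSend = fifoModel.send("order-1", "place hardware order", 6 * 60 * 1000); // window expired: accepted
```

The retry at one minute is the case that matters for hardware orders: a producer that times out and resends does not cause the order to be placed twice.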
Why Dead Letter Queues (DLQ)?
- If a message fails processing 3 times, SQS moves it to the DLQ
- The team gets alerted
- The failed message can be inspected and retried manually
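The retry-then-park behaviour can be sketched without AWS at all. The model below counts receives per message and moves a message to a dead-letter list once it has been received `maxReceiveCount` times without success; the value 3 matches the figure quoted above. `processWithRedrive` and the sample handler are hypothetical names used only for illustration.

```typescript
interface TrackedMessage {
  id: string;
  body: string;
  receiveCount: number;
}

interface RedriveResult {
  processed: string[];  // ids handled successfully
  deadLetter: string[]; // ids moved to the DLQ
}

// Deliver each message until the handler succeeds or the message has been
// received maxReceiveCount times, at which point it goes to the DLQ.
function processWithRedrive(
  bodies: string[],
  handler: (body: string) => void,
  maxReceiveCount: number
): RedriveResult {
  const pending: TrackedMessage[] = bodies.map((body, i) => ({
    id: "msg-" + i,
    body: body,
    receiveCount: 0,
  }));
  const result: RedriveResult = { processed: [], deadLetter: [] };

  while (pending.length > 0) {
    const msg = pending.shift() as TrackedMessage;
    msg.receiveCount += 1;
    try {
      handler(msg.body);
      result.processed.push(msg.id);
    } catch (err) {
      if (msg.receiveCount >= maxReceiveCount) {
        result.deadLetter.push(msg.id); // parked for manual inspection
      } else {
        pending.push(msg); // becomes visible again for another attempt
      }
    }
  }
  return result;
}

let poisonAttempts = 0;
const redriveResult = processWithRedrive(
  ["ok", "poison"],
  (body) => {
    if (body === "poison") {
      poisonAttempts += 1;
      throw new Error("cannot process");
    }
  },
  3
);
```

Here the healthy message is processed once, while the failing one is attempted three times and then parked, so it stops blocking the consumer.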
Section 4 - Lambda Feature Stacks
Fleet/fleetLambdaStack.ts
Creates Lambda functions for Fleet CRUD:
- createFleetLambda → handler.Fleet.createFleet
- getFleetLambda → handler.Fleet.getFleet
- updateFleetConfigurationLambda → handler.Fleet.updateFleetConfiguration
- etc.
Each gets read/write access to FleetTable, ServiceTable, EventTable.
Service/serviceLambdaStack.ts
Creates Lambda functions for Service CRUD.
Event/eventLambdaStack.ts
Creates Lambda functions for Event CRUD.
EventPlan/eventPlanLambdaStack.ts
Creates Lambda functions for EventPlan milestone management.
Projections/projectionsLambdaStack.ts
Creates Lambda functions for traffic projections.
HOTW/HOTWLambdaStack.ts
Creates all HOTW-related Lambda functions and the HOTW API Gateway.
Lambda functions in HOTWLambdaStack:
createOrUpdateHotwRunDetailsLambda → creates/updates HOTW run record
updateHotwRunDetailsLambda → updates run with order type
createHotwExecutionDetailsLambda → stores execution details
createOrUpdateDashboardDetailsLambda → updates dashboard + ASG details
getHotwDashboardDetailsLambda → fetches dashboard data
getAsgDetailsTableLambda → gets ASG config for fleet
getHotwExecutionHistoryLambda → gets run history
getAsgDetailsTableForRunIdLambda → gets ASG details for specific run
storePreferredASGSLmda → saves preferred ASG list
getPreferredASGSLmda → gets preferred ASG list
API Gateway Routes in HOTWLambdaStack:
/hotw/asgsPreference/{EventId}/{FleetId}
  PUT → storePreferredASGSLmda
  GET → getPreferredASGSLmda
EPICApiStack.ts - Main API Gateway
The biggest API Gateway. Creates ALL the REST routes for the main EPIC API:
/service GET → getAllServices
/service/{ServiceId} GET → getService, PUT → updateService, POST → createService
/fleet POST → createFleet
/fleet/{FleetId}/{EventId} GET → getFleet, PUT → updateFleet
/event GET → getAllEvents, POST → createEvent
/event/{EventId} GET → getEvent
/event/{EventId}/dashboard PUT → getServiceReadinessDashboard
/eventplan/{EventId}/{FleetId} GET/POST/PUT → EventPlan operations
/projection/{...} GET/POST/PUT → Projection operations
/hotw/dashboard/{EventId}/{Purpose} GET → getHotwDashboardDetails
/hotw/runDetail POST/PUT → run detail management
/hotw/executionDetail/... POST/GET → execution detail management
/hotw/dashboardDetail POST → dashboard detail management
/hotw/asgDetail/{FleetId} GET → getAsgDetails
/hotw/asgsPreference/{EventId}/{FleetId} GET/PUT → preferred ASGs
/throttling/{...} GET/POST/PUT → throttling config
/ticket/{...} GET/POST/PUT → SIM ticket management
/calendar/{...} GET/POST/PUT → EPIC calendar
/pmet/{...} GET/POST/PUT → PMET metric links
/schema/{...} GET/POST/PUT → schema management
/exception/{...} GET/POST/PUT → exception management
/bulkjobs/{...} POST → bulk operations
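Each entry in the route list pairs an HTTP method and a path template such as /fleet/{FleetId}/{EventId} with a handler. The sketch below imitates that lookup in plain TypeScript for three of the routes listed above. `matchRoute`, `apiRoutes`, and the sample path values are hypothetical; the code mirrors the mapping only and says nothing about how API Gateway itself is implemented.

```typescript
interface Route {
  method: string;
  template: string; // path with {Param} placeholders
  handler: string;  // name of the handler the route points at
}

// A small excerpt of the route list.
const apiRoutes: Route[] = [
  { method: "GET", template: "/service", handler: "getAllServices" },
  { method: "GET", template: "/fleet/{FleetId}/{EventId}", handler: "getFleet" },
  { method: "PUT", template: "/fleet/{FleetId}/{EventId}", handler: "updateFleet" },
];

interface RouteMatch {
  handler: string;
  params: Record<string, string>;
}

// Find the first route whose method matches and whose template matches the
// path segment by segment, capturing {Param} segments along the way.
function matchRoute(method: string, path: string): RouteMatch | undefined {
  const pathParts = path.split("/");
  for (const route of apiRoutes) {
    if (route.method !== method) continue;
    const templateParts = route.template.split("/");
    if (templateParts.length !== pathParts.length) continue;

    const params: Record<string, string> = {};
    let matched = true;
    for (let i = 0; i < templateParts.length; i++) {
      const t = templateParts[i];
      const p = pathParts[i];
      const isParam = t.charAt(0) === "{" && t.charAt(t.length - 1) === "}";
      if (isParam) {
        if (p === "") { matched = false; break; }
        params[t.slice(1, -1)] = p; // capture the path parameter
      } else if (t !== p) {
        matched = false;
        break;
      }
    }
    if (matched) return { handler: route.handler, params: params };
  }
  return undefined;
}

const fleetMatch = matchRoute("PUT", "/fleet/fleet-123/peak-2024");
```

With these inputs the PUT request resolves to updateFleet with FleetId "fleet-123" and EventId "peak-2024", while an unlisted combination such as POST /service returns undefined.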
Section 5 - Trigger Stacks (Java Lambdas)
triggersStack.ts - Java Lambda Triggers
This is the most complex stack. Creates all the Java Lambda functions:
HOTW Triggers:
// Main HOTW handler
HotwHandler (SNS trigger from Milestone SNS)
  handleRequest(SNSEvent) → triggers HotwUpscalingHelper.handle()
// SPCO validation scheduler
ScheduleHotwForSpcoValidation (cron: weekly)
  handleSQSRequestForUpdateSpco() → queues all services+events
// SPCO executor (SQS consumer)
validateAndUpdateSPCOSQSQueue → HotwHandler.validateAndUpdateSpco()
// Atomic HOTW (API trigger)
atomicHOTWSQSQueue → HotwHandler.atomicHotwExecutor()
// HOTW Audit (SQS consumer)
HotwAuditSQSQueue → HotwHandler.auditHotw()
  → HotwUpscalingHelper.printDetailTable()
Apollo Triggers:
// Apollo data sync
ApolloSQSQueue → ApolloHandler
  → fetches latest host counts from Apollo API
  → updates FleetTable in DynamoDB
  → triggers milestone status updates
// Cron: daily Apollo sync
DailyApolloSync (cron: daily) → queues all fleets for Apollo refresh
FMC Triggers:
// FMC order status fetcher
FmcSQSQueue → FmcHandler
  → fetches latest order status from FMC API
  → updates fulfillment details in MySQL
  → triggers notifications
// Cron: periodic FMC sync
FmcTrigger (cron: every few hours)
BAU Scaling:
// BAU capacity scaling
BAUScaling → BAUScalingHandler
  → computes BAU host requirements
  → places BAU SPCO orders if needed
Milestone Handlers:
// Each milestone has a handler:
MilestoneSQSQueue → WorkflowHandler
  → routes to appropriate milestone handler based on milestone ID
  → GatherProjectionsMilestoneHandler
  → HardwareOrderMilestoneHandler
  → HardwareFulfillmentMilestoneHandler
  → CommunicateTPMMilestoneHandler
  → UpdateThrottlingMilestoneHandler
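The routing step in WorkflowHandler amounts to a lookup from milestone ID to handler. The real handlers are Java classes; the TypeScript sketch below only illustrates the dispatch pattern. The milestone ID strings, the `MilestoneMessage` shape, and the `stubHandler` and `routeMilestone` helpers are invented for the example, while the handler names come from the list above.

```typescript
interface MilestoneMessage {
  milestoneId: string; // hypothetical field name
  fleetId: string;
}

type MilestoneHandler = (msg: MilestoneMessage) => string;

// Build a stub that just reports which handler class would have run.
function stubHandler(handlerName: string): MilestoneHandler {
  return (msg) => handlerName + " handled " + msg.fleetId;
}

// Hypothetical milestone IDs mapped to stubs named after the Java handlers.
const milestoneHandlers: Record<string, MilestoneHandler> = {
  GATHER_PROJECTIONS: stubHandler("GatherProjectionsMilestoneHandler"),
  HARDWARE_ORDER: stubHandler("HardwareOrderMilestoneHandler"),
  HARDWARE_FULFILLMENT: stubHandler("HardwareFulfillmentMilestoneHandler"),
  COMMUNICATE_TPM: stubHandler("CommunicateTPMMilestoneHandler"),
  UPDATE_THROTTLING: stubHandler("UpdateThrottlingMilestoneHandler"),
};

// Look up the handler for a message and run it; unknown IDs raise an error
// so the message is retried rather than silently dropped.
function routeMilestone(msg: MilestoneMessage): string {
  if (!Object.prototype.hasOwnProperty.call(milestoneHandlers, msg.milestoneId)) {
    throw new Error("No handler registered for milestone " + msg.milestoneId);
  }
  return milestoneHandlers[msg.milestoneId](msg);
}

const routedResult = routeMilestone({ milestoneId: "HARDWARE_ORDER", fleetId: "fleet-a" });
console.log(routedResult); // HardwareOrderMilestoneHandler handled fleet-a
```

A table-driven dispatch like this keeps the router unchanged when a new milestone is added: only the map gains an entry.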
Section 6 - Other Important Stacks
apolloStack.ts
- ApolloExecutionHandler: fetches Apollo data per fleet
- ApolloTriggerHandler: scheduler for Apollo syncs
fmcStack.ts
- FmcHandler: fetches FMC order status
- FmcTriggerHandler: periodic FMC sync
milestoneWorkflowStack.ts
- WorkflowHandler: processes milestone SQS messages
- Routes to specific milestone handlers
bauScalingStack.ts
- BAUScalingHandler: processes BAU capacity needs
- BAUScalingTriggerLambda: scheduled trigger
scalingPlannerStack.ts
- ScalingPlannerHandler: manages EAP enrollment
- Creates capacity overrides in ScalingPlanner
throttlingStack.ts
- ThrottlingExecutor: applies Gizmo throttling configs
- Processes throttling SQS messages
simStack.ts
- SIM ticket creation and management
- CutOrUpdateTicketSQSQueue → SIM Lambda
axonStack.ts
- Axon traffic data ingestion
- AxonS3Handler: processes Axon data from S3
- AxonDependencyHandler: builds service dependency graph
pmetStack.ts
- PMET metric link management
- APIPMETHandler: fetches PMET data
resilienceStack.ts
- ResilienceDataHandler: collects resilience scores
- ResilienceGatherServicesLambda: gets service list
consensusStack.ts
- Consensus (approval) workflow
- ApprovalsHandler: processes approvals/rejections
athenaS3BucketStack.ts
- S3 bucket for Athena queries
- Used for analytics and reporting
fmbiS3BucketStack.ts
- S3 bucket for FMBI (Fleet Management Business Intelligence)
- Stores hardware order analytics data
Section 7 - Email Templates
Located in lib/EmailTemplates/, these are HTML email templates:
| Template | When Sent |
|---|---|
| hardwareOrderDetails.html.ts | After HOTW places hardware orders |
| atomicHardwareOrderDetails.html.ts | After single-fleet HOTW run |
| gatherProjections.html.ts | Reminder to submit traffic projections |
| peakReadinessReport.html.ts | Weekly readiness status for service owners |
| peakReadinessReportForLeader.html.ts | Leader-level readiness summary |
| bauReminderForService.html.ts | BAU capacity reminder |
| bauFailureReportForEpic.html.ts | BAU automation failure alert |
| createService.html.ts | New service registered notification |
| descalingSpcoDetails.html.ts | Descale order placed notification |
| gatherDescaleProjections.html.ts | Descale projection request |
| update.html.ts | General update notification |
| bulkUploadPMETLinksReport.html.ts | Bulk PMET upload results |
| allSuccessfulDescaleSpcoReport.html.ts | Descale complete notification |
Section 8 - CloudWatch Monitoring
Monitors/Constants/alarmConstants.ts
Defines alarm names and ARNs for all critical metrics.
Monitors/Constants/metricConstants.ts
Defines custom CloudWatch metric names:
export const HOTW_SUCCESS_COUNT = "HOTWSuccessCount";
export const HOTW_FAILURE_COUNT = "HOTWFailureCount";
export const SPCO_ORDERS_PLACED = "SPCOOrdersPlaced";
export const MILESTONE_UPDATES = "MilestoneUpdates";
Monitors/monitorsShared.ts
Creates CloudWatch alarms:
// Lambda error alarm (metricErrors() reports an error count, not a rate)
new Alarm(this, 'HotwErrorAlarm', {
  metric: hotwLambda.metricErrors(),
  threshold: 5,
  evaluationPeriods: 1,
  alarmDescription: 'HOTW Lambda errors exceeded threshold',
});
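How `threshold` and `evaluationPeriods` combine can be shown with a small model. It assumes the CDK defaults apply, since the alarm above sets neither a comparison operator nor a period: the default operator is GreaterThanOrEqualToThreshold, and metricErrors() defaults to a sum over five minutes. `isInAlarm` and the sample datapoints are hypothetical, and the model ignores CloudWatch's missing-data handling.

```typescript
// Returns true when the most recent `evaluationPeriods` datapoints are all at
// or above the threshold (GreaterThanOrEqualToThreshold semantics).
// Simplification: with too few datapoints the alarm is treated as not firing.
function isInAlarm(datapoints: number[], threshold: number, evaluationPeriods: number): boolean {
  if (evaluationPeriods < 1) {
    throw new Error("evaluationPeriods must be at least 1");
  }
  if (datapoints.length < evaluationPeriods) return false;
  const recent = datapoints.slice(datapoints.length - evaluationPeriods);
  for (const value of recent) {
    if (value < threshold) return false;
  }
  return true;
}

// Error counts per period, oldest first.
const quietPeriods = [0, 2, 4];
const noisyPeriods = [0, 2, 5];

const quietAlarm = isInAlarm(quietPeriods, 5, 1); // false: latest period had 4 errors
const noisyAlarm = isInAlarm(noisyPeriods, 5, 1); // true: latest period reached 5 errors
```

Under these assumptions a single period with exactly 5 errors is enough to trigger the alarm; raising `evaluationPeriods` would require the condition to hold for several consecutive periods.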
Section 9 - ARC (Application Recovery Controller)
// lib/ARC/arcConfig.ts
// lib/ARC/arcStack.ts
// ARC routing controls allow EPIC to redirect traffic between regions
// during outages or planned maintenance
Section 10 - Deployment Process
Developer commits code
↓
Brazil build system builds the package
↓
Brazil Pipelines (CDK Pipelines) deploys:
↓
Stage 1: BETA environment
↓ (automated tests pass)
Stage 2: GAMMA environment
↓ (manual approval)
Stage 3: PROD environment (IAD, PDX, DUB)
CDK commands:
# Synthesize (generate CloudFormation templates)
cdk synth
# Deploy a specific stack
cdk deploy HOTWStack
# Compare changes before deploying
cdk diff
# List all stacks
cdk ls
Section 11 - How Resources Connect
app.ts creates all stacks
│
├── VpcStack creates VPC
│   └── All Lambda stacks receive vpc: vpcStack.vpc
│
├── ApiStack creates DynamoDB tables and SQS queues
│   └── All feature stacks receive tables and queues as props
│
├── HOTWLambdaStack creates HOTW Lambdas
│   ├── receives: fleetTable, serviceTable, eventTable, eventPlanTable
│   └── creates: HOTW API Gateway + Lambda functions
│
├── TriggersStack creates Java Lambda Triggers
│   ├── receives: SNS topics, SQS queues
│   └── creates: HotwHandler, ApolloHandler, etc.
│
└── EPICApiStack creates main API Gateway
    ├── receives: all Lambda functions from feature stacks
    └── creates: REST API with all routes
Section 12 - Security Model
How Lambdas are secured:
- VPC: Lambdas run in private subnets, no direct internet access
- IAM roles: Each Lambda has minimum required permissions only
- Security groups: Firewall rules between services
- API Gateway IAM Auth: Frontend must assume a role to call APIs
- Harmony account principal: Frontend accounts are explicitly trusted
IAM permission example:
// Only grant what's needed
props.fleetTable.grantReadData(getFleetLambda); // read only
props.fleetTable.grantReadWriteData(createFleetLambda); // read+write
// NOT: grantFullAccess(), which is too permissive!
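The difference between the two grants can be pictured as two sets of allowed actions. The toy model below borrows the method names grantReadData and grantReadWriteData from the CDK Table API but is otherwise hypothetical: `TableGrantModel` is not a CDK class, the principals are plain strings, and the action lists are illustrative subsets rather than the full sets CDK attaches.

```typescript
// Illustrative subsets of the DynamoDB actions behind each kind of grant.
const READ_ACTIONS: string[] = ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:Scan"];
const WRITE_ACTIONS: string[] = ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:DeleteItem"];

// Toy stand-in for a table that records which actions each principal may perform.
class TableGrantModel {
  private allowed: Record<string, string[]> = {};

  private allow(principal: string, actions: string[]): void {
    const current = this.allowed[principal] || [];
    for (const action of actions) {
      if (current.indexOf(action) === -1) current.push(action);
    }
    this.allowed[principal] = current;
  }

  grantReadData(principal: string): void {
    this.allow(principal, READ_ACTIONS);
  }

  grantReadWriteData(principal: string): void {
    this.allow(principal, READ_ACTIONS.concat(WRITE_ACTIONS));
  }

  // Anything not explicitly granted is denied.
  isAllowed(principal: string, action: string): boolean {
    return (this.allowed[principal] || []).indexOf(action) !== -1;
  }
}

const fleetTableModel = new TableGrantModel();
fleetTableModel.grantReadData("getFleetLambda");
fleetTableModel.grantReadWriteData("createFleetLambda");

const readerCanWrite = fleetTableModel.isAllowed("getFleetLambda", "dynamodb:PutItem");    // false
const writerCanWrite = fleetTableModel.isAllowed("createFleetLambda", "dynamodb:PutItem"); // true
```

The point of the narrower grant is visible in the last two lines: a bug or compromise in the read-only function cannot modify fleet data because the write actions were never granted to it.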
Section 13 - Key CDK Concepts Summary
| Concept | What It Is | EPIC Example |
|---|---|---|
| Stack | Group of AWS resources | HOTWLambdaStack |
| Construct | Single AWS resource | new Function(...) |
| Props | Input to a stack | HOTWLambdaStackProps |
| L1 Construct | Raw CloudFormation | CfnTable |
| L2 Construct | CDK wrapped resource | Table, Function, Queue |
| L3 Construct | High-level pattern | BrazilPackage.fromString(...) |
| Token | Lazy reference | props.table.tableName |
| Aspects | Apply rules to all | Tags.of(app).add(...) |
| Synthesize | CDK → CloudFormation | cdk synth |
| Deploy | Apply to AWS | cdk deploy |