What you'll learn: every AWS service used in EPIC, covering what it is, why EPIC uses it, and real examples from the codebase.
Assumes: No prior AWS knowledge needed.
Section 1: AWS Lambda (Serverless Functions)
What it is
Lambda is "serverless" computing. You write a function, upload it to AWS, and it runs when triggered. You don't manage servers.
Why EPIC uses it
- Cost efficient: you only pay when code is running
- Auto-scales: AWS runs as many copies as needed automatically
- Simple deployment: just upload new code
- Supports multiple languages: EPIC uses Node.js (backend) and Java (triggers)
How it works
Event/Trigger
↓
AWS spins up Lambda container
↓
Your code runs
↓
Returns response
↓
Container stays warm for a while (typically minutes; AWS does not guarantee how long), then shuts down
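The warm-container step is why setup placed outside the handler can be reused between invocations. The sketch below imitates that lifecycle in plain TypeScript with no AWS dependencies; `loadConfig` and `handleRequest` are hypothetical names, not functions from the EPIC codebase.

```typescript
// Simulates one Lambda container: module scope lives as long as the container does.
interface Config {
  tableName: string;
}

let initCount = 0; // how many times the expensive setup ran
let cachedConfig: Config | undefined; // survives between warm invocations

// Hypothetical expensive setup (for example, reading config or opening connections).
function loadConfig(): Config {
  initCount += 1;
  return { tableName: "FleetTable" };
}

// Hypothetical handler: initializes lazily, then reuses the cached value.
function handleRequest(request: { fleetId: string }): string {
  if (cachedConfig === undefined) {
    cachedConfig = loadConfig(); // only runs on a cold start
  }
  return cachedConfig.tableName + ":" + request.fleetId;
}

const coldResult = handleRequest({ fleetId: "RIPE-NA" }); // cold start: runs loadConfig
const warmResult = handleRequest({ fleetId: "RIPE-NA" }); // warm: reuses cachedConfig
```

After both calls `initCount` is 1, because the second invocation found the cached value. Once the container shuts down, the next invocation starts from scratch again.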
In EPIC
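// CDK creates a Lambda function
import { Duration } from 'aws-cdk-lib';
import { Function, Runtime } from 'aws-cdk-lib/aws-lambda';
new Function(this, 'HOTWDashboardLambda', {
  handler: 'handler.HOTW.getHotwDashboardDetails',
  // handler = file.class.method
  runtime: Runtime.NODEJS_16_X,
  timeout: Duration.seconds(300), // max 5 minutes to run
  memorySize: 512, // 512 MB RAM allocated
  // code: ... (required by CDK; omitted in this excerpt)
});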
Triggers used in EPIC:
- API Gateway → Lambda (REST API calls from frontend)
- SQS → Lambda (queue messages trigger Java handlers)
- SNS → Lambda (pub/sub notifications)
- DynamoDB Streams → Lambda (DB changes trigger updates)
- CloudWatch Events/Cron → Lambda (scheduled jobs)
Lambda limits:
- Max execution time: 15 minutes
- Max memory: 10 GB
- EPIC uses 5 minutes max (most jobs are under 2 minutes)
Section 2: Amazon DynamoDB (NoSQL Database)
What it is
DynamoDB is a fully managed NoSQL database. It stores data as JSON-like items and stays fast as tables grow.
Why EPIC uses it
- Primary data store: Fleet, Service, Event, EventPlan are all in DynamoDB
- Automatic scaling: handles any amount of data
- DynamoDB Streams: changes trigger Lambda functions automatically
- Versioning: storing multiple versions of the same item is easy (same partition key, different sort key)
How it works
DynamoDB Table = collection of Items
Item = JSON document (like a row in SQL but flexible structure)
Primary Key = what uniquely identifies each item
Two key models:
- Single-key table: { FleetId: "RIPE-NA" } (single partition key)
- Composite key table (EPIC's pattern): { FleetId: "RIPE-NA", VersionId: 5 } (partition key + sort key)
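To make the composite-key model concrete, here is a minimal in-memory sketch. It is not DynamoDB code: `putItem`, `getItem`, and `queryVersions` are hypothetical helpers that only imitate how a partition key plus sort key identifies an item and how all versions of one fleet can be listed.

```typescript
// In-memory stand-in for a table with partition key FleetId and sort key VersionId.
interface FleetItem {
  FleetId: string;
  VersionId: number;
  ApolloName: string;
}

const items = new Map<string, FleetItem>();

// The full primary key is the pair (partition key, sort key).
function keyOf(fleetId: string, versionId: number): string {
  return fleetId + "#" + versionId;
}

function putItem(item: FleetItem): void {
  items.set(keyOf(item.FleetId, item.VersionId), item); // same key overwrites
}

function getItem(fleetId: string, versionId: number): FleetItem | undefined {
  return items.get(keyOf(fleetId, versionId));
}

// Like a query on the partition key: all versions of one fleet, ordered by sort key.
function queryVersions(fleetId: string): FleetItem[] {
  return Array.from(items.values())
    .filter((item) => item.FleetId === fleetId)
    .sort((a, b) => a.VersionId - b.VersionId);
}

putItem({ FleetId: "RIPE-NA", VersionId: 4, ApolloName: "RIPE_NA_PROD" });
putItem({ FleetId: "RIPE-NA", VersionId: 5, ApolloName: "RIPE_NA_PROD" });
const versions = queryVersions("RIPE-NA");
const latest = versions[versions.length - 1];
```

Writing version 5 does not overwrite version 4 because the sort key differs, which is what makes the versioning pattern cheap.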
In EPIC
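// Write to DynamoDB (assumes dynamoDB is an AWS SDK v2 DocumentClient)
await dynamoDB.put({
  TableName: 'FleetTable',
  Item: {
    FleetId: 'RIPE-NA',
    VersionId: 5,
    LatestVersionId: 5,
    ApolloName: 'RIPE_NA_PROD',
    // ... all fleet data
  }
}).promise(); // SDK v2 returns a Request object; .promise() makes it awaitable
// Read from DynamoDB
const result = await dynamoDB.get({
  TableName: 'FleetTable',
  Key: { FleetId: 'RIPE-NA', VersionId: 5 } // both key attributes are required
}).promise();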
DynamoDB Streams: When a FleetTable item is updated, DynamoDB automatically sends the change to a stream, which triggers a Java Lambda to process it.
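// CDK: connect DynamoDB stream to Lambda
// (streamHandlerLambda is a placeholder name for the Java Lambda that processes changes)
import { StartingPosition } from 'aws-cdk-lib/aws-lambda';
import { DynamoEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
streamHandlerLambda.addEventSource(new DynamoEventSource(fleetTable, {
  startingPosition: StartingPosition.TRIM_HORIZON,
  batchSize: 10
}));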
Section 3: Amazon SQS (Message Queues)
What it is
SQS (Simple Queue Service) is a message queue. Think of it as a mailbox. One service puts a message in, another service picks it up later.
Why EPIC uses it
- Decoupling: HOTW puts work in a queue; workers process at their own pace
- Retry logic: A failed message becomes visible again after its visibility timeout and is retried automatically
- Scale: Many workers can process from the same queue in parallel
- FIFO queues: Guarantee order within a message group + exactly-once processing
How it works
Producer → puts message in queue → Consumer reads message → deletes after processing
(or message becomes visible again for retry)
Two types used in EPIC:
- Standard queue: At-least-once delivery, may be out of order. Used for Apollo, FLO.
- FIFO queue (.fifo suffix): Exactly-once processing, ordered within each message group. Used for HOTW, FMC, milestones.
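The deduplication behind "exactly-once" can be sketched as a toy model. `ToyFifoQueue` below is hypothetical and runs entirely in memory; it only illustrates that a FIFO queue ignores a repeated deduplication ID inside the 5-minute deduplication interval. The timestamps are passed in by hand so the behavior is easy to follow.

```typescript
// Toy FIFO queue: drops a message whose deduplication ID was already accepted
// within the last 5 minutes (the SQS FIFO deduplication interval).
const DEDUP_WINDOW_MS = 5 * 60 * 1000;

interface QueuedMessage {
  body: string;
  dedupId: string;
}

class ToyFifoQueue {
  private readonly messages: QueuedMessage[] = [];
  private readonly acceptedAt = new Map<string, number>();

  // Returns true if the message was enqueued, false if it was treated as a duplicate.
  // (Real SQS accepts the duplicate send call but does not deliver it again.)
  send(body: string, dedupId: string, nowMs: number): boolean {
    const last = this.acceptedAt.get(dedupId);
    if (last !== undefined && nowMs - last < DEDUP_WINDOW_MS) {
      return false;
    }
    this.acceptedAt.set(dedupId, nowMs);
    this.messages.push({ body: body, dedupId: dedupId });
    return true;
  }

  // Messages come out in the order they were accepted.
  receive(): QueuedMessage | undefined {
    return this.messages.shift();
  }

  size(): number {
    return this.messages.length;
  }
}

const fifo = new ToyFifoQueue();
const sentFirst = fifo.send("Execute Atomic HOTW", "RIPE-NA-PD2024", 0);
const sentDuplicate = fifo.send("Execute Atomic HOTW", "RIPE-NA-PD2024", 60 * 1000);
const sentLater = fifo.send("Execute Atomic HOTW", "RIPE-NA-PD2024", 6 * 60 * 1000);
```

The second send, one minute later, is dropped; the third, six minutes later, falls outside the window and is enqueued again. A deduplication ID should therefore identify one unit of work, not a fleet and event forever.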
In EPIC (Java)
// Imports (AWS SDK for Java v1 and aws-lambda-java-events)
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.MessageAttributeValue;
import com.amazonaws.services.sqs.model.SendMessageRequest;
import java.util.HashMap;
import java.util.Map;
// Sending a message to SQS
AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
String queueUrl = sqs.getQueueUrl("atomicHOTWSQSQueue.fifo").getQueueUrl();
Map<String, MessageAttributeValue> attributes = new HashMap<>();
attributes.put("FleetId", new MessageAttributeValue()
    .withDataType("String").withStringValue("RIPE-NA"));
attributes.put("EventId", new MessageAttributeValue()
    .withDataType("String").withStringValue("PD2024"));
SendMessageRequest request = new SendMessageRequest()
    .withQueueUrl(queueUrl)
    .withMessageBody("Execute Atomic HOTW")
    .withMessageGroupId("default") // required for FIFO
    .withMessageDeduplicationId("RIPE-NA-PD2024") // prevents duplicates within the 5-minute deduplication window
    .withMessageAttributes(attributes);
sqs.sendMessage(request);
// Receiving messages (Lambda is triggered automatically)
public void atomicHotwExecutor(SQSEvent sqsEvent, Context context) {
    for (SQSEvent.SQSMessage message : sqsEvent.getRecords()) {
        String fleetId = message.getMessageAttributes().get("FleetId").getStringValue();
        // process...
    }
    // Lambda automatically deletes the batch's messages after the function returns successfully
}
Dead Letter Queue (DLQ):
Message fails → retries 3 times → goes to DLQ
→ Team gets alerted
→ Can inspect why it failed
→ Can reprocess manually
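The retry-then-DLQ flow can be sketched without any AWS calls. `processWithRedrive` is a hypothetical helper: it counts how often each message has been received and moves it to a dead-letter list once `maxReceiveCount` attempts have failed, which is the idea behind an SQS redrive policy. Timing (visibility timeouts) is deliberately left out.

```typescript
// Toy redrive policy: after maxReceiveCount failed receives, move the message to the DLQ.
interface TrackedMessage {
  body: string;
  receiveCount: number;
}

function processWithRedrive(
  bodies: string[],
  maxReceiveCount: number,
  handler: (body: string) => void // throws to signal a failed attempt
): { processed: string[]; deadLetters: string[] } {
  const queue: TrackedMessage[] = bodies.map((body) => ({ body: body, receiveCount: 0 }));
  const processed: string[] = [];
  const deadLetters: string[] = [];

  while (queue.length > 0) {
    const message = queue.shift() as TrackedMessage;
    message.receiveCount += 1;
    try {
      handler(message.body);
      processed.push(message.body); // success: the message is deleted
    } catch (_err) {
      if (message.receiveCount >= maxReceiveCount) {
        deadLetters.push(message.body); // attempts exhausted: goes to the DLQ
      } else {
        queue.push(message); // becomes visible again for another attempt
      }
    }
  }
  return { processed: processed, deadLetters: deadLetters };
}

let attempts = 0;
const outcome = processWithRedrive(["good", "bad"], 3, (body) => {
  attempts += 1;
  if (body === "bad") {
    throw new Error("simulated failure");
  }
});
```

With `maxReceiveCount` set to 3, the failing message is attempted three times and then parked, while the healthy message is processed once and is unaffected.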
EPIC's SQS Pattern
Weekly Cron
↓
HotwHandler.handleSQSRequestForUpdateSpco()
  creates 1 message per fleet × event
  → puts in validateAndUpdateSPCOSQSQueue
HotwHandler.validateAndUpdateSpco()
  processes each message
  runs HOTW for that fleet
Why use a queue instead of calling directly?
- 500 fleets × 3 events = 1500 HOTW runs
- Can't do all 1500 synchronously (would time out)
- Queue spreads work across time
- If one fleet fails, others still process
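The arithmetic above is a plain cross product of fleets and events. The sketch below builds one message per pair; `buildMessages` and the generated IDs are hypothetical, used only to reproduce the 500 × 3 count, and the deduplication ID format follows the fleet-event pattern shown in the Java example.

```typescript
interface SpcoMessage {
  fleetId: string;
  eventId: string;
  dedupId: string;
}

// One message per (fleet, event) pair, so each pair can succeed or fail on its own.
function buildMessages(fleetIds: string[], eventIds: string[]): SpcoMessage[] {
  const messages: SpcoMessage[] = [];
  for (const fleetId of fleetIds) {
    for (const eventId of eventIds) {
      messages.push({ fleetId: fleetId, eventId: eventId, dedupId: fleetId + "-" + eventId });
    }
  }
  return messages;
}

// Hypothetical IDs, generated only to reproduce the 500 x 3 arithmetic.
const fleetIds: string[] = [];
for (let i = 0; i < 500; i += 1) {
  fleetIds.push("FLEET-" + i);
}
const eventIds = ["EVENT-A", "EVENT-B", "EVENT-C"];
const spcoMessages = buildMessages(fleetIds, eventIds);
```

Each of the 1500 messages is then small enough for one Lambda invocation to finish well inside its timeout.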
Section 4: Amazon SNS (Pub/Sub Notifications)
What it is
SNS (Simple Notification Service) is a pub/sub (publish/subscribe) service. One publisher sends a message; multiple subscribers receive it.
Why EPIC uses it
- MilestoneSNSTopic: When HOTW completes, publishes result. Multiple Lambda functions subscribe and react.
- Notification emails: SNS triggers email notifications to service owners
- Cross-service events: Decoupled communication between components
How it works
Publisher → sends to SNS Topic
→ fans out to all subscribers:
- Lambda A (update milestone)
- Lambda B (trigger Apollo refresh)
- Email/SES (send email notification)
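The fan-out can be illustrated with a tiny in-memory topic. `ToyTopic` is a hypothetical class, not the SNS client: it shows only that one publish call results in one delivery per subscriber, and that the publisher does not need to know who is listening.

```typescript
type Subscriber = (message: string) => void;

class ToyTopic {
  private readonly subscribers: Subscriber[] = [];

  subscribe(subscriber: Subscriber): void {
    this.subscribers.push(subscriber);
  }

  // Every subscriber receives its own copy; returns the number of deliveries.
  publish(message: string): number {
    for (const subscriber of this.subscribers) {
      subscriber(message);
    }
    return this.subscribers.length;
  }
}

const delivered: string[] = [];
const milestoneTopic = new ToyTopic();
milestoneTopic.subscribe((m) => { delivered.push("update milestone: " + m); });
milestoneTopic.subscribe((m) => { delivered.push("trigger Apollo refresh: " + m); });
milestoneTopic.subscribe((m) => { delivered.push("send email notification: " + m); });

const deliveries = milestoneTopic.publish("HOTW complete");
```

Adding a fourth reaction later means adding a subscriber; the publishing code stays unchanged.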
In EPIC (Java)
// Imports (AWS SDK for Java v1)
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.MessageAttributeValue;
import com.amazonaws.services.sns.model.PublishRequest;
// Publishing to SNS (from HotwUpscalingHelper)
AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();
Map<String, MessageAttributeValue> attributes = new HashMap<>();
attributes.put("FleetId", new MessageAttributeValue()
    .withDataType("String").withStringValue(fleetId));
attributes.put("EventId", new MessageAttributeValue()
    .withDataType("String").withStringValue(eventId));
PublishRequest publishRequest = new PublishRequest()
    .withTopicArn(snsTopicArn)
    .withMessage(JsonUtil.toJson(hotwResult))
    .withMessageAttributes(attributes);
sns.publish(publishRequest);
SNS Topics in EPIC:
- MilestoneSNSTopic: HOTW results trigger milestone updates
- TicketingReadinessSNSTopic: ticket status changes
- notificationSNS: general EPIC notifications → email via SES
Section 5: Amazon API Gateway
What it is
API Gateway is a managed service that creates HTTP REST APIs. It receives HTTP requests from the frontend and routes them to Lambda functions.
Why EPIC uses it
- Frontend can't call Lambda directly (security)
- API Gateway handles: authentication, rate limiting, CORS, logging
- Multiple Lambda functions can share one API Gateway
How it works
Browser → HTTPS request → API Gateway → Lambda → Response → Browser
Authentication: EPIC uses IAM authentication. The frontend must assume an IAM role before making API calls. This role is trusted by the API Gateway.
In EPIC
// CDK: create API Gateway and add route
import { AuthorizationType, Cors, LambdaIntegration, RestApi } from 'aws-cdk-lib/aws-apigateway';
const api = new RestApi(this, 'HOTWApi', {
  defaultCorsPreflightOptions: {
    allowOrigins: Cors.ALL_ORIGINS, // allow any origin (CORS)
    allowMethods: Cors.ALL_METHODS
  }
});
// Add route: POST /hotw/runDetail
const hotw = api.root.addResource('hotw');
const runDetail = hotw.addResource('runDetail');
runDetail.addMethod(
  'POST',
  new LambdaIntegration(createOrUpdateHotwRunDetailsLambda),
  { authorizationType: AuthorizationType.IAM } // requires IAM auth
);
In frontend (calling the API):
// backend_api.js: frontend calls API Gateway
const apigClient = await getApigClient(
  AWS_CONFIG_CONSTANTS.BACKEND_API_URL, // API Gateway URL
  AWS_CONFIG_CONSTANTS.BACKEND_API_HARMONY_ROLE // IAM role to assume
);
return apigClient.invokeApi({}, '/hotw/runDetail', 'POST', {}, body);
Section 6: Amazon S3 (File Storage)
What it is
S3 (Simple Storage Service) stores files (objects) in buckets. Can store any file type.
Why EPIC uses it
- Axon traffic data: Axon writes traffic metrics to S3; EPIC reads them
- FMBI data: Hardware order analytics stored in S3
- Athena queries: SQL queries run against data stored in S3
In EPIC
// CDK: create S3 bucket
import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3';
const axonDataBucket = new Bucket(this, 'AxonDataBucket', {
  bucketName: 'epic-axon-traffic-data',
  versioned: true,
  encryption: BucketEncryption.S3_MANAGED
});
// Java: read file from S3
AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
// try-with-resources closes the object and releases its HTTP connection
try (S3Object object = s3Client.getObject("epic-axon-traffic-data", "2024/06/01/traffic.json")) {
    String content = IOUtils.toString(object.getObjectContent());
    AxonTrafficDataModel data = JsonUtil.parseJson(content, AxonTrafficDataModel.class);
    // ... use data
}
Section 7: Amazon VPC (Virtual Private Cloud)
What it is
VPC creates a private, isolated section of AWS cloud. Resources inside a VPC communicate privately; traffic doesn't go through the public internet.
Why EPIC uses it
- Security: Lambda functions and databases are hidden from the internet
- Network isolation: EPIC resources can only talk to each other (and approved external services)
- Compliance: Required for handling sensitive capacity planning data
In EPIC
VPC: 10.0.0.0/16
├── Public Subnets (10.0.1.0/24, 10.0.2.0/24)
│   └── NAT Gateway (for Lambda to call external APIs)
└── Private Subnets (10.0.3.0/24, 10.0.4.0/24)
    ├── Lambda functions (EPICBackend, EPICBackendTriggers)
    └── RDS/Aurora MySQL
All Lambda functions run in private subnets. They use the NAT gateway to call external Amazon APIs (Apollo, FMC, SIM, etc.).
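The CIDR notation in the layout above says which addresses belong to the VPC and to each subnet. As an illustration, the sketch below checks whether an IPv4 address falls inside a CIDR block; `inCidr` is a hypothetical helper and the sample address is made up for the example.

```typescript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + parseInt(octet, 10), 0);
}

// True when `ip` falls inside `cidr` (for example "10.0.0.0/16").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const base = cidr.slice(0, slash);
  const prefixBits = parseInt(cidr.slice(slash + 1), 10);
  const blockSize = Math.pow(2, 32 - prefixBits); // number of addresses in the block
  const start = Math.floor(ipToInt(base) / blockSize) * blockSize;
  const value = ipToInt(ip);
  return value >= start && value < start + blockSize;
}

const vpcCidr = "10.0.0.0/16";
const privateLambdaIp = "10.0.3.25"; // hypothetical address in a private subnet

const insideVpc = inCidr(privateLambdaIp, vpcCidr);
const insidePrivateSubnet = inCidr(privateLambdaIp, "10.0.3.0/24");
const insidePublicSubnet = inCidr(privateLambdaIp, "10.0.1.0/24");
```

A /16 block covers 65,536 addresses and each /24 subnet covers 256 of them, so all four subnets fit inside the VPC range without overlapping.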
Section 8: Amazon RDS Aurora (Relational Database)
What it is
Aurora is a managed relational database in the RDS family, available in MySQL-compatible and PostgreSQL-compatible editions. EPIC uses MySQL-compatible Aurora.
Why EPIC uses it
- HOTW data: Complex relational queries (JOINs, aggregations) needed for HOTW reporting
- Transaction support: Multiple inserts must succeed together (or all fail)
- Familiar SQL: Developers know SQL; easier than NoSQL for operational data
In EPIC
// Node.js: MySQL query with transaction
const auroraMysqlClient = new AuroraMysqlClient();
try {
  await auroraMysqlClient.startTransaction(); // BEGIN
  await hotwOps.insertHotwExecutionDetails( // INSERT execution
    tableName, fleetIndexId, runId, ...);
  await hotwOps.insertCapacityOverrideDetails( // INSERT overrides
    tableName, fleetIndexId, runId, details);
  await auroraMysqlClient.commitTransaction(); // COMMIT (saves both)
} catch (err) {
  await auroraMysqlClient.rollbackTransaction(); // ROLLBACK (undoes both)
  throw err; // rethrow so the caller sees the failure
}
Connection: Lambda connects to Aurora via RDS Proxy (connection pooler) inside the VPC.
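The all-or-nothing behavior of the transaction pattern above can be shown with an in-memory stand-in. `ToyTransactionalTable` and `insertBoth` are hypothetical and do not talk to Aurora; they only demonstrate that a failure on the second insert leaves no trace of the first.

```typescript
class ToyTransactionalTable {
  private committed: string[] = [];
  private pending: string[] | undefined;

  startTransaction(): void {
    this.pending = this.committed.slice(); // work on a private copy
  }

  insert(row: string): void {
    if (this.pending === undefined) {
      throw new Error("no open transaction");
    }
    this.pending.push(row);
  }

  commitTransaction(): void {
    if (this.pending === undefined) {
      throw new Error("no open transaction");
    }
    this.committed = this.pending; // both inserts become visible together
    this.pending = undefined;
  }

  rollbackTransaction(): void {
    this.pending = undefined; // discard everything since startTransaction
  }

  rows(): string[] {
    return this.committed.slice();
  }
}

// Inserts two rows atomically; passing undefined simulates a failing second insert.
function insertBoth(table: ToyTransactionalTable, execution: string, override: string | undefined): boolean {
  table.startTransaction();
  try {
    table.insert(execution);
    if (override === undefined) {
      throw new Error("simulated failure on second insert");
    }
    table.insert(override);
    table.commitTransaction();
    return true;
  } catch (_err) {
    table.rollbackTransaction();
    return false;
  }
}

const hotwTable = new ToyTransactionalTable();
const firstRun = insertBoth(hotwTable, "execution-1", "override-1");
const secondRun = insertBoth(hotwTable, "execution-2", undefined);
const rowsAfter = hotwTable.rows();
```

After the failed second run the table still holds only the two rows from the first run, so readers never see an execution record without its overrides.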
Section 9: Amazon CloudWatch (Monitoring)
What it is
CloudWatch collects logs, metrics, and creates alarms for AWS resources.
Why EPIC uses it
- Lambda logs: Every console.log() or logger.log() goes to CloudWatch Logs
- Alarms: Alert team when error rate is too high
- Custom metrics: Track HOTW success/failure counts
- Scheduled events: CloudWatch Events trigger Lambda on cron schedules
In EPIC
// CDK: Create CloudWatch alarm
import { Alarm, ComparisonOperator } from 'aws-cdk-lib/aws-cloudwatch';
new Alarm(this, 'HotwErrorAlarm', {
  metric: hotwLambda.metricErrors(),
  threshold: 5,
  evaluationPeriods: 1,
  comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
  alarmDescription: 'More than 5 HOTW errors in 1 period',
});
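How the threshold, comparison operator, and evaluation periods combine can be sketched in a few lines. `isInAlarm` is a hypothetical simplification: it treats the alarm as firing when each of the most recent `evaluationPeriods` datapoints breaches the threshold, and ignores details such as missing data that the real service also handles.

```typescript
type Comparison = "GREATER_THAN_THRESHOLD" | "GREATER_THAN_OR_EQUAL_TO_THRESHOLD";

function breaches(value: number, threshold: number, comparison: Comparison): boolean {
  return comparison === "GREATER_THAN_THRESHOLD" ? value > threshold : value >= threshold;
}

// Simplified rule: in alarm when the last `evaluationPeriods` datapoints all breach.
function isInAlarm(
  datapoints: number[],
  threshold: number,
  evaluationPeriods: number,
  comparison: Comparison
): boolean {
  if (datapoints.length < evaluationPeriods) {
    return false; // not enough data to evaluate
  }
  const recent = datapoints.slice(datapoints.length - evaluationPeriods);
  return recent.every((value) => breaches(value, threshold, comparison));
}

// Error counts per period, oldest first (made-up numbers).
const exactlyFive = isInAlarm([0, 2, 5], 5, 1, "GREATER_THAN_THRESHOLD");
const sixErrors = isInAlarm([0, 2, 6], 5, 1, "GREATER_THAN_THRESHOLD");
```

With `GREATER_THAN_THRESHOLD` and a threshold of 5, exactly five errors do not trigger the alarm; the sixth does. That matches the "More than 5" wording in the alarm description.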
Cron schedule (triggers HOTW weekly):
// CDK: Schedule Lambda to run every Monday at 09:00 UTC
import { Rule, Schedule } from 'aws-cdk-lib/aws-events';
import { LambdaFunction } from 'aws-cdk-lib/aws-events-targets';
new Rule(this, 'WeeklyHotwTrigger', {
  schedule: Schedule.cron({
    weekDay: 'MON',
    hour: '09', // cron schedules are evaluated in UTC
    minute: '00'
  }),
  targets: [new LambdaFunction(hotwTriggerLambda)]
});
// Java: Log to CloudWatch (via Lambda logger)
logger.log("HOTW completed for fleet: " + fleetId + " with status: " + status);
// This appears in CloudWatch Logs > /aws/lambda/HotwLambda
Section 10: AWS IAM (Identity and Access Management)
What it is
IAM controls who/what can access AWS resources. Uses Roles and Policies.
Why EPIC uses it
- Lambda execution roles: Each Lambda has a role with specific permissions
- API Gateway authorization: Frontend must assume an IAM role to call APIs
- Cross-account access: Frontend (Harmony account) calls EPIC's API Gateway
In EPIC
// CDK: Grant Lambda permission to read DynamoDB
import { AccountPrincipal, CompositePrincipal, Effect, PolicyStatement, Role } from 'aws-cdk-lib/aws-iam';
fleetTable.grantReadData(getFleetLambda);
// → Creates IAM policy: Allow read actions (dynamodb:GetItem, dynamodb:Query, dynamodb:Scan, ...) on FleetTable
// Grant Lambda permission to publish to SNS
lambdaFunction.addToRolePolicy(new PolicyStatement({
  effect: Effect.ALLOW,
  actions: ["sns:Publish"],
  resources: [snsTopicArn]
}));
// Allow Harmony frontend accounts to call API Gateway
const accessRole = new Role(this, 'HarmonyAPIAccessRole', {
  assumedBy: new CompositePrincipal(
    new AccountPrincipal(HARMONY_ACCOUNT_BETA),
    new AccountPrincipal(HARMONY_ACCOUNT_PROD_IAD)
  )
});
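The policies above follow two IAM rules: nothing is allowed unless a statement allows it, and an explicit deny always wins. The sketch below is a simplified, hypothetical evaluator (`isAllowed`, trailing-`*` wildcards only) with a made-up topic ARN; real IAM evaluation has more inputs, such as conditions and permission boundaries.

```typescript
interface PolicyStatementSketch {
  effect: "Allow" | "Deny";
  actions: string[];
  resources: string[];
}

// Supports exact matches and a trailing "*" wildcard only.
function matchesPattern(pattern: string, value: string): boolean {
  if (pattern.charAt(pattern.length - 1) === "*") {
    const prefix = pattern.slice(0, -1);
    return value.slice(0, prefix.length) === prefix;
  }
  return pattern === value;
}

function isAllowed(statements: PolicyStatementSketch[], action: string, resource: string): boolean {
  let allowed = false;
  for (const statement of statements) {
    const applies =
      statement.actions.some((a) => matchesPattern(a, action)) &&
      statement.resources.some((r) => matchesPattern(r, resource));
    if (!applies) {
      continue;
    }
    if (statement.effect === "Deny") {
      return false; // explicit deny always wins
    }
    allowed = true;
  }
  return allowed; // default is deny when no statement allows the request
}

const topicArn = "arn:aws:sns:us-east-1:123456789012:ExampleTopic"; // hypothetical ARN
const lambdaRolePolicy: PolicyStatementSketch[] = [
  { effect: "Allow", actions: ["sns:Publish"], resources: [topicArn] },
];

const canPublish = isAllowed(lambdaRolePolicy, "sns:Publish", topicArn);
const canDelete = isAllowed(lambdaRolePolicy, "sns:DeleteTopic", topicArn);
```

Granting only `sns:Publish` on one topic means the Lambda cannot delete the topic or publish elsewhere, which is the least-privilege idea behind giving each Lambda its own role.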
Section 11: Amazon SES (Simple Email Service)
What it is
SES sends emails programmatically. EPIC uses it for automated notifications.
Why EPIC uses it
- Hardware order summary emails after HOTW
- Projection gathering reminder emails
- Peak readiness report emails to leaders
In EPIC (triggered via SNS)
// EmailNotification.js
await ses.sendEmail({
  Source: 'epic-team@amazon.com',
  Destination: { ToAddresses: ['service-owner@amazon.com'] },
  Message: {
    Subject: { Data: 'HOTW Hardware Order Placed - RIPE-NA' },
    Body: {
      Html: { Data: hardwareOrderEmailHtml }
    }
  }
}).promise();
Section 12: AWS Step Functions (Workflow Orchestration)
What it is
Step Functions create visual state machines: workflows in which each step is typically a Lambda function.
Why EPIC uses it
- Complex multi-step workflows (onboarding, approval workflows)
- CustomInputSF: custom step function inputs for special workflows
Section 13: Amazon Athena (SQL on S3)
What it is
Athena runs SQL queries directly on data stored in S3. No database needed.
Why EPIC uses it
- Historical analytics on Axon traffic data
- FMBI (hardware order analytics) queries
- Ad-hoc data exploration by the EPIC team
Section 14: AWS CDK (Cloud Development Kit)
What it is
CDK is a framework for defining AWS infrastructure as code.
Why EPIC uses it
- Version control for infrastructure (changes are reviewed like code)
- Repeatable deployments (beta, gamma, prod are built from the same definitions)
- TypeScript type safety catches mistakes
Key CDK Classes Used in EPIC
| CDK Class | What it creates |
|---|---|
| aws-cdk-lib/aws-lambda.Function | Lambda function |
| aws-cdk-lib/aws-dynamodb.Table | DynamoDB table |
| aws-cdk-lib/aws-sqs.Queue | SQS queue |
| aws-cdk-lib/aws-sns.Topic | SNS topic |
| aws-cdk-lib/aws-apigateway.RestApi | API Gateway |
| aws-cdk-lib/aws-ec2.Vpc | VPC |
| aws-cdk-lib/aws-rds.DatabaseCluster | RDS Aurora |
| aws-cdk-lib/aws-iam.Role | IAM role |
| aws-cdk-lib/aws-cloudwatch.Alarm | CloudWatch alarm |
| aws-cdk-lib/aws-events.Rule | CloudWatch event rule (cron) |
| @amzn/pipelines.DeploymentStack | Amazon's internal CDK extension |
Section 15: Quick Reference: When to Use What
| Need | Use |
|---|---|
| Store structured records (fleet, service, event) | DynamoDB |
| Store operational data with SQL queries (HOTW) | Aurora MySQL |
| Store files, large data | S3 |
| Run code on demand | Lambda |
| HTTP API for frontend | API Gateway |
| Async work queue | SQS |
| One-to-many notifications | SNS |
| Send emails | SES |
| Monitor and alert | CloudWatch |
| Access control | IAM |
| Private networking | VPC |
| Scheduled jobs | CloudWatch Events (cron) |
| Complex multi-step workflows | Step Functions |
| SQL queries on S3 | Athena |
| Infrastructure as code | CDK |