What you’ll learn: every AWS service used in EPIC, including what it is, why EPIC uses it, and real examples from the codebase.
Assumes: No prior AWS knowledge needed.


Section 1: AWS Lambda (Serverless Functions)

What it is

Lambda is “serverless” computing. You write a function, upload it to AWS, and it runs when triggered. You don’t manage servers.

Why EPIC uses it

  • Cost efficient: you only pay when code is running
  • Auto-scales: AWS runs as many copies as needed automatically
  • Simple deployment: just upload new code
  • Supports multiple languages: EPIC uses Node.js (backend) and Java (triggers)

How it works

Event/Trigger
    ↓
AWS spins up Lambda container
    ↓
Your code runs
    ↓
Returns response
    ↓
Container may stay warm for a while (often several minutes; AWS does not guarantee a duration), then shuts down
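
The lifecycle above can be sketched in plain TypeScript. This is a minimal simulation under stated assumptions, not EPIC code: `coldStarts`, `cachedConfig`, `loadConfig`, and `handler` are hypothetical names, and no AWS SDK is involved. It only illustrates that module-level state is set up once per container and reused while the container stays warm.

```typescript
// Module-level state is initialised once per container (the "cold start")
// and reused by every invocation that lands on the same warm container.
let coldStarts = 0;
let cachedConfig: { loadedAt: number } | undefined;

function loadConfig(): { loadedAt: number } {
    coldStarts += 1; // expensive setup (clients, config) would happen here
    return { loadedAt: Date.now() };
}

// Hypothetical handler: input in, response out, like a Lambda handler.
function handler(input: { fleetId: string }): { statusCode: number; body: string } {
    if (cachedConfig === undefined) {
        cachedConfig = loadConfig(); // only runs on the first (cold) invocation
    }
    return { statusCode: 200, body: `processed ${input.fleetId}` };
}

// Two invocations on the same "container": setup runs only once.
const firstCall = handler({ fleetId: "RIPE-NA" });
const secondCall = handler({ fleetId: "RIPE-NA" });
console.log(firstCall.body, secondCall.statusCode, `cold starts: ${coldStarts}`);
```

This is why expensive setup is usually placed outside the handler body: warm invocations skip it.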

In EPIC

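// CDK creates a Lambda function
import { Duration } from 'aws-cdk-lib';
import { Function, Runtime } from 'aws-cdk-lib/aws-lambda';

new Function(this, 'HOTWDashboardLambda', {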
    handler: 'handler.HOTW.getHotwDashboardDetails',
    // handler = file.class.method
    runtime: Runtime.NODEJS_16_X,
    timeout: Duration.seconds(300),  // max 5 minutes to run
    memorySize: 512,                  // 512 MB RAM allocated
});

Triggers used in EPIC:

  • API Gateway → Lambda (REST API calls from frontend)
  • SQS → Lambda (queue messages trigger Java handlers)
  • SNS → Lambda (pub/sub notifications)
  • DynamoDB Streams → Lambda (DB changes trigger updates)
  • CloudWatch Events/Cron (now Amazon EventBridge) → Lambda (scheduled jobs)

Lambda limits:

  • Max execution time: 15 minutes
  • Max memory: 10 GB
  • EPIC uses 5 minutes max (most jobs are under 2 minutes)

Section 2: Amazon DynamoDB (NoSQL Database)

What it is

DynamoDB is a fully managed NoSQL database. It stores data as JSON-like items and keeps key-based reads and writes fast as tables grow.

Why EPIC uses it

  • Primary data store: Fleet, Service, Event, EventPlan are all in DynamoDB
  • Automatic scaling: storage grows with the data, with no servers to resize
  • DynamoDB Streams: changes trigger Lambda functions automatically
  • Versioning: each version of an item is stored under the same partition key with a different sort key

How it works

DynamoDB Table = collection of Items
Item = JSON document (like a row in SQL but flexible structure)
Primary Key = what uniquely identifies each item

Two key models:

  1. Single-key table: { FleetId: "RIPE-NA" } → single partition key
  2. Composite key table (EPIC’s pattern): { FleetId: "RIPE-NA", VersionId: 5 } → partition key + sort key
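
The composite key model can be sketched with an in-memory stand-in for a table. This is a hedged illustration only: `items`, `putItem`, `getItem`, and `getLatest` are hypothetical helpers, not DynamoDB or EPIC APIs. It shows how several versions of one fleet live under the same partition key and how the newest one is found by sorting on the sort key.

```typescript
// Each item is addressed by partition key (FleetId) + sort key (VersionId).
interface FleetItem {
    FleetId: string;   // partition key
    VersionId: number; // sort key
    ApolloName: string;
}

// In-memory stand-in for a table.
const items: FleetItem[] = [];

// Writing with an existing (FleetId, VersionId) pair overwrites that item.
function putItem(item: FleetItem): void {
    const index = items.findIndex(
        (i) => i.FleetId === item.FleetId && i.VersionId === item.VersionId
    );
    if (index >= 0) {
        items[index] = item;
    } else {
        items.push(item);
    }
}

// Point lookup: both parts of the key are required.
function getItem(fleetId: string, versionId: number): FleetItem | undefined {
    return items.find((i) => i.FleetId === fleetId && i.VersionId === versionId);
}

// Comparable to a query on the partition key, sorted descending, limited to one item.
function getLatest(fleetId: string): FleetItem | undefined {
    return items
        .filter((i) => i.FleetId === fleetId)
        .sort((a, b) => b.VersionId - a.VersionId)[0];
}

putItem({ FleetId: "RIPE-NA", VersionId: 4, ApolloName: "RIPE_NA_PROD" });
putItem({ FleetId: "RIPE-NA", VersionId: 5, ApolloName: "RIPE_NA_PROD" });
console.log(getItem("RIPE-NA", 4)?.VersionId, getLatest("RIPE-NA")?.VersionId);
```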

In EPIC

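// Write to DynamoDB
// (with the AWS SDK v2 DocumentClient, append .promise() to each call so that await actually waits)
await dynamoDB.put({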
    TableName: 'FleetTable',
    Item: {
        FleetId: 'RIPE-NA',
        VersionId: 5,
        LatestVersionId: 5,
        ApolloName: 'RIPE_NA_PROD',
        // ... all fleet data
    }
});

// Read from DynamoDB
const result = await dynamoDB.get({
    TableName: 'FleetTable',
    Key: { FleetId: 'RIPE-NA', VersionId: 5 }
});

DynamoDB Streams: When a FleetTable item is updated, DynamoDB automatically sends the change to a stream, which triggers a Java Lambda to process it.

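// CDK: connect DynamoDB stream to Lambda
// (the table must be created with a stream enabled;
//  fleetStreamLambda is a placeholder name for the Java trigger function)
fleetStreamLambda.addEventSource(new DynamoEventSource(fleetTable, {
    startingPosition: StartingPosition.TRIM_HORIZON,
    batchSize: 10
}));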

Section 3: Amazon SQS (Message Queues)

What it is

SQS (Simple Queue Service) is a message queue. Think of it as a mailbox. One service puts a message in, another service picks it up later.

Why EPIC uses it

  • Decoupling: HOTW puts work in a queue; workers process at their own pace
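  • Retry logic: A message that is not deleted becomes visible again after the visibility timeout and is retried automatically
  • Scale: Many workers can process from the same queue in parallel
  • FIFO queues: Preserve order within a message group and drop duplicate sends made within a 5-minute window (exactly-once processing)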

How it works

Producer → puts message in queue → Consumer reads message → deletes after processing
                                                            (or message becomes visible again for retry)
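
The receive, delete, and retry cycle above can be sketched as a small in-memory queue. This is a simplified model under stated assumptions: `SketchQueue` and its methods are hypothetical, time is passed in explicitly as milliseconds, and real SQS behaviour (batching, long polling, distributed storage) is ignored.

```typescript
interface QueuedMessage {
    id: number;
    body: string;
    visibleAt: number;    // time (ms) from which the message can be received
    receiveCount: number; // how many times it has been handed to a consumer
}

class SketchQueue {
    private messages: QueuedMessage[] = [];
    private nextId = 1;
    private visibilityTimeoutMs: number;

    constructor(visibilityTimeoutMs: number) {
        this.visibilityTimeoutMs = visibilityTimeoutMs;
    }

    send(body: string, now: number): void {
        this.messages.push({ id: this.nextId++, body, visibleAt: now, receiveCount: 0 });
    }

    // Returns the first visible message and hides it for the visibility timeout.
    receive(now: number): QueuedMessage | undefined {
        const msg = this.messages.find((m) => m.visibleAt <= now);
        if (msg !== undefined) {
            msg.visibleAt = now + this.visibilityTimeoutMs;
            msg.receiveCount += 1;
        }
        return msg;
    }

    // The consumer deletes the message only after processing succeeded.
    deleteMessage(id: number): void {
        this.messages = this.messages.filter((m) => m.id !== id);
    }

    size(): number {
        return this.messages.length;
    }
}

const queue = new SketchQueue(30000);
queue.send("Execute Atomic HOTW", 0);
queue.receive(0);                           // consumer takes it, then crashes (no delete)
const whileHidden = queue.receive(10000);   // still invisible to other consumers
const retry = queue.receive(30000);         // visible again, second attempt
if (retry !== undefined) {
    queue.deleteMessage(retry.id);          // processing succeeded this time
}
console.log(whileHidden === undefined, retry?.receiveCount, queue.size()); // true 2 0
```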

Two types used in EPIC:

  1. Standard queue: At-least-once delivery; messages may arrive out of order or more than once. Used for Apollo, FLO.
  2. FIFO queue (.fifo suffix): Ordered within a message group, with duplicate sends removed (exactly-once processing). Used for HOTW, FMC, milestones.

In EPIC (Java)

// Sending a message to SQS
AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
String queueUrl = sqs.getQueueUrl("atomicHOTWSQSQueue.fifo").getQueueUrl();

Map<String, MessageAttributeValue> attributes = new HashMap<>();
attributes.put("FleetId", new MessageAttributeValue()
    .withDataType("String").withStringValue("RIPE-NA"));
attributes.put("EventId", new MessageAttributeValue()
    .withDataType("String").withStringValue("PD2024"));

SendMessageRequest request = new SendMessageRequest()
    .withQueueUrl(queueUrl)
    .withMessageBody("Execute Atomic HOTW")
    .withMessageGroupId("default")           // required for FIFO
    .withMessageDeduplicationId("RIPE-NA-PD2024")  // prevents duplicates
    .withMessageAttributes(attributes);

sqs.sendMessage(request);

// Receiving messages (Lambda is triggered automatically)
public void atomicHotwExecutor(SQSEvent sqsEvent, Context context) {
    for (SQSEvent.SQSMessage message : sqsEvent.getRecords()) {
        String fleetId = message.getMessageAttributes().get("FleetId").getStringValue();
        // process...
        // Lambda automatically deletes message after function returns successfully
    }
}

Dead Letter Queue (DLQ):

Message fails → retries 3 times → goes to DLQ
    → Team gets alerted
    → Can inspect why it failed
    → Can reprocess manually
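
The dead-letter flow above can be sketched as a retry loop with an attempt limit. This is a hypothetical model, not the SQS implementation: `processWithRedrive`, `TrackedMessage`, and the fleet names are illustrative, and the limit of 3 attempts mirrors the diagram rather than a confirmed EPIC setting.

```typescript
// After maxReceiveCount failed attempts a message is moved to a
// dead-letter list instead of being retried again.
interface TrackedMessage {
    body: string;
    receiveCount: number;
}

function processWithRedrive(
    bodies: string[],
    handle: (body: string) => void, // throws to signal failure
    maxReceiveCount: number
): { processed: string[]; deadLetters: TrackedMessage[] } {
    const pending: TrackedMessage[] = bodies.map((body) => ({ body, receiveCount: 0 }));
    const processed: string[] = [];
    const deadLetters: TrackedMessage[] = [];

    while (pending.length > 0) {
        const msg = pending.shift() as TrackedMessage;
        msg.receiveCount += 1;
        try {
            handle(msg.body);
            processed.push(msg.body);      // success: message is deleted
        } catch (err) {
            if (msg.receiveCount >= maxReceiveCount) {
                deadLetters.push(msg);     // attempts exhausted: park it for inspection
            } else {
                pending.push(msg);         // becomes visible again for another attempt
            }
        }
    }
    return { processed, deadLetters };
}

const outcome = processWithRedrive(
    ["RIPE-NA", "BAD-FLEET"],
    (body) => {
        if (body === "BAD-FLEET") {
            throw new Error("cannot process");
        }
    },
    3
);
console.log(outcome.processed, outcome.deadLetters);
```

The failing message does not block the healthy one, and it keeps its attempt count so the team can see how often it was tried.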

EPIC’s SQS Pattern

Weekly Cron
    ↓
HotwHandler.handleSQSRequestForUpdateSpco()
    creates 1 message per fleet × event
    ↓ puts in validateAndUpdateSPCOSQSQueue
HotwHandler.validateAndUpdateSpco()
    processes each message
    runs HOTW for that fleet

Why use a queue instead of calling directly?

  • 500 fleets × 3 events = 1500 HOTW runs
  • Can’t do all 1500 synchronously (would time out)
  • Queue spreads work across time
  • If one fleet fails, others still process
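
The arithmetic above can be worked through in a short sketch. The fleet and event IDs are generated placeholders, `buildMessages` is a hypothetical helper, and the 2 minutes per run is an assumed upper bound taken from the "most jobs are under 2 minutes" note in Section 1.

```typescript
interface SpcoMessage {
    fleetId: string;
    eventId: string;
    deduplicationId: string;
}

// One message per fleet × event combination.
function buildMessages(fleetIds: string[], eventIds: string[]): SpcoMessage[] {
    const result: SpcoMessage[] = [];
    for (const fleetId of fleetIds) {
        for (const eventId of eventIds) {
            result.push({ fleetId, eventId, deduplicationId: `${fleetId}-${eventId}` });
        }
    }
    return result;
}

// Placeholder inputs: 500 generated fleet IDs and 3 event IDs.
const fleetIds: string[] = [];
for (let i = 1; i <= 500; i++) {
    fleetIds.push(`FLEET-${i}`);
}
const eventIds = ["EVENT-A", "EVENT-B", "EVENT-C"];

const spcoMessages = buildMessages(fleetIds, eventIds);

// Assumed 2 minutes per run: one synchronous loop would need far longer
// than the 15-minute Lambda execution limit.
const assumedMinutesPerRun = 2;
const sequentialMinutes = spcoMessages.length * assumedMinutesPerRun;
const lambdaLimitMinutes = 15;

console.log(spcoMessages.length, sequentialMinutes, sequentialMinutes > lambdaLimitMinutes);
// 1500 3000 true
```

With one message per run, each Lambda invocation handles a single fleet and event and stays well inside the limit.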

Section 4: Amazon SNS (Pub/Sub Notifications)

What it is

SNS (Simple Notification Service) is a pub/sub (publish/subscribe) service. One publisher sends a message; multiple subscribers receive it.

Why EPIC uses it

  • MilestoneSNSTopic: When HOTW completes, publishes result. Multiple Lambda functions subscribe and react.
  • Notification emails: SNS triggers email notifications to service owners
  • Cross-service events: Decoupled communication between components

How it works

Publisher → sends to SNS Topic
                     ↓ fans out to all subscribers
         ┌───────────┼────────────────┐
    Lambda A     Lambda B         Email/SES
    (update      (trigger          (send email
    milestone)   Apollo refresh)    notification)
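
The fan-out in the diagram can be sketched as a tiny in-memory topic. `SketchTopic` and the three subscribers are hypothetical stand-ins, not the SNS API; the point is only that one publish call reaches every subscriber independently.

```typescript
type Subscriber = (message: string) => void;

class SketchTopic {
    private subscribers: Subscriber[] = [];

    subscribe(subscriber: Subscriber): void {
        this.subscribers.push(subscriber);
    }

    // Every subscriber receives its own copy of the message.
    // Returns the number of deliveries made.
    publish(message: string): number {
        for (const subscriber of this.subscribers) {
            subscriber(message);
        }
        return this.subscribers.length;
    }
}

const deliveries: string[] = [];
const milestoneTopic = new SketchTopic();

milestoneTopic.subscribe((m) => { deliveries.push(`update milestone: ${m}`); });
milestoneTopic.subscribe((m) => { deliveries.push(`trigger Apollo refresh: ${m}`); });
milestoneTopic.subscribe((m) => { deliveries.push(`send email: ${m}`); });

const delivered = milestoneTopic.publish("HOTW completed for RIPE-NA");
console.log(delivered, deliveries);
```

The publisher does not know who is subscribed, so a new consumer can be added without changing the publishing code.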

In EPIC (Java)

// Publishing to SNS (from HotwUpscalingHelper)
AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();

Map<String, MessageAttributeValue> attributes = new HashMap<>();
attributes.put("FleetId", new MessageAttributeValue()
    .withDataType("String").withStringValue(fleetId));
attributes.put("EventId", new MessageAttributeValue()
    .withDataType("String").withStringValue(eventId));

PublishRequest publishRequest = new PublishRequest()
    .withTopicArn(snsTopicArn)
    .withMessage(JsonUtil.toJson(hotwResult))
    .withMessageAttributes(attributes);

sns.publish(publishRequest);

SNS Topics in EPIC:

  • MilestoneSNSTopic: HOTW results trigger milestone updates
  • TicketingReadinessSNSTopic: ticket status changes
  • notificationSNS: general EPIC notifications → email via SES

Section 5: Amazon API Gateway

What it is

API Gateway is a managed service that creates HTTP REST APIs. It receives HTTP requests from the frontend and routes them to Lambda functions.

Why EPIC uses it

  • The frontend does not invoke Lambda directly; API Gateway is the single, controlled entry point (security)
  • API Gateway handles: authentication, rate limiting, CORS, logging
  • Multiple Lambda functions can share one API Gateway

How it works

Browser → HTTPS request → API Gateway → Lambda → Response → Browser

Authentication: EPIC uses IAM authentication. The frontend assumes an IAM role and signs each request with that role’s temporary credentials (Signature Version 4). API Gateway accepts the call only if the role is allowed to invoke the API.
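
The API Gateway to Lambda hop in the flow above can be sketched as a handler that takes a request object and returns a status code, headers, and a string body, which is the shape used by Lambda proxy integration. This is an illustrative subset only: `ProxyRequest`, `ProxyResponse`, and `runDetailHandler` are hypothetical names, the validation rules are assumptions, and the function is synchronous here for simplicity (real Node.js handlers are usually async).

```typescript
// Subset of the request/response shapes used by Lambda proxy integration.
interface ProxyRequest {
    httpMethod: string;
    path: string;
    body: string | null;
}

interface ProxyResponse {
    statusCode: number;
    headers: { [headerName: string]: string };
    body: string; // always a string; JSON must be serialised by the handler
}

// Hypothetical handler for POST /hotw/runDetail.
function runDetailHandler(request: ProxyRequest): ProxyResponse {
    const headers = { "Content-Type": "application/json" };
    if (request.httpMethod !== "POST" || request.body === null) {
        return { statusCode: 400, headers, body: JSON.stringify({ error: "POST with a JSON body is required" }) };
    }
    const payload = JSON.parse(request.body) as { fleetId?: string };
    if (typeof payload.fleetId !== "string") {
        return { statusCode: 400, headers, body: JSON.stringify({ error: "fleetId is required" }) };
    }
    return { statusCode: 200, headers, body: JSON.stringify({ fleetId: payload.fleetId, saved: true }) };
}

const accepted = runDetailHandler({
    httpMethod: "POST",
    path: "/hotw/runDetail",
    body: '{"fleetId":"RIPE-NA"}',
});
const rejected = runDetailHandler({ httpMethod: "GET", path: "/hotw/runDetail", body: null });
console.log(accepted.statusCode, accepted.body, rejected.statusCode);
```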

In EPIC

// CDK: create API Gateway and add route
const api = new RestApi(this, 'HOTWApi', {
    defaultCorsPreflightOptions: {
        allowOrigins: Cors.ALL_ORIGINS,  // allow any origin (CORS)
        allowMethods: Cors.ALL_METHODS
    }
});

// Add route: POST /hotw/runDetail
const hotw = api.root.addResource('hotw');
const runDetail = hotw.addResource('runDetail');
runDetail.addMethod(
    'POST',
    new LambdaIntegration(createOrUpdateHotwRunDetailsLambda),
    { authorizationType: AuthorizationType.IAM }  // requires IAM auth
);

In frontend (calling the API):

// backend_api.js: frontend calls API Gateway
const apigClient = await getApigClient(
    AWS_CONFIG_CONSTANTS.BACKEND_API_URL,          // API Gateway URL
    AWS_CONFIG_CONSTANTS.BACKEND_API_HARMONY_ROLE  // IAM role to assume
);
return apigClient.invokeApi({}, '/hotw/runDetail', 'POST', {}, body);

Section 6: Amazon S3 (File Storage)

What it is

S3 (Simple Storage Service) stores files (objects) of any type in buckets.

Why EPIC uses it

  • Axon traffic data: Axon writes traffic metrics to S3; EPIC reads them
  • FMBI data: Hardware order analytics stored in S3
  • Athena queries: SQL queries run against data stored in S3

In EPIC

// CDK: create S3 bucket
const axonDataBucket = new Bucket(this, 'AxonDataBucket', {
    bucketName: 'epic-axon-traffic-data',
    versioned: true,
    encryption: BucketEncryption.S3_MANAGED
});

// Java: read file from S3
AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
String content;
// try-with-resources closes the object stream and releases the HTTP connection
try (S3Object object = s3Client.getObject("epic-axon-traffic-data", "2024/06/01/traffic.json")) {
    content = IOUtils.toString(object.getObjectContent());
}
AxonTrafficDataModel data = JsonUtil.parseJson(content, AxonTrafficDataModel.class);

Section 7: Amazon VPC (Virtual Private Cloud)

What it is

VPC creates a private, isolated section of AWS cloud. Resources inside a VPC communicate privately; traffic doesn’t go through the public internet.

Why EPIC uses it

  • Security: Lambda functions and databases are hidden from the internet
  • Network isolation: EPIC resources can only talk to each other (and approved external services)
  • Compliance: Required for handling sensitive capacity planning data

In EPIC

VPC: 10.0.0.0/16
├── Public Subnets (10.0.1.0/24, 10.0.2.0/24)
│   └── NAT Gateway (for Lambda to call external APIs)
└── Private Subnets (10.0.3.0/24, 10.0.4.0/24)
    ├── Lambda functions (EPICBackend, EPICBackendTriggers)
    └── RDS/Aurora MySQL

All Lambda functions run in private subnets. They use the NAT gateway to call external Amazon APIs (Apollo, FMC, SIM, etc.).
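
The CIDR layout above can be checked with a little arithmetic. The helpers below (`ipToInt`, `parseCidr`, `cidrContains`, `addressCount`) are hypothetical and handle well-formed IPv4 CIDR strings only; they are a sketch for reasoning about the address ranges, not part of any AWS tooling.

```typescript
// Convert a dotted IPv4 address to a 32-bit number.
function ipToInt(ip: string): number {
    return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function parseCidr(cidr: string): { base: number; prefix: number } {
    const parts = cidr.split("/");
    return { base: ipToInt(parts[0]), prefix: Number(parts[1]) };
}

// True when every address of `inner` falls inside `outer`.
function cidrContains(outer: string, inner: string): boolean {
    const o = parseCidr(outer);
    const i = parseCidr(inner);
    if (i.prefix < o.prefix) {
        return false; // inner block is larger than outer
    }
    const blockSize = Math.pow(2, 32 - o.prefix);
    return Math.floor(i.base / blockSize) === Math.floor(o.base / blockSize);
}

// Total addresses in a block (AWS reserves 5 of them in every subnet).
function addressCount(cidr: string): number {
    return Math.pow(2, 32 - parseCidr(cidr).prefix);
}

const vpcCidr = "10.0.0.0/16";
const subnetCidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24", "10.0.4.0/24"];

for (const subnet of subnetCidrs) {
    console.log(subnet, cidrContains(vpcCidr, subnet), addressCount(subnet));
}
console.log(cidrContains(vpcCidr, "10.1.0.0/24")); // false: outside the VPC range
```

Each /24 subnet holds 256 addresses, and the /16 VPC leaves room for 256 such subnets.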


Section 8: Amazon RDS Aurora (Relational Database)

What it is

Amazon Aurora is a managed relational database offered through Amazon RDS, with MySQL-compatible and PostgreSQL-compatible editions. EPIC uses MySQL-compatible Aurora.

Why EPIC uses it

  • HOTW data: Complex relational queries (JOINs, aggregations) needed for HOTW reporting
  • Transaction support: Multiple inserts must succeed together (or all fail)
  • Familiar SQL: Developers know SQL; easier than NoSQL for operational data

In EPIC

// Node.js: MySQL query with transaction
const auroraMysqlClient = new AuroraMysqlClient();

try {
    await auroraMysqlClient.startTransaction();   // BEGIN
    
    await hotwOps.insertHotwExecutionDetails(     // INSERT execution
        tableName, fleetIndexId, runId, ...);
    
    await hotwOps.insertCapacityOverrideDetails(  // INSERT overrides
        tableName, fleetIndexId, runId, details);
    
    await auroraMysqlClient.commitTransaction();   // COMMIT (saves both)
} catch (err) {
    await auroraMysqlClient.rollbackTransaction(); // ROLLBACK (undoes both)
    throw err;                                     // rethrow so the caller sees the failure
}

Connection: Lambda connects to Aurora via RDS Proxy (connection pooler) inside the VPC.
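
The begin, commit, and rollback pattern from the snippet above can be factored into a reusable helper. This is a sketch under stated assumptions: `TransactionClient` only mirrors the three method names used above, while `withTransaction`, `createRecordingClient`, and `demo` are hypothetical and use an in-memory fake instead of a real database.

```typescript
// Minimal interface matching the three calls used in the snippet above.
interface TransactionClient {
    startTransaction(): Promise<void>;
    commitTransaction(): Promise<void>;
    rollbackTransaction(): Promise<void>;
}

// Runs `work` inside a transaction: commit on success, rollback and rethrow on failure.
async function withTransaction<T>(client: TransactionClient, work: () => Promise<T>): Promise<T> {
    await client.startTransaction();
    try {
        const result = await work();
        await client.commitTransaction();
        return result;
    } catch (err) {
        await client.rollbackTransaction();
        throw err; // the caller still needs to know the writes did not happen
    }
}

// Hypothetical in-memory client that records which calls were made.
function createRecordingClient(): { client: TransactionClient; calls: string[] } {
    const calls: string[] = [];
    const client: TransactionClient = {
        startTransaction: async () => { calls.push("BEGIN"); },
        commitTransaction: async () => { calls.push("COMMIT"); },
        rollbackTransaction: async () => { calls.push("ROLLBACK"); },
    };
    return { client, calls };
}

async function demo(): Promise<string[]> {
    const { client, calls } = createRecordingClient();
    await withTransaction(client, async () => { calls.push("INSERT execution"); });
    try {
        await withTransaction(client, async () => { throw new Error("insert failed"); });
    } catch (err) {
        calls.push("caller saw error");
    }
    return calls;
}

demo().then((calls) => console.log(calls.join(" > ")));
// BEGIN > INSERT execution > COMMIT > BEGIN > ROLLBACK > caller saw error
```

Keeping the rollback and rethrow in one place means individual call sites cannot forget either step.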


Section 9: Amazon CloudWatch (Monitoring)

What it is

CloudWatch collects logs and metrics from AWS resources and raises alarms based on them.

Why EPIC uses it

  • Lambda logs: Every console.log() or logger.log() goes to CloudWatch Logs
  • Alarms: Alert team when error rate is too high
  • Custom metrics: Track HOTW success/failure counts
  • Scheduled events: CloudWatch Events (now Amazon EventBridge) trigger Lambda on cron schedules

In EPIC

// CDK: Create CloudWatch alarm
new Alarm(this, 'HotwErrorAlarm', {
    metric: hotwLambda.metricErrors(),
    threshold: 5,
    evaluationPeriods: 1,
    comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
    alarmDescription: 'More than 5 HOTW errors in 1 period',
});
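
How the alarm settings above interact can be sketched with a simplified evaluation function. This is a hypothetical model, not CloudWatch’s actual algorithm (it ignores missing-data handling and datapoints-to-alarm settings); `evaluateAlarm` and its inputs are illustrative. It shows that a GREATER_THAN_THRESHOLD alarm with threshold 5 does not fire at exactly 5 errors.

```typescript
type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";

// ALARM when each of the last `evaluationPeriods` datapoints is greater
// than the threshold; INSUFFICIENT_DATA when there are too few datapoints.
function evaluateAlarm(
    errorsPerPeriod: number[],
    threshold: number,
    evaluationPeriods: number
): AlarmState {
    if (errorsPerPeriod.length < evaluationPeriods) {
        return "INSUFFICIENT_DATA";
    }
    const recent = errorsPerPeriod.slice(-evaluationPeriods);
    return recent.every((errors) => errors > threshold) ? "ALARM" : "OK";
}

console.log(evaluateAlarm([2, 5], 5, 1)); // OK: 5 is not greater than 5
console.log(evaluateAlarm([2, 6], 5, 1)); // ALARM: 6 errors in the latest period
console.log(evaluateAlarm([], 5, 1));     // INSUFFICIENT_DATA: nothing reported yet
```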

Cron schedule (triggers HOTW weekly):

// CDK: Schedule Lambda to run every week (cron fields are evaluated in UTC)
new Rule(this, 'WeeklyHotwTrigger', {
    schedule: Schedule.cron({
        weekDay: 'MON',
        hour: '09',
        minute: '00'
    }),
    targets: [new LambdaFunction(hotwTriggerLambda)]
});

// Java: Log to CloudWatch (via Lambda logger)
logger.log("HOTW completed for fleet: " + fleetId + " with status: " + status);
// This appears in CloudWatch Logs > /aws/lambda/HotwLambda

Section 10: AWS IAM (Identity and Access Management)

What it is

IAM controls who or what can access AWS resources, using roles and policies.

Why EPIC uses it

  • Lambda execution roles: Each Lambda has a role with specific permissions
  • API Gateway authorization: Frontend must assume an IAM role to call APIs
  • Cross-account access: Frontend (Harmony account) calls EPIC’s API Gateway

In EPIC

// CDK: Grant Lambda permission to read DynamoDB
fleetTable.grantReadData(getFleetLambda);
// → Creates IAM policy: Allow read actions (dynamodb:GetItem, Query, Scan, BatchGetItem, ...) on FleetTable

// Grant Lambda permission to publish to SNS
lambdaFunction.addToRolePolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["sns:Publish"],
    resources: [snsTopicArn]
}));

// Allow Harmony frontend accounts to call API Gateway
const accessRole = new Role(this, 'HarmonyAPIAccessRole', {
    assumedBy: new CompositePrincipal(
        new AccountPrincipal(HARMONY_ACCOUNT_BETA),
        new AccountPrincipal(HARMONY_ACCOUNT_PROD_IAD)
    )
});

Section 11: Amazon SES (Simple Email Service)

What it is

SES sends emails programmatically. EPIC uses it for automated notifications.

Why EPIC uses it

  • Hardware order summary emails after HOTW
  • Projection gathering reminder emails
  • Peak readiness report emails to leaders

In EPIC (triggered via SNS)

// EmailNotification.js
await ses.sendEmail({
    Source: 'epic-team@amazon.com',
    Destination: { ToAddresses: ['service-owner@amazon.com'] },
    Message: {
        Subject: { Data: 'HOTW Hardware Order Placed - RIPE-NA' },
        Body: { 
            Html: { Data: hardwareOrderEmailHtml }
        }
    }
}).promise();

Section 12: AWS Step Functions (Workflow Orchestration)

What it is

Step Functions let you define workflows as visual state machines in which each step is typically a Lambda function.

Why EPIC uses it

  • Complex multi-step workflows (onboarding, approval workflows)
  • CustomInputSF: custom step function inputs for special workflows
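
The idea of a state machine whose steps are functions can be sketched in a few lines. This is a hypothetical illustration, not EPIC’s workflow definition and not the Step Functions API: the state names, `WorkflowData`, and `runStateMachine` are assumptions, and only a linear chain of task states with a start state and "next" pointers is modelled.

```typescript
interface WorkflowData {
    fleetId: string;
    approved: boolean;
    notified: boolean;
}

interface TaskState {
    task: (data: WorkflowData) => WorkflowData; // stand-in for a Lambda invocation
    next?: string;                               // omitted on the final state
}

interface StateMachine {
    startAt: string;
    states: { [stateName: string]: TaskState };
}

// Runs each state in turn, passing the output of one step as input to the next.
function runStateMachine(
    machine: StateMachine,
    input: WorkflowData
): { output: WorkflowData; visited: string[] } {
    const visited: string[] = [];
    let data = input;
    let current: string | undefined = machine.startAt;
    while (current !== undefined) {
        const state: TaskState | undefined = machine.states[current];
        if (state === undefined) {
            throw new Error(`Unknown state: ${current}`);
        }
        visited.push(current);
        data = state.task(data);
        current = state.next;
    }
    return { output: data, visited };
}

// Hypothetical approval workflow; the state names are illustrative only.
const approvalWorkflow: StateMachine = {
    startAt: "Validate",
    states: {
        Validate: { task: (d) => d, next: "Approve" },
        Approve: { task: (d) => ({ ...d, approved: true }), next: "Notify" },
        Notify: { task: (d) => ({ ...d, notified: d.approved }) },
    },
};

const workflowRun = runStateMachine(approvalWorkflow, {
    fleetId: "RIPE-NA",
    approved: false,
    notified: false,
});
console.log(workflowRun.visited.join(" -> "), workflowRun.output);
```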

Section 13: Amazon Athena (SQL on S3)

What it is

Athena runs SQL queries directly on data stored in S3. There is no database server to provision or manage.

Why EPIC uses it

  • Historical analytics on Axon traffic data
  • FMBI (hardware order analytics) queries
  • Ad hoc data exploration by the EPIC team

Section 14: AWS CDK (Cloud Development Kit)

What it is

CDK is a framework for defining AWS infrastructure as code.

Why EPIC uses it

  • Version control for infrastructure (changes are reviewed like code)
  • Repeatable deployments (beta, gamma, and prod are built from the same definitions)
  • TypeScript type safety catches mistakes

Key CDK Classes Used in EPIC

CDK Class                              What it creates
aws-cdk-lib/aws-lambda.Function        Lambda function
aws-cdk-lib/aws-dynamodb.Table         DynamoDB table
aws-cdk-lib/aws-sqs.Queue              SQS queue
aws-cdk-lib/aws-sns.Topic              SNS topic
aws-cdk-lib/aws-apigateway.RestApi     API Gateway
aws-cdk-lib/aws-ec2.Vpc                VPC
aws-cdk-lib/aws-rds.DatabaseCluster    RDS Aurora
aws-cdk-lib/aws-iam.Role               IAM role
aws-cdk-lib/aws-cloudwatch.Alarm       CloudWatch alarm
aws-cdk-lib/aws-events.Rule            CloudWatch event rule (cron)
@amzn/pipelines.DeploymentStack        Amazon’s internal CDK extension

Section 15: Quick Reference: When to Use What

Need                                                Use
Store structured records (fleet, service, event)    DynamoDB
Store operational data with SQL queries (HOTW)      Aurora MySQL
Store files, large data                             S3
Run code on demand                                  Lambda
HTTP API for frontend                               API Gateway
Async work queue                                    SQS
One-to-many notifications                           SNS
Send emails                                         SES
Monitor and alert                                   CloudWatch
Access control                                      IAM
Private networking                                  VPC
Scheduled jobs                                      CloudWatch Events (cron)
Complex multi-step workflows                        Step Functions
SQL queries on S3                                   Athena
Infrastructure as code                              CDK