The Big Picture: How all 14 files work together
1. The HOTW Big Picture
╔═════════════════════════════════════════════════════╗
║             HOTW SYSTEM: COMPLETE FLOW              ║
╚═════════════════════════════════════════════════════╝
TRIGGER: Weekly CloudWatch Event (every Monday morning)
   OR
TRIGGER: DynamoDB Stream (when fleet version is updated)
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│ HotwHandler.java (Lambda entry point)               │
│                                                     │
│ 1. Parse event (which fleets to process)            │
│ 2. Create HotwUpscalingHelper(context)              │
│ 3. For each fleet: call handle() or handleAtomic()  │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│ HotwUpscalingHelper.handle()                        │
│                                                     │
│ Step 1: Get latest fleet from EPIC API              │
│ Step 2: validateFleetData() → must have:            │
│   - FleetProjection[0] with InputTPM > 5            │
│   - HostThroughputTPM > 1                           │
│   - FleetType = "Registered"                        │
│   - TotalInputTPMLinkVerifiedByCustomer = true      │
│ Step 3: updateFMCAndApolloDetails()                 │
│ Step 4: sleep(5000ms)                               │
│ Step 5: getResult() ──────────────────────────────► │
└──────────────────────┬──────────────────────────────┘
                       │ HOTWHelperModel returned
                       ▼
┌─────────────────────────────────────────────────────┐
│ If hostNeeded > 0 AND not differentialOrder:        │
│   publishHardwareDetailsToSNS()                     │
│   → Sends email to service owner + EPIC team        │
└─────────────────────────────────────────────────────┘
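The four checks listed under Step 2 can be sketched as follows. This is a minimal illustration rather than the actual `HotwUpscalingHelper` code: the `FleetSnapshot` class and its field names are hypothetical stand-ins for the real fleet model, and only the thresholds are taken from the diagram above. The use of `IllegalArgumentException` follows the exceptions table in section 4.

```java
class FleetValidationSketch {

    /** Hypothetical, flattened view of the fleet fields that validation reads. */
    static class FleetSnapshot {
        final double inputTpm;           // FleetProjection[0] InputTPM
        final double hostThroughputTpm;  // HostThroughputTPM
        final String fleetType;          // FleetType
        final boolean tpmLinkVerified;   // TotalInputTPMLinkVerifiedByCustomer

        FleetSnapshot(double inputTpm, double hostThroughputTpm,
                      String fleetType, boolean tpmLinkVerified) {
            this.inputTpm = inputTpm;
            this.hostThroughputTpm = hostThroughputTpm;
            this.fleetType = fleetType;
            this.tpmLinkVerified = tpmLinkVerified;
        }
    }

    /** Throws IllegalArgumentException as soon as one of the four checks fails. */
    static void validateFleetData(FleetSnapshot fleet) {
        if (fleet.inputTpm <= 5) {
            throw new IllegalArgumentException("InputTPM must be > 5");
        }
        if (fleet.hostThroughputTpm <= 1) {
            throw new IllegalArgumentException("HostThroughputTPM must be > 1");
        }
        if (!"Registered".equals(fleet.fleetType)) {
            throw new IllegalArgumentException("FleetType must be Registered");
        }
        if (!fleet.tpmLinkVerified) {
            throw new IllegalArgumentException("Input TPM link is not verified by the customer");
        }
    }

    public static void main(String[] args) {
        validateFleetData(new FleetSnapshot(10_000, 570, "Registered", true));
        System.out.println("fleet is ready for HOTW");
    }
}
```

Failing fast on the first broken rule keeps the later steps (Apollo update, order placement) from running on incomplete data.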
2. The getResult() Method Flow in Detail
getResult(fleetId, eventId, runId, versionId)
│
├─ GET FLEET DATA ─────────────────────────────────────────────
│    fleet = callGetLatestEventFleet(fleetId, eventId)
│    projectedHost = fleet.getFleetProjection()[0].getProjectedHosts()
│    (or specific version if versionId provided)
│
├─ CLASSIFY HOSTS ─────────────────────────────────────────────
│    hostTypeWithAsg = { "r5.xlarge": ["asg-1a", "asg-1b"] }
│    sortedMap = { "r5.xlarge": 150 }   (count per type, sorted desc)
│    hostType = "r5.xlarge"             (dominant type)
│    asgList = ["asg-1a", "asg-1b"]
│
├─ CALCULATE HOSTS NEEDED ─────────────────────────────────────
│    maxHostInApollo = 180
│    totalPendingHost = 10   (pending approval + pending delivery)
│    hostNeeded = projectedHost(200) - maxHostInApollo(180) - pending(10) = 10
│
│    hostNeededWithPending = 10 + pending(10) = 20
│    (full SPCO should be for 20, but delta is only 10)
│
├─ GET PER-ASG RECOMMENDATIONS ────────────────────────────────
│    asgToHostRecommendationMap = {
│      "asg-1a": 12,   (distribute 20 hosts across ASGs)
│      "asg-1b": 8
│    }
│
├─ HANDLE PREFERRED ASG CHANGES ───────────────────────────────
│    preferredASGList = [latest, previousLatest]
│    If ASGs were removed from preference:
│      - Those ASGs get BAU count (from asgProperties.hostCount)
│      - Remaining preferred ASGs get new recommendations
│      - Maps are MERGED
│
├─ CHECK EMERGENT ─────────────────────────────────────────────
│    Is today past EmergentScalingStartDate AND before SPCO end date?
│    │
│    ├─ YES → checkAndCreateSev2TicketIfNeeded()
│    │        Check for existing sev2 ticket
│    │        If none → createSev2PendingTicketOnEPICCTI()
│    │        sev2TicketId = "T-12345"
│    │
│    └─ NO → sev2TicketId = ""
│
├─ PLACE SPCO (HARDWARE ORDERS) ───────────────────────────────
│    fetchArnUpdateSPCOAndReturnSIMLink()
│    │
│    ├─ For each ASG in asgToHostRecommendationMap:
│    │    │
│    │    ├─ scalingPlannerHelper.getAsgInfoResponse() → asgArn, EAP status
│    │    │
│    │    ├─ IF EAP Enabled:
│    │    │    asgArn[asgTag] = asgScalingPlannerInfo (store for later)
│    │    │
│    │    └─ IF EAP Disabled:
│    │         asgDisableOrNotOnboarded = true (will fail below)
│    │
│    ├─ IF ALL ASGs EAP enabled:
│    │    │
│    │    ├─ getSIMLink() → existing or new SIM ticket
│    │    │
│    │    └─ For each ASG:
│    │         ├─ Check lower bound
│    │         ├─ ScalingStrategyFactory.getScalingStrategy()
│    │         │    (how to gradually scale up: gradual vs immediate)
│    │         └─ scalingPlannerHelper.getScalingPlanOverrideV2ResultResponse()
│    │              → ACTUAL SPCO PLACED IN AWS!
│    │
│    └─ IF ANY ASG not EAP enabled:
│         createAndInsertHotwExecutionDetailsForFailExecution()
│         throw IllegalStateException("Some ASGs are disabled on EAP")
│
├─ UPDATE FMC ORDERS ──────────────────────────────────────────
│    fmcHelper.updateFMCorders() → sync FMC status from ScalingPlanner
│
├─ UPDATE SIM TICKET ──────────────────────────────────────────
│    updateHardwareOrderSimDetails()
│    │
│    ├─ IF fleet already in SIM:
│    │    ├─ Get HOTW execution history
│    │    ├─ IF reqHosts changed: update SIM with new data + add comment
│    │    └─ IF reqHosts same: no SIM update needed
│    │
│    └─ IF fleet new to SIM:
│         ├─ Get SIM description
│         ├─ Append new table row with fleet data
│         ├─ Update SIM description
│         ├─ Print FMC links in SIM thread
│         └─ Create ticket entry in EPIC DB
│
├─ SAVE AUDIT RECORDS ─────────────────────────────────────────
│    createHotwExecutionDetails(hotwExecutionAllDetailsInput)
│    updateHotwRunDetails(runId, eventId, orderType)
│
├─ BUILD RETURN VALUE ─────────────────────────────────────────
│    HOTWHelperModel {
│      hostNeeded: 10,
│      service: service,
│      hardwareOrderSimLink: "https://sim.amazon.com/issues/T-12345",
│      isDifferentialOrder: true/false,
│      fleet: fleet,
│      emergentDetails: { isEmergent: false, ... }
│    }
│
└─ [FINALLY] ──────────────────────────────────────────────────
     sqs.sendMessage → Apollo queue (trigger Apollo config refresh)
     publishToMilestoneSNSTopic → trigger HardwareOrder milestone completion
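The "CALCULATE HOSTS NEEDED" step can be reproduced in a few lines. The sketch below only restates the arithmetic from the flow above with the same example numbers; the class and method names are hypothetical, and how the real code treats a zero or negative result is not shown in this guide.

```java
class HostDeltaSketch {

    /** New hosts to order: projected minus what Apollo already has minus what is already pending. */
    static int hostNeeded(int projectedHost, int maxHostInApollo, int totalPendingHost) {
        return projectedHost - maxHostInApollo - totalPendingHost;
    }

    /** Size of the full SPCO: the new delta plus the orders that are still pending. */
    static int hostNeededWithPending(int hostNeeded, int totalPendingHost) {
        return hostNeeded + totalPendingHost;
    }

    public static void main(String[] args) {
        int needed = hostNeeded(200, 180, 10);
        System.out.println("hostNeeded = " + needed);                                       // 10
        System.out.println("hostNeededWithPending = " + hostNeededWithPending(needed, 10)); // 20
    }
}
```

Keeping the two numbers separate matters because the per-ASG recommendations distribute the full 20 hosts, while only the delta of 10 is a new order.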
3. File Relationships Diagram
Files in HotwHelper and how they use each other:
HotwUpscalingHelper.java ◄──── MAIN ORCHESTRATOR
│
├── uses ──► HardwareOrdersUtil
│            │
│            ├── uses ──► TableUtil
│            │            └── uses ──► TableConstants
│            │
│            ├── builds ──► HotwAllDetails
│            │              └── HotwAllDetails.FleetDetails (inner)
│            │
│            └── builds ──► HotwExecutionAllDetailsInput
│                           ├── HotwExecutionDetails (inner)
│                           ├── CapacityOverrideDetails (inner)
│                           ├── FulfillmentDetail (inner)
│                           └── StatusOfHOTW enum (inner)
│
├── uses ──► EPICBackendHotwApiCallsCommon
│            ├── calls HTTP ──► EPIC Backend APIs
│            └── returns ──► PreferredASG (list)
│
├── builds ──► HOTWHelperModel
│              └── EmergentDetails (inner class)
│
├── uses ──► AtomicHOTWRequestModel (input for handleAtomic)
│
└── uses ──► CreateHotwRunDetailsInput (for run record updates)

DescaleHostRecommendationHelper.java ◄── POST-EVENT DESCALING
│
├── uses ──► EPICBackendHotwApiCallsCommon
│
└── builds ──► DescaleRecommendationWithClb (result per ASG)

OrderDetails.java ◄── SIMPLE DATA CARRIER (used in other reports/tables)
4. What Each Java Concept Means in This Code
Classes = Real-world things
| Java Class | Real-world thing |
|---|---|
| `Fleet` | A server group (e.g., FORTRESSService-X1-NA) |
| `Service` | An Amazon service (e.g., FORTRESSService) |
| `Event` | A peak event (Prime Day 2026) |
| `HOTWHelperModel` | Result of one fleet's HOTW processing |
| `HotwAllDetails` | Summary of a fleet's hardware situation |
| `PreferredASG` | A preference record for which ASGs to use |
| `CapacityOverrideDetails` | One SPCO (hardware order request) |
| `FulfillmentDetail` | Status of one hardware delivery |
Methods = Actions
| Java Method | Real-world action |
|---|---|
| `handle()` | "Run HOTW for this fleet" |
| `handleAtomic()` | "Re-run HOTW because fleet data changed" |
| `validateFleetData()` | "Check fleet is ready for HOTW" |
| `getResult()` | "Calculate needs and place orders" |
| `calculateBauTpm()` | "Work backwards to find BAU traffic" |
| `buildHOTWDetails()` | "Gather all data about fleet's orders" |
| `descriptionFormatter()` | "Format data as a markdown table row" |
| `publishHardwareDetailsToSNS()` | "Email the service owner" |
| `createSev2PendingTicketOnEPICCTI()` | "Create emergency SIM ticket" |
Exceptions = Things that can go wrong
| Exception thrown | When it's thrown |
|---|---|
| `IllegalArgumentException` | Fleet data is invalid/incomplete |
| `IllegalStateException` | Business rule violation (e.g., >1000 hosts, ASG not on EAP) |
| `Exception e` re-thrown | Unexpected error; propagates up to the caller |
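The `IllegalStateException` row can be illustrated with the EAP check from section 2: the inputs are well-formed, but a business rule forbids placing the order. The sketch below is an assumption-laden illustration, not the real implementation; `requireAllEapEnabled` and the `Map<String, Boolean>` input are hypothetical, and only the exception type and the start of the message come from this guide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EapGateSketch {

    /**
     * Returns the ASG tags when every ASG is EAP enabled; otherwise throws,
     * so that no order is placed for a partially enabled fleet.
     */
    static List<String> requireAllEapEnabled(Map<String, Boolean> eapEnabledByAsg) {
        List<String> disabled = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : eapEnabledByAsg.entrySet()) {
            // Treat a missing (null) status the same as "not enabled".
            if (!Boolean.TRUE.equals(entry.getValue())) {
                disabled.add(entry.getKey());
            }
        }
        if (!disabled.isEmpty()) {
            throw new IllegalStateException("Some ASGs are disabled on EAP: " + disabled);
        }
        return new ArrayList<>(eapEnabledByAsg.keySet());
    }

    public static void main(String[] args) {
        Map<String, Boolean> status = new LinkedHashMap<>();
        status.put("asg-1a", true);
        status.put("asg-1b", false);
        try {
            requireAllEapEnabled(status);
        } catch (IllegalStateException e) {
            System.out.println("order blocked: " + e.getMessage());
        }
    }
}
```

The distinction to remember: `IllegalArgumentException` means the input itself is bad, while `IllegalStateException` means the input is fine but the system is not in a state where the action is allowed.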
5. The Emergent Ordering Flow
Normal order timeline:
Feb: Initial hardware order
Apr: Hardware readiness date
Jul 14: Peak Event starts
Emergent timeline (close to peak):
Jun: EmergentScalingStartDate (after this, orders are "emergent")
Jul: SPCO End Date
Jul 14: Peak Event starts
checkIfCurrentDayPastEmergentScaleStartDate():
Returns true if:
currentDate > EmergentScalingStartDate
AND
currentDate < lastSpcoEndDate (still time to order)
If emergent:
1. Create sev2 ticket (or reuse existing one)
2. Place SPCO with sev2 ticket as justification
3. Update ticket with comment about what was ordered
4. HOTWHelperModel.emergentDetails.isEmergent = true
5. SNS notification includes emergent details
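The date window described for `checkIfCurrentDayPastEmergentScaleStartDate()` can be sketched with `java.time.LocalDate`. This is a hedged illustration: the method name `isInEmergentWindow`, the explicit date parameters, and the concrete dates are assumptions made so the example is testable; the real method may read its dates from the event record and may use a different date type.

```java
import java.time.LocalDate;

class EmergentWindowSketch {

    /**
     * True when today is strictly after the emergent start date
     * and strictly before the last SPCO end date.
     */
    static boolean isInEmergentWindow(LocalDate today,
                                      LocalDate emergentScalingStartDate,
                                      LocalDate lastSpcoEndDate) {
        return today.isAfter(emergentScalingStartDate)
                && today.isBefore(lastSpcoEndDate);
    }

    public static void main(String[] args) {
        // Hypothetical dates; the guide only says "Jun" and "Jul".
        LocalDate start = LocalDate.of(2026, 6, 1);
        LocalDate end = LocalDate.of(2026, 7, 1);
        System.out.println(isInEmergentWindow(LocalDate.of(2026, 6, 15), start, end)); // true
        System.out.println(isInEmergentWindow(LocalDate.of(2026, 5, 20), start, end)); // false
    }
}
```

Both comparisons are strict, matching the `>` and `<` in the description, so an order placed exactly on either boundary date is not treated as emergent in this sketch.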
6. Key Formulas Used
Peak TPM (total input TPM):
= Σ(InputTraffic sources for this fleet)
Source types: Self, Upstream, CloudTune
BAU TPM:
= Peak TPM ÷ (CT Peak Factor × Buffer Factor)
Example: 10,000 ÷ (2.0 × 1.3) = 3,846 TPM
Required Hosts:
= Peak TPM ÷ Host Throughput TPM × AZ Factor
Example: 10,000 ÷ 570 × 1.125 = 19.7 → ceil → 20 hosts
Hosts Needed (to ORDER):
= Required Hosts - (Current Hosts in Apollo + Pending Orders)
Example: 20 - (14 + 3) = 3 more to order
SPCO Value per ASG:
= ScalingPlanner recommendation for this ASG
(distributed across multiple ASGs by RecommendDistribution algorithm)
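The first three formulas chain together, which is easiest to see in code. The following sketch uses the example values from this section; the class and method names are hypothetical and are not the signatures used in `HardwareOrdersUtil`.

```java
class CapacityFormulaSketch {

    /** BAU TPM = Peak TPM / (CT Peak Factor * Buffer Factor). */
    static double bauTpm(double peakTpm, double ctPeakFactor, double bufferFactor) {
        return peakTpm / (ctPeakFactor * bufferFactor);
    }

    /** Required hosts = ceil(Peak TPM / Host Throughput TPM * AZ Factor). */
    static int requiredHosts(double peakTpm, double hostThroughputTpm, double azFactor) {
        return (int) Math.ceil(peakTpm / hostThroughputTpm * azFactor);
    }

    /** Hosts to order = Required Hosts - (hosts in Apollo + pending orders). */
    static int hostsToOrder(int requiredHosts, int hostsInApollo, int pendingOrders) {
        return requiredHosts - (hostsInApollo + pendingOrders);
    }

    public static void main(String[] args) {
        System.out.println(Math.round(bauTpm(10_000, 2.0, 1.3)));   // 3846
        int required = requiredHosts(10_000, 570, 1.125);
        System.out.println(required);                               // 20
        System.out.println(hostsToOrder(required, 14, 3));          // 3
    }
}
```

`Math.ceil` is applied before the cast to `int`; casting first would truncate 19.7 to 19 and leave the fleet one host short.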
7. SNS Notifications: What Gets Sent
Normal HOTW notification (publishHardwareDetailsToSNS):
{
"FleetId": "FORTRESSService-X1-NA",
"ServiceId": "FORTRESSService",
"EventId": "PrimeDay26",
"SimLink": "https://sim.amazon.com/issues/T-12345"
}
MessageAttributes:
NotificationName = "HARDWARE_ORDER_DETAILS"
ServiceEmailId = ["team@amazon.com", "owner@amazon.com", "poc@amazon.com"]
Atomic HOTW notification (publishAtomicDetailsToSNS):
{
"FleetId": "FORTRESSService-X1-NA",
"ServiceId": "FORTRESSService",
"EventId": "PrimeDay26",
"LatestVersionId": "v3",
"PreviousVersionId": "v2",
"SIMLink": "https://sim.amazon.com/issues/T-12345",
"Owner": "johndoe",
"POC": "janedoe",
"EmergentDetails": {
"IsEmergent": false,
"EventName": "Prime Day 2026",
"DaysToPeakGameDays": 15,
"IsEmergentCloseToPeak": false
}
}
MessageAttributes:
NotificationName = "ATOMIC_HARDWARE_ORDER_DETAILS"
ServiceEmailId = ["team@amazon.com", "owner@amazon.com", "poc@amazon.com"]
Milestone SNS (publishToMilestoneSNSTopic):
Message: "Sending SNS trigger for HardwareOrder milestone"
MessageAttributes:
FleetId = "FORTRESSService-X1-NA"
EventId = "PrimeDay26"
WorkflowName = "HardwareOrder"
→ This triggers the MilestoneWorkflowHandler to mark the HardwareOrder milestone as completed.
8. SIM Ticket Structure (What Gets Created)
The SIM ticket description looks like this in markdown:
We have created this SIM for PrimeDay26 hardware orders. EPIC has used this SIM to
automatically place hardware orders for the above mentioned service.
Here is a [link](https://console.harmony.a2z.com/epic/Event/PrimeDay26/Dashboards) to EPIC.
| Fleet Id | Peak TPM | BAU TPM | CT Peak Factor | Buffer% | Required Hosts | Hosts Present in Apollo | Capacity Override Value | Pending Hosts Ordered By EPIC | FMC Order Details |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [FORTRESSService-X1-NA](https://console.harmony.a2z.com/epic/Event/PrimeDay26/...) | 10000 | 3846 | 2.0 | 1.3 | 20 | 14 | 17 | 3 | [FMC LINK](https://fmc.amazon.com/?...) |
| | | | | | | | | | |
| [OtherService-X2-EU](link) | 5000 | 2315 | 1.8 | 1.2 | 12 | 10 | 10 | 0 | [FMC LINK](url) |
| | | | | | | | | | |
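Each fleet row in the SIM description is plain markdown, so appending a fleet boils down to joining cell values with pipes. The sketch below shows the idea under stated assumptions: `toMarkdownRow` and `link` are hypothetical helper names, and the actual `descriptionFormatter()` and `TableUtil` code may build rows differently.

```java
import java.util.Arrays;
import java.util.List;

class SimTableRowSketch {

    /** Joins cell values into one markdown table row: "| a | b | c |". */
    static String toMarkdownRow(List<String> cells) {
        return "| " + String.join(" | ", cells) + " |";
    }

    /** Renders a markdown link cell, as used in the Fleet Id and FMC columns. */
    static String link(String text, String url) {
        return "[" + text + "](" + url + ")";
    }

    public static void main(String[] args) {
        // Placeholder URL; a shortened row with three of the ten columns.
        String row = toMarkdownRow(Arrays.asList(
                link("FORTRESSService-X1-NA", "https://example.com/fleet"),
                "10000",
                "3846"));
        System.out.println(row);
    }
}
```

Because every row is a single line of text, adding a new fleet to an existing SIM is a string append to the current description followed by one update call.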
9. Common Questions & Answers
Q: Why does the code sleep(5000)?
Apollo is a distributed config system. When EPIC calls updateApolloDetails, the data propagates across multiple nodes. Sleeping for 5 seconds gives Apollo time to update, so the next call to get fleet data returns fresh Apollo host counts.
Q: What is EAP?
The exact expansion of EAP is not confirmed in this guide ("Emergency Access Protocol" is only a guess, so check with your mentor). In the context of HOTW, it means the ASG (Auto Scaling Group) is enrolled in the ScalingPlanner system and can accept capacity override requests. If EAP is not enabled, HOTW can't place orders for that ASG automatically.
Q: Why Math.ceil() for host counts?
Safety. If the calculation says you need 19.3 hosts, you order 20 (round up). Never round down, or you'd be short on capacity.
Q: What is a "differential order"?
When the fleet was already in the SIM ticket and the host count did not change, the run is "differential": no new order is needed. When isDifferentialOrder = true, the code skips sending the SNS notification.
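Combined with the last box of the section 1 diagram, the notification decision is a single boolean expression. The sketch below is illustrative only; `shouldPublishHardwareDetails` is a hypothetical name, and in the real code the condition sits inline before the call to `publishHardwareDetailsToSNS()`.

```java
class NotificationGateSketch {

    /** Publish only when new hosts are needed and the run is not a differential order. */
    static boolean shouldPublishHardwareDetails(int hostNeeded, boolean isDifferentialOrder) {
        return hostNeeded > 0 && !isDifferentialOrder;
    }

    public static void main(String[] args) {
        System.out.println(shouldPublishHardwareDetails(10, false)); // true
        System.out.println(shouldPublishHardwareDetails(10, true));  // false
        System.out.println(shouldPublishHardwareDetails(0, false));  // false
    }
}
```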
Q: Why does getResult() have a finally block?
Even if HOTW fails partway through, the Apollo queue message and milestone SNS must still be sent:
- Apollo queue: triggers Apollo to re-read fleet configs (even on failure, this cleans up)
- Milestone SNS: the MilestoneWorkflowHandler uses this to know HOTW ran (success or fail)
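A small experiment shows why `finally` gives this guarantee. In the sketch below, the step names are recorded in a list instead of calling SQS or SNS; the `run` method and its parameters are hypothetical, and the failure is simulated with the same `IllegalStateException` that the EAP check throws.

```java
import java.util.ArrayList;
import java.util.List;

class FinallySketch {

    /** Appends each executed step to 'steps'; fails partway when failMidway is true. */
    static void run(boolean failMidway, List<String> steps) {
        try {
            steps.add("place SPCO");
            if (failMidway) {
                throw new IllegalStateException("Some ASGs are disabled on EAP");
            }
            steps.add("update SIM ticket");
        } finally {
            // Runs on success and on failure, before the exception reaches the caller.
            steps.add("send Apollo queue message");
            steps.add("publish milestone SNS");
        }
    }

    public static void main(String[] args) {
        List<String> steps = new ArrayList<>();
        try {
            run(true, steps);
        } catch (IllegalStateException e) {
            System.out.println("run failed: " + e.getMessage());
        }
        System.out.println(steps);
    }
}
```

The exception still propagates to the caller, but by the time it arrives the two cleanup steps have already been recorded, while "update SIM ticket" was skipped.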
Q: What is StringUtils.isNotBlank()?
From the Apache Commons Lang library.
- isNotBlank("hello") = true
- isNotBlank("") = false
- isNotBlank(" ") = false (spaces don't count)
- isNotBlank(null) = false
It's safer than !string.isEmpty() because it also handles null.
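To make the four cases concrete without pulling in the library, here is a hand-written stand-in with the same observable behaviour for those inputs. It is a sketch for study purposes, not the Apache Commons source; in the codebase you would keep calling `StringUtils.isNotBlank()`.

```java
class BlankCheckSketch {

    /** True only when the text is non-null and has at least one non-whitespace character. */
    static boolean isNotBlank(CharSequence text) {
        if (text == null) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNotBlank("hello")); // true
        System.out.println(isNotBlank(""));      // false
        System.out.println(isNotBlank(" "));     // false
        System.out.println(isNotBlank(null));    // false
    }
}
```

By contrast, calling `isEmpty()` on a null reference throws a `NullPointerException`, and `" ".isEmpty()` returns false even though the string carries no content.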
Q: Why two different API proxies (epicBackendApiProxy and hotwApiProxy)?
They point to different endpoints:
- epicBackendApiProxy: main EPIC backend (fleet, service, event data)
- hotwApiProxy: HOTW-specific endpoint (preferred ASG data, HOTW dashboard)
Having separate proxies lets them potentially run on different infrastructure.
10. Internship Tips: Things to Focus On
What to understand deeply:
- The HOTW formula: hostNeeded = projected - (inApollo + pending)
- Why finally is critical: Apollo queue and milestone must always run
- The builder pattern: every model uses .builder()....build()
- Lombok removes boilerplate: @Data and @Builder replace manual getters/setters
- Jackson maps JSON ↔ Java: @JsonProperty maps key names
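To see what the builder pattern buys you, it helps to write one by hand once. The class below is a hypothetical, trimmed-down model with three of the `HOTWHelperModel` fields; it approximates the kind of code Lombok's `@Builder` generates and is not the generated code itself.

```java
class HelperModelSketch {
    private final int hostNeeded;
    private final String hardwareOrderSimLink;
    private final boolean differentialOrder;

    private HelperModelSketch(Builder b) {
        this.hostNeeded = b.hostNeeded;
        this.hardwareOrderSimLink = b.hardwareOrderSimLink;
        this.differentialOrder = b.differentialOrder;
    }

    static Builder builder() { return new Builder(); }

    int getHostNeeded() { return hostNeeded; }
    String getHardwareOrderSimLink() { return hardwareOrderSimLink; }
    boolean isDifferentialOrder() { return differentialOrder; }

    /** Collects field values one call at a time, then creates the immutable model. */
    static class Builder {
        private int hostNeeded;
        private String hardwareOrderSimLink;
        private boolean differentialOrder;

        Builder hostNeeded(int value) { this.hostNeeded = value; return this; }
        Builder hardwareOrderSimLink(String value) { this.hardwareOrderSimLink = value; return this; }
        Builder differentialOrder(boolean value) { this.differentialOrder = value; return this; }
        HelperModelSketch build() { return new HelperModelSketch(this); }
    }

    public static void main(String[] args) {
        HelperModelSketch model = HelperModelSketch.builder()
                .hostNeeded(10)
                .hardwareOrderSimLink("https://sim.amazon.com/issues/T-12345")
                .differentialOrder(false)
                .build();
        System.out.println(model.getHostNeeded()); // 10
    }
}
```

With Lombok, the builder class and the getters above are generated from annotations on the three fields, which is why the model files in this package are so short.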
What questions to ask your mentor:
- "How is getRecommendedHostForEachAsg() in ScalingPlannerHelper implemented?"
- "What happens in MilestoneWorkflowHandler after receiving the milestone SNS?"
- "How is the HOTW triggered: is it CloudWatch Events or DynamoDB Streams?"
- "What is the difference between FMC and SPCO orders?"
- "How does the gradual scaling strategy work vs immediate?"
How to navigate the code:
To understand how HOTW is triggered:
→ Look at HotwHandler.java (in lambda/handler/ folder)
To understand ScalingPlanner:
→ Look at lambda/scalingplanner/ package
To understand milestone completion:
→ Look at lambda/milestone/ package
To understand Apollo updates:
→ Look at lambda/apollo/ package
To understand FMC orders:
→ Look at lambda/fmc/ package
11. Quick Reference Card
══════════════════════════════════════════════════════════════════════
                      HOTWHELPER QUICK REFERENCE
══════════════════════════════════════════════════════════════════════

DATA MODELS:
  HOTWHelperModel                 → Result of one fleet's HOTW
  HotwAllDetails                  → Fleet TPM + order summary
  HotwExecutionAllDetailsInput    → Full audit record with ASG data
  CreateHotwRunDetailsInput       → Create/update run record
  AtomicHOTWRequestModel          → Version IDs for atomic trigger
  PreferredASG                    → Preferred ASG preference record
  DescaleRecommendationWithClb    → Post-event descale per ASG
  OrderDetails                    → Simple per-fleet order summary
  TableConstants                  → Column name strings

UTILITIES:
  HardwareOrdersUtil              → Calculate orders, build details
  TableUtil                       → Build markdown tables for SIM

API CLIENT:
  EPICBackendHotwApiCallsCommon   → REST API calls to EPIC

CORE LOGIC:
  HotwUpscalingHelper             → Main upscaling orchestrator
  DescaleHostRecommendationHelper → Post-event descaling recommendations

KEY FORMULA:
  hostNeeded = projected - (inApollo + pendingOrders)

KEY ANNOTATIONS:
  @Data                           → getters + setters + equals/hashCode + toString
  @Builder                        → .builder()....build() pattern
  @Jacksonized                    → @Builder + Jackson JSON deserialization
  @JsonProperty("Name")           → map field to JSON key
  @JsonIgnoreProperties           → ignore unknown JSON fields
  @JsonInclude(NON_NULL)          → skip null fields in JSON output

══════════════════════════════════════════════════════════════════════
Congratulations!
You have now read through:
- Layer 1: Java basics (classes, variables, methods, loops)
- Layer 2: Java intermediate (exceptions, streams, lambdas, enums, builder pattern)
- Layer 3: Libraries (Lombok, Jackson, AWS SDK)
- Layer 4: Deep dive into every file in HotwHelper
- Layer 5: Full system flow and big picture
You are now equipped to:
- Read and understand any Java file in this codebase
- Explain what each class and method does
- Navigate the dependency relationships between files
- Ask intelligent questions during your internship
- Start making small modifications with confidence
Good luck with your internship!