The practical guide: how to actually do your job, survive code reviews, ask the right questions, debug issues, and make your internship a success.
Section 1 - First Week: What To Do
Day 1: Get access and set up
# These are the typical first-day tasks at Amazon
1. Get AWS console access for your team's account
2. Set up your Brazil workspace:
brazil setup workspace
brazil ws add -p EPICBackend EPICBackendCDK EPICBackendTriggers EPICFrontend
3. Get VPN set up (needed for internal tools)
4. Get access to:
- EPIC internal wiki pages
- The EPIC SIM (ticketing) queue
- Relevant Slack/Chime channels
- CloudWatch dashboards
Day 1: Questions to ask your mentor
1. "Can you walk me through one complete HOTW run in production?"
2. "Where are the production dashboards to monitor EPIC health?"
3. "What are the current top 3 pain points with the system?"
4. "Which SIM ticket should I look at to understand a real issue?"
5. "Is there a sandbox/beta environment where I can experiment safely?"
Section 2 - Understanding Your First Ticket
When you get your first work ticket, do this:
Step 1: Read the ticket fully
- What is the problem/feature?
- Which component is affected? (Frontend? Backend? Triggers? CDK?)
- Are there any related tickets?
Step 2: Find the code
Is it a UI issue? → Look in EPICFrontend/src/pages/ or /components/
Is it an API issue? → Look in EPICBackend/src/epiclambda/api/
Is it a background job? → Look in EPICBackendTriggers/lambda/handler/
Is it infrastructure? → Look in EPICBackendCDK/lib/
Step 3: Trace the flow
Follow the data:
Frontend page → backend_api.js method → API Gateway route → Lambda handler
→ Operations class → Database → Response → Redux state → UI update
Step 4: Read related tests
# Tests live in __tests__ directories or *.test.js files
# Read them to understand expected behavior
find . \( -path "*/__tests__/*" -o -name "*.test.js" \) -type f -not -path "*/node_modules/*" -print0 | xargs -0 grep -n "HOTW"
Section 3 - How to Debug Issues
Frontend Issues (React)
// 1. Open browser DevTools (F12) → Console tab
// Look for red errors
// 2. Check Network tab
// Find the failing API call
// Click it → see request and response
// 3. Add console.log
const HOTWDashboard = (props) => {
  // hotwData and isLoading must already be declared (hooks or selectors) above these logs
  console.log("Props received:", props);
  console.log("State:", { hotwData, isLoading });
  // ...
};
// 4. Check Redux state
// Install Redux DevTools Chrome extension
// See current state of all reducers
Backend Issues (Node.js Lambda)
// 1. Check CloudWatch Logs
// Go to: CloudWatch → Log Groups → /aws/lambda/[LambdaName]
// Filter by: "ERROR" or "Exception"
// 2. Add more logging (temporary)
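static async getHotwDashboardDetails(event) {
  console.log("Input event:", JSON.stringify(event)); // log input
  try {
    // eventId is assumed to have been parsed from the event earlier in the method
    const result = await hotwOps.getDetails(eventId);
    console.log("DB result:", JSON.stringify(result)); // log output
    // ...
  } catch (err) {
    console.error("getDetails failed:", err); // log the failure, then let it propagate
    throw err;
  }
}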
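One thing to watch with temporary logging: JSON.stringify throws on circular structures and can produce very large log lines. A minimal sketch of a safer helper follows; safeStringify and its maxLength default are hypothetical names chosen for illustration, not part of the EPIC codebase.

```javascript
// Hypothetical helper: serialize a value for a log line without ever throwing.
function safeStringify(value, maxLength = 2000) {
  let text;
  try {
    text = JSON.stringify(value);
  } catch (err) {
    // Circular references and BigInt values make JSON.stringify throw
    return `[unserializable: ${err.message}]`;
  }
  // JSON.stringify returns undefined for undefined, functions and symbols
  if (text === undefined) return String(value);
  if (text.length <= maxLength) return text;
  return `${text.slice(0, maxLength)}... [truncated ${text.length - maxLength} chars]`;
}
```

Usage would look like console.log("Input event:", safeStringify(event)). Remember to remove temporary logs before review, and avoid logging payloads that may contain sensitive data.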
Java Lambda Issues
// 1. Check CloudWatch Logs
// Go to: CloudWatch → Log Groups → /aws/lambda/[JavaLambdaName]
// 2. Log more in Java
logger.log("Starting HOTW for fleet: " + fleetId);
logger.log("Fleet data: " + JsonUtil.toJson(fleet));
logger.log("Calculated hostsNeeded: " + hostsNeeded);
// 3. Common Java error patterns:
// NullPointerException → fleet/event data was null (check DynamoDB item exists)
// JsonProcessingException → JSON parsing failed (print raw string before parsing)
// AmazonServiceException → AWS API call failed (check IAM permissions)
Database Issues
// 1. For DynamoDB issues:
// Check: does the item exist?
// Go to: AWS Console → DynamoDB → Tables → FleetTable → Items
// Search by FleetId
// 2. For MySQL issues:
// Query the table directly (ask mentor for RDS access)
SELECT * FROM hotw_execution WHERE FleetIndexId = 42 ORDER BY CreatedAt DESC LIMIT 10;
// 3. Common MySQL errors:
// Duplicate key error → INSERT on already-existing unique key (use ON DUPLICATE KEY UPDATE)
// Timeout → query taking too long (add index or limit results)
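To make the ON DUPLICATE KEY UPDATE suggestion concrete, here is a minimal sketch of building a parameterized upsert in Node.js. The function name buildUpsert and the Status column in the usage note are assumptions for illustration; check the real schema and the existing query helpers before reusing any of this.

```javascript
// Hypothetical helper: builds a parameterized MySQL upsert statement.
// Values travel as "?" placeholders; identifiers are validated because they
// cannot be parameterized.
function buildUpsert(table, row, updateColumns) {
  const columns = Object.keys(row);
  if (columns.length === 0 || updateColumns.length === 0) {
    throw new TypeError('row and updateColumns must not be empty');
  }
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  for (const name of [table, ...columns, ...updateColumns]) {
    if (!identifier.test(name)) throw new TypeError(`Unsafe identifier: ${name}`);
  }
  const placeholders = columns.map(() => '?').join(', ');
  // VALUES(col) refers to the value the INSERT attempted to write for that column
  const updates = updateColumns.map((c) => `${c} = VALUES(${c})`).join(', ');
  return {
    sql: `INSERT INTO ${table} (${columns.join(', ')}) VALUES (${placeholders}) ON DUPLICATE KEY UPDATE ${updates}`,
    params: columns.map((c) => row[c]),
  };
}
```

For example, buildUpsert('hotw_execution', { FleetIndexId: 42, RunId: 'r1', Status: 'DONE' }, ['Status']) returns a statement that inserts the row, or updates only Status when the unique key already exists. Newer MySQL 8.0 releases deprecate the VALUES() function in favor of row aliases, so confirm which form your database version expects.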
Section 4 - Common Issues and Solutions
Issue 1: "HOTW not running for my fleet"
Checklist:
[ ] Is fleet in EventTable for the event? (check DynamoDB)
[ ] Is service type "Registered"? (check ServiceTable)
[ ] Has SPCO end date passed for this region? (check EventTable.SPCOEventDatesByRegion)
[ ] Is fleet in the "servicesNotToConsider" blacklist in HotwHandler.java?
[ ] Check validateAndUpdateSPCOSQSQueue for messages (CloudWatch Metrics)
[ ] Check validateAndUpdateSPCOSQSQueue DLQ for failed messages
Issue 2: "Milestone stuck in 'InProgress'"
Steps:
1. Check EventPlanTable for the fleet (DynamoDB)
2. Which milestone is stuck? What's the message?
3. For HardwareOrder stuck:
→ Check if HOTW ran for this fleet (hotw_execution table)
→ Check if RunId exists
4. For HardwareFulfillment stuck:
→ Check fulfillment_details table
→ Check FMC order status (link in fulfillment_details)
5. Manually update milestone if system can't:
PUT /eventplan/{EventId}/{FleetId}/status
Body: { EventMilestoneId: "HardwareOrder", MilestoneCompletionStatus: "Completed", ... }
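The manual update above can be sketched as a small request builder. This is only an illustration: buildMilestoneUpdate and extraFields are hypothetical names, the fields elided in the body above stay elided (pass whatever the real API requires through extraFields), and authentication and the API host are not shown.

```javascript
// Hypothetical helper: builds the PUT request for a manual milestone update.
// It only constructs the request; sending it (host, auth) is left to the caller.
function buildMilestoneUpdate(eventId, fleetId, milestoneId, completionStatus, extraFields = {}) {
  const required = { eventId, fleetId, milestoneId, completionStatus };
  for (const [name, value] of Object.entries(required)) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return {
    method: 'PUT',
    // Encode path parameters so unusual characters cannot change the route
    path: `/eventplan/${encodeURIComponent(eventId)}/${encodeURIComponent(fleetId)}/status`,
    body: JSON.stringify({
      ...extraFields,
      EventMilestoneId: milestoneId,
      MilestoneCompletionStatus: completionStatus,
    }),
  };
}
```

Because this changes production state by hand, confirm with your mentor before sending such a request.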
Issue 3: "Wrong host count shown in HOTW dashboard"
Steps:
1. Check Apollo data (ApolloHandler recently synced?)
→ Look at ApolloSQSQueue message count
→ Check CloudWatch logs for ApolloHandler
2. Check hotw_execution table for latest run
→ What is HostsPresentInApollo value?
→ Does it match what Apollo shows?
3. If Apollo is stale:
→ Manually trigger ApolloHandler for this fleet
→ Send message to ApolloSQSQueue with FleetId
Issue 4: "Frontend showing loading spinner forever"
Steps:
1. Open browser DevTools → Network tab
2. Find the failing API call (red)
3. Check status code:
- 401/403: Authentication issue (Harmony session expired?)
- 404: Resource not found (check DynamoDB for the item)
- 500: Lambda error (check CloudWatch logs)
- Network Error: CORS issue or Lambda not deployed
4. Check if Lambda is deployed (CDK deploy ran?)
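The status-code triage above can be written down as a small lookup, which is handy in a scratch script or a debugging note. This is a sketch: triageApiFailure is a hypothetical name, and treating every 5xx like a 500 is an assumption beyond what the list above states.

```javascript
// Hypothetical helper: maps a failed API call's HTTP status to the first thing to check.
// Pass undefined (or 0) when the browser reports a network error with no response.
function triageApiFailure(status) {
  if (status === undefined || status === null || status === 0) {
    return 'No HTTP response: possible CORS issue or Lambda not deployed';
  }
  if (status === 401 || status === 403) {
    return 'Authentication issue: check whether the Harmony session expired';
  }
  if (status === 404) {
    return 'Resource not found: check DynamoDB for the item';
  }
  if (status >= 500 && status <= 599) {
    // Assumption: other 5xx codes are handled like 500
    return 'Lambda error: check CloudWatch logs';
  }
  return `Unlisted status ${status}: inspect the response body in the Network tab`;
}
```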
Issue 5: "SQS messages piling up in DLQ"
Steps:
1. Go to SQS console → find the DLQ
2. Click "Receive Messages" → see one failed message
3. Check CloudWatch logs for the Lambda that processes this queue
4. Find the exception and reason for failure
5. Fix the code and deploy the fix (otherwise retried messages fail again)
6. Move messages from DLQ back to main queue to retry
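The last step, moving messages back for a retry, follows one rule: send to the main queue first, and delete from the DLQ only after the send succeeded. The sketch below shows that loop against hypothetical queue adapters (dlq.receive, dlq.delete, mainQueue.send); they are stand-ins for illustration, not AWS SDK calls. The SQS console also offers a built-in DLQ redrive, which may be all you need.

```javascript
// Hypothetical adapters (not AWS SDK objects):
//   dlq.receive(max)     -> Promise<Array<{ id, body }>>
//   dlq.delete(id)       -> Promise<void>
//   mainQueue.send(body) -> Promise<void>
async function redrive(dlq, mainQueue, { batchSize = 10, maxMessages = 1000 } = {}) {
  let moved = 0;
  while (moved < maxMessages) {
    const batch = await dlq.receive(Math.min(batchSize, maxMessages - moved));
    if (batch.length === 0) break; // DLQ is drained
    for (const message of batch) {
      await mainQueue.send(message.body); // send first
      await dlq.delete(message.id); // delete only after the send succeeded
      moved += 1;
    }
  }
  return moved;
}
```

If a send throws, the loop stops and the message stays in the DLQ, so nothing is lost. The maxMessages cap guards against an endless loop when redriven messages keep failing and land back in the DLQ.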
Section 5 - Making Code Changes
Before writing code:
- Read the existing code in that area
- Understand the pattern being used
- Write down: what input, what output, what side effects?
- Check: are there unit tests? Read them.
For a Backend (Node.js) change:
// 1. Find the right api/ file
// 2. Add/modify the method following the pattern:
static async newMethod(event) {
  let body;
  // Parse and validate (try/catch 400)
  try {
    body = JSON.parse(event[OtherConstants.BODY]);
    if (!Util.validateKeys(body, [RequiredConstants.KEY1])) {
      throw new TypeError('Required fields missing');
    }
  } catch (err) {
    return Util.handleErr(400, 'Bad request', err);
  }
  // Business logic (try/catch 503)
  try {
    const result = await operations.doSomething(body);
    return Util.handleResponse(200, JSON.stringify(result));
  } catch (err) {
    return Util.handleErr(503, 'Server error', err);
  }
}
// 3. Add to EPICApiStack.ts (if new route)
// 4. Write unit tests
// 5. Test manually via Postman or frontend
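To see the two-stage pattern run end to end, and to see why injecting the operations layer makes it testable, here is a self-contained sketch. The Util object below is a stand-in with assumed shapes, not the real /common/Util.js, and makeHandler, runChecks, and the FleetId key are hypothetical choices for illustration.

```javascript
// Stand-ins for the team's Util helpers (assumed shapes, for illustration only)
const Util = {
  validateKeys: (obj, keys) =>
    obj !== null && typeof obj === 'object' && keys.every((k) => k in obj),
  handleResponse: (statusCode, body) => ({ statusCode, body }),
  handleErr: (statusCode, message, err) => ({
    statusCode,
    body: JSON.stringify({ message, error: String(err && err.message) }),
  }),
};

// Handler factory: `operations` is injected so tests can replace the DB layer
function makeHandler(operations) {
  return async function newMethod(event) {
    let body;
    try {
      body = JSON.parse(event.body);
      if (!Util.validateKeys(body, ['FleetId'])) {
        throw new TypeError('Required fields missing');
      }
    } catch (err) {
      return Util.handleErr(400, 'Bad request', err); // caller's fault
    }
    try {
      const result = await operations.doSomething(body);
      return Util.handleResponse(200, JSON.stringify(result));
    } catch (err) {
      return Util.handleErr(503, 'Server error', err); // our fault
    }
  };
}

// Cases worth covering: happy path, malformed JSON, missing key, failing DB call
async function runChecks() {
  const ok = makeHandler({ doSomething: async (b) => ({ fleet: b.FleetId }) });
  const failing = makeHandler({
    doSomething: async () => { throw new Error('DB down'); },
  });
  return {
    success: (await ok({ body: '{"FleetId":"f1"}' })).statusCode,
    badJson: (await ok({ body: 'not json' })).statusCode,
    missingKey: (await ok({ body: '{}' })).statusCode,
    dbError: (await failing({ body: '{"FleetId":"f1"}' })).statusCode,
  };
}
```

In the real codebase you would express the same four cases in the team's test framework, mocking the Operations class instead of using a factory.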
For a Java change:
// 1. Find the right handler or helper class
// 2. Follow the existing pattern
// 3. Always use logger.log() not System.out.println()
// 4. Handle exceptions with try/catch
// 5. Build: brazil build
// 6. Check for compile errors
// 7. Run unit tests: brazil test
For a CDK change:
// 1. Find the right stack file
// 2. Add resource following existing pattern
// 3. Run: cdk diff to see what will change
// 4. Review diff carefully (make sure nothing unexpected changes)
// 5. Deploy to beta: cdk deploy StackName
// 6. Test in beta
// 7. Deploy to gamma, then prod
For a Frontend change:
// 1. Find the right page or component
// 2. Understand which Redux state it uses
// 3. Make your change
// 4. Start local dev server:
npm start
// 5. Test in browser
// 6. Check DevTools Console for errors
Section 6 - Code Review Tips
Before submitting for review:
[ ] No console.log statements left in (except intentional logging)
[ ] Error handling added (try/catch where needed)
[ ] Unit tests added/updated
[ ] No hardcoded values (use constants)
[ ] Follows existing code patterns in that file
[ ] No TODO comments without context
[ ] No commented-out code
[ ] Lambda timeout is reasonable for what it does
[ ] IAM permissions are minimal (only what's needed)
Typical review comments you'll get:
"Can you add a unit test for the error case?"
→ Add a test where the DB call throws an error
"This should use a constant instead of the string literal"
→ Add to the constants file, use the constant
"What happens if this field is null?"
→ Add null check or explain why it can't be null
"Why 5 minutes timeout for this Lambda?"
→ Justify or reduce the timeout
"This looks like it could be a separate helper method"
→ Extract into a private method
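Three of these comments (use a constant, handle null, extract a helper) often apply to the same few lines. A minimal sketch of what the fixed code might look like follows; the function names are hypothetical, and HostsPresentInApollo is borrowed from the hotw_execution discussion earlier purely as an example field.

```javascript
// Constant instead of a repeated string literal
const HOSTS_PRESENT_KEY = 'HostsPresentInApollo';

// Extracted helper: returns the host count, or null when the run record is
// missing or the field is not a non-negative integer
function readHostsPresent(run) {
  if (run === null || run === undefined) return null;
  const value = run[HOSTS_PRESENT_KEY];
  return Number.isInteger(value) && value >= 0 ? value : null;
}

// The caller handles the null case explicitly instead of crashing on it
function describeRun(run) {
  const hosts = readHostsPresent(run);
  return hosts === null
    ? 'Host count unavailable'
    : `Hosts present in Apollo: ${hosts}`;
}
```

In the real code the constant would live in the shared constants file rather than next to the function.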
Section 7 - Important Contacts and Resources
When to ask your mentor:
- You've been stuck for more than 30 minutes
- You're not sure if a change is safe (could affect production)
- You don't understand a business requirement
- The code is doing something unexpected and you can't figure out why
When to escalate:
- Production is broken (escalate immediately)
- Data inconsistency found (escalate immediately)
- Security issue found (escalate immediately)
Resources to bookmark:
CloudWatch Dashboards: for monitoring Lambda/SQS health
DynamoDB Console: for inspecting table data
SQS Console: for monitoring queue depths
EPIC Internal Wiki: for business context
SIM Queue: for tracking issues
Section 8 - The EPIC Vocabulary Sheet
Keep this open when reading code or talking to teammates:
| Term | Say it as… | Means |
|---|---|---|
| HOTW | "H-O-T-W" | Head of the Week: weekly hardware ordering automation |
| SPCO | "S-P-C-O" | Service Provider Capacity Override: hardware order request |
| EAP | "E-A-P" | Emergency Access Protocol: ASG enrolled in ScalingPlanner |
| ASG | "A-S-G" | Auto Scaling Group: group of AWS servers |
| FMC | "F-M-C" | Fulfillment Management Console: tracks hardware delivery |
| TPM | "T-P-M" | Transactions Per Minute: traffic metric |
| BAU | "B-A-U" | Business As Usual: non-peak normal operations |
| SIM | "SIM" | Amazon's internal ticketing system |
| Apollo | "Apollo" | Amazon's config management (stores current host counts) |
| Gizmo/SDC | "Gizmo" | Throttling management systems |
| FLO | "FLO" | Fleet Light Operations: host management |
| CloudTune | "Cloud-Tune" | ML-based capacity prediction |
| PMET | "P-MET" | Performance Metric: CloudWatch metric for TPM |
| Axon | "Axon" | Traffic metrics collection system |
| TES | "T-E-S" | Traffic Engineering System |
| AZ | "A-Z" | Availability Zone: isolated data center section |
| CDK | "C-D-K" | Cloud Development Kit: infrastructure as code |
| DLQ | "D-L-Q" | Dead Letter Queue: failed message archive |
| FIFO | "FI-FO" | First In, First Out: ordered queue |
| PK/SK | "P-K/S-K" | Partition Key / Sort Key (together they form a DynamoDB primary key) |
Section 9 - What Good Code Looks Like in EPIC
Node.js backend:
// ✅ Good EPIC backend code
static async createHardwareOrder(event) {
  const hotwOperations = new HOTWOperations();
  const fleetOperations = new FleetOperations();
  let body, fleetId, runId;
  // Step 1: Parse input (returns 400 on bad input)
  try {
    body = JSON.parse(event[OtherConstants.BODY]);
    if (!Util.validateKeys(body, [FleetConstants.FLEET_ID, HOTWConstants.RUN_ID])) {
      throw new TypeError('Required fields missing: FleetId, RunId');
    }
    fleetId = body[FleetConstants.FLEET_ID];
    runId = body[HOTWConstants.RUN_ID];
  } catch (err) {
    return Util.handleErr(
      API_RESPONSE_STATUS_CODES.MALFORMED_REQUEST_STATUS_CODE,
      `Error parsing request for fleet ${fleetId}`,
      err
    );
  }
  // Step 2: Business logic (returns 503 on server error)
  try {
    const fleetIndexId = await fleetOperations.getFleetIndex(fleetId);
    const result = await hotwOperations.createOrder(fleetIndexId, runId, body);
    console.log(`Order created for fleet: ${fleetId}, runId: ${runId}`);
    return Util.handleResponse(
      API_RESPONSE_STATUS_CODES.SUCCESSFUL_OK_STATUS_CODE,
      JSON.stringify(result)
    );
  } catch (err) {
    return Util.handleErr(
      API_RESPONSE_STATUS_CODES.SERVICE_UNAVAILABLE_STATUS_CODE,
      `Error creating order for fleet: ${fleetId}`,
      err
    );
  }
}
Java Lambda:
// ✅ Good EPIC Java code
public void processFleet(String fleetId, String eventId, String runId) {
    try {
        logger.log("Processing fleet: " + fleetId + " for event: " + eventId);
        // Get fleet data
        Fleet fleet = epicBackendFleetApiCallsCommon.getLatestFleet(fleetId);
        if (fleet == null) {
            logger.log("Fleet not found: " + fleetId);
            updateStatusWithReason(fleetId, eventId, runId, "FAIL", "Fleet not found");
            return;
        }
        // Process
        int hostsNeeded = calculateHostsNeeded(fleet, eventId);
        if (hostsNeeded > 0) {
            placeSPCOOrder(fleet, hostsNeeded);
            logger.log("Order placed: " + hostsNeeded + " hosts for fleet: " + fleetId);
        } else {
            logger.log("No order needed. Hosts needed: " + hostsNeeded);
        }
    } catch (Exception e) {
        logger.log("Error processing fleet: " + fleetId + ". Error: " + e.getMessage());
        e.printStackTrace();
        // Not re-thrown, so the caller keeps going; the finally block runs either way
    } finally {
        // Always update status and publish to SNS (this also runs after the early return above)
        publishStatusToSNS(fleetId, eventId);
    }
}
Section 10 - Key Files to Know by Heart
After reading this guide, these files should feel familiar:
Node.js Backend:
- /api/HOTW.js: HOTW REST endpoints
- /api/EventPlan.js: milestone management
- /operations/HOTWOperations.js: HOTW MySQL queries
- /common/Util.js: handleResponse, handleErr, validateKeys
- /clients/AuroraMysqlClient.js: MySQL operations
Java Triggers:
- handler/HotwHandler.java: HOTW Lambda entry points
- HotwHelper/HotwUpscalingHelper.java: core HOTW logic
- HotwHelper/HardwareOrdersUtil.java: calculation utilities
- milestone/handler/*.java: all milestone handlers
- common/api/epicbackend/EPICBackendApiCallsCommon.java: API calls
Frontend:
- pages/hotwDashboard.jsx: HOTW dashboard
- pages/serviceDetails.jsx: fleet details
- common/backend_api.js: all backend API calls
- store/store.js: Redux store setup
CDK:
- lib/constants.ts: all queue/table names
- lib/HOTW/HOTWLambdaStack.ts: HOTW infrastructure
- lib/apiStack.ts: core tables and queues
Section 11 - Sample Daily Schedule
9:00 AM Check Slack/Chime for any overnight alerts
9:15 AM Check CloudWatch dashboards for errors
9:30 AM Review your current ticket β what's the next step?
10:00 AM Code time (most productive period)
12:00 PM Lunch
1:00 PM Team standup (share: what you did yesterday, what you're doing today, blockers)
1:30 PM Code review β check if your PRs have feedback
2:00 PM More code time
4:00 PM Write up what you learned today (helps retention)
5:00 PM Wind down β check in with mentor if anything unclear
Weekly rhythm:
- Monday: New week, check that HOTW ran over the weekend
- Weekly: HOTW cron fires (check CloudWatch for results)
- Pre-peak: More intense, hardware orders being placed
- Post-peak: Descale work, retrospectives
Section 12 - Questions to Ask in 1:1 with Manager
Week 1:
"What's the most impactful thing I could contribute this summer?"
"What does success look like for my internship project?"
Week 3:
"I've been working on X. Am I on the right track?"
"What skills do you think I should develop most?"
Week 6:
"What's the most complex part of the system I should try to understand?"
"Is there a component that needs attention that I could help with?"
Week 10 (near end):
"What would make this a standout project?"
"Who else on the team should I talk to before I leave?"
Section 13 - Final Cheat Sheet
"Where do I find the code for X?"
├── HOTW automation       → EPICBackendTriggers/lambda/handler/HotwHandler.java
│                         → EPICBackendTriggers/lambda/HotwHelper/
├── Hardware order calcs  → HardwareOrdersUtil.java
├── Milestone updates     → milestone/handler/*.java
├── HOTW REST APIs        → EPICBackend/src/epiclambda/api/HOTW.js
├── Milestone REST APIs   → EPICBackend/src/epiclambda/api/EventPlan.js
├── HOTW dashboard UI     → EPICFrontend/src/pages/hotwDashboard.jsx
├── Service details UI    → EPICFrontend/src/pages/serviceDetails.jsx
├── All API routes        → EPICBackendCDK/lib/EPICApiStack.ts
├── AWS resource setup    → EPICBackendCDK/lib/[feature]Stack.ts
└── DB table definitions  → EPICBackendCDK/lib/apiStack.ts
"What triggers X?"
├── HOTW runs weekly      → CloudWatch cron → HotwHandler.handleSQSRequestForUpdateSpco
├── HOTW for one fleet    → atomicHotw button → POST /hotw/atomic/{fleetId}/{eventId}
├── Milestone updates     → HOTW publishes to MilestoneSNSTopic → WorkflowHandler
├── Apollo sync           → Daily cron → ApolloTriggerHandler
├── FMC check             → Periodic cron → FmcTriggerHandler
├── Email notifications   → SNS message → SES Lambda
└── Event plan creation   → EventFleetCreationQueue → FleetReceiver.js
"How do I find out why X failed?"
├── Any Lambda error      → CloudWatch Logs /aws/lambda/[name]
├── SQS not processing    → Check DLQ in SQS console
├── DynamoDB data         → DynamoDB Console → Tables → Items
├── MySQL data            → Query Aurora MySQL directly (ask mentor)
└── API Gateway error     → CloudWatch → API Gateway access logs
Good luck with your internship! You have everything you need to succeed.