The practical guide: how to actually do your job, survive code reviews, ask the right questions, debug issues, and make your internship a success.


Section 1 - First Week: What To Do

Day 1: Get access and set up

# These are the typical first-day tasks at Amazon

1. Get AWS console access for your team's account
2. Set up your Brazil workspace:
   brazil setup workspace
   brazil ws add -p EPICBackend EPICBackendCDK EPICBackendTriggers EPICFrontend

3. Get VPN set up (needed for internal tools)
4. Get access to:
   - EPIC internal wiki pages
   - The EPIC SIM (ticketing) queue
   - Relevant Slack/Chime channels
   - CloudWatch dashboards

Day 1: Questions to ask your mentor

1. "Can you walk me through one complete HOTW run in production?"
2. "Where are the production dashboards to monitor EPIC health?"
3. "What are the current top 3 pain points with the system?"
4. "Which SIM ticket should I look at to understand a real issue?"
5. "Is there a sandbox/beta environment where I can experiment safely?"

Section 2 - Understanding Your First Ticket

When you get your first work ticket, do this:

Step 1: Read the ticket fully

  • What is the problem/feature?
  • Which component is affected? (Frontend? Backend? Triggers? CDK?)
  • Are there any related tickets?

Step 2: Find the code

Is it a UI issue? → Look in EPICFrontend/src/pages/ or /components/
Is it an API issue? → Look in EPICBackend/src/epiclambda/api/
Is it a background job? → Look in EPICBackendTriggers/lambda/handler/
Is it infrastructure? → Look in EPICBackendCDK/lib/

Step 3: Trace the flow

Follow the data:
Frontend page → backend_api.js method → API Gateway route → Lambda handler
→ Operations class → Database → Response → Redux state → UI update

Step 4: Read related tests

# Tests are in __tests__ or *.test.js files
# Read them to understand expected behavior
# Recursive grep handles paths with spaces and skips installed dependencies
grep -rn "HOTW" --include="*.test.js" --exclude-dir=node_modules .

Section 3 - How to Debug Issues

Frontend Issues (React)

// 1. Open browser DevTools (F12) → Console tab
// Look for red errors

// 2. Check Network tab
// Find the failing API call
// Click it → see request and response

// 3. Add console.log
const HOTWDashboard = (props) => {
    console.log("Props received:", props);
    console.log("State:", { hotwData, isLoading });
    // ...
};

// 4. Check Redux state
// Install Redux DevTools Chrome extension
// See current state of all reducers

Backend Issues (Node.js Lambda)

// 1. Check CloudWatch Logs
// Go to: CloudWatch → Log Groups → /aws/lambda/[LambdaName]
// Filter by: "ERROR" or "Exception"

// 2. Add more logging (temporary)
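static async getHotwDashboardDetails(event) {
    console.log("Input event:", JSON.stringify(event));  // log input
    try {
        // eventId is parsed from the event earlier in the method (elided here)
        const result = await hotwOps.getDetails(eventId);
        console.log("DB result:", JSON.stringify(result));  // log output
        // ...
    } catch (err) {
        // A try block needs a catch (or finally); log the failure before re-throwing
        console.error("getHotwDashboardDetails failed:", err);
        throw err;
    }
}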

Java Lambda Issues

// 1. Check CloudWatch Logs
// Go to: CloudWatch → Log Groups → /aws/lambda/[JavaLambdaName]

// 2. Log more in Java
logger.log("Starting HOTW for fleet: " + fleetId);
logger.log("Fleet data: " + JsonUtil.toJson(fleet));
logger.log("Calculated hostsNeeded: " + hostsNeeded);

// 3. Common Java error patterns:
// NullPointerException → fleet/event data was null (check DynamoDB item exists)
// JsonProcessingException → JSON parsing failed (print raw string before parsing)
// AmazonServiceException → AWS API call failed (check IAM permissions)

Database Issues

// 1. For DynamoDB issues:
// Check: does the item exist?
// Go to: AWS Console → DynamoDB → Tables → FleetTable → Items
// Search by FleetId

// 2. For MySQL issues:
// Query the table directly (ask mentor for RDS access)
SELECT * FROM hotw_execution WHERE FleetIndexId = 42 ORDER BY CreatedAt DESC LIMIT 10;

// 3. Common MySQL errors:
// Duplicate key error → INSERT on already-existing unique key (use ON DUPLICATE KEY UPDATE)
// Timeout → query taking too long (add index or limit results)
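
The duplicate-key fix above can be sketched in Node.js as a small query builder. This is a minimal sketch, not the project's actual data layer: buildUpsert is a hypothetical helper, and the RunId and Status columns are assumed for illustration (this guide only confirms the hotw_execution table and its FleetIndexId column). It returns SQL with ? placeholders so values are bound by the driver rather than concatenated into the query.

```javascript
// Hypothetical helper: builds a parameterized MySQL upsert.
// Column names passed in below are assumptions for illustration only.
function buildUpsert(table, row, updateColumns) {
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  const columns = Object.keys(row);
  if (columns.length === 0 || updateColumns.length === 0) {
    throw new Error('Need at least one column to insert and one to update');
  }
  // Identifiers cannot be bound as parameters, so validate them strictly.
  for (const name of [table, ...columns, ...updateColumns]) {
    if (!identifier.test(name)) {
      throw new Error(`Unsafe identifier: ${name}`);
    }
  }
  const placeholders = columns.map(() => '?').join(', ');
  // On a unique-key collision, overwrite only the listed columns.
  const updates = updateColumns
    .map((col) => `${col} = VALUES(${col})`)
    .join(', ');
  const sql =
    `INSERT INTO ${table} (${columns.join(', ')}) ` +
    `VALUES (${placeholders}) ` +
    `ON DUPLICATE KEY UPDATE ${updates}`;
  return { sql, params: columns.map((col) => row[col]) };
}

const query = buildUpsert(
  'hotw_execution',
  { FleetIndexId: 42, RunId: 'run-1', Status: 'Started' },
  ['Status']
);
console.log(query.sql);
console.log(query.params);
```

The VALUES(col) form is the long-standing MySQL syntax; MySQL 8.0.20 and later deprecate it in favor of row aliases, so check which engine version the team's Aurora cluster runs before copying the pattern.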

Section 4 - Common Issues and Solutions

Issue 1: “HOTW not running for my fleet”

Checklist:
[ ] Is fleet in EventTable for the event? (check DynamoDB)
[ ] Is service type "Registered"? (check ServiceTable)
[ ] Has SPCO end date passed for this region? (check EventTable.SPCOEventDatesByRegion)
[ ] Is fleet in the "servicesNotToConsider" blacklist in HotwHandler.java?
[ ] Check validateAndUpdateSPCOSQSQueue for messages (CloudWatch Metrics)
[ ] Check validateAndUpdateSPCOSQSQueue DLQ for failed messages

Issue 2: “Milestone stuck in ‘InProgress’”

Steps:
1. Check EventPlanTable for the fleet (DynamoDB)
2. Which milestone is stuck? What's the message?
3. For HardwareOrder stuck:
   → Check if HOTW ran for this fleet (hotw_execution table)
   → Check if RunId exists
4. For HardwareFulfillment stuck:
   → Check fulfillment_details table
   → Check FMC order status (link in fulfillment_details)
5. Manually update milestone if system can't:
   PUT /eventplan/{EventId}/{FleetId}/status
   Body: { EventMilestoneId: "HardwareOrder", MilestoneCompletionStatus: "Completed", ... }
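
The manual update in step 5 could be assembled with a small helper like the one below. This is a sketch under stated assumptions: buildMilestoneUpdate is a hypothetical name, the IDs in the usage line are made up, and the body fields that this guide elides are not guessed at; the caller passes them in through otherFields. Sending the request (host, authentication) is left out because it depends on the team's setup.

```javascript
// Hypothetical helper: assembles the manual milestone update request.
// Fields beyond the two shown in this guide are left to the caller,
// because the guide elides them.
function buildMilestoneUpdate(eventId, fleetId, milestoneId, status, otherFields = {}) {
  const required = { eventId, fleetId, milestoneId, status };
  for (const [name, value] of Object.entries(required)) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return {
    method: 'PUT',
    // encodeURIComponent guards against IDs containing "/" or spaces.
    path: `/eventplan/${encodeURIComponent(eventId)}/${encodeURIComponent(fleetId)}/status`,
    body: JSON.stringify({
      ...otherFields,
      EventMilestoneId: milestoneId,
      MilestoneCompletionStatus: status,
    }),
  };
}

// Example IDs are placeholders, not real EPIC identifiers.
const request = buildMilestoneUpdate('event-1', 'fleet-1', 'HardwareOrder', 'Completed');
console.log(request.method, request.path);
console.log(request.body);
```

Validating the inputs up front matters here because a manual status change bypasses the normal workflow; confirm with your mentor before running anything like this against production.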

Issue 3: “Wrong host count shown in HOTW dashboard”

Steps:
1. Check Apollo data (ApolloHandler recently synced?)
   → Look at ApolloSQSQueue message count
   → Check CloudWatch logs for ApolloHandler
2. Check hotw_execution table for latest run
   → What is HostsPresentInApollo value?
   → Matches what Apollo shows?
3. If Apollo is stale:
   → Manually trigger ApolloHandler for this fleet
   → Send message to ApolloSQSQueue with FleetId

Issue 4: “Frontend showing loading spinner forever”

Steps:
1. Open browser DevTools → Network tab
2. Find the failing API call (red)
3. Check status code:
   - 401/403: Authentication issue (Harmony session expired?)
   - 404: Resource not found (check DynamoDB for the item)
   - 500: Lambda error (check CloudWatch logs)
   - Network Error: CORS issue or Lambda not deployed
4. Check if Lambda is deployed (CDK deploy ran?)
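
The status-code triage in step 3 can be written down as a lookup so the first check is always the same. This is an illustrative sketch only: triageApiFailure is a hypothetical function, and treating every 5xx code like the 500 case is an assumption that goes slightly beyond the list above.

```javascript
// Maps an HTTP status (or null when the request never completed) to the
// first thing to check, following the steps above.
function triageApiFailure(status) {
  if (status === null || status === undefined) {
    return 'Network error: check CORS and whether the Lambda is deployed';
  }
  if (status === 401 || status === 403) {
    return 'Authentication issue: check whether the Harmony session expired';
  }
  if (status === 404) {
    return 'Resource not found: check DynamoDB for the item';
  }
  // Assumption: all 5xx responses get the same first step as a 500.
  if (status >= 500 && status <= 599) {
    return 'Lambda error: check CloudWatch logs';
  }
  return `Unexpected status ${status}: inspect the response in the Network tab`;
}

for (const status of [null, 403, 404, 500]) {
  console.log(status, '->', triageApiFailure(status));
}
```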

Issue 5: “SQS messages piling up in DLQ”

Steps:
1. Go to SQS console → find the DLQ
2. Click "Receive Messages" → see one failed message
3. Check CloudWatch logs for the Lambda that processes this queue
4. Find the exception and reason for failure
5. Fix the code
6. Move messages from DLQ back to main queue to retry
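
Step 6 is the easiest one to get wrong, so here is a minimal sketch of a redrive loop. The client object is a hypothetical wrapper, not a real SDK interface: against real SQS its three methods would map to the ReceiveMessage, SendMessage and DeleteMessage operations. An in-memory stand-in is included so the sketch runs without AWS access.

```javascript
// Minimal sketch of a DLQ redrive loop. `client` is a hypothetical wrapper
// with receive/send/remove methods; it is not a real SDK interface.
async function redrive(client, dlqUrl, mainQueueUrl, maxMessages = 100) {
  let moved = 0;
  while (moved < maxMessages) {
    const batch = await client.receive(dlqUrl);
    if (batch.length === 0) break; // DLQ drained
    for (const message of batch) {
      if (moved >= maxMessages) break;
      // Send first, delete second: a crash in between duplicates a message
      // instead of losing it.
      await client.send(mainQueueUrl, message.body);
      await client.remove(dlqUrl, message.id);
      moved += 1;
    }
  }
  return moved;
}

// In-memory stand-in so the sketch runs without AWS access.
function createFakeClient(queues) {
  return {
    async receive(url) {
      return queues[url].slice(0, 10);
    },
    async send(url, body) {
      queues[url].push({ id: `${url}-${queues[url].length}`, body });
    },
    async remove(url, id) {
      queues[url] = queues[url].filter((m) => m.id !== id);
    },
  };
}

const queues = {
  dlq: [
    { id: 'a', body: '{"FleetId":"f1"}' },
    { id: 'b', body: '{"FleetId":"f2"}' },
  ],
  main: [],
};
const redriveDone = redrive(createFakeClient(queues), 'dlq', 'main').then((moved) => {
  console.log(`Moved ${moved} messages`);
  return moved;
});
```

Only redrive after the fix from step 5 is deployed; otherwise the same messages fail again and land back in the DLQ. The maxMessages cap is there so a first attempt can be limited to a handful of messages.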

Section 5 - Making Code Changes

Before writing code:

  1. Read the existing code in that area
  2. Understand the pattern being used
  3. Write down: what input, what output, what side effects?
  4. Check: are there unit tests? Read them.

For a Backend (Node.js) change:

// 1. Find the right api/ file
// 2. Add/modify the method following the pattern:
static async newMethod(event) {
    // Declare up front: class bodies run in strict mode, so assigning to an
    // undeclared variable throws a ReferenceError
    let body;

    // Parse and validate (try/catch 400)
    try {
        body = JSON.parse(event[OtherConstants.BODY]);
        if (!Util.validateKeys(body, [RequiredConstants.KEY1])) {
            throw new TypeError('Required fields missing');
        }
    } catch (err) {
        return Util.handleErr(400, 'Bad request', err);
    }

    // Business logic (try/catch 503)
    try {
        const result = await operations.doSomething(body);
        return Util.handleResponse(200, JSON.stringify(result));
    } catch (err) {
        return Util.handleErr(503, 'Server error', err);
    }
}

// 3. Add to EPICApiStack.ts (if new route)
// 4. Write unit tests
// 5. Test manually via Postman or frontend

For a Java change:

// 1. Find the right handler or helper class
// 2. Follow the existing pattern
// 3. Always use logger.log() not System.out.println()
// 4. Handle exceptions with try/catch
// 5. Build: brazil build
// 6. Check for compile errors
// 7. Run unit tests: brazil test

For a CDK change:

// 1. Find the right stack file
// 2. Add resource following existing pattern
// 3. Run: cdk diff to see what will change
// 4. Review diff carefully (make sure nothing unexpected changes)
// 5. Deploy to beta: cdk deploy StackName
// 6. Test in beta
// 7. Deploy to gamma, then prod

For a Frontend change:

// 1. Find the right page or component
// 2. Understand which Redux state it uses
// 3. Make your change
// 4. Start local dev server:
npm start
// 5. Test in browser
// 6. Check DevTools Console for errors

Section 6 - Code Review Tips

Before submitting for review:

[ ] No console.log statements left in (except intentional logging)
[ ] Error handling added (try/catch where needed)
[ ] Unit tests added/updated
[ ] No hardcoded values (use constants)
[ ] Follows existing code patterns in that file
[ ] No TODO comments without context
[ ] No commented-out code
[ ] Lambda timeout is reasonable for what it does
[ ] IAM permissions are minimal (only what's needed)
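
The first item on the checklist is easy to automate. The sketch below scans a source string for leftover console.log calls. It is a rough illustration, not a team tool: findStrayConsoleLogs is a hypothetical name, and the "intentional-log" marker comment is an invented convention for flagging logging that should stay.

```javascript
// Returns the 1-based line numbers that contain a console.log call.
// Lines carrying the marker comment are treated as intentional logging;
// the marker itself is a hypothetical convention, not an EPIC rule.
function findStrayConsoleLogs(source, marker = 'intentional-log') {
  return source
    .split('\n')
    .map((text, index) => ({ text, line: index + 1 }))
    .filter(({ text }) => /\bconsole\.log\s*\(/.test(text))
    .filter(({ text }) => !text.includes(marker))
    .map(({ line }) => line);
}

const sample = [
  'const x = 1;',
  'console.log("debug", x);',
  'console.log(`Order created`); // intentional-log',
  'return x;',
].join('\n');
const strayLines = findStrayConsoleLogs(sample);
console.log(strayLines);
```

A regex scan like this also flags console.log inside strings or comments, so treat the result as a list of lines to look at rather than a verdict.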

Typical review comments you’ll get:

"Can you add a unit test for the error case?"
→ Add a test where the DB call throws an error

"This should use a constant instead of the string literal"
→ Add to the constants file, use the constant

"What happens if this field is null?"
→ Add null check or explain why it can't be null

"Why 5 minutes timeout for this Lambda?"
→ Justify or reduce the timeout

"This looks like it could be a separate helper method"
→ Extract into a private method
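
The first review comment ("add a unit test for the error case") comes up often enough to be worth a sketch. The example below is self-contained and uses Node's built-in assert module; the team's real tests may use a different framework. Everything in it is a stand-in: Util mimics the shape of the project's helpers, and getDetails is a simplified hypothetical handler with its operations object injected so the test can swap in a failing stub.

```javascript
// Hypothetical stand-ins for the project's Util helpers, so the sketch runs alone.
const Util = {
  handleResponse: (statusCode, body) => ({ statusCode, body }),
  handleErr: (statusCode, message, err) => ({
    statusCode,
    body: JSON.stringify({ message, error: String(err && err.message) }),
  }),
};

// Simplified handler under test. `operations` is injected so a test can
// replace the real database layer with a stub.
async function getDetails(event, operations) {
  try {
    const result = await operations.getDetails(event.eventId);
    return Util.handleResponse(200, JSON.stringify(result));
  } catch (err) {
    return Util.handleErr(503, 'Server error', err);
  }
}

// The error-case test: the stubbed DB call rejects, so the handler must
// return a 503 response instead of letting the exception escape.
async function testReturns503WhenDbThrows() {
  const assert = require('node:assert');
  const failingOps = {
    getDetails: async () => {
      throw new Error('connection lost');
    },
  };
  const response = await getDetails({ eventId: 'event-1' }, failingOps);
  assert.strictEqual(response.statusCode, 503);
  assert.strictEqual(JSON.parse(response.body).error, 'connection lost');
  return response;
}

const errorCaseResult = testReturns503WhenDbThrows();
errorCaseResult.then(() => console.log('error-case test passed'));
```

Injecting the operations object is a design choice made for this sketch; if the real handlers construct their operations classes internally, the same test is usually written by mocking that module instead.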

Section 7 - Important Contacts and Resources

When to ask your mentor:

  • You’ve been stuck for more than 30 minutes
  • You’re not sure if a change is safe (could affect production)
  • You don’t understand a business requirement
  • The code is doing something unexpected and you can’t figure out why

When to escalate:

  • Production is broken (escalate immediately)
  • Data inconsistency found (escalate immediately)
  • Security issue found (escalate immediately)

Resources to bookmark:

CloudWatch Dashboards → for monitoring Lambda/SQS health
DynamoDB Console → for inspecting table data
SQS Console → for monitoring queue depths
EPIC Internal Wiki → for business context
SIM Queue → for tracking issues

Section 8 - The EPIC Vocabulary Sheet

Keep this open when reading code or talking to teammates:

Term        Say it as…      Means
HOTW        “H-O-T-W”       Head of the Week: weekly hardware ordering automation
SPCO        “S-P-C-O”       Service Provider Capacity Override: hardware order request
EAP         “E-A-P”         Emergency Access Protocol: ASG enrolled in ScalingPlanner
ASG         “A-S-G”         Auto Scaling Group: group of AWS servers
FMC         “F-M-C”         Fulfillment Management Console: tracks hardware delivery
TPM         “T-P-M”         Transactions Per Minute: traffic metric
BAU         “B-A-U”         Business As Usual: non-peak normal operations
SIM         “SIM”           Amazon’s internal ticketing system
Apollo      “Apollo”        Amazon’s config management (stores current host counts)
Gizmo/SDC   “Gizmo”         Throttling management systems
FLO         “FLO”           Fleet Light Operations: host management
CloudTune   “Cloud-Tune”    ML-based capacity prediction
PMET        “P-MET”         Performance Metric: CloudWatch metric for TPM
Axon        “Axon”          Traffic metrics collection system
TES         “T-E-S”         Traffic Engineering System
AZ          “A-Z”           Availability Zone: isolated group of data centers within an AWS Region
CDK         “C-D-K”         Cloud Development Kit: infrastructure as code
DLQ         “D-L-Q”         Dead Letter Queue: holds messages that failed processing
FIFO        “FI-FO”         First In, First Out: ordered queue
PK/SK       “P-K/S-K”       Partition Key / Sort Key (DynamoDB)

Section 9 - What Good Code Looks Like in EPIC

Node.js backend:

// ✅ Good EPIC backend code
static async createHardwareOrder(event) {
    const hotwOperations = new HOTWOperations();
    const fleetOperations = new FleetOperations();
    let body, fleetId, runId;
    
    // Step 1: Parse input (returns 400 on bad input)
    try {
        body = JSON.parse(event[OtherConstants.BODY]);
        if (!Util.validateKeys(body, [FleetConstants.FLEET_ID, HOTWConstants.RUN_ID])) {
            throw TypeError('Required fields missing: FleetId, RunId');
        }
        fleetId = body[FleetConstants.FLEET_ID];
        runId = body[HOTWConstants.RUN_ID];
    } catch (err) {
        return Util.handleErr(
            API_RESPONSE_STATUS_CODES.MALFORMED_REQUEST_STATUS_CODE,
            // fleetId is still undefined if parsing failed before it was assigned
            `Error parsing request for fleet ${fleetId ?? 'unknown'}`,
            err
        );
    }
    
    // Step 2: Business logic (returns 503 on server error)
    try {
        const fleetIndexId = await fleetOperations.getFleetIndex(fleetId);
        const result = await hotwOperations.createOrder(fleetIndexId, runId, body);
        console.log(`Order created for fleet: ${fleetId}, runId: ${runId}`);
        return Util.handleResponse(
            API_RESPONSE_STATUS_CODES.SUCCESSFUL_OK_STATUS_CODE,
            JSON.stringify(result)
        );
    } catch (err) {
        return Util.handleErr(
            API_RESPONSE_STATUS_CODES.SERVICE_UNAVAILABLE_STATUS_CODE,
            `Error creating order for fleet: ${fleetId}`,
            err
        );
    }
}

Java Lambda:

// ✅ Good EPIC Java code
public void processFleet(String fleetId, String eventId, String runId) {
    try {
        logger.log("Processing fleet: " + fleetId + " for event: " + eventId);
        
        // Get fleet data
        Fleet fleet = epicBackendFleetApiCallsCommon.getLatestFleet(fleetId);
        if (fleet == null) {
            logger.log("Fleet not found: " + fleetId);
            updateStatusWithReason(fleetId, eventId, runId, "FAIL", "Fleet not found");
            return;
        }
        
        // Process
        int hostsNeeded = calculateHostsNeeded(fleet, eventId);
        
        if (hostsNeeded > 0) {
            placeSPCOOrder(fleet, hostsNeeded);
            logger.log("Order placed: " + hostsNeeded + " hosts for fleet: " + fleetId);
        } else {
            logger.log("No order needed. Hosts needed: " + hostsNeeded);
        }
        
    } catch (Exception e) {
        logger.log("Error processing fleet: " + fleetId + ". Error: " + e.getMessage());
        e.printStackTrace();
        // Don't re-throw; the finally block runs either way and publishes status
    } finally {
        // Always update status and publish to SNS
        publishStatusToSNS(fleetId, eventId);
    }
}

Section 10 - Key Files to Know by Heart

After reading this guide, these files should feel familiar:

Node.js Backend:

  1. /api/HOTW.js: HOTW REST endpoints
  2. /api/EventPlan.js: milestone management
  3. /operations/HOTWOperations.js: HOTW MySQL queries
  4. /common/Util.js: handleResponse, handleErr, validateKeys
  5. /clients/AuroraMysqlClient.js: MySQL operations

Java Triggers:

  1. handler/HotwHandler.java: HOTW Lambda entry points
  2. HotwHelper/HotwUpscalingHelper.java: core HOTW logic
  3. HotwHelper/HardwareOrdersUtil.java: calculation utilities
  4. milestone/handler/*.java: all milestone handlers
  5. common/api/epicbackend/EPICBackendApiCallsCommon.java: API calls

Frontend:

  1. pages/hotwDashboard.jsx: HOTW dashboard
  2. pages/serviceDetails.jsx: fleet details
  3. common/backend_api.js: all backend API calls
  4. store/store.js: Redux store setup

CDK:

  1. lib/constants.ts: all queue/table names
  2. lib/HOTW/HOTWLambdaStack.ts: HOTW infrastructure
  3. lib/apiStack.ts: core tables and queues

Section 11 - Sample Daily Schedule

9:00 AM   Check Slack/Chime for any overnight alerts
9:15 AM   Check CloudWatch dashboards for errors
9:30 AM   Review your current ticket: what's the next step?
10:00 AM  Code time (most productive period)
12:00 PM  Lunch
1:00 PM   Team standup (share: what you did yesterday, what you're doing today, blockers)
1:30 PM   Code review β€” check if your PRs have feedback
2:00 PM   More code time
4:00 PM   Write up what you learned today (helps retention)
5:00 PM   Wind down, and check in with your mentor if anything is unclear

Weekly rhythm:

  • Monday: New week; check that HOTW ran over the weekend
  • Weekly: HOTW cron fires (check CloudWatch for results)
  • Pre-peak: More intense; hardware orders are being placed
  • Post-peak: Descale work, retrospectives

Section 12 - Questions to Ask in 1:1 with Manager

Week 1:
  "What's the most impactful thing I could contribute this summer?"
  "What does success look like for my internship project?"

Week 3:
  "I've been working on X. Am I on the right track?"
  "What skills do you think I should develop most?"

Week 6:
  "What's the most complex part of the system I should try to understand?"
  "Is there a component that needs attention that I could help with?"

Week 10 (near end):
  "What would make this a standout project?"
  "Who else on the team should I talk to before I leave?"

Section 13 - Final Cheat Sheet

"Where do I find the code for X?"
├── HOTW automation       → EPICBackendTriggers/lambda/handler/HotwHandler.java
│                         → EPICBackendTriggers/lambda/HotwHelper/
├── Hardware order calcs  → HardwareOrdersUtil.java
├── Milestone updates     → milestone/handler/*.java
├── HOTW REST APIs        → EPICBackend/src/epiclambda/api/HOTW.js
├── Milestone REST APIs   → EPICBackend/src/epiclambda/api/EventPlan.js
├── HOTW dashboard UI     → EPICFrontend/src/pages/hotwDashboard.jsx
├── Service details UI    → EPICFrontend/src/pages/serviceDetails.jsx
├── All API routes        → EPICBackendCDK/lib/EPICApiStack.ts
├── AWS resource setup    → EPICBackendCDK/lib/[feature]Stack.ts
└── DB table definitions  → EPICBackendCDK/lib/apiStack.ts

"What triggers X?"
├── HOTW runs weekly      → CloudWatch cron → HotwHandler.handleSQSRequestForUpdateSpco
├── HOTW for one fleet    → atomicHotw button → POST /hotw/atomic/{fleetId}/{eventId}
├── Milestone updates     → HOTW publishes to MilestoneSNSTopic → WorkflowHandler
├── Apollo sync           → Daily cron → ApolloTriggerHandler
├── FMC check             → Periodic cron → FmcTriggerHandler
├── Email notifications   → SNS message → SES Lambda
└── Event plan creation   → EventFleetCreationQueue → FleetReceiver.js

"How do I find out why X failed?"
├── Any Lambda error      → CloudWatch Logs /aws/lambda/[name]
├── SQS not processing    → Check DLQ in SQS console
├── DynamoDB data         → DynamoDB Console → Tables → Items
├── MySQL data            → Query Aurora MySQL directly (ask mentor)
└── API Gateway error     → CloudWatch → API Gateway access logs

Good luck with your internship! 🚀 You have everything you need to succeed.