Exam Scenarios
The real exam presents 4 of these 6 scenarios, selected at random. Each scenario frames a set of multiple-choice questions. Study all 6; you won't know which 4 will appear.
🎧 Scenario 1 · Customer Support Resolution Agent
You are building a customer support resolution agent using the Claude Agent SDK. The agent handles high-ambiguity requests like returns, billing disputes, and account issues. It has access to your backend systems through custom MCP tools: get_customer, lookup_order, process_refund, and escalate_to_human. Your target is 80%+ first-contact resolution while knowing when to escalate.
Domains: D1 · Agentic Architecture, D2 · Tool Design & MCP, D5 · Context & Reliability
MCP Tools: get_customer, lookup_order, process_refund, escalate_to_human
Architecture: Claude Agent SDK + custom MCP server exposing 4 backend tools
Key Constraint: get_customer must complete before process_refund; identity must be verified first
Target: 80%+ first-contact resolution; calibrated escalation for complex cases
Risk: Mis-identified accounts → incorrect refunds; wrong escalation calibration
What the Exam Tests
- When to enforce tool ordering programmatically (prerequisite gates) vs relying on prompt instructions; prompt instructions have non-zero failure rates for safety-critical sequences (a gate sketch follows this list)
- How expanding tool descriptions with input formats, example queries, and boundary explanations reduces misrouting between similar tools (get_customer vs lookup_order)
- Escalation calibration: adding explicit criteria + few-shot examples to calibrate when to resolve autonomously vs escalate
- Structured handoff summaries: what to include when calling escalate_to_human (customer ID, root cause, refund amount, recommended action)
- PostToolUse hooks for normalizing heterogeneous data formats (Unix timestamps, ISO 8601, numeric status codes) before the agent processes them
- Task decomposition for multi-concern requests: splitting multi-issue cases into distinct items, investigating in parallel, then synthesizing a unified resolution
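Prompt instructions alone leave the ordering constraint to chance, so the gate belongs in code. Below is a minimal, framework-agnostic sketch of the pattern; the in-memory verification store, handler names, and stub backend calls are illustrative assumptions, not SDK API.

```python
# Illustrative prerequisite gate: process_refund is refused until
# get_customer has verified the account in this session.
verified_customers: set[str] = set()

def fetch_customer(customer_id: str) -> dict:  # stub backend lookup
    return {"id": customer_id, "name": "Jane Doe"}

def issue_refund(order_id: str, amount: float) -> dict:  # stub backend call
    return {"order_id": order_id, "refunded": amount}

def handle_get_customer(customer_id: str) -> dict:
    record = fetch_customer(customer_id)
    verified_customers.add(customer_id)  # mark identity as verified
    return record

def handle_process_refund(customer_id: str, order_id: str, amount: float) -> dict:
    if customer_id not in verified_customers:
        # A structured error, rather than an exception, lets the agent
        # recover by calling get_customer first instead of failing the turn.
        return {"error": "identity_not_verified",
                "detail": "Call get_customer before process_refund."}
    return issue_refund(order_id, amount)
```

Wiring these handlers into the MCP server makes the ordering guarantee hold even when the model ignores its prompt.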
💻 Scenario 2 · Code Generation with Claude Code
You are using Claude Code to accelerate software development. Your team uses it for code generation, refactoring, debugging, and documentation. You need to integrate it into your development workflow with custom slash commands and CLAUDE.md configurations, and understand when to use plan mode vs direct execution.
Domains: D3 · Claude Code Config, D5 · Context & Reliability
Config Hierarchy: ~/.claude/CLAUDE.md → project CLAUDE.md → directory CLAUDE.md
Custom Commands: .claude/commands/ (team, version-controlled) vs ~/.claude/commands/ (personal)
Plan Mode: Architectural changes, library migrations, multi-file edits with valid alternatives
Direct Execution: Well-scoped changes such as a single-file bug fix or adding one validation conditional
What the Exam Tests
- CLAUDE.md hierarchy: which level (user / project / directory) controls what, and why user-level settings aren't shared with teammates
- When to use plan mode vs direct execution — the triggering criteria (architectural decisions, multiple valid approaches, large-scale changes)
- Custom slash commands and skills: .claude/commands/ for team workflows, context: fork frontmatter to isolate verbose skill output from the main conversation
- Path-specific rules via .claude/rules/ with YAML paths frontmatter: load only when editing matching files (a file sketch follows this list)
- Session management: --resume <session-name> for resumption, fork_session for parallel exploration branches
- When to start fresh with an injected summary vs resume (prior tool results go stale after file modifications)
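The rules bullet implies a concrete file layout. A minimal sketch, assuming the .claude/rules/ convention with YAML paths frontmatter described above; the glob and rule text are invented examples.

```python
# Generate a path-scoped rule file under .claude/rules/. The frontmatter
# keys follow the convention described above; the glob and rule body are
# illustrative placeholders.
from pathlib import Path

rule = """\
---
paths:
  - "src/api/**/*.ts"
---
API handlers must validate request bodies against the shared schemas
before any database access.
"""

rules_dir = Path(".claude/rules")
rules_dir.mkdir(parents=True, exist_ok=True)
(rules_dir / "api-validation.md").write_text(rule)
```

The rule then loads only when Claude Code edits files matching the glob, keeping unrelated sessions lean.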
🔬 Scenario 3 · Multi-Agent Research System
You are building a multi-agent research system using the Claude Agent SDK. A coordinator agent delegates to specialized subagents: one searches the web, one analyzes documents, one synthesizes findings, and one generates reports. The system researches topics and produces comprehensive, cited reports.
Domains: D1 · Agentic Architecture, D2 · Tool Design & MCP, D5 · Context & Reliability
Subagent Roles
Coordinator: Decomposes query, delegates to subagents, aggregates results, evaluates coverage gaps
Web Search Agent: Searches the web for sources; returns structured results with URLs and excerpts
Document Analysis Agent: Analyzes provided documents, extracts claims with page/source attribution
Synthesis Agent: Combines findings from prior agents; may have a scoped verify_fact tool
Report Agent: Generates final cited report from synthesis output
Spawning Mechanism: the Task tool; the coordinator's allowedTools must include "Task"
What the Exam Tests
- Hub-and-spoke coordination: all inter-subagent communication routes through the coordinator for consistent error handling and observability
- Context isolation: subagents do not inherit coordinator conversation history — context must be explicitly included in each subagent's prompt
- Parallel vs sequential spawning: emitting multiple Task tool calls in a single coordinator response to run subagents in parallel (see the coordinator sketch after this list)
- Scope partitioning: assigning distinct subtopics or source types to each subagent to minimize duplication
- Citation provenance: using structured data formats to separate content from metadata (source URLs, document names, page numbers) when passing context between agents
- Iterative refinement: coordinator evaluates synthesis gaps, re-delegates targeted queries, re-invokes synthesis until coverage is sufficient
- Tool scoping: restricting subagent tool sets to their role (synthesis agent should not have web search tools)
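A framework-agnostic sketch of that coordination loop: parallel fan-out, gap evaluation, and targeted re-delegation. run_subagent stands in for a Task-tool spawn, and every name and stub return value here is hypothetical.

```python
import asyncio

async def run_subagent(role: str, prompt: str) -> dict:
    # Stand-in for a Task-tool spawn. A real implementation would give each
    # subagent only the tools its role needs and an explicit context block,
    # since subagents do not inherit the coordinator's history.
    return {"role": role, "claims": [], "coverage_gaps": []}

async def research(query: str) -> dict:
    subtopics = ["background", "current state", "open problems"]  # decomposition
    findings: list[dict] = []
    for _ in range(3):  # bounded iterative refinement
        # Parallel fan-out: one Task call per subtopic, emitted together.
        results = await asyncio.gather(
            *[run_subagent("web_search", f"{query}: {t}") for t in subtopics]
        )
        findings.extend(results)
        synthesis = await run_subagent("synthesis", f"Combine: {findings}")
        gaps = synthesis["coverage_gaps"]
        if not gaps:  # coverage sufficient: hand off to the report agent
            return await run_subagent("report", f"Cited report from: {synthesis}")
        subtopics = gaps  # re-delegate targeted queries for the gaps only
    return synthesis

print(asyncio.run(research("solid-state batteries")))
```

Because every result routes back through research(), error handling and observability live in one place, which is the point of hub-and-spoke.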
🛠️ Scenario 4 · Developer Productivity with Claude
You are building developer productivity tools using the Claude Agent SDK. The agent helps engineers explore unfamiliar codebases, understand legacy systems, generate boilerplate code, and automate repetitive tasks. It uses built-in tools (Read, Write, Bash, Grep, Glob) and integrates with Model Context Protocol (MCP) servers.
Domains: D2 · Tool Design & MCP, D3 · Claude Code Config, D1 · Agentic Architecture
Built-in Tools: Read, Write, Edit, Bash, Grep, Glob, Task
Grep: Searching file contents for patterns: function names, error messages, import statements
Glob: Finding files by path pattern: **/*.test.tsx, all files in a directory subtree
Read + Write: Fallback when Edit fails due to a non-unique text match; full file operations
MCP Scoping: Shared team tooling → .mcp.json (project-level); personal/experimental → ~/.claude.json
What the Exam Tests
- Selecting the right built-in tool: Grep for content search, Glob for path patterns, Read for full file loads, Edit for targeted modifications
- Incremental codebase understanding: start with Grep to find entry points, then Read to follow imports and trace flows — not reading all files upfront
- Tool set size management: giving an agent 18 tools instead of 4–5 degrades selection reliability; agents with out-of-scope tools tend to misuse them
- MCP server scoping: project-level .mcp.json for tools shared via version control; user-level ~/.claude.json for personal/experimental servers (a config sketch follows this list)
- Enhancing MCP tool descriptions to prevent the agent from preferring built-in tools (like Grep) over more capable MCP-provided alternatives
- Choosing community MCP servers over custom implementations for standard integrations (e.g., Jira)
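A sketch of the project-level config, generated from Python so a setup script can ship it with the repo. It assumes the common mcpServers / command / args layout; the server name, package, and URL are placeholders.

```python
import json
from pathlib import Path

# Hypothetical shared MCP server entry; commit .mcp.json so the whole team
# gets the same tooling via version control.
config = {
    "mcpServers": {
        "issue-tracker": {
            "command": "npx",
            "args": ["-y", "@example/jira-mcp-server"],  # placeholder package
            "env": {"JIRA_BASE_URL": "https://example.atlassian.net"},
        }
    }
}

Path(".mcp.json").write_text(json.dumps(config, indent=2))
```

Personal or experimental servers go in ~/.claude.json instead, where they never leak into teammates' environments.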
⚙️ Scenario 5 · Claude Code for Continuous Integration
You are integrating Claude Code into your CI/CD pipeline. The system runs automated code reviews, generates test cases, and provides feedback on pull requests. You need to design prompts that provide actionable feedback and minimize false positives.
Domains: D3 · Claude Code Config, D4 · Prompt Engineering
Non-interactive Mode: claude -p / claude --print; prevents interactive input hangs in CI
Structured Output: --output-format json with --json-schema for machine-parseable PR comment data
CI Context: CLAUDE.md provides testing standards, fixture conventions, and review criteria to CI-invoked Claude Code
Review Independence: an independent Claude instance (not the generator's session) catches more issues; no retained reasoning context
What the Exam Tests
- The -p / --print flag: required for running Claude Code non-interactively in CI; prevents hangs waiting for user input
- --output-format json with --json-schema: produces structured findings that can be automatically posted as inline PR comments (an invocation sketch follows this list)
- Why the session that generated code is less effective at reviewing it: a model retains its generation reasoning context and is less likely to question its own decisions
- Including prior review findings on re-runs to instruct Claude to report only new or still-unaddressed issues, avoiding duplicate comments
- Providing existing test files in context so test generation avoids duplicating already-covered scenarios
- Explicit criteria for what to flag: "flag only issues where code behavior contradicts documented requirements" beats "be conservative"
- Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories
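A minimal CI invocation sketch using only the flags named in this scenario (-p, --output-format json). The prompt and the comment-posting step are placeholders; --json-schema is omitted because only its purpose, not its argument shape, is stated above.

```python
import json
import subprocess

# Run Claude Code non-interactively; -p prevents CI from hanging on input.
result = subprocess.run(
    [
        "claude", "-p",
        "Review this PR's diff against the criteria in CLAUDE.md. Flag only "
        "issues where code behavior contradicts documented requirements.",
        "--output-format", "json",
    ],
    capture_output=True, text=True, check=True,
)

findings = json.loads(result.stdout)
# A real pipeline would map findings to inline PR comments here, filtering
# out anything already reported on a previous run to avoid duplicates.
print(findings)
```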
📊 Scenario 6 · Structured Data Extraction
You are building a structured data extraction system using Claude. The system extracts information from unstructured documents, validates the output using JSON schemas, and maintains high accuracy. It must handle edge cases gracefully and integrate with downstream systems.
Domains: D4 · Prompt Engineering, D5 · Context & Reliability
Output Enforcement: tool_use with a JSON schema as input parameters; eliminates syntax errors, guarantees structure
tool_choice Options: "auto" (may return text), "any" (must call a tool), {"type":"tool","name":"..."} (forced)
Validation Loop: Append specific validation errors to the retry prompt; the model self-corrects format/structure mismatches
Batch API: 50% cost savings, up to a 24-hour processing window; for non-blocking, latency-tolerant workloads
What the Exam Tests
- tool_use with a JSON schema is the most reliable approach for structured output: it eliminates syntax errors but does not prevent semantic errors such as values in wrong fields or line items that don't sum to the total (see the extraction sketch after this list)
- tool_choice: "any" guarantees the model calls a tool rather than returning conversational text; use it when the document type is unknown and multiple schemas exist
- Designing optional (nullable) schema fields when source documents may not contain the information prevents the model from fabricating values to satisfy required fields
- Retry-with-error-feedback: including the original document, failed extraction, and specific validation errors in the retry prompt for model self-correction
- When retries are ineffective: if information is simply absent from the source document, retrying cannot produce it — address the schema or accept null
- Few-shot examples with varied document structures reduce extraction failures from format diversity (inline citations vs bibliographies, measurement formats)
- Batch API matching: use synchronous API for pre-merge checks (blocking); batch API for overnight document processing (non-blocking, latency-tolerant)
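A sketch of the full pattern: schema-enforced extraction via tool_use, tool_choice "any", a nullable field, and retry-with-error-feedback, using the Anthropic Python SDK's Messages API. The invoice schema, model id, and validate() checks are illustrative assumptions.

```python
import anthropic

client = anthropic.Anthropic()

EXTRACT_TOOL = {
    "name": "record_invoice",
    "description": "Record the fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            # Nullable: some documents lack a PO number, so don't force the
            # model to fabricate one just to satisfy the schema.
            "po_number": {"type": ["string", "null"]},
        },
        "required": ["vendor", "total", "po_number"],
    },
}

def validate(data: dict) -> list[str]:
    errors = []
    if data.get("total", 0) <= 0:
        errors.append("total must be a positive number")
    return errors

def extract(document: str, max_retries: int = 2) -> dict:
    prompt = f"Extract the invoice fields from this document:\n\n{document}"
    errors: list[str] = []
    for _ in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-5",    # placeholder model id
            max_tokens=1024,
            tools=[EXTRACT_TOOL],
            tool_choice={"type": "any"},  # must call a tool, never plain text
            messages=[{"role": "user", "content": prompt}],
        )
        data = next(b.input for b in response.content if b.type == "tool_use")
        errors = validate(data)
        if not errors:
            return data
        # Retry with the original document, the failed extraction, and the
        # specific validation errors so the model can self-correct.
        prompt = (
            f"Extract the invoice fields from this document:\n\n{document}\n\n"
            f"Your previous extraction {data} failed validation: {errors}. "
            "Fix these issues."
        )
    raise ValueError(f"extraction still invalid after retries: {errors}")
```

If a field keeps failing because the document genuinely lacks it, no number of retries will produce it; that is a schema problem, not a prompting problem.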