Exam Scenarios
The real exam presents 4 of these 6 scenarios, selected at random. Each scenario frames a set of multiple-choice questions. Study all 6; you won't know which 4 will appear.
🎧 Scenario 1 · Customer Support Resolution Agent
You are building a customer support resolution agent using the Claude Agent SDK. The agent handles high-ambiguity requests like returns, billing disputes, and account issues. It has access to your backend systems through custom MCP tools: get_customer, lookup_order, process_refund, and escalate_to_human. Your target is 80%+ first-contact resolution while knowing when to escalate.
Domains: D1 · Agentic Architecture, D2 · Tool Design & MCP, D5 · Context & Reliability
MCP Tools: get_customer, lookup_order, process_refund, escalate_to_human
Architecture: Claude Agent SDK + custom MCP server exposing 4 backend tools
Key Constraint: get_customer must complete before process_refund; identity must be verified first
Target: 80%+ first-contact resolution; calibrated escalation for complex cases
Risk: Mis-identified accounts → incorrect refunds; wrong escalation calibration
What the Exam Tests
- When to enforce tool ordering programmatically (prerequisite gates) vs relying on prompt instructions; prompt instructions have non-zero failure rates for safety-critical sequences (a gate sketch follows this list)
- How expanding tool descriptions with input formats, example queries, and boundary explanations reduces misrouting between similar tools (get_customer vs lookup_order)
- Escalation calibration: adding explicit criteria + few-shot examples to calibrate when to resolve autonomously vs escalate
- Structured handoff summaries: what to include when calling escalate_to_human (customer ID, root cause, refund amount, recommended action)
- PostToolUse hooks for normalizing heterogeneous data formats (Unix timestamps, ISO 8601, numeric status codes) before the agent processes them
- Task decomposition for multi-concern requests: splitting multi-issue cases into distinct items, investigating in parallel, then synthesizing a unified resolution
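Prompt instructions alone leave the ordering constraint to chance, so the gate belongs in code. Below is a minimal, framework-agnostic sketch of the pattern; the in-memory verification store, handler names, and stub backend calls are illustrative assumptions, not SDK API.

```python
# Illustrative prerequisite gate: process_refund is refused until
# get_customer has verified the account in this session.
verified_customers: set[str] = set()

def fetch_customer(customer_id: str) -> dict:  # stub backend lookup
    return {"id": customer_id, "name": "Jane Doe"}

def issue_refund(order_id: str, amount: float) -> dict:  # stub backend call
    return {"order_id": order_id, "refunded": amount}

def handle_get_customer(customer_id: str) -> dict:
    record = fetch_customer(customer_id)
    verified_customers.add(customer_id)  # mark identity as verified
    return record

def handle_process_refund(customer_id: str, order_id: str, amount: float) -> dict:
    if customer_id not in verified_customers:
        # A structured error, rather than an exception, lets the agent
        # recover by calling get_customer first instead of failing the turn.
        return {"error": "identity_not_verified",
                "detail": "Call get_customer before process_refund."}
    return issue_refund(order_id, amount)
```

Wiring these handlers into the MCP server makes the ordering guarantee hold even when the model ignores its prompt.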
💻 Scenario 2 · Code Generation with Claude Code
You are using Claude Code to accelerate software development. Your team uses it for code generation, refactoring, debugging, and documentation. You need to integrate it into your development workflow with custom slash commands and CLAUDE.md configurations, and understand when to use plan mode vs direct execution.
Domains: D3 · Claude Code Config, D5 · Context & Reliability
Config Hierarchy: ~/.claude/CLAUDE.md → project CLAUDE.md → directory CLAUDE.md
Custom Commands: .claude/commands/ (team, version-controlled) vs ~/.claude/commands/ (personal)
Plan Mode: Architectural changes, library migrations, multi-file edits with valid alternatives
Direct Execution: Well-scoped changes such as a single-file bug fix or adding one validation conditional
What the Exam Tests
- CLAUDE.md hierarchy: which level (user / project / directory) controls what, and why user-level settings aren't shared with teammates
- When to use plan mode vs direct execution — the triggering criteria (architectural decisions, multiple valid approaches, large-scale changes)
- Custom slash commands and skills: .claude/commands/ for team workflows, context: fork frontmatter to isolate verbose skill output from the main conversation
- Path-specific rules via .claude/rules/ with YAML paths frontmatter: load only when editing matching files (a file sketch follows this list)
- Session management: --resume <session-name> for resumption, fork_session for parallel exploration branches
- When to start fresh with an injected summary vs resume (prior tool results go stale after file modifications)
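The rules bullet implies a concrete file layout. A minimal sketch, assuming the .claude/rules/ convention with YAML paths frontmatter described above; the glob and rule text are invented examples.

```python
# Generate a path-scoped rule file under .claude/rules/. The frontmatter
# keys follow the convention described above; the glob and rule body are
# illustrative placeholders.
from pathlib import Path

rule = """\
---
paths:
  - "src/api/**/*.ts"
---
API handlers must validate request bodies against the shared schemas
before any database access.
"""

rules_dir = Path(".claude/rules")
rules_dir.mkdir(parents=True, exist_ok=True)
(rules_dir / "api-validation.md").write_text(rule)
```

The rule then loads only when Claude Code edits files matching the glob, keeping unrelated sessions lean.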
🔬 Scenario 3 · Multi-Agent Research System
You are building a multi-agent research system using the Claude Agent SDK. A coordinator agent delegates to specialized subagents: one searches the web, one analyzes documents, one synthesizes findings, and one generates reports. The system researches topics and produces comprehensive, cited reports.
Domains: D1 · Agentic Architecture, D2 · Tool Design & MCP, D5 · Context & Reliability
Subagent Roles
Coordinator: Decomposes query, delegates to subagents, aggregates results, evaluates coverage gaps
Web Search Agent: Searches the web for sources; returns structured results with URLs and excerpts
Document Analysis Agent: Analyzes provided documents, extracts claims with page/source attribution
Synthesis Agent: Combines findings from prior agents; may have a scoped verify_fact tool
Report Agent: Generates final cited report from synthesis output
Spawning Mechanism: the Task tool; the coordinator's allowedTools must include "Task"
What the Exam Tests
- Hub-and-spoke coordination: all inter-subagent communication routes through the coordinator for consistent error handling and observability
- Context isolation: subagents do not inherit coordinator conversation history — context must be explicitly included in each subagent's prompt
- Parallel vs sequential spawning: emitting multiple Task tool calls in a single coordinator response to run subagents in parallel (see the coordinator sketch after this list)
- Scope partitioning: assigning distinct subtopics or source types to each subagent to minimize duplication
- Citation provenance: using structured data formats to separate content from metadata (source URLs, document names, page numbers) when passing context between agents
- Iterative refinement: coordinator evaluates synthesis gaps, re-delegates targeted queries, re-invokes synthesis until coverage is sufficient
- Tool scoping: restricting subagent tool sets to their role (synthesis agent should not have web search tools)
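A framework-agnostic sketch of that coordination loop: parallel fan-out, gap evaluation, and targeted re-delegation. run_subagent stands in for a Task-tool spawn, and every name and stub return value here is hypothetical.

```python
import asyncio

async def run_subagent(role: str, prompt: str) -> dict:
    # Stand-in for a Task-tool spawn. A real implementation would give each
    # subagent only the tools its role needs and an explicit context block,
    # since subagents do not inherit the coordinator's history.
    return {"role": role, "claims": [], "coverage_gaps": []}

async def research(query: str) -> dict:
    subtopics = ["background", "current state", "open problems"]  # decomposition
    findings: list[dict] = []
    for _ in range(3):  # bounded iterative refinement
        # Parallel fan-out: one Task call per subtopic, emitted together.
        results = await asyncio.gather(
            *[run_subagent("web_search", f"{query}: {t}") for t in subtopics]
        )
        findings.extend(results)
        synthesis = await run_subagent("synthesis", f"Combine: {findings}")
        gaps = synthesis["coverage_gaps"]
        if not gaps:  # coverage sufficient: hand off to the report agent
            return await run_subagent("report", f"Cited report from: {synthesis}")
        subtopics = gaps  # re-delegate targeted queries for the gaps only
    return synthesis

print(asyncio.run(research("solid-state batteries")))
```

Because every result routes back through research(), error handling and observability live in one place, which is the point of hub-and-spoke.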
🛠️ Scenario 4 · Developer Productivity with Claude
You are building developer productivity tools using the Claude Agent SDK. The agent helps engineers explore unfamiliar codebases, understand legacy systems, generate boilerplate code, and automate repetitive tasks. It uses built-in tools (Read, Write, Bash, Grep, Glob) and integrates with Model Context Protocol (MCP) servers.
Domains: D2 · Tool Design & MCP, D3 · Claude Code Config, D1 · Agentic Architecture
Built-in Tools: Read, Write, Edit, Bash, Grep, Glob, Task
Grep: Searching file contents for patterns: function names, error messages, import statements
Glob: Finding files by path pattern: **/*.test.tsx, all files in a directory subtree
Read + Write: Fallback when Edit fails due to a non-unique text match; full file operations
MCP Scoping: Shared team tooling → .mcp.json (project-level); personal/experimental → ~/.claude.json
What the Exam Tests
- Selecting the right built-in tool: Grep for content search, Glob for path patterns, Read for full file loads, Edit for targeted modifications
- Incremental codebase understanding: start with Grep to find entry points, then Read to follow imports and trace flows — not reading all files upfront
- Tool set size management: giving an agent 18 tools instead of 4–5 degrades selection reliability; agents with out-of-scope tools tend to misuse them
- MCP server scoping: project-level .mcp.json for tools shared via version control; user-level ~/.claude.json for personal/experimental servers (a config sketch follows this list)
- Enhancing MCP tool descriptions to prevent the agent from preferring built-in tools (like Grep) over more capable MCP-provided alternatives
- Choosing community MCP servers over custom implementations for standard integrations (e.g., Jira)
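A sketch of the project-level config, generated from Python so a setup script can ship it with the repo. It assumes the common mcpServers / command / args layout; the server name, package, and URL are placeholders.

```python
import json
from pathlib import Path

# Hypothetical shared MCP server entry; commit .mcp.json so the whole team
# gets the same tooling via version control.
config = {
    "mcpServers": {
        "issue-tracker": {
            "command": "npx",
            "args": ["-y", "@example/jira-mcp-server"],  # placeholder package
            "env": {"JIRA_BASE_URL": "https://example.atlassian.net"},
        }
    }
}

Path(".mcp.json").write_text(json.dumps(config, indent=2))
```

Personal or experimental servers go in ~/.claude.json instead, where they never leak into teammates' environments.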
⚙️ Scenario 5 · Claude Code for Continuous Integration
You are integrating Claude Code into your CI/CD pipeline. The system runs automated code reviews, generates test cases, and provides feedback on pull requests. You need to design prompts that provide actionable feedback and minimize false positives.
Domains: D3 · Claude Code Config, D4 · Prompt Engineering
Non-interactive Mode: claude -p / claude --print; prevents interactive input hangs in CI
Structured Output: --output-format json with --json-schema for machine-parseable PR comment data
CI Context: CLAUDE.md provides testing standards, fixture conventions, and review criteria to CI-invoked Claude Code
Review Independence: an independent Claude instance (not the generator's session) catches more issues; no retained reasoning context
What the Exam Tests
- The -p / --print flag: required for running Claude Code non-interactively in CI; prevents hangs waiting for user input
- --output-format json with --json-schema: produces structured findings that can be automatically posted as inline PR comments (an invocation sketch follows this list)
- Why the session that generated code is less effective at reviewing it: a model retains its generation reasoning context and is less likely to question its own decisions
- Including prior review findings on re-runs to instruct Claude to report only new or still-unaddressed issues, avoiding duplicate comments
- Providing existing test files in context so test generation avoids duplicating already-covered scenarios
- Explicit criteria for what to flag: "flag only issues where code behavior contradicts documented requirements" beats "be conservative"
- Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories
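A minimal CI invocation sketch using only the flags named in this scenario (-p, --output-format json). The prompt and the comment-posting step are placeholders; --json-schema is omitted because only its purpose, not its argument shape, is stated above.

```python
import json
import subprocess

# Run Claude Code non-interactively; -p prevents CI from hanging on input.
result = subprocess.run(
    [
        "claude", "-p",
        "Review this PR's diff against the criteria in CLAUDE.md. Flag only "
        "issues where code behavior contradicts documented requirements.",
        "--output-format", "json",
    ],
    capture_output=True, text=True, check=True,
)

findings = json.loads(result.stdout)
# A real pipeline would map findings to inline PR comments here, filtering
# out anything already reported on a previous run to avoid duplicates.
print(findings)
```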
📊 Scenario 6 · Structured Data Extraction
You are building a structured data extraction system using Claude. The system extracts information from unstructured documents, validates the output using JSON schemas, and maintains high accuracy. It must handle edge cases gracefully and integrate with downstream systems.
Domains: D4 · Prompt Engineering, D5 · Context & Reliability
Output Enforcement: tool_use with a JSON schema as input parameters; eliminates syntax errors, guarantees structure
tool_choice Options: "auto" (may return text), "any" (must call a tool), {"type":"tool","name":"..."} (forced)
Validation Loop: Append specific validation errors to the retry prompt; the model self-corrects format/structure mismatches
Batch API: 50% cost savings, up to a 24-hour processing window; for non-blocking, latency-tolerant workloads
What the Exam Tests
- tool_use with a JSON schema is the most reliable approach for structured output: it eliminates syntax errors but does not prevent semantic errors such as values in wrong fields or line items that don't sum to the total (see the extraction sketch after this list)
- tool_choice: "any" guarantees the model calls a tool rather than returning conversational text; use it when the document type is unknown and multiple schemas exist
- Designing optional (nullable) schema fields when source documents may not contain the information prevents the model from fabricating values to satisfy required fields
- Retry-with-error-feedback: including the original document, failed extraction, and specific validation errors in the retry prompt for model self-correction
- When retries are ineffective: if information is simply absent from the source document, retrying cannot produce it — address the schema or accept null
- Few-shot examples with varied document structures reduce extraction failures from format diversity (inline citations vs bibliographies, measurement formats)
- Batch API matching: use synchronous API for pre-merge checks (blocking); batch API for overnight document processing (non-blocking, latency-tolerant)
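A sketch of the full pattern: schema-enforced extraction via tool_use, tool_choice "any", a nullable field, and retry-with-error-feedback, using the Anthropic Python SDK's Messages API. The invoice schema, model id, and validate() checks are illustrative assumptions.

```python
import anthropic

client = anthropic.Anthropic()

EXTRACT_TOOL = {
    "name": "record_invoice",
    "description": "Record the fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            # Nullable: some documents lack a PO number, so don't force the
            # model to fabricate one just to satisfy the schema.
            "po_number": {"type": ["string", "null"]},
        },
        "required": ["vendor", "total", "po_number"],
    },
}

def validate(data: dict) -> list[str]:
    errors = []
    if data.get("total", 0) <= 0:
        errors.append("total must be a positive number")
    return errors

def extract(document: str, max_retries: int = 2) -> dict:
    prompt = f"Extract the invoice fields from this document:\n\n{document}"
    errors: list[str] = []
    for _ in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-5",    # placeholder model id
            max_tokens=1024,
            tools=[EXTRACT_TOOL],
            tool_choice={"type": "any"},  # must call a tool, never plain text
            messages=[{"role": "user", "content": prompt}],
        )
        data = next(b.input for b in response.content if b.type == "tool_use")
        errors = validate(data)
        if not errors:
            return data
        # Retry with the original document, the failed extraction, and the
        # specific validation errors so the model can self-correct.
        prompt = (
            f"Extract the invoice fields from this document:\n\n{document}\n\n"
            f"Your previous extraction {data} failed validation: {errors}. "
            "Fix these issues."
        )
    raise ValueError(f"extraction still invalid after retries: {errors}")
```

If a field keeps failing because the document genuinely lacks it, no number of retries will produce it; that is a schema problem, not a prompting problem.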