Anthropic recently ran a full workshop on their Claude Agent SDK at the AI Engineer conference. Thariq Shihipar, who works on the SDK team, spent nearly two hours walking through how Anthropic builds agents internally.
I've built agentic systems in production across insurance, defence, and enterprise software. Some of what I heard confirmed patterns I already use. Some gave me new tools. All of it crystallised into five principles that I think define how production agents should be built right now.
The actual agent code is about 50 lines. The real engineering is everything around it.
Here are the five principles.
1. Your agent is the environment, not the code
This was the workshop's most powerful reframing. The orchestration code is trivial. The real engineering is what you put in the working directory.
A typical agent project:
```
your-agent/
├── agent.ts     # ~50 lines of boilerplate
├── CLAUDE.md    # Instructions, API descriptions, rules
├── scripts/     # Bash tools with --help
├── lib/         # TypeScript SDKs and types
├── data/        # Reference data
├── skills/      # On-demand context
├── memories/    # Persistent state (just files)
└── examples/    # Example scripts for codegen
```
Context is not just a prompt. It's the scripts the agent can discover. The TypeScript SDK you generated for your API. The CLAUDE.md file describing what's available. The skill directories the agent can navigate into when it needs specialised knowledge.
Thariq used a good analogy. If someone locked you in a room and gave you tasks, would you want a stack of papers or a computer with Google? You'd want the computer. Give the agent the tools to find its own information, not the information itself.
I've seen this principle in my own work. The agents that perform best in production are the ones where we invested most of the effort in the environment, not the orchestration logic. Once you make that shift, agent development becomes less about writing clever code and more about curating an excellent workspace.
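To make the "trivial orchestration" claim concrete, here is a sketch of what that ~50-line agent.ts reduces to. The real file would import the streaming `query` entry point from the Agent SDK; here the SDK call is stubbed out so the shape is visible and runnable offline, and the message type and option names are illustrative, not the SDK's actual API.

```typescript
// Illustrative message shape; the real SDK's types will differ.
type AgentMessage = { type: "text" | "result"; content: string };

// Stand-in for the SDK's streaming query over the working directory.
async function* query(opts: { prompt: string; cwd: string }): AsyncGenerator<AgentMessage> {
  yield { type: "text", content: "exploring workspace..." };
  yield { type: "result", content: `completed "${opts.prompt}" in ${opts.cwd}` };
}

// The orchestration really is this small: point the agent at the
// environment and stream messages until a result arrives.
export async function runTask(task: string): Promise<string> {
  let final = "";
  for await (const msg of query({ prompt: task, cwd: "./your-agent" })) {
    if (msg.type === "result") final = msg.content;
  }
  return final;
}
```

Everything that makes this agent good lives in `./your-agent`, not in this file.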
2. A shell replaces a tool registry
Instead of defining a search tool, a lint tool, an execute tool, and a test tool, each with its own schema and error handling, Anthropic gives the agent bash. The agent just uses grep, npm run lint, npm test, and ffmpeg. One generic tool replaces an entire registry.
The reason this works is composability. Bash lets the agent pipe outputs together, save intermediate results to files, discover new capabilities by running --help, and leverage the entire Unix ecosystem. You don't need to anticipate every action the agent might take. You give it a shell and it composes the tools that already exist.
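A minimal sketch of that single generic tool, in place of a registry of bespoke ones. The timeout and output cap are illustrative guardrails, not anything prescribed by the SDK:

```typescript
import { execSync } from "node:child_process";

// One generic tool: run a shell command and return its output.
export function bashTool(command: string): string {
  return execSync(command, {
    encoding: "utf8",
    timeout: 10_000,      // don't let a runaway command hang the agent
    maxBuffer: 1_000_000, // cap output so it can't flood the context
  });
}

// The agent composes Unix tools itself rather than calling bespoke ones:
const errorCount = bashTool("printf 'error\\nok\\nerror\\n' | grep -c error").trim();
// errorCount === "2"
```

The pipe in that last line is the whole argument: nobody defined a "count errors" tool, the agent composed one from primitives that already exist.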
This is how Claude Code is built, and it's the most capable coding agent available today. The same architecture powers non-coding agents in finance, legal, healthcare, and customer service. The bash tool is what makes it generalise.
The workshop laid out a clear decision framework for when to use each capability. Tools for irreversible actions where you want explicit confirmation. Bash for composable operations. Code generation for dynamic logic.
The practical takeaway: if you're maintaining more than a handful of custom tool definitions, you're probably doing work the shell could do better.
3. Translate your data into the model's language
This is the principle with the highest return on effort. Make your problem in-distribution.
If you've only tried one search interface for your data, it's probably not enough. For a spreadsheet agent, you might try SQL queries, cell-reference syntax, grep on CSV, or XML queries (XLSX files are zipped XML underneath). Which approach wins depends on the shape of the data.
The most impactful thing you can do is translate your data into a format the model already knows well. Loading a CSV into SQLite and letting the agent write SQL queries is often dramatically more effective than any custom search tool. Generating TypeScript interfaces from your API schema gives the model type information it can reason about natively.
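The TypeScript-interface idea can be sketched as a small generator: emit type text from a field map so the model sees types it knows natively. The field names here are illustrative; in practice you would derive this from your API schema (e.g. an OpenAPI document):

```typescript
type Primitive = "string" | "number" | "boolean";

// Turn a field map into TypeScript interface source text the model can
// read natively, instead of a custom schema format it has never seen.
export function toInterface(name: string, fields: Record<string, Primitive>): string {
  const body = Object.entries(fields)
    .map(([field, type]) => `  ${field}: ${type};`)
    .join("\n");
  return `interface ${name} {\n${body}\n}`;
}

const policy = toInterface("Policy", { id: "string", premium: "number", active: "boolean" });
// policy:
// interface Policy {
//   id: string;
//   premium: number;
//   active: boolean;
// }
```

Drop the generated file into lib/ and mention it in CLAUDE.md, and the agent reads your API the way it reads any other codebase.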
This single step often produces more improvement than any amount of prompt engineering. It's also the easiest to test: try two formats, measure which produces better results, keep the winner.
4. Verify at every layer, not just the output
Every agent follows three steps: gather context, take action, verify work. Planning sits optionally between the first and second steps.
That sounds trivial. Getting it right is not.
Verification is a discipline applied at every layer, not a final check. This aligns precisely with what I described in my earlier piece on agentic AI in production. Errors compound in multi-step pipelines. Catching them early is dramatically cheaper than catching them at the end.
Anthropic uses deterministic rules wherever possible: linting, compilation, schema validation, constraint checks. In Claude Code, if the agent tries to write a file it hasn't read, the harness throws an error and tells it to read first. Simple, deterministic, and extremely effective.
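That read-before-write rule can be sketched as a deterministic hook. A real harness would hang this off its file-tool callbacks; a Set is enough to show the mechanism:

```typescript
// Paths the agent has read so far in this session.
const readPaths = new Set<string>();

export function onRead(path: string): void {
  readPaths.add(path);
}

export function onWrite(path: string): void {
  if (!readPaths.has(path)) {
    // The error doubles as an instruction the model can follow.
    throw new Error(`You have not read ${path} yet. Read it first, then retry the write.`);
  }
}
```

No model call, no heuristics: the rule either fires or it doesn't, which is exactly why it's reliable.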
A practical detail worth noting: the model reads error messages and iterates. If your error says "you tried to insert 50,000 rows in one operation, please chunk this into batches of 1,000 or fewer," the agent will follow that coaching. Design your error messages as instructions, not diagnostics.
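A sketch of an error written as coaching rather than diagnosis. The 1,000-row limit is illustrative:

```typescript
const MAX_BATCH = 1000;

// Returns null when the insert is fine, or a coaching message the agent
// can act on when it isn't.
export function checkInsert(rowCount: number): string | null {
  if (rowCount <= MAX_BATCH) return null;
  return (
    `You tried to insert ${rowCount} rows in one operation. ` +
    `Please chunk this into batches of ${MAX_BATCH} or fewer and retry each batch.`
  );
}
```

The difference from a bare "constraint violation: row limit exceeded" is that the message contains the next action, so the agent's retry is the right one.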
Sub-agents add a powerful verification layer. You spawn a sub-agent with a completely fresh context and frame the task combatively: "This analysis was written by a junior analyst. Find the errors." The fresh context means the verifier has no sympathetic relationship with the work. It's genuinely adversarial.
The gap between self-verification and adversarial verification is significant. I plan to integrate this pattern into my own production deployments. Sub-agents also shine for parallel processing ("read and summarise sheet 1, sheet 2, sheet 3 simultaneously") and search offloading, where many queries run but only the final answer returns to the main agent. Both patterns keep the main context clean and focused.
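The parallel pattern above can be sketched as a fan-out. `runSubAgent` is a stub standing in for however your harness spawns an isolated, fresh-context agent; the point is that only the short summary returns to the main agent:

```typescript
// Stub for a full agent run in a fresh context; a real harness would
// spawn an isolated agent here and stream its final answer back.
async function runSubAgent(task: string): Promise<string> {
  return `summary: ${task}`;
}

// All sheets are processed in parallel; the main context never sees
// the raw sheet data, only the summaries.
export async function summariseSheets(sheets: string[]): Promise<string[]> {
  return Promise.all(sheets.map((sheet) => runSubAgent(`read and summarise ${sheet}`)));
}
```

Swap the task string for the adversarial framing ("this was written by a junior analyst, find the errors") and the same fan-out becomes the verification layer.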
5. Read every transcript. Then improve the environment
The workshop was emphatic about one workflow: don't start with the SDK. Start with Claude Code.
Set up a working directory with your APIs, data, and scripts. Write a CLAUDE.md. Chat with Claude Code and give it real tasks. Then read the transcripts.
This is the single most important practice for improving agent design, and it's the one most people skip.
After every session, read the full transcript. Where did the agent get stuck? Where did it take unnecessary detours? What context was it missing? Where did it use training knowledge instead of your data?
Iterate on the environment based on what you learn. Improve the instructions. Add scripts. Create skills. Add hooks to catch behaviour you don't want. Then read the transcripts again.
The iteration loop between reading transcripts and improving context is where most of the real engineering happens. Once results feel good, writing the agent.ts file and deploying to a sandbox is straightforward.
A note on security
Anthropic describes their security model as a "Swiss cheese defence." No single layer blocks everything. Together, the layers cover each other's gaps.
Your job as an agent builder is the outer layer: sandboxing. One container per user. Network isolation. Filesystem isolation. Use providers like Cloudflare, Modal, E2B, or Daytona.
The "lethal trifecta" to guard against: code execution, filesystem changes, and data exfiltration. Sandbox the network and you cut off exfiltration even if the other two are compromised.
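A sketch of what per-user sandbox flags look like, assuming Docker; the same shape applies to the hosted providers above. The image name and entrypoint are illustrative:

```typescript
// Build `docker run` arguments for one user's isolated agent container.
export function sandboxArgs(userId: string, workspace: string): string[] {
  return [
    "run", "--rm",
    "--network", "none",         // no network: exfiltration is cut off
    "--read-only",               // immutable root filesystem
    "-v", `${workspace}:/work`,  // the only writable mount: this user's workspace
    "--workdir", "/work",
    "--name", `agent-${userId}`, // one container per user
    "agent-image", "node", "agent.js",
  ];
}
```

With `--network none`, even a fully compromised agent that executes arbitrary code and rewrites its filesystem has nowhere to send the data.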
For database access, the principle is counterintuitive but sound: give the agent broad access, then add guardrails. Letting the agent write dynamic SQL and fix its own errors through iteration produces better results than restricting it to predefined queries. Reserve strict masking for genuinely sensitive data.
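One way to sketch "broad access plus guardrails": any SELECT goes straight through, writes are refused with an instruction the agent can act on. Real masking of sensitive columns would live in a database view or proxy, not in this check:

```typescript
// Gate for agent-written SQL: broad, dynamic reads are allowed; anything
// else is rejected with a coaching message rather than a bare error.
export function guardSql(sql: string): string {
  const trimmed = sql.trim();
  if (/^select\b/i.test(trimmed)) return trimmed;
  throw new Error(
    "Only SELECT statements are permitted on this connection. " +
    "Rewrite your statement as a read-only query and retry."
  );
}
```

The agent keeps the freedom to write arbitrary queries and iterate on its own SQL errors, while the one irreversible category (writes) is blocked deterministically.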
Build for now. Rewrite in six months
Thariq closed with something that resonated. Model capabilities change fast. Code that was necessary six months ago may be unnecessary today. Build for current capabilities. Ship now. Rewrite later.
The companies moving fastest are the ones willing to throw away code and rebuild with current capabilities. As Thariq put it: "We can write code 10x faster. We should throw out code 10x faster."
That's not instability. That's the natural rhythm of building on a platform that's improving faster than any in history. The best time to start is now, knowing you'll improve it later.
Building an agentic system and want to validate your architecture? Let's talk.
Related: Agentic AI in 2026: what actually works in production · AI made developers 19% slower. Here's what they were doing wrong · Why 80% of AI projects fail to deliver ROI