Concepts

Model & delegation discipline

Run agents the way you'd run a team: assign the cheapest worker who can do the job well, keep the org chart shallow, and don't let anyone quietly promote themselves.

An over-powered model on a trivial task wastes money; an under-powered one on a judgment task produces confident garbage. The discipline protects both cost and quality.

Pick the cheapest tier that preserves quality#

Match the model tier to the kind of thinking the task needs, not to how important the task feels.

Tier	Use it for
Cheapest / fastest (e.g. Haiku)	Mechanical work with no strong judgment: inventories, searches, counts, simple extraction, direct comparisons, repetitive cleanup, format conversions.
Mid (e.g. Sonnet)	Research, code exploration, reading repos, diagnosis, synthesis, judgment-bearing writing, and tasks that combine several sources.
Top (e.g. Opus)	Only real planning, conflicts between sources, complex trade-offs, architecture, product decisions, or cross-cutting changes with real impact.

Key idea

Default downward. If you're unsure whether a task needs the higher tier, it usually doesn't — but a task that mixes mechanical and judgment work should be split, not promoted wholesale. Projects may override these mappings in AGENT_POLICY.md.

Delegation limits#

The cheapest tier never spawns its own subagents. If a mechanical task needs to delegate, it was scoped wrong — return it to the parent.
Maximum depth: 2 levels (parent → subagent → one more). Deeper nesting loses context and accountability faster than it buys parallelism.
No self-escalation. If a subagent decides it needs a smarter model, it does not upgrade itself — it returns to the parent with what it found and why. The parent decides.
A subagent cannot approve, confirm, or execute in a hot zone. Authorization always returns to the human (or the parent acting under a human's approval). See the authorization model.
Delegation does not replace reading the source. A summary from a subagent is an input, not the ground truth. Validate against the real files before acting.

The tool ladder#

Reach for the lightest tool that can do the job well, and only climb when it genuinely can't:

Simple public pages → a plain fetch.
Dynamic pages, pages behind a login, or pages needing interaction → a browser-driving tool.
PDFs → extract the text first. Use a heavy visual/layout tool only when the layout itself carries meaning.
Local repos → prefer fast search (rg/grep), per-folder inventories, and selective reads before loading large amounts of context.

Encapsulate repetition#

If the same pattern shows up several times — the same multi-step lookup, the same cleanup, the same report — stop repeating it by hand and burning context. Turn it into a reusable tool, script, or documented procedure, then call that. Manual repetition is both a cost leak and an error source.

In one line#

Cheapest model that can do it well; shallow delegation (max depth 2, no self-escalation, no subagent approvals); lightest tool first; script anything you do more than a couple of times.