AI & Agent Evaluation

Anthropic — Dynamic workflows in Claude Code

Claude Code docs · source date 2026-06-02 · added 2026-06-18 03:30:48 · updated 2026-06-18 03:30:48

Problems / challenges / motivations

  • Large coding-agent tasks often exceed what one linear chat can manage. Audits, migrations, and cross-checks need many independent passes, shared structure, and reproducible coordination.
  • Static hand-written harnesses can become a bottleneck: the right decomposition depends on the repository, task, files, risks, and verification target.
  • If the orchestration lives only in a conversation transcript, it is hard to rerun, inspect, or improve.

Key ideas

  • Claude Code dynamic workflows let the agent write a script that orchestrates many subagents. The script is a concrete artifact, so the same workflow can be rerun, inspected, and revised.
  • The harness becomes part of the work product: it encodes task decomposition, prompts, inputs, output collection, and aggregation logic.
  • This is useful for codebase audits, large migrations, parallel investigations, and cross-checked reviews where independent subagents can reduce blind spots.
  • The design pattern is “agent writes its own harness”: instead of only asking Claude to solve the task, ask it to construct the repeatable workflow that solves and verifies the task.
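
To make the "harness as work product" idea concrete, here is a minimal sketch of what such a script might look like. It is not Claude Code's actual workflow format or API: `Subtask`, `run_subagent`, `decompose`, and `run_workflow` are hypothetical names, and `run_subagent` is a stub standing in for whatever mechanism really launches a subagent. The sketch only shows the four pieces named above: decomposition, prompts and inputs, output collection, and aggregation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """One independent unit of work handed to a subagent (hypothetical shape)."""
    name: str
    prompt: str
    files: tuple[str, ...]


def run_subagent(task: Subtask) -> dict:
    """Hypothetical stand-in for launching one subagent.

    A real workflow would invoke an actual subagent here; this stub only
    echoes which files it was asked to inspect and reports no findings.
    """
    return {"task": task.name, "files": list(task.files), "findings": []}


def decompose(files: list[str], chunk_size: int) -> list[Subtask]:
    """Task decomposition: split the file list into independent subtasks."""
    chunks = [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]
    return [
        Subtask(
            name=f"audit-{n}",
            prompt="Review these files for risky patterns.",
            files=tuple(chunk),
        )
        for n, chunk in enumerate(chunks)
    ]


def run_workflow(files: list[str], chunk_size: int = 2, workers: int = 4) -> dict:
    """Fan out subtasks, collect their outputs, and aggregate one report."""
    tasks = decompose(files, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the submitted tasks.
        results = list(pool.map(run_subagent, tasks))
    return {
        "subtasks": len(tasks),
        "files_covered": sorted(f for r in results for f in r["files"]),
        "findings": [f for r in results for f in r["findings"]],
    }
```

Calling `run_workflow(["a.py", "b.py", "c.py"])` splits the three files into two subtasks and returns a single aggregated report. Because decomposition and aggregation live in code rather than in a transcript, the same run can be repeated or adjusted by editing the script.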

Why it matters for evals

  • It makes agent evaluation more operational: the object being evaluated is not just a final answer, but a workflow that coordinates workers, collects evidence, and can be rerun.
  • It pushes agent design toward executable scaffolding: prompts, subtask routing, verification, and aggregation become inspectable code.
  • For coding-agent evals, this suggests testing whether agents can build reliable harnesses around themselves, not merely whether they can complete one isolated task.
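
One way to read "evaluate the workflow, not just the answer" is to rerun a harness several times and score properties of the runs themselves. The sketch below is an illustration under stated assumptions, not a method described in the source: it assumes a harness is a callable that takes a file list and returns a report with `files_covered` and `findings` keys, and the names `evaluate_harness` and `toy_harness` are hypothetical.

```python
from typing import Callable

# Assumed harness interface: file list in, report dict out.
Harness = Callable[[list[str]], dict]


def evaluate_harness(harness: Harness, files: list[str], runs: int = 3) -> dict:
    """Rerun a harness and score input coverage and run-to-run stability."""
    if not files or runs < 1:
        raise ValueError("need at least one file and one run")
    reports = [harness(files) for _ in range(runs)]
    expected = set(files)
    # Fraction of the requested files that each run actually covered.
    coverage = [len(expected & set(r["files_covered"])) / len(expected) for r in reports]
    # Stable means every run reported the same findings as the first run.
    stable = all(r["findings"] == reports[0]["findings"] for r in reports)
    return {"min_coverage": min(coverage), "stable_findings": stable}


def toy_harness(files: list[str]) -> dict:
    """Deliberately flawed example harness: it silently skips the last file."""
    return {"files_covered": files[:-1], "findings": []}
```

With four input files, `evaluate_harness(toy_harness, ["a", "b", "c", "d"])` reports a minimum coverage of 0.75, which flags the skipped file even though the harness returned without error. Real evals would likely add task-specific checks on the findings themselves.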
