
Using subagents effectively

TL;DR

  • Sub agents are beneficial when the detailed intermediate work of a task does not need to be visible to the main AI thread.
  • They excel at isolated tasks like research or applying specific custom instructions, providing a summarized result rather than step-by-step exploration.
  • Avoid using sub agents for tasks requiring sequential dependencies or full visibility into intermediate failures, as information loss can hinder debugging and decision-making.

Takeaways

  • Use sub agents when exploration is separate from execution, meaning the main thread only needs a final answer, not the journey.
  • Delegate research tasks to sub agents (e.g., investigating codebases) where they can process extensive information and return a concise finding.
  • Implement reviewer sub agents to provide objective code feedback by seeing changes in a separate context, unaffected by the main thread's creation history.
  • Leverage sub agents to apply specialized system prompts for tasks like copywriting (tone, audience, style) or styling (design system files) to ensure consistent application of specific rules.
  • Avoid using sub agents simply to claim expertise (e.g., "you are a Python expert"), as this adds overhead without providing additional knowledge to the AI model.
  • Do not create sequential sub agent pipelines where each step depends on discoveries from the previous one, as information loss in handoffs creates problems.
  • Do not use sub agents as test runners if you require the full output for diagnosing issues, as they tend to hide critical diagnostic information.
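
The reviewer takeaway above can be sketched as a small script that renders a sub agent definition file. This is a minimal sketch, not an official template: it assumes sub agent definitions are Markdown files with YAML frontmatter (fields `name`, `description`, `tools`) stored somewhere like `.claude/agents/`. The helper `render_subagent`, the agent name `code-reviewer`, the tool list, and the prompt text are all made up for illustration.

```python
from textwrap import dedent
from typing import Optional, Sequence


def render_subagent(name: str, description: str, system_prompt: str,
                    tools: Optional[Sequence[str]] = None) -> str:
    """Render a sub agent definition as Markdown with YAML frontmatter.

    Assumption: the frontmatter fields (name, description, tools) match what
    your Claude Code version expects; check its documentation before relying
    on this layout.
    """
    lines = ["---", f"name: {name}", f"description: {description}"]
    if tools:
        # Assumed convention: tools are listed as a comma-separated string.
        lines.append(f"tools: {', '.join(tools)}")
    lines.append("---")
    # The body after the frontmatter is the sub agent's custom system prompt.
    return "\n".join(lines) + "\n\n" + dedent(system_prompt).strip() + "\n"


# Hypothetical reviewer: it sees only the diff, not the history of how it was written.
reviewer = render_subagent(
    name="code-reviewer",
    description="Reviews the current git diff against team standards.",
    tools=["Bash", "Read", "Grep"],
    system_prompt="""
        Run `git diff` and read every modified file.
        Judge the changes on their own merits.
        Report findings as: critical issues, suggestions, nits.
    """,
)
print(reviewer)
```

Keeping the review criteria in the definition body, rather than retyping them in each session, is what lets the whole team get the same review standards.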

Vocabulary

  • sub agent: An isolated AI thread that performs a specific task and returns a summary to a main AI thread.
  • main thread: The primary AI process that orchestrates tasks, potentially delegating to sub agents.
  • system prompt: Initial instructions given to an AI model or sub agent to define its role, behavior, or constraints.
  • context: The information (e.g., code, files, previous turns) an AI model or sub agent has access to during its operation.
  • JWT (JSON Web Token): A compact, URL-safe means of representing claims to be transferred between two parties, often used for authentication.
  • middleware: Software that sits between other components. In web frameworks like Express.js, it is a function that processes a request before it reaches the route handler.
  • git diff: A command that shows the differences between two versions of a file or a repository, commonly used in code reviews.
  • CSS patterns: Reusable and consistent design rules or structures for styling web elements using CSS.
  • sequential sub agent pipelines: A series of sub agents where the output of one sub agent becomes the input for the next, often problematic if steps are highly interdependent.
  • test runner sub agents: Sub agents configured to execute tests, which hide diagnostic detail if they only report pass/fail without the full output.

Transcript

You know how to create sub agents and design them. Now let's cover when they actually help and when they get in the way. Simply put, the difference comes down to whether the intermediate work matters to your main thread. When exploration is separate from execution, sub agents shine. When each step depends on what the previous step discovered, information gets lost in the handoff.
Sub agents excel at research tasks where you just need an answer, not the journey. Consider investigating how authentication works in an unfamiliar codebase. The main thread might need to know where the JWT is validated, but it doesn't need to see every file that was searched. A research sub agent can read dozens of files, trace through function calls, and explore different code paths. All that exploration stays in the sub agent's context. Your main thread receives something like "JWT validation happens in middleware/auth.js at line 42, called from the Express router in routes/api.js."
Claude reviews work more effectively when the code is presented as being authored by someone else. If you build a feature over many turns with your main thread, asking the main thread to then review it often doesn't give the best feedback. Claude was involved in creating it, so it has trouble seeing it with fresh eyes. A reviewer sub agent sees the changes in a separate context. It runs git diff, reads the modified files, and applies its specialized review criteria without the history of how the code was written. This separation also lets you encode project-specific review standards in the sub agent's system prompt, ensuring consistent review criteria across the team.
Claude Code's default system prompt emphasizes concise, code-focused responses, which works great for coding but not for everything. One example is a copywriting sub agent with instructions about tone, audience, and style. This will produce better marketing text than the main thread would.
Claude Code's default prompt tends toward concise technical writing, which really isn't what you want for a landing page or email campaign, unless you want to put your customers to sleep. A copywriting sub agent can have completely different instructions about voice and structure.
A styling sub agent that @-mentions your design system files will apply consistent CSS patterns. When the sub agent runs, those files load into the context automatically, so it knows your color variables, spacing conventions, and component patterns before it even starts writing any CSS.
Sub agents that claim expertise rarely help. Prompts like "You are a Python expert" or "You are a Kubernetes specialist" add no value because Claude already has that knowledge. The overhead of launching a sub agent, losing visibility into its work, and compressing its findings into a summary only makes sense when the sub agent does something that the main thread can't, like applying a custom system prompt or keeping exploratory work isolated.
Sequential sub agent pipelines create problems. Consider a three-agent flow: one to reproduce a bug, one to debug it, and one to fix it. Pipelines work when tasks are truly independent. They fail when each step depends on discoveries from the previous step.
Test runner sub agents tend to hide information you need. When tests fail, you want the full output to diagnose issues. A sub agent that returns only "a test failed" forces you to create additional debug scripts to get details that would have been visible in direct output. Testing has shown that the test runner pattern performed worst among all configurations.
Across the series, we covered how sub agents work as isolated threads that return summaries, how to create them with the /agents command, and how to design them with structured outputs and specific descriptions. Use them for research, reviews, and tasks needing custom system prompts, but avoid them for expert claims, multi-step pipelines, and test runners.
The key question: does the intermediate work matter? If not, delegate it.
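
The test runner point can be illustrated with a toy model of the handoff. This is only a sketch under stated assumptions: `RunResult`, `subagent_summary`, the test path, and the failure message are invented for illustration and do not come from any real tool. It shows that a diagnostic line present in the full output is simply absent from a short summary.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    failed: int
    full_output: str  # everything the test command printed


def subagent_summary(run: RunResult) -> str:
    """Hypothetical stand-in for the short summary a test runner sub agent returns."""
    return "all tests passed" if run.failed == 0 else f"{run.failed} test failed"


# Made-up failure output, used only to demonstrate the information loss.
run = RunResult(
    failed=1,
    full_output=(
        "FAILED tests/test_auth.py::test_expired_token\n"
        "AssertionError: expected 401, got 200\n"
    ),
)

summary = subagent_summary(run)
detail = "expected 401, got 200"

print(summary)                    # what the main thread sees after the handoff
print(detail in summary)          # False: the diagnostic never crossed the handoff
print(detail in run.full_output)  # True: running the tests directly keeps it
```

The same reasoning applies to sequential pipelines: whatever a later step needs but the summary omits has to be rediscovered.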
