How to get to production faster with Claude Managed Agents

As AI model capabilities increase exponentially, the complexity and duration of tasks delegated to agents grow, shifting the primary bottleneck from intelligence to infrastructure, reliability, and security.
Anthropic's Claude Managed Agents platform addresses these challenges by providing a comprehensive solution with built-in infrastructure, automatic context management, checkpointing, and robust observability.
This platform enables developers to build and deploy complex, long-running agentic applications that can autonomously achieve outcomes, learn, and improve over time with enhanced reliability and security.

The "AI exponential" means agents are tackling tasks from minutes to days or even quarters of work, necessitating a true agentic runtime beyond simple prompting scaffolding.
Claude Managed Agents solve critical developer pain points: context management (one in three struggle), infrastructure concerns like security and credential management (half cite as a blocker), and lack of formal observability for probabilistic outputs (majority of users).
The platform combines essential infrastructure (tool permissioning, execution, automatic context management, checkpointing, retries) with foundational building blocks for customizable agent composition and rich observability.
The mental model for Managed Agents involves defining an agent (configuration of model, prompt, tools, skills), specifying an environment (networking, packages), and executing sessions that emit events.
A detailed event topology categorizes events into user, agent, session, and span events, providing a single pane of glass in the console for real-time analysis, debugging, and performance improvement.
Advanced capabilities include multi-agent orchestration for task decomposition, outcomes for agents to iterate until predefined criteria are met, Memory for persistent learning across sessions, and the dreaming platform for codifying new learnings.
Developers can get started using a Claude Code skill (\Claude API) for agent management, a powerful CLI for programmatic control and YAML configuration, and cookbooks for practical examples.
The "outer loop" workflow allows users to provide feedback on agent outputs, enabling tools like Claude Code to autonomously modify instructions or rubrics and kick off new, optimized sessions.
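The agent, environment, session, and event model in the takeaways above can be pictured with a small sketch. This is an illustration only, not the Managed Agents API: every class, field, and event name below is an assumption chosen for readability, and only the four event categories come from the talk.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class EventCategory(Enum):
    # The four categories named in the talk; the string values are hypothetical.
    USER = "user"        # steering, interrupting, defining exit criteria
    AGENT = "agent"      # tool runs, context compaction, delegation
    SESSION = "session"  # lifecycle: running, idle, waiting for input
    SPAN = "span"        # instrumentation that groups related events


@dataclass(frozen=True)
class Agent:
    # An agent is a configuration: model, prompt, tools, skills.
    model: str
    prompt: str
    tools: tuple = ()
    skills: tuple = ()


@dataclass(frozen=True)
class Environment:
    # Where the agent works: networking and packages (hypothetical fields).
    allowed_hosts: tuple = ()
    packages: tuple = ()


@dataclass(frozen=True)
class Event:
    category: EventCategory
    name: str


@dataclass
class Session:
    # One execution of an agent in an environment; it emits events.
    agent: Agent
    environment: Environment
    events: list = field(default_factory=list)

    def emit(self, category: EventCategory, name: str) -> None:
        self.events.append(Event(category, name))

    def counts_by_category(self) -> Counter:
        return Counter(event.category for event in self.events)


def demo_session() -> Session:
    # Hypothetical values loosely modeled on the grocery analytics demo.
    agent = Agent(model="example-model", prompt="Analyze grocery orders.", tools=("python",))
    env = Environment(packages=("pandas",))
    session = Session(agent, env)
    session.emit(EventCategory.SESSION, "running")
    session.emit(EventCategory.USER, "message")
    session.emit(EventCategory.AGENT, "tool_use")
    session.emit(EventCategory.AGENT, "tool_result")
    session.emit(EventCategory.SESSION, "idle")
    return session
```

Grouping a session's events by category, as counts_by_category does, is the kind of after-the-fact analysis the console view is described as supporting.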

Agentic runtime: A robust execution environment designed to support complex, long-running, and autonomous AI agents, going beyond simple prompt-response interactions.
Outcome-oriented tasks: Tasks given to an agent that specify a desired end state or rubric, allowing the agent to iterate and work until that outcome is achieved.
Context management: The process of providing an AI agent with the right information at the right time to perform its task effectively, avoiding distraction or information overload.
Infra concerns: Challenges related to the underlying infrastructure supporting AI agents, such as credential management, security, access control, and keeping a human in the loop.
Observability: The ability to understand the internal state of a system (like an AI agent) by examining its outputs, logs, and events, crucial for debugging and improving performance.
Checkpointing: The process of saving the state of a long-running agent execution at specific intervals, allowing it to stop, resume, or recover from failures without losing all progress.
Multi-agent orchestration: The coordination and management of multiple AI agents working together, often by delegating sub-tasks to specialized agents, to achieve a larger goal.
Outcomes: A feature in Claude Managed Agents where the agent continues to iterate and refine its work until a predefined set of exit criteria or goals is satisfied.
Memory: A feature allowing AI agents to read from and write to persistent memory stores, enabling them to retain learnings and context across different sessions.
Dreaming platform: A system where AI agents can reflect on past experiences, codify new learnings, and improve their performance or capabilities between runs.
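The outcome-oriented pattern defined above, combined with the "outer loop" of user feedback, can be sketched in a few lines. This is a toy model under stated assumptions, not how the platform implements outcomes: Rubric, RunResult, inner_loop, outer_loop, and toy_attempt are all hypothetical names, and the stand-in "agent" simply gets faster on each iteration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rubric:
    # Hypothetical exit criterion: render time must be at or below this value.
    max_seconds: float


@dataclass
class RunResult:
    seconds: float
    iterations: int
    satisfied: bool


def inner_loop(attempt: Callable[[int], float], rubric: Rubric,
               max_iterations: int = 10) -> RunResult:
    """Inner loop: keep iterating until the measurement satisfies the rubric."""
    seconds = float("inf")
    for i in range(1, max_iterations + 1):
        seconds = attempt(i)  # the agent produces an artifact; a grader measures it
        if seconds <= rubric.max_seconds:
            return RunResult(seconds, i, True)
    return RunResult(seconds, max_iterations, False)


def outer_loop(attempt: Callable[[int], float], rubric: Rubric,
               feedback: List[float]) -> List[RunResult]:
    """Outer loop: each piece of feedback revises the rubric and starts a new run."""
    results = [inner_loop(attempt, rubric)]
    for new_target in feedback:
        rubric = Rubric(max_seconds=new_target)
        results.append(inner_loop(attempt, rubric))
    return results


def toy_attempt(iteration: int) -> float:
    # Stand-in for an agent run: each iteration is faster than the last.
    return 40.0 / iteration
```

For example, outer_loop(toy_attempt, Rubric(15.0), [10.0]) runs one session against a 15-second target and then a second session against a tightened 10-second target. In the talk the rubric revision is done by a code agent reading the session log; here it is reduced to a single number for clarity.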

Welcome to the stage, members of technical staff at Anthropic, Jess Yan and Lance Martin.
Everyone, I hope you're all having a great time at Code with Claude. I'm Jess Yan, product for Claude Managed Agents.
I'm Lance Martin, DevX team, and I do whatever Jess tells me.
That's news to me. But today we have a great agenda for you guys. First we'll be starting with the AI exponential and how that has materially changed agent development. We'll talk about the motivations for why we built Claude Managed Agents, do a deep dive into the primitives, show a couple of demos to bring some sizzle to life, and then we'll walk you through our getting started resources so that you can start building some agents of your own.
All right, so we're all familiar with how our model capabilities are increasing exponentially. But as these capabilities increase, so do the task horizons and the complexity of the work that we're delegating to our agents. We're seeing that the bottleneck is increasingly infrastructure and not intelligence. So a couple of years ago, you might have had Opus write and test a single component. You might have had it debug a flaky test suite. And this would be minutes, maybe an hour of focused work. You'd be steering it heavily along the way and you'd be correcting it as it went off course. More recently with our latest models, we're seeing that people are running things overnight, walking away, waking up the next morning, and seeing that their entire Linear backlog has been resolved by an agent. In the not too distant future, we might see agents take on work that historically took teams quarters to complete. Multi-agent coordinated teams will be running a full M&A pipeline end to end. And as tasks evolve from prompts to hours and days of work, we need not just prompting scaffolding, but a true agentic runtime.
That's right. With Managed Agents, one of the main things we solved is reliability and security.
So with long-horizon tasks, these become much greater issues. You have an agent that's working for hours, days, or weeks, and it needs to be reliable. It also needs to be secure. And new interaction modes become possible when you have agents working over long time horizons. So for example, with chatbots, it's pretty instantaneous or short-horizon interactions. Long-horizon agents require something we call outcome-oriented tasks. You give an agent a task, and an outcome like a rubric that indicates what completion means. We also need the ability for agents to stop and resume over the course of long execution, potentially to ask questions to clarify their work.
Honestly, stop and resume is the most human-like of interaction patterns because there's nothing more human than procrastination.
Right. So it's clear that we expect a lot of our agents. And that means historically, we've pushed that burden onto you guys, the developers. We've expected a lot of you as well. In research that we conducted prior to launching Managed Agents, we found that developers were seriously struggling. So one in three were struggling with context management. Context at the right time can be incredibly powerful. It's the knowledge that the agent needs to do work. But context at the wrong time can be a huge distraction. Half of our developers were citing infra concerns as their number one production blocker. These are some of the concerns that Lance was talking about: credential management, security and access, keeping a human in the loop. And then lastly, a majority of our users are saying that their agents are running with no formal observability. These agents are running off of predictive models with randomized outputs. And this is very unlike traditional software development of the past. How do you know if your agent is doing something good if it is producing random or probabilistic outputs?
So enter Claude Managed Agents.
We built this platform so that you don't have to. We combined infrastructure and harness (tool permissioning, tool execution, automatic context management, checkpointing, retries) with foundational building blocks that Lance will go into a bit later. That makes it easy to understand how to compose a customizable agent quickly. And then lastly, we paired it with a rich observability platform. We don't want these agents to be running on vibes. You should be able to understand exactly what your agent is doing and how you can improve it.
Yeah, that's right. And so using Managed Agents is actually really simple. The mental model is basically as follows. You're defining an agent. An agent you can think of as a configuration. It has a particular model. It has a prompt. It has tools. It has skills. You're laying that out. And then you're allowing that agent to use an environment, which you configure. You can configure the networking and packages. And this is where the agent can, for example, write code. And any given execution of the agent is a session. Now sessions can have resources, for example, GitHub repos. They can have something like an outcome, which we'll talk about a little bit more later. And these sessions emit events, which Jess will cover here briefly, that you can then handle and process and use to understand what the agent is doing.
So let's walk the event topology. As agents do more and more complex tasks, the event types that are produced are more and more complex as well. And so we've separated them into four broad categories. So there are user events. You steer the agent, you guide it, you interrupt it, you define exit criteria. There are agent events. These convey what the agent is doing, what tools it's running, how it's compacting its context over time, who it's delegating to. There are session events, which help you track the life cycle of your work. So is the agent running? Is it idle? Is it waiting for your inputs?
And then lastly, there are span events. This is broader instrumentation that lets you group related events together.
So we'll pivot to an example agent that we've built. We call it Pascal. It runs on a hypothetical grocery store's data set, a grocery store called Justin Time. And it produces rich analytics and insights in minutes, leveraging its container preloaded with a set of Python packages. You can see every event in console, and you can even diagnose the event stream after the fact. Let's walk through it with a bit of a demo.
So we're starting the agent execution right now. You can see that the events are updating in real time in console. And console supports a single pane of glass that lets you analyze the agent's configuration, as well as its environment, as you're looking at the events produced. Pascal has started to cook. It's starting to produce some outputs. It first starts with an analysis of the products. We're learning that bananas are really, really popular. The second output that it'll create is an analysis of the shoppers. And we're learning that Sunday morning is peak grocery time. And lastly, my favorite output is a bit of a predictive model where it's analyzing the reorder probability for a single customer given their demographic profile. Now that the agent has finished, the full event stream is available in console, and we can analyze its performance. We offer a debug agent in console so that you can look at the event stream, analyze bottlenecks, figure out ways to improve the agent going forward, and take recommended actions. So it looks like it has identified a few bottlenecks, which we can then go fix directly in Claude Code.
Yeah. So what you saw is the console showing two really cool things. It's showing you the trace for everything that happened in the session. And it allows you to analyze what happened, using Claude to look at the session log and give you analytics, insights, and so forth.
Now how do you practically get started? First, I want to promote something that I worked on quite a bit. It's a skill built into Claude Code and shipped globally. In Claude Code today, if you use backslash Claude API, you'll access our skill, which understands Managed Agents extremely well. And I use this all the time. In fact, I don't write a lot of Managed Agents code myself. I have Claude Code do it. It's a very nice trick. And I'll show you some very nice tricks later for how it can also be used to grab session logs and so forth. It uses the CLI to grab those logs. The CLI is very powerful. It lets you configure agents as, for example, YAML files, which you can check in. It allows you to grab sessions programmatically, which is very useful for code agents. And we also have cookbooks.
Artisanal code.
Yes.
So this is what we've been building with since we launched. I also wanted to touch on some of the more advanced capabilities that we've been shipping over the last couple of weeks. Each one extends the capabilities of the agent experiences that you can offer. So first, there's multi-agent orchestration. Claude can clone itself. Claude can delegate to pre-configured additional agents. And this allows complex tasks to be decomposed into smaller units that are achieved with better fidelity. We have outcomes, which we talked about a little bit earlier in this presentation, where Claude iterates until it satisfies predefined exit criteria. You define the goal, Claude keeps going until it's finished. A couple of weeks ago, we launched Memory in public beta. And with Memory, Claude doesn't have to start each new session fresh. Instead, it's reading and writing to persistent memory stores. And then lastly, today, my colleague Mahesh announced our dreaming platform. Here, Claude is reflecting and codifying new learnings into new memories. Agents can literally improve between every single run.
So we'll showcase another demo now that shows how Claude Managed Agents uses outcomes and multi-agent to produce great outputs. Previously, you saw in console our analysis of a single session. Lance will walk through how we can do this programmatically at scale over the course of multiple sessions.
Yeah, so this is actually one of the most fun demos I've built for this conference. And it came from a number of weeks ago, when Angela, our head of product, prompted me with a question: what would the AGI-pilled CEO have at his disposal? So this is an interface where you just type in a question. It will query fake organizational data and render visualizations based on the input for anything that this fake CEO wants to know. If you use, for example, the Claude app, you understand artifacts. It's basically just Claude producing SVG and rendering that, in this case, in a browser as a visualization. So we'll show this right now.
Now, all I'm doing is, I set up a managed agent. You can see the session; it has a sandbox and orchestration to handle things like retries. I'm giving it one custom tool, the ability to render code to a browser. And what it's going to do is, based on the user's input, render a visualization. We'll see that in a bit. It will show different things, for example, graphs or tables, based on what this fake CEO wants to know.
Now, this is where I use outcomes. And I want to make sure this is really clear because I thought this was really cool. So, outcomes allows you to specify a rubric. You're passing instructions, which means the agent runs. And when the agent finishes, a separate sub-agent spins up and looks at the artifacts produced, in this case, my page. And, for example, in this case, I specified: produce timing and take a screenshot, do an analysis, and send the analysis back to the main agent. So what was really cool here is, I used outcomes to make this much faster.
As you'll see shortly, the CEO will ask a question, it's going to render visualizations, and I want it to be fast. And I used outcomes to do that. Now, one nice thing is when I kick off a run with outcomes, the managed agent will iterate against that outcome over time. And when it finishes, in my particular case, I would look at the result, see the dashboard, and I might have feedback. I say, I don't like this. What I would do is, I would then tell Claude Code my feedback. And it could use our CLI, pull the session log, reflect on the session, look at the rubric, look at the agent instructions, update those, and kick off a new session. This is what I call the outer loop. These two things work together really nicely. You have an inner loop that's using outcomes with Managed Agents: give it a rubric, cook against that rubric, produce an output. And then this outer loop is looking at that output as a user and saying, okay, I don't like this, and allowing, for example, a code agent like Claude Code to modify the rubric or modify instructions and kick off a new session. So these two work really nicely together.
And these are actually my results, and you'll see the demo very shortly. I start with a pretty inefficient baseline. And these are all discovered autonomously, with the managed agent just using the rubric to optimize timing. It figured out how to optimize, or basically parallelize, tool calls. It figured out to use fast mode and perform prompt optimization, and for inputs that produce multiple charts, it uses multi-agent, which saves around seven seconds, going from around 37 seconds down to 10 seconds for rendering. And it was all figured out autonomously with a managed agent using outcomes.
So now we can see the results. Here's boss agent in action. We're able to analyze top-line metrics.
Yep. And I'm glad my music got included here.
Did Claude make that music?
Yes.
Nice.
Yeah, so this one's actually cool.
This is actually using multi-agent to produce three visualizations simultaneously on my hypothetical AGI-pilled CEO dashboard.
Cool. So I want to wrap up these demos by saying that the process of building Claude Managed Agents was made so much more meaningful because we were working with users like you. Throughout the whole course of building the platform, we got to partner with super innovative, agentic partners who were trying to use Claude to extend the capabilities of their platforms. So Asana and Noosh are highlighted here, but we heard feedback from all of you guys throughout the course of our public release. And it's been so gratifying to see what you guys are building on top of our platform. We're really excited to help you guys ship faster and faster. We're excited to keep pushing this platform forward. And please always reach out to us if you have feedback. These QR codes are where you can get started. So one links to our developer docs. One links to a rich, interactive quick start where you're able to walk our primitives and build an agent in minutes. And then yeah, just another thank you to our great developer community.
Yeah, thank you.
All right.
