Claude Agent SDK [Full Workshop] — Thariq Shihipar, Anthropic

Okay, thanks for joining me. I'm still on West Coast time, so it feels like I'm doing this at 7 a.m., but I'm glad to talk to you about the Claude Agent SDK.

Here's a rough agenda. We're going to talk about: What is the Claude Agent SDK? Why use it, when there are so many other agent frameworks? What is an agent? What is an agent framework? How do you design an agent, using the Agent SDK or just in general? And then I'm going to do some live coding, or Claude is going to do some live coding, on prototyping an agent. I've got some starter code.

The whole goal is that, you know, we've got two hours. We're going to be super collaborative, so ask questions. This is also not going to be a super canned demo, in the sense that we're going to be thinking through things live. I'm not going to have all the answers right away, and I think that'll be a good way of seeing how you build an agent loop, which I think is very much an art, or an intuition.

Before we start, I'm just curious, a show of hands: how many people have heard of the Claude Agent SDK? Okay, great. How many of you have used it or tried it out? Okay, awesome. Pretty good show of hands.

So I'll get started with an overview of agents. I think this is something people have seen before, but it's still taking some time to really sink in how AI features are evolving. When GPT-3 came out, it was really about single-LLM features: hey, can you categorize this, return a response in one of these categories. Then we got more workflow-like things: hey, can you take this email and label it, or, here's my codebase, index it via RAG.
Can you give me the next completion, or the next file to edit? That's what we call a workflow, where things are very structured: given this code, give me code back out.

And now we're getting to agents. The canonical agent is Claude Code. Claude Code is a tool where we don't really restrict what it can do. You're just talking to it in text, and it will take a really wide variety of actions. Agents build their own context, decide their own trajectories, and work very autonomously. And I think as time goes on, agents will get more and more autonomous. We're at a great point where we can start to build these agents. They're not perfect, but it's definitely the right time to get started.

Claude Code, which I'm sure many of you have tried or used, is I think the first true agent: the first time I saw an AI working for 10, 20, 30 minutes. It's a coding agent, and the Claude Agent SDK is actually built on top of Claude Code. The reason we did that is that when we were building agents at Anthropic, we kept rebuilding the same parts over and over again.

To give you a sense of what that looks like: of course there are the models to start. Then in the harness you've got tools. That's the first obvious step: let's add some tools to this harness. Later on we'll go through an example of trying to build your own harness from scratch, what that looks like and how challenging it can be. But tools are not just your own custom tools.
They might be tools to interact with the file system, like with Claude Code. (Did the volume just go up, or was I not holding it close enough? Okay, let's see. Anyways.)

So you've got tools, tools you run in a loop, and then you have the prompts: the core agent prompts, the prompts for things like that. And then finally you have the file system. The file system is a way of doing context engineering that we'll talk more about later. One of the key insights we had through Claude Code was thinking a lot more about context as not just a prompt: it's also the tools, the files, and the scripts it can use.

Then there are skills, which we rolled out recently, and we can talk more about skills if that's interesting to you as well. And then things like sub-agents, web search, research, compacting, hooks, memory. There are all these other things around the harness as well, and it ends up being quite a lot. The Claude Agent SDK is all of these things packaged up for you to use. And then you have your application.

To give you a sense of why the Claude Agent SDK: people are already building agents on the SDK. A lot of software agents: software reliability, security, triaging, bug finding. Site and dashboard builders too; these are extremely popular, and if you're building one, you should absolutely use the SDK. Office agents, if you're doing any sort of office work; tons of examples there. And there are some legal, finance, and healthcare ones. So there are tons of people building on top of it.

Okay, so why the Claude Agent SDK? Why did we do it this way?
Why did we build on top of Claude Code? We realized that as soon as we put Claude Code out, the engineers started using it, but then the finance people started using it, and the data science people, and the marketing people. We realized that people were using Claude Code for non-coding tasks, and as we were building non-coding agents ourselves, we kept coming back to it. We'll go more into why that works, why you can use Claude Code for non-coding tasks. Spoiler alert: it's the bash tool. But it was something we saw as an emergent pattern that we wanted to use, and we built our agents on top of it.

There are also lessons we've learned from deploying Claude Code that we've baked in: handling tool-use errors, compacting, things like that. That kind of thing can take a lot of scale to figure out, you know, what are the best practices? We've baked those into the Claude Agent SDK. As a result, we have a lot of strong opinions on the best way to build agents. I think the Claude Agent SDK is quite opinionated. We'll go over some of these opinions and why we chose them. But one of the big opinions is that the bash tool is the most powerful agent tool.

So, what would I describe as the Anthropic way to build agents? I'm not saying you can only build agents on the API this way, but if you're using our opinionated stack in the Agent SDK, what is it? Roughly: Unix primitives, like bash and the file system. We're going to prototype an agent using Claude Code, and my goal is really to show you what that looks like in real time. Why is bash useful? Why is the file system useful? Why not just use tools?
Agents: I mean, you can also make workflows, and I'll talk about that a bit later, but agents build their own context. Then there's thinking about code generation for non-coding tasks: we use code gen to generate docs, query the web, do data analysis, take unstructured actions. This can be pretty counterintuitive to some people, and again, in the prototyping session we'll go over how to use code generation for non-coding agents. And every agent has a container or is hosted locally. Because this is Claude Code, it needs a file system, it needs bash, and it needs to be able to operate on them, so it's a very different architecture. I'm not planning to talk too much about the architecture today, but we can at the end if that's what people are interested in. Sorry, by architecture I mean hosting architecture: how do you host an agent, and what are the best practices there? Happy to talk about that at the end.

Well, let me pause there, because I feel like I've covered a lot already. Any questions so far on the Agent SDK, agents, what you get from it?

Yeah, can you explain what code generation for non-coding means, practically?

Yeah. Basically, when you ask Claude Code to do a task, let's say you ask it to find the weather in San Francisco and tell you what you should wear. What it might do is start writing a script to fetch a weather API. Maybe it wants the script to be reusable, because maybe you want to do this pretty often. So it might call the weather API, maybe get your location dynamically based on your IP address, then check the weather, and then maybe call out to a sub-agent to give you recommendations. Maybe there's an API for your closet or wardrobe, right?
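As a rough illustration of the weather example above, here is a minimal sketch of the kind of small, reusable script an agent might generate. Everything in it is an assumption made for illustration: the `Weather` type, the temperature thresholds, and the `fake_fetch` stand-in (a real script would call whatever weather API the agent chose).

```python
"""Sketch of a throwaway script an agent might generate for
"what should I wear in San Francisco today?". All names are hypothetical."""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Weather:
    temp_c: float
    raining: bool


def recommend_outfit(weather: Weather) -> List[str]:
    """Turn a weather reading into a simple clothing recommendation."""
    items = []
    if weather.temp_c < 10:
        items.append("warm coat")
    elif weather.temp_c < 18:
        items.append("light jacket")
    else:
        items.append("t-shirt")
    if weather.raining:
        items.append("umbrella")
    return items


def what_should_i_wear(city: str, fetch_weather: Callable[[str], Weather]) -> List[str]:
    """Compose the steps: fetch the weather, then derive a recommendation.

    `fetch_weather` is injected so the script can be reused with any
    weather source and exercised offline.
    """
    return recommend_outfit(fetch_weather(city))


def fake_fetch(city: str) -> Weather:
    # Stand-in for a real HTTP call; the values are made up for illustration.
    return Weather(temp_c=14.0, raining=True)


if __name__ == "__main__":
    print(what_should_i_wear("San Francisco", fake_fetch))
```

The point of the sketch is the shape, not the logic: the agent writes a script once, and can then rerun or extend it (for example, swapping in a location lookup) instead of reasoning through the task from scratch each time.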
So that's an example. For any single example, we can talk over how you might use code generation.

Oh, sure. So the question was about workflows versus agents, and would you still use the Claude Agent SDK for workflows? Is that right? Yes. I'll just tell you what we do internally: we've done a lot of GitHub automations and Slack automations built on the Claude Agent SDK. For example, we have a bot that triages issues when they come in. That's a pretty workflow-like thing, but we've still found that in order to triage issues, we want it to be able to clone the codebase, and sometimes spin up a Docker container and test it, and things like that. So there are a lot of steps in the middle that need to be quite free-flowing, and then you give structured output at the end.

All right, we'll take one more question and then I'll keep going. Yeah, in the blue.

Could you talk about security and guardrails? If we're using the Claude Agent SDK and leaning towards using bash as the all-powerful generic tool, is the onus on the agent builder to make sure you're guarding against common attack vectors, or is that something the model is doing?

Yeah, so I think this is the Swiss cheese... oh, yes, I guess the question was about permissions on the bash tool: how do you think about permissions and guardrails when you're giving the agent this much power over its environment and the computer?
How do you make sure it's aligned? The way we think about this is what we call the Swiss cheese defense. On every layer there are some defenses, and together we hope they block everything.

Obviously, on the model layer we do a lot of alignment work. We actually just put out a really good paper on reward hacking; I'd highly recommend checking that out. With Claude models, we try to make them very aligned. So there's the model's alignment behavior.

Then there's the harness itself. We have a lot of permissioning and prompting, and we run a parser over the bash tool's commands, for example, so we know fairly reliably what the bash tool is actually doing. That's definitely not something you want to build yourself.

And then finally, the last layer is sandboxing. Let's say someone has maliciously taken over your agent: what can it actually do? We've included a sandbox, where you can sandbox network requests and sandbox file system operations outside of the allowed file system. Ultimately, that's what they call the lethal trifecta: the ability to execute code in an environment, change the file system, and exfiltrate the code.
I think I'm getting the lethal trifecta a little bit wrong there. The idea is basically that an attacker still needs to be able to exfiltrate your information back out, so sandboxing the network is a good way of handling it. If you're hosting on a sandbox container provider like Cloudflare, Modal, E2B, or Daytona, they've also done some level of security there. You're not hosting it on your personal computer, or on a computer with your prod secrets. So there are lots of different layers, and we can talk more about hosting in depth.

Okay, so I'm going to talk a little bit about "bash is all you need." This is my shtick. I'm just going to keep talking about this until everyone agrees with me. It's something we found at Anthropic, and something I discovered once I got here: bash is what makes Claude Code so good.

You've probably seen code mode or programmatic tool use, the different ways of composing MCPs. Cloudflare put out some blog posts on that, and we put out some blog posts. The way I think about code mode, or bash, is that bash was the first code mode. The bash tool allows you to store the results of your tool calls to files, store memory, dynamically generate scripts and call them, compose functionality with things like tail and grep, and use existing software like ffmpeg or LibreOffice. So there are a lot of interesting and powerful things the bash tool can do.

Think again about what made Claude Code so good. If you were designing an agent harness, maybe you'd have a search tool, a lint tool, and an execute tool: N tools. Every time you thought of a new use case, you'd say, I need another tool now. Instead, Claude Code just uses grep, and it goes through your package manager: it runs npm run test, or index.ts, or whatever. It can find out how you lint and run npm run lint. If you don't have a linter, it can say, what if I install ESLint for you? So, like I said, this is the first programmatic tool calling, the first code mode. You can do a lot of different actions, very generically.

To talk about this in the context of non-coding agents: let's say we have an email agent, and the user asks, how much did I spend on ride sharing this week? Without bash, it's got one tool call, the ability to search your inbox, so it can run a query like "Uber or Lyft." It searches and gets back 100 emails or something, and now it's just got to think about it. A good analogy: imagine someone came to you with a stack of papers and said, hey, how much did I spend on ride sharing this week, can you read through my email? That would be hard. You'd need very good precision and recall to do it.

Or, with bash: let's say there's a Gmail search script that takes in a query. You can save the results of that query to a file, or pipe them. You can grep for prices, and then add them together. And you can check your work too: you can say, let me grep for all my prices, store them in a file with line numbers, and then check afterwards: was this actually a price? What does each one correspond to? So there's a lot more dynamic information you can use to check your work with the bash tool.
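The "grep for prices, add them up, keep line numbers so you can check your work" idea above can be sketched in a few lines. This is a minimal sketch, not the speaker's implementation: it assumes the search results have already been saved as plain text, and the sample text, the dollar-amount pattern, and the function names are all made up for illustration.

```python
import re
from typing import List, Tuple

# Matches dollar amounts such as $23.50 or $7 (an assumed receipt format).
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_prices(search_output: str) -> List[Tuple[int, float, str]]:
    """Return (line_number, amount, line_text) for every dollar amount found.

    Keeping the line number and the source line is what lets the agent go
    back afterwards and check whether each hit was really a fare.
    """
    hits = []
    for lineno, line in enumerate(search_output.splitlines(), start=1):
        for match in PRICE.finditer(line):
            hits.append((lineno, float(match.group(1)), line.strip()))
    return hits


def total(hits: List[Tuple[int, float, str]]) -> float:
    """Sum the extracted amounts, rounded to cents."""
    return round(sum(amount for _, amount, _ in hits), 2)


# Hypothetical saved output of an inbox search for ride-share emails.
SAMPLE = """\
Uber receipt: Total $23.50 for your Tuesday trip
Lyft: your ride with Sam is arriving
Lyft receipt: You paid $11.25
"""

if __name__ == "__main__":
    hits = extract_prices(SAMPLE)
    for lineno, amount, text in hits:
        print(f"{lineno}: ${amount:.2f}  <- {text}")
    print("total:", total(hits))
```

Compared with asking the model to read 100 emails and add numbers in its head, the intermediate list of hits is something the agent (or you) can inspect line by line.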
So this is a simple example, but hopefully it shows you the power of bash. Let me pause: any questions on "bash is all you need," the bash tool, anything I can make a little bit clearer?

Do you have stats on how many people use YOLO mode?

Stats on YOLO mode: we probably do. Internally we don't, but I think that's just because we have a higher security posture. I'm not sure; I can probably pull that. Any other questions on bash? Okay, cool.

Just to give you some more examples: let's say you had an email API and you wanted to answer, tell me who emailed me this week. You've got two APIs, an inbox API and a contacts API. This is a way you can do it via bash; you can also do it via code gen. And really, bash is code gen, right? Bash is ostensibly a code gen tool. And then let's say you've got a video meeting agent. You want to say, find all the moments where the speaker says "quarterly results" in this earnings call. You can use ffmpeg to slice up the video, and you can use jq to start analyzing the information afterward. So there are lots of powerful ways to use bash.

Okay, I'm going to talk a little bit about workflows and agents. You can do both: you can build workflows and agents on the Agent SDK. Agents are like Claude Code: if you're building something where you want to talk to it in natural language and have it take action flexibly, that's where you're building an agent. You have an agent that talks to your business data, and you want to get insights or dashboards, or answer questions, or write code: that's an agent. A workflow is more like, you know, we do a lot of GitHub Actions, for example, right?
You define the inputs and outputs very closely: okay, take in a PR and give me a code review. You can use the Agent SDK for both of these. When building workflows, you can use structured outputs, which we just released; you can Google "Agent SDK structured outputs." So you can do both. I'm going to primarily be talking about agents right now, but a lot of what you learn here is applicable to workflows as well.

Show of hands: how many people have designed an agent loop before? Okay, cool. Great.

I think the number one thing, the meta-learning for designing an agent loop, is just to read the transcripts over and over again. Every time you see the agent run, read it and figure out: what is it doing? Why is it doing this? Can I help it out somehow? We'll do some of that later, when we build an agent loop.

But here are the three parts of an agent loop. First is gathering context, second is taking action, and third is verifying the work. This is not the only way to build an agent, but I think it's a pretty good way to think about it. Gathering context: for Claude Code, that's grepping and finding the files needed. For an email agent, it's finding the relevant emails. Thinking about how the agent finds this context is very important, and I think a lot of people skip this step or underthink it. And then taking action: how does it do its work?
Does it have the right tools to do it? Code generation and bash are more flexible ways of taking action. And then verification is another really important step. What I'd say right now is: if you're thinking of building an agent, ask whether you can verify its work. If you can verify its work, it's a great candidate for an agent. With coding, you can verify by linting, and you can at least make sure it compiles, so that's great. If you're doing, say, deep research, it's actually a lot harder to verify the work. One way to do it is by citing sources, so that's a step towards verification. But research is obviously less verifiable than code in some ways, because code has a compile step, and you can also execute it and see what it does. As we build agents, the ones that are closest to being very general are the ones with a very strong verification step.

I think there was a question here. Yeah. Oh, sorry, the question was: when do you generate a plan, and where does that fit in? In Claude Code you don't always generate a plan, but if you want to, you do it between gathering context and taking action. Plans help the agent think through things step by step, but they add some latency, so there's some trade-off there. But yeah, the Agent SDK can help you do some planning as well.

Do you make the agent create a to-do list, so you're sure it creates the to-do list and then runs through it? So the question was, will the agent create a to-do list?
Yes. If you're using the Agent SDK, some to-do tools come with it, so it will maintain and check off to-dos, and you can display that as it goes. Any other questions about this right now? Okay, cool.

So I'm going to quickly talk about how you do this stuff: what are your tools for doing it? There are three things you can use: tools, bash, and code generation. Traditionally, I think a lot of people are only thinking about tools, and what I'd encourage for now is thinking about it more broadly.

Tools are extremely structured and very reliable. If you want output that's as fast as possible, with minimal errors and minimal retries, tools are great. The cons: they have high context usage. If anyone's built an agent with 50 or 100 tools, they take up a lot of context, and the model gets a little bit confused. There's no real discoverability of the tools, and they're not composable. I'm saying "tools" in the sense of how they work if you're using the Messages or Completions API right now. Of course there's code mode and programmatic tool calling, so you can blend some of these.

Then you've got bash. Bash is very composable: static scripts, low context usage. It can take a little more discovery time. Let's say you have the Playwright MCP, or sorry, the Playwright CLI, the Playwright bash tool. You can run playwright --help to figure out all the things you can do, but the agent needs to do that every time. It needs to discover what it can do, which is kind of powerful, in that it takes away some of the high context usage, but it adds some latency. There might be slightly lower call rates, just because it needs a little more time to find the tools and what it can do. But this will definitely improve as we go.

And then finally code gen: highly composable, dynamic scripts. These take the longest to execute, since they need linting and possibly compilation. API design becomes a very interesting step here, and I'll talk more about how to think about API design for an agent. But these are the three tools you have.

So, using tools: I think you still want some tools, but you want to think of them as atomic actions that your agent usually needs to execute in sequence, and where you need a lot of control. For example, in Claude Code we don't use bash to write a file; we have a Write file tool, because we want the user to be able to see the output and approve it, and we're not really composing file writes with other things. It's a very atomic action. Sending an email is another example. Any sort of destructive or irreversible change: a tool is definitely a good place for that.

Then you've got bash: for example, composable actions like searching a folder, using Git, linting code and checking for errors, or memory. You can write files as memory, and bash can be your memory system, for example.

And then finally you've got code generation, if you're trying to do highly dynamic, very flexible logic: composing APIs, doing data analysis or deep research, or reusing patterns. We'll talk more about code generation in a bit. Any questions so far about the SDK loop, or tools versus bash versus code gen?

Yeah. What about offloading tool results into the file system? With bash, the context explodes. Is it a type of command that dumps everything out?
Or otherwise just long outputs living in your history. Sure, yeah. So, all the time, just offloading them to files? Yeah, I think that's a good common practice. I remember seeing some PRs on Claude Code very recently about handling very long outputs. I don't know exactly, but I think we're moving towards a place where more and more things are just stored in the file system, and this is a good example: storing long outputs over time. Generally I think doing this is a good way to think about it. Something I always do is, whenever I have a tool call, I save the results of the tool call to the file system so that the agent can search them, and then I have the tool call return the path of the result. That helps it recheck its work.

Yes? So the question was about skills, and using skills to use bash better. For context, skills are basically a way of allowing our agent to take on longer, complex tasks and load things in via context. For example, we have a bunch of DOCX skills, and these tell it how to do code generation to generate these files. Overall, skills are basically just a collection of files. They're also an example of being very file-system- or bash-tool-pilled, because they're really just folders that your agent can cd into and read. What we've found skills are really good for is fairly repeatable instructions that need a lot of expertise in them. For example, we released a front-end design skill recently that I really like. It's really just a very detailed, good prompt on how to do front-end design, but it comes from our best AI front-end engineer, you know what I mean? He really put a lot of thought and iteration into it. That's one way of using skills.

So the question was about SKILL.md versus CLAUDE.md and how to think about that. I'd say all of these concepts are so new. Even Claude Code was released something like eight or nine months ago, and skills were released about two weeks ago. I won't pretend to know all of the best practices for everything. Generally, skills are a form of progressive context disclosure, and that's a pattern we've talked about a bunch, with bash and with preferring that over purely normal tool calls. It's a way for the agent to say, okay, I need to do this, let me find out how to do this, and then it reads in the SKILL.md. So you ask it to make a DOCX file, and it cds into the directory, reads how to do it, writes some scripts, and keeps going. I think there's still some intuition to build around what exactly you define as a skill and how you split it out. Lots of best practices to learn there still.

Yeah, so the question was: are skills ultimately part of the model, and is there a way to bridge the gap? I missed Barry's talk yesterday, which you mentioned, but I think roughly the idea is that the model will get better and better at doing a wide variety of tasks, and skills are the best way to give it out-of-distribution tasks, right?
But I would broadly say that it's really hard, especially if you're not at a lab, to tell exactly where the models are going. My general rule of thumb is that I try to rethink or rewrite my agent code every six months, just because I think things have probably changed enough that I've baked in some assumptions. The Agent SDK is built to advance with capabilities as much as possible. The bash tool will get better and better. We're building it on top of Claude Code, so as Claude Code evolves, you'll get those wins out of the gate.

But at the same time, things are so different now than they were a year ago in terms of AI engineering. A general best practice to me is: if we can write code 10 times faster, we should throw out code 10 times faster as well. Think not so much about hedging your bets on where the future is, but about what we can do today that really works. Let's get market share today and not be afraid to throw out code later. If you're a startup, this is arguably the largest advantage you have over competitors. Larger companies have six-month incubation cycles, so they're always stuck in the past in terms of agent capabilities. Your advantage is that you can say, the agent capabilities are here right now, let me build something that uses them right now.

Any other questions before we move on? We're talking about skills and bash. Okay, it seems like there are a lot of skill questions. Yeah, at the back; you might have to shout.

Why would you use a skill versus an API? They look very similar.

Yeah, so the question was why use a skill versus an API. Good question.
These are all forms of progressive disclosure, basically: ways for the agent to figure out what it needs to do. I'll go over examples where you just have an API in our prototyping session. It's totally use-case dependent. I don't think there's a general rule. I think it's: read the transcripts and see what your agent wants. If your agent always works with the API better as an api.ts file or an api.py file, do that. That's great. I think skills are sort of an introduction to thinking about the file system as a way of storing context, and they're a great abstraction, but there are many ways to use the file system. And I should say that for skills you need the bash tool, you need a virtual file system, things like that, so the Agent SDK is basically the only way to really use skills to their full extent right now.

Yeah, back there. The question was: can we expect a marketplace for skills? Claude Code has a plugin marketplace that you can also use with the Agent SDK. We're evolving that over time; it was very much a v0. And by marketplace, I'm not sure people will be charging for this exactly. It's more of a discovery system, I think. But that exists right now: you can do slash plugins in Claude Code and find some.

What's your current thinking about when you reach for the SDK to solve a problem?

The question is when do I use the SDK to solve a problem. Basically, whenever I'm building an agent. My overall belief is that for any agent, the bash tool gives you so much power and flexibility, and using the file system gives you so much power and flexibility, that you can always eke out performance gains with it.
In the prototyping part of this talk we're going to look at an example with only tools and an example with bash and the file system, and compare the two. That's what I mean by being bash-pilled. I just start from the Agent SDK, and I think a lot of people at Anthropic have started doing that as well. Of course, I do want to say there are lots of times where the Agent SDK is kind of annoying, because you've got this network sandbox container and you're like, I don't want to do this, I want to run it in my browser, locally. I totally get that. There is a real performance trade-off. The way I think about it is sort of like React versus jQuery. When I was coming up I was very into web dev. You had jQuery and Backbone, and then React came out, and it was by Facebook, and they're like, here's JSX, we just made this up, and now there's a bundler error, and I'm like, this is so annoying. But it generally made web apps more powerful. The Agent SDK is like the React of agent frameworks to me, because we build our own stuff on top of it. So you know it's real, and all the annoying parts are things we were annoyed about too and found that you just have to do.

Okay, we're still on questions, I guess. Yeah, right here.

[Audience question, partly inaudible, about custom internal bash tools and how the agent discovers them.]
Okay, the question is: if you have custom bash tools, how do you let the agent discover them? By custom bash tools, do you mean bash scripts? Yeah. So I think you just put it in the file system and you tell it, hey, here is a script, you can call it. I'm generally thinking in the context of the Claude Agent SDK, where the file system and the bash tool are tied together. There's an anti-pattern I see sometimes where people say, we're going to host the bash tool in this virtualized place, and it's not going to interact with other parts of the agent loop. That makes it hard, because if you've got a tool result that saves a file, your bash tool can't read it unless it's all in one container. Does that answer your question?

[Audience] So you just put it in the file system or something?

Yeah, just put it in the system prompt: hey, you have access to this. I would design all my CLI scripts to have a --help or something, so that the model can call that and progressively disclose every subcommand inside the script.

Yeah, over there.

[Audience] My question is when to reach for the Agent SDK. Would you recommend someone use the Agent SDK to build a generic chat agent, as opposed to an agent where you give it some input, it goes and does some stuff, and finally I care about the output? Are you using, or do you foresee using, the Agent SDK to build the Claude app rather than Claude Code?

Yeah, so the question is when we reach for the Agent SDK.
Would we use the Agent SDK to build claude.ai, which is a more traditional chatbot than Claude Code? One, I think Claude Code is in a way a traditional chatbot interface: you put text in, you get text out, and it takes actions along the way. You might have seen that when we rolled out doc creation for claude.ai, it now has the ability to spin up a file system and create spreadsheets and PowerPoint files and things like that by generating code. We're in the midst of merging our agent loops and things like that, but broadly, claude.ai is getting more and more file-system-pilled. You see it with skills and the memory tool. So we do think this is a broad thing that you can use generally, and I'm happy to talk through examples. One more question, then I want to keep going.

[Audience] I'm trying to understand the rule of thumb on when to build a tool, when to wrap something with a script, and when to just let the agent go wild in bash. I'll give you an example. Say I need to access a database from time to time. I can use an MCP, I can wrap it in a script, or I can just let the agent call an endpoint directly from bash, right?

Great question. So, still trying to grok when to use tools versus bash versus codegen, and he gave an example: I have a database, I want the agent to be able to access it in some way, what should I do? Should I create a tool that queries the database? Should I use bash? Should I use codegen? These are three ways of doing it. You could use any of them, and unfortunately there's no single best practice, right?
This is kind of a system design problem. Let's say you want to access your database via a tool. You would do that if your database was very structured and you have to be very careful, say you're accessing sensitive user information, and you're like, I can only take in this input, I need to give this output, and I have to mask everything else about the database from the agent. Obviously that limits what the agent can do; it can't write a very dynamic query. If you're writing a full-on SQL query, I would definitely use bash or codegen, because when the model writes a SQL query it can make mistakes, and the way it fixes its mistakes is by linting or running the file, looking at the output, seeing if there are errors, and iterating. So generally, if I'm building an agent today, I'm giving it as much access to my database as possible and then putting in guardrails. I'm probably limiting its write access in different ways. What I would do is give it write access, put in specific rules, and then give it feedback if it tries to do something it can't do. I know this is kind of a hard problem, but this is the set of problems for us to solve. We built a bash tool parser, and that's a super annoying problem, but we need to solve it in order to let the agent work generally. Same thing with a database: yes, it's quite hard to understand what a query is doing, but if you can solve that, you can let your agent work more generally over time. So think about it as flexibly as possible, and keep tools for very atomic actions that you need a lot of guarantees around.
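A minimal sketch of the "give it access, put in specific rules, give it feedback" idea above, using Python's built-in sqlite3 module. The policy (no deleting rows, no dropping tables), the `orders` table, and the function names `guard` and `run_agent_sql` are assumptions made up for illustration; this is not how the Agent SDK or Claude Code handles databases.

```python
import sqlite3

# Hypothetical policy: the agent may read, insert, and update,
# but may not delete rows or drop tables.
BLOCKED_ACTIONS = {sqlite3.SQLITE_DELETE, sqlite3.SQLITE_DROP_TABLE}


def guard(action, arg1, arg2, db_name, source):
    """SQLite authorizer callback: deny blocked actions, allow everything else."""
    return sqlite3.SQLITE_DENY if action in BLOCKED_ACTIONS else sqlite3.SQLITE_OK


def run_agent_sql(conn, sql):
    """Run agent-written SQL and return rows, or feedback the agent can act on."""
    try:
        rows = conn.execute(sql).fetchall()
        conn.commit()
        return {"ok": True, "rows": rows}
    except sqlite3.Error as exc:
        conn.rollback()
        return {"ok": False, "feedback": f"Query rejected or failed: {exc}. Adjust the query and retry."}


# Made-up demo database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders (total) VALUES (19.99)")
conn.commit()
conn.set_authorizer(guard)  # rules apply to everything the agent runs from here on

print(run_agent_sql(conn, "SELECT total FROM orders"))
print(run_agent_sql(conn, "DELETE FROM orders"))
```

The point of returning the error as text rather than raising is that the message goes back into the agent loop, so the model can see why the statement was refused and try something else.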
[Audience question, partly inaudible, about role-based access.]

The question is how you ensure that role-based access controls are taken care of. Usually that's in how you provision your API key or your backend service or something like that. Probably what I'd do is create temporary API keys. Sometimes people create proxies in between to insert the API keys, if you're concerned about exfiltration. But yeah, I would create API keys for your agents that are scoped in certain ways, so that on the back end you can check what it's trying to do, and if it's an agent, you can give it different feedback.

All right, one more question.

[Audience] Anything you could tell us about the memory tool, the general memory tool?

I'm not trying to keep a secret. I don't know exactly how it's built, but I think it generally works on the file system [inaudible]. We've had this question a bunch. I would just use the file system in the Claude Agent SDK: create a memories folder or something and tell it to write memories there. I don't know the exact implementation of the memory tool, but it does use the file system in that way.

All right, last question on this.

[Audience, partly inaudible] How are you managing reusability? Suppose the same agent is running for hundreds of users and generating and executing the same thing every time. How can we reuse that?

Yeah, that's a really good question.
So, let's say you have two agents interacting with two different people. The question is how you think about reusability between agents, or how agents communicate. This is a thing to be discovered, I think, and there are a lot of best practices and system design still to be worked out. Traditionally with web apps you're serving one app to a million people. With agents, with Claude Code, we serve a one-to-one container. When you use Claude Code on the web, it's your container, and there's not a lot of communication between containers. It's a very different paradigm. I'm not going to say I know exactly the best system design for that. There are best practices to find around: okay, these agents are redoing work, how can we give them general scripts that combine the work they've done? How can we make them share it?

This is a bit of a tangent on agent communication frameworks, and more of a personal opinion, but I think we probably don't need to reinvent a new communication system. The agents are good at using the things we already have, like HTTP requests and bash tools and API keys and named pipes, all of these things. So probably the agents are just making HTTP requests back and forth to each other using an HTTP server. There's a bunch of interesting work there. I've seen people make a virtual forum for their agents to communicate, where they post topics and reply and stuff like that. Kind of cool. I think there's a lot to explore and discover there.

Okay, I'm going to keep going a little bit. How are we doing for time? Okay, about an hour left.
Okay, cool. So, an example of designing an agent. This is not the prototyping session, but I think it'll be a good way into it. Let's say we're making a spreadsheet agent. What is the best way to search a spreadsheet? What's the best way to execute code? What's the best way to take action in a spreadsheet? What's the best way to lint a spreadsheet? These are all really interesting things to work out. I'm going to do this in Figma so we can go over it. If someone could grab me a water as well, that would be great. Okay, thanks.

So let's talk through it. Or, spend a couple of minutes yourself thinking about this question. You have a spreadsheet agent. You want it to be able to search, gather context, take action, and verify its work. How would you think about it? Spend some time thinking that through, take some notes or something.

Okay, has everyone had a little time to think about this? Do you want more time? Okay. What's the best way for an agent to search a spreadsheet? I have to type with one hand now. I should figure this out, because I'm going to be typing later. Okay, searching a spreadsheet. Any ideas? How do you search a spreadsheet? What would you do?

[Audience] CSV.

Okay, you've got a CSV. Now your agent wants to search the CSV. What does it do?

[Audience] Grep.

Grep, okay. What does the grep look like?

[Audience] It looks at all the headers.

Looks at the headers. Okay, the headers of all the sheets, great. And let's say I'm looking for the revenue in 2024 or something. Now I've got my headers. I'm just going to pull up a spreadsheet, right?
Let's say there's a revenue column, and then there's, let's see. Okay, so let's say it's something like this. How do I get revenue in 2026? This is a tabular problem: there's revenue here and there's also 2026 here, so it's multi-dimensional. We could look at the headers, but if you just pull this column you get every value in it. So we need a little bit more. Any other ideas?

[Audience] There's a bash tool, so awk would work.

Awk, okay. And what would it awk for?

[Audience] It depends on what you're looking for.

Yeah, that's the question, right? What is the user looking for? They're probably looking for something like this, revenue in 2026.

[Audience] Maybe use the APIs, use the Google tools. [partly inaudible]

Yeah, so the idea is to use the Google APIs to look it up. That's great, but let's say we're working locally. We need to design these APIs ourselves.

[Audience] SQLite can query a CSV directly.

Oh, interesting. I didn't know that. That's great. So you can use SQLite to query a CSV. That's a great, creative way of thinking about API interfaces. If you can translate something into an interface that the agent knows very well, that's great. If you have a data source and you can convert it into something you can run a SQL query against, then your agent really knows how to search with SQL. So thinking about these transformations is really interesting. It's a great way of designing an agentic search interface.
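To make the CSV-to-SQL translation concrete, here is a small sketch that copies a CSV into an in-memory SQLite table so the "revenue in 2026" question becomes an ordinary SQL query. The sample data, the `sheet` table name, and the `load_csv_into_sqlite` helper are made up for illustration, and loading through Python's csv module is just one assumed way of getting a CSV into SQLite.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for an exported sheet.
SAMPLE_CSV = """year,region,revenue
2025,EMEA,1200
2026,EMEA,1500
2026,APAC,900
"""


def load_csv_into_sqlite(csv_text, table="sheet"):
    """Copy a CSV with a header row into an in-memory SQLite table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return conn


conn = load_csv_into_sqlite(SAMPLE_CSV)

# The two-dimensional lookup (a revenue column crossed with a year) is one WHERE clause.
# Values were loaded as text, so the year is compared as a string and revenue is cast.
total = conn.execute(
    "SELECT SUM(CAST(revenue AS INTEGER)) FROM sheet WHERE year = '2026'"
).fetchone()[0]
print(total)
```

Once the data is behind SQL, the agent can iterate on the query text itself, which is the in-distribution skill the speaker is pointing at.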
[Audience] Is Claude smart enough to pick the right tool for the job? Because that's kind of what we're talking about here, the right tool.

Is Claude smart enough to pick the right tool for the job? Yeah, if you prompt it. Or, I think this is one of those things where it's: I don't know, let's find out. Let's read the transcript. If it's not, how can you help it? All of these things are an intuition. It's kind of like riding a horse. Not that I've ever ridden a horse, but you're giving these signals to the horse, trying to figure out how to push it faster. It's a very organic thing. We like to say that models are grown, not designed, so we're sort of discovering their capabilities.

[Audience question, inaudible.]

Yeah, so that's another great pattern: can you add metadata to the spreadsheet? These are questions you might want to think about when you're thinking about search: what pre-processing can you do to make the search better?
One example is that you translate it into a SQL format or something else you can query. That's a translation step. Another is that maybe you have a tool or a pre-processing step where another agent annotates the spreadsheet and adds information, so that the agent can then search across that information better. Yeah, one more.

[Audience] All those tools sound great, but can't the agent just do what was suggested, read the header and then just get the data? I feel like that should be good enough.

Probably. I should have prepared this in code. But I've built spreadsheet agents before, and basically it doesn't work. It's kind of hard to do.

[Brief pause to set up a lapel microphone.]

So one way to do it: in spreadsheets you can write formulas, like B2. That's a syntax the agent is pretty familiar with, for example B3 to B5. So you can design an agentic search interface like this, B3:B5. Your agentic search interface can take in a range string, and these are things the agent knows pretty well. You can do SQL queries; agents know SQL queries pretty well. Is this too small? Okay. You can also do XML. I'm not sure if you know, but XLSX files are XML in the back end, and XML is very structured. You can do an XML search query, and there are different libraries that can do that. So that's one example.
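A rough sketch of the range-string interface described above: a tool that accepts spreadsheet-style references such as B3:B5 and returns just those cells. The function names (`column_index`, `parse_cell`, `read_range`) and the list-of-lists grid are assumptions for illustration, not part of any SDK.

```python
import re

CELL_RE = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def column_index(letters):
    """Convert column letters to a zero-based index: A -> 0, Z -> 25, AA -> 26."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_cell(ref):
    """Turn a reference like 'B3' into zero-based (row, column)."""
    match = CELL_RE.match(ref.strip().upper())
    if not match:
        raise ValueError(f"Bad cell reference: {ref!r}")
    return int(match.group(2)) - 1, column_index(match.group(1))


def read_range(grid, range_string):
    """Return the cells covered by a range such as 'B3:B5', or a single cell like 'B2'."""
    start, _, end = range_string.partition(":")
    r1, c1 = parse_cell(start)
    r2, c2 = parse_cell(end or start)
    top, bottom = sorted((r1, r2))
    left, right = sorted((c1, c2))
    return [row[left:right + 1] for row in grid[top:bottom + 1]]


# Made-up sheet: row 1 is the header, so B3:B5 covers the 2025 to 2027 revenue cells.
grid = [
    ["year", "revenue"],
    [2024, 100],
    [2025, 120],
    [2026, 150],
    [2027, 170],
]
print(read_range(grid, "B3:B5"))
```

A bad reference raises a ValueError with the offending string, which a tool wrapper could pass back to the model as feedback.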
That's searching and gathering context, and I hope it illustrates that gathering context is really, really creative. There are so many iterations, and if you only tried one, it's probably not enough. Think about as many different ways as you can and try them out. Try SQL, try the range search, try grep and awk, all these things. Have a few tests that you run across the different approaches and see what the agent likes and what it doesn't. It's going to be different for each case. Yeah?

[Audience question, partly inaudible, about where the knowledge comes from.]

So the question is, where does the knowledge come from? Is it in the model? What do I mean by the agent? Generally, you have a problem and you want to make it as in-distribution as possible for the agent. The agent knows a lot about a lot of different things. It knows a lot about finance, for example. If you ask it to make a DCF model, it knows what a DCF is, and if you want to give it more information you can make a skill. So it knows what a DCF is, it knows what SQL is; can it combine those things together? Your problem is going to be out of distribution in some way, right?
There's some information that's not on the internet, or something that's somewhat unique to you, and you want to massage it to be as in-distribution as possible. It's very creative. It's not a science; it's very much an art.

Okay, so we've covered gathering context. Then taking action. We can probably do a lot of the same things here that we did before. We can insert a 2D array. If you've got a SQL interface, we can do a SQL query. We can edit XML. Taking action and gathering context are often very similar; you probably want a similar API back and forth. And then the last thing is verifying work. How do you think about that? Checking for null pointers is one way to do it. Any other ideas on verification?

[Audience] Sorry, I'm a bit confused. When you're using the SDK to build the agent, I don't need to tell it how it should gather the context? I just give it the context and explain in plain English what it's meant to do? And what I tend to do, and tell me if I'm wrong, is create a separate agent for QA, to verify, because I don't trust the agent to verify itself. I think I'm confused about the level of detail I need to provide the agent in that example.

Okay, so the question is about giving context to the agent versus having it gather its own context. You mentioned that you sometimes use a QA agent. Can I ask what domain you're building your agent in?

[Audience] Cyber security.

Okay, sure.
I think I'd need to look into more specifics, but the Claude Agent SDK is great for cyber security, and I would generally push people to let the agent gather context as much as possible. Let it find its own work as much as possible. You're trying to give it the tools to find its own work. The way I think about this: let's say someone locked you in a room and was giving you tasks, and that was your job. Like a MrBeast scenario, where you get paid if you stay in this room for six months. Then someone hands you a message. What tools would you want to be able to do the job? Would you just want a stack of papers, or would you want a calculator, or a computer? I would probably want a computer. I'd want Google. I'd want all of these things. I wouldn't want the person to send me a stack of papers and say, okay, this is probably all the information you need. I'd rather say, just give me a computer, give me the problem, and let me search and figure it out. That's how I think about agents as well. They're stuck in a room, and I need to give them tools.

[Audience] If you go back to the slide with the graph: so basically, gathering context is the tools that I'm offering?

Yeah, exactly. Maybe I'm giving it an API for code generation, maybe I'm giving it a SQL tool, maybe I'm giving it bash. These are all examples.

One more question.

[Audience question, inaudible.]

Interesting. So, do agents share context? I think this is an interesting question overall about how you manage context.
I haven't talked about this too much, but sub-agents are a very important way of managing context. We're using more and more sub-agents inside Claude Code, and I would think about using sub-agents very generally. What we might do for this spreadsheet agent is have a search sub-agent. Sub-agents are great when you need to do a lot of work and return an answer to the main agent. For search, say the question is, how do I find my revenue in 2026? Maybe you need to run a bunch of searches. Maybe you need to search the internet, maybe you need to search the spreadsheet, things like that. A bunch of that doesn't need to go into the context of the main agent. The main agent just needs to see the final result. That's a great sub-agent task. I don't have a dedicated sub-agent slide here, but they're very useful and a great way to think about things.

[Audience] Just to build on that question: for verification, for example, you can imagine doing that through a skill or a sub-agent. You might even want it to be adversarial, so it can really go to town on the work and not have a sympathetic relationship with what's already been done. I know it's a spectrum, but are you saying you'd use the sub-agent here, or the skill? How would you think about this?

Yeah, definitely. [Brief interruption.] Okay, thank you, appreciate it.

Okay, can you use sub-agents for verification? Yes, I think this is a pattern. Ideally, the best form of verification is rule-based, right?
You ask: is there a null pointer or something? That's easy verification. Does it lint, does it compile? Insert as many rules as you can, and again, be creative. For example, in Claude Code, if the agent tries to write to a file that we know it hasn't read yet, because we haven't seen it enter the read cache, we throw an error. We tell it, hey, you haven't read this file yet, try reading it first. That's an example of a deterministic check that we insert into verification. So any time you're thinking about verification, the first step is: what can you do deterministically? What outputs can you check? And when you're choosing which types of agents to make, the agents that have more deterministic rules are better. It just makes a lot of sense.

Of course, as the models get better at reasoning, you can have sub-agents check the work of the main agent. The main thing there is to avoid context pollution. You probably wouldn't want to fork the context. You'd want to start a new context session and say, hey, adversarially check this work, this output was made by a junior analyst at McKinsey or something, they didn't go to a great school, feed in a bunch of stuff like that, and then tell it to critique. That's one of the uses of a sub-agent. As the models get better, that sort of verification will get better as well, but doing it deterministically is a great start. Yeah?

[Audience] Just a question about verifying work. Say we found null pointers.
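The read-before-write rule mentioned above is easy to picture as code. The sketch below is a toy version under assumed names (`FileTools`, `read_cache`); it only illustrates the idea of a deterministic check that returns feedback to the agent and is not Claude Code's actual implementation.

```python
from pathlib import Path


class FileTools:
    """Toy file tools that refuse to overwrite a file the agent has not read."""

    def __init__(self):
        self.read_cache = set()  # resolved paths the agent has already read

    def read(self, path):
        text = Path(path).read_text()
        self.read_cache.add(str(Path(path).resolve()))
        return text

    def write(self, path, content):
        target = Path(path)
        if target.exists() and str(target.resolve()) not in self.read_cache:
            # Deterministic rule: return an actionable error instead of writing.
            return f"Error: {path} has not been read yet. Read it first, then retry the write."
        target.write_text(content)
        self.read_cache.add(str(target.resolve()))
        return f"Wrote {len(content)} characters to {path}."
```

New files can be written freely in this sketch; only existing, unread files are protected. The error text is phrased as an instruction because it is meant to be read by the model on its next turn.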
[Audience, continuing, partly inaudible] That's probably easy to just go in and fix. But say this is in production with clients, and they somehow get into a spot where the whole spreadsheet is deleted. At what level do we need to bake in an ability to undo? Say the QA agent determines that the spreadsheet is empty but isn't necessarily able to fix it. What's your advice there?

So the question is how you think about state, and undoing and redoing, to be able to fix errors. I think this is a really good question. When you think about which problem domains agents are good at, how reversible the work is makes for a really good intuition. Code is quite reversible. You can just go back and undo with the git history. It comes with these atomic operations right out of the gate. I use git constantly through Claude Code; I don't type git commands anymore. So that's a really good example. A really bad example is computer use, because computer use is not reversible in state. Let's say you go to doordash.com, and the user wants you to order a Coke and you add a Pepsi. Now you can't just go back and click on the Coke. You have to go to the cart and remove the Pepsi. Your mistake has compounded, and the state machine has gotten more complex. Whenever we're dealing with very complex state machines that you can't undo or redo, it does become harder. One of the questions for you as an engineer is: can you turn this into a reversible state machine, kind of like you said?
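One simple way to make spreadsheet edits reversible is to snapshot the state before every mutation. The sketch below assumes a small in-memory grid and a made-up `CheckpointedSheet` class; a real system would more likely store diffs or versioned files than full copies.

```python
import copy


class CheckpointedSheet:
    """A grid with snapshot-based undo: every edit saves a checkpoint first."""

    def __init__(self, grid):
        self.grid = grid
        self.checkpoints = []

    def checkpoint(self):
        self.checkpoints.append(copy.deepcopy(self.grid))

    def set_cell(self, row, col, value):
        self.checkpoint()
        self.grid[row][col] = value

    def clear(self):
        self.checkpoint()
        self.grid = []

    def undo(self):
        """Restore the most recent checkpoint; return False if there is none."""
        if not self.checkpoints:
            return False
        self.grid = self.checkpoints.pop()
        return True


sheet = CheckpointedSheet([["revenue", 100]])
sheet.set_cell(0, 1, 150)
sheet.clear()   # the agent wipes the sheet by mistake
sheet.undo()    # back to the state just before the wipe
print(sheet.grid)
```

Exposing `undo` to the user, the agent, or both is a design choice; the important part is that every action has a cheap, well-defined inverse.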
Can you store state between checkpoints, so that the user can say, oh, my spreadsheet is messed up right now, just go back to the previous checkpoint? Potentially even the model can go back to previous checkpoints. I think someone had this time travel tool that they were giving one of the coding agents, which was kind of cool, where it can time travel back to a point before this happened. It's kind of fun. Some of these tools don't work that well yet, but we'll get there. Thinking about state and verification is very useful. Yeah, question at the back.

[Audience] I'm curious about scale. What if this spreadsheet is millions of rows and a hundred thousand columns? In that type of situation, how would you go about searching? There's obviously a context budget. [partly inaudible]

Yeah, this is great. I probably should have done the spreadsheet example as my coding example. As a preview, my coding agent is a Pokémon agent. Maybe a spreadsheet would have been better. Okay, the question was: what if this spreadsheet is very big? If you have a million rows and 100,000 columns, or 100 columns, or whatever, how do you think about it? Your database is also very big; how do you handle that? For all of these things, one, as the data becomes larger and larger, it's just a harder problem. It absolutely is. Your accuracy will go down. Claude Code is worse in larger codebases than it is in smaller codebases. As the models get better, they will get better at all of that. For all of these, I would think: how would I do this if I had a spreadsheet that was a million columns and a million rows? What would I do? I would need to start searching it, right?
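As a rough illustration of searching a very large sheet without loading it all, the sketch below streams CSV rows and returns only a capped list of matching cell references, which the agent could then jot down and follow up on. The function names and the `limit` parameter are assumptions for illustration.

```python
import csv
import io


def column_letters(index):
    """Zero-based column index to spreadsheet letters: 0 -> A, 25 -> Z, 26 -> AA."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def find_cells(lines, term, limit=20):
    """Stream rows and return up to `limit` (cell_ref, value) hits containing `term`."""
    hits = []
    needle = term.lower()
    for row_number, row in enumerate(csv.reader(lines), start=1):
        for col_index, value in enumerate(row):
            if needle in value.lower():
                hits.append((f"{column_letters(col_index)}{row_number}", value))
                if len(hits) >= limit:
                    return hits  # stop early so the result stays small
    return hits


# Made-up data; with a real file, pass the open file handle instead of a StringIO.
data = io.StringIO("year,Revenue\n2026,150\nnote,revenue forecast\n")
print(find_cells(data, "revenue"))
```

Because rows are read one at a time and the hit list is capped, only a handful of references ever reach the model's context, no matter how many rows the file has.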
If I'm searching for revenue, I'd hit Ctrl-F for revenue, then go check each of the results and ask, is this right, is there a number here? And I'd probably keep a scratch pad, like a new sheet, where I note that revenue equals this, store the reference, and keep going. I think that's a good way to think about it. The model should never read the entire spreadsheet into context, because it would take too much. You want to give it a starting amount of context, and that's also how you work. When you open up a spreadsheet, what you see is the first 10 rows and the first 30 columns or something. You don't load all of it into context right away. You have an intuition for, hey, I should load more of this into context, or, I should navigate to this other sheet that has more data. You gather context yourself, and the agent can operate in the same way. It can navigate to the sheets, read them, keep a scratch pad, keep some notes, and keep going. That's how I would think about it. Yeah, in the back.

[Audience] My question is about managing context pollution, and I guess it relates to the previous question. Do you have a rule of thumb for what fraction of the context window to use before you start seeing diminishing returns, or it just becomes less effective?

The question is about context management: do you have a rule of thumb for how much of the context window to use before it becomes less effective? This is actually a pretty interesting problem right now. A lot of times when I talk to people who are using Claude Code, they're like, I'm on my fifth compact.
I'm like, what? I've almost never done a compact before. I have to test the UX myself by forcing myself to get compacted, just because I tend to clear the context window very often when I'm using Claude Code myself. At least in code, the state is in the files of the codebase. So let's say I've made some changes: Claude Code can just look at my git diff and say, "Oh, these are the changes you made." It doesn't need to know my entire chat history in order to continue with a new task. So in Claude Code I clear the context very, very often, and I say, "Hey, look at my outstanding git changes. I'm working on this. Can you help me extend it in this way?" That's a way of thinking about it.
When you're building your own agent, let's say a spreadsheet agent, it gets a little more complex, because your users are less technical and they don't know what a context window is. That is, I'd say, a hard problem. I think there's some UX design there: can you reset the conversation state? Maybe every time the user asks a new question, can you do your own compact and summarize the context? In a spreadsheet, a lot of the state is in the spreadsheet itself, so the agent probably doesn't need to know the entire conversation. Can you store user preferences as it goes, so that you remember some of this stuff? Again, it's an art; there are so many different angles and ways in which you can do this. But yeah, you are trying to minimize context usage. You probably don't need a million-token context or something. You just need good context management and UX design. Yeah? [inaudible]
Audience: Could we use multiple sub-agents and chunk up the spreadsheet in the case where it's super large, so the agents can run through each portion in parallel?
Yeah. One of the things I love about Claude Code is that we have the best experience for using sub-agents, especially sub-agents with bash. It is very, very good. I didn't quite realize all the pain involved. If anyone's going to QCon, I believe Adam Wolff is giving a talk there about how we did the bash tool. Adam's a legend, and the bash tool does such a good job. When you're running parallel sub-agents at the same time, bash becomes very complex, and there are lots of race conditions and stuff like that, so there's a lot of work that we solved there. One of the things I love about Claude Code is that you can just say, "Hey, spin up three sub-agents to do this task," and it will do that. In the Agent SDK as well, you can just ask it to do that. So, number one, sub-agents are a great primitive in the Agent SDK, and I haven't seen anyone do it as well, so that's a big reason to use it. And yes, generally you want these sub-agents to preserve the context. If you have a spreadsheet, you could potentially have multiple read sub-agents going at the same time. Maybe the main agent says: can this agent read and summarize sheet one, can this agent read and summarize sheet two, can this agent summarize sheet three? Then they return their results, and the main agent maybe spins up more sub-agents again, right?
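
The fan-out described here, one read-and-summarize sub-agent per sheet with the results handed back to the main agent, can be sketched in plain TypeScript. This is only an illustration of the orchestration shape, not how the Agent SDK implements sub-agents: `SheetSummarizer` and `summarizeSheetsInParallel` are hypothetical names, and with the SDK you would normally just ask the agent to spin up sub-agents rather than write this yourself.

```typescript
// Hypothetical stand-in for "a sub-agent that reads and summarizes one sheet".
// In a real agent this would be a sub-agent run, not a local function.
type SheetSummarizer = (sheetName: string) => Promise<string>;

interface SheetSummary {
  sheet: string;
  summary: string;
}

// Fan out one summarizer per sheet, run them concurrently, and collect the
// short summaries so the main agent never has to load the full sheets.
async function summarizeSheetsInParallel(
  sheets: string[],
  summarize: SheetSummarizer,
): Promise<SheetSummary[]> {
  return Promise.all(
    sheets.map(async (sheet) => ({ sheet, summary: await summarize(sheet) })),
  );
}
```

The main agent only sees the returned summaries, which is the context-saving point of the pattern; it can then decide whether another round of sub-agents is needed.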
So this is another knob you have. What I want to say is: we've talked so much about all these different creative ways that you can do things. This is the level at which you should have to think about your problem. In my opinion, you should not really be thinking about "how do I spin off a process to make a sub-agent?", or the systems engineering in between, or "what is a compact?" We take care of all of this for you in the harness, so that you can think about: what sub-agents do I need to spin off? How do I create an agentic search interface? How do I verify its work? These are the really core and hard problems that you have to solve, and any time you spend solving lower-level problems instead of these, you're probably not delivering value to your users. So yeah, sub-agents: big fan of the Agent SDK's sub-agents. Yeah?
Audience: We have this third step, the action and the verification path. Where exactly do we need to put the verification in this example? Let's say after generating the SQL query, I can verify whether the right query was generated; that is one path. The second path is executing directly after generation, and once I get the output, then I do the verification. How can the agent decide dynamically which one is the right path?
Yeah, so the question is: where do you do verification? Is it only at the end, or do you do it in the middle, things like that? I would say everywhere you can; just constantly verify. Like I said, we do some verification in the read step of Claude Code, right?
So that's a great example. You can do it at the end; you should absolutely do it at the end. But at any other point, if you have rules or heuristics, use them. For example, one of my rules might be that the total number of columns you search should be under 10,000, or under a thousand, or something. That's a nice way of doing it. Similarly here, maybe the model is trying to read a huge row of values: give feedback to the model, like "Hey, chunk this up." You throw an error and give feedback, and the great thing about the model is that it listens to feedback. It will read the error output and then just keep going. So yeah, I know I have verification here as sort of a loop, but verification can happen anywhere and should happen anywhere. Put it in as many places as you can. All right, I do need to start doing some of the prototyping, but I'll take one more question. Right here.
Audience: How do we form the steps? How do we tell the agent that?
You just tell it. Yeah, the system prompt. With Claude Code we just give it the bash tool and say: gather context, read your files, do stuff, run your linting. You know what I mean? And again, with the agent you don't need to enforce this. You don't need to tell it "you need to do this," because sometimes it might not be necessary. Let's say someone is asking a read-only question about your spreadsheet. You don't need to verify that there are no compile errors, because you haven't done any write operations. So let the agent be intelligent, in the same way that you would like that same freedom when you're doing your work, rather than being trapped in a box. Okay, cool.
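
As a minimal sketch of the "throw an error and give feedback" idea above, here is a hypothetical spreadsheet read tool that rejects oversized reads and tells the model how to correct the call. The names `readRange`, `ReadResult`, and `MAX_CELLS_PER_READ`, and the limit of 1000 cells, are assumptions for illustration, not part of any SDK.

```typescript
// Result shape for a hypothetical spreadsheet read tool.
type ReadResult =
  | { ok: true; rows: string[][] }
  | { ok: false; error: string };

// Illustrative limit; pick one that fits your own context budget.
const MAX_CELLS_PER_READ = 1000;

// Reads rows [startRow, endRow) but refuses oversized requests and tells
// the model how to fix the call instead of failing silently.
function readRange(sheet: string[][], startRow: number, endRow: number): ReadResult {
  if (startRow < 0 || endRow > sheet.length || startRow >= endRow) {
    return {
      ok: false,
      error: `Invalid range ${startRow}-${endRow}; the sheet has ${sheet.length} rows.`,
    };
  }
  const rows = sheet.slice(startRow, endRow);
  const cellCount = rows.reduce((total, row) => total + row.length, 0);
  if (cellCount > MAX_CELLS_PER_READ) {
    return {
      ok: false,
      error:
        `Requested ${cellCount} cells, but the limit is ${MAX_CELLS_PER_READ}. ` +
        `Chunk this up: read fewer rows per call.`,
    };
  }
  return { ok: true, rows };
}
```

The error string is written for the model rather than for a human operator: it states what went wrong and what to try next, so the agent can recover on its following turn.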
I do want to try and see if I can do some prototyping now. Okay, we've done a bunch of Q&A. Okay, prototyping. Let's say that you want to build an agent. You come out of this talk and you're like, "Great, I have a bunch of ideas. How do I do this?" What I'd say overall is: building an agent should be simple, or the agent at the end should be simple, but simple is not the same as easy. It should be very simple to get started, and it is: just go to Claude Code, give Claude Code some scripts and libraries and a custom CLAUDE.md, and ask it to do it. That's what we're going to do. It should be so easy to say, "Hey, this is my API, this is an API key. Can you go search my customer support tickets and organize them by priority?" or something like that. Then look at what Claude Code does and iterate on it. This is a great way of skipping straight to the hard, domain-specific problems that you have. You have a lot of domain problems: how do you organize your data, how do you do agentic search, how do you put guardrails on your database? These are all questions that you can start solving right away with Claude Code. So try to build something that feels pretty good with Claude Code. Generally what I've seen is that you can do this and get really good results just out of the box using Claude Code locally, and you should have high conviction by the end of it. For more info, watch my AI Engineer talk. This is a deck that we use internally, and I'm going to go through it, so you're getting what we show customers, right?
So, okay: use Claude Code. Again, simple, but simple is not easy. The amount of code in your agent should not be super large. It doesn't need to be huge, it doesn't need to be extremely complex, but it does need to be elegant. It needs to be what the model wants. You want to have an interesting insight, like "let's turn this spreadsheet into a SQL query," and then go from there. Think about it that way, and Claude Code is a great way of doing that.
So, okay, let's make a Pokemon agent. This is what we're going to do. Pokemon is a game with a lot of information. There are thousands of Pokemon, each with a ton of moves. We want to be pretty general, and there is actually a PokeAPI. The reason I chose Pokemon is that I know you have your own APIs as well, and they're all very unique, so I wanted to choose something with a fairly complex API that I haven't tried before. With PokeAPI you can look up Pokemon like Ditto, you can look up items, and things like that. So it's got this custom API with everything in the games. One of the things your user might want to do is make a Pokemon team. I love Pokemon, but I know very little about making an interesting Pokemon team for competitive play. Could my agent help me with that? That'd be cool. So my goal is to make an agent that can chat about Pokemon, and then see what we can do and how far we get. I've done some of this work already, and I'll open it up and show you. I'm going to do mostly codegen for this. So let me...
Audience: Is this on GitHub somewhere?
It actually is, yeah, on my personal GitHub. I was going to commit all of this as well. Let's see... You guys are AI engineers. So yeah, you can clone it, and I'll push the latest changes. Okay. Can I ask you: should I put this in dark mode instead, or is this fine? Dark mode? Okay, is this better? No? Okay.
Okay, so here's an example. The prompt I gave it was: "Hey, go search PokeAPI for its API and create a TypeScript library." So this is all vibe coded. You can see here that it's created this interface for a Pokemon, and it's created this Pokemon API: I can get by name, I can list Pokemon, I can get all Pokemon, I can get species and abilities and stuff like that. This is just from a prompt I gave it, generating this TypeScript API. It also did it for moves. And it's created this API where you import PokeAPI from the PokeAPI SDK, and you can see how it's set this up. And then this is the CLAUDE.md: this is the TypeScript SDK for the PokeAPI, these are the modules in the PokeAPI, here are some of the key features. I'm asking it to write scripts in the examples directory, and then it will execute those scripts to help me with my queries. And I give it some example scripts.
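
The generated library itself is not reproduced in the talk. As a rough idea of what a wrapper like the one described above might contain, here is a small sketch against the public PokeAPI endpoint `https://pokeapi.co/api/v2/pokemon/{name}`. The class name `PokemonClient`, its methods, and the injectable `JsonFetcher` are assumptions made for illustration; the `Pokemon` interface covers only a small subset of the fields PokeAPI returns.

```typescript
// Small subset of the fields PokeAPI returns for /pokemon/{name}.
interface Pokemon {
  id: number;
  name: string;
  types: { slot: number; type: { name: string; url: string } }[];
}

// Injectable fetcher so the client can be exercised without network access.
type JsonFetcher = (url: string) => Promise<unknown>;

class PokemonClient {
  private readonly fetchJson: JsonFetcher;
  private readonly baseUrl: string;

  constructor(fetchJson: JsonFetcher, baseUrl: string = "https://pokeapi.co/api/v2") {
    this.fetchJson = fetchJson;
    this.baseUrl = baseUrl;
  }

  // Looks up one Pokemon by name, e.g. getByName("ditto").
  // The sketch trusts the response shape; real code should validate it.
  async getByName(name: string): Promise<Pokemon> {
    const url = `${this.baseUrl}/pokemon/${encodeURIComponent(name.toLowerCase())}`;
    return (await this.fetchJson(url)) as Pokemon;
  }

  // Convenience helper that returns only the type names, e.g. ["water"].
  async getTypeNames(name: string): Promise<string[]> {
    const pokemon = await this.getByName(name);
    return pokemon.types.map((entry) => entry.type.name);
  }
}
```

In a script run with Bun or a recent Node, the fetcher could be as simple as `(url) => fetch(url).then((r) => r.json())`. Keeping the surface this small is what lets the agent write short scripts against it instead of hand-assembling URLs.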
It doesn't always need all this information, but yeah: fetching Pokemon, listing regions, getting data, and stuff like that. So this is my agent, really: the prompt I gave it to generate a TypeScript library, and then this CLAUDE.md, and I can chat with it in Claude Code.
I'll also show you a version of it that is just tools. Here I'm using the Messages API, and I've given it a bunch of tools from the API: get Pokemon, get Pokemon species, get Pokemon ability, get Pokemon type, get move. So you define all these tools, and here too I just gave it a prompt and told it to make the tools. It doesn't want to make a hundred tools; there's a ton of PokeAPI data, but there are only so many parameters it can handle. So it's got this tool calling, and I made a little chat interface with it. Let me go here... this is my tool calling... Great. So here we've got this chat.ts. I use Bun when I'm prototyping stuff, just because I don't want to compile from TypeScript to JavaScript, and again, Bun has things like linting built into it. It's a way of simplifying for the agent, so the agent doesn't need to remember to compile. TypeScript is better for generation because it has types. So I'm going to start this with bun chat, and then I'm going to try, okay:
"What are the generation two water Pokemon?" You'll see that it's starting to search, and I'm logging all the tool calls here. This is very, very important, because it needs to do the tool calls. You can see that what it's doing is searching a bunch of Pokemon, and then it told me: okay, here are the water Pokemon for gen 2. It's got Totodile, Croconaw, Feraligatr. You can see how, in between each step, it's thinking through the previous steps. Now, let's say that I want to do this with Claude Code. I think I may need to delete this example. Oh, yeah?
Audience: Small question. How do you log the tool calls? Is there just an argument?
Oh, yeah. This is in the normal API, so every time the model responds I just call this log. This is in the normal Anthropic API. In the SDK, and I'll get to the SDK, you just log every system message. Does that make sense?
Audience: This chat interface you're showing, is that using the regular API?
Yeah, that's using the regular API, not the Agent SDK. So what I'm going to do here is delete the script; I don't want it to cheat. Okay, so here I'm just opening Claude Code. I've created a bunch of files here. I'm going to say, "Can you tell me all the generation two water Pokemon?" and then we'll see what it can do. I forget if I need to prompt it to write a script or something. I think it'll be fine. We'll see what happens.
Audience: Do you mind going to the core SDK file and showing it? You talked about gathering context, then action, then verification. Can you show that in the code, and how we're configuring the tool descriptions?
Yeah, so we haven't done the SDK part yet. So far I've just put some APIs in Claude Code.
Audience: That's right. I thought I missed that.
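
For the tool-call logging mentioned above with the regular Messages API, a minimal sketch might look like the following. The `tool_use` content block with `id`, `name`, and `input` is the shape the Messages API returns; the function names and the tool name `get_pokemon_type` used in the usage note are hypothetical, and only the two block types relevant here are modeled.

```typescript
// Minimal shapes for two of the content blocks in a Messages API response.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Pulls out every tool call in a response and formats one log line per call.
function formatToolCalls(content: ContentBlock[]): string[] {
  const lines: string[] = [];
  for (const block of content) {
    if (block.type === "tool_use") {
      lines.push(`[tool] ${block.name}(${JSON.stringify(block.input)})`);
    }
  }
  return lines;
}

// Call this on each model response inside the chat loop.
function logToolCalls(content: ContentBlock[]): void {
  for (const line of formatToolCalls(content)) {
    console.log(line);
  }
}
```

A response that calls a tool named `get_pokemon_type` with input `{ type: "water" }` would log `[tool] get_pokemon_type({"type":"water"})`, which is enough to follow the agent's trajectory while prototyping.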
Yeah. But okay, you can see here it's given me a lot more. It's saying there are 21, and I think this is roughly right. What did it do? I think it just... no, it's okay. That's funny. Anyway, Pokemon is slightly in distribution, which I guess is good. But what it will do is try to write a script, because then you don't want it to think as much. So here it's like, okay, what I'm going to do is... let's see, gen two. Okay, you can see here it knows the start of the generations, and it fetches these from the API. I guess it's decided not to use my prebuilt API here. And then it runs it. I think I need to improve the CLAUDE.md for this, but anyway, you can see that it's able to check 200-plus Pokemon, check their types, and get their information. So this is just a quick example of how to do codegen and how to use Claude Code to do it. It will run this script and keep going. Basically what I want to show... let's see, we have probably 15 minutes.
Audience: Is that Claude Plays Pokemon?
Yeah, actually this is one of the demos I was thinking of doing: Claude Code plays Pokemon. Let's say you want to do an agentic version of Claude Plays Pokemon. How would you do it? What you would do, I think, is give it access to the internal memory of the ROM. So let's say it wanted to find its party: it could search for that in memory. Pokemon Red is a very well in-distribution, reverse-engineered game, so it could search in memory and say: these are the Pokemon, this is how I figure out where the map is, this is how I navigate
it, right? So I'll leave it to the reader if you want to try it out. There is a Node.js GBA emulator. I think I have to legally say you have to go buy Pokemon Red to try it. But yeah, I think it's a good example. Anyway, here, that's all of them, and it's listed all their types. You can see how it's used code generation to do this. So that's a quick example of using Claude Code to prototype this.
Now, there can be more interesting data here. I do want to leave time for questions, so I'll just go through an example. Let's say you're making a competitive Pokemon team. Competitive Pokemon has a lot of different variables and data. This is a text file from an online library, basically, which stores all of the Pokemon and their moves, who they work well with and don't work well with, who they're countered by, and all of these things. So there's a ton of data here, and it's all in a text file, which is actually pretty good for Claude Code, because I can say something like: "Hey, check the data folder. I want to make a team around Venusaur. Can you give me some suggestions based on the Smogon data?" Smogon is this online site. I'm not entirely sure what it's going to do here yet; I haven't done this query before, but we'll see. What I want it to do is grep through this data and figure out for itself, from first principles, not having seen this data before, how to answer my query. While it does that, I'll take any questions. Yeah?
Audience: So this is really built on top of Claude Code. My question is, in order to deploy this customer-facing, are we supposed to have Claude Code running in, like, a swarm, or
somehow just via the Agent SDK?
Yeah, so let me show you very quickly what it looks like to use the Agent SDK here. I've already set up this file system, and again, I want you to think about the file system as a way of doing context engineering; this is a lot of the input into the agent. My actual agent file is about 60 lines, and it's mostly just boilerplate. I guess it's decided to stop it from writing scripts outside of the custom scripts directory. You can see it just runs this query, takes in a working directory, and runs it in a loop. I'd probably want to add some allowed tools here and stuff, but it's very simple.
If I were to productionize this, the first step is: okay, I've tested it in Claude Code, it seems to do pretty well, I write this file. Then there are two ways to do it. One is, I do think that local apps might be coming back with AI, because there's such an overhead to running it. For example, Claude Code is a front-end app; it works on your computer. So maybe the way I ship this Pokemon app is: I have an app that you install, it works locally on your computer, and it's writing scripts. I think that's one way of doing it. The other way is that you host it in a sandbox. There are a bunch of different sandbox providers that make it really easy. Cloudflare has a good example of using the Agent SDK, and it's just something like sandbox.start and then bun agent.ts, and that's kind of all it takes.
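
The roughly 60-line agent file described above (run a query against a working directory and loop over what comes back) is not shown in the transcript. The sketch below illustrates only that loop. To keep it self-contained, the query function is passed in: the `QueryFn` and `AgentMessage` types are simplified assumptions modeled on how the Agent SDK's `query` is commonly used (a prompt plus options such as `cwd` and `allowedTools`, returning an async stream of messages ending in a result), so check the SDK reference for the real signature before relying on it.

```typescript
// Assumed, simplified message shapes; the real SDK emits richer messages.
type AgentMessage =
  | { type: "system" | "assistant" | "user" }
  | { type: "result"; result: string };

// Assumed shape of the arguments to the SDK's query function.
interface QueryArgs {
  prompt: string;
  options: { cwd: string; allowedTools?: string[] };
}

type QueryFn = (args: QueryArgs) => AsyncIterable<AgentMessage>;

// Runs one prompt against the agent in the given working directory,
// logs each message type as it streams in, and returns the final result.
async function runAgent(query: QueryFn, prompt: string, cwd: string): Promise<string> {
  let finalResult = "";
  const stream = query({
    prompt,
    // Restricting tools is optional; these are standard Claude Code tool names.
    options: { cwd, allowedTools: ["Bash", "Read", "Write"] },
  });
  for await (const message of stream) {
    console.log(`[${message.type}]`);
    if (message.type === "result") {
      finalResult = message.result;
    }
  }
  return finalResult;
}
```

The working directory is where the CLAUDE.md, the generated library, and the example scripts live, which is why the file system acts as the agent's context.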
They've abstracted away a lot of it, so you run the sandbox and then you communicate with it. There's some very interesting stuff that I'm not sure I'll have time to get to. Some interesting questions are things like: how do you serve this? Right now we're just spinning up a sandbox per user. There are a lot of best practices to sort out here.
One thing I want to call out for you to think about: if you're making an agent with a UI, let's say I have my Pokemon agent and I want a UI that adapts to the user. Maybe some users are doing team building, some users want help with their games, some users just want pictures of Pokemon. How would I have an agent that adapts in real time to my user? The way I would do it is, in my sandbox I would have a dev server. It would run on Bun or Node or something and expose a port. The agent could edit code, it would live-refresh, and your user would be interacting with the website. This is how a lot of site builders like Lovable work: they use sandboxes and essentially host a dev server. So if you want a customized interface for your user, this is a great way to do it.
Let's see what it did. Okay, cool. It's written this script, it shows me some base stats, and it's suggested a move set and some teammates. Let's see what it did. Okay, so you can see here it started searching for Venusaur, and it started finding those Pokemon. When it does that, it also gets other Pokemon that mention Venusaur, so it gets its teammates and its counters and stuff, and over time it's found
interesting Pokemon that it might work with. So it's done a bunch of these searches, it's got these profiles, it's found its most common teammates, and it ran a script to analyze it. And this is all based on a text file. Of course, I could have pre-processed the text file a little more, but it's done this interesting analysis for me. Again, I'll push the work up onto the GitHub repo and also tweet about this. I'm on Twitter; I'm @trq212. I tweet a lot, mostly about Agent SDK stuff. But yeah, I have about eight minutes left. I want to spend the rest of the time taking questions about kind of anything, and I'm sorry we didn't get to do more prototyping. Yeah?
Audience: [inaudible, about Claude Plays Pokemon]
Yeah, I do want to make Claude Code play Pokemon; I think that'd be fun. With Claude Plays Pokemon, I think we try to keep it as much of a pure reasoning task as possible. Other questions? Yeah.
Audience: [inaudible]
Yeah, I do think overall, especially right now, agents are kind of pricey, because the models have just started to get agentic.
We really focus on having the most intelligent models. Generally, and this is just an overall SaaS business thing, you'd rather charge fewer people more money who really have a hard problem. So I think this is still good; you probably should find these hard use cases. But I would say, number one, make sure you're solving a problem that people want to pay for. That's the number one step. Number two, I think you could do subscription or token-based pricing. This kind of comes down to how much you expect people to use your product heavily versus occasionally. Claude Code, obviously, people use a lot, and we do a mix: you have rate limits, and if you exceed them, we do usage-based pricing. It's very dependent on your own user base and what they will do, but I will say monetization is something you should think about up front and design your agent around, because it's hard to walk back these promises. Yeah, back there.
Audience: [inaudible, about hooks]
Yeah, I haven't talked much about hooks. Hooks are great; we do ship with hooks. Hooks are a way of doing deterministic verification in particular, or inserting context. We fire these hooks as events, and you can register them in the Agent SDK. There's a guide on how to do that. As an example of things you might use hooks for: you can run one to verify the spreadsheet each time. Or say I'm working with an agent, the agent is doing some spreadsheet operations, and the user has also changed the spreadsheet. This is an interesting place to use a hook, because after every tool call you could insert the changes the user has made. So you're doing kind of live context changes in an interesting way.
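
As a sketch of the "insert the user's changes after every tool call" idea, the function below builds the text such a hook could hand back to the model. It deliberately stops short of the SDK wiring: how a hook is registered and how it returns extra context is covered in the SDK's hooks guide, and the names `CellChange` and `buildUserChangeContext` are hypothetical.

```typescript
// A user edit made to the spreadsheet while the agent was working.
interface CellChange {
  cell: string; // e.g. "B7"
  oldValue: string;
  newValue: string;
  version: number; // monotonically increasing edit counter
}

// Builds the text a post-tool-call hook could pass back to the model:
// only the edits the agent has not seen yet, or null if nothing changed.
function buildUserChangeContext(
  changes: CellChange[],
  lastSeenVersion: number,
): string | null {
  const unseen = changes.filter((change) => change.version > lastSeenVersion);
  if (unseen.length === 0) {
    return null;
  }
  const lines = unseen.map(
    (change) => `- ${change.cell}: "${change.oldValue}" -> "${change.newValue}"`,
  );
  return `The user edited the spreadsheet since your last tool call:\n${lines.join("\n")}`;
}
```

Tracking a last-seen version keeps the inserted context small: the model is told only about edits it has not already been shown.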
So yeah, there's more stuff in the docs about hooks, and I'm happy to talk about it afterwards as well. More questions? Yeah.
Audience: [inaudible]
Yeah, sure. So let's say you've done this prototyping and you've found something that works. What I would do is put it somewhere in the CLAUDE.md. Obviously, when I tried doing this one time, it used the API directly and wrote JavaScript. I should have been more specific in my CLAUDE.md, like, "Hey, you should use this." So that's one thing. The second thing is, somewhere in the CLAUDE.md, have the helper scripts that you need, and then write something like this agent.ts. More questions? In the gray.
Audience: [inaudible]
Yeah, this is a good question, and I think there is some messiness. One of the things is, if an agent knows an answer and you want to sort of fight it, to say, "Okay, no, it's generation nine now, and we know the stats have changed," or there's something new, this is hard. I actually think one of the ways of doing that is hooks. For example, if it has returned a response without writing a script, you can check that and give feedback: "Please make sure you write a script. Please make sure you read this data." You can use hooks to give that feedback, in the same way that in Claude Code we have these rules like "make sure you read a file before you write to it," right?
So add some determinism. Like I said, it can definitely be an art sometimes. Yeah, in the gray.
Audience: At large companies, we're working with codebases of 50 million-plus lines, and the tools don't really work. So I've had to build my own semantic indexing type of things to help with that. How are you thinking about making that more native to the product? In a couple of months, is the thing I'm building just going to go away? How are you guys thinking about that?
Okay, your last question: in a couple of months, is your thing going to go away? I'm generally a yes on that about AI. Semantic search is a Claude Code question more than an Agent SDK question, but I'm happy to answer it. There are trade-offs: semantic search is more brittle. You have to index and search, and the model is not necessarily trained on semantic search, so I think that's a problem. Grep is what it's trained on, because it's easy to do that, but with semantic search you're implementing your own bespoke query. For very large codebases, and we have lots of customers that work in large codebases, what I've seen is that they just have good CLAUDE.md files, they make sure to start in the directory they want, and they have good verification steps, hooks and lints and things like that. That's what we do. We don't have anything custom; we dogfood Claude Code. Okay, last question... we have to close, unfortunately, so I'll take it offline. Thank you, everyone.
