Building durable Agents with Workflow DevKit & AI SDK - Peter Wielander, Vercel

Thank you all for coming. Hello, hello. I don't know about you, but when I write agents I like focusing on the capabilities and the features, and I like not thinking about all of the extra effort that goes into getting something that works locally into production. Something that's very useful for that is the workflow pattern, and that's why we developed the Workflow DevKit, which is what we're talking about today. Presumably, if you're here, you've had similar issues. Today we are going to turn a coding agent into a workflow-supported coding agent over the course of this session. So, I'm going to stick to this slide.

We have an open-source example ready to go. This is in the vercel/examples repository, so you can clone that and check out the vibe coding platform app inside. We're going to be using this app for today's demo. After we're done, we get first-class observability built in, and also durability and reliability. We get a lot of extra features like resumability, and the Workflow DevKit makes it very easy to add human-in-the-loop workflows and similar things.

If you think about the general agent loop that we've all seen before, we mostly have calls back and forth from an LLM to our tool calls and our backend code, which might include MCP servers or similar kinds of async tasks. The usual way to go about this is to wire up some queues and a database, especially if you are doing long-running agents that might run for hours. If you want to scale and you're running on serverless, for example, you want some kind of reliability layer in between, which is usually queues. Then you'll also need to add a lot of error and retry code. You'll need to store all of the messages that people are sending and the state in between. And you'll probably also need to add some kind of observability layer. All of those things we are going to do today using only a single library, which is the Workflow Development Kit.
It's open source, it runs with any of your TypeScript front ends or back ends, and it can also run on any cloud. We're going to be deploying to Vercel today, but this could just as easily run on any of your cloud stacks, server-based stacks, or custom stacks.

All right, so who here has heard of the workflow pattern, or has used a workflow library before? Show of hands. All right, that's less than half. I'm going to quickly explain what a workflow pattern is to make it clear what we're doing, and then, in about two minutes, I'll get to what we're going to do.

A workflow pattern is essentially a sort of orchestration pattern: it separates your code into steps, which can run in isolation, can be retried, and have their data persisted, and an orchestration layer that we call a workflow (some platforms have different names for that). In our case here, the workflow part is the loop that calls the LLM, then goes to the tool calls, and then back to the LLM calls. The steps would be our actual tool calls and our LLM calls.

Right, so looking at the agenda for today: we're going to be jumping into the code. We're going to add the Workflow Development Kit, which is going to be quite fast, and then we have a lot of time to talk about cool additional features that it adds, like resumable streams out of the box, how to suspend and resume at any point, and how to add webhooks for human-in-the-loop processes. At the end, there's going to be ample time for Q&A, but there is a reason that you're here in the workshop and not watching online, which is that you can ask questions. Please do so at any point; feel free to raise your hand or shout out the question and we'll get to you.

All right, so as I said, we're working off the vercel/examples repository, and we're going to be working off of the conf/1-base branch. Why this branch?
I stripped a bunch of the excess code from the example to make sure that we can focus on the most important parts. Every checkpoint from this workshop will have its own branch, so if you're not coding along directly, you can also check out the branches step by step, look at the diffs, and see what changed between them.

I have already run npm run dev locally on this app, just to show you what it looks like. I'm going to run a simple query. This is a coding agent, right? It's like a code editor, but without the code editing. It can take a prompt to generate some files, and it'll eventually show you an iframe with the finished app that is deployed. So it's mostly UI with a few simple tool calls that we'll look at in a second. The file system and output run on Vercel Sandbox, but you could just as easily run this locally.

Looking at the code, I'm going to go and check out our actual branch. We have one endpoint that accepts our chat messages, right? It does some regular model ID checking to see whether the model is supported, and in the end it simply creates an agent.

What was that branch one more time?

The branch was conf/1-base. You found it? Yeah. And you can see we'll move on to conf/2-, etc. Just look for the numbers and you'll find all the checkpoints.

Yes, so our main endpoint just accepts some messages and calls the AI SDK Agent, which is essentially the same thing as a streamText call. We pass some tools, and it runs the loop: streamText, tool call, streamText, tool call. Then it streams all of the generated chunks back to the client in a format that is easy for the client to understand. This is all regular AI SDK code that you could replace with a different library if you want. Most of it is there to support the UI. But again, all of the actual agent stuff is very simple; it happens here. So let's also take a look at the tools that we have.
We have four tools. Create sandbox and get sandbox URL are very simple: we just wrap the Vercel Sandbox create and get-URL calls. Similarly, run command essentially wraps the sandbox's run command, and the file-writing tool generates files from a simple prompt. Let's take a look at one of these tools. For example, we have a prompt that looks somewhat like a Markdown file, describing what to do and what not to do; that's the prompt for the tool call. We also have an input schema, which is a Zod schema for what the AI SDK passes in. This is all very standard. And then there's an execute function, which wraps the sandbox's run command with some error handling. So that's essentially our entire agent code. On the front end, we just call useChat from AI SDK to consume the stream and display things in the UI.

So let's get started adding Workflow to this. Any questions before I get started? Cool. All right.

Step one: we're going to install workflow and @workflow/ai, which will give us the latest version. workflow is the main library, and @workflow/ai has some helpers and wrappers for making the AI SDK work well with the Workflow Development Kit. Now that we have this installed: we are running a Next.js app here, so we're going to extend the compiler to compile our workflow code by wrapping our Next.js config with withWorkflow, which we can import from workflow/next. And that'll set up Next.js.

Yes? Very fine question. You are in the vibe coding directory of the examples repository. Yes.

So this will let the compiler know to also compile our workflows separately, which we'll get into more in a second. Then, for convenience, we can also add a TypeScript plugin to our tsconfig, from the same package, and that'll give us some better autocompletion for our workflow code.

We talked about our workflow having an orchestration layer and a number of steps. What we're going to do first is write the orchestration layer.
In our case, that is essentially just the loop that calls steps back and forth. We're going to add a new file; you can call it whatever you want. We're going to take our agent call and move it over there. I'm going to call this our code workflow, and it's going to hold all of our workflow code. Then I'm going to go and add a whole bunch of imports. Thank you, AI. So we're just passing most of the arguments that we would otherwise get from here over there. And this completes the refactor, having essentially done nothing but pull out some of the workflow code into its own file.

So this is where it gets interesting. Now that this is a separate function, we can use the "use workflow" directive, which will mark this for our compiler as a workflow function. What this does under the hood is compile all of the code related to this function separately, and it ensures that there are no imports of anything that would have side effects, because the workflow orchestration layer needs to be deterministic. That way it can be rerun in a deterministic fashion and there are no worries about state pollution.

Now that we have this, we need to mark our LLM calls as steps. Because the LLM calls happen inside the agent, this is a little bit harder to do here, so we ended up writing a DurableAgent class, which is essentially the same thing as Agent with a "use step" marker on the actual LLM calls that it does under the hood.

Now that we have this set up, we're going to await the actual streaming. And let's see if there's anything else; we need to check for errors. Oh, yes, we need a stream to write to. Previously, we could just write to the stream that the API handler gave us. Now we're going to have to create a new stream to write to. We export a getWritable function from workflow, which gets a stream implicitly associated with the workflow to write to.
And we're going to get into that a little bit more in a second. But for now, we'll just pass that to our agent. And we're going to see if this is right type. And then finally, back in our actual workflow, we need to call our workflow in a way that the framework understands, which far says a call to start with the arguments being passed separately, which essentially telling it to start a new workflow on this function. And start can be imported from workflow slash API. So now we essentially have the workflow fully hooked up. And a lot of this was just pulling out some of the codes and adding a directive. And it's volunteer to help anyone who's like following along and has some debugging questions. Just be sure to. And finally, this is not called returns a run instance that has the stream that just not writing to that we can return to the UI. So this completes our workflow definition. And now, we also said that we would need to mark things as steps. The durable agent class already marked the LLM calls as steps. But our tools right now are not marked as steps. Thankfully, this is very easy. In the execute function for each of these tools, you can just write use step. And that will let the compiler know that this is a separate chunk of code to execute in a separate instance. Right. If this is deployed to production, this would run in a separate serverless instance. And the inputs and outputs would be cached if it already ran and it would be retried if it failed. So I'm going to go and go through the other tool calls and also add use step to these. Thankfully, we only have four of them. And that should complete our transformation. So now we can go and run the mpm dev. See if this works as expected. We're going to reload our page. It seems like nothing changed. Let us actually run a query. And we can see that it's still screening as expected. So for us, development locally, right, all we have to do is pull out a function and then add some directives. 
But now, if I deploy this to any adapter, again, Vercel or an AWS adapter or maybe your own, this will run in isolation, with durability and all those good things. Something that's really nice for local development also: I'm going to go into the same folder here and run npx workflow web, which is just a CLI call to start a local web UI to inspect our runs. You can see that our run is currently still running. Everything, not just every step, will have a span here, and you can inspect the inputs, the outputs, and the associated events. And we can see that our workflow just completed, I think. And yeah, this comes built in.

Yes? Just a clarification: every time you're prompting your vibe coder, that is one instance of the workflow that runs to completion? So then each one is... Yeah, exactly.

And you could model this in any way you want. You can also model an entire user session as one workflow and have the workflow wait for the next query. And then again, we can run code for weeks if we need to, essentially, and I'm going to go into some tools for that in a second.

So now that we have this set up, you can see that on the right side we do not get any helpful feedback. If I visit this link, the sandbox has likely been created correctly, or it might have errors; either way, we're not getting any output on the right side. The reason this is happening is that we are streaming the agent output to the client, but our tools aren't actually doing any stream calls right now. So what we could do, similarly, in our tool calls, is call getWritable, which will get the same writable instance as the workflow itself. There is an unlimited number of streams you can create and consume in a workflow. You can also tag them with a certain name and then fetch them by that name.
But this will get the default instance. Once we have a writable, we can actually connect to it by getting the writer. And now we can write any kind of information to the UI, to the consumer. I think we want something like data-create-sandbox; I think that's what I hooked up in the UI. Then the tool call ID, and we want the sandbox ID; we'll have to pass them here. So this is me just writing a data packet that our UI knows how to consume. Now that that's in, I reload the app, and we'll see that at least the sandbox create call presumably gets displayed from the start.

You said that there's an unlimited number of streams that can be created. What do you mean by that?

Yes. So the backend used for workflows in local development is just a file; in production, this might be a Redis instance. It supports the workflow calling it to create a new stream, for example in Redis, and then passing that stream back. So any time you call getWritable, it'll create a stream, for example in Redis, with the ID of that workflow, and it'll pass that back. Any step can attach to that and any client can attach to that. On localhost, this would be written to a file and read from a file.

What are you setting up right now?

So previously we had an API handler that took messages, called the agent, and then streamed back messages on that API handler. Now we have an API handler that starts a workflow, and it'll pass back the stream that this workflow creates. I think there was something not working correctly; I'll restart the server to see if that's the case. Anything else so far on getting this set up to where it is? Seems good.

To the point you made, something this allows us to do is that the stream is not bound to the API handler. This means that at any point we can resume this stream.
If you lose connection to your API handler and then there is a reconnect, this stream still exists, and we could reconnect to the stream to resume the session. This is also part of the durability aspect, where everything you do in a workflow you can resume at any point. I'm going to restart this query and hope that it works this time. Yeah. So now that I've hooked up this data packet, you can see the special UI handling for creating a sandbox works. But even after it's done, it's not showing that it's done. This is because we're only writing the loading-state packets. I could go through all of our tools, add more packets, and make the UI richer. But instead I'm going to check out a different branch, the conf/3 branch, which is the next step. Actually, I'll go for the workflow one, conf/2-workflow, which already has all of these write calls populated. There's no difference otherwise. Now that all of our tools have these write calls, the stream should again look the same as it did when we started out in this app.

All right, so now that we have streams working again, everything is working as expected. We have more observability and we can deploy this with durability. I talked about resumable streams before. We're going to see if we can get this stream to resume so we have durable sessions. The only thing we need to do to make that work is go to our API endpoint, and where we get the run instance, we also return the workflow run ID as additional information. So I can return run.runId, for example. Any way you do this is fine. I'm using a header here because we're already returning a stream, but any way you pass the ID to the UI is something that the UI can then use to resume the stream. From here, the UI should be able to decide whether it has a run ID and whether it should resume a stream. So we're going to go and create a new endpoint.
Let's call it [id], for the existing run ID. Then we're going to make a folder called stream, and we're going to add a route handler. This is just the Next.js convention for adding an API route at /chat/[id]/stream. And we're going to autocomplete with AI. What we're essentially doing is getting the ID from the params and then calling getRun from the workflow API, which gets a run instance. Then we can return the same stream that we return in the other endpoint, just without calling the actual agent, only returning the stream. I think that should be good. We're also taking a start index, which is very helpful of the AI: we can get a readable stream from a certain starting point. So if you're trying to resume a stream midway, you can pass in which chunk you were on when you left off.

Now that this is done, I'm going to comment out these things we don't currently need. We need the UI to support this additional logic for whether to resume or to start a new chat. So I'm going to go to our chat front end and pull in some code from a different branch for simplicity, which is the conf/4-streams branch, which I'm just going to show for completeness. We already do a useChat call in the UI to consume the stream. All we added now is a transport layer, which is this big block here with some middleware for the stream that says: if I'm trying to start this call, I'm first going to check whether we have an existing run ID, and if so, I'm going to do a reconnect by calling this different API endpoint instead. I'm hand-waving over this a little bit because it's mostly client-side handling. If there are more questions about this, please feel free to ask.

All right. So that gives us resumable streams. I'm also going to demo what happens if we want to deploy this and see it in production. So I'm going to kick that off.
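The start-index idea described above can be pictured with a small, self-contained toy model. This is a sketch under assumptions, not the Workflow DevKit API: `ChunkLog`, `write`, and `readFrom` are hypothetical names, and the in-memory array stands in for whatever backend (a file locally, something like Redis in production) actually stores the stream.

```typescript
// Toy model of a stream that lives outside the request handler, so a client
// can reconnect and continue from a chunk index. ChunkLog and readFrom are
// hypothetical names, not Workflow DevKit APIs.
class ChunkLog {
  private chunks: string[] = [];

  // The producer (workflow or step) appends chunks as it generates them.
  write(chunk: string): void {
    this.chunks.push(chunk);
  }

  // A consumer asks for everything from startIndex onward.
  readFrom(startIndex = 0): string[] {
    return this.chunks.slice(startIndex);
  }
}

const chunkLog = new ChunkLog();
["Creating", " sandbox", "...", " done"].forEach((c) => chunkLog.write(c));

// First connection receives two chunks, then the network drops.
const received = chunkLog.readFrom(0).slice(0, 2);

// On reconnect the client passes how many chunks it already has.
const resumed = chunkLog.readFrom(received.length);
const fullText = received.concat(resumed).join("");
console.log(fullText); // Creating sandbox... done
```

Because the chunks are stored independently of any one HTTP connection, the reconnecting client neither loses nor duplicates output; it only needs to remember its position.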
And then we can check out a production preview in the meantime. The next thing we'll talk about is suspense and resumability. The way workflows run is that every step runs on its own serverless instance in production, and the actual workflow orchestration layer is only called very briefly to facilitate the step runs. What this allows us to do is have a workflow suspend for any amount of time. A workflow could wait for a week and not consume any resources. This is built into the Workflow Development Kit: inside a workflow, anything tagged with "use workflow", we can simply call sleep for three days, for example. And, making sure I await this, that will pause the workflow for three days and then resume from where it left off. If someone was trying to reconnect to the stream, say this was a sleep of an hour, the stream would just reconnect again to the same endpoint and things would resume from there. So we don't lose anything by losing the instance that runs the code, because we can always restart it and resume from where we left off.

This is useful for AI agents, because we can make it into a tool call. We can give the AI agent a tool call that says "sleep for an amount of time", and then use it to make an agent that essentially acts like a cron job, where it says "every day, read my emails and do this thing". So that would be sleep one day.

Yes? When the agent instance goes down, that means all the state goes down?

Yes. So when it sleeps for three days?

No, not when it's paused, but when it gets killed. I'm worried that the state goes bad.

So the way it works is that any step call is cached. When an input goes to a step call, we register that as an event, and we run the step. If the step completes, we cache the outputs and say this has been run to completion, right?
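To make the caching answer concrete, here is a toy model of deterministic replay: step outputs are recorded in a log, the workflow function suspends at a sleep, and a later rerun reuses the recorded outputs instead of executing the steps again. Everything here is an assumption made for illustration (`EventLog`, `Ctx`, `runWorkflow`, `SUSPEND`, and the shape of the log); it is not how the Workflow DevKit is implemented internally, only a minimal sketch of the idea.

```typescript
// Toy model of deterministic replay. All names are hypothetical.
const SUSPEND = { reason: "sleep" }; // sentinel thrown to stop the run

type EventLog = { outputs: unknown[]; wokenUp: boolean };

interface Ctx {
  step<T>(fn: () => T): T;
  sleep(): void;
}

// Returns the workflow result, or null if the workflow suspended.
function runWorkflow(log: EventLog, workflow: (ctx: Ctx) => string): string | null {
  let cursor = 0; // position in the log; starts at 0 on every (re)run
  const ctx: Ctx = {
    step<T>(fn: () => T): T {
      if (cursor < log.outputs.length) {
        return log.outputs[cursor++] as T; // already ran: reuse cached output
      }
      const output = fn(); // first time: actually execute and record
      log.outputs.push(output);
      cursor++;
      return output;
    },
    sleep(): void {
      if (!log.wokenUp) throw SUSPEND; // stop here until woken up
    },
  };
  try {
    return workflow(ctx);
  } catch (e) {
    if (e === SUSPEND) return null;
    throw e;
  }
}

let sideEffects = 0; // counts real step executions
function agentWorkflow(ctx: Ctx): string {
  const files = ctx.step(() => { sideEffects++; return "generated files"; });
  ctx.sleep();
  const url = ctx.step(() => { sideEffects++; return "https://sandbox.example"; });
  return files + " -> " + url;
}

const eventLog: EventLog = { outputs: [], wokenUp: false };
const first = runWorkflow(eventLog, agentWorkflow); // suspends at sleep
eventLog.wokenUp = true; // the timer fired
const second = runWorkflow(eventLog, agentWorkflow); // replays, then continues
console.log(first, second, sideEffects);
```

On the second run the first step is not executed again (its output comes from the log), which is why the workflow function itself has to be deterministic: it must reach the same steps in the same order every time it is replayed.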
So if it was something like this, where we run the agent first, right? Let's say we run the agent, and it runs a bunch of steps. The state of the workflow function at this point in time would be saved, and all of the outputs from all of the step calls would be saved. At the time when we restart the workflow, from this specific line of code, it'll rehydrate the entire state and just go from here. This happens essentially so that, again, we don't have to replay any of the other things in a way that does any actual resource consumption.

Yeah, so we can use this to make an agent that essentially has a cron job, and we can use it to make agents that run for weeks or interact with your information over a very long time horizon.

While I've been talking, we have deployed our current app to Vercel, so I can check out this preview deployment, for example, and you can see that it's now live online and working just as it usually does. And yes, it works perfectly. Again, I can use the UI to inspect this at any point. If I call workflow inspect web, or just workflow web, with the parameters that point it at Vercel and the preview environment, for example, that just lets it know where our deployment is to be found, and it'll spin up the same UI. Now we can check on this run that's running in production, and you can see we're getting the same kind of information here. I'm not going to wait for this run; I could cancel it. Let's cancel it. This is to show that the way it works locally is exactly the way it works in production from a conceptual standpoint, which is the UX we are aiming for.

All right, I talked a little bit about sleep and suspense. Let us go and write this sleep tool call. It's going to be very simple. I'm going to copy an existing tool as a starting point and write the rest from scratch. We're going to write a sleep tool call.
I'm just going to call it sleep. Yes. We're going to change the input schema to be something like a timeout in milliseconds, and the actual execute function to do none of this and instead just call sleep. Because sleep is already a step that we export from the workflow library, we don't need to mark the execute function as a new step. Let's see, this should be a number. There you go.

Can you say that again, why don't you need the new step?

So this is already a step that we export from workflow. The observability will also show it as a step, which we'll see in a second. And this should just work. The prompt is good, though we're going to modify it to say something like: only use this tool if the user directs you to. All right, let me get the imports here. There we go. Now that this is all set up, that should be all that we need to do. We'll call it the sleep tool, and we're going to add this to our tools list. I confused our compiler a little bit. The TypeScript seems to work great. Okay, now we have the tool.

We also want the UI to display when it's sleeping, so we want to add another function to log the sleep. The reason we're doing this is that we cannot write to a stream directly from a workflow, because then it wouldn't be deterministic anymore: every rerun of the workflow would write to the stream again.

Yeah? I'm running the project and I'm getting an error that says the header is missing from the request: "Do you have the OIDC option enabled?"

I skipped this at the beginning, but yes. This code uses a sandbox. You would need to log in to Vercel to use this. My mistake. This should be running locally if you don't use the sandbox, and I will have a branch that doesn't use the sandbox after the talk.
For this one, I'll just do it afterwards. It's fine. So here I'm just going to add another getWritable call, and we're going to need the tool call ID. Now this is just going to write to the stream, and that should allow us to show it in the UI correctly. Let me see if I configured the UI to correctly interpret this packet. There is no data-sleep type, which I think can wait. All right. Now that I have this, I can start our app. So it loads. We can try out the second prompt here, which is: sleep for 30 seconds and then respond. This is just to show that it's going to correctly interpret the sleep call and then sleep. It's not showing the data packet here, apparently, but we can go to the web UI and see that it's engaging in the sleep call, and this is going to return after 30 seconds. All right. So that's sleep.

There's one final feature that I want to show you, which is webhooks and the ability to resume from webhooks easily. Implementing webhooks is usually quite a headache. In our case, I'm going to check out the conf/5-webhooks branch and show you that, in the same fashion as we did sleep, we can add a new tool. I'll just show you. The actual tool call is just a log call, and then we create a webhook, which is a function exported from workflow. We can then log the webhook URL to the client or anyone else and await the webhook. This will suspend for as long as necessary for someone to click on this URL. Let's see if we can get it running, and I can show you this running, hopefully. Let's prompt this: wait for human approval before starting, and then build a Pokémon index. Let's see if it picks up on this correctly. I've been changing branches, so I might need to restart my server. The way this works in the code is that, again, we're creating a URL, and we're going to suspend the workflow until a call comes into that endpoint.
While I run this query: this comes with a lot of extra features. I could also use respondWith if I wanted to. This is a full API request cycle: I could respond with a response object, so I can treat this as, again, an API endpoint. I could also check the body against a Zod schema, for example, and then only resume once that matches. So this gives you full control. But the nice thing is that it hooks up the URL internally. And you can see that it paused, waiting for a human to click on this link. If you're running on localhost, it's a localhost link; in production, it will be whatever your deployment URL is.

Yes? It's about sleep and human approval. A workflow is purely steps, and steps always run to completion. So sleep is a step; it's not a suspension of the execution?

No, it is. We're modeling it as a step for the observability and for how you call it, but it is an internal feature that completely suspends the workflow and all steps. Nothing is running while we sleep. You can also run sleep and another step and Promise.race them if you want. It works as a step call in the sense that it's an execution that takes some amount of time, and you can use await syntax to model this. But it completely suspends unless there is anything else running at the time. The same goes for the webhook: it's modeled as a step for the observability, but it completely suspends unless you have other code running at the time.

So if you have an agent running with a workflow, it keeps running. Then you connect to it again, let's say through another session, and you call sleep in this session. The previous one, whatever it was doing, just goes down?

So if you have two sessions, let's say we have a coding session that already built an app and is sleeping for a week, and then we reconnect to the stream. Is that the thing?
Yes, let's say I kick off a workflow and it's calculating the digits of pi. It just keeps going. But then I connect to the same sandbox and I call sleep. Will it stop calculating pi?

So the way you would do this in a workflow is, again... let's see how we would code this.

Sandbox. You can connect to the sandbox again.

Right. So the sandbox is Vercel Sandbox, which you can just imagine as an EC2 instance. This is just a helper for us to spin up an instance to run this coding agent, for when it runs code or writes files. If you modeled this differently, you wouldn't have to use Vercel Sandbox. And the sleep call doesn't happen as a bash call, for example. Those are two different states.

Right. One is an orchestration thing, and the other is when you're actually in the sandbox and you call sleep in the sandbox. Okay, so those are two different things.

Right. So there's the sleep that you could call from a terminal in the sandbox as a terminal command, and there's sleep from the workflow, which suspends the workflow.

Yeah. So we have these features for the webhooks. And you can see that after I clicked on the URL, it resumed and then coded me a Pokédex. That is all of the features we are going through in this session, and I think we have ample time for Q&A, about 20 minutes at least. Please go for it.

How would I spin up Claude Code sessions with this?

A Claude Code session remotely, is that what you're asking?

Kind of, to kick it off as an agent doing certain stuff. Is that possible? And then orchestrate that as agents?

That is possible. So Claude Code, if you're talking about the terminal app, doesn't use any of the workflow features internally, so it's hard to isolate it and know where the orchestration layer is. You could write your own version of Claude Code, or take the Claude Code source code, add Workflow, and use that for the calls. That would then run as a workflow in the cloud.
There's no way to say, okay, I have my steps: spin up Claude Code, type this command, and wait for the result? That would be a versatile workflow. But how would I actually hook it up? What would it connect to?

All right, so now I know what you're asking. There might be a confusion about where this is running, right? For the coding agent here, if the coding agent runs mkdir to create a folder, that mkdir command runs in a step, but it runs against a sandbox, the sandbox being a VM. This VM state is not managed by the workflow itself. So if you call Claude Code on the VM, that's essentially treated like an SSH session. But if you run any agents or steps within the workflow, those steps are going to be resumable and observable through the workflow pattern.

Yeah. Another question: how do I control what my agent has access to, to stop it from going out to the internet and doing stuff?

This would be whatever you're already doing for the agent. Again, you're going to be doing tool calls and stream calls to the LLM provider, right? That is in your code, presumably, already, along with whatever you're already using to control permissions there. Take your tool calls, for example: if your tool call allows you to delete a resource in S3, then you, as the author of that tool call, can write whatever code you want in the usual way.

So it's my job to implement it; it's not that it has some wrappers for that?

Yeah. That's all in the sandbox. Workflow is a general orchestration layer for durable execution and doesn't necessarily provide a sandbox for running third-party code, running agent code, or making files. That's something the sandbox is good for, because every sandbox instantiation is a new VM that only lasts for as long as your session lasts. Yes?
If I'm creating a lot of agent workflows, how does that work? Do they get queued up, do I hit rate limits, or are there concurrency controls that we can use?

Yes. This goes into some of the versioning patterns, which will be supported and for the most part are supported right now. If you're deploying, for example, to Vercel, as usual with Next.js, every deploy is a separate live URL that spins up a serverless instance when you call it. So your workflows are bound to that deployment. Something very nice you get here: if you have an agent that runs for a week, but you deploy five times during this week, those new deploys are going to be isolated from the original workflow. The original workflow is going to run to completion, and any new workflow will run on the new deployments. We will also allow upgrading between those. Say you have a workflow that runs for a year, because it's something like "every month, give me a summary of so and so", but you have new code and you want the workflow to take its current state and use the new code. There's going to be an upgrade button in the UI that checks for compatibility between your old workflow and your new workflow by checking all of the step signatures and all of the existing events, and then you can upgrade the workflow. You can currently already cancel and rerun with the new workflow.

Is there a timeout for those workflow steps?

There can be, yes. If you're doing serverless, whatever platform you're on, be it Lambda, Vercel, or something else, your serverless functions are going to have timeouts. The nice thing is that every step runs in its own serverless function, so the timeouts only apply to the steps.
So if one of your individual steps runs the risk of taking more than five minutes, or maybe 15 minutes depending on the platform, then you can split it into two steps. If it runs into the timeout, it'll fail and it'll retry, and maybe this time it will be faster. You will see in the UI that the step retried after 15 minutes a bunch of times, presumably because it's failing, and then you can go and split it into two steps, upgrade the workflow, and it'll just continue from there.

The other thing was around queueing workflows, like when I trigger the agent a lot of times. How does that work?

You can model this in different ways. Right now, again, we're doing this from an API route, where every call to this API route creates a new workflow, and the only interactable output you have is a stream in this case. It'll do its work and write to this stream; even if nobody looks at the stream, the workflow is still running. You can kick off as many as you like and they're going to run in the background. There is essentially no limit to how many you can create, because they all run in serverless functions, so you can scale to as much compute as your provider has, which is a lot of compute. You can also list active runs. There is an API for interfacing with your runs: look at all of the runs that are running, which version they are on, which step they are on, and cancel them. I hope that answers your question.

Oh, concurrency, yes. Right now it's infinite concurrency, but very soon we'll add per-step or per-workflow concurrency, where you can say this workflow is only supposed to run at most 10 times at the same time, and any extra start gets queued, so that it waits for those 10 to drop below the limit and then starts.
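To make the queueing behaviour described above concrete, here is a minimal in-memory sketch of a concurrency gate: at most `limit` runs are active, and any extra start waits until a slot frees up. The class and method names (`ConcurrencyGate`, `start`, `finish`, `stateOf`) are hypothetical and are not part of the Workflow DevKit, whose concurrency feature was described in the talk as upcoming.

```typescript
// Hypothetical model of a per-workflow concurrency limit.
// None of these names come from the Workflow DevKit.
type RunState = "running" | "queued";

class ConcurrencyGate {
  private readonly limit: number;
  private running = new Set<string>();
  private waiting: string[] = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  // Register a new run: it starts right away if a slot is free, otherwise it queues.
  start(runId: string): RunState {
    if (this.running.size < this.limit) {
      this.running.add(runId);
      return "running";
    }
    this.waiting.push(runId);
    return "queued";
  }

  // Mark a run as finished and promote the oldest queued run, if there is one.
  // Returns the id of the promoted run, or undefined if nothing was promoted.
  finish(runId: string): string | undefined {
    if (!this.running.delete(runId)) return undefined;
    const next = this.waiting.shift();
    if (next !== undefined) this.running.add(next);
    return next;
  }

  stateOf(runId: string): RunState | undefined {
    if (this.running.has(runId)) return "running";
    if (this.waiting.includes(runId)) return "queued";
    return undefined;
  }
}
```

A tiered setup like the one mentioned in the talk could, under the same assumptions, use one gate with a small limit for free-tier runs and skip the gate entirely for paid runs.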
You can also use that to have a free tier on your product, for example, where only a limited number of instances run for your free tier at any one point, so some people who come in later will wait in the queue, while your pro tier has infinite concurrency.

Can I roll back steps? Let's say I have 10 steps, but in step seven I think, okay, let's go back to step three. Would it be possible to reset the state of the workflow?

You can technically do this. We don't currently support it, but it is very doable, because every step's inputs and outputs are cached, and we can enter the workflow at any point and replay from there. We need to expose this in the UI as a function to resume from a given step, but yes, that would be possible. More likely, what you want to do is monitor the steps, check their status, and then decide whether to redo a step or change some state. The event log is what enables that, and you can go into the event log.

Maybe my second question. As we go through the steps, we're passing inputs and outputs across, and that's what gets cached. Is there a way to attach metadata, or does it always have to be in the inputs and outputs of the function?

You can also attach metadata. We'll have a tagging API soon where you can add arbitrary tags to the workflow at any point in the workflow run, and you can use those tags to decide whether to exit early or to deduplicate your runs.

About the deployment: are we tied to Vercel, or is it possible to use something else?

As I mentioned before, there are two aspects to this. There's the front-end side, the framework. The docs are on useworkflow.dev. You can see the frameworks there, which are the API layer it might work with.
We currently support all of these platforms, with more coming soon, and then there's separately the backend target. With Next.js you can deploy to anything right now. This works with anything you can deploy Next.js to, for example, or any of these other frameworks. We have first-party implementations, for example a Postgres one that uses Postgres as the durability layer. As we build this out and the community comes in, we'll have support for essentially any backend, because underneath, the framework connects to any storage or queue layer. Anything that provides a storage database or a queue can be used as the backend.

A quick question on observability: do you also have exporters to Datadog and similar tools?

We have multiple things. We have an API that you can use to access the data directly, and we also have open-source UI components that you can use to display it. And then you can export this to Datadog if you want.

You talked about sleep already. It seems like a cron job. Is there scheduling, or cron-style control, within the workflow?

Because it's just TypeScript, if you're in a workflow, you can do something like this. Let's say we call sleep, for example. This would just resume in one day. This is just a promise, or you can get it as a promise. So you can do a while (true) loop, sleep one day, and then run your code, and it'll run once a day. If you wanted it to run once a day at 2 a.m., you could compute how much time is left until 2 a.m. tomorrow. Thank you, AI. Then you're done. You could also wake up every hour and do some checks on whether you actually want to run the rest of the code. You are not limited to a plain sleep here. If you want concurrency control, or any other kind of deterministic control, you have full control flow in TypeScript. You can check external APIs.
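The "how much time until 2 a.m. tomorrow" computation mentioned above can be sketched as a small pure function. This is an illustration only: the function name `msUntilNextUtcHour` is hypothetical, it works in UTC to sidestep daylight-saving changes, and how the resulting delay is handed to the workflow's durable sleep is left to the library's documented API.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Milliseconds from `now` until the next occurrence of `hour`:00:00 UTC.
// If `now` is exactly on the target time, the next occurrence is a full day away.
function msUntilNextUtcHour(hour: number, now: Date): number {
  const target = new Date(now.getTime());
  target.setUTCHours(hour, 0, 0, 0);
  if (target.getTime() <= now.getTime()) {
    // Today's slot has already passed, so aim for the same hour tomorrow.
    target.setTime(target.getTime() + DAY_MS);
  }
  return target.getTime() - now.getTime();
}

// Example: at 01:30 UTC, the next 02:00 UTC is 30 minutes away.
const delay = msUntilNextUtcHour(2, new Date("2025-01-01T01:30:00Z"));
console.log(delay); // 1800000
```

In a daily loop, the delay would be recomputed on every iteration rather than sleeping a fixed 24 hours, so that the time spent running the job does not make the schedule drift.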
External API checks have to be wrapped in a step, but you can make fetch calls if you want to actually check data and then decide from there. If you wanted an agent that runs not just once but every day, you could have a scheduling wrapper: a scheduling workflow that calls other agent workflows. You can start workflows from workflows, or you could do this, where you sleep a day and then call your agent. As for the stream you want to write to: as written, this all writes to the same stream, and presumably you don't want that. getWritable also allows you to pass a namespace. You can get a new writable here, so every time it runs you have a new stream with a deterministic name, and you can choose which stream to connect to.

Is there cancellation logic? Let's say I have something waiting for a long time and then I decide I don't want it anymore. How can I stop an existing sleep from waking?

You can cancel workflows from the observability UI, from the API, or from the CLI. All of those avenues let you call cancel. Or you can say, well, I don't even know if I want to sleep one day and then resume. Let me move this part here. You can do something like await Promise.race, with the sleep of one day and something else, like the human approval. So it wakes up earlier than the one day if a human clicks a button. There you are.

If you have multiple agents running, what would be your advised way of having them communicate with each other?

That depends on what kind of communication you are looking for. In steps, you have access to all the Node.js APIs, fetch, and so on. You can hit a database. If you want to coordinate over your own data source, you have a database.
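The "race the sleep against a human approval" idea from the cancellation answer can be sketched with plain promises. This is a minimal sketch, not the Workflow DevKit's API: `sleepMs` is an ordinary timer standing in for a durable sleep, and `createApproval` is a hypothetical stand-in for a webhook or approval hook.

```typescript
type WakeReason = "timeout" | "approved";

// Stand-in for a durable sleep; here it is just a timer (assumption).
function sleepMs(ms: number): Promise<WakeReason> {
  return new Promise((resolve) => setTimeout(() => resolve("timeout"), ms));
}

// Stand-in for a human-approval hook: a promise plus a function that resolves it.
function createApproval(): { approved: Promise<WakeReason>; approve: () => void } {
  let approve: () => void = () => {};
  const approved = new Promise<WakeReason>((resolve) => {
    approve = () => resolve("approved");
  });
  return { approved, approve };
}

// Resume on whichever happens first: the sleep elapsing or a human approving.
function waitForWake(ms: number, approved: Promise<WakeReason>): Promise<WakeReason> {
  return Promise.race([sleepMs(ms), approved]);
}

// Usage: the approval arrives before the timer, so the wait ends early.
const approval = createApproval();
const outcome = waitForWake(50, approval.approved);
approval.approve();
outcome.then((reason) => console.log(reason)); // "approved"
```

The code after the race can branch on the returned reason, for example to skip the scheduled work when the human acted first.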
If you want to have multiple agents, you can also use some of the same streams, so they share a stream.

So it's ultimately up to us to make sure that our steps are idempotent, and that if they have side effects and fail halfway, they are well behaved. That's not your orchestration layer; that's up to us.

For the workflow layer, we guarantee there are no side effects, because if you try to import some code that has side effects, the compiler will just tell you not to do it. That holds for workflows, but not for steps. Steps can have side effects. That's sort of the point.

So it's up to us: if a step fails, we need to make sure that it's re-runnable against my database.

There are some controls you can add here. If a step fails, it'll usually fail with an error that tells the workflow it can be retried. You can also catch this error and say, well, if it's this kind of error, don't retry it; instead, signal to the human to do something, or try this other avenue. Anything else?

Do you have one of the branches that has the complete code for what you just did?

Yes. They all build on top of each other. So the conf/5-webhooks branch has the human approval tool call, the sleep tool call, resumable streams, and uses workflow. I will see how I can post it for everyone.

Is workflow in beta?

Yes, I forgot to mention this: importantly, Workflow DevKit is in beta, and it's going GA in January, I believe. We have a GitHub. And finally, I think more than one million workflows have been run. So, as you know, being in beta means the API is still getting stable and there are a bunch of open issues.
If there is any feature that you need or that you really want to see, we have an RFC section on GitHub Discussions for upcoming features, things that we hope to ship by GA or shortly afterwards. There are also open issues, where you can add an issue and we'll presumably get to it soon. These are all of the adapters that help Workflow DevKit run on any kind of cloud backend or your own backend. All of those are also open source, so you can see exactly what's happening, and you can connect it to your own backend using only the source, just by looking at that code. And we'll be happy to help you.

So, for versioning, I talked a little bit about the ability to upgrade runs across versions. Versioning is going to be very simple: we have an interface for all of the versions that you have created, which for most people will be deployments. Whether you deploy your code to a preview deployment or to production, every deployment will be one version. You can list those versions at any point using the workflow API. The run will know which version it's running on, and you can call run.upgrade, targeted at a new version, to see if it's compatible with that version. Any more questions?

Every deployment gets its own URL, not just on Vercel but presumably also on AWS Lambda, for example. Every deployment has its own URLs, so the webhooks apply to their own URLs, which means that you don't need to worry about versioning, except for tagging a version when you first deploy. Whatever you want to be your main version is the one you route to via your API.

A follow-up on that. A lot of runs work in isolation, but sometimes you want to handle a whole group of them at once.
Migrations, like agent migrations to a new version.

Yes, this is the same as upgrading in that sense. But say you have a bunch of runs that are all on a certain version, you have shipped new code, and you want all those runs to be upgraded, or migrated, to the new version. In the UI, you'll be able to select how many runs you want, or through the CLI you'll be able to get a list and then say: for these 20 IDs, I want to upgrade the run to this version. It'll do an internal check: can I resume these workflows from a certain point, can I migrate them in place because the step signatures overlap? If not, it'll offer you the option to cancel all of the existing runs and rerun them on the new version with the same inputs. If you write your code in a compatible way, there are going to be different options for in-place migrations.

How would you detect that? Just by code parts not being changed?

Because we essentially have a compiler plugin, we can get the full AST, and we are saving the input and output signatures from this AST to a manifest that we upload for each version. So for every version we can tell what the signatures are for every step, for the workflow itself, and for everything that happens in between. Another thing here is the workflow function itself. As you saw, it gets replayed a whole bunch of times during execution. So when you want to upgrade, we can take the event log and run the new version of the code against it, checking it against the expected events.

There are a lot of variations there: I add a step and all the previous steps stay the same, or one of them got changed. If everything is done automatically, I worry that I could immediately take down all the agents I have running.
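The manifest-based compatibility check described in the answer above could look roughly like the following. This is a simplified sketch under stated assumptions, not the DevKit's implementation: the `Manifest` shape (step name mapped to a signature string), the `checkUpgrade` function, and the example step names are all hypothetical.

```typescript
// Hypothetical manifest shape: step name -> serialized input/output signature.
type Manifest = Record<string, string>;

interface CompatibilityReport {
  compatible: boolean;
  missing: string[]; // executed steps that no longer exist in the new version
  changed: string[]; // executed steps whose signature differs in the new version
}

function hasStep(manifest: Manifest, step: string): boolean {
  return Object.prototype.hasOwnProperty.call(manifest, step);
}

// A run can be migrated in place only if every step it has already executed
// still exists in the new version with an identical signature.
function checkUpgrade(
  oldManifest: Manifest,
  newManifest: Manifest,
  executedSteps: string[],
): CompatibilityReport {
  const missing: string[] = [];
  const changed: string[] = [];
  for (const step of executedSteps) {
    if (!hasStep(newManifest, step)) {
      missing.push(step);
    } else if (newManifest[step] !== oldManifest[step]) {
      changed.push(step);
    }
  }
  return { compatible: missing.length === 0 && changed.length === 0, missing, changed };
}

// Example manifests for two versions of the same workflow (made-up steps).
const v1: Manifest = {
  fetchRepo: "(url: string) => Repo",
  runTests: "(repo: Repo) => Report",
};
const v2: Manifest = {
  fetchRepo: "(url: string) => Repo",
  runTests: "(repo: Repo, flags: string[]) => Report",
  deploy: "(report: Report) => void",
};
```

With these manifests, a run that has only executed `fetchRepo` passes the check, while a run that has already executed `runTests` is reported under `changed` and would fall back to the cancel-and-rerun path mentioned in the talk.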
There's two ways you can be versioning and you're, let's put this in the right and I think the thing that's interesting is he built for a platform where you assume that the code is all the same, the same place. So what we've seen is you end up with, you start your first version of your own flagship and then start making updates to each of our tiny versions. The code now has all the full stuff in there that has these things. You might as well be having to use a full version of or load on the actual code that you're running. The default assumption is that my code could be running and a pen log and you end up with dark squid and then you have to do all the way from all of that but it was the same as Errol. So I killing and sustaining still system. What's the nice result? The cells already had a ton of components forever. There's a natural whole step to go to say, code works, you can assume that you're pushing your work up of what's within everything. And so you don't have to worry about that momentum model and that's the significant code. But instead then it's time to go up to your part and you push it about, you might have to use it. I can go with the entire event log or you can choose directly choose exactly how much of that needs to be played and what it is a lot of stuff that you can do on top of this and see how does it seem. Well what's nice is that's a hard U.S. and you know for us to go when done well. Hopefully you're very proud. I would love to have a close to that. I think we're close to done. We'll be sticking around for more questions. So I guess okay so I think the other part is observability. I don't like put around in itself. I don't see much of a dash over. I expect that obviously you're going to build one. And then I also want to import it to my computer. Obviously you're not going to do that. How about this observability service? Open telemetry spans which will give us a bit. We'll add some context to the spans by default presumably. 
So if you pipe your spans to Datadog, they'll already carry a lot of information on the steps and the event log. And you can also add your own telemetry, obviously.

So is that the plan, or will you have a first-party one?

The plan is that we'll first support adding all of this step-related and event-log-related context. We'll presumably export a helper to add this information to the spans. Any other annotation you want to tag in there is up to you.

Can I attach secrets to a workflow in a way that, when I need to update them, they all get updated?

For one, right now you can inspect all of the event data, but that's only for you, as someone with access to the API, which someone consuming the workflow or starting the workflow through an API wouldn't usually have. The workflows run in the same deployment as your code usually does and have access to the process environment. So you can interact with environment variables the way you usually would, and as long as you don't log them, which you presumably wouldn't do anyway, it's the same as an API endpoint. If you want your data to be secret: right now we expose it in observability if you have access, but we will also allow end-to-end encryption for any data.

All right, then we'll close the session, but we'll be around a little bit longer for questions if you want to look over code.
