Code Mode: Let the Code do the Talking - Sunil Pai, Cloudflare

Traditional tool calling for AI applications becomes inefficient and token-heavy at scale, especially with hundreds of APIs.
A new approach, "code mode," enables LLMs to generate and execute code (e.g., JavaScript) directly within a safe environment, drastically reducing token usage and allowing for complex, programmatic logic.
This shift empowers LLMs to act as general-purpose computing agents that can custom-generate and interact with systems, bridging the gap between technical and non-technical users.

Traditional tool calling limitations: With many tools (e.g., hundreds of API endpoints), JSON-based tool calling leads to high token consumption, slow back-and-forth interactions, and difficult composition.
Code mode solution: Instead of JSON, ask the LLM to generate executable code (e.g., JavaScript) that runs against a secure environment.
Benefits of code generation: Code provides type safety, syntax error detection (LLMs are pre-trained on code), and allows for complex logic like looping, state management, sequencing, and parallelization in a single execution run.
Token efficiency: Cloudflare cut the tool-definition cost of its API from about 1.2 million tokens to about 1,000 tokens (roughly a 99.9% reduction) by exposing just two code-accepting tools (search and execute) instead of individual tools for about 2,600 endpoints.
Inhabiting the state machine: LLMs can directly interact with and interpret the state of a system (e.g., an array of strokes for a drawing app) rather than needing to generate entirely new applications.
Harness/Sandbox architecture: A critical component is a secure "harness" or "sandbox" where generated code can execute. This environment starts with no capabilities and explicitly grants them as needed.
Sandbox characteristics: Needs to be fast-starting (e.g., V8 isolates), provide full observability, control all outgoing network connections, and default to no external fetches, only explicit API access.
Future applications: This approach enables long-running, stateful workflows; perfectly custom, generative UIs for individual users; and the ability to stitch together different services closer to the user.
Developer experience for agents: Treat LLM agents as a new kind of "user" that requires good documentation (Markdown), clear error messages, and discoverability to interact effectively with systems.
Capability-based security: A fundamental security model for these environments, where access is granted explicitly based on specific capabilities rather than broad permissions.
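
The "inhabiting the state machine" takeaway above can be pictured with a small sketch. It assumes the drawing app's state is an array of strokes, each a list of points on a 300x300 canvas. The `Point` and `Stroke` types, the coordinates, and the `circle` helper are all hypothetical; the talk only says the state was "an array of strokes". The point is that the model's move is one more stroke appended to existing state, with no tic-tac-toe program anywhere.

```typescript
// Hypothetical state shape: each stroke is a polyline of [x, y] points.
type Point = [number, number];
type Stroke = Point[];

// App state as the user left it: four grid lines and an X in the top-left cell.
const strokes: Stroke[] = [
  [[100, 0], [100, 300]],
  [[200, 0], [200, 300]],
  [[0, 100], [300, 100]],
  [[0, 200], [300, 200]],
  [[20, 20], [80, 80]],
  [[80, 20], [20, 80]],
];

// Code a model might emit after reading the state: approximate a circle
// with a closed polyline so it can be stored like any other stroke.
function circle(cx: number, cy: number, r: number, segments = 16): Stroke {
  const points: Stroke = [];
  for (let i = 0; i <= segments; i++) {
    const t = (2 * Math.PI * i) / segments;
    points.push([cx + r * Math.cos(t), cy + r * Math.sin(t)]);
  }
  return points;
}

// The "move": draw an O in the center cell by appending to the state.
const move = circle(150, 150, 35);
strokes.push(move);
```

Deciding that the strokes look like a board, and which cell to play, is the model's interpretation of the state; the code only carries out the result.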

Tool calling: A method for large language models to interact with external tools or APIs by generating structured requests, typically in JSON format.
Context window: The maximum amount of input text (measured in tokens) that an LLM can process or "remember" at one time.
Token: A basic unit of text, often a word, part of a word, or punctuation mark, used by LLMs for processing and generating language.
API endpoint: A specific URL or entry point within an Application Programming Interface that allows software to interact with a service or retrieve data.
OpenAPI JSON spec: A standardized, machine-readable format (JSON or YAML) for describing RESTful APIs, detailing their operations, parameters, and responses.
Harness: An execution environment or framework used to run and test code, often providing controlled access to system resources.
Sandbox: An isolated, secure computing environment where untrusted programs or code can be executed without risk to the host system.
V8 isolates: Lightweight, secure execution contexts within Google's V8 JavaScript engine, used to run separate pieces of JavaScript code with strong isolation.
Capability-based security: A security model where access rights are granted by possessing a "capability" object, which is an unforgeable reference to a resource combined with rights to perform operations on it.
Generative UI: User interfaces that are dynamically created or customized by an AI model in real time, often tailored to specific user contexts, preferences, or tasks.

Our next presenter created PartyKit, the open source tool for real-time multiplayer apps. For his day job, he builds AI agents at Cloudflare. Please join me in welcoming to the stage, Sunil Pai.

20 minutes to the pub. Hi, my name is Sunil Pai. I work at Cloudflare. I build agents over there for the Agents SDK. I'm trying very hard for this not to be a Cloudflare talk, but I think we are on the sponsor board, so that's nice. This is a talk about something we call code mode. I've been wearing the hat, and there's some prior art to it. We don't claim to have invented it, but this is a talk about the implications of something new that we're discovering.

So you guys have built AI applications, and tool calling gets weird at scale. If it's just a couple of tools and very short runs, it's fine, but the moment you start stuffing in your Google services, your Jira, your wiki, et cetera, and you have, like, hundreds of tools filling up the context, it starts breaking. And the composition is weird, and there's this back and forth that you have to do with the model that's really slow.

We decided to take a different tack. Instead of doing this JSON back and forth thing, we asked the model to generate code, usually JavaScript, that we could run against an environment. And some of the benefits seem a little obvious to us. With code, you get a typed API, you can do type checking. There are syntax errors. These models are trained on gigabytes, if not terabytes, of data already in the training set. And instead of doing this back and forth, you could write code that executes it all in one run, just one execution.

So this is what I mean. There are fundamental capabilities of code. You're able to do looping. You're able to hold state. You're doing sequencing, parallelization. Things that you would normally do with code anyway as an engineer.

So the first place we applied this was with my colleague Matt Carey, who's actually going to be speaking about this a little more tomorrow. You should watch his talk.
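
To make "one execution" concrete, here is a minimal sketch of the kind of program a model might generate in code mode. The `Api` interface, its two methods, and the in-memory `fakeApi` are hypothetical stand-ins, not anything shown in the talk; they exist only so the sequencing, parallelization, state, and looping mentioned above have something to run against.

```typescript
// Hypothetical typed API that a harness might expose to generated code.
interface Api {
  listZones(): Promise<string[]>;
  getErrorCount(zone: string): Promise<number>;
}

// One program, one execution. With JSON tool calling, each API call here
// would instead be a separate round trip through the model.
async function findNoisyZones(api: Api, threshold: number): Promise<string[]> {
  // Sequencing: the zone list is needed before anything else can happen.
  const zones = await api.listZones();

  // Parallelization: fetch every zone's error count at the same time.
  const results = await Promise.all(
    zones.map(async (zone) => ({ zone, count: await api.getErrorCount(zone) })),
  );

  // State and looping: accumulate matches across iterations.
  const noisy: string[] = [];
  for (const { zone, count } of results) {
    if (count > threshold) noisy.push(zone);
  }
  return noisy;
}

// In-memory stand-in so the sketch runs without any network access.
const errorCounts = new Map<string, number>([
  ["a.example", 3],
  ["b.example", 120],
  ["c.example", 57],
]);

const fakeApi: Api = {
  listZones: async () => Array.from(errorCounts.keys()),
  getErrorCount: async (zone) => errorCounts.get(zone) ?? 0,
};
```

Calling `findNoisyZones(fakeApi, 50)` resolves to the zones whose count exceeds the threshold, and the model only sees the final array rather than every intermediate response.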
The Cloudflare API surface is about 2,600 API endpoints. If we exposed a tool for every single one of them, it's about 1.2 million tokens in your first call. It just blows. There's no way to create an MCP server for the entire Cloudflare API surface. And he had a very clever idea where he exposes just two tools: search and execute. Both of these endpoints accept code as an input, literally a string of code. For search, the input to the function that you pass to it is the entire OpenAPI JSON spec. And once it does that, execute gives you a whole bunch of functions that you can call against the things that you call. And it reduced that 1.2 million token thing down to a thousand tokens, kind of unheard of. I think it's like a 99.9% reduction.

This is going to be scary. I actually have a live demo of this. And demos don't usually do me well on stage. But the point being that we were able to take a wide, super wide API surface and make it incredibly fast. The prompt itself can be fairly generic. So I should have kicked up the font size on this one. The prompt here is, as a customer you come in and say, we are getting DDoSed. I want you to find every offending IP that's, like, attacking us and block them. In a moment of panic when your website is going down, you don't have the time to do menu diving. The Cloudflare dashboard is famously a little cumbersome to handle. And you just want the thing done. And you can't even get an A. It's like three in the morning.

With a regular MCP thing, and this isn't even talking about stuffing 1.2 million tokens, it would be about eight round trips to do each of those API calls. Instead, the model can generate this string of code, run it immediately right next to the API surface and do it in one shot. And it's just running JavaScript. Just functions and just things that you're exposing on the API surface.

Live demo. This is a demo of our MCP server.
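
The search and execute pattern described before the demo could be sketched roughly as below. This is an illustration under stated assumptions, not Cloudflare's implementation: the three-path `spec` is a made-up stand-in for the real OpenAPI document, `listWorkers` is a hypothetical API function, and `new Function` stands in for the isolate purely so the example runs. `new Function` is not a sandbox and should not be used on untrusted code.

```typescript
// Made-up miniature of an OpenAPI document; the real spec is far larger.
interface Operation { summary: string }
interface Spec { paths: Record<string, Record<string, Operation>> }

const spec: Spec = {
  paths: {
    "/workers": { get: { summary: "List workers" } },
    "/zones": { get: { summary: "List zones" } },
    "/rules": { post: { summary: "Create a blocking rule" } },
  },
};

// Tool 1: `search` receives source code for a function and calls it with the
// whole spec, so the spec itself never has to enter the model's context.
// NOTE: `new Function` is a stand-in for an isolate, not a security boundary.
function search(code: string): unknown {
  const fn = new Function("spec", `return (${code})(spec);`);
  return fn(spec);
}

// Tool 2: `execute` receives source code and calls it with the API functions.
type ApiFns = Record<string, () => string[]>;

function execute(code: string, api: ApiFns): unknown {
  const fn = new Function("api", `return (${code})(api);`);
  return fn(api);
}

// Strings like these are what the model would generate.
const searchCode = `(spec) => Object.keys(spec.paths).filter((path) =>
  Object.values(spec.paths[path]).some((op) => /worker/i.test(op.summary)))`;
const executeCode = `(api) => api.listWorkers().length`;

// Hypothetical API surface handed to `execute`.
const api: ApiFns = { listWorkers: () => ["blog", "shop"] };

const found = search(searchCode);      // paths whose summary mentions workers
const ran = execute(executeCode, api); // number of workers returned
```

Only the two tool descriptions and the small results cross the model boundary, which is where the token saving comes from.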
I hope I'm logged in because if I'm not, I'll need all of you to close your eyes while I enter a password. Let's say I just want to, like, list my workers. Oh, there it is. List my workers. And there's no password required. Okay, fine, that's fine. Okay, I give it only read-only access for this demo. Do the thing. Yes, allow, sure, whatever. Nice. Okay, it comes back. And you'll see it'll start executing tool calls. I should be able to open this up. It has sent a search saying, hey, find me all API endpoints that just say the words list workers or something like that. It then runs code which, hey, it's like one single request for the API endpoint to get all the workers. It must have received a whole bunch of these. It's actually going through JavaScript errors now. This is going to be fun to see if it actually succeeds. Eek. Oh, is it trying to do it like, it's trying to paginate through the thing. Assume that this worked anyway. And I'll keep talking while it does this. Love that this is happening to me on stage because I did test it 10 times before coming on. I need to pay for the Mythos model to make this work accurately. By the way, you can actually see it is actually, like, listing workers over here. It might just be having trouble rendering it over here.

The point being, we are able to shrink that down. Now if this was a talk about optimizing MCP servers, I would be done and dusted. I'd be like, hey, you should try this. And trust me, it works when you're not staring at it and have 800 people looking at you on the stage. But it did give us an idea that there's something deeper going on here. The ability to run this code feels like there's a new way of interacting with systems, with LLMs.

Here's what I think. Everyone here is a programmer. When I give you a problem statement like, you have 200 photos on your desktop, I need you to categorize and rename them, the first thing you do is you open up an IDE. You're going to write a little script.
Maybe you're going to pass every image to a vision model now because you get a nice caption for it. Rename it and you're done and dusted. That is how you interact with systems. My mother is not going to do this. Her options are to call me up or usually buy an app, either on desktop or phone. And no one's made an app that does exactly just that. It's going to be, like, lowest common denominator apps for photo management. And it's $7 a month. And for some reason you have to install a daemon, which is stealing your crypto or some such stuff. And there's been this dichotomy. And it's fine. Until now this has been an acceptable trade-off that non-technical people will have custom-made interfaces built for their needs and desires.

LLMs are breaking this boundary. Every human being on the planet now has access to a buddy that can spit out code that can interact with systems. It takes a line like "rename these files by date and location" and generates code and can run it on whatever system you've exposed to it. I say executed safely here. And that's the bit that I do want to talk about in a minute.

The other example I have, so this is Kenton. Kenton is the creator of Cloudflare Workers. Famously, he does the work and I like taking credit for his work. This is our relationship in the company. So he had a thread a little while ago where he built a little vibe-coding environment for himself. Because no one else does that in the world right now. So unique. Build your own little vibe-coding thing. The thing he asked it to generate was a canvas, one of these tldraw, Excalidraw-style canvases. And it did it, it did a little canvas with little brushes and colors. And the first thing Kenton did was draw a tic-tac-toe board on it with a little X in the corner. This is the finished state and I'll get to that in a second. He did that. And what he told the model then is, I want you to play tic-tac-toe with me.
The model, as you can guess, started generating a tic-tac-toe app. Kenton stopped it immediately. He's like, no, you have access to the entire state of the system. And the state of the system here is an array of strokes. Like just a whole bunch of points, grid line, grid line, X stroke, etc. He said, inspect that and play it with me. Immediately the model output the state into its own context. And it's like, I recognize what this looks like. It looks like a tic-tac-toe board. And I can see that you put an X in the top left. Let me draw a perfect circle in the middle of the app. To be clear, there is no tic-tac-toe code anywhere in the system. The emergent behavior is that the model is, like, sure, I now know how to interact with the system with a set of strokes. Also, it lost. By the way, it lost the game. And then when we saw the reasoning traces, we noticed that Opus let Kenton win. Which is a whole other weird area of alignment we're not talking about.

Anyway, so this actually generated a lot of conversation internally. And that's why this talk is a little weird. It's a little woo-woo. I'm not even sure where we are going. And I want to spread the idea to you and have you folks, like, integrate it. So the phrase we have started using is, it stopped generating a program, and it instead started inhabiting the state machine. There's the Ghost in the Shell reference here for anyone who's over the age of 40. You need ibuprofen. You should go back home. But no, it was a very strange thing for us not to have a separate app generation stage that you then interact with. That is entirely the part of the thing.

So what does this new software architecture look like? Everyone's building what they call a harness. It's because over the last three to six months, everyone has realized that these coding agents are great general-purpose computing machines. It's why they're running Claude Code.
No, they're running Pi on a Mac Mini, which is the wrong machine for this, by the way. You don't have to spend $400 for a thing that makes API calls. It's been driving me mad. If you check, all the second-hand prices of Mac Minis have, like, shot up. I got one before it, but I bought it because I'm special that way.

Everyone's building this harness, and the architecture of the harness is not just that it can generate code, but that it has a safe space to execute this code, into which capabilities are exposed. And there are some attributes to this sandbox. We're calling it a sandbox, which is again another completely overloaded term. And I have friends in the industry, everyone's building a different kind of sandbox. We have a Sandbox SDK which uses containers and VMs, but that's not even what I'm talking about right now.

There are some capabilities to it. Unlike a container, which comes with all sorts of features that you surround with security, you know, you do a bunch of things from the outside, you start with something that has no capabilities. The only thing it can do is execute code. It can't do fetches. There's no exposed APIs, no nothing. And then you grant capabilities to it explicitly. We have something called Dynamic Workers. I told you, it's not really a Cloudflare talk. Someone else built something better if you think it's better. It's fine. But this is what we use. We use V8 isolates because they start up really, really quickly and it's about 10 years of security hardening. It's in our DNA. We care a lot about that.

Anyway, so you start exposing capabilities as APIs. And we also can control all outgoing fetches and any network connections. In fact, the default way we recommend you use this is no outgoing fetches, only APIs. It has to be fast and you need absolute full observability into it. You need to know why last Tuesday it made a trade for $2.3 million for, I don't know, man, like llama poo or something, right?
You need to go back to that code. You need absolute observability on these systems. It can be V8 isolates like we use. You could use, I don't know, WebAssembly, a custom JavaScript interpreter. That's not the main story here. You just want something that's able to execute, that you're able to expose capabilities to and run really quickly.

From here, you can start getting really ambitious. The example that I showed you was a one-off, take some code, run it on an API, expand. Now what if you could generate long-running workflows that run for days, months, years? What if each of those instances has some state that it can carry through its lifetime? What if, in this world of generative UI, you can start generating perfectly custom UIs for every single user that you have? Everyone who does e-commerce knows this problem. The more popular you get, the more UI becomes this bland thing that has to work for every single user. Then you bring in the ML people and they're like, what if we change the color of the button this way if it's somebody else? No. You can go absolutely custom.

So I like the fact that I got Opus to generate generative UI for a slide where I'm making a point about generative UI, and it still looks a little bit like shit. But the idea is, let me talk about that e-commerce example. You have context about everything about the user, the things they like, the orders they have in their cart, the things that might be making them mad. You can surface these things as actions. The UI doesn't have to be a blank chat box, though honestly, blank chat box e-commerce might be a lot of fun. Here I have two different use cases. In the first one, I need to return these shoes and find something similar under $100. If the product engineers have not implemented this, it's going to suck, but you can generate something on the fly, versus "what is happening with my delayed order?"
Point being, we are now in the world where we can generate completely different programs, backed by a system that you built on your backend, for every single user. It's a new kind of software we're building. And this harness idea isn't just built into the product. A lot of people are finding power by running the harness closer to the user, simply because then they get to start mashing up all their different services. This is an anti-Cloudflare talk at this point. I'm like, you should be running the software on your iPhone, like, not so much on our servers. Please run it on our servers. But there you start getting to stitch together different systems in this safe environment. You get to do it on a task-by-task basis.

I put this in here because I'm a React programmer and I don't want to freak out the React people by saying no one really wants to build UI anymore. But really, it's a harkening back to rethinking everything that we have thought about UI for this new age. I keep thinking about it as part of the tech tree we have not really explored for 30 years because eval wasn't around. But now we have a safe eval and we have these things that generate code for you.

But you do need to be in a place where you understand that your next billion users are these little robots that are generating code for you. To be clear, your customers are still humans. Things interacting with your systems. If you really love your users, you need to find out where they hang out, and they don't hang out in the pub. They hang out in registries. They dream in types and syntax errors. You need to be thinking about what is the developer experience for these agents. This is something a bunch of companies are already doing really well, by the way. Docs which are Markdown. Errors that let the agent know what to do next. Discoverability via search. The big one that I do want to talk about, that I want you to embed in your head, I guess, is this idea of capability-based security.
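
As a rough sketch of that idea, combined with the harness attributes from earlier in the talk (start with nothing, grant explicitly, observe everything, return errors an agent can act on), consider the following. All names here (`Capability`, `makeInvoke`, `run`, `listIps`, `blockIp`) are hypothetical, and generated code is modeled as a plain function for brevity. A plain function call isolates nothing by itself; in a real harness the code would arrive as a string and run inside an isolate.

```typescript
// A capability is a function the harness chooses to hand over.
type Capability = (...args: string[]) => string;
// Generated code reaches the outside world only through `invoke`.
type Invoke = (name: string, ...args: string[]) => string;

interface AuditEntry { capability: string; args: string[]; allowed: boolean }

function makeInvoke(granted: Map<string, Capability>, log: AuditEntry[]): Invoke {
  return (name, ...args) => {
    const fn = granted.get(name);
    // Observability: record every attempt, allowed or not.
    log.push({ capability: name, args, allowed: fn !== undefined });
    if (fn === undefined) {
      // An error that tells the agent what it can do next.
      const available = Array.from(granted.keys()).join(", ");
      throw new Error(`capability "${name}" is not granted; available: ${available}`);
    }
    return fn(...args);
  };
}

type GeneratedCode = (invoke: Invoke) => string;

function run(code: GeneratedCode, granted: Map<string, Capability>) {
  const log: AuditEntry[] = [];
  try {
    return { ok: true, output: code(makeInvoke(granted, log)), log };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, output: message, log };
  }
}

// Read-only grant: listing exists, blocking was never handed over.
const readOnly = new Map<string, Capability>([
  ["listIps", () => "203.0.113.7,198.51.100.9"],
]);

const listed = run((invoke) => invoke("listIps"), readOnly);
const blocked = run((invoke) => invoke("blockIp", "203.0.113.7"), readOnly);
```

Here `listed.ok` is true, while `blocked.ok` is false and its audit log still records the attempted `blockIp` call. The design choice is that nothing is denied by a rule; the capability is simply absent unless it was granted.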
This isn't even a JavaScript talk. It can be in Python. It can be in Wasm. I hope it brings a resurgence of Lisp. It's how I learned how, like, ASTs work. It kind of breaks your brain. But the attributes are still very much the same. Events, sandboxing, capability-based security. Embeddable, so that it's really fast to start up and run ephemerally. React programmers, well, UI programmers, simply because they've been so close to users, I suspect that they'll do particularly well here. That feels really good to me, by the way. I feel happy about it.

So to end, for the longest time programmers like us, we got code. We had infinite power to interact with any system that we could, and complain about it on Twitter because the documentation doesn't have the right CSS or something. JavaScript programmers, super entitled, by the way. Everyone else got buttons and forms. That distinction is breaking. In a world like this, you need to let the code do the talking. The code is the thing that interacts with all your systems. Come talk to me about it at the pub. This is like, it feels like it's opening up a whole new area of research for us. And we have a lot of ideas. And I get to finish my talk and the day with six seconds left. How good is that? Thank you very much. Let's be serious.
