MCP = Mega Context Problem - Matt Carey

Exposing an entire API surface to an AI agent via traditional tool calling methods quickly exhausts the agent's context window due to the sheer volume of tool definitions.
The speaker proposes "Code Mode" where agents generate code against a typed SDK, enabling access to vast APIs without context explosion, but this introduces the challenge of safely running untrusted code.
This security challenge is addressed by using lightweight, isolated runtime environments with programmable guardrails, like Cloudflare Workers, allowing agents to execute generated code securely.

Initially, agents bundled their own tools; later, shared tools through MCP servers standardized access but led to an explosion of context when exposing an entire API surface.
A single API's OpenAPI spec can represent millions of tokens, making it unfeasible to dump all endpoints into an agent's context window for tool calling.
Current workarounds like splitting APIs into product-specific MCP servers reduce context but result in incomplete coverage and require manual selection by the user.
Three primary approaches for "progressive discovery" of tools are CLI interaction (requires shell access), Tool Search (loads relevant tools dynamically but still adds context), and Code Mode.
Code Mode involves the agent generating code against a typed SDK derived from the API spec, offering greater flexibility and leveraging the LLM's coding abilities.
Running untrusted code generated by an LLM poses significant security risks, including file system access, secret exfiltration, infinite loops, and resource consumption.
Isolated runtime environments, often called sandboxes or isolates (e.g., Cloudflare Workers, Deno), are crucial for safely executing agent-generated code with programmable guardrails for network access or environment variables.
The future involves agents widely interacting with platforms by writing and executing code, necessitating robust API rate limiting and widespread infrastructure primitives for secure code execution.
Client-side innovations will include programmatic tool calling, saving agent-generated code as "mini-scripts" for automation, and a proliferation of easier-to-build MCP clients.
The MCP protocol and SDKs are evolving to become lightweight middleware, enabling native integration into frameworks and allowing developers to easily expose their entire API surfaces to agents.
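
The "programmable guardrails" takeaway can be sketched as a fetch wrapper that only lets sandboxed code reach an allowlist of hosts. This is a minimal illustration, not the Cloudflare implementation: `assertAllowed`, `guardedFetch`, `FetchLike`, and the example hostnames are all hypothetical, and a stub stands in for the real network.

```typescript
// Hypothetical guardrail helpers; these names are illustrative, not a real runtime API.
type FetchLike = (url: string) => Promise<unknown>;

/** Throws unless `url` uses https and its hostname is on the allowlist. */
function assertAllowed(url: string, allowedHosts: ReadonlySet<string>): URL {
  const parsed = new URL(url);
  if (parsed.protocol !== "https:") {
    throw new Error(`blocked: ${parsed.protocol} is not permitted`);
  }
  if (!allowedHosts.has(parsed.hostname)) {
    throw new Error(`blocked: ${parsed.hostname} is not on the allowlist`);
  }
  return parsed;
}

/** Wraps an underlying fetch so callers can only reach allowed hosts. */
function guardedFetch(inner: FetchLike, allowedHosts: ReadonlySet<string>): FetchLike {
  return async (url) => {
    assertAllowed(url, allowedHosts); // rejects before any request is made
    return inner(url);
  };
}

// Usage with a stub in place of the real network (assumed hostnames).
const allowedHosts = new Set(["api.example.com"]);
const stubFetch: FetchLike = async (url) => `stub response for ${url}`;
const sandboxFetch = guardedFetch(stubFetch, allowedHosts);

sandboxFetch("https://api.example.com/v1/items").then((body) => console.log(body));
sandboxFetch("https://evil.example.net/steal").catch((e: Error) => console.log(e.message));
```

The design point is that the policy lives outside the generated code: the agent's script only ever sees the wrapped function, so loosening or tightening access is a host-side change.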

MCP: (Model Context Protocol) A protocol for agents to interact with APIs, often used for exposing tools.
CLI: (Command Line Interface) A text-based interface for interacting with computer programs, which agents can read and interpret.
Tool Calling: The process where an LLM identifies the need for an external function (tool) to fulfill a user request and generates parameters to call it.
Function Calling: Synonymous with Tool Calling, where the LLM is prompted to call a specific function or API endpoint.
Context Window: The maximum amount of text (tokens) an LLM can process or "remember" at any given time.
OpenAPI Spec: A standard, language-agnostic interface description for REST APIs, enabling both humans and computers to discover and understand the capabilities of a service.
Progressive Discovery: A method for agents to dynamically find and load only the necessary tools or API capabilities as needed, rather than loading everything at once.
Sandboxes: Isolated environments where untrusted code can be executed without affecting the host system or other processes.
Isolate: A lightweight, independent JavaScript runtime environment, often used within a larger process (e.g., V8 isolates in Cloudflare Workers).
Programmable Guardrails: Configurable security policies or restrictions applied to a runtime environment, allowing fine-grained control over what code can access or do.
Typed SDK: (Software Development Kit) An SDK that uses type definitions (e.g., TypeScript types) to provide structure and introspection for an API, making it easier for agents to generate correct code.
workerd: The open-source runtime that powers Cloudflare Workers, used in the talk to spawn dynamic Workers (transcribed by the speech recognizer as "Workadee").

Hello everyone, welcome. Quiet down, quiet down. Awesome. How is everyone? Yeah, good? Thanks. Who's here for some MCP versus CLI debates? Is that why you all came? Anyway, hello, my name is Matt. I work on MCP and agents at Cloudflare. Welcome to my talk. It's all about how we can make every API a tool for agents. APIs exist in the wild; how can we connect them to agents and make them do things?
So I really love my job, because every day I get to decide: if an agent looked like this, would it do this? Or would it do this? And I think it's kind of fun. And we often fluctuate between the two of them. Someone does something that you think is slightly funny, and then six months later we're all doing it and claiming it was the best thing in the world. So yeah, it's really good craic. But the main part of the role, I guess, and what I end up doing day to day, is: how do we give agents hands? How do we let them interact with the outside world?
And you're probably familiar with something like this. This is tool calling, function calling. It's been around for a while now. The LLM writes a function. You execute the function, bash, bash, bash: the weather in London is 18 degrees. It's not. It's like eight and it's freezing. Sad times.
And then from there, we went from bundled tools to something like shared tools. This is all recent history, so everyone's probably aware of this. But before MCP, we had people who bundled all their tools in their agents, and they would keep them bundled in their agents. If I was trying to interact with Gmail or something, I would make loads of tools for Gmail, bundle them with my agent, and that would be it. And the next person would have to do exactly the same thing. And then we ended up with this big explosion of MCP and remote MCP about April last year. And the service providers were like, we can give everyone MCP tools, and then everyone can use the same standardized tools.
And we just make it once, and we provide it as another surface for people to consume our API. Maybe there's a CLI, there's an API, maybe there's, I don't know, a GraphQL API. And now there's MCP; that's another surface. But it got a little bit fun, because it was OK with eight tools. But then what happens if you add a few more? Or a few more? Or a few more? Or a few more? Or a few more? And now you're like, I want to give an agent access to our whole API surface. And, well, that ain't going to happen. Why is it not going to happen? You've exploded the context window of the agent. You've completely annihilated it. This is one point something million tokens.
And this was the problem that we came across around a year ago now. We were trying to give agents access to the whole of the Cloudflare API. You try and make naive tools for every single API endpoint, and you fully explode the context window. The OpenAPI spec is 2.3 million tokens. As tools, that's something like 1.1 million tokens. And that's never going to fly, even with the biggest foundation models.
And at that time, we were like, we know this is not necessarily an MCP problem, but it's how everyone else is doing it. So we're going to adapt. We're going to improvise. And we're going to split up our API into lots of different product-based MCP servers. So you've probably seen this: a company that publishes 16 MCP servers, potentially. And then users have to interact with the one that they want to use, when they want to use it. There's much less context, but the user has to select. And most of the time, there's incomplete coverage. So for instance, for one of our product suites, we might have six tools in our MCP server, but the total API maybe has 30 endpoints. You've completely missed some coverage there. And this is not fulfilling the goal of making every API a tool for agents. It's actually kind of annoying.
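
To make the arithmetic concrete, here is a rough sketch using the figures quoted in the talk (a 2.3 million token OpenAPI spec, about 1.1 million tokens as tool definitions, and a product server exposing 6 tools for a 30-endpoint API). The 200,000-token context window and the helper names are assumptions for illustration only.

```typescript
// Figures quoted in the talk.
const OPENAPI_SPEC_TOKENS = 2300000;    // raw OpenAPI spec
const TOOL_DEFINITION_TOKENS = 1100000; // the same API as naive per-endpoint tools

// Assumption for illustration: a model with a 200k-token context window.
const ASSUMED_CONTEXT_WINDOW = 200000;

/** How many full context windows a payload of `tokens` would occupy. */
function windowsNeeded(tokens: number, windowSize: number): number {
  return tokens / windowSize;
}

/** Fraction of an API's endpoints covered by a hand-picked tool set. */
function coverage(toolsExposed: number, totalEndpoints: number): number {
  return toolsExposed / totalEndpoints;
}

console.log(windowsNeeded(TOOL_DEFINITION_TOKENS, ASSUMED_CONTEXT_WINDOW)); // 5.5
console.log(windowsNeeded(OPENAPI_SPEC_TOKENS, ASSUMED_CONTEXT_WINDOW));    // 11.5
console.log(coverage(6, 30));                                               // 0.2
```

Under that assumed window, the tool definitions alone overflow the context more than five times over, while the split-server workaround in the example reaches only a fifth of the endpoints.
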
So I think we did it all a little bit wrong. Well, at Cloudflare, we had 16 servers very, very quickly. We were hovering around 2,000 endpoints. I think we're actually at 2,600 API endpoints now. But we basically couldn't split up all of these into our servers. And the users had to pick the ones that they wanted. What we really needed was progressive discovery of tools. Who's heard of progressive discovery? Anyone heard? Yeah, cool.
And that brings us to the crux of the debates that everyone has online. And that is: how do we do progressive discovery? And is MCP dead? Was MCP a really bad idea? And I'm going to say, I don't think it was. MCP's a protocol. All of these can be exposed over MCP. We just shouldn't be dumping loads of tools into context. That's the main thing. We shouldn't be dumping tools into context. And that goes for all capabilities: in the future, we might have prompts and resources more, and skills are basically resources. And we just shouldn't be loading all of those at once.
So there are three ways you can get around that problem. There's a CLI, which people really like. There's tool search. Or there's a third one that we're going to come to a little bit later. But how would a CLI work for agents? So this is a sandbox in the background. And if I use our CLI, and I just call Wrangler, we get a bunch of commands. The agent can read these commands, parse these commands, and be like, oh, I want to interact with the database. So let's do wrangler d1. And maybe we want to list our databases, whatever. And then after some period of time, and some interactive process apparently, we get the databases back out. And an agent can do this, mostly. And it can call --help to get introspection on which parameters it needs. This mostly works. It's been made very popular by things like OpenClaw. And people generally really like CLIs. But you need shell access. This is the main thing. This is the crux of it.
You have to have shell access, and that's kind of annoying. So for things like Claude Code, they wanted a bit more of a structured way of doing things. So they have tool search. They have a search tool which loads the tools that they need, when they need them, into context. So say a user wants to create a worker. What it would do is take the user question, do some sort of keyword matching, and then add K equals, say, eight tools to context. And then at some point, the LLM is going to look and say, oh, actually, workers create, this is the one we need. And so we're going to use that one. But the rest of them stay in context. Maybe it's not eight. Maybe it's six. It changes. But yeah, you end up with like 2,100 tokens, and only 500 of them are being used. But it works. It works quite well. You only load the tools that are relevant.
And then this last thing is from a blog post that a couple of people published last summer. And it's like, instead of doing a static search tool, or instead of forcing an agent to need a CLI, how can we do something where we just let the agent write code, and we let the agent write code against our API? And it turns out that TypeScript, well, types are a very concise way of representing inputs and outputs in a way that an agent can reason about. So say you have all of these endpoints, like a get worker scripts, or a create a worker, or something like that. We generate these types. And then we let the model, given these types, write some code against these types.
So here we're doing Code Mode: list workers. I hope you guys can see that. And we're going to try and list some workers. So this might be a user request to list workers. The model generates this code against a typed SDK that we generate from our API. You can generate them from OpenAPI specs. And then we can run that, and we can list the workers that we have on our account.
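
The Code Mode idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Cloudflare's SDK: the `WorkerScript` and `WorkersApi` types, the method names, and the account ID are hypothetical, and an in-memory stub replaces the real HTTP API (which would return promises) so the example runs on its own.

```typescript
// Hypothetical types standing in for what a generator might emit from an
// OpenAPI spec. None of these names come from a real SDK.
interface WorkerScript {
  id: string;
  createdOn: string;
}

interface WorkersApi {
  listWorkers(accountId: string): WorkerScript[];
  createWorker(accountId: string, id: string, source: string): WorkerScript;
}

// In-memory stub so the sketch runs without network access.
function createStubApi(): WorkersApi {
  const scripts = new Map<string, WorkerScript[]>();
  return {
    listWorkers: (accountId) => scripts.get(accountId) ?? [],
    createWorker: (accountId, id, _source) => {
      const script: WorkerScript = { id, createdOn: new Date().toISOString() };
      scripts.set(accountId, [...(scripts.get(accountId) ?? []), script]);
      return script;
    },
  };
}

// The kind of code a model might write given only the types above:
// one script that chains several calls instead of one tool call per step.
function generatedByModel(api: WorkersApi, accountId: string): string[] {
  if (api.listWorkers(accountId).length === 0) {
    api.createWorker(accountId, "hello-world", "export default {}");
  }
  return api.listWorkers(accountId).map((w) => w.id);
}

const stubApi = createStubApi();
console.log(generatedByModel(stubApi, "example-account")); // [ 'hello-world' ]
```

Only the type declarations need to reach the model's context; the branching and chaining happen in the generated code rather than across multiple tool-call round trips.
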
We could deploy a worker. That would be fun. Hello world. And we could put it behind one of the hardest things to do at Cloudflare, which is so weird, because it's such a powerful product. But we can add Access, which is like our managed IdP. And now this worker's secure behind Access. Kind of cool. With, like, an Access policy to only allow me into it, and all of this stuff. Super, super easy. And an agent can generate all of this code given our types.
So this feels like a step in the right direction. Just let the model write code; we benefit from the model getting better. We benefit from, I don't know, us improving our OpenAPI spec. That should be the source of truth. But we had this kind of weird thing where we thought this was awesome, and we were pretty stoked about it, but the clients didn't implement it. And when I say clients, I've gone into MCP terms now. The client is the agent. So we'll be referring to the agent as a client from now on. But the clients didn't really implement it, and we were a little bit confused about why this was the case. This was sort of eight, nine months ago now. And it's a better way of interacting with APIs: just let the model write code against the API. But they didn't implement it. And why not?
And that's because running untrusted code is mega, mega scary. If I had said to you a few years ago, oh, we're just going to let a language model write some code that we're going to execute for our users, without looking at it, without reading it, without seeing what it does, and that it might potentially have secrets access (ideally, it has some secrets access), you'd be like, that's crazy. That's a CVE, right? It's a CVE. It's a vulnerability. That's a problem. And now we're proposing that you do this. So it is quite scary. Loads of things can go wrong. It could read the file system, read some secrets that you don't want it to read.
It could exfiltrate those secrets in a network request, run infinite loops, consume all your resources, do really scary stuff, run a crypto miner. That would be bad.
And in the past, people have tried loads of things to let people run code-like solutions. So if anyone's ever written a DSL, some sort of JSON spec about how to interpret that as code, that is basically this. If you've ever used one of those integration software products where you have to do that, that is this. They just don't trust you to write code on their servers. VMs also: people are spinning up sandboxes to run code. Big sandboxes, big VMs, that is this. And also code review.
But it's kind of lucky, because we have a pretty cool primitive that solves this. And there will be other primitives that solve this. I just think this is the first, and so it's worth shouting about, really. And this is: how do you run untrusted code in a way that's super safe for you and your infrastructure? And it's kind of like this. We can execute a worker from a string. And a worker is just like an isolate in V8. There are many blogs about how this works. I'm not going to go into it super deeply. I'm just going to show you what it can do.
So for instance, we have this piece of code that was generated, and we're going to run it. And this ran on the back end. It didn't run in my browser. It ran in a dynamic worker that's fully isolated. And I guess, how can I prove that to you? If we do this one, we're trying to get some secrets here, process.env. And if we print them, there are no secrets. And we also have this weird Cloudflare global. Ooh, interesting. That was with Node compat on. If we turn Node compatibility off, we don't even have a process.env there, and it all errors out. So we have this programmable sandbox. It's not quite a sandbox. It's like a very lightweight thing.
You can load code into it and then run it. And I'll show some other options later. It's not just us that has this, but we have one that we host for you and that goes to Cloudflare-level scale. If you want to do billions of requests, knock yourself out.
And now here's one where the agent has written some code that accesses an external API. And if we run this one: this worker is not permitted to access the internet via global functions. Well, maybe we wanted it to access the internet. And now we can give it access. So it's a programmable sandbox with programmable guardrails. And all we're doing here is flicking a Boolean in the server. That's all that's happening here. But you can provide a more in-depth function to say, only allow access to these domains. And that's what we do on the Cloudflare MCP. If we go next.
Oh, speaking of the Cloudflare MCP, this is where I really hope the demo works. So this is an MCP client in the slide. And if we ask it a question, we're going to get an OAuth screen pop-up. And then, hopefully, all this works. So now we have read-only access to the whole of the Cloudflare API. All of my Cloudflare infrastructure, I have read-only access to. Which is pretty cool. These account IDs, don't worry about them. They're not secrets in the Cloudflare world. Cool. So we just listed a worker. But you could do many more things here. You can deploy workers from your command line. You can do what we did earlier and add Access to something. You could introspect your DNS. You could send emails, soon. You can do loads and loads of other stuff. It's very, very cool what you can do here, because you have access to the whole of the Cloudflare API, all 2,000-and-something endpoints.
And I guess this kind of brings up the question: where are we going with letting agents access external tools? What does this look like? You have people installing CLIs for everything and running them on their own machine. Maybe running them on a VM.
That's kind of cool. You have us being like, oh, you could just run untrusted code in this other place that's really isolated. You have people doing tool search. You have people rendering UIs as JSON. I don't know. And I guess my main thought is that we're going to have so many isolated environments on the web. And there are going to be loads of infrastructure primitives that allow you to run this type of untrusted code on the web. Because code is actually a very compact plan. Instead of doing tool calls, you can have one tool called code, where the model generates the code of your choice, and then you run it. And that code has so many more degrees of freedom than an individual tool call. So it makes sense to me that as the models get smarter, this is what we will do. And people will adapt their infrastructure primitives to do this. So there'll be so many more of these.
And you see this starting with, like, Pydantic Monty, Deno also, and we also have it with workerd, the dynamic workers I showed earlier. More people are going to build these primitives, because they're going to become more and more useful. So just a little explanation. This is workerd spawning a dynamic worker in this sandbox and running some code to get a Fibonacci sequence. You can do the same thing with Deno, with deno run, with some questionable checking. I've no idea what that does. And then you can also kind of do the same thing with Pydantic Monty, the new code interpreter for running untrusted Python. Because it's Python, we have to download Python. Sucks. This might never work. I have no idea. Oh, there we go. Great.
So maybe you can see where we're trying to go with this. There was a previous time where no one would ever run untrusted code. That was a CVE. You were just immediately like, you have to stop allowing that. And then it seems like, with LLMs, it's actually really good for them to write code that you can run.
And so now we're building the primitives to actually enable us to do that. And it feels like we missed out on this whole part of the tech scene that we've never tried before. In the 1950s, when you wanted to run something on a computer in your local town, you printed out some punch cards, and you stamped them, and you gave them to the guy. And that was kind of like running untrusted code, right? That was kind of it. And then when we went to the cloud, we got away from that. And now I think we're going to go much more back to that, where your users can write code, because your users are AI, and AI is very good at writing code. And that is how they're going to interact with your platform, whether through MCP, whether even through bash and a CLI; I don't mind. I think they're just going to write code against your services.
And your services have to be ready for this. Your APIs have to be ready to take a beating; they have to have good rate limiting. Because I can run this in a for loop on multiple sandboxes at once and just hammer your API. You have to have some way of protecting against that. This is the new world that we're now going to be living in.
And that's on the server side, on the services side. What's going to happen on the client side? I think that's almost even more interesting, because that's the user-facing side of things. The user's not going to see the server. The user doesn't care. The user just asks, why is my agent not getting my Gmail emails? Or why has it deleted my whole inbox? They're not going to see that. But on the client side, there's a lot of innovation that's going to happen. And I think we've stalled a little bit recently, because building an MCP client in particular got really, really hard: actually building a client that was performant, that worked. You needed to manage stateful connections.
You needed to manage resumability between those connections. There are plenty of other reasons why building an MCP client was hard, but it was a pain, an absolute pain. And so people had the most stripped-down clients they possibly could. They mostly offloaded to the MCP SDKs, which are quite bare-bones. And no one was building these more unique UI experiences on top of that. And I think that is going to come very, very soon.
So the most obvious thing is we're going to have programmatic tool calling in the clients. The previous slide, showing those sandboxes with workerd, Deno, and Pydantic: that is just running untrusted code in a client. People are going to do that. If your client is remote, you're going to do it like that. If your client is local, well, just YOLO it, whatever. Just eval it. It's going to be fine. But more people are going to do this programmatic tool calling. It's going to happen.
And because you're generating code, people are going to save this code. And they're going to save it in these mini scripts. And users might be able to decide, this action that I just did, that the LLM generated for me, I want to keep that for later. And then it will be much faster. So you can see this for things like cron jobs: a user might set up some web scraping job without any knowledge of how web scraping works. And then it generates a script. And that script is run every day, every two days. And whenever it breaks, because web scraping is pretty brittle, the agent will fix it and resave the script. This stuff is going to happen. And these saved mini scripts only work when you embrace programmatic tool calling, but they really do work.
And then the last thing is we're probably going to have many, many more clients, because they've been so hard to make up until now, and it is going to get easier. There's not a huge number of really well-used MCP clients. That's going to change.
And with that change, more people are going to deploy agents to the cloud that end up being an MCP client. And I think more people are going to try and do this stateless agent loop thing. It was fine to have sandboxes for every agent running code locally if there were a million agents in this world. I think when there are 100 agents for each person... oh, hello. Let's not do that. That's going to start getting really tough. And you're going to have to embrace a cloud-native way of doing things, which means that state has to be something you can turn on or off.
And I think we're nearing the end, but this is my last thing. I work a lot on MCP servers and on the SDK, and this is where I think that bit's going. I think we're going to see MCP as a middleware in an MCP server when you build an API service. It will be a flag that you can flip on in your favorite framework. The SDK itself is getting super, super lightweight. And I think by the end of this year, we'll be natively in every single big full-stack framework, at least the TypeScript ones. It will just be there natively. Because it will be so small, it will literally just express the protocol in itself. And it will be silly for them not to have it. They'll just have a native integration. And you'll be able to do "MCP is true" on all of your APIs. And because all of the clients will be doing programmatic tool calling, you can expose your 1,000 APIs from one Next.js app and just do MCP equals true and expose them as tools over MCP as well. And I think that will happen. I mean, I've thought that's going to happen for a while, but I think we're really close there. And the last blocker is fixing the SDK, really, so that it's capable of doing that, so that it's capable of fitting in every single bundle. And that's the plan.
You can find out more. We have a Code Mode blog post that came out recently.
It's how we gave agents an entire API in 1,000 tokens. If you have a big API, you should probably do this. And API providers, please just do this, because it's really, really good for people to access your data. And thank you. Try it out: npm i agents. Thank you very much.
