Your MCP Server is Bad (and you should feel bad) - Jeremiah Lowin, Prefect

TL;DR

  • Effective MCP servers require "agentic product design," which means creating interfaces optimized for AI agents' unique capabilities and limitations, rather than treating them like human-facing APIs.
  • AI agents differ from humans in their expensive discovery and iteration processes, and their limited context windows, necessitating a curated and concise tool interface.
  • To build robust MCP servers, prioritize outcomes over atomic operations, flatten arguments, provide clear instructions and examples, and use informative error messages.

Takeaways

  • Design for Agents, Not Humans: Treat MCP servers as interfaces for AI agents, not human developers. AI agents have distinct interaction patterns and limitations compared to humans using traditional APIs.
  • Acknowledge AI Limitations: Understand that AI models are not infallible oracles. They struggle with expensive discovery, slow iteration, and limited context windows, making traditional API design inefficient.
  • "Curate" the Agent Interface: The most crucial principle is to carefully select and present information to the agent, providing only what is necessary and actionable, rather than overwhelming it.
  • Focus on Outcomes, Not Operations: Design tools around achieving specific end-user outcomes (e.g., track_latest_order_by_email) instead of exposing numerous atomic operations that agents would need to orchestrate. Avoid using the agent as "glue" or an orchestrator.
  • Flatten Tool Arguments: Avoid complex, nested configuration dictionaries for tool inputs. Instead, use simple, clearly defined primitive arguments (e.g., email: str, include_cancelled: bool).
  • Utilize Literals and Enums: For arguments with constrained choices, prefer Literal or Enum types over plain strings to provide clearer guidance to the agent.
  • Provide Clear Instructions and Examples: Document MCP servers and individual tools thoroughly. Include examples, but be aware that examples can implicitly set expectations for agents regarding parameter counts or structure.
  • Craft Informative Error Messages: Treat error messages as part of the agent's prompt. Make them helpful and actionable, guiding the agent toward recovery or correct usage, rather than being cryptic Python exceptions.
  • Respect the Token Budget: Be mindful of the cumulative token cost of tool descriptions and arguments, as agents operate within strict context window limits.
  • Use the Read-Only Hint: Apply the readOnlyHint tool annotation (part of the MCP spec) to tools that do not cause side effects. This allows compliant clients to offer a more secure and streamlined user experience.
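
Several of the takeaways above (outcomes over operations, flattened arguments, Literal for constrained choices) can be sketched in one small function. This is a minimal illustration, not code from the talk: the in-memory `_USERS` and `_ORDERS` tables, the `detail` parameter, and the return strings are all assumptions made up for the example. A framework such as FastMCP would typically expose a function like this as a tool and derive the input schema from the type hints.

```python
from typing import Literal

# Hypothetical in-memory stand-ins for a real backend (assumptions for illustration).
_USERS = {"ada@example.com": "u1"}
_ORDERS = {
    "u1": [
        {"id": "o1", "placed": "2024-01-05", "status": "delivered"},
        {"id": "o2", "placed": "2024-03-10", "status": "cancelled"},
        {"id": "o3", "placed": "2024-02-20", "status": "shipped"},
    ]
}


def track_latest_order_by_email(
    email: str,
    include_cancelled: bool = False,
    detail: Literal["basic", "detailed"] = "basic",
) -> str:
    """Return the status of the customer's most recent order.

    One call covers the whole outcome: look up the user, filter their
    orders, and report the latest one. Arguments are flat primitives,
    and the constrained choice is a Literal rather than a free string.
    """
    user_id = _USERS.get(email)
    if user_id is None:
        return f"No customer found for {email}."

    # Filter in code instead of asking the agent to orchestrate it.
    orders = [
        o for o in _ORDERS.get(user_id, [])
        if include_cancelled or o["status"] != "cancelled"
    ]
    if not orders:
        return f"No matching orders for {email}."

    # ISO dates sort correctly as strings, so max() finds the newest order.
    latest = max(orders, key=lambda o: o["placed"])
    if detail == "basic":
        return f"Order {latest['id']} is {latest['status']}."
    return f"Order {latest['id']} placed {latest['placed']} is {latest['status']}."
```

The agent makes one round trip and never sees the three underlying lookups; the same three steps still happen, just in code the server author controls.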

Vocabulary

  • MCP server: A server that exposes an interface specifically designed for communication with Large Language Models (LLMs) via the Model Context Protocol.
  • FastMCP: A popular open-source framework for building MCP servers quickly and efficiently, often described as a de facto standard.
  • Agentic product design: The discipline of designing software interfaces and products with the specific needs, strengths, and limitations of AI agents in mind, rather than human users.
  • Context window: The limited amount of text (measured in tokens) that an LLM can process and "remember" at any given time during a conversation or task.
  • Tokens: The basic units of text (words, subwords, or characters) that LLMs process. Token count directly impacts cost and context window usage.
  • LLM: Large Language Model. An AI model capable of understanding and generating human-like text, often used as the "brain" for agents interacting with MCP servers.
  • Orchestration software: Software designed to coordinate and manage multiple automated tasks or services, often used to sequence complex operations.
  • REST API: Representational State Transfer Application Programming Interface. A common architectural style for networked applications, typically designed for human developers to integrate systems.
  • Literal (Python type hint): A Python type hint specifying that a value must be one of a predefined set of exact values.
  • Enum (Python type): A set of symbolic names (members) bound to unique, constant values, used to represent a fixed set of choices.

Transcript

I really do appreciate you being here. I'm going to try to make this as painless as possible. We're not going to do an interactive part; we're going to talk through stuff. I'm happy to go off script. I'm happy to take questions if there's stuff we want to explore at any moment. My goal is to share with you a lot of things that I've learned, and I'm going to try to make them as actionable as possible, so there is real stuff to do here, more than we might in a more high-level talk. But let's be very honest: it is late, it is a lot, it is long. Let's talk about MCP. I'm hoping that folks here are interested in MCP; that's why you came to this talk. If you're here to learn about MCP, this might be a little bit of a different bent. Just a show of hands: heard of MCP? Used MCP? Written an MCP server? Okay. Anyone feel uncomfortable with MCP? Which is 100% fine; we can tailor. Okay. Then I would say, let's just go. Let's dive in.
This is who I am. I'm the founder and CEO of a company called Prefect Technologies. For the last seven or eight years, we've been building data automation software and orchestration software. Before that, I was a member of the Apache Airflow PMC. I originally started Prefect to graduate the same orchestration ideas into data science. Today we operate the full stack. Then a few years ago, I developed an agent framework called Marvin, which I would not describe as wildly popular, but it was my way into the world of AI, at least from a developer experience standpoint, and I learned a lot from that. Then more recently, I introduced a piece of software called FastMCP, which is wildly, wildly popular, maybe even too popular. Hence my status today: I'm a little overwhelmed. I find myself back in an open source maintenance seat, which I haven't been in for a few years, and which has been a whole lot of fun. But the most important thing is that FastMCP has given me a very specific vantage point that is really the basis for this talk today.
This is our downloads. I've never seen anything like this. I've never worked on a project like this. It was downloaded a million and a half times yesterday. There are a lot of MCP servers out there, and FastMCP has become the de facto standard way to build them. I introduced it almost exactly a year ago. As many of you are probably aware, MCP itself was introduced almost exactly a year ago, and a few days later I introduced the first version of FastMCP. David at Anthropic called me up and said, "I think this is great. I think this is how people should build servers." We put a version of it into the official SDK, which was amazing. And then, as MCP has gone crazy in the last year, we found it constructive to position FastMCP, as I'm maintaining it, as the high-level interface to the MCP ecosystem, while the SDK focuses on the low-level primitives. And actually, we're going to remove the FastMCP vocabulary from the low-level SDK in a couple of months. It's too confusing that there are these two things called FastMCP. So FastMCP will be the high-level interface to the world. As a result, we see a lot of not-great MCP servers.
I named the talk after this meme, and then it occurred to me: do people even know what this meme is anymore? This to me is very funny and very topical, and then it's from, like, a 1999 episode of Futurama. So if you haven't seen this, my talk title is not meant to be mean. I'm sort of an optimist. I choose to interpret this as "what can you do better?" And so we're going to find ways to do better. That is the goal of today's talk. In fact, to be more precise, what I want to do today is build an intuition for agentic product design. I don't see this talked about nearly as much as it should be, given how many agents are using how many products today.
And what I mean by this is the exact analog of what it would be if I were giving a talk on how to build a good product for a user, for a human. We would talk about human interface guidelines, and we'd talk about user experience, and we'd talk about stories. I found it really instructive to start talking about those things from an agentic perspective, because what else is an MCP server but an interface for an agent? And we should design it for the strengths and weaknesses of those agents in the same way that we do everything else. Now, when I put this thought in the world, I very, very, very frequently get this pushback: "But if a human can use an API, why can't an AI?" And there are so many things wrong with this question. The number one thing that's wrong with it is that it has an assumption that I see in so much of AI product design, and it drives me nuts, which is that AIs are perfect, or they're oracles, or they're good at everything. They are very, very, very powerful tools. But based on your responses before, I think everyone in this room has some scars from the fact that they are fallible, or limited, or imperfect. So I don't like this question because it presumes that they're magically amazing at everything. But I really don't like this question (this is a literal question I've gotten; I didn't paraphrase it) because humans don't use APIs. Very, very rarely do humans use APIs. Humans use products. We do anything we can to put something between us and an API. We put a website. We put an SDK. We put a client. We put a mobile app. We do not like to use APIs unless we have to, or we are the person responsible for building that interface. And so one of my core arguments, and why I love MCP so much, is that I believe that agents deserve their own interface that is optimized for them and their own use case.
And in order to design that interface, which is what I want to motivate today, we have to think a little bit about the difference between a human and an AI. It's one of those questions that sounds really stupid when you say it out loud, but it's instructive to actually go through. I'd like to make the argument that the difference exists on three dimensions: discovery, iteration, and context.
To begin with, humans find discovery really cheap. We tend to do it once. If any of you have had to implement something against a REST API, what do you do? You call up the docs, or Swagger, or whatever it is. You look at it one time, you figure out what you need, and you're never going to do that again. So while it may take you some time to do the discovery, it is cheap over the lifetime of the application you are building. Agents, not so much. Every single time that thing turns on, it shakes hands with a server, it learns about the server, and it enumerates every single tool and every single description on that server. So discovery is actually really expensive for agents. It consumes a lot of tokens.
Next, iteration. Same idea. If you're a human developer and you're writing code against an API, you can iterate really quickly. Why? Because you do your one-time discovery, you figure out the three routes you're going to call, and then you write a script that calls them one after another as fast as your language allows. And if that doesn't work, you just run it again until it does. Iteration is cheap. It's fast. For agents, I think we all know iteration is slow. Iteration is the enemy. Every additional call, subject to your caching setup, also sends the entire history of all previous calls over the wire. You just do not want to iterate if you can avoid it. So that's going to be an important thing that we take into consideration. And the last thing is context.
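To make the iteration cost concrete, here is a rough back-of-the-envelope sketch. It assumes no caching at all and that every call resends the full history; the function name and the token counts are hypothetical placeholders, not measurements.

```python
def total_tokens_sent(num_calls: int, tokens_per_turn: int, base_context: int) -> int:
    """Total input tokens sent over num_calls when each call resends all history.

    Assumes no prompt caching: call k pays for the base context plus
    every earlier call and result.
    """
    total = 0
    history = base_context
    for _ in range(num_calls):
        total += history            # the whole history goes over the wire again
        history += tokens_per_turn  # this call and its result join the history
    return total


# Hypothetical numbers: 2,000 tokens of tool descriptions, 500 tokens per call.
one_call = total_tokens_sent(1, tokens_per_turn=500, base_context=2_000)
three_calls = total_tokens_sent(3, tokens_per_turn=500, base_context=2_000)
```

Under these assumptions one call sends 2,000 tokens and three calls send 7,500, which is why collapsing three round trips into one tool call is worth more than a factor of three.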
This is a little bit hand-wavy, but it is important. As humans, in this conversation, I'm talking, you're hearing me, and you're comparing this to different memories and different experiences you have on different time scales. It's all doing wonderful, amazing things in your brain. When you plug an LLM into any given use case, it remembers the last 200,000 tokens it saw. That's the extent of its memory, plus whatever is embedded somewhere in its weights. And that's it. So we need to be very, very, very conscious of the fact that it has a very small brain at this moment. I think it is a lot closer to when people talk about sending Apollo 11 to the moon with one kilobyte of RAM, or whatever it was. I think that's actually how we need to think about these things that frankly feel quite magical, because they go and open my PRs for me or whatever it is that they do. So these are the three key dimensions, in my mind, of what is different. And we should not build APIs that are good for humans on any of these dimensions and pretend that they are also good for agents.
One way that I've started talking about this is this idea: an agent can find a needle in a haystack. The problem is it's going to look at every piece of hay and decide if it's a needle. That's not literally true, but in an intuitive sense it is how we should think about what we're putting in front of the agents and how we're posing a problem. And an MCP server is nothing but an interface to that problem and/or solution. So finally, to go back to our product intuition statement, I'd argue that the most important word in the universe for MCP developers is "curate." How do you curate, from a huge amount of information that might be amenable to a human developer, an interface that is appropriate for one of these extremely limited AI agents, at least on the dimensions that we just went through? And that brings us to this slide: why MCP?
And I almost made this the Derek Zoolander slide, like, "But why MCP?" But I just told you why MCP, Derek. It's because it does all of these things. It gives us a standard way of communicating information to agents in a way that's controllable, where we can control not only how it's discovered, but also how it is acted on. There's a big asterisk on that, because client implementations in the MCP space right now are not amazing, and they do some things that are themselves not compliant with the MCP spec. Maybe at the end we'll get into that. It's not directly relevant now, except that all we can do is try to build the best servers we can, subject to the limitations of the clients that will use them. And again, I put this in here, but I think we don't need to go through what MCP is for this audience. So we're going to move quickly through this, but it is here, of course, for the sake of the transcript. The cliche is that it's USB-C for the internet. It is a standard way to connect LLMs and either tools or data. And if you haven't seen FastMCP, this is what it looks like to build a fully functional MCP server. I live in Washington, DC. The subway is often on fire there, and so this one checks whether or not the subway is on fire. And indeed it is.
Now, the question we are here to actually explore is: why are there so many bad MCP servers? Maybe a better question is: do you all agree with me that there are many bad MCP servers? I sort of declare this as if it's true. I'm not trying to make a controversial statement. There are many bad MCP servers in the world. I see a lot of them because people are using my framework to build them. Does it surprise anyone that I'm declaring that? I'm genuinely curious if I've made an assumption.
[Audience] I don't think my experience with every MCP server is like that. But I know that many of them are like an API wrapper. They just pass the API through, and that's it. They call it an MCP.
Yeah. And this is going a little off-script, but I'll make the argument that a lot of them, even when they're not wrappers, are just bad products, because no thought was put into them. One comparison that I talk about sometimes with my team is: if you go to a bad website, you know it's a bad website. We don't need to sit there and figure out why. It's ugly, or it's hard to use, or it's hard to find what you're looking for, or it's all Flash. I don't know what makes a bad website exactly, but you know what a bad website is when you go to one. We don't like to point out all the things, because there's an infinite number of them. Instead, we try to find great examples of good websites. And so what I think we need more than anything else are MCP best practices. A big push of mine right now, and part of where this talk came from, is that I want to make sure that we have as many best practices in the world, and documented, as possible. And I do want to applaud a few firms. These are screenshots: Block has an amazing playbook, and if you hate this talk, read their blog post. It's like a better version of what I'm doing right now. GitHub recently put one out, and many other companies have as well. I could have put a lot here, but these are two that I've referred to quite frequently, and so I recommend them to you. The Block team in particular is just phenomenal in what they're doing on MCP. By coincidence, the same team has been my customer for six years on the data side, and I really love the work that they do. The blog post they put out is very thoughtful, and I highly, highly recommend it to you. I want to see more of this, and today is one of my humble efforts to try to put some of that in the world. And so, here's what I thought we would do today, because I did not want to ask you to open your laptops and set up environments and actually write code with me, because it's 4:25 on a Saturday.
I thought that we would fix a server together, sort of through slides, to make this, as I said, hopefully actionable, but a gentle approach. And so here is the server that you were describing a moment ago, right? Someone wrote this server. I hope the notation is clear enough to folks: we have a decorator that says that a function is a tool, and then we have the tool itself. Forgive me, I didn't bore you with the details, because we think this is a bad server to begin with. What's our example here? We want to check an order status. In order to check an order status, we need to learn a lot of things about the user and what their orders are, we need to filter them, and we need to actually check the status. If this were a REST API, which presumably it is, we know exactly what we would do: we would make one call to each of the functions in sequence and return that as some user-facing output. It would be easy, it would be observable, it would be fast, it would be testable. Everything would be good. Instead, if we expose this to an agent: what order is it going to call these in? Does it know what the format of the arguments is? How long is it going to take for the minimum three round trips this is going to require? These are all the problems that we're exposing just by looking at this. I mean, we're not solving them, but those are the problems I see if I were reviewing this as a product-facing effort.
And so the first thing that we are going to think about, and I think this is probably the most important thing when we think about an effective MCP server, because it is product thinking, is outcomes, not operations. What do we want to achieve? This is a little bit annoying for engineers sometimes, because it's forced product thinking. It's not someone coming along with a user story, mapping it all out, and saying, "This is what we need to implement." We cannot put something in this
server unless we know for a fact it's going to be useful and have a good outcome. We have to start there. There's just not enough context for us to be frivolous. And so here's what this feels like, so that we can get a sense for it. When you're falling into the trap, you have a whole bunch of atomic operations. This is amazing if you're building a REST API. It is best practice if you're building a REST API. It is bad if you're building an MCP server. Instead, we want things like "track latest order," given an email. It's hard to screw up, and you know what the outcome is when you call it. The other version of the trap is agent as glue, or agent as orchestrator. Please believe me, since I've spent my career building orchestration software and automation software: there are things that are really good at doing orchestration, and there are things that are really bad at orchestration, and agents are right in the middle, because they can do it, but it's expensive and slow and annoying and hard to debug, and it's stochastic. So if you can avoid that, please do. If you can't: there are times when you don't know the algorithm, you don't know how to write the code, and it's not programmatic. That's a perfect time to use an LLM as an orchestrator. Finding out an order status is a really bad time, a really expensive time, to choose to use an LLM as your orchestration service. So don't. Instead, focus on this idea that one tool equals one agent story. And again, even here we're trying to introduce a new vocabulary. It's not a user story, because with "user story" everyone thinks human, even though the agent is a user. It's an agent story. It's something that a programmatic, autonomous agent with an objective and a limited context window is trying to achieve, and we need to satisfy that as much as we can. And then this is one of those little tips that feels obvious, but I think it's important: name the tool for the agent. Don't name it for you. It's not a REST API. It's not supposed to be clear to future
developers who need to write against it. You're writing an API so that the agent picks the right tool at the right time. Don't be afraid of using silly but explanatory names for your tools. I shouldn't say silly; they might feel a little silly, but they're very user-facing in this moment, even though it feels like a deep API. Just in case any of you didn't go read the Block blog post, I found this section of it so important, where they essentially say something very similar: design top-down from the workflow, not bottom-up from the API endpoints. They're two different ways to get to the same place, but they will result in very different forms of product thinking and very different MCP servers. So again, I really encourage you to go and take a look at that blog post.
If we were to go back to that bad code example I showed you a moment ago and start rewriting it (you're welcome to have your laptops out and follow along; the code will essentially run, but there's no need), here's what that could look like. We did the thing that you would do as a human: we made three calls in sequence to our API, but we buried them in one agent-facing tool. And that's how we went from operations to outcomes. The API calls still have to happen. There's no magic happening here. But the question is: are we going to ask an agent to figure out the outcome and how to stitch the calls together to achieve it, or are we going to just do it on its behalf, because we know how? So thing number one is outcomes over operations.
Thing number two. A lot of these, frankly, are going to seem kind of silly when I say them out loud. Please just trust me, from the download graph, that these are the most important things that I could offer as advice. And if none of them apply to you, think of yourself as in the top 1% of MCP developers. Flatten your arguments. I see this so often, where I
do this myself, I'll confess to you, where you say: here's my tool, and one of the inputs is a configuration dictionary. Hopefully, presumably, it's documented somewhere, maybe in the agent's instructions, maybe in the docstring. You have a real problem there. By the way, I don't remember if I have a point for this later, so I'll say it now: a very frequent trap that you can fall into with complex arguments is that you'll put the explanation of how to use them in something like a system prompt or a sub-agent definition, and then you'll change the tool in the server. Now it's almost worse than a poorly documented tool. You have a doubly documented tool, and one is wrong and one is right, and only error messages will save you. That's really bad. This is a more gentle version of that: just don't ask your LLM to invent complex arguments. Now, you could ask, what if it's a Pydantic model with every field annotated? Fine, that's better than the dictionary, but it's still going to be hard. Until very recently (there may still be a bug in that; maybe it's not a bug, but no one seems to fix it), in Claude Desktop all structured arguments, like object arguments, would be sent as a string. This created a real problem, because we do not want to support automatic string conversion to object, but Claude Desktop is one of the most popular MCP clients. So we actually bowed to this as a matter of necessity, and FastMCP will now try: if you are supplying a string argument to something that is very clearly a structured object, it will try to deserialize it. It will try to do the right thing. I really hate that we have to do that. It feels very deeply wrong to me that we have a type schema that said "I need an object," and yet we're doing kludgy stuff like that. So this is an example of where this is an evolving ecosystem. It's a little messy. But what does it look like when you do it right? Top-level primitives. These are
the arguments into the function: what's the limit, what is the status, what is the email. Clearly defined. Just like naming your tool for the agent, name the arguments for the agent. And here's what that looks like when we get it into code. Instead of having config: dict, we have an email, which is a string; we have include_cancelled, which is a flag; and then I highly, highly recommend Literals or Enums whenever you can. They're much better than a string if you know what the options are. At this time, very few LLMs know that this kind of syntax is supported, so if you had Claude Code or something write this, it would usually write format: str = "basic", which works; it just doesn't know to do this. So it's one of those little actionable tips: use Literal, or equivalently Enum, when you have a constrained choice. Your agent will thank you.
And I do have a slide on instructions and context, so I did get ahead of myself. I'm sorry, everybody; it is 4:35 on a Saturday. The next thing that I want to talk about is the instructions that you give to the agent. This cuts both ways. The most obvious way is when you have none. We mentioned that a moment ago: if you don't tell your agent how to use your MCP server, it will guess. It will try, it will probably confuse itself, and all of those guesses will show up in its history. That's not a great outcome. Please document your MCP server. Document the server itself, document all the tools on it, and give examples. Examples are a little bit of a double-edged sword. On the one hand, they're extremely helpful for showing the agent how it should use a tool. On the other hand, it will almost always do whatever is in the example. This is just one of those quirks. Perhaps as models improve it will stop doing that, but in my experience, if you have an example (let's say you have a field for tags, and you want to collect tags for something) and your example has two tags, you will never get 10 tags. You will get two tags
pretty much every time. They'll be accurate; it's not going to do a bad job. But it really uses those examples for a lot more dimensions than just the fact that they work, if that makes sense. So use examples, but be careful with your examples. Yes, sir?
[Audience] Is giving out-of-distribution examples the way to solve for that, that you've seen?
By out-of-distribution, do you mean...
[Audience] Examples that would not be representative of the actual [inaudible].
That's so interesting. I don't have a strong opinion on that. That seems super reasonable to me. In my experience, the fact that an example has some implicit pattern, like the number of objects in an array, becomes such a strong signal that I almost give this its own bullet point called "examples are contracts": if you give one, expect to get something like it. Out-of-distribution is a really interesting way to fight against, I guess, that inertia. I would imagine it is better to do it that way. I would just be careful of falling into this more base-layer trap. So I think that's completely reasonable, and I would endorse it. I think this is just more broad: whatever example you put out there, weird quirks of it will show up. On an MCP server that I'm building, I encountered this tag thing just yesterday, and it really confused me. No matter how much I said, "use at least 10 tags," it always was two, and I finally figured out it was because one of my examples had two tags. So yes, good strategy, but it may or may not be enough to overcome these basic caveats. Oh, I do have "examples are contracts." I'm sorry, it's 4:37.
This one I think is one of the most interesting things on this slide: errors are prompts. For every response that comes out of the tool, your LLM doesn't know that it's bad. It's not like it gets a 400 or a 500 or something like that. It gets what it sees as information about the fact that it didn't succeed in what it was attempting to do. So if you just allow Python, in FastMCP's case, or whatever your tool of choice is, to raise, for example, an empty ValueError or a cryptic MCP error with an integer code, that's the information that goes back to your LLM. Does it know what to do with it or not? Probably it knows at least to retry, because it knows it was an error. But you actually have an opportunity to document your API through errors. This leads to some interesting strategies that I don't want to wholeheartedly endorse, but I will mention. For example, if you do have a complex API because you can't get away from that, then instead of documenting every possibility in the docstring that documents the entire tool, you might document how to recover from the most common failures. It's a very weird form of progressive disclosure of information, where you are acknowledging that it is likely that this agent will get its first call wrong, but based on how it gets it wrong, you have an opportunity to send more information back in an error message. As I said, this is not an amazing way to think about building software, but it is the ultimate version of what I'm recommending, which is: be as helpful as possible in your error messages. Do go overboard. As far as the agent is concerned, they become part of its next prompt, and so they do matter. If they are too aggressive or too scary, it may avoid the tool permanently. It may decide the tool is inoperable. So errors really matter. And I don't think this needs too much of an explanation, but this is what it looks like when you have a full docstring and an example, etc.
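The "errors are prompts" idea can be sketched as a validation helper whose exceptions tell the agent exactly how to recover. This is a minimal illustration under assumptions: the function name, the allowed statuses, and the limit range are all hypothetical, and how a given framework relays the exception text to the agent is not shown here.

```python
# Hypothetical set of allowed values (an assumption for illustration).
VALID_STATUSES = ("open", "shipped", "delivered", "cancelled")


def validate_order_query(email: str, status: str, limit: int) -> None:
    """Check tool arguments and raise errors that read like instructions.

    Example of a valid call:
        validate_order_query("ada@example.com", "open", 5)
    """
    if "@" not in email:
        raise ValueError(
            f"'{email}' does not look like an email address. "
            "Pass the customer's full email, for example 'ada@example.com'."
        )
    if status not in VALID_STATUSES:
        # List the valid options so the agent can fix the call in one retry.
        raise ValueError(
            f"Unknown status '{status}'. Use one of: {', '.join(VALID_STATUSES)}."
        )
    if not 1 <= limit <= 50:
        raise ValueError(
            f"limit={limit} is out of range. Choose a value from 1 to 50, "
            "and narrow the results with status if you need fewer orders."
        )
```

Compare this with a bare `raise ValueError()`: the message is the only thing the agent sees, so each one names the bad value, states the rule, and suggests the next call.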
Block, in their blog post, makes a point which I haven't seen used too widely, although ChatGPT does take advantage of this in developer mode, which is the read-only hint. The MCP spec has support for annotations, a restricted set of hints that you can place on various components. One of them, for tools, is whether or not the tool is read-only. If you supply this, clients can optionally choose to treat that tool a little bit differently. The motivation behind the read-only hint was basically to help with setting permissions. I don't know who here is a fan of --yolo or --dangerously-skip-permissions, or whatever they're called in different terminals, but then you don't care about this. But, for example, ChatGPT will ask you for extra permission if a tool does not have this annotation set, because it presumes that the tool can have a side effect and can have an adverse effect. So use those to your advantage. It is one other form of design that the client can use to provide a better experience.
I've talked about this a bit now: respect the token budget. I think the meme right now is that the GitHub server ships like 200,000 tokens when you handshake with it, something like that. This is a real thing, and I don't think it makes the GitHub server automatically bad. I think it actually makes it incumbent on folks like myself who build frameworks, and folks who build clients, to find ways to solve this problem, because the answer can't always be "do less." In fact, right now we want to do more. We want an abundance of functionality. We'll talk about that maybe a little bit later. But respect for the token budget really matters. It is a very scarce resource, and your server is not the only one that the agent is going to talk to. I was on a call with a customer of mine recently who is so excited that they're rolling out MCP. I met with the engineering team, and just to be clear, this is an incredibly forward
thinking, high-performing, massive company that I incredibly respect. I won't say who they are, but I really respect them. They got on the call and they were so excited, and they were like, "We're in the process of converting our stuff to MCP so that we can use it." They had a strong argument why it actually had to be their API, so that's not even the punch line of this story, which is a whole other story in and of itself. It fundamentally came down to this: they had 800 endpoints that had to be exposed.

To which I had this thought (by the time you finish reading this): this is the token budget for each of those 800 tools if you assume 200,000 tokens in the context window. If each of those 800 tools had only this much space, not even just to document itself but to share its schema and its name plus documentation, this is the amount of space you would get. And when you were done taking up this space, because you were so careful and each tool really fit in this, you would lobotomize the agent on handshake, because it would have no room for anything else. So the token budget really matters. If this agent connected to a server with one more tool that had a one-word docstring, it would just fail; it would effectively have an overflow.

So the token budget matters. There is probably a budget that's appropriate for whatever work you're doing. You may know what it is; you may not. Pretend you know what it is and be mindful of it. In a worst-case scenario, try to be parsimonious; try to be as efficient as possible. That's why we do experiments like sending additional instructions in the error message: it's one way to save on the token budget on handshake. And the handshake is painful. I'm not sure folks know that when an LLM connects to an MCP server, it typically does download all the descriptions in one go so that it knows what's available to it. It's usually not done in a progressively disclosed way; it's done outright.

Yes? Absolutely. Okay.
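The arithmetic behind the 800-endpoint example can be sketched in a few lines. The 200,000-token window and the 800 tools are the figures from the talk; the function names and the rough 4-characters-per-token estimate are assumptions made for illustration, not a real tokenizer.

```python
def per_tool_budget(context_window: int, tool_count: int) -> int:
    """Tokens each tool would get if the entire window were spent on tool descriptions."""
    if tool_count <= 0:
        raise ValueError("tool_count must be positive")
    return context_window // tool_count


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return max(1, len(text) // chars_per_token)


if __name__ == "__main__":
    budget = per_tool_budget(200_000, 800)
    print(budget)  # 250

    # A hypothetical one-line tool description, measured against that budget.
    description = "track_latest_order_by_email(email: str, include_cancelled: bool)"
    print(estimate_tokens(description) <= budget)
```

At 250 tokens per tool, name, schema, and documentation all have to fit in roughly a short paragraph, and even then nothing is left over for the conversation itself.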
So that's awesome. Let's talk about this idea for one second, because it's a really interesting design. There's a debate right now about what you can do that's compliant with the spec versus what's not compliant with the spec. As long as you do things that are compliant with the spec, then by all means do them; who cares. One of the problems is that there are clients that are not compliant with the spec. Claude Desktop is one of them. I've mentioned it a few times; I have a history with Claude Desktop. Claude Desktop hashes all of the tools it receives on first contact and puts them in a SQLite database, and it doesn't care what you do. It doesn't care about the fact that the spec allows you to send more information. I think your solution would get around this because it's a tool call, but many of the first attempts people make to use spec-compliant techniques for getting around this problem, such as notifications, fail in Claude Desktop. Usually you've failed before this in Claude Desktop. I'm not a fan of Claude Desktop for MCP servers. I think it's a real missed opportunity, because it is such a flagship product of the company that introduced MCP (Claude Code is great). It caches everything in a SQLite database, so it doesn't matter what you do.

Techniques similar to what you've described, where you provide mechanisms for learning more about a tool, are a great idea; I really like that. There's a challenge, where now you are back in a sort of flat-arguments world, because you have meta-tools: I need to use tools to learn about tools, and use tools to call tools, in some extreme cases or beyond. So you need to design this very carefully. That's why it usually does show up as a dedicated product. So thank you for sharing that. There are many really interesting techniques for trying to solve this problem.

Yes? So, you talk about progressive disclosure. Do you use, or talk about, scopes? So for example, I
connect to my [unclear] server, and my credentials only give me certain rights, so there are 28 tools that I don't have access to, and therefore you don't need to tell me about them.

So when you say do I support that, do you mean does MCP support that, or do I support that in my products? Yeah, no, I'm just asking. There's something I've read [unclear].

Okay, so the spec makes no claim about this. The spec says when you call list tools, you get tools back, and how that happens is up to the implementation. FastMCP makes that an overridable hook through middleware, but again makes no claim about how that is done. Prefect's commercial products, which I'm not here to pitch, allow for tool masking on any basis, and we see that as a place to have an opinion in the commercial landscape, as opposed to the open source landscape, as opposed to the protocol, which should have no opinion at all. So if that's interesting, we can chat about this.

You might be getting into this, but if you take this example, is there a general management or table-of-contents approach [unclear], where you sort of look at four different chunks, or maybe the 800 don't all justify having their own MCP server? What was that for them?

They can't do it. There's no solution that allowed them to have as much information as they wanted in the context window they have. They didn't need it, and it became a design question. Frankly, this call was probably four months ago now, and it was just call after call after call like this, which made me realize we need to have more talks like this and just talk about what it is to design a product for an agent. My worry is that MCP is viewed as infrastructure or a transport technology. And it is, and I'm very excited: I think a year from now we will be talking about context products as opposed to MCP servers. We'll move past the transport, but we need to figure out
how to use it, and so I think that's how we talk about it. The only other alternative that I have discussed with a few companies when you have a problem like this is: if you control the client, much more interesting things become available to you. If you can instruct your client to do things a certain way (for example, if you have a mobile app that presents an agent interface to an end user, you control the client, is what I mean by that), or if it's internal and you can dictate what client or custom client a team uses, now you can do much more interesting things, because you actually do know a lot more about that token budget and how to optimize it. But for an external-facing server, there's not a good solution.

I think by now we have talked through all of this, so I'll leave it for posterity in the interest of time. We talked about "curate" as a key verb earlier in this talk. It is, I would argue, what we have been doing in each of these little vignettes that we've been working through with the code: we are curating the same information set down to one that is more amenable and more recognizable for an agent.

50 tools is where I draw the line, where you're going to have performance problems. I think it seems really low to a lot of people. Some people will put it even lower than that; some might put it higher. If you have more than 50 tools on a server, without knowing anything else about it, I'm going to start to think that it's not a great server. The GitHub server has, I think, 170 tools. Does that mean it's not a great server? No. There's a good argument there, and the GitHub team has put out a lot of really interesting blog posts on semantic routing that they're doing. They had one just yesterday, actually, on some interesting techniques they're using, like the one you mentioned a moment ago, sir, which helps with this problem. So having a lot of tools like that does not automatically make it a bad server,
but it is a smell, and it does make me wonder: can we split them? Do you have admin tools mixed in with user tools? Could we namespace these tools differently? Would it be worthwhile having two servers instead of one? That is a little bit of a smell. If you can get down to 5 to 15, that would be ideal. I know that's not achievable for most people, so it's one of those actionable-but-maybe-not-so-actionable little tips. It's an aspiration that you should have, and just be careful unless you are prepared to invest in a lot of care and evaluation.

50 tools per agent; I should have said per agent. If I have a 50-tool server and you have a 50-tool server, that's a hundred tools to the agent. That's where the performance bottleneck is, not on the server. Sorry, the slides should be corrected: 50 tools to the agent is where you start to see performance degradation.

I love this. Kelly Kohlleffel is someone I've known a long time; he's at Fivetran now. While I was putting this talk together I happened to come across these two blog posts of his, which are a little bit of a shot and a chaser. They're written almost exactly a month apart, one from October, one from November. In the first one he talks about building a Fivetran server, and he goes from a couple of basic tools to, I think, 155, then 188. In the second blog post he talks about how he curated that server from 188 down to 5. You could read either of these blog posts independently as a success story about his adventure in learning MCP. I think taken together they tell a really interesting story about making something work and then making something work well, which is of course the product journey in some sense.

And so where this takes us is sort of the thing that I... sorry, do you have a question? Oh, sorry. Where this takes us is the thing that I have found is the most obvious version of this. I wrote a blog post that went a little bit viral on this, which is why I talk about it a lot, which is: please,
please, just, if nothing else, stop converting REST APIs into MCP servers. It is the fastest way to violate every single thing we've talked about today, every single one of the heuristics that we laid out about agents. It really doesn't work. And, comma, it's really complicated, because this is the FastMCP documentation. That's a blog post I had to write, and the blog post basically says: I know I introduced the capability to do this; please stop. That's a really complicated thing; that could be a workshop in and of itself. I do bear a little bit of responsibility here. This is not just a feature of FastMCP, it's one of the most popular features of FastMCP, which is why, candidly, it's not going anywhere, and instead we're going to document around that fact.

But here's the problem: you just can't convert it. I'm not going to explain it; you just can't convert it [unclear]. But, comma, it is an amazing way to bootstrap. When you are trying to figure out if something is working, do not write a lot of code where you introduce new ways to figure out if you have failed. Do start by picking a couple of key endpoints, mirroring them out with FastMCP's auto-converter or any other tool you like, or even just write that code yourself. Make sure you solve one problem at a time, and make the first problem: can you get an agent to use your tool at all? Once it's using it, by all means strip out the part of it that just regurgitates the REST API, start to curate it, and start to apply some of what we've talked about today.

This is just one of those candid things. It is the fastest way to get started. You don't have to do it this way; I start this way. Just don't ship the REST API to prod as an MCP server. You will regret it. You will pay for it a little bit later, even though there's a dopamine hit up front.

So these are the five major things that we talked about today in our pseudo-workshop, a workshop that wasn't
really a workshop, an actionable talk.

Outcomes, not operations. Focus on the workflow; focus on the top down. Don't get caught up in all the little operations. Don't ask your agent to be an orchestrator unless you absolutely have to.

Flatten your arguments. Try not to ship large payloads. Try not to confuse the agent. Try not to give it too much choice. I don't think I said this out loud when we talked about that, but try not to have tightly coupled arguments; that really confuses the agent. See if you can design around that. It's not always possible, but do it if you can.

Instructions are context. It seems obvious to say out loud; of course they are, they're information for the agent. Use them as context; design them as context. Really put thought into instructions the same way as you would into your tool signature and schema.

Respect the token budget. You have to do it. This is the only one on this list where, if you don't actually do it, you will simply not have a usable server. The other ones you can get away with, and frankly the art of this intuition is to start with these rules and then work backwards into practicality. But this is the only one where I think you can't actually cross the line.

And then curate ruthlessly. If you do nothing else, start with what works and then just tear it down to the essentials. I have been writing MCP servers about as long as anyone at this point, a year, and I still find myself starting by putting too many tools in the world sometimes, because I'm not sure which one it will use or where I'm experimenting, and I have to remind myself to go back and get rid of them. It's hard. I think as an engineer, especially designing normal APIs, you're like, okay, here's my tool, here's v2, it's backwards compatible, right? You keep adding stuff, and that's a really natural way to work, and it can be a best practice, and it doesn't work here. It would be like using a UI that just showed a REST API to a user. This is a criticism I have offered of my own
products at times, when I'm like, this looks a little bit too much like our REST API docs, right? We're not doing our job to actually give this to our users in a consumable way.

So if I can leave you with just one thought, it's this: you are not building a tool, you are building a user interface. Treat it like a user interface, because it is the interface that your agent is going to use. You can do a better job or you can do a worse job, and either you or your users will benefit from that.

I think we are at our time, so I'm going to just open it up for questions, or what's next, or what other challenges we can solve. I hope I walked the tightrope between things that are useful to you all but don't require you to write any code at 4:54 on a Saturday. I hope I had some useful nuggets in there for you, more than you came in with, and I'm happy to take any questions if there are any.

What are examples of tightly coupled arguments?

That would be where you have one argument that's like, what is the file type, and another argument that's like, how should we process the file, and your input to the file type argument determines the valid inputs for the other argument. So they're now tightly coupled: some arguments on the second thing are invalid depending on what you said for the first thing. It's just one extra thing to keep track of. That's a good question; sorry I didn't define that.

I have two. I'll start with the first one: when you're giving an agent an MCP server, is it up to you to document the calls or the capabilities of the server in the server, or in the agent? What's the ideal?

Yes, so this comes down to whether you control the client or not. If you control the client, then this is a real choice, and there are different ways to think about it. For example, in some of the stuff that I write, where I know I'm using, for example, Claude Code to access it, I might actually document my MCP server as files or
Claude skills, because I know what the workflows are going to be. I know that some of my workflows are infrequent and I don't want to pollute the context space with them. So if you control the client, you have a real choice to make. If you don't control the client, then you don't have so much of a choice: you have to document it here, because you have to assume you're working with the worst possible client. Honestly, many of the answers in the MCP space boil down to: do you control the client? Then you can do really interesting things on both sides of the protocol.

From a server author's perspective, you really do need to document everything in its docstring. The one escape hatch is that you can document the server itself. Every server has an instructions field. It is not respected by every client; I believe my team has filed bugs where we have determined that to be the case, so hopefully that's not a permanent thing. But most clients will, on handshake, download not only the tools and resources and everything, but also an instructions blob for the server itself. How much information you can put in there, I'd be careful; I don't think it wants to read a novel. But you do have this one other opportunity to document maybe the high level of your server.

Oh, yeah. Well, why don't we mix it up and we'll come back. Did you have a question?

Yeah. I'm not a member of the core committee, but I'm in very close contact with them, so maybe I can answer your question. I'm so excited about this. Yes, this I know a lot about. It's going to expand; it's not actually going to change so much, because of the way it's implemented. What question could I answer? Like, what is it, am I excited about it? I am excited about it. So all the rules still apply. That is a fantastic question; let's talk about this for one second. I don't know if any of you were at a meetup we hosted last night where my colleague actually gave a presentation. Oh, you were? Yes,
that's right. I was like, I know at least somebody's coming. My colleague Adam gave a very good talk on this, which I can point you to after this; I'll send you a link to a recording of it. But the nutshell version is this: SEP-1686 is the name of the proposal, and it adds asynchronous background tasks to the MCP protocol, not just for tools but for every operation. We don't need to talk too much about what that is. The reason it doesn't involve changes to any of these rules is that this is essentially an opt-in mode of operating in which the client is saying, I want this to be run asynchronously, and therefore the client takes on new responsibilities about checking in on it, polling for the result, and actually collecting the result. But the actual interface of learning about the tool or calling the tool, etc., is exactly the same as it is today. So this is fully opt-in on the client side, and that's why, from a design standpoint, nothing changes.

The only question from a server designer's standpoint is: is this an appropriate thing to be backgrounded, as opposed to being done synchronously on the server? Or sorry, let me take that back. You can background anything, because it's a Python framework, so you can chuck anything in a Python framework. The question is, should the client wait for it or not? Should it be a blocking task, is really the right vocabulary for this. And that's just a design question for the server maintainer. Am I in the zone of what you were looking for?

Oh, no kidding. Yes, this happens a lot, actually. Until you said this I didn't think of it as a pattern, but I've seen this a lot. It's a real problem. Yeah, maybe we'll write a blog post on it; that would be fun.

Yes? I'd expect you to say the rules will still apply, but as far as elicitation is concerned, how do you do that?
Elicitation is really interesting. So now we're in advanced MCP. Elicitation: anyone not familiar with what that is? Yes. So elicitation is basically a way to ask the client for more input halfway through a tool execution. You take your initial arguments for the tool, you do an elicitation (it's a formal MCP request), and you say, I need more information. And it's structured, which is what's kind of cool about it. The most common use case of this in clients that support it is for approvals, where you say, I need a yes or no on whether I can proceed, maybe on some irreversible side effect or something like that. When it works, it works amazingly. Again, it's one of those things that doesn't have amazing client support, and therefore a lot of people don't put it in their servers, because it'll brick your server if you send out this thing and the client doesn't know what to do with it. So you've got to be a little bit careful.

Does it change the design? It's a fantastic question. I wish it were used more so I could say yes, and you should depend on it, if all clients supported it and it was widely used. And the reason all clients don't support this one, by the way (it's not like a meme that the clients are bad): it's complicated to know how to handle elicitation. Some clients are user-facing; then it's super easy, just ask the user and give them a form. Some clients are automated; some are backgrounded somewhere. So what you do with an elicitation is actually kind of complicated. If you just fill it in as an LLM, maybe you satisfied it, maybe you didn't; it's a little tough to know. So if it were widely used, I would say absolutely: it gives you an opportunity to put, in particular, tightly coupled arguments into an elicitation prompt, or confirmations. A lot of times for destructive tools you'll see a confirm argument, and it'll default to false, and you're forcing the LLM to acknowledge it, at least as a way of hopefully tipping it into a more sane operating mode. Elicitation
is a better way to design for that. I don't think that made it into any of these examples. So, great question; I wish I could say yes, and I hope to say yes about that. Your second question?

Yeah, so in my job the main thing I do is build agents, and I use LangGraph or [unclear] or something like that. I usually just write the tools, and the tools call into APIs, and I don't really see the use for MCP in that space. Do you agree that MCP [unclear]?

I do. I think I would not tell you to write an MCP server. I think that within a year the reason you would choose to write an MCP server is because you'll get better observability and understanding of what failed, whereas the agent frameworks are not great at that, because part of the agent framework's whole job is to not fail on a tool call and to surface the error back to the LLM, similar to what we were talking about a moment ago. So you often don't get good observability into tool call failures; some do, but not all. So one of the reasons for using an MCP server even for a local case like that is just that now you have automatic infrastructure, so you can actually debug and diagnose and so on. I don't think that's the strongest reason to do it; I think that's going to be in a year, when the ecosystem's more mature. If you fully control the client, and you're doing client orchestration, and you are writing the agent loop, and you're the only one, do whatever you want.

I think that all of the advice you gave also applies when you're building tools.

It absolutely does. Yes, everything we said today applies to a Python tool, absolutely. I mean, that's how FastMCP treats it. It's a good question. Any last questions? I'm happy to. Yes?

Yes, so code mode is something that Cloudflare actually blogged about first, and then Anthropic followed up, where you actually solve some of the problems I
just described here: you ask the LLM to write code that calls MCP tools in sequence, and it's a really interesting sidestep of a lot of what I just talked through here. The reason that I don't recommend it wholeheartedly is because it brings in other problems, like sandboxing and code execution; there are other problems with it. But if you're in a position to do it, it can be super cool. I actually have a colleague who, the day it came out, wrote a FastMCP extension that supports it, which we put in a package somewhere. At first we didn't want to put it in FastMCP main, because we weren't sure: FastMCP tries to be opinionated and we weren't sure how to fit that in. And then it was so successful that we decided we're going to add an experiments flag to the CLI and have it, but I don't know if it's in yet. Yeah, this will go into this new, I forget if we called it experiments or optimize; it's on our roadmap right now, and this would go in there. And then there's a whole world right now of optimizing tool calls and stuff.

But I would like to be respectful of your time and allow you all to go back to your lives. You're very kind to spend an hour talking about MCPs with me. I'm more than happy to keep talking if anybody has questions, but I would like to free you all from the conference. I hope you all enjoyed the talk, and thank you very much for attending.
