
Your Insecure MCP Server Won't Survive Production — Tun Shwe, Lenses

TL;DR

  • Designing and securing Model Context Protocol (MCP) servers for agentic AI systems are intrinsically linked disciplines, where poor design choices directly compromise security.
  • Transitioning an MCP server from a local development environment to a remote, production-grade deployment creates a "security cliff," demanding comprehensive authorization and infrastructure security measures all at once.
  • Enterprise-grade MCP security requires moving beyond simple API keys to OAuth 2.1 flows that use Client ID Metadata Documents (CIMD) for client registration, coupled with granular Role-Based Access Control, data masking, logging, and end-to-end observability.

Takeaways

  • Badly designed MCP servers are inherently insecure; design and security considerations for agentic AI interfaces are mutually reinforcing.
  • Agents differ from humans in discovery, iteration, and context, creating unique security vulnerabilities such as tool poisoning, data leakage, and context injection or oversharing.
  • Follow five key design principles to build secure MCP servers: 1) Shrink the attack surface by consolidating operations, 2) Constrain inputs at the schema level to avoid injection flaws, 3) Treat documentation as a defensive layer to prevent tool poisoning, 4) Return only the minimum data an agent needs to prevent oversharing, and 5) Minimize the blast radius by scoping permissions granularly.
  • Deploying MCP servers to production with HTTP transport introduces immediate, complex security requirements like token management, CORS, TLS, and rate limiting, which are absent in local standard IO mode.
  • Traditional OAuth client pre-registration is impractical for the unbounded nature of MCP clients and servers; Dynamic Client Registration (DCR) offers a solution but has vulnerabilities.
  • The Client ID Metadata Document (CIMD) approach is the preferred way to let clients register themselves securely, allowing authorization servers to verify client identities via public URLs and selectively allow or deny access.
  • Achieving enterprise-grade security for MCP extends beyond OAuth to implementing Role-Based Access Control (RBAC) at the tool and resource level, data masking for sensitive information, detailed logging for compliance, and end-to-end observability and tracing of agent actions.
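
As a rough illustration of the second design principle (constraining inputs at the schema level), here is a minimal sketch using only the Python standard library. The talk mentions Pydantic as one option for stricter typing; plain enums and dataclasses are swapped in here so the example stays self-contained. The tool name `check_order`, the `OrderField` enum and the `parse_check_order` helper are hypothetical and are not taken from any real MCP server.

```python
from dataclasses import dataclass
from enum import Enum


class OrderField(str, Enum):
    """Hypothetical closed set of fields the agent may ask about."""
    STATUS = "status"
    TOTAL = "total"
    SHIPPING_ETA = "shipping_eta"


@dataclass(frozen=True)
class CheckOrderInput:
    """Flat, typed input for a hypothetical `check_order` tool."""
    order_id: int
    field: OrderField


def parse_check_order(raw: dict) -> CheckOrderInput:
    """Reject anything that is not exactly the two expected primitives."""
    unknown = set(raw) - {"order_id", "field"}
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")

    order_id = raw.get("order_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(order_id, bool) or not isinstance(order_id, int) or order_id <= 0:
        raise ValueError("order_id must be a positive integer")

    try:
        field = OrderField(raw.get("field"))
    except ValueError:
        allowed = ", ".join(f.value for f in OrderField)
        raise ValueError(f"field must be one of: {allowed}") from None

    return CheckOrderInput(order_id=order_id, field=field)
```

Because the tool never accepts a free-form string, there is nothing to pass on to a shell or query engine: an argument such as `"42; rm -rf /"` fails validation before any downstream call is made.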

Vocabulary

MCP server — A server implementing the Model Context Protocol, an open protocol that lets AI applications and agents connect to external tools and data sources.
Agentic AI — Artificial intelligence systems designed to autonomously perform tasks, make decisions, and interact with other systems.
Tool poisoning — A security vulnerability where malicious instructions are embedded into tool descriptions, which an AI agent may execute without visible warning.
Context window — The limited amount of memory or token capacity an AI agent has to process information and make decisions.
Attack surface — The sum of all points where an unauthorized user can try to enter data to or extract data from an environment.
Schema level — The structural definition of data, used to enforce validation rules and data types for inputs and outputs.
OAuth 2.1 — An authorization framework enabling applications to obtain limited access to user accounts on an HTTP service.
Dynamic Client Registration (DCR) — An OAuth specification that allows OAuth clients to register with an authorization server at runtime, rather than requiring manual pre-registration.
Proof Key for Code Exchange (PKCE) — An OAuth extension that mitigates authorization code interception attacks, especially for public clients without a client secret.
Client ID Metadata Document (CIMD) — An OAuth extension that allows clients to expose their metadata via a public URL, which the authorization server can fetch to verify client identity during authorization.
Token Exchange — An OAuth flow (RFC 8693) where a token is exchanged for another, often to obtain a more granular or scoped token for accessing a specific resource.
Role-Based Access Control (RBAC) — A security mechanism that restricts system access for users based on their role within an organization.
Data masking — The process of obscuring specific data with altered or random characters to protect sensitive information, while maintaining its structural integrity.
Observability — The ability to infer the internal state of a system by examining its external outputs, critical for monitoring and troubleshooting complex AI agent interactions.

Transcript

Hey folks, thank you for joining us for this session on why your insecure MCP server won't survive production. My name is Tun Shwe and I lead AI at Lenses. Day-to-day I'm an AI engineer, and you can connect with me here on LinkedIn. And I'm Jeremy Fene, I work on AI engineering at Lenses.
First, a quick note on where we work. Lenses is a data operating fabric that sits between your agents and any combination of Apache Kafka. Lenses is the de facto streaming data layer for providing trusted real-time context to agentic AI. Companies work with us because we keep governance, security and large scale top of mind. Here is a selection of our customers, which gives us exposure to lots of different industry use cases at large scale. And we are here today, of course, because we have an open source MCP server that we're applying our learnings from the field to. So please give us a star to follow the project.
Here's what we'll cover in this session. The takeaways we want you to have are ways of thinking about designing MCP servers, and in fact any interface, to make it robust for agentic AI systems. And since this is a talk about MCP, we'll ensure you have tips on how to approach securing your MCP servers for production. I'll cover the first few sections, and in his sections Jeremy will go over the OAuth flows.
So let's go straight into why most MCP servers aren't great. I like the way Jeremiah Lowin, who is the creator of FastMCP, put it. He said that agents deserve their own interface that is optimized for their use cases, and that we should approach designing for agents through a product engineering lens. I want to take that approach one step further. A badly designed MCP server is also a badly secured one: poor design and poor security compound each other. Jeremiah put forward three dimensions in which humans and agents differ from one another, to consider when you're designing for MCP or any agentic interface.
The extra layer I want to emphasize is security: each one casts a security shadow.
First is discovery. When you use a new API, you pull up the docs, you scan through them once, you find the three endpoints that you need and you never look at those docs again. An agent can't do that. Every time it connects to an MCP server, it enumerates every single tool and reads every single description, and that's expensive in tokens. But here's the security shadow. Every one of those tool descriptions is a surface for tool poisoning: attackers can embed hidden instructions inside descriptions that are invisible in the UI, but the model will follow them without question. More tools means more surface area for injection.
Second is iteration. If your script fails, you run it again; it takes a second. When an agent retries, it sends the full conversation history over the wire. And here's its security shadow. An agent iterating over a poorly scoped MCP server is broadcasting your data with every retry. The full conversation history goes over the wire, including any sensitive data returned by previous tool calls. Each round trip is a chance for data leakage.
Third, context. You and I have decades of memories, experiences and intuition. An agent has roughly 200,000 tokens and that's it. The security shadow is detailed in the OWASP MCP Top 10 list, which I recommend you all go and read, and there it's listed as number 10, context injection and oversharing. If your server dumps unfiltered data into that limited window, you're handing off PII, credentials and internal system details to a model that can be tricked into exfiltrating them. An agent has to load all the context in before it can make a decision. That makes it suitable for finding specific things, but it comes at the cost of latency and context bloat. So you can think of it as finding a needle in a haystack. If some of that hay is poisoned, the agent just won't notice.
So you should think about curation: curate the MCP tools available to the agent and aim to expose the smallest amount of information. The less you expose, the less can be attacked, and here, less is more.
Next I'll go over what I consider five key rules for secure agentic design: think with your product engineering hat on and apply it to MCP servers. The thing I want you to take away from this section is that good MCP design and good MCP security are the same discipline. If you get the design wrong, no amount of OAuth will save you. I've got five principles here and they all give you protection against the OWASP MCP Top 10 before you write even a single line of auth code.
So number one, shrink the attack surface by design. In terms of outcomes, the idea here is to squash all the fine-grained operations or underlying API calls into a single coarse-grained operation that produces a desired outcome. Every tool you expose is a door. Don't give the agent access to delete users when all it needs is to check an order. Consolidate related operations behind a single tool call with a well-defined outcome, so you have one permission check, one audit log entry, one place to enforce authorization. So think fewer doors with fewer locks to manage.
Number two, constrain your inputs at the schema level. Accepting top-level primitives such as enums is the best approach. Dictionaries are also fine as long as they're not nested, and to introduce more strictness you could use a typing library like Pydantic. The aim is to reject freeform nested payloads to avoid command injection flaws, where the root cause is almost always unconstrained string arguments that get passed downstream to a shell, a query engine or an API. Constrained inputs are easier to validate and harder to exploit.
Number three, treat your documentation as a defensive layer.
Tool poisoning is number three in the OWASP MCP guide and it works by embedding malicious instructions into descriptions that are invisible in the UI but executed by the model. If you don't write clear, complete instructions, an attacker-controlled tool description in a neighboring MCP server can shadow yours. If your documentation is complete and unambiguous for every tool, it crowds out the space that a poisoned neighboring server would try to fill.
Number four, return only what the agent needs. Oversharing data in responses is number 10 in the OWASP MCP guide and it turns the agent's context window into a liability. PII, internal identifiers, credentials and system details all sitting in the context are just one prompt injection away from exfiltration. So strip your payloads to the minimum: if the agent doesn't need a piece of data for its immediate task, then don't return it.
And number five, minimize the blast radius. Scope permissions at the tool and resource level, not the session level. Use the MCP read-only annotation for non-destructive tools so that clients can enforce boundaries. Or if an MCP tool is intended to have read-only access, then consider turning it into an MCP resource. Also remember that every tool you remove is an attack vector that you eliminate, and you're building an interface, not a tool. So this is the mindset to go in with: an agent will use anything you provide it with confidence so you have to provide that for us.
So now you've designed your server well, you've followed the five principles, and now you need to actually deploy it. And this is where most teams hit what I call the security cliff. If you're running MCP in standard IO mode, life is pretty comfortable. It's a local process, a single user, no network exposure, no authentication needed. Your MCP host talks directly to the server process on your machine.
It's a walled garden and it works beautifully for single-player developer productivity, but production requires something completely different. You need the streamable HTTP transport. This enables remote deployment, with multiple clients connecting to the same server. You can horizontally scale and you can centralize your governance. And this is really where MCP becomes genuinely valuable to an organization, where you go from one developer on one laptop to a shared capability that an entire team or an entire fleet of agents can use. MCP becomes the single interface that all clients can use without having to worry about whether they're on the latest version of an API or considering the resources needed to scale.
The problem is there's no gradual on-ramp. You go from zero security surface to a huge list of concerns all at once. You suddenly need token management, CORS configuration, TLS, rate limiting and more, and you need it all at once. So there's no halfway house, because you can't do a little bit of production. You're either behind the wall or you're standing out in the open. And you can't just stay local and hope for the best. Stacklok ran load tests on the standard IO transport and the results were brutal: 20 out of 22 requests failed with just 20 simultaneous connections. Standard IO falls over the moment you add concurrency. So if you want to scale out, you have to cross the chasm. And how do you start crossing that chasm? I'm going to hand it over to Jeremy to continue.
Yes, so implementing an authorization server for MCP isn't that simple. Let's look at the list of RFCs to implement: with the core OAuth flow, OAuth client discovery and metadata, and the management of the token lifecycle, we already have more than 10 specifications to implement. Now, let's say we read all these RFCs and I'm ready to implement an authorization server for MCP. What does enterprise-grade authorization look like?
So let's start by reviewing the local versus remote MCP server setups and their respective auth flows. Tun talked about the walled garden, the local MCP server running over standard IO with an API key. Let's look at the flow diagram. The MCP server runs on my machine. The client connects via standard IO. The user must set the key as a parameter in the MCP client config. And the parameter will be stored as an environment variable, passed by the MCP server with its requests to the external service. That might be good for local setups, but I need to provision, store and maintain the key. This key is long-lived, it's rarely rotated and it isn't scoped to the specific actions that my client performs. Even worse, these keys are often shared across systems. So the key is stored in a config file or an environment variable, and it isn't verified by the MCP server.
Now, let's look at a remote MCP server. In this case, the MCP server runs on a remote server. The client connects via HTTP. The user must set the key in the HTTP Authorization header. Again, we can see the MCP client config here on screen. So phase one is the generation of the token and the configuration of the client. In phase two, at runtime, we can see the client performing a request, attaching this API key in the Authorization header. This API key may or may not be validated by the MCP server itself, and it will be passed through to the upstream API where, this time, it will be verified. Depending on whether the API key is valid, we get a 200 response or a 401 response, in which case the user will need to rotate the token manually.
That's how the majority of remote MCP servers are configured today. The key is long-lived, and it isn't scoped to the specific actions of my agents either. The key is stored in a config file and it isn't always verified by the MCP server. Either the key is simply passed through to the API, creating a confused deputy vulnerability where malicious clients obtain authorization without the proper user consent.
Sometimes the key might be mapped to another key or token for the API access itself. Now we have a single shared credential serving many users, but that credential is even more powerful, harder to revoke per user and, if leaked, it compromises everyone. This approach works with long-lived, unscoped credentials. It still represents more than 50% of the MCP servers out there. What we see the ecosystem moving towards is short-lived, scoped tokens via OAuth 2.1. We even see token exchange for least-privilege access.
Traditional OAuth assumes you know your clients upfront. You register them in a developer portal, you get a client ID and you move on. This works when you have 5 to 10 apps connecting to your service. But with MCP, this flow breaks completely. Think about what MCP's architecture actually looks like. Any client (Claude Desktop, Cursor, VS Code, a CI tool, a random agent) can discover and connect to any MCP server at runtime. Pre-registration requires too much effort in a highly variable setting. It's an unbounded number of clients connected to an unbounded number of servers. You can't ask every developer to manually register their app with every MCP server they might ever want to talk to.
So that's where dynamic client registration comes in. In this case, we still have an MCP server running on a remote server, but now it's protected by an OAuth authorization server. The client can register itself against the authorization server, and we get a new client ID on every registration. So in phase one, discovery, our MCP client, in this case Cursor, will perform a request on /mcp against the MCP server. We can see the MCP server returning a 401 response because we do not have a token to pass yet. But it also passes the WWW-Authenticate header containing the resource metadata URL, which can be used by your client in order to discover the MCP server and its metadata. The document itself looks a bit like this.
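
The slide itself is not reproduced in this transcript, so here is a minimal sketch of what such a discovery exchange could look like, assuming the server follows OAuth 2.0 Protected Resource Metadata (RFC 9728). The hostnames `mcp.example.com` and `auth.example.com`, the scope name and the helper function names are placeholders chosen for illustration, not values from the talk.

```python
import json
import re
from typing import Optional

# Placeholder URL; RFC 9728 defines the well-known path suffix.
RESOURCE_METADATA_URL = "https://mcp.example.com/.well-known/oauth-protected-resource"


def unauthorized_headers() -> dict:
    """Headers a server could attach to its 401 response when no token is sent."""
    return {"WWW-Authenticate": f'Bearer resource_metadata="{RESOURCE_METADATA_URL}"'}


def protected_resource_metadata() -> dict:
    """Illustrative metadata document served at RESOURCE_METADATA_URL."""
    return {
        "resource": "https://mcp.example.com/mcp",
        "authorization_servers": ["https://auth.example.com"],
        "scopes_supported": ["orders:read"],  # hypothetical scope
        "bearer_methods_supported": ["header"],
    }


def extract_metadata_url(www_authenticate: str) -> Optional[str]:
    """Client side: pull the resource_metadata URL out of the challenge header."""
    match = re.search(r'resource_metadata="([^"]+)"', www_authenticate)
    return match.group(1) if match else None


if __name__ == "__main__":
    challenge = unauthorized_headers()["WWW-Authenticate"]
    print(extract_metadata_url(challenge))
    print(json.dumps(protected_resource_metadata(), indent=2))
```

The client reads the `authorization_servers` entry from this document and then fetches that server's own metadata to learn where to register and authorize.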
It describes the resource we're trying to access and the authorization server protecting it. This lets our client point at the authorization server itself and discover, this time, the metadata exposed by the authorization server. Now that our client knows how to authorize itself for MCP server access, it needs to register itself against the authorization server. That is done via a POST request to /register. As we mentioned earlier, the authorization server will generate and persist a new client ID and return it to the client. Now we know who we're talking to.
Next, it's time to authorize our client against the authorization server. And for that, the MCP spec mandates the use of PKCE, the Proof Key for Code Exchange protocol. So our MCP client first generates a code verifier and a code challenge. It then passes the challenge in a request to /authorize in order to obtain an authorization code. The authorization server will validate this request and the code challenge. And since we don't have a running session yet for this user, it will redirect the user to its identity provider. So that's your single sign-on form, in order for the user to log in. Upon successful login, the user will be redirected to a consent page where they can grant different scopes to their client.
Now that we've issued a valid authorization code for our client, it's time to use it in order to get a token, an access token. The client does that by sending a request to /token, passing the authorization code and the code verifier that we generated for the PKCE protocol earlier. The authorization server will validate the PKCE challenge and the authorization code, and it will then mint a brand new token. In this case, we are using JSON Web Tokens in order to return an access token for our client to now use the MCP server. The final step is actually to use the MCP server.
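
The PKCE step described above can be sketched in a few lines of standard-library Python. This follows the S256 method from RFC 7636; the function names are illustrative only and are not part of any MCP SDK.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier() -> str:
    """Client side: 32 random bytes give a 43-character verifier."""
    return _b64url(secrets.token_bytes(32))


def make_code_challenge(verifier: str) -> str:
    """Client side: S256 challenge sent with the /authorize request."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge from the verifier sent to /token."""
    return secrets.compare_digest(make_code_challenge(verifier), stored_challenge)
```

The client keeps the verifier to itself until the `/token` request, so an attacker who intercepts only the authorization code cannot redeem it.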
This is when your MCP client is going to perform a tool call. For example, we can see it will pass the access token that the authorization server just issued as the bearer value. Our MCP server will validate this token, check the valid scopes, and it will now perform the token exchange flow in order to exchange this delegation token for a session token. This means our MCP server is now actually an OAuth client for a new resource server or API. But it's using the exact same authorization server in order to get a token. So that is the token exchange flow that is defined in RFC 8693. And as we complete the flow, our MCP server can use this new session token in order to perform an API call by passing the token in the Authorization header.
So DCR enables self-registration, the dynamic registration of the client, so that our user doesn't have to go and pre-register, pre-generate static credentials and set them on that client. But it does have its own problems. First, every time a user connects a client to an MCP server, a new registration is created. Registrations are not portable, so using Claude on Windows and then on macOS creates two distinct client registrations. DCR is vulnerable to phishing attacks because it doesn't provide a reliable way to verify client identities. Anyone can post to that endpoint, the /register endpoint, including attackers. Finally, the server is just trusting whatever metadata the client self-asserts. It means a malicious client can claim to be Claude and the server has no way to know otherwise.
So the MCP community had to come up with a better way to let clients self-register. And that is CIMD, the Client ID Metadata Document. In this case, we still have an authorization server in front of our MCP server, but the client owner exposes the client ID on a public URL. This will let the authorization server fetch the client's metadata during authorization. Let's have a look at the diagram.
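
The diagram is not available in this transcript. As a rough stand-in, the sketch below shows the kind of checks an authorization server might run on a client ID metadata document once it has fetched the JSON from the client's URL. The fetching itself is left out, and the allowlist, hostnames and function name are hypothetical; a real deployment would follow the CIMD specification and its own policy.

```python
from urllib.parse import urlparse

# Hypothetical policy: hosts this authorization server is willing to trust.
ALLOWED_CLIENT_HOSTS = {"client.example.com"}


def validate_client_metadata(client_id_url: str, document: dict, redirect_uri: str) -> None:
    """Raise ValueError unless the fetched document is acceptable.

    `document` is assumed to be the already-fetched JSON from `client_id_url`.
    """
    parsed = urlparse(client_id_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("client_id must be an https URL")

    # Selectively allow or deny clients based on who controls the URL.
    if parsed.hostname not in ALLOWED_CLIENT_HOSTS:
        raise ValueError("client host is not on the allowlist")

    # The document must name the same URL it was fetched from.
    if document.get("client_id") != client_id_url:
        raise ValueError("client_id in document does not match its URL")

    # Redirect URIs are bound to the client through its own document.
    if redirect_uri not in document.get("redirect_uris", []):
        raise ValueError("redirect_uri is not listed in the metadata document")
```

A call such as `validate_client_metadata(url, fetched_json, requested_redirect)` would run before the user is sent to the identity provider, so a client that cannot host a document on an allowed domain never reaches the consent screen.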
So phase one is still discovery: our client hits the MCP server without a token and gets a 401 response and the resource metadata URL. It can follow this URL, discover the MCP server and then discover the authorization server. But this time the authorization server isn't saying that it needs a /register request. It means the MCP client can go straight to the authorization phase. We generate the PKCE code verifier again and we perform an /authorize request. But this time, our client passes its unique ID, and we can see it here. It's actually a valid URL where the metadata for the client is being exposed. This lets our authorization server fetch this metadata and register a new client with a unique ID, which is the URL exposed by the client owner, and we can move to the authentication phase. Again, the authorization server will redirect to the identity provider, wait for a valid login on our user's side, and present a consent screen for the user to grant some scopes. And we are ready to issue the delegation token and the session token, the token used by the MCP server.
So with CIMD there is no growing database of client registrations to maintain. Proving that you control https://claude.ai is meaningful, unlike proving that you can post to the registration endpoint. The redirect URIs are explicitly bound to the client in its metadata document, making it harder for attackers to sneak in malicious callbacks. And the authorization server can selectively allow or deny clients. So in summary, DCR is a good start, but it does create problems. CIMD is a leap forward and it has been the preferred approach since November 2025.
But becoming enterprise-grade requires adding other layers of security and confidence. For permissions, OAuth scopes get you part of the way there, but they're scoped to the session. True enterprise-grade role-based access control means scoping permissions at the individual tool and resource level, not just the session.
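
To illustrate what tool-level scoping could mean in practice, here is a minimal sketch of a per-tool permission check. The role names, tool names and the `ROLE_TOOL_PERMISSIONS` table are invented for this example; in a real server the roles would more likely come from validated token claims and a policy store.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical mapping from role to the tools that role may invoke.
ROLE_TOOL_PERMISSIONS = {
    "support_agent": {"check_order"},
    "admin": {"check_order", "delete_user"},
}


@dataclass(frozen=True)
class Caller:
    """Identity resolved from the access token (assumed already validated)."""
    subject: str
    roles: frozenset


def is_allowed(caller: Caller, tool_name: str) -> bool:
    """True if any of the caller's roles grants access to this specific tool."""
    return any(tool_name in ROLE_TOOL_PERMISSIONS.get(role, set()) for role in caller.roles)


def call_tool(caller: Caller, tool_name: str, handler: Callable, **arguments):
    """Check permission for this tool on every call, not once per session."""
    if not is_allowed(caller, tool_name):
        raise PermissionError(f"{caller.subject} may not call {tool_name}")
    return handler(**arguments)
```

Checking at the point of each tool call means a session that was legitimately opened for `check_order` cannot be steered by a prompt injection into calling `delete_user`.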
Data masking is how you deal with PII fields such as email, phone and national insurance numbers. They may need to be masked before the agent sees them, because agents should never be exposed to data that they have no business handling. You will need to log what's happening in each interaction: which agent called which tool, with what parameters, and what data was returned. For compliance with regulations such as the EU AI Act, regulators will expect this level of transparency and detail for autonomous AI systems. Finally, you'll need to be able to observe the full request. This means the client request, validation, tool execution, data retrieval and the generated response. If you cannot trace what an agent did end to end, you cannot govern it. Tracing for agentic AI follows the same principles as distributed systems observability, but applied to autonomous decision making.
Thanks very much Jeremy, and thank you all for tuning in to this session. We'd love to know how your journey with productionizing MCP servers is going. So please leave us a comment or send us a message, here's where to find us, and please do check out our MCP server and give us a star. So hopefully we'll see you again soon. Thanks and bye. Thank you.
