Architecting for model step-changes: A fireside with Vercel's Guillermo Rauch

Vercel's mission has evolved from optimizing developer experience for web frontends to building "agentic infrastructure" and empowering AI agents as new "superpowers" for creation.
The company champions an "AI software factory" approach internally, where teams use and build AI agents to automate workflows and generate new tools, significantly boosting productivity.
Future interactions with AI are expected to shift towards less human-supervised, asynchronous agents that can autonomously complete complex tasks, potentially leading to the emergence of "autonomous companies."

Focus on Agentic Infrastructure: Vercel is dedicated to being the leading platform for deploying agents and to transforming the cloud itself into an agent that can self-heal, self-optimize, and self-configure.
Cultivate an "AI Software Factory" Culture: Encourage internal teams to leverage AI agents (like V0 and Claude Code) to build their own tools, automate processes, and enhance productivity by providing resources and a safe environment for experimentation.
Prioritize Agent Ergonomics and Developer Experience: Design interfaces and workflows specifically for building and interacting with AI agents to minimize friction between an idea and its realization.
Simplify Architectures with Smarter Models: As AI models become more intelligent, re-evaluate and simplify existing codebases that might contain compensatory logic (e.g., auto-fix pipelines) for previous model limitations.
Empower Agents with Sandboxes: Provide isolated "sandboxes" for agents to execute code, test hypotheses, and problem-solve creatively, ensuring security while allowing for emergent behavior.
Arm Agents with Human-Like Tools: Equip agents with CLI tools and browser capabilities (e.g., agent browser) to inspect outputs, debug, and learn, fostering a less micromanaged approach to agent supervision.
Anticipate Asynchronous Agent Interactions: Prepare for a future where interactions with agents are predominantly asynchronous, with agents operating based on defined "exit criteria" and reporting back upon task completion.
Treat AI Tokens as Core Infrastructure: Recognize AI model tokens as a fundamental new "raw material" for building, and empower all employees—regardless of role—to creatively use these resources to shape new solutions.
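
The "Simplify Architectures with Smarter Models" takeaway can be pictured as a post-processing pipeline whose compensatory steps are switched on per model. The sketch below is only an illustration under stated assumptions: `ModelProfile`, `needs_autofix`, and `close_unbalanced_braces` are hypothetical names invented here, and nothing in it reflects how V0's actual pipeline is built.

```python
from dataclasses import dataclass
from typing import Callable, List

# A pipeline step takes generated source text and returns (possibly repaired) text.
Step = Callable[[str], str]


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of a model; both fields are illustrative."""
    name: str
    needs_autofix: bool  # True for models that still make frequent syntax slips


def close_unbalanced_braces(source: str) -> str:
    """Toy auto-fix: append any closing braces the model forgot to emit."""
    missing = source.count("{") - source.count("}")
    return source + "}" * max(missing, 0)


def build_pipeline(profile: ModelProfile) -> List[Step]:
    """Include compensatory steps only for models that still need them."""
    steps: List[Step] = []
    if profile.needs_autofix:
        steps.append(close_unbalanced_braces)
    return steps


def run_pipeline(source: str, steps: List[Step]) -> str:
    """Apply each step in order; an empty pipeline returns the text unchanged."""
    for step in steps:
        source = step(source)
    return source
```

Keeping each workaround behind an explicit flag makes it cheap to rerun evals with the step removed when a stronger model arrives, and to delete the step entirely if quality holds.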

Developer Experience (DX): The overall quality of the experience a developer has when interacting with a product, platform, or tool.
Agentic Infrastructure: Cloud infrastructure specifically designed to support the deployment, management, and scaling of AI agents, potentially becoming an agent itself.
AI Software Factory: A paradigm where AI agents are used to build, automate, and accelerate the entire software development and deployment process, often creating other tools.
V0: Vercel's AI product, an AI-powered design and code generation tool for front-end development.
Claude Code: Anthropic's AI agent tool, mentioned for its ability to automate computer operations and code generation within a sandbox.
Sandboxes: Isolated, secure environments where AI agents can execute code, test hypotheses, and perform actions without affecting the host system.
AI Gateway: A service (like Vercel's) that acts as a proxy or CDN for AI model tokens, aggregating usage and providing management features.
Evals: Short for "evaluations," a systematic process of testing and benchmarking AI models to assess their performance, capabilities, and improvements.
Exit Criteria: Specific conditions or outcomes that, when met, signal that an asynchronous task or agentic process is complete.
Autonomous Companies: A concept where AI agents perform most or all operational functions of a business with minimal human supervision, with humans acting much like board members overseeing a CEO.
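
The "Exit Criteria" entry above can be sketched as a loop that keeps invoking an agent step until a caller-supplied check passes or an iteration budget runs out. This is a minimal illustration, not the implementation of any tool named in the talk: `run_until_exit`, `TaskResult`, and `fake_agent_step` are hypothetical names, and the agent is replaced by a stand-in function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskResult:
    done: bool              # True only if the exit criterion was satisfied
    iterations: int         # number of agent steps that were run
    output: Optional[str]   # last output produced by the agent step, if any


def run_until_exit(
    step: Callable[[int], str],
    exit_criterion: Callable[[str], bool],
    max_iterations: int = 10,
) -> TaskResult:
    """Call `step` repeatedly until `exit_criterion` accepts its output.

    The iteration cap is the safety net: an unattended loop should always
    have a budget so it cannot run forever.
    """
    output: Optional[str] = None
    for i in range(1, max_iterations + 1):
        output = step(i)
        if exit_criterion(output):
            return TaskResult(done=True, iterations=i, output=output)
    return TaskResult(done=False, iterations=max_iterations, output=output)


def fake_agent_step(iteration: int) -> str:
    """Stand-in for a real agent call; it 'succeeds' on the third attempt."""
    return "reproducible bug found" if iteration == 3 else "still searching"


result = run_until_exit(
    fake_agent_step,
    exit_criterion=lambda out: out == "reproducible bug found",
)
```

In a real system the exit criterion would be something verifiable, such as a test that reproduces the reported bug, so the person who launched the task only has to review the final report.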

Welcome to the stage the CEO of Vercel, Guillermo Rauch, and head of product, Claude Platform, out of Anthropic, Angela Jang.
G, thanks for joining us.
It's great to be here, IRL.
Yes, G has sent me many requests for things over text. So now we can do this in person in front of a bunch of people. All right, folks. So I'm Angela. I'm from Anthropic, and everyone knows G. Incredible, incredible thinker, builder, creator of some of the most popular technologies. And I think it's been really incredible to actually see Vercel become, I think, a place and a provider of technology that so many startups and so many creators and builders have come to, to actually build their agents, build their products, experience a lot of the kind of more, I think, like, AGI-pilled types of experiences out there. I actually would love to maybe kick us off with your point of view on how you've seen Vercel transform. And I know you've publicly talked a little bit about, or actually a lot about, how AGI is super transformative to technology, super transformative to the way that we build these things. And maybe give your perspective on how Vercel sees that and how you take that in internally as well.
Yeah, it's interesting because when Vercel was born, the idea was to remove any friction between an idea and bringing it online.
Love that.
And the tools that we had to do that were, I mean, maybe to summarize it, it was developer experience. Many people credit us with introducing the developer experience obsession to cloud infrastructure. And the mission was to just make it more generally available. Like if you only knew front end, for example, React was the bet that we placed for the most part of the time. If you could only learn JavaScript and React, now maybe you can wield infrastructure that before was only available to the massive mega, MANGA or FAANG or whatever you call them these days. Those kinds of massive corporations.
And fast forward to today, I think we're still focused on this idea of bridging idea to reality, but we have these new amazing superpowers. It's like you've been playing an RPG game and super weapons dropped. And that's agents and AI. So I spend a lot of my time these days thinking about agent ergonomics and the developer experience for agents. And we're now living in a reality where that group of people that could deploy to the cloud is infinitely bigger, right? With tools like V0 and Claude Code, you know, you hear every single day of your life now that everybody can ship. So it's been an accelerant of our mission, but it's really changed how we think about building. And the thing that we're narrowly focused on building now is what we call agentic infrastructure. So being the best partner infrastructure to tools like Claude Code, helping you deploy agents, and then also turning the infra itself into an agent. Meaning imagine if the cloud itself can self-heal, self-optimize, self-configure, and so on.
That's incredible. I think one of the things that really stuck with me, among some of the concepts that you've kind of described publicly, has been the idea of sort of an AI software factory. And I know V0 has been, many of us know you for incredible products that you put out there for builders, but also even internally. I think you guys are incredibly agent-pilled and very, very fast-paced and very innovative in that space. And we'd love to kind of hear how you guys create agents internally, even for yourselves in your own workflows.
Yeah, we were very agent-pilled because we lived through the experience of using agents and becoming more productive ourselves. One of the core theses of our product development philosophy is we try a lot of things internally at Vercel, and whenever something works, we become big advocates. We build around it.
So I remember, for example, when I first got access to the Claude Code preview release, Mikey invited me to it. And I grew very strong conviction that it wasn't just going to be, for example, front-end engineering. It wasn't just going to be auto completion in a code editor, that this was fundamentally a new way of even automating your computer and your operating system, that the CLI was a very beautiful layer of abstraction to sort of agentify everything. And what I did was I told the entire company, unlimited token budget, go and use all of these tools. And one of the things that's happened since then, that I think has surprised me, and that has formed and crystallized this idea of the AI software factory, is that people have used tools like Claude Code and V0 to build their own tools.
Yeah.
For example, when Ralph Wiggum became a thing, I remember walking past one of our engineers at the office, and I was like, what is that? What is that software that you have running? He was like, oh, I created my own sort of little AI coding environment. So I experienced this idea that if software development costs go down, people are going to reassemble all of these blocks and give birth to new tools and new ways of scaling their own productivity. And then I saw that with our design team. I've made this public on X, which has been kind of astonishing. We have an internal tool called Leap, where designers at Vercel decided, I'm going to stop just reacting to Slack requests for design. I'm going to instead put out a tool, an internal tool, that automates a lot of their work and produces beautiful artifacts of work. So we kind of have now a design factory of sorts. And earlier this week, we kind of open sourced our security engineering factory, or I call it the QA and security check of the production line of software.
And so we're now firmly in this world in which not only can you just ship software, but now you can also create the most ergonomic tools possible for your own team. And I call this sort of the software factory.
That's incredible. And I'm really curious, just to kind of follow up on that, if you see a lot of this kind of creativity internally as a byproduct of the culture that you've created, or do you see it as just the fact that all these tools are so easy to reach for, and so really just the imagination of all these folks is what's driving it?
Yeah, I think part of it is cultural, because everyone that comes to Vercel is highly motivated by this mission of creating the best possible tools in the world.
That's awesome.
And I can't see a world where the best possible tools in the world today are not involving agents in some way, shape, or form. For example, for DeepSec, the idea is that it's not even just a tool where you are making your own code review faster, but the fact that you can spin up sandboxes in the cloud and automate your work to a degree that you never would have been able to do by hand. We're talking about running thousands of sandboxes in parallel, testing out different hypotheses. So I think there is the intersection between developer experience and the fact that Vercel builds agentic infrastructure. So I think people are finding creative ways to use this infra. We're talking about sandbox, workflows, AI Gateway. We encourage people to test out multiple models. And so I think, if I were to share the recipe, I think a big part of this is creating a safe sandbox environment where people can create and deploy their own tools. And obviously, this is the perfect partnership between, for example, Claude Code and the Vercel plugin. Say I just gave you an amazing agentic engineering tool, but the code lives and dies in your own computer, and there's no mechanism to share it with your colleagues.
There's no mechanism to deploy it securely, to scale it. Then you're just kind of dying at the prototype stage. But anyone that touches these tools wants to see their ideas come to life and hit production.
Yeah, completely agree with that. And super big fan of Vercel sandboxes. All right, well, maybe we can take a step back in time a little bit. And I'm curious if you can take us back to the day Opus 4.5 dropped. V0 is, I think, known for incredible speed, beautiful production experiences. And you guys built in support the same day. I really wanted to get a sense from you of what "ready" really means for you as you build in the months ahead to make that day-one experience so possible, so quick for users.
Yeah. I have a lot of lessons learned from the astonishing success of just upgrading the model, which obviously, "just upgrading the model," I'll go into detail, is not as easy as it sounds. But to give you some context, Vercel AI Gateway is sort of this, I call it, the CDN for tokens. And it aggregates a lot of the token usage of the millions and millions and millions of Vercel customers. And Opus tokens represent something like 20-something percent of usage on AI Gateway. But they're actually upwards of 70-something percent of all spend on the AI Gateway. By looking at the data that I have from so many different customers, the lesson that stands out to me is people really chase the best intelligence they can find. When Opus came out, we had this question of what is going to be the V0 default model? How quickly do we upgrade? Do we give it to everybody? And so it was a combination of, one, running the evals to understand the product is getting better. One of the surprises with Opus was actually that we could simplify the code base. We had added and documented a pretty extensive pipeline that did a lot to enhance the intelligence of previous versions of models, including, for example, we had a step of looking at the output and performing auto fixes.
Because earlier versions of models were making a lot of syntactical mistakes. And they would produce errors in the application, or they would slow down the agent loop. And so one of the things that was striking to us is because this model is smarter, we could simplify the architecture of the product. The other thing was just superior taste. So I remember giving a talk to the company of like, look, anytime I see AI get better, I update priors. And one of the priors that I had to update was, can models actually produce tasteful outputs? And I think it was probably a fix in the model from like purple outputs to darker, Vercel-style outputs. But we also found that the model was very malleable. So one of the value adds that we found with V0 is that we can really infuse a lot of the best practices that we've learned over the last 10 years of design, of our own aesthetics, of what makes for a better product. And so we kind of added that onto the model. And then we do a lot of A/B testing. So one of the capabilities of Vercel's infrastructure is, we make flags and experiments a first-class citizen of the platform. So we're always testing things out. And so I remember ramping up Opus really fast because the results were astonishing. I believe that since the beginning of the year, since our most recent Anthropic model upgrade, this is actually kind of an astonishing fact: credit spend on the product is up by 2x.
Wow.
And this is a signal of not just like, oh, the outputs are amazing or it's more beautiful outputs, et cetera. It's also that you can go a lot further with more intelligent models, meaning that the ambition of every creator is to make their product as complete as possible. Production ready, secure, high quality, bug free, but also full stack. And so as we've been collaborating with you all and rolling out model upgrades, we've seen that the products that we create can become more complete and more ambitious.
That's awesome. My team loves V0.
They actually actively go to V0 first to get like a beautiful experience. They'll iterate there before they kind of bring it down.
One third of our signups, total signups, now come from V0 because it's really opened and widened the aperture of who can build, or at least who can get started building. And nowadays, we find that at any tech organization, the developer-adjacent personas have seen a ton of upside because in the past, they really weren't able to contribute directly to software. Now everybody in the company can say, here's my proposal, or here's my improvement, or here's the tool that I'm building.
Yeah. You mentioned this a little bit about the models with more intelligence. You're able to go a bit further, with this kind of completeness in your product. And you've been talking about sort of cleaning up your code a bit to allow the model to sort of almost like breathe a bit. I'm curious, from maybe the V0 perspective or alternatively other agents that you've built, where have you kind of over-engineered for maybe the current generation of models? Maybe with some examples in that area, and how you kind of see that maybe changing or improving as future generations of models come forward, in order for you to kind of get sort of that more completeness to the end user.
Yeah. I think there's been a number of things. One is how many tools we would give the model, right? We were trying to sort of maybe at times over-engineer a little bit in terms of building very specific sub-agents or building very specific tools. When in reality, and I think this has also been a thing that surprised me about Claude Code, the model in combination with the sandbox can get so creative. Sometimes it gets almost like too creative. You read the commands that it generates and you're like, what did it just do, right? But that idea that the model or the agent can produce its own code to creatively solve a problem. So it's kind of crazy, right?
It's not just producing the end result code of the application that the user wants to write. The model and the agent can write these intermediate steps to arrive at the right solution or to debug the output. And so one of the big leaps in capability for us has been just embracing this idea of giving each agent its own computer. So the importance of the sandbox. And why the sandbox, right? It's because if the model is writing code, it can get kind of nutty. Like it could be any code. And so I think where we're now engineering more is around tool approvals. It's around creating the right security guardrails. I think there's a lot to be said about finding that sweet spot between security and operator oversight, and also not annoying the end user with, like, approve, approve, approve. Sometimes even asking the user to approve commands that they don't understand. Here's a Perl inline script. Please approve it, right? And so I think finding that balance. So the sandbox and the computer for every agent has given us a tremendously big capability because we're able to reduce the number of fixed tools and just let those tools emerge as part of the agentic process. This is also true about problem solving. I think in the early versions of these products, I was very much of the mindset that the human engineer was the one that needed to be prepared to map out and confront all of the error scenarios that your AI and agent tool can get into, instead of, again, letting it emerge with novel solutions. So a good example here would be we created this little CLI called agent browser. And so now, V0 has the ability to write code. It knows really well how to use Next.js. And then if it finds that something is broken, it can use the browser to take a look at the output. It can take screenshots of it. It can read the developer logs. So it's a little bit like arming the agent with the same tools that a human would have and not being too much of a micromanager.
It's maybe a lesson in engineering management in general. You don't want to be the manager that's like, hey, did you check this? Just let the model breathe, to your point. These are high-level things. The other one is embracing the CLI. So I mentioned agent browser is a CLI. It's amazing to see the agent learn new tools that are not in the training data. So agent browser, which is this particular program that we give the agent, it didn't really know how to use it. We have since complemented it with skills. Obviously at Vercel, we're very skills-pilled. We found that it helps models get faster to their intended results and so on. But yeah, seeing that emergence of, hey, the agent has a sense of how these tools click together, and it can embark on very creative problem-solving.
Yeah, absolutely. All right, well, I'm curious for maybe some views that you might have. Maybe where you have opinions on where the models are going that maybe some of your peers might potentially disagree with you on. I know you have some spicy takes every now and then, so curious for one here.
OK, yeah, so I think this is probably a consensus, that clearly agents need less and less and less supervision. So we're experimenting with a Slack integration that launches V0 tasks directly from Slack. We've seen a lot of our colleagues and peers build their own agents that they invoke themselves kind of asynchronously. So maybe we're seeing this shift from more synchronous to asynchronous. Broadly speaking, I think all of the interaction modes with humans and agents will continue to exist in some way, shape, or form. For example, I personally really like CLIs when I'm doing something that's very specific. I'm problem-solving. I need to get a debug log. I need to look at it. I need to use a bunch of tools that I already know exist on my machine. I use a CLI.
Then there's this other interaction model that V0 is really good at, which is you're working on the interface of your product, or it's almost like UI-driven development, where you need really fast back and forth between the front-end output and your prompts. And the human operator is really engaged almost in a creative process. That one, almost instead of going longer, I feel like it's shrinking. For example, I'm kind of addicted to V0 Max Fast, which is backed by Claude Code Fast Mode. It's going nuts. It's really expensive. So good news for Anthropic. But, like, the focused state that you fall into is pretty amazing. And this is because that kind of interaction model with agents requires these really fast loops. And I feel like a lot of customers tell me it's replaced their design tool. This is perhaps a little spicy because even at Vercel, there's a lot of designers that love their more like drag-and-drop, artboard-y tools, and whatnot. And by the way, I also love those tools. I always tell people it's kind of shocking, but when I created the Vercel deployment CLI and the first version of Next.js, I didn't start with code. I designed it in Sketch.
Really?
The predecessor of Figma. I designed all the states of the CLI. Like deployment failure. So I've been very design-tool-pilled. But I've seen how, and customers tell me, that V0 kind of replaces those. And then the last one, the one I think will see the most growth in the future, is this more asynchronous, less human-supervised model. Where you launch your task and you're like, agent, come back to me with the solution. And obviously, we know this is Ralph, with the exit criteria. But for example, our CTO created this awesome tool, DeepSec, where we throw Claude Code and Codex at massive code bases in parallel using sandboxes. And the exit criteria is: come back to me with a reproducible security bug. And the results of it have been shocking.
We've now collaborated with nearly a dozen open source projects where we've surfaced critical vulnerabilities. And this process is just kind of magical. It's kind of funny because I sit in front of my CTO at the office and he's kind of like fidgeting around, kind of semi-distracted, because these agents are running in the background. And so this is an entirely new mode of, I think, human-computer interaction, and I think it's going to grow a lot. The other one that is maybe even more spicy is, I mean, the goal of that unsupervised agent is to build your software. But it doesn't have to stop there. It could be promoting your software. It could be running ad campaigns. It could be responding to support requests. So I do think there is a future ahead of us where, instead of just building software, we take it a step further. It's like we're building autonomous companies and we're building autonomous organizations. And again, it's less supervised. And I think when you see the excitement around things like OpenClaw and the text-message-driven agent development, you're just using your text messenger as a way of sort of delegating tasks to your agent. And you're kind of expecting that they're going to take a while. And again, this is a completely different paradigm from, I think, how we've all used computers in the past, which is like we expect that instant gratification. But instead, we're just sending off these agents to work. I used the metaphor the other day, because there's a bunch of these autonomous companies that are growing really fast on Vercel, or autonomous company builders. And it's a little bit like you're the board member. Think about being a board member of a company, right? You get your quarterly board meeting. You give feedback to the CEO. You give them resources. Maybe you're an investor. You give them capital. And they run off and do amazing things in the world, hopefully. And then come back and give you a status report.
So you're kind of like looking at the world one quarter at a time. And I think one could extrapolate that kind of engagement model with agents. I'm going to give you some resource. I'm going to give you a virtual credit card. I'm going to give you this. Check back with me. Let's see what you did in a week, month, quarter at a time.
Yeah, absolutely. I'm curious on some of these kind of takes that you have. And clearly you're seeing where some of these interaction models will kind of change. How are you at Vercel sort of architecting for that kind of change? You've obviously talked a lot about sandboxes and a lot of your perspective there. But as you kind of project forward, and what you would maybe anticipate from the next couple of releases of models, how do you prepare for that?
Yeah, one is continuing to empower people. Every time you joined Vercel, we would have you deploy something. It didn't matter what your role was going to be, engineer or not, we'd have you deploy something. Get a taste. We would even have people sign up for a GitHub account. Maybe the first and last git commit they ever make, for some roles. And so I think it's always been about empowering people and helping them realize that they have no ceiling. They can use these tools. And I try to lead by example. I have a Slack channel where it's like my brain dump, literally called brain route G. And I try to show the company all the time how I'm using different tools, image models, video models, how I've coded my own little productivity tools, how I use operating-system-level AI. So trying to be really dynamic about knowledge sharing on best practices, and also empowering people from even a capital perspective. I don't like the idea of a token leaderboard, who burns the most tokens the fastest. But I do like the idea of, like, and by the way, there's a parallel between the consumption model of Claude tokens and what the cloud did for us.
Like when Vercel started, it's pretty magical that I could just sign up for an AWS account and then do magic with it. And I think there's something to be said about tokens being almost like the new infrastructure, because you tell people, all right, you know, you have all these tokens. Some are super smart, maybe super smart and super fast like fast mode. I would be a little bit more careful with those. Some are really inexpensive, like open models by certain inference providers. And so knowing that these are your new raw materials, your new clay to shape up new solutions. And I think realizing that everybody at the company, not just the CTO, can do these things has gotten us a long way.
That's incredible. Well, I think Vercel is an incredible company. And it's really wonderful to have you here and for your perspective on how these teams can get better and how you're building agents and empowering so many builders to do this. It's been great to partner with you guys. Thank you so much.