- Vibe coding is embracing AI to generate code to the point of "forgetting the code exists," a necessary shift for leveraging AI's exponentially growing capabilities in software development.
- To responsibly implement vibe coding in production, developers must transition from reviewing every line of code to managing and verifying work at a higher abstraction layer, similar to how expert managers oversee tasks without deep implementation knowledge.
- Successful vibe coding involves acting as a "Product Manager" for the AI by providing thorough context and requirements, while strategically applying AI-generated code to "leaf nodes" in the codebase to contain potential technical debt.
Vibe coding in prod | Code w/ Claude
- Redefine "Vibe Coding": True vibe coding means fully embracing AI code generation to the extent that you "forget the code exists," rather than merely using AI tools for assistance with a tight feedback loop.
- Embrace AI's Exponential Growth: Recognize that AI's capacity for generating code is improving exponentially, necessitating a departure from traditional line-by-line code review to avoid becoming a bottleneck.
- Manage by Abstraction, Not Implementation: Verify AI-generated code at a higher abstraction layer (e.g., acceptance tests, stress tests, product usage) without needing to understand all underlying implementation details, akin to how senior managers oversee expert work.
- Target "Leaf Nodes" for AI Code: Concentrate AI-generated code on "leaf nodes" in your codebase, the self-contained end features that nothing else depends on, to limit the impact of potential technical debt and preserve the integrity of core architectural components.
- Act as the AI's Product Manager: Provide comprehensive guidance, requirements, specifications, and codebase context to the AI, dedicating significant upfront effort (e.g., 15-20 minutes, often spent in a separate back-and-forth conversation with the AI, to assemble a single detailed prompt or plan) to ensure higher success rates.
- Design for Verifiability: Structure systems to have easily verifiable inputs and outputs, allowing for the creation of stress tests and checkpoints that confirm correctness and stability of AI-generated components without deep code inspection.
- Leverage AI for Codebase Exploration: Use an AI tool to quickly understand an unfamiliar codebase by asking it to identify relevant files, classes, and similar features, thereby building a mental model before starting development work.
- Implement Test-Driven Development (TDD): Encourage the AI to write minimalist, end-to-end tests first, which can serve as a primary point of human review to validate functionality and build confidence in the AI-generated implementation.
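The TDD takeaway recommends asking for a few minimalist, end-to-end tests rather than implementation-specific ones. Below is a minimal sketch in Python, assuming pytest-style test functions. `run_command` is a hypothetical stand-in for an AI-generated feature's public entry point (it does not come from the talk and is included only so the tests have something to run against), and the three tests mirror the speaker's "happy path, this error case, and this other error case" prompt. Because they touch only inputs and outputs, a human can review them without reading the implementation.

```python
from typing import Tuple


# Hypothetical stand-in for an AI-generated feature; in practice this is the
# code you would not read line by line. Only its entry point matters here.
def run_command(line: str) -> Tuple[int, str]:
    """Sum one line of comma-separated numbers; return (exit_code, output)."""
    if not line.strip():
        return 1, "error: empty input"
    try:
        values = [float(part) for part in line.split(",")]
    except ValueError:
        return 2, "error: non-numeric value"
    return 0, f"{sum(values):g}"


# Three minimalist end-to-end tests (collected by `pytest`): they exercise only
# the public entry point, so a reviewer can judge them without the implementation.
def test_happy_path():
    assert run_command("1,2,3.5") == (0, "6.5")


def test_empty_input_is_rejected():
    assert run_command("   ") == (1, "error: empty input")


def test_non_numeric_value_is_rejected():
    assert run_command("1,abc,3") == (2, "error: non-numeric value")
```

Reviewing three tests like these is the "first part of the code I'll read" step from the Q&A: if you agree with them and they pass, you have a checkpoint that does not depend on the implementation details.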
- Vibe Coding: A development methodology where AI is extensively used to generate code, allowing developers to focus on product outcomes rather than implementation details, to the point of "forgetting the code exists."
- Prod: Short for "production," referring to the live, operational environment where software applications are deployed and used by end-users.
- Anthropic: An AI safety and research company that develops large language models, including Claude; the speaker's employer.
- Claude: An AI assistant and family of large language models developed by Anthropic, capable of code generation, analysis, and conversational interaction.
- Cursor: An AI-native code editor designed to integrate AI capabilities directly into the coding workflow for enhanced productivity.
- Copilot: GitHub Copilot, an AI pair-programmer tool developed by GitHub (originally built on OpenAI models) that provides code suggestions and auto-completions based on context.
- Exponential: A rapid and accelerating rate of improvement or growth, used in the context of AI capabilities dramatically increasing over time.
- Abstraction Layer: A conceptual boundary that hides the complex underlying details of a system, allowing interaction at a simpler, higher level.
- Tech Debt: Technical debt; the long-term cost incurred by choosing quick-fix, suboptimal solutions in code, which later requires more effort to refactor or maintain.
- Leaf Nodes: In a software architecture, the outermost components or features that no other part of the system depends on, making them suitable areas for containing experimental or AI-generated code.
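The leaf-node idea can be made concrete on a module dependency graph: a leaf is a module that no other module depends on. The sketch below assumes the graph is already available as a mapping from each module to the set of modules it imports (how you extract that is left open), and every module name in it is hypothetical.

```python
from typing import Mapping, Set


def find_leaf_nodes(deps: Mapping[str, Set[str]]) -> Set[str]:
    """Return the modules that no *other* module depends on.

    `deps` maps each module to the set of modules it imports.
    """
    all_modules: Set[str] = set(deps)
    depended_on: Set[str] = set()
    for module, targets in deps.items():
        all_modules |= targets             # include modules seen only as targets
        depended_on |= targets - {module}  # ignore self-imports
    return all_modules - depended_on


# Hypothetical dependency graph: the "core.*" modules form the trunk.
deps = {
    "core.storage": set(),
    "core.models": {"core.storage"},
    "api.routes": {"core.models"},
    "features.export_csv": {"core.models"},
    "features.admin_dashboard": {"api.routes", "core.models"},
}

print(sorted(find_leaf_nodes(deps)))
# -> ['features.admin_dashboard', 'features.export_csv']
```

Everything not returned (here the core modules and `api.routes`) is trunk or branch code that the talk suggests keeping under close human review. The graph captures structure only; whether a leaf is also unlikely to change, the other property the talk relies on, still takes human judgment.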
Hey everyone, welcome. I'm here to talk about everyone's favorite subject, vibe coding, and, somewhat controversially, how to vibe code in prod responsibly. So let's talk about vibe coding and what this even is. First of all, I'm Erik. I'm a researcher at Anthropic focused on coding agents. I was the author, along with Barry Zhang, of "Building Effective Agents," where we outlined our best science and best practices for creating agents, no matter what the application is. This is a subject that's near and dear to my heart. Last year, I actually broke my hand while biking to work and was in a cast for two months, and Claude wrote all of my code for those two months. So figuring out how to make this work effectively was really important to me, and luckily I was able to figure it out and help bring that into a lot of Anthropic's other products and into our models through my research.
So let's start with what vibe coding is. A lot of people conflate vibe coding with just extensive use of AI to generate your code, but I don't think that's quite true. A lot of people are using Cursor, they're using Copilot. It's a lot of AI, and a lot of the code is coming from the AI rather than from them writing it themselves. But when you're still in a tight feedback loop with a model like that, that isn't truly vibe coding. When I say vibe coding, I think we need to go to Andrej Karpathy's definition, where vibe coding is where you "fully give in to the vibes, embrace exponentials, and forget that the code even exists." The key part here is forgetting the code even exists. The reason this is important is that vibe coding was when people outside of the engineering industry really started getting excited about code generation. Copilot and Cursor were great, but only for engineers. With vibe coding, someone who didn't know how to code could suddenly find themselves coding an entire app by themselves.
And this was a really exciting thing and a big unlock for a lot of people. Now, of course, there were a lot of downsides to this. You had people coding for the first time without really knowing what they were doing at all, saying things like, "Random things are happening, maxed out usage on my API keys, people are bypassing the subscription, creating random shit on the DB." So that's the downside of vibe coding, what started happening. And the positive cases of vibe coding you'd see were all really low stakes: people building video games, fun side projects, things where it's OK if there's a bug. So why do we even care about vibe coding, if the stakes are really high when you do it for a real product, and the most successful cases are toy examples or fun things where the stakes are very low? My answer for why we should care is the exponential. The length of tasks that AI can do is doubling every seven months. Right now, we're at about an hour. And that's fine. You don't need to vibe code. You can have Cursor work for you, you can have Claude Code write a feature that would take an hour, and you can review all that code and still be intimately involved as the AI writes a lot of your code. But what happens next year? What happens the year after that? When the AI is powerful enough to generate an entire day's worth of work for you at a time, or an entire week's worth, there is no way we're going to be able to keep up if we still need to move in lockstep. That means if we want to take advantage of this exponential, we're going to have to find a way to responsibly give in to this and leverage it. My favorite analogy here is compilers.
I'm sure in the early days of compilers, a lot of developers really didn't trust them. They might use a compiler, but they'd still read the assembly it output to make sure it looked the way they would have written it. But that just doesn't scale. At a certain point, you start working on systems that are big enough that you just have to trust the system. The question, though, is how do you do that responsibly? My challenge to the whole software industry over the next few years is: how will we vibe code in prod and do it safely? My answer is that we will forget that the code exists, but not that the product exists. Going back to the compiler analogy, we all still know there's assembly under the hood, but hopefully most of us don't need to think about what that assembly actually is, and we're still able to build good software without understanding it. I think we will get to that same level with software. One thing I really want to emphasize is that this is not a new problem. How does a CTO manage an expert in a domain where the CTO is not an expert? How does a PM review an engineering feature when they can't read all of the code that went into it? How does a CEO check the accountant's work when they are not an expert in financial accounting? These problems have existed for hundreds or thousands of years, and we have solutions to them. A CTO can still write acceptance tests for an expert who works for them. Even if they don't understand the implementation under the hood, they can see that the acceptance tests pass and that the work is high quality. A product manager can use the product their engineering team built and make sure it works the way they expected, even if they're not writing the code.
And a CEO can spot-check key facts that they do understand, and slices of the data, so they can build confidence in the overall financial model, even though they might not be an expert in how the entire thing flows. So, thinking about these examples, managing implementations that you yourself don't understand is a problem as old as civilization. Every manager in the world is already dealing with this; we as software engineers just aren't used to it. We're used to being purely individual contributors who understand the full depth of the stack. But in order to be most productive, we're going to need to let go of that, the same way every manager has to let go of some details to be most productive, and just like we as software engineers already let go of understanding the assembly happening under the hood. The way you do this while still being safe and responsible is to find an abstraction layer that you can verify, even without knowing the implementation underneath it. Now, I have one caveat to that today, which is tech debt. Right now there is no good way to measure or validate tech debt without reading the code yourself. In most other systems in life, like the accountant or the PM examples, you have ways to verify the things you care about without knowing the implementation. Tech debt is one of those rare things where there really isn't a good way to validate it other than being an expert in the implementation itself. That doesn't mean we can't do this at all. It just means we need to be very smart and targeted about where we take advantage of it. My answer is to focus on leaf nodes in our codebase. What I mean by that is the parts of the code and the parts of our system that nothing depends on.
They're kind of the end features, the bells and whistles, rather than the branches or trunks beneath them, shown here in white. Here, the orange dots are all leaf nodes, and honestly, if you have a system like this, it's kind of OK if there's tech debt in these leaf nodes, because nothing else depends on them. They're unlikely to change, and they're unlikely to have further things built on them. Versus the things in white here, the trunk and the underlying branches of your system: that's the core architecture that we as engineers still need to deeply understand, because that's what's going to change and what other things are going to be built on. It's very important that we protect those and make sure they stay extensible, understandable, and flexible. Now, the one thing I will say is that the models are getting better all the time, so we might get to a world where this moves further and further down the tree, where we trust the models more and more to write code that is extensible and doesn't have tech debt. Using the Claude 4 models over the last week or two within Anthropic has been really exciting, and I've given them much more trust than I did with 3.7. So I think this is going to change, and we'll be able to work with more and more of the stack this way.
So let's talk about how to succeed at vibe coding. My main advice is: ask not what Claude can do for you, but what you can do for Claude. When you're vibe coding, you're basically acting as a product manager for Claude, so you need to think like a product manager. What guidance or context would a new employee on your team need to succeed at this task? A lot of the time we're too used to a very quick back-and-forth chat with AI: make this feature, fix this bug.
But if it were a human's first day on the job and you just said, "Hey, implement this feature," there's no way you'd expect them to actually succeed. You need to give them a tour of the codebase. You need to tell them the actual requirements, specifications, and constraints they need to understand. As we vibe code, it becomes our responsibility to feed that information to Claude, to make sure it has all that same context and is set up to succeed. When I'm working on features with Claude, I often spend 15 or 20 minutes collecting guidance into a single prompt and then let Claude cook after that. And those 15 or 20 minutes aren't just me writing the prompt by hand. This is often a separate conversation where I'm talking back and forth with Claude. It's exploring the codebase and looking for files, and we're building a plan together that captures the essence of what I want, which files are going to be changed, and which patterns in the codebase it should follow. Once I have that artifact with all that information, I give it to Claude, either in a new context or by saying, "Hey, let's go execute this plan." I've typically seen that once I put in the effort to collect all that information, Claude has a very, very high success rate at completing something well. The other thing I'll say is that you need to be able to ask the right questions. Despite the title of my talk, I don't think vibe coding in prod is for everybody. I don't think people who are fully non-technical should go and try to build a business fully from scratch. That's dangerous, because they're not able to ask the right questions or be an effective product manager for Claude, so they're not going to succeed. We recently merged a 22,000-line change to our production reinforcement learning codebase that was written heavily by Claude.
So how on earth did we do this responsibly? And yes, this is the actual screenshot of the diff from GitHub for the PR. First, we asked what we could do for Claude. This wasn't just a single prompt that we then merged. There were still days of human work that went into coming up with the requirements, guiding Claude, and figuring out what the system should be, and we really embraced our role as the product manager for Claude on this feature. Second, the change was largely concentrated in leaf nodes of our codebase, where we knew it was OK for there to be some tech debt, because we didn't expect those parts of the codebase to need to change in the near future. The parts we did think were important and would need to be extensible got heavy human review. And lastly, we carefully designed stress tests for stability, and we designed the whole system to have very easily human-verifiable inputs and outputs. Those last two pieces let us create verifiable checkpoints, so we could make sure this was correct even without reading or understanding the full underlying implementation. Our biggest concern was stability, and we were able to measure that without reading the code by creating these stress tests and running them for long durations. And we were able to verify correctness based on the inputs and outputs we designed the system to have. So basically, we designed this system to be understandable and verifiable even without reading all the code. By combining those things, we were able to become just as confident in this change as in any other change to our codebase, but deliver it in a tiny fraction of the time and effort it would have taken to write the whole thing by hand and review every line of it.
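The "verifiable checkpoint" idea can be sketched as a black-box harness: pick a component whose output can be checked from its input alone, then run it against random inputs for a fixed time. This is a hypothetical Python illustration, not the harness behind the change described in the talk. `make_batches` stands in for an AI-generated component, `check_io` is the human-readable checkpoint, and `stress_test` runs both for `duration_s` seconds; a real stability run would last far longer, as the talk describes.

```python
import random
import time
from collections import Counter
from typing import Callable, List, Sequence

Batcher = Callable[[Sequence[int], int], List[List[int]]]


# Hypothetical stand-in for an AI-generated component, treated as a black box.
# Its contract is chosen so correctness can be judged from inputs/outputs alone.
def make_batches(items: Sequence[int], batch_size: int) -> List[List[int]]:
    shuffled = list(items)
    random.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def check_io(items: Sequence[int], batch_size: int, batches: List[List[int]]) -> None:
    """Checkpoint a human can read: verify the output using only the input."""
    flat = [x for batch in batches for x in batch]
    if Counter(flat) != Counter(items):
        raise ValueError("output must contain every input item exactly once")
    if any(not 1 <= len(batch) <= batch_size for batch in batches):
        raise ValueError("every batch must hold between 1 and batch_size items")


def stress_test(fn: Batcher, duration_s: float = 1.0, seed: int = 0) -> int:
    """Call fn on random inputs until duration_s elapses; return the trial count.

    Any exception (a crash or a failed checkpoint) propagates immediately.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + duration_s
    trials = 0
    while time.monotonic() < deadline:
        items = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 500))]
        batch_size = rng.randint(1, 64)
        check_io(items, batch_size, fn(items, batch_size))
        trials += 1
    return trials


if __name__ == "__main__":
    print(f"passed {stress_test(make_batches, duration_s=1.0)} random trials")
```

The design choice that matters is the contract, not the harness: because "every item exactly once, no oversized batch" is easy to state and check, a reviewer can trust the component without reading `make_batches`.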
And one of the really exciting things about this is not just that it saved us a week's worth of human time. Knowing that we could do this made us think differently about our engineering and about what we could do. When something costs one day of time instead of two weeks, you realize you can go make much bigger features and much bigger changes. The marginal cost of software is lower, and that lets you consume and build more software. So the really exciting thing is not just saving the time, but feeling like, "Oh, that thing that was going to take two weeks? Let's just do it. It's only going to take a day."
So, to leave you with closing thoughts on how to vibe code in prod responsibly: Be Claude's PM. Ask not what Claude can do for you, but what you can do for Claude. Focus your vibe coding on the leaf nodes, not the core architecture and underlying systems, so that if there is tech debt, it's contained and not in important areas. Think about verifiability: how can you know whether this change is correct without needing to go read the code yourself? And finally, remember the exponential. It's OK today if you don't vibe code, but in a year or two it's going to be a huge, huge disadvantage. If you demand that you read every single line of code or write every single line of code yourself, you won't be able to take advantage of the newest wave of models that can produce very, very large chunks of work for you, and you're going to become the bottleneck. So overall, that is vibe coding in prod responsibly. I think this is going to become one of the biggest challenges for the software engineering industry over the next few years. Thank you. I have plenty of time for questions. Yeah?
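"Remember the exponential" is ultimately arithmetic. Taking the two figures quoted in the talk at face value (tasks of about one hour today, doubling every seven months) and assuming the doubling rate stays constant, which is an extrapolation rather than a prediction, task length after t months is 1 hour × 2^(t/7). The function names below are illustrative, not from any library.

```python
import math


def projected_task_hours(months_from_now: float,
                         current_hours: float = 1.0,
                         doubling_months: float = 7.0) -> float:
    """Task length after `months_from_now`, assuming a constant doubling time."""
    return current_hours * 2 ** (months_from_now / doubling_months)


def months_until(target_hours: float,
                 current_hours: float = 1.0,
                 doubling_months: float = 7.0) -> float:
    """Months until tasks of `target_hours` are reachable, same assumption."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    for months in (0, 12, 24, 36):
        print(f"{months:>2} months: ~{projected_task_hours(months):.1f} h")
    print(f"8-hour day:   ~{months_until(8):.0f} months")
    print(f"40-hour week: ~{months_until(40):.0f} months")
```

Under these assumptions the projection is roughly 3.3 hours at 12 months, 10.8 at 24, and 35.3 at 36; an 8-hour day of work is reached at 21 months and a 40-hour week at about 37 months. Those are the "day's worth" and "week's worth" of work the talk argues line-by-line review cannot keep up with.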
In the past, we spent a lot of time dealing with syntax problems, libraries, or connections among components of the code, and that was how we learned to code. But how do we learn now? How do we become better vibe coders? How do we learn enough to become better product managers of the AI agent?
Yeah, this is a really interesting question. I think there are reasons to be very worried about this and also reasons to be very optimistic. The reason to be worried, like you mentioned, is that we're not going to be there in the struggle and the grind. I think that's actually OK. I remember some of my professors in college saying, "Oh man, coders today aren't as good, because they never had to write assembly by hand. They don't really feel the pain of making something run really fast." The positive side is that I've found I'm able to learn things so much more quickly using these AI tools. A lot of times when I'm coding with Claude, I'll be reviewing the code and I'll say, "Hey Claude, I've never seen this library before. Tell me about it. Why did you choose it over another one?" You have that always-there pair programmer. What's going to change is that people who are lazy are not going to learn; they're just going to glide by. But if you take the time and want to learn, there are all these amazing resources, and Claude will help you understand what it vibe coded for you. The other thing I'll say is that for learning the higher-level things about what makes a project go well, what feature gets you product-market fit versus what flops, we're going to be able to take so many more shots on goal.
I feel like, especially for systems engineers or architects, it often takes two years to make a big change in a codebase and really come to terms with whether it was a good architecture decision. If we can collapse that time down to six months, engineers who invest their own time and try to learn will be able to learn from four times as many lessons in the same amount of calendar time, as long as they're putting in the effort.
Going back to your pre-planning process, what's the balance between giving it too much information and too little? Are you giving it a full product requirements document? Is there a standardized template you put together before you actually move into vibe coding?
Yeah, I think it depends a lot on what you care about. For things where I don't really care how it does it, I won't talk at all about the implementation details. I'll just say, "These are my requirements; this is what I want at the end." Other times I know the codebase well and I'll go into much more depth: "Hey, these are the classes you should use to implement this logic. Look at this example of a similar feature." It all comes down to what you care about at the end of the day. I would say, though, that our models do best when you don't over-constrain them. So I wouldn't put too much effort into creating a very rigorous format or anything. I'd just think about it as a junior engineer: what would you give them in order to succeed?
Sorry, about what you were saying: how did you balance effectiveness and cybersecurity? There were reports a couple of months back of the top 10 vibe-coded apps being super vulnerable, and a lot of important information was shown to be exposed.
And the person who did it wasn't even a pro hacker. So how did you balance keeping things secure, even at a leaf-node level, with being effective? Because something can be effective but not secure.
Yeah, that's a great question. It all comes down to this first point here: being Claude's PM and understanding enough about the context to know what's dangerous, what's safe, and where you should be careful. The things that get a lot of press about vibe coding are people who have no business coding at all doing these things. And that's fine; it's great for games, for creativity, and for letting people create. But for production systems, you need to know enough about what questions to ask to guide Claude in the right direction. In our internal example, the system runs fully offline, so we were very, very confident that no security problems could come out of it.
So this is more about the people you mentioned as having no business...
Maybe I shouldn't have said it like that. No business vibe coding in production for an important system, I will say that.
But if we look at the numbers, less than 0.5% of the world's population are software developers, and software is an amazing way to scale ideas. So how do you think products need to change to make it easier for people to vibe code and build software, while also avoiding some of the things we run into, like people leaking API keys?
That's a really great question. I would be super excited to see more products and frameworks emerge that are kind of provably correct.
What I mean by that is, I'm sure people could build backend systems where the important parts, the auth parts and the payment parts, are built for you, and all you have to do is fill in the UI layer, which you can vibe code. It basically gives you some nice fill-in-the-blank sandboxes for where to put your code. I feel like there are tons of things like that that could exist. Maybe the simplest example is Claude Artifacts, where Claude can help you write code that gets hosted and displayed right there in claude.ai. And of course that is safe because it is very limited: there's no auth, there are no payments, it's front end only. But maybe that's a good product idea for someone here: build some way to make a provably correct hosting system that can have a backend you know is safe, no matter what shenanigans happen on the front end. I hope people build good tools that complement vibe coding.
For test-driven development, do you have any tips? I often see Claude just spit out the entire implementation and then write test cases, and sometimes they fail. I'm trying to prompt it to write the test cases first, but I also don't want to verify them myself, because I haven't seen the implementation yet. Do you have an iterative approach? Have you tried it for test-driven development?
Yeah, test-driven development is very, very useful in vibe coding, as long as you can understand what the test cases are. Even if you don't look at the tests yourself, it helps Claude be a little more self-consistent. But a lot of times it's easy for Claude to go down a rabbit hole of writing tests that are too implementation-specific.
When I'm trying to do this, I'll often give Claude examples: "Hey, just write three end-to-end tests: the happy path, this error case, and this other error case." I'm very prescriptive that I want the tests to be general and end-to-end. That helps make sure it's something I can understand and something Claude can do without getting too into the weeds. I'll also say that a lot of times when I'm vibe coding, the only part of the code, or at least the first part, that I'll read is the tests. If I agree with the tests and the tests pass, then I feel pretty good about the code. That works best if you can encourage Claude to write very minimalist end-to-end tests.
Thanks for the very fascinating talk. I also appreciate that you've done what a lot of people haven't: tried to interpret one of the more peculiar lines in Karpathy's original post, "embrace exponentials." So I wonder if I could pin you down a little more: how would I know if I've embraced the exponentials? What precisely does following that advice mean? Maybe it means "the models will get better." But do you think there's some legitimacy in saying that just because the models will get better doesn't mean they'll get better in every conceivable dimension we might hope? So yeah: how do I embrace exponentials?
Yeah, absolutely. I think you got close with the idea of "keep assuming the models are going to get better," but it's a step beyond that. The idea of the exponential is not just that they're going to keep getting better, but that they're going to get better faster than we can possibly imagine. You can kind of see the shape of the dots here. It's not just that it's getting steadily better.
It's getting better, and then it goes wild. Another funny quote I heard on this, I think from Dario and Mike Krieger's talk, is that "Machines of Loving Grace" is not science fiction; it's a product roadmap. Even though it sounds like something very far out, when you're on an exponential, things get wild very, very fast, faster than you expect. If you talk to someone who was working on computers in the '90s, it's like, "OK, great, we have a couple of kilobytes of RAM. Now we have a couple more kilobytes of RAM." But fast forward to where we are now, and we have terabytes. It's not just that it got twice as good; things got millions of times better. That's what happens with exponentials over the course of 20 years. So we shouldn't think about what happens if these models are twice as good. We should think about what happens if these models are a million times smarter and faster than they are today, which is wild. We can't even think about what that means, in the same way that I don't think someone working on computers in the '90s could think about what would happen to society if a computer was a million times faster than what they were working with. But that's what happened. So that's what we mean by the exponential: it's going to go bonkers.
Yes?
I've got one question, but it's kind of two parts. When it comes to vibe coding, I have two different workflows: one where I'm in my terminal, and one where I'm in VS Code or Cursor. Which workflow do you use? And if you're using Claude Code in the terminal, how often do you compact? What I find is that my functions get renamed the longer I vibe code, or things just kind of go off the rails, and if I compact, it still happens.
If I create a document to guide it, I still have to get it back on track.
Yeah, great question. I do both. I often code with Claude Code open in my terminal inside VS Code. Claude Code does most of the editing, and I review the code as I go in VS Code, which is not true vibe coding in the sense I described here; or maybe I'm reviewing just the tests. I like to compact or just start a new session whenever I get Claude to a good stopping point, where it feels like: as a human programmer, when would I stop, take a break, maybe go get lunch, and then come back? If I feel like I'm at that kind of stage, that's a good time to compact. So maybe I'll start off by having Claude find all the relevant files and make a plan. Then I'll say, "OK, write all this into a document," and then I'll compact. That gets rid of the 100K tokens we used to create the plan and find all those files, and boils it down to a few thousand tokens.
Hey, one question following up on the previous one: have you used other tools along with Claude Code to increase your speed a bit more, like running multiple Claude Code sessions together using git worktrees and then merging things, or stacked PRs, or something like that? Is that something you personally follow or would advise? The second question: how do you approach, in a structured, well-engineered way, a part of the codebase that you're not very familiar with but want to ship a PR in really fast, and do it nicely rather than vibe coding it? What are your ways of using Claude Code to help with both of these?
Yep. So I definitely use Claude Code as well as Cursor.
And I'd say typically all like start things with Claude Code and then I'll use cursor to fix things up or if I was like, if I have very specific changes that I know exactly the change that I wanted to do to this file, I'll just do it myself with cursor and sort of target the exact lines that I know need to change. The second part of your question was, oh yeah, like how to get spun up on a new part of the code base. Before I start trying to write the feature, I use Claude Code to help me explore the code base. So I might say like, tell me where in this code base off happens or you know where in this code base something happens. Tell me similar features to this and like, have it tell me the file names, have it tell me the classes that I should look at. And then kind of use that to try to build up a mental picture to make sure that I can do this and not vibe code, make sure I can still get like a good sense of what's happening. And then I go work on the feature with Claude. Thank you so much. I'll be east around and can Miller and answer other questions.
TL;DR
- Vibe coding is embracing AI to generate code to the point of "forgetting the code exists," a necessary shift for leveraging AI's exponentially growing capabilities in software development.
- To responsibly implement vibe coding in production, developers must transition from reviewing every line of code to managing and verifying work at a higher abstraction layer, similar to how expert managers oversee tasks without deep implementation knowledge.
- Successful vibe coding involves acting as a "Product Manager" for the AI by providing thorough context and requirements, while strategically applying AI-generated code to "leaf nodes" in the codebase to contain potential technical debt.
Takeaways
- Redefine "Vibe Coding": True vibe coding means fully embracing AI code generation to the extent that you "forget the code exists," rather than merely using AI tools for assistance with a tight feedback loop.
- Embrace AI's Exponential Growth: Recognize that AI's capacity for generating code is improving exponentially (the talk cites the length of tasks AI can complete doubling roughly every seven months), so insisting on traditional line-by-line review will eventually make you the bottleneck.
- Manage by Abstraction, Not Implementation: Verify AI-generated code at a higher abstraction layer (e.g., acceptance tests, stress tests, product usage) without needing to understand all underlying implementation details, akin to how senior managers oversee expert work.
- Target "Leaf Nodes" for AI Code: Concentrate AI-generated code on "leaf nodes" in your codebase, the self-contained end features that nothing else depends on, to limit the impact of potential technical debt and preserve the integrity of core architectural components.
- Act as the AI's Product Manager: Provide comprehensive guidance, requirements, specifications, and codebase context to the AI, dedicating significant upfront effort (e.g., 15-20 minutes, often in a separate back-and-forth conversation with the AI, collecting guidance into a single prompt or plan) to raise the success rate.
- Design for Verifiability: Structure systems to have easily verifiable inputs and outputs, allowing for the creation of stress tests and checkpoints that confirm correctness and stability of AI-generated components without deep code inspection.
- Leverage AI for Codebase Exploration: Use AI tools to quickly understand unfamiliar codebases by asking them to identify relevant files, classes, and similar features, building a mental model before starting development work.
- Implement Test-Driven Development (TDD): Encourage the AI to write minimalist, end-to-end tests first, which can serve as a primary point of human review to validate functionality and build confidence in the AI-generated implementation.
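The TDD takeaway above is easiest to picture with a concrete shape. Below is a minimal sketch of the kind of test file the speaker describes asking for ("just write three end-to-end tests: the happy path, this error case, and this other error case"). The `transfer` feature, its name, and its error types are hypothetical stand-ins chosen for illustration, not anything from the talk; the point is that the tests assert only on observable behavior, so a human can review them without reading the implementation.

```python
import unittest


def transfer(balances: dict[str, int], src: str, dst: str, amount: int) -> dict[str, int]:
    """Hypothetical feature under test: move `amount` from `src` to `dst`.

    Returns a new dict and leaves the caller's dict unchanged.
    """
    if src not in balances or dst not in balances:
        raise KeyError("unknown account")
    if amount <= 0 or balances[src] < amount:
        raise ValueError("invalid amount or insufficient funds")
    updated = dict(balances)
    updated[src] -= amount
    updated[dst] += amount
    return updated


class TransferBehaviorTests(unittest.TestCase):
    """Happy path plus two error cases, asserted only on observable behavior."""

    def test_happy_path_moves_funds(self):
        start = {"alice": 100, "bob": 0}
        self.assertEqual(transfer(start, "alice", "bob", 30), {"alice": 70, "bob": 30})
        self.assertEqual(start, {"alice": 100, "bob": 0})  # input left untouched

    def test_insufficient_funds_is_rejected(self):
        with self.assertRaises(ValueError):
            transfer({"alice": 10, "bob": 0}, "alice", "bob", 30)

    def test_unknown_account_is_rejected(self):
        with self.assertRaises(KeyError):
            transfer({"alice": 10}, "alice", "carol", 5)
```

Run it with `python -m unittest` pointed at the file. In the workflow the talk describes, these three tests are the first (sometimes the only) code a human reads: if you agree with them and they pass, you can be reasonably confident in whatever implementation the model wrote behind them.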
Vocabulary
- Vibe Coding: A development methodology where AI is extensively used to generate code, allowing developers to focus on product outcomes rather than implementation details to the point of "forgetting the code exists."
- Prod: Short for "production," referring to the live, operational environment where software applications are deployed and used by end-users.
- Anthropic: An AI safety and research company that develops large language models, including Claude, mentioned as the speaker's employer.
- Claude: An AI assistant or large language model developed by Anthropic, capable of code generation, analysis, and conversational interaction.
- Cursor: An AI-native code editor designed to integrate AI capabilities directly into the coding workflow for enhanced productivity.
- Copilot: An AI pair programmer tool developed by GitHub and OpenAI that provides code suggestions and auto-completions based on context.
- Exponential: A rapid and accelerating rate of improvement or growth, used in the context of AI capabilities dramatically increasing over time.
- Abstraction Layer: A conceptual boundary that hides the complex underlying details of a system, allowing interaction at a simpler, higher level.
- Tech Debt: Technical debt; the long-term cost incurred by choosing quick-fix, suboptimal solutions in code, which later requires more effort to refactor or maintain.
- Leaf Nodes: In a software architecture, these are the outermost components or features that do not have other parts of the system depending on them, making them suitable areas for containing experimental or AI-generated code.
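One way to make the Leaf Nodes definition concrete is a small check over a module dependency graph. This is a minimal sketch under stated assumptions: the graph, the module names, and the helpers `leaf_nodes` and `change_is_leaf_only` are all hypothetical, and a real codebase would need its import graph extracted by some other tool. The sketch reports which modules nothing else depends on, and whether a proposed change stays inside them.

```python
from collections.abc import Mapping


def leaf_nodes(depends_on: Mapping[str, set[str]]) -> set[str]:
    """Return the modules that no other module depends on.

    `depends_on` maps each module to the modules it directly imports. If nothing
    imports a module directly, nothing can depend on it transitively either, so
    tech debt inside it stays contained.
    """
    depended_on: set[str] = set()
    for deps in depends_on.values():
        depended_on |= deps
    all_modules = set(depends_on) | depended_on
    return all_modules - depended_on


def change_is_leaf_only(changed: set[str], depends_on: Mapping[str, set[str]]) -> bool:
    """True if every module touched by a change is a leaf node."""
    return changed <= leaf_nodes(depends_on)


# Hypothetical dependency graph: feature modules at the edges, shared core underneath.
graph = {
    "export_csv_button": {"reports", "ui_kit"},
    "weekly_digest_email": {"reports", "mailer"},
    "reports": {"db"},
    "mailer": {"db"},
    "ui_kit": set(),
    "db": set(),
}

print(sorted(leaf_nodes(graph)))  # ['export_csv_button', 'weekly_digest_email']
print(change_is_leaf_only({"weekly_digest_email"}, graph))  # True
print(change_is_leaf_only({"reports", "export_csv_button"}, graph))  # False
```

Static imports are only a proxy: runtime coupling such as shared database tables, config, or message formats can make a module load-bearing even when nothing imports it, so a check like this supplements, rather than replaces, human judgment about what counts as core architecture.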
Transcript
Hey everyone, welcome. I'm here to talk about everyone's favorite subject, vibe coding, and, somewhat controversially, how to vibe code in prod responsibly. So let's talk about vibe coding and what it even is. First of all, I'm Erik. I'm a researcher at Anthropic focused on coding agents. I was the author, along with Barry Zhang, of "Building Effective Agents," where we outlined our best science and best practices for creating agents, no matter what the application is. This is a subject that's near and dear to my heart. Last year, I actually broke my hand while biking to work and was in a cast for two months, and Claude wrote all of my code for those two months. So figuring out how to make this work effectively was really important to me, and luckily I was able to figure it out and help bring that into a lot of Anthropic's other products and into our models through my research. So let's first talk about what vibe coding is. A lot of people conflate vibe coding with just extensive use of AI to generate code, but I don't think that's quite right. Lots of people are using Cursor or Copilot; that's a lot of AI, and a lot of the code comes from the AI rather than from them. But when you are still in a tight feedback loop with the model like that, that isn't truly vibe coding. When I say vibe coding, I go back to Andrej Karpathy's definition: vibe coding is where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. The key part here is "forget that the code even exists." The reason this matters is that vibe coding was when people outside the engineering industry really started getting excited about code generation. Copilot and Cursor were great, but really only for engineers. With vibe coding, someone who didn't know how to code could suddenly find themselves building an entire app by themselves.
And this was a really exciting thing and a big unlock for a lot of people. Of course, there were also a lot of downsides. You had people coding for the first time without really knowing what they were doing at all, saying things like, "Hey, random things are happening, maxed out usage on my API keys, people are bypassing the subscription, creating random shit on the DB." That's the downside of vibe coding that started showing up. And the positive examples you'd see were all really low stakes: people building video games, fun side projects, things where it's OK if there is a bug. So why do we even care about vibe coding, if the stakes are really high when you do it for a real product and the most successful cases are toy examples or fun things where the stakes are very low? My answer is: because of the exponential. The length of tasks that AI can do is doubling every seven months. Right now, we're at about an hour, and that's fine. You don't need to vibe code. You can have Cursor work for you, or have Claude Code write a feature that would take an hour, and you can review all of that code and still be intimately involved while the AI writes a lot of it. But what happens next year? What happens the year after that? When the AI is powerful enough to generate an entire day's or an entire week's worth of work for you at a time, there is no way we're going to keep up if we still need to move in lockstep. That means that if we want to take advantage of this exponential, we will have to find a way to responsibly give in to it and leverage it. My favorite analogy here is compilers.
I'm sure in the early days of compilers, a lot of developers really didn't trust them. They might use a compiler, but they'd still read the assembly it output to make sure it looked the way they would have written it. But that just doesn't scale. At a certain point, you start working on systems big enough that you just have to trust the system. The question, though, is how you do that responsibly. My challenge to the whole software industry over the next few years is: how will we vibe code in prod, and do it safely? My answer is that we will forget that the code exists, but not that the product exists. Going back to the compiler analogy, we all still know there's assembly under the hood, but hopefully most of us don't need to think about what that assembly actually is, and we are still able to build good software without understanding it. I think we will get to that same level with software. One thing I really want to emphasize is that this is not a new problem. How does a CTO manage an expert in a domain where the CTO is not an expert? How does a PM review an engineering feature when they can't read all of the code that went into it? How does a CEO check the accountant's work when they are not an expert in financial accounting? These problems have existed for hundreds or thousands of years, and we have solutions to them. A CTO can write acceptance tests for an expert who works for them: even without understanding the implementation, they can see that the tests pass and that the work is high quality. A product manager can use the product their engineering team built and make sure it works the way they expected, even if they're not writing the code.
And a CEO can spot-check key facts that they do understand, and slices of the data, to build confidence in the overall financial model, even if they are not an expert in how the entire thing flows. So managing implementations that you don't understand yourself is actually a problem as old as civilization, and every manager in the world already deals with it. We as software engineers just aren't used to it. We are used to being pure individual contributors who understand the full depth of the stack. But to become most productive, we will need to let go of that, the same way every manager has to let go of some details to be most productive, and the same way we already let go of understanding the assembly that's happening under the hood. The way you do this while staying safe and responsible is to find an abstraction layer that you can verify, even without knowing the implementation underneath it. Now, I have one caveat to that today, which is tech debt. Right now there is no good way to measure or validate tech debt without reading the code yourself. In most other systems in life, like the accountant or the PM examples, you have ways to verify the things you care about without knowing the implementation. Tech debt is one of those rare things where there really isn't a good way to validate it other than being an expert in the implementation itself. So that is the one thing we currently have no good way to validate. That doesn't mean we can't do this at all; it means we need to be very smart and targeted about where we take advantage of it. My answer is to focus on leaf nodes in our codebase, by which I mean parts of the code and of our system that nothing depends on.
They are the end features, the bells and whistles, rather than the branches or trunks beneath them, like here in white. The orange dots are the leaf nodes. Honestly, if you have a system like this, it's kind of OK if there is tech debt in these leaf nodes, because nothing else depends on them. They're unlikely to change, and they're unlikely to have further things built on them. The things in white, the trunk and the underlying branches of your system, are the core architecture that we as engineers still need to deeply understand, because that's what is going to change and what other things will be built on. It's very important that we protect those and keep them extensible, understandable, and flexible. The one thing I will say is that the models are getting better all the time, so we might get to a world where this line moves further and further down and we trust the models more and more to write code that is extensible and free of tech debt. Using the Claude 4 models within Anthropic over the last week or two has been really exciting, and I've given them much more trust than I gave 3.7. So I think this is going to change, and we'll be able to work this way with more and more of the stack. So let's talk about how to succeed at vibe coding. My main advice is: ask not what Claude can do for you, but what you can do for Claude. When you're vibe coding, you are basically acting as a product manager for Claude, so you need to think like one. What guidance or context would a new employee on your team need to succeed at this task? A lot of the time we're too used to a very quick back-and-forth chat with the AI: make this feature, fix this bug.
But if it were a human's first day on the job and you just said, "Hey, implement this feature," there's no way you'd expect them to succeed. You'd need to give them a tour of the codebase and tell them the actual requirements, specifications, and constraints they need to understand. As we vibe code, it becomes our responsibility to feed that information into Claude so that it has all the same context and is set up to succeed. When I'm working on features with Claude, I often spend 15 or 20 minutes collecting guidance into a single prompt and then let Claude cook after that. Those 15 or 20 minutes aren't just me writing the prompt by hand. It's often a separate conversation where I'm talking back and forth with Claude: it's exploring the codebase and looking for files, and we're building a plan together that captures the essence of what I want, which files are going to change, and which patterns in the codebase it should follow. Once I have that artifact with all that information, I give it to Claude, either in a new context or by saying, "Hey, let's go execute this plan." Once I've put in that effort, I've typically seen Claude have a very, very high success rate at completing the task well. The other thing I'll say is that you need to be able to ask the right questions. Despite the title of my talk, I don't think vibe coding with Claude is for everybody. I don't think people who are fully non-technical should try to build a business entirely from scratch this way. That is dangerous because they can't ask the right questions, so they can't be an effective product manager for Claude, and they're not going to succeed. We recently merged a 22,000-line change to our production reinforcement learning codebase that was written heavily by Claude.
So how on earth did we do this responsibly? And yes, this is the actual screenshot of the diff from GitHub for the PR. First, we asked what we could do for Claude. This wasn't a single prompt that we then merged. There were still days of human work that went into coming up with the requirements, guiding Claude, and figuring out what the system should be, and we really embraced our role as Claude's product manager for this feature. Second, the change was largely concentrated in leaf nodes of our codebase, where we knew some tech debt was OK because we didn't expect those parts to need to change in the near future. The parts we did think were important and would need to be extensible got heavy human review. Lastly, we carefully designed stress tests for stability, and we designed the whole system to have very easily human-verifiable inputs and outputs. Those last two pieces let us create verifiable checkpoints, so we could make sure the change was correct without understanding or reading the full underlying implementation. Our biggest concern was stability, and we were able to measure that without reading the code by running these stress tests for long durations. We verified correctness based on the inputs and outputs we designed the system to have. So basically, we designed this system to be understandable and verifiable even without reading all of the code. By combining those things, we became just as confident in this change as in any other change to our codebase, but delivered it in a tiny fraction of the time and effort it would have taken to write the entire thing by hand and review every line.
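The talk doesn't show the stress tests or the input/output checks used for that PR, so the following is only a hedged illustration of the general idea: a property-based stress harness that judges a component purely by its inputs and outputs. The component here is a stand-in sorting function, and `stress_check`, its properties, and its parameters are assumptions made for this sketch.

```python
import random
from collections import Counter
from collections.abc import Callable


def stress_check(
    candidate: Callable[[list[int]], list[int]],
    iterations: int = 10_000,
    seed: int = 0,
) -> list[list[int]]:
    """Feed `candidate` many random inputs and return the ones it got wrong.

    Only inputs and outputs are inspected: the output must be in ascending
    order and contain exactly the same items as the input. Exceptions count
    as failures. The fixed seed makes any failure reproducible.
    """
    rng = random.Random(seed)
    failures: list[list[int]] = []
    for _ in range(iterations):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 50))]
        try:
            out = candidate(list(data))  # pass a copy so the original input is preserved
        except Exception:
            failures.append(data)
            continue
        in_order = all(a <= b for a, b in zip(out, out[1:]))
        same_items = Counter(out) == Counter(data)
        if not (in_order and same_items):
            failures.append(data)
    return failures


# Stand-ins for an AI-written component: one correct, one that silently drops duplicates.
print(len(stress_check(sorted)))                          # 0
print(len(stress_check(lambda xs: sorted(set(xs)))) > 0)  # True
```

The same shape carries over to other components: pick properties a human can check at a glance (ordering, conservation of items, round-trips), then run them long enough to trust the result. A long-duration stability test would mostly change how long the loop runs and what it measures, not the principle of checking only at the input/output boundary.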
And one of the really exciting things about this isn't just that it saved us a week's worth of human time. Knowing that we could do this made us think differently about our engineering and about what we could do. When something costs one day instead of two weeks, you realize you can go and make much bigger features and much bigger changes. The marginal cost of software is lower, and that lets you consume and build more software. So the really exciting thing wasn't just saving the time; it was feeling like, "Oh, that thing that was going to take two weeks? Let's just do it. It's only going to take a day." So, to leave you with closing thoughts on how to vibe code in prod responsibly: be Claude's PM; ask not what Claude can do for you, but what you can do for Claude. Focus your vibe coding on the leaf nodes, not the core architecture and underlying systems, so that if there is tech debt, it's contained and not in important areas. Think about verifiability: how can you know whether this change is correct without needing to go read the code yourself? And finally, remember the exponential. It's OK today if you don't vibe code, but in a year or two it's going to be a huge, huge disadvantage. If you demand that you read every single line of code or write every single line of code yourself, you won't be able to take advantage of the newest wave of models that can produce very, very large chunks of work for you, and you will become the bottleneck if we don't get good at this. So overall, that is vibe coding in prod responsibly. I think this is going to become one of the biggest challenges for the software engineering industry over the next few years. Thank you. I have plenty of time for questions. Yeah.
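To put rough numbers on "remember the exponential": the talk quotes a task length of about an hour today, doubling every seven months, i.e. hours(t) = 1 × 2^(t/7) with t in months. Taking those two figures at face value, the sketch below projects task length forward and estimates when a day's or a week's worth of work comes into range. The 8-hour day and 40-hour week are assumptions of this sketch, and it extrapolates a trend rather than predicting one.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling period quoted in the talk
START_HOURS = 1.0      # "Right now, we're at about an hour"


def task_hours(months_from_now: float) -> float:
    """Projected task length in hours, assuming the doubling trend simply continues."""
    return START_HOURS * 2 ** (months_from_now / DOUBLING_MONTHS)


def months_until(target_hours: float) -> float:
    """Months until tasks of `target_hours` are in reach, under the same assumption."""
    return DOUBLING_MONTHS * math.log2(target_hours / START_HOURS)


for months in (12, 24):
    print(f"in {months} months: ~{task_hours(months):.1f} hours")
# in 12 months: ~3.3 hours
# in 24 months: ~10.8 hours

for label, hours in (("8-hour day", 8.0), ("40-hour week", 40.0)):
    print(f"{label}: ~{months_until(hours):.0f} months away")
# 8-hour day: ~21 months away
# 40-hour week: ~37 months away
```

Under those assumptions, "a year or two" spans roughly a 3x to 11x increase in task length, which gives a sense of why reviewing every line in lockstep gets harder to sustain over that horizon.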
In the past, we spent a lot of time dealing with syntax problems, libraries, or connections between components of the code, and that was how we learned to code. But how do we learn now? How do we become better vibe coders? How do we learn enough to become better product managers of the AI agent? Yeah. So I think this is a really interesting question, and there are reasons to be very worried about it and also reasons to be very optimistic. The reason to be worried, like you mentioned, is that we're not going to be there in the struggle and the grind. I think that is actually OK. I had some professors in college who would say, "Oh man, coders today aren't as good because they never had to write assembly by hand. They don't really feel the pain of making something run really fast." The positive side is that I've found I can learn about things so much more quickly by using these AI tools. A lot of times when I'm coding with Claude, I'll be reviewing the code and I'll say, "Hey Claude, I've never seen this library before. Tell me about it. Why did you choose it over another one?" You have that always-available pair programmer. What's going to change is that people who are lazy are not going to learn; they're just going to glide by. But if you take the time and want to learn, there are all these amazing resources, and Claude will help you understand what it vibe coded for you. The other thing I will say is that for learning the higher-level things about what makes a project go well, like what kind of feature gets you product-market fit versus what flops, we're going to be able to take so many more shots on goal.
Especially for systems engineers or architects, it often takes something like two years to make a big change in a codebase and really come to terms with whether it was a good architecture decision. If we can collapse that time down to six months, engineers who invest their own time and try to learn will be able to learn from four times as many lessons in the same amount of calendar time, as long as they put in the effort. Yeah, going back to your pre-planning process: what's the balance between giving it too much information and too little? Are you giving it a full product requirements document? Is there a standardized template you put together before you move into vibe coding? Yeah, I think it depends a lot on what you care about. It ranges. For things where I don't really care how it's done, I won't talk about the implementation details at all. I'll just say, "These are my requirements; this is what I want at the end." Other times I know the codebase well, and I'll go into much more depth: "Hey, these are the classes you should use to implement this logic. Look at this example of a similar feature." It all comes down to what you care about at the end of the day. I would say, though, that our models do best when you don't over-constrain them, so I wouldn't put too much effort into creating a very rigorous format or anything. Just think about it as a junior engineer: what would you give them in order to succeed? So, about what you were saying: how did you balance effectiveness and cybersecurity? There were reports a couple of months back of the top 10 vibe-coded apps being super vulnerable, with a lot of important information exposed. Well, not actually released, but shown to be exposed.
And the person who did it wasn't even a pro hacker. So how did you balance keeping things secure, even at the leaf-node level, with being effective? Because something can be effective but not secure. Yeah, that's a great question. It all comes down to the first point here: being Claude's PM and understanding enough of the context to know what is dangerous, what's safe, and where you should be careful. The things about vibe coding that get a lot of press are people who have no business coding at all doing these things. And that's fine; it's great for games and for creativity, for letting people create. But for production systems, you need to know enough to ask the right questions and guide Claude in the right direction. In our internal example, the system runs fully offline, so we were very, very confident that no security problems could come out of it. So when I said some people have no business doing this, maybe I shouldn't have put it like that; I mean no business vibe coding in production for an important system. I will say that. Yeah. But if we look at the numbers, less than 0.5% of the world's population are software developers, and software is an amazing way to scale ideas. So how do you think products need to change to make it easier for people to vibe code and build software, while also avoiding some of the problems we run into, like people leaking API keys? That's a really great question. I'd be super excited to see more products and frameworks emerge that are kind of provably correct.
What I mean by that is, I'm sure people could build backend systems where the important parts, the auth parts and the payment parts, are built for you, and all you have to do is fill in the UI layer, which you can vibe code. It basically gives you nice fill-in-the-blank sandboxes for where to put your code. I feel like tons of things like that could exist. Maybe the simplest example is Claude Artifacts, where Claude can help you write code that gets hosted and displayed right there in Claude.ai. Of course, that is safe because it is very limited: there is no auth, no payments, it's front end only. But maybe that's a good product idea someone here should build: a provably correct hosting system with a backend that you know is safe, no matter what shenanigans happen on the front end. I hope people build good tools that complement vibe coding. For test-driven development, do you have any tips? I often see that Claude just spits out the entire implementation and then writes test cases. Sometimes they fail. I'm trying to prompt it to write the test cases first, but I also don't want to verify them myself because I haven't seen the implementation yet. Do you have an iterative approach? Have you tried it for test-driven development? Yeah, definitely. Test-driven development is very, very useful in vibe coding, as long as you can understand what the test cases are. Even without that, it helps Claude be a bit more self-consistent, even if you don't look at the tests yourself. But a lot of times it's easy for Claude to go down a rabbit hole of writing tests that are too implementation-specific.
When I'm trying to do this, I'll often give Claude examples, like, "Hey, just write three end-to-end tests: the happy path, this error case, and this other error case." I'm very prescriptive that I want the tests to be general and end to end. That helps make sure it's something I can understand and something Claude can do without getting too in the weeds. I'll also say that a lot of times when I'm vibe coding, the only part of the code, or at least the first part, that I'll read is the tests. If I agree with the tests and the tests pass, then I feel pretty good about the code. That works best if you can encourage Claude to write very minimalist end-to-end tests. Thank you for the very fascinating talk. I also appreciate that you've done what a lot of people haven't and tried to interpret one of the more peculiar lines in Karpathy's original post: "embrace exponentials." So I wonder if I could pin you down a little more: how would I know if I've embraced the exponentials? What precisely does following that advice mean? Maybe it boils down to "the models will get better." But do you think there's some legitimacy in saying that the models getting better doesn't mean they'll get better along every dimension we might imagine or hope for? So yeah, how do I embrace exponentials? Yeah, absolutely. I think you got close with the idea of "keep assuming the models are going to get better," but it's a step beyond that. The idea of the exponential is not just that they're going to keep getting better, but that they're going to get better faster than we can possibly imagine. You can kind of see it in the shape of the dots here. It's not just that it's getting steadily better.
It's that it's getting better and then it goes wild. The other funny quote I heard about this, I think from Dario and Mike Krieger's talk, is that "Machines of Loving Grace" is not science fiction; it's a product roadmap. Even though it sounds very far out, when you are on an exponential, things get wild very, very fast, faster than you expect. If you talk to someone who was working with computers in the 90s, it was like, "OK, great, we have a couple kilobytes of RAM. Now we have a couple more kilobytes of RAM." But fast forward to where we are now and we have terabytes. It's not just that it got twice as good; things got millions of times better. That's what happens with exponentials over the course of 20 years. So we shouldn't think about what happens if these models are twice as good. We should think about what happens if these models are a million times smarter and faster than they are today, which is wild. We can't even think about what that means, in the same way that I don't think someone working on computers in the 90s could imagine what would happen to society if a computer were a million times faster than what they were working with. But that's what happened. So that's what we mean by the exponential: it's going to go bonkers. Yes. I've got one question, but it's kind of two parts. The first part: when it comes to vibe coding, I have two different workflows. One is in my terminal, and the other is in VS Code or Cursor. Which workflow do you use? And if you're using Claude Code in the terminal, how often do you compact? What I find is that my functions get new names over a long vibe-coding session, or things just go off the rails, and even if I compact, it still happens.
If I create a document to kind of guide it, I still have to get it back on track.

Yeah, great question. I do both. I often code with Claude Code open in a terminal inside VS Code. I'd say Claude Code is doing most of the editing, and I'm reviewing the code as I go in VS Code, which is not true vibe coding in the sense used here, or maybe I'm reviewing just the tests. I like to compact, or just start a new session, whenever I get Claude to a good stopping point: as a human programmer, when would I stop, take a break, maybe go get lunch, and then come back? If I feel like I'm at that kind of stage, that's a good time to compact. So maybe I'll start off by having Claude find all the relevant files and make a plan. Then I'll say, okay, write all of this into a document, and then I'll compact. That gets rid of the 100K tokens we spent creating the plan and finding all those files, and boils it down to a few thousand tokens.

Hey, so one question following up on the previous one: have you used other tools along with Claude Code to increase your speed a bit more, like running multiple Claude Code instances together using git worktrees and then merging things, or stacked PRs, or something like that? Is that something you personally do or would advise? The second question: how do you approach, in a structured, sound engineering way, a part of the code base that you're not very familiar with, when you want to ship a PR in it really fast, do it well, and not vibe code it? What are your ways of using Claude Code to help with both of these?

Yep. So I definitely use Claude Code as well as Cursor.
And I'd say I'll typically start things with Claude Code, and then I'll use Cursor to fix things up. Or if I have very specific changes, where I know exactly the change I want to make to a file, I'll just do it myself with Cursor and target the exact lines that I know need to change. The second part of your question was how to get spun up on a new part of the code base. Before I start trying to write the feature, I use Claude Code to help me explore the code base. So I might say, tell me where in this code base auth happens, or where in this code base something happens. Tell me about features similar to this one, and have it tell me the file names and the classes I should look at. Then I use that to build up a mental picture, to make sure I can do this without vibe coding it and still get a good sense of what's happening. And then I go work on the feature with Claude.

Thank you so much. I'll be around to mingle and answer other questions.
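The advice earlier in this Q&A, to ask Claude for three general, minimalist end-to-end tests (the happy path plus two error cases) and to read those tests first, can be illustrated with a small example. This is a minimal sketch, not the speaker's code: the summing command-line tool, its `main(argv, out, err)` entry point, the exit codes, and the error messages are all hypothetical. Each test drives the tool only through its public entry point and checks the exit code and captured output, so a reviewer can judge the behavior without reading the implementation.

```python
import io


def main(argv: list[str], out: io.TextIOBase, err: io.TextIOBase) -> int:
    """Hypothetical CLI entry point: prints the sum of its integer arguments."""
    if not argv:
        print("error: no numbers given", file=err)
        return 2
    try:
        numbers = [int(arg) for arg in argv]
    except ValueError as exc:
        print(f"error: {exc}", file=err)
        return 1
    print(sum(numbers), file=out)
    return 0


def run(argv: list[str]) -> tuple[int, str, str]:
    """Drive the tool through its public entry point and capture everything it prints."""
    out, err = io.StringIO(), io.StringIO()
    code = main(argv, out, err)
    return code, out.getvalue(), err.getvalue()


# Three short end-to-end tests: one happy path, two error cases.
def test_happy_path() -> None:
    assert run(["1", "2", "3"]) == (0, "6\n", "")


def test_rejects_non_numeric_input() -> None:
    code, out, err = run(["1", "two"])
    assert code == 1 and out == "" and "error" in err


def test_rejects_empty_input() -> None:
    code, out, err = run([])
    assert code == 2 and out == "" and "no numbers" in err


if __name__ == "__main__":
    for test in (test_happy_path, test_rejects_non_numeric_input, test_rejects_empty_input):
        test()
    print("all end-to-end tests passed")
```

Run the file directly, or let pytest collect the `test_*` functions. Keeping the tests this short is the point: if you agree with all three and they pass, the feature has been checked at the level of its inputs and outputs.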
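The answer on embracing exponentials can be made concrete with compound doubling. Suppose, purely as an illustrative assumption (the talk names no doubling time), that some capability doubles every $T$ years. After $t$ years it has grown by a factor $G(t)$:

$$
G(t) = 2^{t/T}, \qquad
G(20\,\text{yr})\big|_{T = 2\,\text{yr}} = 2^{10} = 1024, \qquad
G(20\,\text{yr})\big|_{T = 1\,\text{yr}} = 2^{20} = 1{,}048{,}576 \approx 10^{6}.
$$

So the "million times" figure over 20 years corresponds to roughly one doubling per year. For the RAM comparison, going from kilobytes to terabytes is a factor of $10^{12} / 10^{3} = 10^{9}$ in decimal units, which makes "millions of times better" an understatement rather than an exaggeration.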
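The plan-then-compact workflow described above (have Claude find the relevant files and make a plan, write the plan to a document, then compact or start a new session) can be modeled in a few lines. Claude Code itself exposes this through its `/compact` and `/clear` commands; the Python below does not call Claude Code at all. It is a hypothetical sketch of why checkpointing to a file shrinks the context so much, assuming a crude 4-characters-per-token heuristic (not a real tokenizer). The `Session` class, the `PLAN.md` file name, and the plan text are all made up for illustration.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate, used only to compare context sizes."""
    return len(text) // CHARS_PER_TOKEN


@dataclass
class Session:
    """Hypothetical stand-in for a conversation's accumulated context."""
    messages: list[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)


def checkpoint_and_reset(session: Session, plan: str, plan_path: Path) -> Session:
    """Persist the distilled plan, then return a fresh session seeded only with it.

    The exploration history in `session` is deliberately dropped; only the plan
    (and the file on disk) carries forward into the next stretch of work.
    """
    plan_path.write_text(plan, encoding="utf-8")
    return Session(messages=[f"Continue from the plan in {plan_path.name}:\n{plan}"])


if __name__ == "__main__":
    # Simulate an exploration phase that pulled many large files into context.
    exploring = Session(messages=["<contents of a source file>\n" * 1000] * 15)
    plan = "1. Add the endpoint in handlers.py\n2. Add end-to-end tests\n"
    with tempfile.TemporaryDirectory() as tmp:
        fresh = checkpoint_and_reset(exploring, plan, Path(tmp) / "PLAN.md")
        print(f"before: ~{exploring.tokens():,} tokens, after: ~{fresh.tokens():,} tokens")
```

The design choice mirrors the speaker's "lunch break" heuristic: checkpoint at a natural stopping point, so the file on disk, not the long exploration history, is what carries the work forward.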
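For getting spun up on unfamiliar code, the speaker asks Claude where a concern such as auth lives and which files and classes to read. One cheap way to cross-check those answers is a local index of class definitions. The sketch below is a hypothetical helper (not part of Claude Code, and it assumes a Python code base) that walks a directory with the standard-library `ast` module and lists classes whose names contain a keyword, with their files and line numbers.

```python
import ast
from pathlib import Path


def find_classes(root: Path, keyword: str) -> list[tuple[str, str, int]]:
    """Return (relative path, class name, line) for classes whose name contains keyword.

    Matching is case-insensitive. Files that fail to parse or decode are skipped
    rather than aborting the whole scan.
    """
    needle = keyword.lower()
    hits: list[tuple[str, str, int]] = []
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue
        # ast.walk also finds nested classes, not just top-level ones.
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef) and needle in node.name.lower():
                hits.append((path.relative_to(root).as_posix(), node.name, node.lineno))
    return hits


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for rel_path, name, line in find_classes(root, "auth"):
        print(f"{rel_path}:{line}  {name}")
```

A name-based search like this only finds what is literally called "auth", so it complements rather than replaces asking Claude for similar features; where the two disagree is usually where to start reading.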