The future of agentic coding with Claude Code

Software engineering is rapidly shifting from direct text manipulation to agent-driven development, where AI models actively write and modify code within the developer's workflow.
The effectiveness of AI coding agents like Claude Code relies heavily on both continuous improvements to underlying models and robust "harnesses" that manage context, provide tools, and enable seamless user interaction.
To adapt, engineers should focus on high-level problem-solving and creativity, use agents to prototype ideas quickly, and match how they use an agent to the complexity of the task, from codebase research to full code generation.

Agent-Driven Development: The industry has transitioned significantly in the past year, with AI agents moving from gimmicks to being integral to the "inner loop" of development, actively writing code rather than just assisting.
Harnessing AI Models: The success of AI coding tools like Claude Code depends on a sophisticated "harness" that complements the raw model by providing system prompts, context management, tools, settings, and permissions.
Organic Model Co-evolution: AI models and tools like Claude Code improve in tandem because the model builders themselves use the product daily, identifying limitations that then inform subsequent model training and feature development.
Feedback-Loop Driven Iteration: A highly responsive internal feedback channel, where user issues are addressed and communicated quickly, is critical for rapid iteration and improvement of AI-powered developer tools.
Extensible Workflows: Claude Code is designed for hackability, offering various extension points such as CLAUDE.md for repo-specific context, a hooks system, MCP for external context, and customizable slash commands and subagents for tailored workflows.
Future of Engineering Work: Engineers will increasingly guide AI agents to achieve higher-level goals (e.g., building an entire application), shifting focus from minute code details to architectural design, reviewing agent-generated code, and proactively managing projects.
Onboarding Best Practice: For new users, start by asking Claude Code questions about the codebase (e.g., "how to add a logger?") to familiarize yourself with its research capabilities before attempting direct code generation.
Strategic Task Application: Categorize tasks by difficulty: use Claude Code for easy tasks (full PR generation), initiate "plan mode" for medium tasks (collaborating on a plan), and treat it as a pairing tool for hard tasks (human in the driver's seat).
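
The triage in the last takeaway can be pictured as a simple lookup. The sketch below is illustrative only: the `Difficulty` enum, the `WORKFLOWS` table, and `choose_workflow` are hypothetical names invented for this example, not part of Claude Code, and deciding which bucket a task falls into remains a human judgment call.

```python
from enum import Enum


class Difficulty(Enum):
    """Rough task buckets, as described in the takeaway above."""
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


# Hypothetical mapping from difficulty to the suggested way of working.
WORKFLOWS = {
    Difficulty.EASY: "delegate: have the agent write the whole PR in one shot",
    Difficulty.MEDIUM: "plan mode: agree on a plan first, then let the agent implement it",
    Difficulty.HARD: "pair: the human drives; the agent researches, prototypes, and writes tests",
}


def choose_workflow(difficulty: Difficulty) -> str:
    """Return the suggested way to involve the agent for a task."""
    return WORKFLOWS[difficulty]


if __name__ == "__main__":
    for level in Difficulty:
        print(f"{level.value:>6}: {choose_workflow(level)}")
```

The table is deliberately small: the point of the advice is to pick a mode of collaboration before starting, not to automate the choice.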

Agent: In AI, a program or system designed to perceive its environment and take actions that maximize its chances of achieving specific goals, often involving complex, multi-step operations in a domain like coding.
Harness: A system or framework (like Claude Code) that wraps around an AI model, providing context management, tools, system prompts, and other interfaces to guide and control the model's behavior, similar to scaffolding.
Context management: The process of providing and maintaining relevant information (code, documentation, chat history, logs, etc.) for an AI model to understand the current task and environment.
System prompt: Initial instructions or guidelines given to an AI model at the beginning of a conversation or task, defining its role, constraints, and general behavior.
MCP: Model Context Protocol. A protocol for connecting an AI model to external tools and data sources, so it can pull in additional context that enhances its understanding and capability.
Slash commands: Custom commands (often starting with /) used within an AI agent or chat interface to trigger specific, predefined workflows, often defined by reusable scripts or markdown files.
Subagents: Specialized AI agents that function similarly to slash commands but operate with a forked or separate context window, allowing for more focused or isolated sub-tasks.
Dogfooding: The practice of an organization using its own products or services internally to test and improve them before release.
Evals: Short for "evaluations," referring to benchmarks, tests, or metrics used to assess the performance, capabilities, or quality of an AI model or system.
Plan mode: A specific interaction mode in Claude Code where the user and the AI agent collaborate to define a step-by-step plan for a coding task before the agent attempts execution.
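
To make the "harness" entry concrete, here is a minimal sketch of the idea in Python, under loud assumptions: `Harness`, `ToolCall`, `scripted_model`, and `FAKE_FILES` are hypothetical names, the model is a scripted stand-in rather than a real API call, and nothing here reflects how Claude Code is actually implemented. It only shows the pieces the definitions name: a system prompt, a context list, a tool registry, and a permission check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Union


@dataclass
class ToolCall:
    """A request from the model to run one named tool."""
    name: str
    argument: str


ModelReply = Union[ToolCall, str]  # either a tool request or a final answer


@dataclass
class Harness:
    """Hypothetical harness: everything around the model, not the model itself."""
    system_prompt: str
    tools: Dict[str, Callable[[str], str]]
    allowed_tools: Set[str]                           # permissions
    context: List[str] = field(default_factory=list)  # context management

    def run(self, model: Callable[[List[str]], ModelReply], task: str,
            max_steps: int = 5) -> str:
        # Start each task with the system prompt followed by the user request.
        self.context = [f"system: {self.system_prompt}", f"user: {task}"]
        for _ in range(max_steps):
            reply = model(self.context)
            if isinstance(reply, str):  # final answer: record it and stop
                self.context.append(f"assistant: {reply}")
                return reply
            # Tool request: check permissions before running anything.
            if reply.name not in self.allowed_tools or reply.name not in self.tools:
                result = f"permission denied for tool '{reply.name}'"
            else:
                result = self.tools[reply.name](reply.argument)
            # Feed the tool output back so the model sees it on the next step.
            self.context.append(f"tool[{reply.name}]: {result}")
        return "stopped: step limit reached"


def scripted_model(context: List[str]) -> ModelReply:
    """Stand-in for a real model: requests one file read, then answers."""
    if not any(line.startswith("tool[") for line in context):
        return ToolCall("read_file", "logger.py")
    return "Add a handler in logger.py."


FAKE_FILES = {"logger.py": "def get_logger(): ..."}

harness = Harness(
    system_prompt="You are a coding agent.",
    tools={"read_file": lambda path: FAKE_FILES.get(path, "<missing>")},
    allowed_tools={"read_file"},
)
answer = harness.run(scripted_model, "How do I add a new logger?")
print(answer)
```

Swapping `scripted_model` for a call to a real model would leave the loop unchanged, which is the sense in which the harness and the model are separate things that can improve independently.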

Boris: I think back to when I first started learning coding. I was the kid that sat in the back of math class in middle school, and I had my little TI-83 Plus calculator. And we just programmed it with BASIC, because at some point I realized that I can actually program the answers of the math test into the calculator. I don't know.
Alex: Hey, I'm Alex. I lead Claude relations here at Anthropic. Today, we're going to be talking about Claude Code and the future of software engineering. And I'm joined by my colleague, Boris.
Boris: I'm a member of technical staff here at Anthropic and creator of Claude Code.
Alex: A lot has happened in the past 12 months and things are moving very, very fast, especially in the coding domain. For folks that maybe aren't following the news every single day or even staying on top of the latest, and I have trouble myself sometimes, can you kind of catch us up here on what's happened and where we're standing currently?
Boris: Yeah, a year ago coding was totally different than what it is today. A year ago, if you wanted to write code, you had an IDE. You had some sort of autocomplete in the IDE. And then there was some sort of chat app, and you might copy and paste code back and forth a little bit. And that was the state of the art. That was AI in coding. And I think maybe sometime around a year ago, we started to see agents appear as a thing that people earnestly use in coding. It's a part of the workflow. It's not a gimmick or prototype. It's actually part of the inner loop when you're doing dev.
And I think this is the thing that's changed the most in the last year: now when you code, you use an agent. You don't directly manipulate text in an IDE anymore. It's not just about tab, it's about the model writing code for you. And I think what we've started to see is the shift from directly manipulating text to having the model do the text manipulation for you. And I think, projecting it out, this is sort of the trajectory that we're on, this continuing into the future.
Alex: I see.
So we've gone from it all being within a web app, where you're copy-pasting code in and out and making very targeted edits, almost, to just being a lot more hands-off and telling an agent what you want it to do, and then trusting it to go make tons of edits and create whole apps, sometimes even by itself.
Boris: Yeah, exactly. And this was something that I think we couldn't do a year ago. People have tried to make AI do coding for the longest time, and to just automate more and more of coding in various ways. And it hasn't really worked, I think, probably for a couple of reasons. One is the models weren't really good enough. And the second one is that the scaffolding, the thing on top of the model, wasn't good enough.
And when we initially launched Claude Code, the very, very first versions late last year, I think this was still using Sonnet 3.5. This wasn't even 3.6, whatever we call this thing, the new Sonnet 3.5.
Alex: Yeah, upgraded Sonnet.
Boris: Yeah, it wasn't even this. And it sort of worked. I used it for maybe 10% of my code or something like that. But even then, I remember when we launched it, we gave it to the core team. And it was just me and a few other people on the team at the time. And I remember walking in one morning, and kind of on the way to my desk, there were a few engineers sitting there. One of them was Robert, and there were a couple of other engineers. And I just walked in and I saw Claude Code on their screens for the first time. I had only just given it to them, and they were already using it. And it was just the craziest thing. The model wasn't very good. The harness wasn't very good. But even in this early version, it was already a little bit useful.
And I think that over the last year, what's happened is the model has gotten way better at agentic coding. That's happened with 3.7, and now 4 and Opus 4.1. And the harness has also gotten a lot better.
And obviously the harness is Claude Code, because of the way you interact with the model: you can't just directly use the model. You have to use a harness. It's sort of like, if you're riding a horse, you need some sort of saddle, and that saddle makes a giant difference when you're riding a horse. I'm not a horse rider.
Alex: I like that analogy though. I mean, it is kind of like Claude is the horse. And as the engineer, you're trying to get it to go in a certain direction, and you're trying to guide it, and you need some sort of scaffolding around it to be able to steer it correctly. And the harness in this case, to stay on the same page, is everything from the tools we're giving it to how we handle the context and everything for the model.
Boris: Exactly, exactly. It's all of Claude Code. The model is the thing behind the API. And then Claude Code is the system prompt, it's context management, it's tools, it's the ability to plug in MCP servers, settings, permissions, all this kind of stuff. All of this interfaces with the model. And the model sees all the context, all the output from this stuff, and it makes a giant difference in the way that it performs. And I think over the last year we learned how exactly we build for the model. And the model has kind of co-evolved with not just Claude Code, but all these different products that are using Anthropic models to build agentic coding tools.
Alex: Maybe let's speak more on that. When you say co-evolved, is that a deliberate thing that we're doing with the training? Or how is the model also getting better at these sorts of things as we make the product features themselves better?
Boris: It's pretty organic, honestly. At Anthropic, everyone uses Claude Code. That includes the researchers. And so every day the people building the models are using the model in order to do their job. And I think as part of that, you kind of see these natural limits that you hit with the model.
So as an example, maybe the model's really bad at doing certain kinds of edits. And sometimes when you use Claude Code, you see, oh, "failed to replace string, failed to replace string." This is a model capability, and we can improve this if we learn from it. Or another example, maybe something higher level: if you just let the model cook for 30 minutes, with 3.5 it could kind of do it for a little bit. Maybe for a minute or something, it would stay on track. And then with newer models, it gets longer and longer, this amount of time the model can operate autonomously. And I think this is really based on experience. Because you use the model, you kind of see where, as a human, you have to course correct and steer it. And then we've learned from that, and we can incorporate that into the model and teach it to better do this itself.
Alex: When you're evaluating a new model, do you kind of have a vibe check set of tests that you run? Or if it's a new feature that we're rolling out to make something better in the harness, how do you personally evaluate if the performance is getting better?
Boris: I just do my work that day.
Alex: Interesting.
Boris: Yeah. Like, my perfect day is I'm just coding all day. And whatever the model is, whatever the new thing is that we're testing, I'll just code using that and see what the vibe is. There's not a specific thing I do.
Alex: Right. You just see, how does it actually work for me in my day to day?
Boris: Yeah, exactly. And in day-to-day work, you do all sorts of stuff. You're writing new code, you're maybe fixing bugs, you're reading Slack messages or GitHub issues to respond to feedback. And I think, more and more, the model is able to do more and more of this.
So actually, in a way, if you had maybe one thing that you always used the model for, you would miss out on some of these newer capabilities, like pulling in context through MCP, like reading your Slack messages, or automatically debugging stuff, because you can pull in Sentry logs automatically.
Alex: Yeah. So the best eval in some sense is the one that most looks like real life. And in that case, just using it gives you the best result.
Boris: We tried really hard when building Claude Code to build product evals.
Alex: Yeah.
Boris: Just to have some sort of benchmark: when we change the system prompt or whatever, is the model getting better? And we have a little bit of this, but honestly, it's just so hard to build evals. And by far, the biggest signal is just the vibes. Like, does it feel smarter? Because there's such a broad range of tasks that people use it for.
Alex: Yeah, that's actually a question I hear from developers all the time: they would appreciate more guidance on how we go about prompt testing and iterating. I know for different products, we have various sorts of evals that we've tried to create. But for Claude Code, it really is just kind of this tight feedback loop that almost gives us more immediate signal than any hard-coded set of evals.
Boris: I wonder if people kind of want to hear a better answer from a...
Alex: Yeah, yeah.
Boris: But yeah, man, it's all vibes.
Alex: Yeah.
Boris: I think at this point, the models are doing so well on evals like SWE-bench. We're just trying to find these harder evals. And now there's T-bench, which is a little bit less saturated. But I think it's just really hard to find synthetic evals that capture all the complexity in software engineering.
Alex: Right. Right. Do you think there's something we did uniquely to set up that feedback loop internally? Because I feel like Claude Code has the best dogfooding cycle I've seen of any type of product.
Boris: Initially, I built it the way that I do any other product.
Which is to just listen to users, and make it as easy as possible to listen to users. And I think one part of it is, when we built Claude Code, there was just a single feedback channel in Slack. And any time anyone had feedback, I would just direct them to that. Just be like, yeah, post there. And I feel like people hesitated sometimes a little bit, because sometimes when you give feedback, you expect that no one listens, and it kind of goes into a black hole, like into a void. And I think one of the things that we did really right was, from the beginning, whenever someone gave feedback, I would try to fix it as fast as I could. Sometimes I would go into the office and just spend two or three hours going through as many bugs as I could and fixing them as fast as I could. And then every time, comment back and tell people it's fixed.
Alex: Right.
Boris: And this kind of encourages them to keep giving feedback. And to this day, the Claude Code feedback channel internally is just this fire hose.
Alex: Yeah.
Boris: It's non-stop.
Alex: Oh, totally. I remember in those early days, and I still do it, dropping in there, posting something, and immediately you're emoji reacting, or you're asking for more clarification and more questions. And you do feel like, oh, okay, my feedback's being heard. And then you're actually incentivized to go post more feedback in the future.
Boris: Yeah, because honestly, I don't know what I'm doing. No one really knows what they're doing with AI. We're kind of discovering this thing as we build it. And the best indicator is what the users want. So you've got to listen.
Alex: Right. Switching gears slightly: what is the current state of Claude Code as a product? What are the latest features? What are you excited about? What are some things that you're seeing folks do with it right now?
Boris: Claude Code from the start was built to be the simplest thing, and to be as hackable as possible.
And I think the hackability is something that we've been developing a lot, and that's something I'm really excited about. So originally, the way to hack Claude Code was adding to its CLAUDE.md. That was the original extension point. CLAUDE.md is this file; you can put it in the root directory, you can put it in child directories. There are different places you can put it. And it's just additional context that you give Claude Code, and it goes with your repo. You often check it into your codebase. So it's a little bit more information about the code.
But over time, we've added a lot more extension points. So now there's a very sophisticated settings system and permission system. There are hooks now, which Dixon built. Dixon's an engineer on our team. He just saw all these different user asks coming in: I want to extend it this way, I want to hook into this, hook into that. And so he built a super extensive hook system. MCP, obviously, is a really great extension point.
And now there are slash commands and subagents. User-defined slash commands are something we've invested in a lot. And the idea is it's just a workflow. It's a markdown file. You put it in your code, and it's something that you can reuse a lot. So for example, I have a slash command for making commits. And I have some instructions in there: here's how you write a good git commit. I pre-allow the git commit bash command, so I don't have to accept it every time, and the model can just do it. So I think slash commands are really interesting. And agents are kind of a different view of slash commands. It's like a slash command, but it has a forked context window. And so you can kind of think of agents and slash commands as two sides of the same thing. And this is also very exciting. It's just another way to extend Claude Code.
And so when I look at the future, I think a lot of it is just about: how do we extend Claude Code more?
How do we make it easier for other people to build on top? How do we make the SDK more useful for people? So it's useful for code if you want to build a coding agent. But also you can use it for other stuff. Anything that you need an agent for, you can just use the SDK for. And I think these are the things that I'm the most excited about. And obviously all of this benefits from all the other work we're doing to make the model more autonomous, to make it work for longer periods of time, to make it adhere to instructions better, to make it remember things better. And so everything along the way benefits.
Alex: So I'm using Claude Code, or whatever form of it, in six to 12 months. What does my work actually look like? Am I reviewing PRs all day? Or what does the day-to-day break down to?
Boris: Yeah, I think there's going to be a mix. There's the more hands-on coding; I don't think that's going away. Maybe it'll look different, though. So maybe hands-on coding today is directly manipulating text, but in the future it might be using Claude to manipulate the text for you. And then I think there's going to be this other bucket of maybe less direct coding, where Claude proactively does something. And maybe Claude even reviewed it. And it's your job to decide if this is a change that you want or not.
And I think maybe 12 or 24 months from now, we're going to start seeing Claude be more about goals and these higher-level things that it needs to do, and less about the specific tasks that go into it. The same way that, as an engineer, I think about what it is that I want to do over the next month, and then I make small changes to work towards that. Maybe Claude will go through the same thing.
Alex: Right. So moving up and up the stack, to some degree, of these abstraction levels: from getting Claude to make individual changes to files, to getting Claude to make a whole PR, to getting Claude to think about a goal, like building an app or whatever else it is.
Boris: Yeah.
Alex: OK, that's interesting. If I'm an engineer and I'm hearing that, it seems like there's going to be a lot changing in a very short amount of time, especially with my role and what I should be doing. What's your advice for folks out there who are looking to prepare themselves and adapt to this world, about what they should be learning or what skills they should be developing?
Boris: I think back to when I first started learning coding. I was the kid that sat in the back of math class in middle school, and I had my little TI-83 Plus calculator. It was like a transparent gray one. You could kind of see the circuits. And we just programmed it with BASIC, because at some point I realized that I could actually program the answers for the math test into the calculator, and you can get better grades that way. And there's just something about this visceral feeling of being able to hack, and having this idea of, maybe there's this one program I can make, and I just go into my calculator and I code it. And then I can just restart and use it really quick. This kind of feedback cycle, that was really amazing. It made it possible for me to build stuff that I never could have before. And it was just so easy to get started.
And I think about the difference between that world and the world before agentic coding, where stacks just got way, way too complicated. If I wanted to make a JavaScript website, I had to learn about React and maybe Next.js, and then three different build systems and the deploy system. It was just so complicated. And I think one really cool thing about agents is that they're changing this. Coding agents make it really easy to get started. If you have an idea, you can just build it. And it's a lot more about the idea now than it is about the details, because, just like with Claude Code, you can rewrite the code over and over. Claude Code itself, we rewrite all the time. And I think this is just something that coding agents enable.
The code itself is no longer precious. There's still an art to writing it, and I'll still code by hand sometimes. One of the engineers on the team, Lina, was talking about how on the weekends she'll still sometimes write C++ by hand, just because it's fun. And as a coder, it can be a really joyous thing to do this. But I think more and more, it's going to be about the thing you make and not as much about the process of making it.
And I think my advice for people learning to code today is: you still have to learn the craft. You still have to learn to code, learn languages, learn compilers, runtimes, how to build web apps, how to build programs, system design. You still have to know all this stuff. But also, just start to get more creative. If you have an idea for a startup or an idea for a product, you can just build it now, in a way that you just couldn't before. And we don't really understand what this means. But there's just so much potential that's about to be unlocked because of it.
Alex: Yeah, I love that. I think that's great advice, too. The idea has suddenly become something you can act on in a span of a few minutes, almost, whereas before it could just be in your backlog forever. Before we wrap, I want to ask you, as the creator of Claude Code: what are your best practices for using Claude Code, and any tips or tricks?
Boris: Yeah, I think the biggest thing that I recommend... OK, maybe two tricks. So one thing I recommend is that if you're brand new to Claude Code and you haven't used it before, don't use it to write code. And I know it sounds crazy.
Alex: Explain it. Explain it.
Boris: But you've got to stop yourself. Don't use it to write code yet. The thing to start with is to use it to ask questions about the codebase. So you can ask, if I want to add a new logger, how do I do that? And then ask Claude Code to explore the codebase and figure it out for you. Or, why is this function designed the way that it is?
Claude Code can go in, and it can look through git history, and it can answer this stuff for you. So I think, ask Claude Code questions about the codebase, and just don't code yet. And then once you feel comfortable using Claude Code this way, and you get comfortable with this idea of an agent that's doing this research for you, then start to use it to code.
I think the second thing is, when you are using Claude Code to write code, think about what kind of work you want to do and how big the task is. In my mind, I have these three categories: easy, medium, and hard, very roughly. Easy tasks are something that Claude can write in one shot. Like, one prompt, and it'll get it pretty much right. And nowadays, I'll just go to GitHub and I'll tag @claude on the issue and just have Claude write the PR for me. This is how I do easy tasks, because that frees up my terminal. I don't have to spend it on this.
Medium tasks I'll start in the terminal, and I'll start in plan mode. So just shift-tab into plan mode. And I'll align on a plan with Claude first. And then once I feel good about the plan, I'll go into auto-accept and I'll have it implement it.
And then for really hard tasks, I'm still the one driving and Claude is more of a tool. I'm kind of pairing with it, but really I'm the one in the driver's seat, not Claude, for this. And so I'll use Claude maybe to do some codebase research, maybe prototype a few ideas, maybe I'll just vibe code a few options to understand the boundaries of the system and what works well. But I'll still mostly implement it myself, and maybe Claude will write the unit tests. But it's still mostly me doing the coding. So I think that would be the second piece of advice: just think about what's the task that you're doing and what's the right way to use Claude Code to do it.
Alex: Those are great tips. Really appreciate the time, Boris. This has been awesome. Thank you.
Boris: Yeah, thanks Alex.
