Agentic Engineering: Working With AI, Not Just Using It

AI in software engineering has evolved rapidly from simple autocomplete to autonomous agents that can execute tasks and open pull requests, which calls for a new "agentic engineering" approach.
Effective collaboration with AI agents means treating them as highly capable junior developers who lack your project's context, so they need human judgment and clear direction.
Mastering "context engineering" and adopting a "research-plan-implement" workflow are crucial for using AI agents efficiently, preventing wasted effort, and maximizing productivity gains.

AI coding agents are best viewed as "energetic, enthusiastic, extremely well-read, often confidently wrong junior developers" who excel at breadth of knowledge but lack judgment and business context.
Context Engineering is the art and science of providing an AI agent with precisely the right information in its context window for each step of a task.
More context does not always mean better results; too much context can make the model "dumber" (especially over 50% full) and bad context can corrupt outputs.
To manage context effectively: persist information outside the context window (e.g., scratch pads, agents.md), be highly selective about what's included, summarize/trim old context, and isolate tasks into separate agent sessions or parallel agents.
Adopt the Research-Plan-Implement (RPI) workflow to guide AI agents:
1. Research: Understand the problem, system, and edge cases. Use "ask mode" to generate a detailed research document.
2. Plan: Outline explicit steps, file changes, verification methods, and scope. Create a clear plan file.
3. Implement: Start a new session with only the plan, allowing the agent to execute code. Review changes frequently and commit incrementally.
Configure agents using specialized modes (e.g., ask, code, architect) for different tasks, define project-specific rules in an agents.md file, and create reusable skills for common workflows.
Be judicious with Model Context Protocol (MCP) servers; disable any that are not actively being used, as they consume tokens, incur cost, and can introduce irrelevant context.
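The context-cost points in the takeaways above can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement of any real model: the 200,000-token window, the per-turn token counts, and the overhead figures are all assumed numbers, and `simulate_session` is a hypothetical helper. It shows how resending the whole history on every request inflates billed input tokens, and how fixed overhead (system prompt, agents.md, enabled MCP tool descriptions) pulls forward the point where the window passes the roughly 50% mark mentioned in the talk.

```python
from dataclasses import dataclass

CONTEXT_WINDOW_TOKENS = 200_000  # assumed window size; real models vary
DEGRADE_THRESHOLD = 0.5          # the talk's rule of thumb, not a hard limit


@dataclass
class Turn:
    user_tokens: int   # tokens in the new user message (made-up counts)
    reply_tokens: int  # tokens in the agent's reply (made-up counts)


def simulate_session(fixed_overhead: int, turns: list[Turn]) -> list[dict]:
    """Track context fill and cumulative billed input tokens, turn by turn.

    fixed_overhead is everything sent on every request regardless of the
    conversation: system prompt, agents.md, enabled MCP tool descriptions.
    """
    history = 0       # conversation tokens carried forward
    billed_input = 0  # running total of input tokens sent to the model
    report = []
    for number, turn in enumerate(turns, start=1):
        # Each request resends the fixed overhead plus the whole history.
        request_tokens = fixed_overhead + history + turn.user_tokens
        billed_input += request_tokens
        history += turn.user_tokens + turn.reply_tokens
        fill = (fixed_overhead + history) / CONTEXT_WINDOW_TOKENS
        report.append({
            "turn": number,
            "request_tokens": request_tokens,
            "billed_input_total": billed_input,
            "fill": round(fill, 3),
            "start_new_session": fill > DEGRADE_THRESHOLD,
        })
    return report


turns = [Turn(user_tokens=2_000, reply_tokens=8_000)] * 10
for label, overhead in [("lean", 5_000), ("many MCP servers", 45_000)]:
    report = simulate_session(overhead, turns)
    first_flag = next(r["turn"] for r in report if r["start_new_session"])
    print(label, report[-1]["billed_input_total"], first_flag)
```

Under these assumed numbers, the lean session bills 520,000 input tokens over ten turns and crosses the threshold at turn 10, while the session with 45,000 tokens of fixed overhead bills 920,000 and crosses it at turn 6.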

Agentic engineering: A paradigm for working with AI agents as collaborators, involving directing, managing, and reviewing their autonomous work.
AI agent: An AI system capable of taking a task, breaking it down, executing steps, and making changes (e.g., writing code, running tests).
Context window: The limited amount of information (tokens) an AI model can process and retain in a single interaction or session.
Tokens: The basic units of text (words or sub-words) that AI models use to process and generate language; context length and cost are often measured in tokens.
Context engineering: The practice of strategically selecting, curating, and presenting information within an AI agent's context window to optimize its performance and output.
Model Context Protocol (MCP): An open protocol that lets AI agents connect to external tools and APIs, expanding their capabilities beyond their base knowledge.
agents.md: A file (often in a repository) that serves as a README for AI agents, providing always-on project rules, conventions, and build/test requirements.
Skills: Reusable, on-demand playbooks or specific workflows that an AI agent can execute for a given task, like compiling a changelog.
Research-Plan-Implement (RPI) workflow: A structured approach to using AI agents where the problem is first researched, then a detailed plan is created, and only then is the implementation (coding) phase begun.
Ask mode: An agent mode focused solely on research and understanding, designed to chat and learn about a system without attempting to write code.

Let's talk a little bit about what I mean by agentic engineering, and let's start with a question. If I were to ask you right now how you're using AI in your work, could you actually explain it? Not just "it helps me code faster" or "it can write code really fast," but the real workflow: what you hand off, what you keep, and how you decide in between. Most engineers can't, and that's a little wild to me, because 90% of engineers are already using AI tools or have used them. Maybe only half of them use them on a regular basis, but that number is growing all the time. That's the current state. So the question isn't whether your team is using AI. They are. The question is whether you're getting the most out of it, or you're just autocompleting your way through the day. That gap between using AI and being able to articulate how you work with it is what this talk is all about, and I think it represents a paradigm shift in how we think about AI.
The history of AI in software engineering is moving very fast. It's also surprisingly short. In the early 2020s, we got tools that could finish lines for you. You'd type half of a function signature and the model would guess the rest, kind of like autocomplete on steroids. It's a neat trick. Then in 2022, models started to be able to suggest entire functions. You could describe what you wanted, chat with a model, and maybe get a working implementation back. This is where GitHub Copilot first came on the scene and broke through, and millions of developers started using it. For the first time, it was starting to seem like maybe AI wasn't a novelty. Maybe it was genuinely useful. But then in 2025, something really broke. In what we're living in now, in 2026, the models don't just suggest. They can execute.
They can take a task, break it down, figure out which files need to be touched, make the changes, run the tests themselves, and then come back with an actual pull request. That's not just fancy autocomplete. It's not just a faster horse. It's a collaborator. It's a different way of working. Armin Ronacher, the creator of Flask for the Python folks here, put it perfectly, I think: we're no longer just using machines, we're now working with them. That framing captures the real shift. Tools are things that you pick up and put down. You use a hammer. You don't work with a hammer. But the AI coding agents we have today are somewhere in between, and maybe a little bit more like working with another engineer. It just happens to be an engineer who's read every Stack Overflow answer ever written.
I think that needs a mental model shift, and this is the mental model I want you to carry through the rest of this video and, honestly, through the next couple of years of your career working with these tools. I do think they're still tools, but we have to think about them differently. You have to think about your AI agent as an energetic, enthusiastic, extremely well-read, often confidently wrong junior developer. That junior developer is incredibly fast. They don't easily get tired. They don't have any ego about their code. They'll happily rewrite something six times if you ask them to. And they have an astonishing breadth of knowledge. They've seen lots of languages, lots of frameworks, lots of patterns. But, and this is critical, what they don't have is judgment. They don't know your business context. They don't understand why you made that very specific architectural decision three months ago. And they'll confidently write code that is technically correct and contextually wrong.
Armin also said that he's gained more than 30% of time in his day because the machine is doing a lot of the work. That's a real gain. But he's getting that 30% because he knows what he can hand off and what he has to keep for himself. He's not just blindly accepting every suggestion. He's directing the work. That's the difference between using AI and working with AI, and that's what agentic engineering actually means.
So let's get tactical. If you're an engineer, how do you really get good at this? I think the number one thing to think about is context engineering. Karpathy says context engineering is the delicate art and science of filling the context window with just the right information for the next step. I think that's critical for a couple of reasons. First, context is expensive. Every token you add to the context adds cost, because the whole chat history is sent back in as input tokens every time you send a message, and that can add up pretty quickly. The other key is that more context doesn't always mean better results. In fact, it can actually make the model dumber. It's not just about the money. The quality can degrade as you get over about 50% full. There are lots of things that can trap you here, not the least of which is the fact that MCP servers became so popular that we have a lot of them enabled all the time now. Each one of those loads more and more input tokens into the context, and that can be a real problem if you start to get into this dumb zone around 50% context.
That also isn't the only problem, because not only can more context be a problem, but bad context can be a problem and can poison everything.
Bad context shows up when you're mixing two different tasks that didn't really overlap, or you've got some outdated comments, either in the code or ones that you've made to an agent. Or even worse, what I've seen a lot of people do is start walking down the road with an agent and then realize, hey, we're down the wrong path. We've made a lot of wrong decisions, and they try to steer the agent back. But the problem, again, is that the agent is not doing real reasoning like you and I do as humans. It's taking in all that context every time. It may get lost in the middle, or even see some of those negative things you had before as still part of the context, and you see those negative patterns creeping back in if you're not careful. That's why it's better not to let these things compound, and to always start a new session once you realize things are off the rails. Context is expensive, more of it doesn't always mean better quality (at a certain tipping point it means worse quality), and bad context can corrupt the output.
So the real critical thing for engineers is to manage the context. What does that mean? First, I think it means persisting a lot of information outside of the context window so that we can bring it in when needed. This is things like scratch pads for what we're working on, memory files, the agents.md, the kinds of files that help agents have context on what you're working on. We also need to be very selective when we're choosing that context. That means only pulling in what's relevant for this step of the problem. Don't just pull in everything that might be useful. That could mean bringing in the right @-mentions for the files we're referencing, and it could mean making sure we don't have unnecessary MCP servers enabled.
It also means making sure that the agent has the right data and that we as humans have curated that data for the agent. Then, as that window fills up, we want to summarize, trim, and compress that context. If we've gone through a big deep-dive debugging session with the agent and now we think we have the problem and the solution, that's great. It might be time to compress that context and focus the agent back in: okay, now we understand this problem, we're going to go fix it. And the other most important thing is to isolate context. I think this is why we've seen this huge rise in parallel agents in the past six or eight months, because splitting work across several agents or several sessions keeps things from accumulating and really drives this kind of task separation.
If you think about it, aren't these all the same things I would tell a brand new engineering manager about managing a junior engineer? The story I tell here is from early in my career. I spent a lot of time as an engineering manager and product manager before I went into the dark arts of developer relations. In my first job ever as an engineering manager, I was at a health care software company, and there was this new thing coming out called an iPad, which dates me a little bit. It had just been released, and we thought it could be a great place to collect patient history. You know, that form I have to fill out every year at the doctor. It's very critical to assessing a lot of your risk of disease, but having to fill it out from scratch every time is not fun. So I designed a wireframe of what this would look like, in this other archaic tool some people may have heard of called Balsamiq, which is basically a wireframing tool. That wireframing tool used things like Comic Sans and silly smiley face icons as placeholders.
And a lot of other stuff like that, which you'd expect from just a wireframe. I handed that to a set of interns we had working for us that summer, thinking this was a great greenfield project for them to take some time on. A few weeks later, I got back a working prototype. The font was Comic Sans and there were silly emoji placeholders, because that's what the spec had in it. So whose fault was that? Obviously it was not the interns' fault. It was my fault as an engineering manager for not giving those junior engineers the right context about what's important, what's not, what we really needed to focus on, and what problem we were solving.
The habits that tie all of that together: you don't need to think about all four of those things for every task. You just need to think about doing one task per session. Keep an eye on your context meter. And if you're in doubt and it feels like things are off the rails, you're probably right. So start a new session, and ask the agent to summarize the session for a new agent. It turns out that AI is really great at writing prompts for AI. So if you've worked on something with an agent for a while, have that agent summarize where you're at. You can read it, make sure it matches your understanding, and then start a new session with just that right context. Again, it's a little bit of art and a little bit of science.
So how do we put this into practice? There are a lot of workflows, and lots of things written out there that you can read. I've even compiled a lot of them at path.kilo.ai, where you can find all of these trends, ideas, and workflow patterns that have been talked about. But what I keep coming back to is maybe one of the simpler ones, and that's the research-plan-implement loop.
I think this really helps us solve for a lot of the classic mistakes people make when they pick up agentic engineering for the first time, or pick up AI to help them do some engineering. What most people do is say, "Hey, help me implement this feature. I want it to do X and Y." And these large language models are very good at outputting lots of code. In fact, when I joined Kilo Code over a year ago, I made a pronouncement that we would never have our website be just a prompt and a whole lot of code flying by. It makes for a great demo, and you've seen lots of coding agents that show it off that way. But the reality is that jumping straight into code like that can cause a lot of wrong assumptions. It can waste time rather than saving it, and it just creates a lot of frustration. It really creates that attitude we've seen where people are anti-AI or think AI is not a useful tool, because they've jumped right in, put garbage in, and gotten garbage out. Or maybe it's been a while since they've used it. I think of the Will Smith eating spaghetti video: AI video has come a long way in just the past two, three, four years. The same is true of the AI coding models, but you have to do what works to give them the best chance at getting a great result.
What that means is first understanding the problem really well, and making sure both you and the AI agent understand it. Then laying out explicit steps for implementing those changes or fixing that problem. Only then do we jump to the implementation phase where we're writing code. Dex Horthy has a great phrase here: a bad line of research can potentially become hundreds of lines of bad code.
So we're really going to focus on how to get the research and the plan in place, to give ourselves the best chance of having great code come out. In that first phase, we're going to use a tool that is focused only on research. In Kilo, we call that ask mode. We call it that because ask mode can't actually do anything. It can only chat. It can't write files. It can read files if you let it, but it can't start trying to code a solution. So instead of trying to code a solution from the beginning, we first try to understand the system. How does it actually work today? Where are the files that are going to be involved? What are the paradigms we want to mirror? How does this differ from something we already have? We learn where in the codebase this is going to go, how the data flows through the system, and how that will change with our change, as well as any edge cases we need to consider. AI is really great at brainstorming, so it can help you brainstorm those things and make sure you've covered all of your bases. Once you've done that research, what comes out of it is an actual output document that shows the details of that research, which you can read, understand, and agree with: this matches my understanding of the problem, and I think we're ready to move on to the plan.
Once we've reviewed that as a human, we can say, okay, let's outline the next steps. What files are we going to create or change? Maybe there are some code snippets, though it's not always a good idea to have code snippets in the plan. We're also going to include how we'll verify that the change is correct: what test changes or additions we're going to make to know that.
We're also going to be really explicit at the planning phase about what is in and out of scope: what is going to change and what isn't. Again, the output is a very clear plan file. You'll see a lot of repositories nowadays have a folder called plans. We want that plan file to have step-by-step instructions with the specific changes we're going to make, test commands to verify them, and a strategy for understanding how it's going to change the system. It should be clear enough that we could even use a smaller, faster, or cheaper model to implement it, because we've spent the time in the research and plan phases to really understand what we'll be doing once we get to implementing the change.
When we come to implementing the change, we can start a new session and give it just the plan to execute. That lets us keep the context in that session very low. It lets us carefully review each change and commit very frequently. Now, I used to work at a company called GitLab for many, many years, so maybe I'm a little biased towards Git, but I think Git can be a huge helper here when it comes to slowly iterating and understanding the changes the agents are making. I treat Git on my local machine like my own first pull request review with my agents, before I put up an actual pull request for my colleagues to look at.
But again, it's critical to understand that human time at the planning and research phases is really the highest-leverage use of your time. By the time you're implementing, you want to have all that hard thinking done. That's critical, going back to Dex Horthy, who's spoken a lot on the subject. I highly recommend you check out his videos on YouTube talking about this.
Dex says very aptly that AI can't replace thinking. It can only amplify the thinking you've done, or the lack of thinking, the fact that you haven't thought it through.
So let's talk about how we configure our agents, one more step down from this paradigm of research, plan, implement, to really make sure we do this. First, modes and customizations. We already talked about these modes: ask, code, architect. These modes are specialized and focused on the thing we're trying to get done. Architect is for planning, ask mode is for research, and code mode is for actually implementing. Then we also want a set of rules that makes sense for our workspace, for the repository we're in, or maybe globally on our machine, so that we have a certain set of rules we always adhere to. Agents are pretty good at loading in and understanding those rules, but we have to have them written down for agents to have them in their context.
A lot of agent behavior, then, is something we want to tweak as we're learning. Do we want to run multiple agents at a time? Do we want those agents to use worktrees, so that we can merge their work back into our local repository before committing it to a pull request? How much do we want to auto-approve? Most agents have the ability to tune what they can do independently and what tools they can use independently. Can the agent read files, inside or outside of the workspace? Can it run tests? What can the agent do autonomously without your intervention, versus what do you need to approve? I think this is something you have to set up to be comfortable with in the beginning, and then you also need to be comfortable changing it as you learn how to use these tools.
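The auto-approve questions above (what can the agent read, write, or run without asking) can be sketched as a small policy object. This is an illustrative sketch only: `ApprovalPolicy`, its field names, and the action names `read`, `write`, and `run` are hypothetical and do not correspond to Kilo's or any other agent's actual settings.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class ApprovalPolicy:
    """Hypothetical policy: which agent actions may run without a human."""
    workspace: str
    auto_read_in_workspace: bool = True
    auto_write_in_workspace: bool = False
    auto_commands: set[str] = field(default_factory=set)  # exact commands only

    def _inside_workspace(self, path: str) -> bool:
        root = PurePosixPath(self.workspace)
        target = PurePosixPath(path)
        # Treat any ".." segment as outside, since it can escape the root.
        if ".." in target.parts:
            return False
        return target == root or root in target.parents

    def needs_approval(self, action: str, target: str) -> bool:
        """Return True when a human must confirm the action first."""
        if action == "read":
            return not (self.auto_read_in_workspace and self._inside_workspace(target))
        if action == "write":
            return not (self.auto_write_in_workspace and self._inside_workspace(target))
        if action == "run":
            return target not in self.auto_commands
        return True  # unknown action types always go to the human


policy = ApprovalPolicy(workspace="/repo", auto_commands={"pytest -q"})
for action, target in [
    ("read", "/repo/src/app.py"),
    ("read", "/etc/passwd"),
    ("write", "/repo/src/app.py"),
    ("run", "pytest -q"),
]:
    print(action, target, policy.needs_approval(action, target))
```

Starting with conservative defaults (reads inside the workspace and one test command approved, everything else gated) and loosening them over time matches the talk's advice to adjust these settings as trust in the tool grows.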
I think a good mental model for this agent configuration is three distinct buckets. We talked about modes: that's the role-based configuration, the behavior we want from an agent. But there are two other key things, and those are the agents.md and the skills.md that you hear about. What's the difference between the two? The agents.md is quickly becoming the de facto standard for where all agents go for their README: the always-on rules and details about the project. So I think it's critical that your project has an agents.md with the minimal amount of information an agent needs to know: what conventions we're using, what commands we use to build and test, and what the requirements are around testing, or requirements we need to be sure to check off before committing.
Skills are more of a specific workflow. They're reusable playbooks for agents. If there's something you're doing a lot, say you're making motion graphics with Remotion often, or you're doing some sort of daily, weekly, or monthly changelog compiling, those are great things to put in as skills that an agent can pick up when it needs to do those specific kinds of workflows. Typically those are on demand, and you say, hey, let's use this skill for this task, whereas the agents.md is almost always loaded into the context so the agent knows what's going on.
And then, of course, I work at Kilo Code, so I've got some power user tips there, but many of these apply regardless of which agent you're using, and I think they're critical as you get comfortable with those first paradigms.
How do I now customize this and make it work for me? One is @-mentioning for context: mentioning files, commits, or terminal output and bringing them into the context quickly is really helpful. Using slash commands to do things like starting a new task when we need to, or condensing the context when it's getting too full, helps us move a lot faster. Working in VS Code with Kilo Code, we can also select a section of code, right-click, and say "Add to Kilo Code." That context is brought right in, and I can then ask questions about that code or ask the agent to change a certain part of it. And of course we have autocomplete built in as well, which I think is still useful, especially because we have it not just in code but as you're prompting.
Beyond the IDE, we're also seeing a shift this year in where else I want to be able to use this: in the CLI, from my mobile phone, in a cloud agent, directly in Slack. The ability to use these agents wherever you are is becoming expected of everyone's agents, and I think that's a good thing. It means we're starting to learn how to use these agents, again, more like a collaborator that's everywhere we need to be.
One other thing I want to talk about is getting other context in. First of all, Model Context Protocol: context is right in the name. Fundamentally, these models originally could only receive input tokens and create output tokens. Slowly but surely, we've been enabling them to use tools, where they can make tool calls out and affect things in the environment, like running tests.
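The idea that a model only emits tokens, and that a surrounding harness turns some of those tokens into tool calls, can be sketched as follows. This is a toy illustration under stated assumptions: the JSON request shape, the `TOOLS` registry, and the two example tools are all hypothetical, and this is not the wire format of MCP or of any vendor's API. Note that `tool_prompt` also hints at the cost discussed in the talk, since every enabled tool's description travels with every request.

```python
import json
from typing import Callable

# Hypothetical registry: tool name -> (description shown to the model, function).
TOOLS: dict[str, tuple[str, Callable[..., str]]] = {}


def tool(name: str, description: str):
    """Register a function as a tool the agent is allowed to call."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = (description, fn)
        return fn
    return register


@tool("word_count", "Count the words in a piece of text.")
def word_count(text: str) -> str:
    return str(len(text.split()))


@tool("run_tests", "Pretend to run the test suite and report the result.")
def run_tests(suite: str = "all") -> str:
    # Stand-in for shelling out to a real test runner.
    return f"ran suite {suite!r}: ok"


def tool_prompt() -> str:
    """Describe every enabled tool; this text is sent with every request."""
    return "\n".join(f"- {name}: {desc}" for name, (desc, _) in sorted(TOOLS.items()))


def handle_model_output(raw: str) -> str:
    """Run a tool if the model asked for one, otherwise pass the text through."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # ordinary prose, nothing to execute
    if not isinstance(request, dict) or "tool" not in request:
        return raw
    name = request["tool"]
    if not isinstance(name, str) or name not in TOOLS:
        return f"error: unknown tool {name!r}"
    _, fn = TOOLS[name]
    try:
        return fn(**request.get("args", {}))
    except TypeError as exc:
        return f"error: bad arguments for {name!r}: {exc}"


print(tool_prompt())
print(handle_model_output('{"tool": "run_tests", "args": {"suite": "unit"}}'))
print(handle_model_output("All done, no tool needed."))
```

In a real agent the tool result would be appended to the conversation and sent back to the model for the next step; that loop is omitted here to keep the sketch short.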
The concept of MCP basically expands this to say, hey, I want to give the agent other tools. For instance, the GitHub MCP gives the agent a lot of tools to interact with the GitHub API: look up pull requests, look up comments, look up issues, and understand a lot more about your GitHub environment. Or Context7 helps it look up up-to-date framework documentation, because, as you know, LLMs have a cutoff date where their knowledge ends, and they don't know about anything that's changed since then. So these MCP servers can be very helpful, and there are thousands of them out there. But the concern is that every one of them adds at least some information, details about the tools it provides, to the system prompt that gets sent every time you interact with an agent. So if you're not actually using a server, you want to make sure to disable it. Let's say I have a Postgres MCP that connects to my database, and I'm doing a whole bunch of front-end work that doesn't involve the database at all. That Postgres MCP is just going to be wasted tokens, and maybe even worse, tokens that help confuse the agent so it doesn't understand that it's not supposed to touch the database right now. So we want to be really careful not to overuse MCPs.
Another thing we hear from enterprises a lot is: how do we work with internal platform APIs? I think there are four different ways of doing that. One, if there's already an OpenAPI spec or Swagger spec for it, use that. If there's not, convert the documentation to Markdown so you can save it in the agents.md or somewhere else in the repository to reference. If it's something that changes more frequently, maybe you need a reference URL that the agent can pull every time to see the latest and greatest.
And then we've seen some customers who have complex multi-step, multi-system workflows, where building their own MCP server might be the right choice. But one way or another, I think the key when working alongside Kilo or any of these agents is to isolate your work from the agent's work, and then review the agent's work as a pull request. That helps you understand how you can best review the code, just like I would review a junior engineer's code.
So that's really the presentation I have. On Kilo, we've got some exciting new features coming up, and we've expanded across all these surfaces. We also have a big focus on OpenClaw and KiloClaw, and on making a very safe way to use OpenClaw agents. Just a little plug at the end here: visit kilo.ai and take a look at Kilo. We'd love to get your feedback on what we're building.
As for where we go from here: again, I think you've got to pick a tool and get lots of reps. We said earlier that it's part art and part science, and I think that just means you need a lot of reps to get the feel for what you can trust the models to do and what you can't. Then try this research-plan-implement feedback loop and see how it works for you. Maybe you'll end up like some of these other senior engineers who have said, hey, I'm having more fun programming now than I've had in years and years, as we farm out some of this tedious work to agents and let our brains work on the harder engineering problems.
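For anyone trying the research-plan-implement loop, the plan file described in the talk (explicit steps, files to change, verification commands, and what is in and out of scope) can be sketched as a small data structure. This is a hedged illustration, not a format any tool requires: `Plan`, `Step`, the readiness checks, and the example feature and file names are all invented for the sketch. The point it shows is that the fresh implementation session receives only the rendered plan, and that a plan missing scope or verification is rejected before any code is written.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    files: list[str]  # files the step is expected to create or change
    verify: str       # command that should pass when the step is done


@dataclass
class Plan:
    goal: str
    in_scope: list[str]
    out_of_scope: list[str]
    steps: list[Step] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List reasons this plan is not ready for an implementing agent."""
        issues = []
        if not self.steps:
            issues.append("plan has no steps")
        if not self.out_of_scope:
            issues.append("nothing is marked out of scope")
        for number, step in enumerate(self.steps, start=1):
            if not step.files:
                issues.append(f"step {number} names no files")
            if not step.verify.strip():
                issues.append(f"step {number} has no verification command")
        return issues

    def to_prompt(self) -> str:
        """Render the plan as the only context for a fresh implement session."""
        issues = self.problems()
        if issues:
            raise ValueError("plan not ready: " + "; ".join(issues))
        lines = [
            f"Goal: {self.goal}",
            "In scope: " + ", ".join(self.in_scope),
            "Out of scope: " + ", ".join(self.out_of_scope),
            "Steps:",
        ]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.description}")
            lines.append("   Files: " + ", ".join(step.files))
            lines.append("   Verify: " + step.verify)
        lines.append("Commit after each step once its verify command passes.")
        return "\n".join(lines)


# Invented example feature and file names, for illustration only.
plan = Plan(
    goal="Add CSV export to the reports page",
    in_scope=["export endpoint", "download button"],
    out_of_scope=["PDF export", "report schema changes"],
    steps=[
        Step("Add the export endpoint", ["api/reports.py"], "pytest tests/test_reports.py"),
        Step("Add the download button", ["web/reports.tsx"], "npm test -- reports"),
    ],
)
print(plan.to_prompt())
```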

Agentic Engineering: Working With AI, Not Just Using It — Brendan O'Leary
