The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil
TL;DR
- The widespread adoption of AI agents in software development creates new challenges, shifting from initial productivity gains to increased pressure and a higher risk of technical debt.
- AI agents excel at generating code that "runs" but often produce brittle systems and struggle with complex product architectures due to limited context and lack of human judgment for code quality.
- To counter these issues, developers must intentionally introduce "friction" into the process by designing "agent-legible codebases" and actively engaging human critical thinking for architectural decisions, reliability, and code reviews.
Takeaways
- AI tools initially boost individual productivity but quickly raise baseline expectations, transforming free time into pressure to ship faster without adequate time for critical thought.
- AI agents are optimized for making progress and passing tests, which can lead to brittle code that handles failures poorly (e.g., silently loading defaults) and creates more failure conditions than human-written code.
- Human developers must resist the addictive cycle of rapid AI-driven output and actively slow down to prevent the accumulation of massive technical debt and the decline of codebase understanding.
- Design an "agent-legible codebase" through strong modularization, adhering to known patterns, pushing complexity to abstraction layers, and avoiding "hidden magic" that obscures intent from the agent.
- Implement mechanical enforcement via linting rules for practices like preventing bare catch-all error handling, enforcing unique function names, and using an erasable-syntax-only TypeScript mode to improve agent consistency and error detection.
- Preserve human judgment for critical areas such as system architecture, database migrations, permissioning changes, and dependency choices, as agents cannot responsibly make these decisions.
- Embrace intentional "friction" in the engineering process, similar to SLOs, as a positive force that necessitates human judgment, experience, and critical thinking, which are vital for creating reliable and maintainable systems.
Vocabulary
AI engineer — A software engineer whose primary tools and learning path are deeply integrated with AI technologies.
Agent — An AI system designed to perform tasks, often by interacting with its environment (e.g., generating code, running tests).
Reinforcement Learning (RL) — A type of machine learning where an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.
Pull request (PR) — A request to merge code changes from one branch into another, typically requiring review by other engineers.
Entropy — In software, refers to the degree of disorder, unpredictability, or inconsistency in a codebase, often increasing with unmanaged complexity.
Context window — The limited amount of text (tokens) an AI model can process or "remember" at any given time during a conversation or task.
Modularization — The process of breaking down a system into smaller, independent, and interchangeable components or modules.
ORM (Object-Relational Mapper) — A library that maps rows in a relational database to objects in an object-oriented language, so developers can work with the database through objects rather than raw SQL.
Linting rules — Automated checks that analyze source code for programmatic and stylistic errors, potential bugs, and non-compliance with coding standards.
SLO (Service Level Objective) — A target level of performance or availability for a service. The speakers cite SLOs as an example of intentional friction that forces critical thinking about how much reliability a service really needs.
Transcript
Good morning. Thanks for having us. Today I want to talk with Cristina about friction a little bit. This is a social preview that came up automatically when someone submitted an issue, basically a forum post, that goes with a security incident that was deployed accidentally. It was a configuration change that caused the problem. And the social preview had the marketing tagline of that company, which said "ship without friction." We want to encourage you to add a little bit of friction to it, and I'll tell you why.
So who are we? I've been doing software development for 20 years, most of it in the open source space. I created Flask, which is a Python framework, which ironically is so much in the weights that a lot of people are learning about it now because the machines are producing it. I left the company I previously worked for, Sentry, in April last year, which perfectly coincided with me having time and then, obviously, Claude Code. And so I fell deep into a hole of agentic engineering, and I started writing on my blog, and a lot of people reached out to me over the last year being all excited about this. And then in October I started a company with a friend, called Earendil, where we are trying to make sense of all the AI things.
And my name is Cristina, and I work with Armin at this company called Earendil. But importantly, I am what I like to call a native AI engineer, and what that basically means is that these tools have been around longer than I have. So they've been super foundational in how I became a software engineer. Not just because, obviously, I use them to work, but also because this is the means by which I've learned to do what I do. And before Earendil I was working at Bending Spoons.
So we want to share a little bit from practice, not just theory, but I will readily admit that I don't think I have all the solutions. We have been building with or on agents for a good 12 months. We had huge leverage and great disappointment.
And we really keep running into two types of problems. I think especially if you listened to some earlier talks at this conference, you will have learned a lot about how you should keep using your brain. For some reason that's really, really hard. So there's a psychological problem, and the other one is the engineering challenge: it seems to be producing worse code for some people and better code for some other people, and what is it that actually makes that work? And so this is not so much a solution as it is our part of the journey, of how we think we have managed so far.
Yeah. So problem number one is the psychology part, which is: why is it that, even though everybody told you many times over that you should be using your brain, you should be slowing down, it's actually incredibly hard? It's just one more prompt, and we don't sleep that much. What is it that actually makes it so hard? And then, would it be that hard if the machines were actually writing perfect code and we didn't have to think quite as much? Is there something we can do to make this a little bit better?
So I'll begin by introducing the first part of these problems, the psychology problem. And what I want to talk about first is the shift. I'm sure a lot of us here who have been playing with these tools for a while now experienced this at some point. We were prompting, prompting, not so good. And then at some point suddenly it clicked, and they were really, really useful for us. And it was fun in the beginning, and they gave us a lot of extra time, right? Because not everyone was using them. They were actually tools that made us more productive, that made it more fun to do our jobs. Very quickly, because they were so useful and they got us so hooked, everyone was using them. And so this kind of had the opposite effect, where suddenly the baseline expectation was just that everyone is now using them and you have to use them. And so this fun and free time translated into pressure.
Now we all have to ship faster and produce more code, and it is just not sustainable to review and to actually have time to think. And so this leads us to the trap. And I actually think there are two parts to this trap. One of them a lot of engineers have spoken about, and it's that these tools are super addictive. You never know if that next prompt is going to be the one that makes your product work and adds the new feature, or if it's going to be that last drop of slop that brings your product crashing down. And so it's very addictive. We keep doing what we're doing. It's not a great solution. But also, most importantly, and I don't think we realize this as much: because we produce a lot of output very fast, we are tricked into thinking that we're actually being more efficient, doing more work. And it is quite the opposite, because now we don't have as much time to actually stop and think about what we're doing and ask ourselves: is this the best way in which I can implement this, or could I be doing something better? And when you're in this flow, it's very difficult for yourself to stop, and it's definitely very difficult for your agent to stop, because it's running around and it's reading files that it should never even have read. So we are the ones that need to actually have the agency to be in control here.
And one thing that actually took me quite a while to realize, if you start scaling this from one person to an engineering team, is that it really changes the composition of the engineering team. We were really supply constrained by the creation of code, and so the balance between writing code and reviewing code in engineering teams was usually quite decent. Now every engineer has a multiple of producing power compared to their reviewing power. And so obviously we are piling up pull requests.
But we are also slowly starting to expand the total number of humans in an organization that are participating in the engineering process. I talked to a lot of engineers over the last year, and increasingly one of the things that came up is: now I have marketing people shipping code. I have former CEOs, CEOs that used to be engineers, shipping code again. And the roles that those people have in the company don't give them that much responsibility for it; it doesn't rest with them. The responsibility still rests with the engineering team. And so the total number of entities, both humans and machines, that participate in the code creation process outnumbers the ones that can carry responsibility. We are not at the point where the machine can be responsible for the code changes. And so that has led to more and more code reviews being skipped, being rubber-stamped. And on the road back to the small PRs that we want to see again, so that this reviewing process works, this amplification is something that at the very least we need to recognize. And so when you get this pull request that looks really daunting and has 5,000 lines of code in it, this is actually when you should be thinking. And that's exactly when it's the most overwhelming. And increasingly we're tapping out of this.
On the engineering side, what we're doing is creating larger pull requests. We're creating these massive changes because it is free now. And if you think about how the agents work, they're really optimized for creating code that runs. Their main objective is: write some code, run the tests, make some progress. The reinforcement learning sort of gets this in. And so the agents are writing the kind of code that you as a human, as someone who learned to write code as a software engineer, wouldn't necessarily write. So for instance, you see quite a bit of code that tries to read a config file, and if it can't read the config, it loads some defaults.
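The config-loading pattern mentioned here can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the `Config` shape, the default values, and the injected `readFile` stand-in (used instead of a real file read so the sketch stays self-contained) are all assumptions.

```typescript
// Hypothetical config shape and defaults, for illustration only.
type Config = { databaseUrl: string; batchSize: number };

const DEFAULTS: Config = { databaseUrl: "postgres://localhost/dev", batchSize: 100 };

// Stand-in for a file read: returns the contents, or undefined if the file is missing.
type ReadFile = (path: string) => string | undefined;

// The lenient pattern: every failure quietly turns into "use the defaults".
function loadConfigLenient(path: string, readFile: ReadFile): Config {
  try {
    const raw = readFile(path);
    if (raw === undefined) return DEFAULTS;
    return { ...DEFAULTS, ...JSON.parse(raw) };
  } catch {
    // A malformed file now looks exactly like a valid one to the caller.
    return DEFAULTS;
  }
}

// A stricter alternative: a missing or malformed file stops the program early.
function loadConfigStrict(path: string, readFile: ReadFile): Config {
  const raw = readFile(path);
  if (raw === undefined) {
    throw new Error(`config file not found: ${path}`);
  }
  const parsed: unknown = JSON.parse(raw); // throws on malformed JSON
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error(`config file ${path} must contain a JSON object`);
  }
  const candidate = parsed as Partial<Config>;
  if (typeof candidate.databaseUrl !== "string" || typeof candidate.batchSize !== "number") {
    throw new Error(`config file ${path} is missing required fields`);
  }
  return { databaseUrl: candidate.databaseUrl, batchSize: candidate.batchSize };
}
```

Both functions make progress when the file is valid. They differ only in what happens when it is not, which is the case that tests written to pass quickly tend not to cover.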
And as an engineer, you know that's actually not great, because I might not notice that I'm reading the default config. And so I might only discover that I have a massive problem after two hours, when I have already written database records with wrong data. And so these machines, they optimize towards making progress, towards shipping stuff, towards unblocking themselves. And as a result, they're creating many more failure conditions than human-written code normally would. In part that's because you, as a human, feel bad when you write code like this. There's something that sort of builds up emotionally in yourself. But the agent doesn't have a reason for this. It doesn't feel anything. And so if you create these services that are sort of hobbling along and are actually willing to recover from local failures, you create very, very brittle systems. And this also means that you're very quickly creating a codebase of a size and complexity that the agent itself can no longer dig itself out of. It's going to stop reading all the files that it should. It's creating code in a new file that was already done somewhere else. And so this entire machinery over time creates much more entropy in the source code than you would normally have if humans were on it. And a big part of this is that humans feel bad, and the agents don't really have any emotions that they communicate to you.
But as Armin likes to say, don't worry, not all is lost. We have found some correlation between what the agents really excel at doing and the types of codebases that we actually put them to work in. The main example here is libraries versus products. What we found is that for libraries, they tend to excel a lot more. And this makes sense, because intrinsically, when you're building a library, you tend to have a very clearly defined problem that you're trying to solve.
And most of the time you can even map the set of features that you want to build to the API surface. And it has very tight constraints. And because this is something that you probably want to build on top of or make accessible to other people, it's likely that it's going to have a very simple core that you can then plug into. On the other hand, with products, and perhaps this is a bit more unlucky for the rest of us because we are probably more into building products, it's much harder. Because there are so many interacting concerns and components. For example, you have your UI, your API responses, you have different permissions depending on the feature flags, the billing, and so on. And so there's this very heavy intertwining between different components. And what this means is that for the agent itself, it's impossible to fit all of this into its context window. It has no way to actually understand the entire global structure. And so locally, the agent tends to be very reasonable. But when it gets to the global scale, it becomes a bit demented.
So what we're proposing here is that, just as you would do with any type of system design in the past, your codebase has now become infrastructure. And as such, you have to design it in such a way that it is also legible for the agent, and the agent can make the most of it. So what we're proposing is an agent-legible codebase. And one of the main points, which is very clear to all of us, I'm sure, is modularization. So we have different components. And this makes it easy for the agent to add one feature in one spot without corrupting everything else. But importantly, this also means modularizing your code flow itself. So for example, I've been working on some refactoring. We're building somewhat of an AI assistant. And for me, it was super important to understand which steps of my code are actually the main points.
So say you get the user message, then I pass the message to the agent loop, and then I have to deal with the output. These points were very clearly defined for me, so the code was not as messy. But it happens to be that between these points, between these steps, that's where the agent tends to add the most fuzz. It will be parsing between different types. It's adding things to state that shouldn't be in state. And so you end up with these behaviors that you didn't want to support and that are unexpected. So these actions can be quite dangerous.
Another point is trying to follow all of the known patterns. Because I think we all know by now, there's no point in fighting the RL, the reinforcement learning. The more we can lean into it, the better the output is going to be. And it's also more scalable down the line. Then, as mentioned with libraries, if you have a simple core and you push the complexity to other abstraction layers, then it's going to be easier for yourself and the agent to read your codebase. And no hidden magic. So for example, using React Server Actions, or using an ORM instead of raw SQL: what this does is hide intent from the agent. And if the agent can't see something, it surely cannot respect it.
And so to be more precise, these are the examples of mechanical enforcement that we have been using at the company. Most of these we actually achieve with linting rules. The main example would be no bare catch-alls. Great. Imagine that there's an example here. The agent found the bare catch-all and was like, "Oh no, this is bad. Edit it." But yeah, we also try to have our SQL always in one query interface, so that the agent doesn't have to go hunting around the codebase, finding a lot of different places. Because if it misses one, then you can have breaking behaviors. And again, that's dangerous.
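A "no bare catch-alls" check of the kind mentioned above might look something like the sketch below. It follows the general shape of an ESLint rule (a `create` function returning visitors keyed by AST node type, reporting through `context.report`), but the rule name and the small stand-in types are hypothetical, and the talk does not say how the speakers' own rule is defined. The sketch assumes that "bare catch-all" means a `catch` block with no `throw` statement at its top level.

```typescript
// Minimal stand-ins for the AST and context types a real linter would provide.
type Statement = { type: string };
type CatchClause = {
  type: "CatchClause";
  param: object | null;
  body: { type: "BlockStatement"; body: Statement[] };
};
type RuleContext = {
  report(descriptor: { node: CatchClause; message: string }): void;
};

// Hypothetical rule: flag catch blocks that swallow the error instead of rethrowing.
const noSwallowingCatch = {
  meta: { type: "problem", schema: [] },
  create(context: RuleContext) {
    return {
      CatchClause(node: CatchClause) {
        // Simplification: only top-level statements of the catch body are inspected.
        const rethrows = node.body.body.some((s) => s.type === "ThrowStatement");
        if (!rethrows) {
          context.report({
            node,
            message: "catch block swallows the error; handle a specific case or rethrow",
          });
        }
      },
    };
  },
};
```

A production rule would also need to decide whether narrowing on the error type or logging and rethrowing from a nested branch counts as handling. The point of the sketch is only that the check is mechanical, so both the agent and the human get the same signal on every run.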
We try to have one primitives component library for the UI and not have any raw input boxes, for example, so that it will always have one type of styling. It's very consistent, one kind of behavior. We don't have any dynamic imports. And this may not sound as important, but we actually enforce unique function names. And the reason for this is not just more legibility for you and the agent, it's also token efficiency. If your agent is grepping for a specific feature or something in your codebase and it only gets one result, it's going to be much better at continuing with the loop. And we started exploring something recently called erasable-syntax-only TypeScript mode. What this does is that your code is basically JavaScript with the type annotations on top. And this means that there's no transpiling indirection, because there's one source of truth between your actual code and the compiler. And so when the agent is looking for errors, it doesn't have this confusion of, "Oh my god, where am I looking?" It's much better at finding them.
And so the goal really is to get in this loop somehow, to get the agent to produce as good code as it can. But you really need to find a way to feel the pain that the agent doesn't feel. And you need to be woken up, in a way, when you should be looking at something. One of the things we have been doing is that we built a Pi extension for our review needs, where we separate out the kind of input that normally would go back to the agent. So this is mechanical bugs, or where it clearly violated the AGENTS.md. But then we specifically call out the kind of changes where the human's brain should reactivate, right? We don't think that a database migration should ever go in without the human making a judgment call on it, because it very much depends on the locks, the size of the data in production.
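To make the erasable-syntax idea mentioned in this part of the talk concrete, here is a small sketch of code written so that deleting the type annotations leaves valid JavaScript. TypeScript exposes this as the `erasableSyntaxOnly` compiler option, which rejects constructs that need generated runtime code, such as `enum` declarations and constructor parameter properties. The `Severity` and `Finding` names are made up for the example.

```typescript
// Instead of `enum Severity { Low, High }`, which emits runtime code,
// use a plain object plus a union type derived from it.
const Severity = { Low: "low", High: "high" } as const;
type Severity = (typeof Severity)[keyof typeof Severity];

// Instead of `constructor(readonly title: string, ...) {}` (a parameter property),
// declare the fields and assign them explicitly.
class Finding {
  readonly title: string;
  readonly severity: Severity;

  constructor(title: string, severity: Severity) {
    this.title = title;
    this.severity = severity;
  }

  describe(): string {
    return `[${this.severity}] ${this.title}`;
  }
}
```

Everything that remains after the types are stripped is code the author actually wrote, so a stack trace or a search result points at the same lines the agent edits.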
If there are permissioning changes, you'd better think about them yourself rather than leaving them to the agent, because they can be under-documented. These are just some examples where we learned: if we miss it, we regret it. And you will miss it. But these machines can help you find this. And then you see this, and then you actually get a little bit of a hit. It's like, oh, now I have to kick into gear and do something here. This is what this looks like in Pi. At the bottom you have the human callouts; at the top you have what the agent would go back and automatically act on if you end this review and fix the issues, the first two. But this is the moment where I will now go and see: is this a dependency I actually want to have in the codebase? Do I like the maintainers? Does this work for me?
And we obviously like the speed. This is addictive, it is great, we feel there's a lot of productivity. But it is so devious if you start relying on that speed where you really shouldn't. And so I can only encourage you to find the areas where you have the feeling that this is actually net positive. For me a lot of this is reproduction cases. When a customer reports an issue, I can have the agent reproduce it perfectly, and I have a really good starting point. Exploring different types of product directions, for as long as I'm committing myself to doing this with the code that it generates. All of this is great. But on the other hand, system architecture, creating reliability in the system: they are just not very good at it. There we really still have to go slow.
There is so much mess that can appear in a codebase in so little time. Mario was already talking about this earlier. But we forget that we are producing months and months of technical debt in the time of weeks, in the time of days sometimes. And it becomes so much harder to actually understand what's going on in this codebase.
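The review split described in this part of the talk, where mechanical findings go back to the agent and judgment calls are surfaced to a human, could be modeled roughly like this. It is only a sketch of the idea: the finding kinds are taken from the examples the speakers mention, while the type names, the `triageFindings` function, and the data shape are hypothetical and say nothing about how the actual Pi extension works.

```typescript
// Finding kinds drawn from the examples in the talk; the names are made up.
type FindingKind =
  | "mechanical-bug"
  | "agents-md-violation"
  | "database-migration"
  | "permission-change"
  | "new-dependency";

type ReviewFinding = { kind: FindingKind; file: string; note: string };

// true = a human has to make the call; false = can be handed back to the agent.
// Using a Record means a newly added kind will not compile until it is classified.
const NEEDS_HUMAN: Record<FindingKind, boolean> = {
  "mechanical-bug": false,
  "agents-md-violation": false,
  "database-migration": true,
  "permission-change": true,
  "new-dependency": true,
};

type Triage = { agentFixable: ReviewFinding[]; humanCallouts: ReviewFinding[] };

function triageFindings(findings: ReviewFinding[]): Triage {
  const result: Triage = { agentFixable: [], humanCallouts: [] };
  for (const finding of findings) {
    if (NEEDS_HUMAN[finding.kind]) {
      result.humanCallouts.push(finding);
    } else {
      result.agentFixable.push(finding);
    }
  }
  return result;
}
```

The design choice worth noting is that the human list is defined by category, not by how confident the reviewer model is. A migration is always a callout, even when it looks fine.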
When the understanding of your own code drops, it is really, really hard. And it's also psychologically hard. I found some code pieces that actually didn't work in production. And I was kind of frustrated learning that I was the one that committed it with the agent and just didn't really see that. It's a very disappointing experience when it happens, and then you realize that you actually were the one that screwed up. And so it is psychologically incredibly hard to really judge objectively the state of the codebase. And the only way right now is to really slow down a little bit on that front.
And this friction: I know that every engineering team I've ever worked at said we need to get rid of the friction in shipping. And that is true. There's a lot of stuff that is very, very annoying and shouldn't be there. But if you have worked in a large enough engineering org: SLOs are a great system that is intentionally designed to put friction into the engineering process, to make you think. Do I need this reliability? Do I need this criticality of the service? Am I sufficiently staffed to run it? And with the agents we have now gotten this idea that we should get rid of all of this, when in reality we need some of it. Because the friction in many ways is what's necessary, on a physical level, to steer. Without friction there is no steering, and that is really necessary. So you should put a little bit more of a positive association on this idea of friction. Because this is really where judgment is, this is where experience is, and you should be inserting that and start feeling it. Thank you.
TL;DR
- The widespread adoption of AI agents in software development creates new challenges, shifting from initial productivity gains to increased pressure and a higher risk of technical debt.
- AI agents excel at generating code that "runs" but often produce brittle systems and struggle with complex product architectures due to limited context and lack of human judgment for code quality.
- To counter these issues, developers must intentionally introduce "friction" into the process by designing "agent-legible codebases" and actively engaging human critical thinking for architectural decisions, reliability, and code reviews.
Takeaways
- AI tools initially boost individual productivity but quickly raise baseline expectations, transforming free time into pressure to ship faster without adequate time for critical thought.
- AI agents are optimized for making progress and passing tests, which can lead to brittle code that handles failures poorly (e.g., silently loading defaults) and creates more failure conditions than human-written code.
- Human developers must resist the addictive cycle of rapid AI-driven output and actively slow down to prevent the accumulation of massive technical debt and the decline of codebase understanding.
- Design an "agent-legible codebase" through strong modularization, adhering to known patterns, pushing complexity to abstraction layers, and avoiding "hidden magic" that obscures intent from the agent.
- Implement mechanical enforcement via linting rules for practices like preventing
bare catch-allerror handling, enforcingunique function names, and using anerasable syntax only TypeScript modeto improve agent consistency and error detection. - Preserve human judgment for critical areas such as
system architecture,database migrations,permissioning changes, anddependency choices, as agents cannot responsibly make these decisions. - Embrace intentional "friction" in the engineering process, similar to
SLOs, as a positive force that necessitates human judgment, experience, and critical thinking, which are vital for creating reliable and maintainable systems.
Vocabulary
AI engineer — A software engineer whose primary tools and learning path are deeply integrated with AI technologies.
Agent — An AI system designed to perform tasks, often by interacting with its environment (e.g., generating code, running tests).
Reinforcement Learning (RL) — A type of machine learning where an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.
Pull request (PR) — A request to merge code changes from one branch into another, typically requiring review by other engineers.
Entropy — In software, refers to the degree of disorder, unpredictability, or inconsistency in a codebase, often increasing with unmanaged complexity.
Context window — The limited amount of text (tokens) an AI model can process or "remember" at any given time during a conversation or task.
Modularization — The process of breaking down a system into smaller, independent, and interchangeable components or modules.
ORM (Object-Relational Mapper) — A programming technique that converts data between incompatible type systems using object-oriented programming languages. It allows developers to interact with a database using objects rather than raw SQL.
Linting rules — Automated checks that analyze source code for programmatic and stylistic errors, potential bugs, and non-compliance with coding standards.
SLO (Service Level Objective) — A key metric that defines the target level of performance or availability for a service, often used to introduce intentional friction or trigger critical thinking about reliability.
Transcript
Good morning. Thanks for having us. Today I want to talk with Christina about friction a little bit. This is a social preview that came up automatically when someone submitted an issue to basically a forum post that goes with a security incident that was deployed accidentally. It was a configuration change that caused the problem. And the social preview post had the marketing tagline of that company which said ship without friction. And we want to encourage to add a little bit of friction to it and I'll tell you why. So who are we? I've been doing software development for 20 years, most of it in the open source space. I have created Flask which is a Python framework which ironically is so much in the weights that a lot of people are learning about it now because the machines are producing it. And I left my previous company that worked for a century in April last year which perfectly coincided with me having time and then obviously Claude code. And so I fell deep into a whole of a gently engineering and I started writing on my blog and a lot of people reached out to me over the last year being all excited about this. And then I started with a friend in October a company called Arandel where we are trying to make sense of all the I things. And my name is Christina and I work with Armin at this company called Arandel. But importantly I am what I like to call a native AI engineer and what that basically means is that these tools have been around longer than I have. So what this means is they've been super foundational in how it become a software engineer. Not just because obviously I use them to work but also because this is the means by which I've learned to do what I do. And before Arandel I was working at Ben and Spuz. So we want to share a little bit from practice not just theory but I will readily admit that I don't think I have all the solutions. So we have been building with or on agents for good 12 months. We had huge leverage and great disappointment. 
And we really keep running into two types of problems. I think especially if you listen to some earlier talks at this conference you will have learned a lot about that you should keep using your brain. It's for some reason that's really really hard. So there's a psychological problem and the other one is the engineering challenge. It seems to be producing worse code for some people and better code for some other people and what is it that actually makes that work. And so this is really not a solution as it is our part of the journey of how we think so far we have managed. Yeah. So problem number one is the psychology part which is like why is it even though everybody told you many times over that you should be using your brain, you should be slowing down, it's actually incredibly hard. It's just one more problem and we don't sleep that much. What is it that actually makes it so hard? And then would it be that hard if the machines would actually be writing perfect code and we wouldn't have to think quite as much. Is there something we can do to make this a little bit better? So I'll begin by introducing the first part of these problems, the psychology problem. And what I want to talk first about is the shift. So I'm sure a lot of us here who have been playing with these tools for a while now experienced this at some point. We were prompting, prompting, not so good. And then at some point suddenly it clicked and they were really really useful for us. And it was fun in the beginning and they gave us a lot of extra time, right? Because not everyone was using them. They were actually tools that made us more productive, that made it more fun to do our jobs. Very quickly because they were so useful and they got us so hooked everyone was using them. And so this kind of had the opposite effect, where suddenly the baseline expectation was just that everyone is now using them and you have to use them. And so this fun and free time translated into pressure. 
Now we all have to ship faster and produce more code and it is just not sustainable to review and to actually have time to think. And so this leads us to the trap. And I actually think there's two parts of this problem of this trap. And one of them, a lot of engineers have spoken about and it's that these tools are super addictive. You never know if that next prompt is going to be the one that makes your product work and you've added a new feature or if it's going to be that last drop of slop that brings your product crashing down. And so it's very addictive. We keep doing what we're doing. It's not a great solution. But also most importantly, and I don't think we realize this as much. Is that because we produce a lot of output very fast, we are tricked into thinking that we're actually being more efficient, doing more work. And this is quite the opposite because now we don't have as much time to actually stop and think and the sign we're doing ask ourselves is the best way in which I can implement this or could I be doing something better. And when you're in this flow, it's very difficult for yourself to stop and it's definitely very difficult for your agent to stop because it's running around and it's reading files that it should have never even read. So we are the ones that need to actually have the agency to be in control here. And one thing that from a, if you start scaling this from like one person to an engineering team that actually took me quite a while to realize is that it really changes the composition of the engineering team. We were really supplied constrained by creation of code and so like the balance between writing code and reviewing code and engineering teams was usually quite with decent. Now every engineer has a multitude of producing power compared to their reviewing power. And so obviously we are piling up on pull requests. 
But we are also slowly starting to expand the total number of humans in an organization who participate in the engineering process. I talked to a lot of engineers over the last year, and increasingly one of the things that came up is: "I now have marketing people shipping code. I have CEOs who used to be engineers shipping code again." But the roles those people have in their companies don't put that responsibility on them. The responsibility still rests with the engineering team. And so the total number of entities, both humans and machines, that participate in the code creation process outnumbers the ones that can carry responsibility. We are not yet at the point where the machine can be responsible for the code changes. That has led to more and more code reviews being skipped or rubber-stamped. And on the way back to the small PRs that we want to see again so that the reviewing process works, this amplification is something that, at the very least, we need to recognize. When you get a pull request that looks really daunting and has 5,000 lines of code in it, this is actually when you should be thinking. And that's exactly when it's the most overwhelming. Increasingly, we're tapping out of this.
On the engineering side, what we're doing is creating larger pull requests. We're creating these massive changes because it is free now. And if you think about how the agents work, they're really optimized for creating code that runs. Their main objective is: write some code, run the tests, make some progress. Reinforcement learning sort of drills this in. And so the agents write the kind of code that you, as a human learning to be a software engineer, wouldn't necessarily write. For instance, you see quite a bit of code that tries to read a config file and, if it can't read the config, loads some defaults.
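A minimal TypeScript sketch of the pattern just described, next to a stricter alternative that refuses to continue without a valid config. The config shape, field names, and function names are hypothetical illustrations and are not taken from the talk.

```typescript
// Hypothetical config shape; the field names are assumptions for illustration.
interface AppConfig {
  databaseUrl: string;
  maxConnections: number;
}

const DEFAULTS: AppConfig = {
  databaseUrl: "postgres://localhost/dev",
  maxConnections: 5,
};

// The lenient pattern: a missing or malformed config silently becomes defaults.
function loadConfigLenient(raw: string | undefined): AppConfig {
  try {
    return { ...DEFAULTS, ...JSON.parse(raw ?? "") };
  } catch {
    // The caller cannot tell that the real config was never read.
    return DEFAULTS;
  }
}

// A stricter alternative: every problem surfaces immediately as an error.
function loadConfigStrict(raw: string | undefined): AppConfig {
  if (raw === undefined) {
    throw new Error("config missing: refusing to fall back to defaults");
  }
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("config must be a JSON object");
  }
  const { databaseUrl, maxConnections } = parsed as Record<string, unknown>;
  if (typeof databaseUrl !== "string") {
    throw new Error("config.databaseUrl must be a string");
  }
  if (typeof maxConnections !== "number") {
    throw new Error("config.maxConnections must be a number");
  }
  return { databaseUrl, maxConnections };
}
```

With the lenient loader, a deployment that lost its config file keeps running against the development defaults; with the strict loader, it stops at startup before any data is written.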
And as an engineer, you know that's actually not great, because I might not notice that I'm running on the default config. I might only discover that I have a massive problem two hours later, when I have already written database records with wrong data. These machines optimize towards making progress, towards shipping stuff, towards unblocking themselves. And as a result, they create many more failure conditions than human-written code normally would. In part, that's because you as a human feel bad when you write code like this. There's something that builds up emotionally in yourself. But the agent doesn't have a reason for this. It doesn't feel anything. And if you create these services that are sort of hobbling along, always willing to recover from local failures, you actually create very, very brittle systems. It also means that you very quickly end up with a code base of a size and complexity that the agent itself can no longer dig itself out of. It stops reading all the files that it should. It creates code in a new file that was already written somewhere else. And so this entire machinery, over time, creates much more entropy in the source code than you would normally have if humans were working on it. A big part of this is that humans feel bad, and the agents don't really have any emotions that they communicate to you.
But as Armin likes to say, don't worry, not all is lost. We have found some correlation between what the agents really excel at and the types of code bases that we put them to work in. The main example here is libraries versus products. What we found is that agents tend to excel a lot more on libraries. And this makes sense, because intrinsically, when you're building a library, you tend to have a very clearly defined problem that you're trying to solve.
And most of the time you can even map the set of features that you want to build to the API surface. It has very tight constraints. And because this is something that you probably want to build on top of or make accessible to other people, it's likely going to have a very simple core that you can then plug into. On the other hand, products are much harder, and perhaps this is a bit unlucky for the rest of us, because we are probably more into building products. There are so many interacting concerns and components. For example, you have your UI, your API responses, different permissions depending on the feature flags, the billing, and so on. So there's this very heavy intertwining between different components. What this means is that it's impossible for the agent to fit all of this into its context window. It has no way to actually understand the entire global structure. So locally, the agent tends to be very reasonable. But when it gets to the global scale, it becomes a bit demented.
So what we're proposing here is that, just as you would have done with any type of system design in the past, your code base has now become infrastructure. As such, you have to design it in a way that makes it legible for the agent too, so that the agent can make the most of it. What we're proposing is an agent-legible code base. One of the main points, which I'm sure is very clear to all of us, is modularization. We have different components, and this makes it easy for the agent to add one feature in one spot without corrupting everything else. But importantly, this also means modularizing your code flow itself. For example, I've been working on some refactoring. We're building somewhat of an AI assistant. And for me, it was super important to understand which steps of my code are actually the main points.
So say you get the user message, then you pass the message to the agent loop, and then you have to deal with the output. These points were very clearly defined for me, so the code there was not as messy. But it happens to be between these points, between these steps, that the agent tends to add the most fuzz. It will be parsing between different types. It will be adding things to state that shouldn't be in state. And so you end up with behaviors that you didn't want to support and that are unexpected. These actions can be quite dangerous.
Another point is trying to follow all of the known patterns. I think we all know by now that there's no point in fighting the RL, the reinforcement learning. The more we can lean into it, the better the output is going to be. And it's also more scalable down the line. Then, as mentioned with libraries, if you have a simple core and you push the complexity to other abstraction layers, it's going to be easier for both you and the agent to read your code base. And no hidden magic. For example, using React Server Actions, or using an ORM instead of raw SQL, hides intent from the agent. And if the agent can't see something, it surely can't respect it.
To be more precise, these are the examples of mechanical enforcement that we have been using at the company. Most of these we actually achieve with linting rules. The main example would be no bare catch-alls. Great. Imagine that there's an example here: the agent found the bare catch-all and was like, oh no, this is bad, and edited it. We also try to have our SQL always in one query interface, so that the agent doesn't have to go hunting around the code base to find all the different places. Because if it misses one, you can get breaking behavior. And again, that's dangerous.
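One way to picture the single query interface mentioned above is a module that owns every SQL string, so a search for a table or column name lands in exactly one place. The sketch below is a hypothetical illustration: the `Db` interface, the table and column names, and `createQueries` are all assumptions, and the driver is synchronous only to keep the example short (real database clients are usually asynchronous).

```typescript
// Minimal driver abstraction; a real project would wrap its actual database client here.
interface Db {
  query(sql: string, params: readonly unknown[]): unknown[];
}

// Every SQL statement in the code base lives in this one object.
// Table and column names are hypothetical.
function createQueries(db: Db) {
  return {
    findUserByEmail: (email: string) =>
      db.query("SELECT id, email FROM users WHERE email = $1", [email]),
    listOpenInvoices: (userId: number) =>
      db.query(
        "SELECT id, total FROM invoices WHERE user_id = $1 AND status = 'open'",
        [userId],
      ),
  };
}

// Fake driver that records calls instead of talking to a database,
// which keeps the sketch runnable without any external service.
class RecordingDb implements Db {
  readonly calls: { sql: string; params: readonly unknown[] }[] = [];

  query(sql: string, params: readonly unknown[]): unknown[] {
    this.calls.push({ sql, params });
    return [];
  }
}

const exampleDb = new RecordingDb();
const queries = createQueries(exampleDb);
queries.findUserByEmail("ada@example.com");
```

Feature code would call `queries.findUserByEmail(...)` rather than embedding SQL inline, which also makes a lint rule against SQL strings outside this module straightforward to state.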
We try to have one primitive components library for the UI and no raw input boxes, for example, so that there is always one type of styling. It's very consistent, one kind of behavior. We don't have any dynamic imports. And this may not sound as important, but we actually enforce unique function names. The reason for this is not just more legibility for you and the agent, but also token efficiency. If your agent is grepping for a specific feature or something in your code base and it only gets one result, it's going to be much better at continuing with the loop. And we recently started exploring something called erasable-syntax-only TypeScript mode. What this does is that your code is basically JavaScript with the type annotations on top. This means that there's no transpiling indirection, because there's one source of truth between your actual code and the compiler. And so when the agent is looking for errors, it doesn't have this confusion of, oh my god, what am I looking at? It's much better at finding them.
And so the goal really is to get into this loop somehow, to get the agent to produce as good code as it can. But you really need to find a way to feel the pain that the agent doesn't feel. And you need to be woken up, in a way, when you should be looking at something. One of the things we have been doing is that we built a Pi extension for our review needs, where we separate out the kind of input that would normally go back to the agent. That is mechanical bugs, and places where it clearly violated the AGENTS.md. But then we specifically call out the kinds of changes where the human's brain should reactivate, right? For example, we don't think that a database migration should ever go in without a human making a judgment call on it, because it very much depends on the locks and the size of the data in production.
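The split between findings that can go straight back to the agent and changes that need a human can be sketched as a small classifier. This is a hypothetical illustration and not the Pi extension itself: the `Finding` shape, the `triage` function, and the path patterns are all assumptions, and matching on file paths is only one simple way to detect such changes.

```typescript
type Finding = { file: string; message: string };
type Triage = { agentFixable: Finding[]; humanCallouts: Finding[] };

// Hypothetical path patterns for changes that should wake up a human reviewer.
const HUMAN_JUDGMENT_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "database migration", pattern: /(^|\/)migrations\// },
  { label: "permissioning change", pattern: /(^|\/)permissions?\// },
  { label: "dependency change", pattern: /(^|\/)package\.json$/ },
];

// Route each finding either back to the agent or to a human callout list.
function triage(findings: Finding[]): Triage {
  const result: Triage = { agentFixable: [], humanCallouts: [] };
  for (const finding of findings) {
    const match = HUMAN_JUDGMENT_PATTERNS.find((p) => p.pattern.test(finding.file));
    if (match) {
      // Prefix the message so the reviewer sees why this needs judgment.
      result.humanCallouts.push({
        ...finding,
        message: `[${match.label}] ${finding.message}`,
      });
    } else {
      result.agentFixable.push(finding);
    }
  }
  return result;
}

const exampleTriage = triage([
  { file: "db/migrations/001_add_index.sql", message: "adds an index on a large table" },
  { file: "src/util.ts", message: "unused variable" },
]);
```

The design choice is that the human list stays short: everything mechanical is handled by the agent loop, so whatever reaches a person is a signal to slow down.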
If there are permissioning changes, you'd better think about them yourself rather than leaving them to the agent, because they can be under-documented. These are just some examples where we learned: if we miss it, we regret it. And you will miss it. But these machines can help you find this. And then you see it, and you actually get a little bit of a hit. It's like, oh, now I have to kick into gear and do something here. This is what this looks like in Pi. At the bottom you have the human callouts. At the top you have what is basically the rest: if you end this review and fix the issues, the agent would go back and automatically act on the first two. But this is the moment where I will now go and see: is this a dependency I actually want to have in this code base? Do I like the maintainers? Does this work for me?
And we obviously like the speed. It is addictive, it is great, we feel there's a lot of productivity. But it is so devious if you start relying on that speed where you really shouldn't. So I can only encourage you to find the areas where you have the feeling that this is actually net positive. For me, a lot of this is reproduction cases. When a customer reports an issue, I can have the agent reproduce it perfectly, and I have a really good starting point. Exploring different types of product directions, for as long as I'm committing myself to doing this with the code that it generates. All of this is great.
But on the other hand, system architecture and creating reliability in the system: they are just not very good at it. That's where we really still have to go slow. There is so much mess that can appear in a code base in so little time. Mario was already talking about this earlier. We forget that we are producing months and months of technical debt in a matter of weeks, sometimes in a matter of days. And it becomes so much harder to actually understand what's going on in this code base.
When the understanding of your own code drops, it is really, really hard. And it's also psychologically hard. I found some pieces of code that actually didn't work in production. And I was kind of frustrated to learn that I was the one who committed them with the agent and just didn't really see it. It's a very disappointing experience when it happens, when you realize that you were actually the one who screwed up. So it is psychologically incredibly hard to judge the state of the code base objectively. And the only way right now is to really slow down a little bit on that front.
And this friction: I know that every engineering team I've ever worked at said we need to get rid of the friction in shipping. And that is true. There's a lot of stuff that is very, very annoying and shouldn't be there. But if you have worked in a large enough engineering org, SLOs are a great system that is intentionally designed to put friction into the engineering process to make you think: do I need this reliability? Do I need this criticality of the service? Am I sufficiently staffed to run it? And with the agents, we have now gotten this idea that we should get rid of all of this, when in reality we need it. Because friction, in many ways, is what's necessary on a physical level to steer. Without friction there is no steering, and that is really necessary. So you should attach a little bit more of a positive association to this idea of friction. Because this is really where judgment is, this is where experience is, and you should be inserting that and start feeling it. Thank you.