AgentCraft: Putting the Orc in Orchestration — Ido Salomon
TL;DR
- The primary bottleneck in scaling AI agents isn't the agents themselves, but the human capacity to effectively orchestrate and manage dozens of them simultaneously.
- AgentCraft introduces an agent orchestration paradigm inspired by real-time strategy (RTS) games to enhance human-agent collaboration and manage complex development tasks.
- The platform improves visibility into agent activities, grants agents more autonomy for task decomposition, and facilitates seamless human-agent collaboration, shifting the focus from micro-management to planning and review.
Takeaways
- Human Orchestration Bottleneck: Engineers are the bottleneck in orchestrating multiple AI agents, lacking the specific skills to manage them effectively at scale.
- Gaming as a Mental Model: Learnings from RTS games, where players manage numerous units, can be applied to design intuitive and efficient interfaces for managing AI development agents.
- Enhanced Visual Visibility: AgentCraft projects the file system onto a visual map of directories and "runes" (files), so users can see which files each agent is working on, view change lists, trace task lineage, and use heat maps to spot and proactively prevent collisions.
- Improved Reactivity: The system incorporates mechanisms like quickly cycling between agents needing attention, similar to "muscle memory" in games, to help users react swiftly to agent queries or required approvals.
- Increased Agent Autonomy: Users can delegate broad "missions" or "campaigns" to agents, allowing them to autonomously decompose tasks, plan execution within sandboxed environments, and even generate ideas, reducing the human "babysitting" burden.
- Shift to Review-Centric Workflow: By enabling agents to perform initial planning and execution, the human role transitions more to reviewing "bundles" of work (e.g., PRs with visual evidence like screenshots/videos) and selecting the best outcomes, making the process more iterative and efficient.
- Seamless Human-Agent Collaboration: AgentCraft supports shared workspaces for human-human collaboration and an integrated chat for direct human-agent communication, so agents can pick up human intent and coordinate softly by knowing which files others are changing.
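The "Improved Reactivity" takeaway describes one hotkey that jumps to the next agent waiting on a human. A minimal Python sketch of that idea follows. It is not AgentCraft's actual implementation: `AgentSession`, `AttentionCycler`, and the status strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentSession:
    name: str
    # Hypothetical states: "working", "needs_approval", "has_question", "done"
    status: str


class AttentionCycler:
    """Cycles focus through the agents that are blocked on a human."""

    WAITING = {"needs_approval", "has_question"}

    def __init__(self, sessions):
        self.sessions = list(sessions)
        self._last = -1  # index of the agent focused most recently

    def next_waiting(self):
        """Return the next agent needing attention, or None if nobody is waiting."""
        n = len(self.sessions)
        # Scan forward from the last focused agent, wrapping around once.
        for step in range(1, n + 1):
            idx = (self._last + step) % n
            if self.sessions[idx].status in self.WAITING:
                self._last = idx
                return self.sessions[idx]
        return None


sessions = [
    AgentSession("peon-1", "working"),
    AgentSession("peon-2", "needs_approval"),
    AgentSession("peon-3", "has_question"),
]
cycler = AttentionCycler(sessions)
focused = cycler.next_waiting()  # bound to a single key in a real UI
```

Scanning from the last focused index rather than from the start keeps repeated presses fair: every waiting agent gets a turn before any agent is visited twice.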
Vocabulary
Agent — An autonomous software entity, often with AI capabilities, designed to perform specific tasks, such as writing or modifying code.
Orchestration — The systematic coordination and management of multiple independent software entities or services, in this context, numerous AI agents.
RTS Games — Real-Time Strategy games, a video game genre where players manage resources and command multiple units in real-time to achieve objectives.
AgentCraft — A software platform that orchestrates multiple AI coding agents, drawing inspiration from RTS game interfaces.
File System Projection — A visual representation in a user interface where the computer's file system structure (directories, files) is graphically mapped for intuitive interaction.
Rune — In the context of AgentCraft, a visual icon or symbol representing a file within the projected file system map.
Lineage — The historical record or chronological sequence of actions and changes performed by agents on a codebase, providing full traceability.
Heat Map — A data visualization technique where values are represented by color, used here to show areas of intense agent activity or potential conflict in the file system.
Campaign Feature — A functionality in AgentCraft that allows a user to define a high-level goal, which agents then autonomously break down, plan, and execute within an isolated environment (a container).
Review Bundles — A feature that groups related changes or pull requests together for efficient review, often including visual evidence to aid in understanding the impact of the changes.
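To make "File System Projection" and "Rune" concrete, here is a minimal Python sketch of the data such a map could be drawn from: each directory under a root mapped to the files it directly contains. The function name `project_file_system` and the dictionary layout are assumptions for illustration; the talk does not describe how AgentCraft builds its map.

```python
from pathlib import Path


def project_file_system(root):
    """Map each directory under root to the sorted names of its files ("runes")."""
    root = Path(root)
    projection = {}
    directories = [root, *(p for p in root.rglob("*") if p.is_dir())]
    for directory in directories:
        # Keys are paths relative to the root; the root itself becomes ".".
        key = directory.relative_to(root).as_posix()
        projection[key] = sorted(
            entry.name for entry in directory.iterdir() if entry.is_file()
        )
    return projection


if __name__ == "__main__":
    # Project the current working directory and list a few regions of the map.
    for region, runes in sorted(project_file_system(".").items())[:5]:
        print(region, runes)
```

A renderer would then lay out each key as a region on the map and each file name as a rune inside it.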
Transcript
So, good morning, London. My name is Ido Salomon. I'm the creator of AgentCraft. I'm also the creator of MCP-UI and creator and co-maintainer of MCP Apps. So I'm building some of the stuff that David has been talking about. As you've all heard in the past day, agents are amazing. But if one agent is so amazing, why don't we scale up to 10 or 20 or 100 different agents and be 100 times more amazing? It is pretty simple. We just spin up a bunch of agents, we put them in this nice screen and it looks really glowy. But it won't actually work. And the reason is that spinning them up isn't the problem. It's us. We are the bottleneck in orchestrating all of these agents.
Now, if you think about it, the role of the engineer to actually go and manage dozens of reckless employees is not typically what we do in most companies. So, we need to somehow find these new, potentially new skills to manage all of these agents. Luckily, they're not really brand new. It's not something that we've never done before. It's just something that's been hiding in unexpected places. I mean, if you're a gamer or used to play games at any point, managing dozens of units probably sounds a little bit familiar, which is why I built AgentCraft, which is an orchestrator that aims to raise the ceiling of human-agent collaboration by taking learnings from gaming and transferring them into productivity. So, let's see a quick walkthrough of that and let's understand the journey to raise that ceiling.
So, this is AgentCraft. There's a lot to unpack. So, let's start with the basics and go from there. This is an agent. Not a metaphorical one. This is actually a physical manifestation of a coding agent, like a live session. It can be Cursor, it can be Claude Code, Codex, OpenClaw, whatever. It's something that we can detect on the device and visualize. But it's also something that we can spawn directly from here. So, now, we have this agent and we can prompt it.
We can use it just like any average agent that we have from our CLI or whatever. And what can we tell it to do? It has all these quirks and we have voice and we have text and we have images and so on. And we can just tell it to do stuff. So, for example, we can tell it to develop some feature for us. And now the agent is working. So, it's doing its work. And as we can see, if you look at the UI, there's a bunch of other stuff. We have these buildings and each building represents some functionality. So, for example, one of these buildings manages the skills and plugins and so on. There's also, you know, an integrated terminal and Git, just to get that end-to-end workflow.
The second part of raising the ceiling, now that we have the basics, is visibility. We need to be able to quickly understand what each agent is doing. So, we have this nice side panel here that really shows us high-level missions, status, summary and so on. What are they actually doing? But the cool thing about AgentCraft is that we don't just see a list of what they can do. We can actually see them working. So, if we look at the map, you would notice that it's actually a projection of my file system. Each part of my file system is actually on the map. So, I have these directories here. And each one of these directories has files. These files are represented as runes, as you can see here. So, I can actually track and see visually what the agent is working on, which file. I can see the entire change list of what happened there. And because we're orchestrating it, I also know which agents did what and when. So, we can have full lineage of what's going on. And we can take this one step further. If I know all of this stuff, why not just create a heat map? I can actually try to visualize collisions. And I can even prevent them proactively.
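The lineage and heat map described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how AgentCraft works internally: `ChangeEvent`, `directory_heat`, `find_collisions`, and the 300-second window are all hypothetical. The idea is that once every change is recorded as (agent, file, time), per-directory activity and cross-agent collisions fall out of simple grouping.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ChangeEvent:
    """One lineage record: which agent touched which file, and when."""
    agent: str
    path: str
    timestamp: float  # seconds, e.g. from time.time()


def directory_heat(events):
    """Count change events per directory; higher counts mean 'hotter' regions."""
    return Counter(PurePosixPath(e.path).parent.as_posix() for e in events)


def find_collisions(events, window_seconds=300.0):
    """Return {path: sorted agent names} for files that two or more different
    agents touched within window_seconds of each other."""
    by_path = defaultdict(list)
    for event in events:
        by_path[event.path].append(event)

    collisions = {}
    for path, touches in by_path.items():
        agents = set()
        # Compare every pair of touches on the same file.
        for a, b in combinations(touches, 2):
            close = abs(a.timestamp - b.timestamp) <= window_seconds
            if a.agent != b.agent and close:
                agents.update((a.agent, b.agent))
        if agents:
            collisions[path] = sorted(agents)
    return collisions


lineage = [
    ChangeEvent("orc-1", "src/app.py", 0.0),
    ChangeEvent("orc-2", "src/app.py", 120.0),
    ChangeEvent("orc-1", "src/util.py", 130.0),
    ChangeEvent("orc-3", "docs/readme.md", 140.0),
    ChangeEvent("orc-3", "src/util.py", 1000.0),
]
heat = directory_heat(lineage)        # "src" is the hottest directory
overlaps = find_collisions(lineage)   # only src/app.py has a close overlap
```

Preventing a collision proactively would mean running the same check before dispatching a task, using the files an agent plans to touch instead of the files it already changed.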
Now, the thing here is that once we have this visibility, we're not exactly done yet, because we still need to be able to react to the changes that are happening. So, we can lean into another cool mechanism from RTS games. We can simply use muscle memory to quickly cycle between the agents that need our help. They need us to approve the plan. They need us to answer some questions. And so on. So, now we have visibility and we can react quickly. So, we're done. We solved orchestration.
But not quite, because that's really only the first step. I was able to use more agents in parallel. But only for a short amount of time. There are a few reasons for that. The first one is that there's a limit to how many ideas I can have in my head at any given time without being tired. So, what I did is basically tell the agents to do it. I told them, okay, find missions for me to do. So, I have quests now and I can click a button and they just do whatever. They can refactor, add tests, all the stuff that I don't want to do.
And the second one is that all of this babysitting takes a lot of time. I see what's going on. I can react to it very quickly. But I still need to cycle through it. So, what I did there is kind of say, how do I take myself out of the equation as much as possible? So, if agents are so amazing, why not just let them do it? I can just give them some idea. I have this campaign feature. I broadly say what I want to happen. And it will just spin up a container. I let the agents run there. They can decompose the task. They can plan it. They can present a plan to me. I don't care what they're doing because it's in a container. So, do whatever. And the main thing here is that once it's decomposed, I'm not the one doing the babysitting. Now I have the campaign orchestrator and that's his problem. So, we are actually moving more of the effort to just the planning phase or the review phase. And once we have that, we reach a point where we can just say, why does it have to be my ideas?
Why can't I tell it to run a cron job, go to Twitter every day, scan for cool ideas and just implement them? And I just decide what I want. Which is actually how I implemented channels pretty quickly. So, we have that, and now I just have a lot of different PRs to review. So, there's this nice capability of review bundles. And now I can see exactly what changes happened in each one. Why did they do stuff? What are the tasks? And I also have visual evidence. So, now I am able to just look at screenshots. I can look at videos and really see what's going on without investing too much time in doing it. And once we have that, we can actually shift more of the work from the planning to the review. How much time do I need to spend on the plan if I can just do it 10 times? And I'll just pick the one that is most fitting for me.
And the next part is, we're still not done. I mean, if you think about it, this is only the first step. Because agents aren't that smart yet. So, we need to offload it to someone else. Humans. Now, what I can do, and this is my favorite feature, is that we can actually create these workspaces. So, I can collaborate with the product designer on my team and they can do whatever they want. And I can just continue from where they left off. So, for example, this is an agent actually from the product designer, on their computer. So, they can see my agents. I can see their agents. I can understand what they're doing and we can just collaborate. They just started working again. So, I can see that they want to design this new page, which is pretty cool. So, I can wait for them to finish, or I can just go ahead now and hand off from them to my agents, well, our agents, insert communism joke, whatever. So, we have our agents now and I can just keep going from there. And the cool thing is that it's not just human-to-human collaboration. We are also collaborating with the agents. So, there's more direct stuff like this.
I can just type stuff and prompt my agents or even their agents. But there's also a softer mechanism, which is actually a chat that is between humans and humans but also between humans and the agents. You can see that the agent said, I'm starting to work on something, and then I can say, I'm also working on it. So, the next time the agent does something, it knows someone else is working. They can also have soft collaboration, so they would know what files each one is changing. So, we've actually taken a bunch of stuff that was limiting us from really reaching our full potential with agents and kind of solved them one by one. There are a bunch of other features that we just didn't have time to go over, but you can try them out and see for yourself if you can really work better at that point.
So, to sum up, these are not exactly new skills. I mean, you're probably worried, perhaps, that we won't be able to adapt to this future where we're not actually coding. We're just telling other people to code for us, or other agents. But these skills are there. They're just not something we used for work until now. So, with games as one example, we can take these skills to the next level. We need to somehow raise that ceiling. We need to somehow improve our collaboration with agents, and with AgentCraft, the goal is to take the learnings from games and really raise that to the next level with better visibility, more autonomy to the agents, and human-to-agent collaboration.
So, I invite you to go to the website. This is the QR code. It's free. You can just download it and play with it. It's still experimental. It's still new. There's a bunch of stuff that needs to change, but it will only happen with great feedback. There's also a Discord. So, please join, give us your feedback, and let's raise the ceiling together. Thank you.