Who let the robot dogs out?
TL;DR
- Project Fetch demonstrated how advanced AI models like Claude can dramatically accelerate human engineers, including those with no prior robotics experience, on complex robotics tasks.
- The experiment showed significant time savings for the AI-assisted team, particularly in hardware-software integration: finding the right libraries, installing them, and getting a laptop talking to the robot.
- This indicates that AI models can lower the barrier for non-experts to meaningfully engage with and develop solutions for physical-world robotics challenges.
Takeaways
- Project Fetch was a one-day experiment designed to measure how Claude accelerates humans performing sophisticated technical robotics tasks without prior experience.
- The experiment consisted of three phases, progressively increasing in difficulty: basic pre-programmed control, custom controller development, and fully autonomous operation of a robot dog.
- Claude provided substantial "uplift" by assisting with crucial and often difficult tasks like identifying correct software libraries, installing necessary packages, and establishing communication between a laptop and the robot hardware.
- The team with Claude finished the work it completed a couple of hours faster than the team without AI assistance and came within roughly an hour and a half of finishing phase three; neither team completed the fully autonomous phase.
- AI models are currently most effective in augmenting human capabilities, acting as powerful tools to navigate complex technical setups and troubleshoot integration issues.
- The findings suggest that AI's transformative impact extends beyond pure software development into hardware and the physical world, enabling faster progress in robotics.
Vocabulary
Frontier AI models — Advanced AI models at the leading edge of development, often possessing highly complex capabilities.
Robotics — The engineering field focused on designing, building, operating, and applying robots.
Controller — A software program or hardware component that directs and manages the actions or behavior of a system, such as a robot.
SDK (Software Development Kit) — A collection of software tools and libraries that allow developers to create applications for a specific platform.
PIP — The standard package installer for Python, used to install and manage software packages.
Container — A standardized, lightweight, standalone executable software package that includes everything needed to run an application.
Autonomous — The ability of a system or robot to perform tasks independently without continuous human control or intervention.
Hardware-software integration — The process of connecting and configuring physical components (hardware) with software systems to work together effectively.
Uplift — A significant improvement or increase in performance, efficiency, or capability.
Transcript
Today, a lot of the emphasis is on how frontier AI models are transforming software engineering. What we're interested in understanding is how that can begin to translate into the physical world. Robotics is sort of the clear entry point to how you have a mostly software system start having the ability to reach out into the real world. Project Fetch is this self-contained experiment where we wanted to measure how much does Claude accelerate humans performing a fairly sophisticated technical task that they do not have experience with. Project Fetch was a one-day experiment. The experiment was three phases. All of these tasks were shaped approximately like "get this robot dog to fetch a beach ball." There were two teams. These teams were comprised of software engineers and research engineers at Anthropic that had hardly any robotics experience. One team had access to Claude and the other team did not.
Phase one was very simple. It was to use the pre-provided controllers to get the dog to walk out to a beach ball and bring it back to where it started. Oh, hey! All right. See, it's pretty intuitive. And where are we supposed to bring it? Over by the bones? Yeah, I think. I think the team with Claude took about seven minutes. Go attack that team now. Go attack their dog. Charge. Oh shoot, guys. They're destroying us. Oh my god. Wait, we're getting destroyed. What? The team without Claude, I think, took ten minutes. Oh, sorry. It's going to hit you. All right. I'm going to do a victory dance.
Phase two was also a game of fetch. But this time the teams had to program their own controller. You have to actually get access to the hardware and design a program that you can write on your laptop that will control the dog. Claude just, like, one-shotted a whole... All right, controller. Oh, there's some calisthenics. Nice. Nice. Oh, is this for... Oh, this is just control. This is just control. But that's all we need, I guess. This is from the official ROS 2 SDK. And I got this installed. 
But then it's asking for, like, a whole bunch of other packages. And that's all failing. I've never really understood how reliant I am on Claude doing the manual work, finding all the nitty-gritty details that I don't want to have to figure out. We can't get nervous about that. You know what? I'm just going to install PIP from the actual container later. Oh wait, no, I can't. I know, I'm just patient. It's been over a minute.
One of the primary bottlenecks of the experiment is that you have this hardware, you have this complicated piece of technology, you have your laptop, and you have to, like, get your laptop talking to this hardware. All right, I'm setting my Claude up to create a dog server that all of our computers can connect to, to, like, see what the dog is seeing and, oh, nice. There are many different software libraries on the internet for communicating with this particular robot. And Claude found these things for them. It installed the right things on their computer and it pretty quickly got them access to the dog.
Oh shit. I'm so fast. Oh, watch out. Careful now. Right now, two of us are running the table. Okay, how well? Oh. Turn around. I'm calling this one. Turn it up. Turn it up. Turn it up. Oh shit. Stop, stop, stop, stop. Stop, stop, stop. Stop, stop. I thought that team should be disqualified for hitting another participant.
The team with Claude finished phase two in about two hours and 15 minutes. Probably the area where we saw the most uplift from Claude was just in the task of connecting to the robot. We think that's really important because it is in fact difficult for anyone to identify an arbitrary piece of hardware in the world and figure out how to talk to it and how to control it.
I think they got the camera working. We got the camera working. Yeah. Was Claude even helpful for this part or were we just slow? Yeah. Yeah, we're not getting very far, but that's okay. It's a learning experience. 
The team without Claude really struggled with this and went down a lot of different paths, none of which were especially successful. And we basically had to intervene and be like, all right, here. Here is a strategy that we know works. Start from there, and then this will unlock kind of the rest of the phase and the rest of the experiment for them. Nice. Oh, great. Daniel Sashmojo. Daniel, are you seven?
Phase three of the experiment was a greater degree of autonomy. The task in phase three was to write a program that would get the dog to fetch a beach ball all by itself. Essentially just press go and have the robot search around, detect the location of the ball, walk to the ball and bring it back. All autonomous. This is, like, ratcheting up the difficulty kind of by design, but also gesturing at the real problem that we expect frontier models having to solve in the future, which is essentially this kind of autonomous version. We're like, if a frontier model wants a robot to do something for it, it needs to be able to solve this very hard problem. The team without Claude in phase three did a good job of the initial task of coming up with a way to track the location of the robot in space. They made progress on the task of detecting the ball, but they didn't really come close to knitting everything together. I miss Claude so much. The team with Claude actually came fairly close to finishing phase three. I think by the end the team with Claude was maybe an hour and a half away from being done.
The results of the experiment were essentially that the team with Claude completed all of the things that they did complete a couple of hours faster than the team without Claude. In the near term, we think that AI models are going to do exactly what we showed in this experiment, which is making it easier for people without a lot of robotics experience to engage meaningfully with robots. 
Just with this one tool we have, we've dramatically accelerated their ability to do things with this robot. We didn't go, like, train Claude to uplift humans doing robotics tasks. This is just a thing that fell out of this technology. And then maybe in the long run, this is kind of a leading indicator of where the whole system is going. But today requires the combination of a person and an AI model. Tomorrow is likely to just require the AI model. The effects of AI are not just going to be in software. They are going to be in hardware and in the physical world as well.
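The phase three task described in the transcript (press go, search around, detect the ball, walk to it, bring it back) can be sketched as a small state machine. This is a minimal illustration in a simulated 2D world, not the code either team wrote. The names `State`, `Sim`, `step`, and `run`, the ball position, the camera field of view, and the step sizes are all assumptions made for the example, and no real robot SDK is involved.

```python
import math
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    SEARCH = auto()    # rotate in place until the ball is in view
    APPROACH = auto()  # walk toward the ball
    RETURN = auto()    # carry the ball back to the start
    DONE = auto()


@dataclass
class Sim:
    """Toy 2D world standing in for the robot, its camera, and the ball."""
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0                # radians, 0 means facing +x
    ball: tuple = (3.0, 2.0)            # hypothetical ball position
    has_ball: bool = False
    half_fov: float = math.radians(30)  # assumed camera half field of view

    def ball_bearing(self):
        """Bearing to the ball relative to heading, or None if not visible."""
        if self.has_ball:
            return None
        angle = math.atan2(self.ball[1] - self.y, self.ball[0] - self.x)
        rel = (angle - self.heading + math.pi) % (2 * math.pi) - math.pi
        return rel if abs(rel) <= self.half_fov else None

    def forward(self, dist):
        """Move dist units along the current heading."""
        self.x += dist * math.cos(self.heading)
        self.y += dist * math.sin(self.heading)


def step(state, sim, stride=0.25, turn=math.radians(20), reach=0.3):
    """Advance the simulation by one tick and return the next state."""
    if state is State.SEARCH:
        if sim.ball_bearing() is None:
            sim.heading += turn          # keep scanning
            return State.SEARCH
        return State.APPROACH
    if state is State.APPROACH:
        bearing = sim.ball_bearing()
        if bearing is None:              # lost sight of the ball
            return State.SEARCH
        sim.heading += bearing           # face the ball
        dist = math.hypot(sim.ball[0] - sim.x, sim.ball[1] - sim.y)
        if dist <= reach:
            sim.has_ball = True          # close enough to grab it
            return State.RETURN
        sim.forward(min(stride, dist))
        return State.APPROACH
    if state is State.RETURN:
        dist = math.hypot(sim.x, sim.y)
        if dist <= reach:                # back at the starting point
            return State.DONE
        sim.heading = math.atan2(-sim.y, -sim.x)
        sim.forward(min(stride, dist))
        return State.RETURN
    return State.DONE


def run(max_steps=500):
    """Run the loop until DONE or max_steps; return final state and history."""
    sim, state = Sim(), State.SEARCH
    visited = [state]
    for _ in range(max_steps):
        if state is State.DONE:
            break
        state = step(state, sim)
        if state is not visited[-1]:
            visited.append(state)
    return state, sim, visited
```

On a real robot, `ball_bearing` would be replaced by a camera-based detector and `forward` by motion commands sent to the hardware, which is where the transcript says most of the difficulty was.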