
Claude 3.5 Sonnet for agentic coding

TL;DR

  • Software engineers often spend more time writing tests and fixing bugs than developing the initial code itself.
  • A new AI model, Claude 3.5 Sonnet (referred to here as Claude), can autonomously generate tests, identify errors, and fix code.
  • Demonstrated with a buggy image resizing function, Claude successfully diagnosed, tested for, and resolved a visual bug with minimal human input, leading to a fully functioning solution.

Takeaways

  • AI models like Claude 3.5 Sonnet are designed to accelerate and automate the traditionally time-consuming testing and debugging phases of software development.
  • The model operates within a secure, isolated sandbox environment without internet access, utilizing tools to edit files and execute commands/code.
  • The AI's workflow includes analyzing the existing code, writing a test suite for expected behavior, running those tests to confirm the bug, and then editing the code to apply a fix.
  • After implementing a fix, the AI reruns the unit tests to validate the solution and confirm successful bug resolution.
  • User interaction is minimized, primarily involving a description of the bug and the relevant file path, after which the AI takes over the diagnostic and fixing process.
  • The demonstration showcased the model's ability to correct specific functional and visual bugs, such as an image cropping incorrectly (remaining square instead of circular) and having an unwanted white background.
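
The test-first loop described in these takeaways (run the tests to confirm the bug, edit the code, rerun the tests) can be sketched in plain Python. This is a minimal illustration, not the tooling used in the demo: the file names `avatar.py` and `test_avatar.py`, the function `background_pixel`, and the helpers `run_tests` and `red_green_cycle` are all hypothetical, and a temporary directory stands in for the sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the buggy code: the background comes out opaque white.
BUGGY_SOURCE = '''\
def background_pixel():
    return (255, 255, 255, 255)  # RGBA: opaque white
'''

# Hypothetical fix: the background is fully transparent.
FIXED_SOURCE = '''\
def background_pixel():
    return (0, 0, 0, 0)  # RGBA: fully transparent
'''

# A one-test suite that encodes the expected behavior.
TEST_SOURCE = '''\
import unittest
from avatar import background_pixel


class BackgroundTest(unittest.TestCase):
    def test_background_is_transparent(self):
        self.assertEqual(background_pixel()[3], 0)


if __name__ == "__main__":
    unittest.main()
'''


def run_tests(workdir: Path) -> bool:
    """Run the test file in a subprocess; True means every test passed."""
    result = subprocess.run(
        # -B skips bytecode caching so each run reads the current source.
        [sys.executable, "-B", "test_avatar.py"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def red_green_cycle() -> tuple[bool, bool]:
    """Return (passed_before_fix, passed_after_fix)."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "avatar.py").write_text(BUGGY_SOURCE)
        (workdir / "test_avatar.py").write_text(TEST_SOURCE)
        before = run_tests(workdir)  # should fail, confirming the bug
        (workdir / "avatar.py").write_text(FIXED_SOURCE)
        after = run_tests(workdir)   # should pass, validating the fix
        return before, after
```

Seeing the suite fail before the fix matters as much as seeing it pass afterwards: a test that passes against the buggy code would not be checking the reported behavior.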

Vocabulary

  • autonomous: Operating independently without human control or intervention.
  • unit tests: Small, isolated tests that verify individual components, functions, or methods of a codebase.
  • implementation: The actual code or details of how a particular function or feature is built and performs.
  • sandbox environment: A secure, isolated virtual environment where programs or code can be run and tested without affecting the host system.
  • test suite: A collection of unit tests or other tests grouped together to test a specific part of a software application.
  • crop images: To remove the outer parts of an image to improve framing or remove unwanted areas.
  • resize images: To change the dimensions (width and height) of an image while maintaining its content.

Transcript

As a software engineer, I find that writing tests and fixing code usually takes much longer than writing the code itself. Our new model, Claude 3.5 Sonnet, can help write tests and fix code autonomously. We'll show you how Claude takes us from an incomplete implementation to a fully functioning one, including unit tests, with minimal input from me.

I've written a function that resizes and crops images into circles. This could be used to make sure that users on a website have profile photos that are all the same dimensions. But there's a bug in this function. When I run it, the cropped images are still square and they've got a white background. So let's see if Claude can write tests for the expected behavior, find the error, and fix it.

For this demo, I've given Claude tools to edit files and run commands and code in the secure sandbox environment with no internet access. I tell Claude about the bug I'm seeing, the path to the file where the function lives, and some instructions for what I want it to do.

First, Claude chooses to open the file with the function I've written to understand the current implementation and identify some potential problems. Then Claude writes a test suite for us and puts it into the file I asked it to. Now Claude's going to run those tests. Let's give it a second. And just as expected, the tests are failing due to that bug.

So now Claude's going to go ahead and fix that bug for us. Here you're going to see Claude edit the function file to fix the bug. And now Claude's going to rerun those tests. And the tests are passing. So now if we rerun the function, look, our image no longer has that white background. Thanks, Claude.
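
The function from the demo is not shown in the transcript, so the following is only a sketch of the expected behavior it describes: pixels outside the circle become transparent instead of staying square with a white background. It assumes a square image stored as a plain grid of RGBA tuples instead of an image library object, and the names `crop_to_circle` and `check_expected_behavior` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # (red, green, blue, alpha)
TRANSPARENT: Pixel = (0, 0, 0, 0)


def crop_to_circle(pixels: List[List[Pixel]]) -> List[List[Pixel]]:
    """Keep pixels inside the inscribed circle; make the rest transparent.

    Assumes `pixels` is a square grid of RGBA tuples.
    """
    size = len(pixels)
    radius = size / 2
    result = []
    for y, row in enumerate(pixels):
        new_row = []
        for x, pixel in enumerate(row):
            # Measure from the centre of this pixel to the centre of the image.
            dx = x + 0.5 - radius
            dy = y + 0.5 - radius
            inside = dx * dx + dy * dy <= radius * radius
            new_row.append(pixel if inside else TRANSPARENT)
        result.append(new_row)
    return result


def check_expected_behavior() -> None:
    """Assertions of the kind a test suite for this bug might contain."""
    red = (255, 0, 0, 255)
    image = [[red] * 8 for _ in range(8)]
    cropped = crop_to_circle(image)
    # The centre of the image keeps its original colour.
    assert cropped[4][4] == red
    # The corners fall outside the circle, so they must be transparent.
    for y, x in [(0, 0), (0, 7), (7, 0), (7, 7)]:
        assert cropped[y][x][3] == 0
```

Checking the alpha value of the corner pixels targets both symptoms at once: a square result or a white background would each leave the corners opaque.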
