Claude 3 Opus as an economic analyst

TL;DR

  • Claude 3 Opus demonstrates advanced multi-modal capabilities, using tools like WebView and a Python interpreter to analyze complex economic data with high accuracy.
  • The model can effectively transcribe visual data, perform statistical analyses, and project future trends through Monte Carlo simulations.
  • A key feature, "Dispatch Subagents," enables Claude 3 to break down and parallelize global economic analyses across multiple sub-tasks and economies.

Takeaways

  • Claude 3 Opus utilizes a WebView tool to browse specified URLs, interpreting visual data on web pages for complex problem-solving.
  • The model integrates a Python interpreter to write and execute code for data plotting, statistical analysis, and advanced simulations like Monte Carlo.
  • Claude 3 Opus's transcription of the real US GDP trendline came within about 5% of the actual data, and on a large sample of made-up GDP graphs its transcriptions were within 11% of the true values on average.
  • The Dispatch Subagents feature allows the model to decompose a large problem into multiple sub-problems, delegating them to other model instances for parallel processing.
  • Subagents can be provided with precise instructions and required data formats to ensure coordinated and efficient task execution.
  • This parallel processing capability allows for rapid, simultaneous analysis of multiple global economies, producing insights such as projected changes in GDP shares.
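
The video does not say how the "within 5%" and "within 11%" figures were computed. One plausible reading is a mean absolute percentage error between the transcribed values and the true values. The sketch below shows that calculation under that assumption; the function name and the sample numbers are illustrative and do not come from the video.

```python
def mean_absolute_percentage_error(actual, estimated):
    """Average of |estimated - actual| / |actual|, expressed as a percentage."""
    if len(actual) != len(estimated):
        raise ValueError("series must have the same length")
    if not actual:
        raise ValueError("series must not be empty")
    errors = [abs(e - a) / abs(a) for a, e in zip(actual, estimated)]
    return 100.0 * sum(errors) / len(errors)


# Illustrative numbers only, not data from the video.
actual = [100.0, 200.0, 400.0]
transcribed = [105.0, 190.0, 400.0]

mape = mean_absolute_percentage_error(actual, transcribed)
print(f"Mean absolute percentage error: {mape:.2f}%")
```

A transcription that is "within 5%" under this reading would produce a value of 5 or less.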

Vocabulary

  • Claude 3 Opus: The largest and most capable model in Anthropic's new Claude 3 family, known for advanced reasoning and multi-modal understanding.
  • Multi-modal: An AI model capable of processing and understanding multiple types of data, such as text and images, simultaneously.
  • WebView tool: A functionality that allows an AI model to access, view, and extract information directly from web pages at a specified URL.
  • Python interpreter: An embedded tool that enables an AI model to write, execute, and debug Python code for tasks like data manipulation, analysis, or visualization.
  • Monte Carlo simulations: A broad class of computational algorithms that rely on repeated random sampling to obtain numerical results, often used to predict outcomes in complex systems.
  • GDP trends: The general direction or pattern of change observed in a country's Gross Domestic Product over a period of time.
  • Dispatch Subagents: A feature where a primary AI model can break down a complex task into smaller sub-tasks and delegate them to separate, parallel AI instances (subagents) for execution.
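
To make the Monte Carlo idea concrete, here is a minimal sketch of a GDP projection by repeated random sampling. It is not the code Claude wrote in the video, which is not shown in full. It assumes annual growth rates are drawn independently from a normal distribution, and the starting value, mean growth, and volatility are hypothetical parameters chosen only for illustration.

```python
import random
import statistics


def simulate_gdp_paths(start_gdp, mean_growth, growth_std, years, n_paths, seed=0):
    """Return n_paths simulated GDP trajectories, each of length years + 1."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    paths = []
    for _ in range(n_paths):
        gdp = start_gdp
        path = [gdp]
        for _ in range(years):
            # Assumption: each year's growth rate is an independent normal draw.
            growth = rng.gauss(mean_growth, growth_std)
            gdp *= 1.0 + growth
            path.append(gdp)
        paths.append(path)
    return paths


def summarize_final_year(paths):
    """Summarize the spread of simulated outcomes in the final year."""
    finals = [path[-1] for path in paths]
    # quantiles(n=20) returns 19 cut points: the first is the 5th
    # percentile and the last is the 95th percentile.
    cuts = statistics.quantiles(finals, n=20)
    return {"p5": cuts[0], "median": statistics.median(finals), "p95": cuts[-1]}


# Hypothetical inputs: GDP indexed to 100, 2% mean growth, 1.5% volatility.
paths = simulate_gdp_paths(
    start_gdp=100.0, mean_growth=0.02, growth_std=0.015, years=10, n_paths=2000
)
summary = summarize_final_year(paths)
print(summary)
```

The gap between the 5th and 95th percentiles gives the "range of possibilities" that a single point forecast would hide.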

Transcript

In this video, we're going to see if Claude and a couple of friends can help us analyze the world economy in a matter of minutes. OK, I've asked Claude 3 Opus, which is the largest model in Anthropic's new Claude 3 family, to look at the GDP trends for the US and write down a markdown table of what it sees. We've given Opus, and all the other models in the Claude 3 family, extensive training on tool use, and one of the major tools it's using is this WebView tool. It goes to a URL, looks at what's on the page, and because it's multi-modal, it can use the information on that page to solve complex problems. So here's the markdown, and it's important to note that Claude doesn't have direct access to these numbers. It's literally looking at the same browser you and I are seeing, looking at the trendline and trying to estimate what the exact numbers are.
Let's see how accurate it was. We've asked the model to create a plot of the data, and it's used a second tool, this Python interpreter, to write out the code and then render the image for us to check. And here's the image. Look, it's actually added helpful little tooltip animations to explain some of the major peaks and troughs in the last decade or two of the US economy. And we can compare that graph with the actual data, and it turns out it's pretty close: it's actually within 5% of the real figures. And by the way, Claude's transcription here isn't just coming from its pre-existing knowledge of US GDP. We tried it with a large sample of made-up GDP graphs, and its transcription accuracy was within 11% on average.
Next, we asked the model to do some statistical analysis, projecting out into the future, performing simulations to see where the GDP of the US might head. And we can see that it's run this analysis using Python, and it's able to perform these Monte Carlo simulations to see what the range of GDP possibilities might look like for the next decade or so. But I wonder if we can go further.
We're going to get the model to analyze a more complicated question, that is, how GDP might change across all of the biggest world economies, and then to help it do that, we're going to give it one more tool called Dispatch Subagents. This basically allows the model to break down the problem into lots of subproblems, and then write prompts for other versions of itself to help pick up the slack. The models can then complete a more complex task by all working together. Here you can see it's written this prompt and given very precise instructions that it wants the other models to follow, including a format for the data that it's hoping they will return. It's dispatched a version of this prompt to one model that's going to look at the US, one for China, one for Germany, Japan, and so on. I can see in these progress bars that the subagent models are now completing the set task for each of the individual economies. They're going to the relevant web pages, they're getting the information, they're running the code to analyze it, just like we saw in the previous US example, but all in parallel. Let's just skip forward to see what the model produced. You can see it's run the analysis, it's produced a pre and post pie chart of how it expects the world economy to look in 2030 versus 2020, and it's given us a written analysis too, where it makes variable predictions that relate to the statistical analysis that it ran. It's telling us that it thinks the GDP share of particular economies will change and which ones will be larger or smaller by 2030. So there we have it: complex, multi-step, multi-modal analysis run by a model that can create subagents to get even more tasks running in parallel. We're excited to see what you or customers can do with these advanced Claude 3 capabilities.
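
The fan-out pattern described above can be sketched in a few lines. This is not the actual Dispatch Subagents tool, whose interface is not shown in the video. The `run_subagent` function is a hypothetical stand-in that returns placeholder numbers instead of calling a model, and the economy labels "A", "B", and "C" and their values are invented for illustration. The sketch only shows the shape of the workflow: one prompt template with a required output format, one parallel task per economy, and an aggregation step that turns the results into GDP shares.

```python
from concurrent.futures import ThreadPoolExecutor

# One shared template so every subagent gets the same precise instructions,
# including the data format the orchestrator expects back.
PROMPT_TEMPLATE = (
    "Analyze GDP trends for {economy}. "
    "Return JSON with keys: economy, gdp_2020, gdp_2030_projected."
)


def run_subagent(prompt, economy):
    """Hypothetical stand-in for a call to another model instance.

    Returns placeholder values so the sketch runs without any API access.
    """
    placeholder = {"A": (100.0, 120.0), "B": (80.0, 110.0), "C": (20.0, 20.0)}
    gdp_2020, gdp_2030 = placeholder[economy]
    return {"economy": economy, "gdp_2020": gdp_2020, "gdp_2030_projected": gdp_2030}


def dispatch_subagents(economies):
    """Build one prompt per economy and run all subagent tasks in parallel."""
    prompts = [PROMPT_TEMPLATE.format(economy=e) for e in economies]
    with ThreadPoolExecutor(max_workers=len(economies)) as pool:
        # pool.map preserves input order, so results line up with economies.
        return list(pool.map(run_subagent, prompts, economies))


def gdp_shares(results, key):
    """Convert absolute GDP values into each economy's share of the total."""
    total = sum(r[key] for r in results)
    return {r["economy"]: r[key] / total for r in results}


results = dispatch_subagents(["A", "B", "C"])
shares_2020 = gdp_shares(results, "gdp_2020")
shares_2030 = gdp_shares(results, "gdp_2030_projected")
print(shares_2020)
print(shares_2030)
```

Fixing the return format in the prompt is what makes the final step simple: the orchestrating model can combine the results without parsing free-form text from each subagent.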
