Extended thinking

📖 Lesson content

Summary

Extended thinking is Claude's advanced feature that gives the model time to reason through complex problems before generating a final response. Think of it as Claude's internal monologue - you can see how it approaches your problem step by step.

How Extended Thinking Works

When you enable extended thinking, Claude's response includes two parts instead of one:

Reasoning Content Part - Claude's internal thinking process
Text Part - The final response you actually wanted

The reasoning content shows you exactly how Claude breaks down your problem, what it considers, and how it arrives at its final answer. This transparency can be incredibly valuable for understanding and debugging complex tasks.

Trade-offs to Consider

Extended thinking comes with clear benefits and costs:

Better accuracy on complex tasks
Higher cost - you pay for all thinking tokens
Increased latency - thinking takes time

The key decision point is simple: use your evaluations. If you've already optimized your prompt but still aren't getting the accuracy you need, that's when extended thinking becomes worth considering.

The Signature System

One important detail you'll notice immediately is the cryptographic signature attached to reasoning content:

This signature ensures you can't modify the thinking text. If you want to include Claude's previous reasoning in a follow-up conversation, the signature verifies the content hasn't been tampered with. This prevents potential safety issues from modified reasoning text.

Redacted Content

Sometimes Claude's thinking gets flagged by safety systems. When this happens, you'll receive a redactedContent field instead of readable thinking text:

The redacted content is encrypted but still functional - you can pass it back to Claude in future conversations without losing context. It's just not readable to you as a developer.

Implementation

To enable extended thinking, you need to modify your API call with two parameters:

additional_model_fields["thinking"] = {
    "type": "enabled",
    "budget_tokens": thinking_budget
}

The thinking_budget controls how many tokens Claude can spend on reasoning. The minimum is 1024 tokens, but you might need more for complex problems. Like everything else with Claude, use your evaluations to find the right budget for your use case.

Here's how the updated chat function looks:

def chat(
    messages,
    system=None,
    temperature=1.0,
    stop_sequences=[],
    tools=None,
    tool_choice="auto",
    text_editor=None,
    thinking=False,
    thinking_budget=1024
):

Testing Your Implementation

When building applications that handle extended thinking, you'll want to test both normal reasoning content and redacted content scenarios. There's actually a special test string that forces Claude to return redacted content - useful for making sure your code handles both cases properly.

The most important takeaway about extended thinking is that the decision to use it should always be data-driven. Run your evaluations first, optimize your prompts, and only then consider extended thinking if you need that extra boost in accuracy for complex tasks.

Downloads

🔁 Related lessons

Next: Image support
Previous: Quiz on Retrieval Augmented Generation
Same section: Overview of Claude Models · Accessing the API · Making a request
Part of paths: Path C
Reference docs: Glossary · Skills atlas · By use-case

📚 Source & attribution

Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-in-amazon-bedrock/276788