Temperature

📖 Lesson content

Summary

Temperature is a powerful parameter that controls how creative or deterministic Claude's responses will be. Understanding how to use it effectively can dramatically improve your AI applications.

How Claude Generates Text

Before diving into temperature, it's helpful to understand Claude's text generation process. When you send Claude a prompt like "What do you think?", it goes through three phases:

Tokenization: Breaking your input into smaller chunks
Prediction: Calculating probabilities for possible next tokens
Sampling: Selecting a token based on those probabilities

In the diagram above, you can see how Claude might assign different probabilities to potential next tokens. The word "about" has a 30% chance, "would" has 20%, and so on. This process repeats for each token until the response is complete.

What Temperature Does

Temperature is a decimal value between 0 and 1 that directly influences these token selection probabilities. Think of it as a creativity dial:

Low temperature (near 0): Makes the highest probability token much more likely to be selected
High temperature (near 1): Distributes probability more evenly across all possible tokens

At temperature 0, Claude becomes deterministic - it will always pick the most probable token. At temperature 1, lower-probability tokens have a much better chance of being selected, leading to more creative and varied outputs.

Temperature Ranges and Use Cases

Different tasks call for different temperature settings:

Low Temperature (0.0 - 0.3)

Factual responses
Coding assistance
Data extraction
Content moderation

Medium Temperature (0.4 - 0.7)

Summarization
Educational content
Problem-solving
Creative writing with constraints

High Temperature (0.8 - 1.0)

Brainstorming
Creative writing
Marketing content
Joke generation

Setting Temperature in Code

By default, Claude's temperature is set to 1.0, which means maximum creativity. You can override this by adding temperature to your inference configuration:

def chat(messages, system=None, temperature=1.0):
    params = {
        "modelId": model_id,
        "messages": messages,
        "inferenceConfig": {"temperature": temperature}
    }
    
    if system:
        params["system"] = [{"text": system}]
    
    response = client.converse(**params)
    return response["output"]["message"]["content"][0]["text"]

Temperature in Practice

Here's a practical example using movie idea generation. With temperature set to the default (1.0), you might get creative responses like:

"A reclusive origami master discovers her intricate paper creations come to life at night, leading her on a magical journey to save their miniature world from a mysterious shadow creature threatening to unfold their existence."

But when you set temperature to 0.0 for the same prompt, you'll consistently get more predictable responses:

"A time-traveling archaeologist must prevent ancient artifacts from being stolen by a tech billionaire who's using them to build a doomsday device that harnesses their forgotten power."

Running the low-temperature version multiple times will produce very similar responses, often with repeated themes like "time-traveling historian" or "time-traveling archaeologist."