Controlling model output

📖 Lesson content

Summary

Beyond crafting better prompts, there are two powerful techniques for controlling Claude's output: prefilled assistant messages and stop sequences. These methods give you precise control over how Claude responds and when it stops generating text.

Prefilled Assistant Messages

Message prefilling lets you provide the beginning of Claude's response, which it will then continue from that starting point. This technique is incredibly useful for steering Claude in a specific direction.

Here's how it works: instead of just sending a user message, you add an assistant message at the end of your message list. Claude sees this assistant message and thinks "I've already started responding to this question, so I should continue from where I left off."

For example, if you ask "Is tea or coffee better at breakfast?" without prefilling, Claude typically gives a balanced response mentioning both options. But if you add an assistant message saying "Coffee is better because", Claude will continue from there and build a case for coffee.

The key thing to understand is that Claude continues from exactly where your prefilled text ends. If you write "Coffee is better because", Claude won't repeat that text - it will pick up right after "because" and complete the thought.

Here's the code structure:

messages = []
add_user_message(messages, "Is tea or coffee better at breakfast?")
add_assistant_message(messages, "Coffee is better because")
answer = chat(messages)

You can steer Claude in any direction using this technique:

Favor coffee: "Coffee is better because"
Favor tea: "Tea is better because"
Take a contrarian stance: "Neither is very good because"

Stop Sequences

Stop sequences force Claude to end its response as soon as it generates a specific string of characters. This is perfect for controlling the length or endpoint of responses.

The concept is straightforward: you provide a list of strings, and when Claude generates any of those strings, it immediately stops and returns whatever it has generated up to that point.

For example, if you ask Claude to "Count from 1 to 10" with a stop sequence of "5", you'll get:

add_user_message(messages, "Count from 1 to 10")
answer = chat(messages, stop_sequences=["5"])

This returns: "1, 2, 3, 4, " - stopping right before the "5" is included in the output.

You can be more precise with your stop sequences. If you want to avoid the trailing comma and space, use stop_sequences=[", 5"] instead. This will give you a cleaner result: "1, 2, 3, 4".

Stop sequences are particularly useful for:

Limiting list lengths
Stopping at specific markers or delimiters
Creating consistent output formats
Preventing overly long responses

Both techniques give you fine-grained control over Claude's behavior, allowing you to create more predictable and targeted responses for your applications.

🔁 Related lessons

Next: Structured data
Previous: Response streaming
Same section: Making a request · Multi-turn conversations · Chat exercise
Part of paths: Path C
Reference docs: Glossary · Skills atlas · By use-case

📚 Source & attribution

Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-with-google-vertex/289160