- Alex, a prompt engineer at Anthropic, outlines five key strategies for effectively interacting with and optimizing responses from the Claude language model.
- He emphasizes an empirical, test-driven approach to prompt engineering, focusing on clear, structured instructions and specific examples.
- By leveraging techniques like explicit task descriptions, XML tags, long context windows, and "thinking tags," users can significantly enhance Claude's performance and accuracy.
Behind the prompt: Prompting tips for Claude.ai
- Prompt engineering is the practice of optimizing your input to a language model to achieve the best possible response.
- Anthropic employs an empirical, test-driven approach to prompt engineering, using benchmarks to scientifically measure prompt performance.
- Describe your task clearly and specifically to Claude, providing precise definitions (e.g., what constitutes "personal identifiable information" to remove).
- Mark different parts of your prompt with XML tags (
<tag>...</tag>), as Claude has been fine-tuned to recognize and pay special attention to this structure. - Include a wide range of examples within your prompt to help Claude learn and perform the task more effectively.
- Utilize Claude's long context capability, which can process up to 100,000 tokens (roughly 70,000 words) in a single prompt.
- Allow Claude time to "think" through complex questions using "thinking tags" (
<thinking>...</thinking>) before generating its final answer, which has been shown to improve performance.
Prompt engineering — The practice of optimizing the input (prompt) given to a language model to elicit the most desirable or accurate output.
Red teaming — A security practice where a team simulates an attack on a system (like a language model) to find vulnerabilities, biases, or harmful behaviors.
Jail bricks — Specific prompts designed to bypass or "circumvent the filters" and safety guardrails of a language model.
Circumvent the filters — To find a way around the safety mechanisms or restrictions put in place on a language model.
Benchmarks — Standardized tests or criteria used to evaluate the performance, accuracy, or safety of a language model.
PII — Personal Identifiable Information; data that can be used to identify a specific individual (e.g., email addresses, phone numbers).
XML tags — Structured markers (like <task>...</task>) used within a prompt to delineate different sections or provide specific instructions to the model.
Tokens — The basic units of text (which can be words, parts of words, or punctuation) that a language model processes.
Long context — The ability of a language model to process and understand a very large amount of text in a single input.
Thinking tags — Specific tags (e.g., <thinking>...</thinking>) used in a prompt to instruct a language model to perform internal reasoning steps before producing a final answer.
I'm Alex, I'm a prompt engineer at Anthropic. I help people get the most out of Claude with safety at the top of my mind. First got into prompt engineering back in last August. Anthropic released their paper, read team language models to reduce harms, and immediately I read it and it was hooked. I was inspired to see that a company was taking a safety first approach to researching language models, and I thought it was really interesting how you could see the ways that models would output to different and diverse ranges of prompts. You may be familiar with red teaming attacks as prompt exploits or the more infamous name, jail bricks. I decided to start writing jail bricks after reading the paper and becoming inspired by the opportunities that still existed to red team these models. Jail bricks are specific prompts that are written to circumvent the filters that have been applied on top of language models. Anth engineering is the practice of optimizing your prompt in order to get the best response from the language model. At Anthropic we like to take an empirical test driven approach to prompt engineering. Whenever we write a new prompt, we run it against a series of benchmarks in order to scientifically measure its performance. With Claude we've discovered a set of best practices that allow you to get the most out of the model. So let's get into it. Here are my five tips for getting the best performance from Claude. First, describe your task. Claude responds well to clear direct and specific instructions. Let's say you wanted Claude to remove personal identifiable information from a piece of text. Explaining the Claude exactly what that means helps Claude recognize what pieces of text to remove. For example, email addresses and phone numbers. Second, mark different parts of your prompt with XML tags. XML tags look like this. Claude has been fine-tuned to pay special attention to their structure. In our example, we use XML tags to indicate the beginning and end of tags that Claude needs to de-identify. Third, give examples. The more examples, the better. Including a wide range of examples helps Claude learn how to do the task. Back to our PII prompt, we provide Claude with examples of how to de-identify text within XML tags. Fourth, make use of the long context. Claude can read up to 100,000 tokens. That's roughly 70,000 words, or the length of the entire great Gatsby. And finally, the last tip is to let Claude think. Researchers have discovered that giving language monosome time to think through their response before producing their final answer leads to better performance. With Claude, we like to use thinking tags so that it can jot down its ideas before answering a complex question. Here in this example, you can see Claude starts to reason within thinking tags, and then outputs his final answer. Alright, so those are my top tips for getting the most out of Claude and a little bit about me and my own prompting journey. Stay up to date on the latest prompting best practices. Make sure to go check out our developer dog site. And if you haven't got access to the Claude API yet, you can still practice your prompt engineering right now at Claude.ai.
TL;DR
- Alex, a prompt engineer at Anthropic, outlines five key strategies for effectively interacting with and optimizing responses from the Claude language model.
- He emphasizes an empirical, test-driven approach to prompt engineering, focusing on clear, structured instructions and specific examples.
- By leveraging techniques like explicit task descriptions, XML tags, long context windows, and "thinking tags," users can significantly enhance Claude's performance and accuracy.
Takeaways
- Prompt engineering is the practice of optimizing your input to a language model to achieve the best possible response.
- Anthropic employs an empirical, test-driven approach to prompt engineering, using benchmarks to scientifically measure prompt performance.
- Describe your task clearly and specifically to Claude, providing precise definitions (e.g., what constitutes "personal identifiable information" to remove).
- Mark different parts of your prompt with XML tags (
<tag>...</tag>), as Claude has been fine-tuned to recognize and pay special attention to this structure. - Include a wide range of examples within your prompt to help Claude learn and perform the task more effectively.
- Utilize Claude's long context capability, which can process up to 100,000 tokens (roughly 70,000 words) in a single prompt.
- Allow Claude time to "think" through complex questions using "thinking tags" (
<thinking>...</thinking>) before generating its final answer, which has been shown to improve performance.
Vocabulary
Prompt engineering — The practice of optimizing the input (prompt) given to a language model to elicit the most desirable or accurate output.
Red teaming — A security practice where a team simulates an attack on a system (like a language model) to find vulnerabilities, biases, or harmful behaviors.
Jail bricks — Specific prompts designed to bypass or "circumvent the filters" and safety guardrails of a language model.
Circumvent the filters — To find a way around the safety mechanisms or restrictions put in place on a language model.
Benchmarks — Standardized tests or criteria used to evaluate the performance, accuracy, or safety of a language model.
PII — Personal Identifiable Information; data that can be used to identify a specific individual (e.g., email addresses, phone numbers).
XML tags — Structured markers (like <task>...</task>) used within a prompt to delineate different sections or provide specific instructions to the model.
Tokens — The basic units of text (which can be words, parts of words, or punctuation) that a language model processes.
Long context — The ability of a language model to process and understand a very large amount of text in a single input.
Thinking tags — Specific tags (e.g., <thinking>...</thinking>) used in a prompt to instruct a language model to perform internal reasoning steps before producing a final answer.
Transcript
I'm Alex, I'm a prompt engineer at Anthropic. I help people get the most out of Claude with safety at the top of my mind. First got into prompt engineering back in last August. Anthropic released their paper, read team language models to reduce harms, and immediately I read it and it was hooked. I was inspired to see that a company was taking a safety first approach to researching language models, and I thought it was really interesting how you could see the ways that models would output to different and diverse ranges of prompts. You may be familiar with red teaming attacks as prompt exploits or the more infamous name, jail bricks. I decided to start writing jail bricks after reading the paper and becoming inspired by the opportunities that still existed to red team these models. Jail bricks are specific prompts that are written to circumvent the filters that have been applied on top of language models. Anth engineering is the practice of optimizing your prompt in order to get the best response from the language model. At Anthropic we like to take an empirical test driven approach to prompt engineering. Whenever we write a new prompt, we run it against a series of benchmarks in order to scientifically measure its performance. With Claude we've discovered a set of best practices that allow you to get the most out of the model. So let's get into it. Here are my five tips for getting the best performance from Claude. First, describe your task. Claude responds well to clear direct and specific instructions. Let's say you wanted Claude to remove personal identifiable information from a piece of text. Explaining the Claude exactly what that means helps Claude recognize what pieces of text to remove. For example, email addresses and phone numbers. Second, mark different parts of your prompt with XML tags. XML tags look like this. Claude has been fine-tuned to pay special attention to their structure. In our example, we use XML tags to indicate the beginning and end of tags that Claude needs to de-identify. Third, give examples. The more examples, the better. Including a wide range of examples helps Claude learn how to do the task. Back to our PII prompt, we provide Claude with examples of how to de-identify text within XML tags. Fourth, make use of the long context. Claude can read up to 100,000 tokens. That's roughly 70,000 words, or the length of the entire great Gatsby. And finally, the last tip is to let Claude think. Researchers have discovered that giving language monosome time to think through their response before producing their final answer leads to better performance. With Claude, we like to use thinking tags so that it can jot down its ideas before answering a complex question. Here in this example, you can see Claude starts to reason within thinking tags, and then outputs his final answer. Alright, so those are my top tips for getting the most out of Claude and a little bit about me and my own prompting journey. Stay up to date on the latest prompting best practices. Make sure to go check out our developer dog site. And if you haven't got access to the Claude API yet, you can still practice your prompt engineering right now at Claude.ai.