📖 Lesson content
Summary
Prompt caching is a feature that speeds up Claude's responses and reduces the cost of text generation by reusing computational work from previous requests. Instead of throwing away all the processing work after each request, Claude can save and reuse it when you send similar content again.
How Claude Normally Processes Requests
To understand prompt caching, let's first look at what happens during a typical request without caching enabled.

When you send a message to Claude, it doesn't immediately start generating a response. Instead, Claude performs extensive preprocessing work on your input:

- Tokenizes the prompt (breaks text into smaller units)
- Creates embeddings for each token (mathematical representations)
- Adds context based on surrounding text
- Only then generates the actual output text

After sending you the response, Claude discards all this computational work. Everything gets thrown away, and Claude declares itself ready for the next request.
The Problem with Repeated Content
Here's where things get inefficient. Imagine you're having a conversation with Claude, so your follow-up request includes:

- The same original user message from before
- Claude's previous response
- Your new follow-up message

Claude has to reprocess that original message all over again, even though it just analyzed the exact same content moments earlier. As Claude might think: "I just processed that message and threw away all the work I did. I could have reused it!"
How Prompt Caching Solves This
Prompt caching changes this wasteful process. Instead of discarding the preprocessing work, Claude saves it in a cache.

Here's how it works:
- Initial request: Claude processes your message and writes the computational work to a cache
- Follow-up requests: When Claude sees the same content again, it reads the previously processed work from the cache instead of starting over

The cache acts like a lookup table: "If I ever see this message again, I'll reuse this work I already did."
Key Benefits and Limitations

Prompt caching offers several advantages:
- Faster responses: Requests using cached content execute more quickly
- Lower costs: You pay less for processing that reuses cached work
- Automatic optimization: The initial request writes to cache, follow-up requests read from it
However, there are important limitations to keep in mind:
- Short lifespan: Cache only lives for 5 minutes
- Exact matches required: Only useful when you're repeatedly sending the same content
- Common use case: This happens extremely frequently in conversational applications and document analysis workflows
Prompt caching is particularly valuable for applications where users frequently reference the same documents, continue conversations, or iterate on similar prompts within a short timeframe.
🔁 Related lessons
- Next: Rules of prompt caching
- Previous: Citations
- Same section: Making a request · Multi-turn conversations · Chat exercise
- Part of paths: Path C
- Reference docs: Glossary · Skills atlas · By use-case
📚 Source & attribution
- Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-with-google-vertex/289196
- © 2025 Anthropic. Educational fair-use only.