Introducing Retrieval Augmented Generation

📖 Lesson content

Summary

Retrieval Augmented Generation (RAG) is a technique that helps you work with large documents when using Claude. Instead of cramming an entire 800-page financial report into a single prompt, RAG lets you intelligently find and include only the most relevant sections for each question.

The Problem with Large Documents

Imagine you have a massive financial document and want to ask Claude specific questions about it, like "What risk factors does this company have?" You face a fundamental challenge: how do you get the right information from the document into Claude so it can answer your question effectively?

Option 1: Include Everything in the Prompt

The first approach seems straightforward - extract all the text from the document and stuff it directly into your prompt along with the user's question.

This approach has several problems:

There's a hard limit on how much text Claude can process - your document might be too long
Claude becomes less effective with very long prompts
Larger prompts cost more money and take longer to process
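
As a rough illustration of Option 1, the sketch below places the whole document in one prompt and fails when the text is too long. The function name, the `<document>` wrapper, and the `MAX_PROMPT_CHARS` value are assumptions made for this example. Real limits are measured in tokens and depend on the model you use.

```python
# Hypothetical limit for illustration only. Actual limits are measured in
# tokens and vary by model, so treat this number as a placeholder.
MAX_PROMPT_CHARS = 400_000


def build_full_document_prompt(document_text: str, question: str) -> str:
    """Option 1: put the entire document in the prompt next to the question."""
    prompt = (
        "<document>\n"
        f"{document_text}\n"
        "</document>\n\n"
        f"Question: {question}"
    )
    # A long enough document simply cannot be sent in a single prompt.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompt is {len(prompt)} characters, above the assumed "
            f"limit of {MAX_PROMPT_CHARS}."
        )
    return prompt
```

Even when the prompt fits, every question resends the full document, which is where the cost and latency problems listed above come from.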

Option 2: Break Documents into Chunks

The second approach is more sophisticated. You break the document into smaller chunks during a preprocessing step, then find and include only the chunks relevant to each user question.

Here's how it works: when a user asks "What risks does this company face?", you search through your chunks to find the one about "Risk Factors" and include only that section in your prompt to Claude.
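
The chunk-then-search idea can be sketched in a few lines. This is a minimal example under stated assumptions, not a recommended implementation: it splits on blank lines, scores chunks by counting words shared with the question, and uses a made-up three-section document. All function names are hypothetical, and the keyword overlap is a deliberately naive stand-in for whatever search mechanism you choose.

```python
import re


def split_into_chunks(text: str) -> list[str]:
    """Preprocessing step: treat each blank-line-separated section as a chunk."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_best_chunk(chunks: list[str], question: str) -> str:
    """Return the chunk that shares the most distinct words with the question."""
    question_words = tokenize(question)
    return max(chunks, key=lambda chunk: len(question_words & tokenize(chunk)))


# Invented sample text, used only to exercise the functions above.
document = """Risk Factors: The company faces supply chain risks and currency risks.

Strategy Outlook: Management plans to expand into new markets.

Financial Results: Revenue grew compared with the prior year."""

chunks = split_into_chunks(document)
best = find_best_chunk(chunks, "What risks does this company face?")
print(best)
```

Only the chunk returned by `find_best_chunk` would be added to the prompt, rather than the full document.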

Benefits of the Chunking Approach

Claude can focus on only the most relevant content
Scales up to very large documents
Works with multiple documents
Smaller prompts cost less and run faster

Challenges with Chunking

Requires a preprocessing step to split documents
Requires a search mechanism to find "relevant" chunks
Included chunks might not contain all the context Claude needs
Many ways to chunk text - which approach is best?

For example, if you only include the "Risk Factors" section, you might miss important context from the "Strategy Outlook" section that addresses how the company plans to handle those risks.

This is RAG

Option 2 is Retrieval Augmented Generation. RAG offers significant advantages for working with large documents, but it adds complexity and comes with technical challenges that require careful consideration.

The key components of RAG are:

Document preprocessing and chunking
A search mechanism to find relevant chunks
Intelligent selection of which chunks to include in prompts
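
To show how the third component might fit with the other two, here is a hedged sketch that ranks already-split chunks, keeps the top few, and assembles a prompt from them. The scoring function, the `top_k` default, the `<chunk>` wrapper, and the sample chunks are all assumptions for illustration. Keeping more than one chunk is just one possible way to reduce the risk of leaving out needed context.

```python
import re


def words(text: str) -> set[str]:
    """Distinct lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(chunk: str, question: str) -> int:
    """Toy relevance score: number of distinct words shared with the question."""
    return len(words(chunk) & words(question))


def select_chunks(chunks: list[str], question: str, top_k: int = 2) -> list[str]:
    """Keep the top_k highest-scoring chunks, dropping any that score zero."""
    ranked = sorted(chunks, key=lambda c: score(c, question), reverse=True)
    return [c for c in ranked[:top_k] if score(c, question) > 0]


def build_prompt(selected: list[str], question: str) -> str:
    """Assemble a prompt that contains only the selected chunks."""
    context = "\n\n".join(f"<chunk>\n{c}\n</chunk>" for c in selected)
    return f"Answer using the excerpts below.\n\n{context}\n\nQuestion: {question}"


# Invented sample chunks, standing in for the output of a chunking step.
chunks = [
    "Risk Factors: The company faces supply chain risks.",
    "Strategy Outlook: Management plans to reduce these risks by diversifying suppliers.",
    "Financial Results: Revenue grew compared with the prior year.",
]
question = "What risks does this company face?"
selected = select_chunks(chunks, question)
prompt = build_prompt(selected, question)
print(prompt)
```

With these sample chunks, the selection keeps the "Risk Factors" and "Strategy Outlook" sections and leaves out "Financial Results", so the prompt stays small while still carrying related context.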

When considering RAG for your application, you need to evaluate whether the benefits outweigh the additional complexity for your specific use case. The technique shines when working with large document collections where you need precise, contextual answers, but it requires more upfront engineering work than simply including entire documents in prompts.

📚 Source & attribution

Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-with-google-vertex/289191