📖 Lesson content
Summary
After extracting text chunks from a document, the next step in a RAG pipeline is finding which chunks are most relevant to a user's question. This is essentially a search problem: you need to look through all your chunks and identify the ones that relate to what the user is asking about.

Semantic Search
The most common approach for finding relevant chunks is semantic search. Unlike traditional keyword-based search, semantic search uses text embeddings to understand the actual meaning of both the user's question and each text chunk. This allows the system to find conceptually related content even when the exact words don't match.
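
To make the limitation of keyword matching concrete, here is a minimal sketch. The helper names `words` and `keyword_overlap`, and the sample query and chunks, are made up for illustration and are not part of the lesson's code.

```python
import re

def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def keyword_overlap(query: str, chunk: str) -> set[str]:
    """Return the words that appear in both the query and the chunk."""
    return words(query) & words(chunk)

query = "How do I fix my car?"
chunks = [
    "Automobile repair guide: diagnosing engine trouble",
    "How do I bake sourdough bread at home?",
]

# Count shared words for each chunk, as a naive keyword search would.
for chunk in chunks:
    shared = keyword_overlap(query, chunk)
    print(f"{len(shared)} shared words: {chunk}")
```

The car-repair chunk shares no words with the query, while the unrelated bread chunk shares three ("how", "do", "i"). A search based purely on word overlap would rank the wrong chunk first, which is the gap semantic search is meant to close.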

What Are Text Embeddings?
A text embedding is a numerical representation of the meaning contained in some text. Think of it as converting words and sentences into a format that computers can work with mathematically.

Here's how the process works:
- You feed text into an embedding model
- The model outputs a long list of numbers (the embedding)
- Each number typically falls between -1 and +1
- These numbers represent different qualities or features of the input text

Understanding the Numbers
Each number in an embedding is essentially a "score" for some quality of the input text. However, here's the important caveat: we don't actually know what each specific number represents.

While it's helpful to imagine that one number might represent "how happy the text is" and another might represent "how much the text talks about oceans," these are just conceptual examples. The embedding model learns these features during training, but they're not explicitly labeled or interpretable to us.
Despite this opacity, embeddings are incredibly powerful because they capture semantic meaning in a way that allows for mathematical comparison between different pieces of text.
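
As one illustration of what "mathematical comparison" can look like, the sketch below uses cosine similarity, a common way to compare two vectors. The lesson does not say which measure it uses, so treat this choice as an assumption. The three-number vectors are invented for the example; real embeddings are much longer and come from an embedding model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Made-up 3-number "embeddings" for illustration only.
question = [0.9, 0.1, -0.2]
ocean_chunk = [0.8, 0.2, -0.1]
tax_chunk = [-0.3, 0.9, 0.4]

print(cosine_similarity(question, ocean_chunk))  # close to 1: similar direction
print(cosine_similarity(question, tax_chunk))    # negative: dissimilar direction
```

A higher score means the two vectors point in a more similar direction, so in this toy example the ocean chunk would be ranked as more relevant to the question than the tax chunk.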

Embeddings on Vertex AI
Claude can't generate embeddings directly. Instead, you need to use a specialized embedding model. On Vertex AI, the model we'll use is called text-embedding-005.

Implementation
To work with embeddings on Vertex AI, you'll need to install the Google GenAI SDK:
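
    pip install google-genai

Here's the basic setup for generating embeddings:

    from google import genai

    # Client configured for Vertex AI; replace the project ID with your own.
    client = genai.Client(
        project="YOUR_PROJECT_ID",
        location="global",
        vertexai=True,
    )

    def generate_embedding(text):
        # Request an embedding for the given text from the embedding model.
        response = client.models.embed_content(
            model="text-embedding-005",
            contents=text,
        )
        # No embeddings returned: give back an empty list.
        if not response.embeddings:
            return []
        # One list of floats per returned embedding.
        return [e.values for e in response.embeddings]
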
When you run this function with a text chunk, you'll get back a list of embeddings, typically just one for a single chunk. Each embedding is itself a list of floating-point numbers representing the semantic meaning of that text. These embeddings form the foundation for implementing semantic search in your RAG system.
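
In a RAG pipeline you would usually embed every chunk and keep each vector alongside its text. The sketch below shows one way to do that. The names `embed_chunks` and `fake_embed` are hypothetical. The embedding function is passed in as a parameter so the example runs without Vertex AI credentials, and `fake_embed` is only an offline stand-in that mimics the return shape of `generate_embedding` (a list of vectors, or an empty list). It does not produce meaningful embeddings.

```python
from typing import Callable

Vector = list[float]

def embed_chunks(
    chunks: list[str],
    embed_fn: Callable[[str], list[Vector]],
) -> list[tuple[str, Vector]]:
    """Pair each chunk with its embedding, skipping chunks that return none."""
    pairs = []
    for chunk in chunks:
        vectors = embed_fn(chunk)  # list of vectors, or [] when nothing came back
        if vectors:
            # Assumes one input string yields one vector, so take the first.
            pairs.append((chunk, vectors[0]))
    return pairs

def fake_embed(text: str) -> list[Vector]:
    """Hypothetical offline stand-in: NOT a real embedding, just text statistics."""
    if not text:
        return []
    return [[len(text) / 100, text.count(" ") / 10]]

index = embed_chunks(
    ["Oceans cover most of Earth.", "", "Tax forms are due in April."],
    fake_embed,
)
for chunk, vector in index:
    print(vector, chunk)
```

With real credentials, passing `generate_embedding` in place of `fake_embed` should work the same way, assuming the function returns one vector per input string as written above.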
The next step is understanding how to use these embeddings to actually find the most relevant chunks for a user's question, which involves comparing embeddings mathematically to determine similarity.
📚 Source & attribution
- Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-with-google-vertex/289188
- © 2025 Anthropic. Educational fair-use only.