📖 Lesson content
Summary
When you have both semantic search (vector embeddings) and lexical search (BM25) working independently, the next step is combining them into a unified search pipeline. This hybrid approach leverages the strengths of both methods to deliver more accurate results.

Creating a Unified Interface
Both search implementations share nearly identical APIs - they both have add_document() and search() methods that work the same way. This consistency makes it straightforward to wrap them in a single Retriever class.

The Retriever acts as a coordinator that forwards user queries to both indexes, collects their results, and merges them into a single ranked list.

Reciprocal Rank Fusion
The challenge is merging results from different search methods that use different scoring systems. Vector search returns cosine similarity scores, while BM25 returns relevance scores - you can't simply combine these numbers directly.
Instead, we use a technique called Reciprocal Rank Fusion (RRF). This method focuses on the rank position of results rather than their raw scores.

Here's how it works with an example. Say your vector search returns sections 2, 7, and 6 in that order, while BM25 returns sections 6, 2, and 7. To merge these:

First, create a table showing each text chunk and its rank from both search methods:
- Section 2: Rank 1 from vector, rank 2 from BM25
- Section 7: Rank 2 from vector, rank 3 from BM25
- Section 6: Rank 3 from vector, rank 1 from BM25

Then apply the RRF formula to calculate a combined score for each chunk:
RRF_score(d) = Σ(1 / (k + rank_i(d)))
Where k is a constant (typically 60, but we'll use 1 for clearer results) and rank_i(d) is the rank of document d in the i-th search result.

For our example:
- Section 2: 1.0/(1+1) + 1.0/(1+2) = 0.833
- Section 7: 1.0/(1+2) + 1.0/(1+3) = 0.583
- Section 6: 1.0/(1+3) + 1.0/(1+1) = 0.75

The final ranking becomes: Section 2 (0.833), Section 6 (0.75), Section 7 (0.583). This makes intuitive sense - Section 2 performed well in both searches, Section 6 had mixed results, and Section 7 ranked lower overall.

Implementation
The Retriever class implementation is straightforward:

class Retriever:
def __init__(self, *indexes):
self._indexes = list(indexes)
def add_document(self, document):
for index in self._indexes:
index.add_document(document)
def search(self, query_text, k=1, k_rrf=60):
# Get results from all indexes
all_results = [index.search(query_text, k) for index in self._indexes]
# Apply reciprocal rank fusion
# ... merge logic here ...

The merge logic tracks document ranks across all search results, calculates RRF scores, and returns the top-k documents sorted by their combined scores.
Testing the Hybrid Approach
When testing with the query "what happened with INC-2023-Q4-011?", the hybrid approach delivers much better results than vector search alone:

The results now correctly prioritize:
- Section 10: Cybersecurity Analysis (the actual incident report)
- Section 2: Software Engineering (relevant context)
- Section 5: Legal Developments (less relevant but still related)

Benefits of the Hybrid Architecture
This design offers several advantages:
- Modular design: Each search index is implemented independently with the same API
- Easy extensibility: You can add new search methods by implementing the same
search()andadd_document()interface - Better accuracy: Combines semantic understanding with exact keyword matching
- Flexible fusion: The RRF algorithm works regardless of how many search indexes you combine

The consistent API means you could easily add a third search index - perhaps one that specializes in named entity recognition or handles specific document types - and the Retriever would automatically incorporate its results into the final ranking.

This hybrid search foundation provides significantly more robust retrieval than either method alone, setting up your RAG pipeline for better performance across a wider range of query types.
Downloads
🔁 Related lessons
- Next: Reranking results
- Previous: BM25 lexical search
- Same section: Making a request · Multi-turn conversations · Chat exercise
- Part of paths: Path C
- Reference docs: Glossary · Skills atlas · By use-case
📚 Source & attribution
- Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-with-google-vertex/289193
- © 2025 Anthropic. Educational fair-use only.