A Multi-index RAG pipeline

📖 Lesson content

Summary

When you have both semantic search (vector embeddings) and lexical search (BM25) working independently, the next step is combining them into a unified search pipeline. This hybrid approach leverages the strengths of both methods to deliver more accurate results.

Creating a Unified Interface

Both search implementations share nearly identical APIs - they both have add_document() and search() methods that work the same way. This consistency makes it straightforward to wrap them in a single Retriever class.

The Retriever acts as a coordinator that forwards user queries to both indexes, collects their results, and merges them into a single ranked list.
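As a rough illustration of what "the same API" could mean in practice, the sketch below writes the shared interface as a typing.Protocol. The names SearchIndex, KeywordCountIndex, and index_and_search are hypothetical. The sketch also assumes, without confirmation from the lesson, that documents are dicts with a "content" key and that search() returns (document, score) pairs.

```python
from typing import Any, Protocol


class SearchIndex(Protocol):
    """Hypothetical name for the interface both indexes are assumed to share."""

    def add_document(self, document: dict[str, Any]) -> None: ...

    def search(self, query_text: str, k: int = 1) -> list[tuple[dict[str, Any], float]]: ...


class KeywordCountIndex:
    """Toy stand-in index: scores a document by how many query words it contains."""

    def __init__(self) -> None:
        self._documents: list[dict[str, Any]] = []

    def add_document(self, document: dict[str, Any]) -> None:
        self._documents.append(document)

    def search(self, query_text: str, k: int = 1) -> list[tuple[dict[str, Any], float]]:
        words = query_text.lower().split()
        scored = [
            (doc, float(sum(word in doc["content"].lower() for word in words)))
            for doc in self._documents
        ]
        # Highest score first; Python's sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def index_and_search(index: SearchIndex, documents, query_text: str, k: int = 1):
    """Works with any object that exposes add_document() and search()."""
    for document in documents:
        index.add_document(document)
    return index.search(query_text, k)


# Assumed document shape: a dict with a "content" key (illustrative sample text).
docs = [
    {"content": "Incident report for the login outage"},
    {"content": "Quarterly legal update"},
]
top = index_and_search(KeywordCountIndex(), docs, "incident report", k=1)
print(top[0][0]["content"])
```

Any class with these two methods satisfies the protocol structurally, so a wrapper like the Retriever never needs to know which kind of index it is calling.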

Reciprocal Rank Fusion

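The challenge is merging results from search methods that score on different scales. Vector search returns cosine similarity scores, which are bounded between -1 and 1, while BM25 returns unbounded relevance scores that depend on term frequencies and document lengths. Because the two sets of numbers are not comparable, you can't simply add or average them.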

Instead, we use a technique called Reciprocal Rank Fusion (RRF). This method focuses on the rank position of results rather than their raw scores.

Here's how it works with an example. Say your vector search returns sections 2, 7, and 6 in that order, while BM25 returns sections 6, 2, and 7. To merge these:

First, create a table showing each text chunk and its rank from both search methods:

Section 2: Rank 1 from vector, rank 2 from BM25
Section 7: Rank 2 from vector, rank 3 from BM25
Section 6: Rank 3 from vector, rank 1 from BM25

Then apply the RRF formula to calculate a combined score for each chunk:

RRF_score(d) = Σ(1 / (k + rank_i(d)))

Where k is a constant (typically 60, but we'll use 1 here so the differences between scores are easier to see), rank_i(d) is the 1-based rank of document d in the i-th result list, and the sum runs over every result list.

For our example:

Section 2: 1.0/(1+1) + 1.0/(1+2) = 0.833
Section 7: 1.0/(1+2) + 1.0/(1+3) = 0.583
Section 6: 1.0/(1+3) + 1.0/(1+1) = 0.75

The final ranking becomes: Section 2 (0.833), Section 6 (0.75), Section 7 (0.583). This makes intuitive sense - Section 2 performed well in both searches, Section 6 had mixed results, and Section 7 ranked lower overall.
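The arithmetic above can be checked with a few lines of Python. This is a minimal sketch rather than the lesson's own code: the function name reciprocal_rank_fusion and the use of plain string labels for the sections are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k_rrf=60):
    """Fuse several ranked lists of ids into (id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, matching the worked example.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k_rrf + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_ranking = ["Section 2", "Section 7", "Section 6"]
bm25_ranking = ["Section 6", "Section 2", "Section 7"]

# k_rrf=1 reproduces the numbers in the worked example.
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking], k_rrf=1)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.3f}")
# Section 2: 0.833
# Section 6: 0.750
# Section 7: 0.583
```

With the more typical k_rrf=60 the order is the same for this example, but the three scores land much closer together (roughly 0.0325, 0.0323, and 0.0320).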

Implementation

The Retriever class implementation is straightforward:

class Retriever:
    def __init__(self, *indexes):
        self._indexes = list(indexes)
    
    def add_document(self, document):
        for index in self._indexes:
            index.add_document(document)
    
    def search(self, query_text, k=1, k_rrf=60):
        # Get results from all indexes
        all_results = [index.search(query_text, k) for index in self._indexes]
        
        # Apply reciprocal rank fusion
        # ... merge logic here ...

The merge logic tracks document ranks across all search results, calculates RRF scores, and returns the top-k documents sorted by their combined scores.
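One way the elided merge step could look is sketched below. It is not the lesson's actual implementation. It assumes that each document is a dict with a unique "id" key, that search() returns (document, score) pairs ordered best first, and that a document missing from one index's results simply contributes nothing for that index. FixedOrderIndex is a hypothetical stand-in used only to make the example runnable.

```python
class FixedOrderIndex:
    """Toy stand-in index that always returns its documents in a preset order."""

    def __init__(self, order):
        self._order = order      # list of document ids, best first
        self._documents = {}     # document id -> document

    def add_document(self, document):
        self._documents[document["id"]] = document

    def search(self, query_text, k=1):
        ranked = [self._documents[i] for i in self._order if i in self._documents]
        # Scores are placeholders; only the order matters for RRF.
        return [(doc, 0.0) for doc in ranked[:k]]


class Retriever:
    def __init__(self, *indexes):
        self._indexes = list(indexes)

    def add_document(self, document):
        for index in self._indexes:
            index.add_document(document)

    def search(self, query_text, k=1, k_rrf=60):
        # Get results from all indexes
        all_results = [index.search(query_text, k) for index in self._indexes]

        # Apply reciprocal rank fusion (assumed merge logic, see lead-in)
        scores = {}      # document id -> accumulated RRF score
        documents = {}   # document id -> document
        for results in all_results:
            for rank, (document, _score) in enumerate(results, start=1):
                doc_id = document["id"]
                documents[doc_id] = document
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k_rrf + rank)

        # Return the top-k documents by combined score, best first
        ranked_ids = sorted(scores, key=scores.get, reverse=True)
        return [(documents[doc_id], scores[doc_id]) for doc_id in ranked_ids[:k]]


# Reuse the orderings from the worked RRF example.
retriever = Retriever(FixedOrderIndex([2, 7, 6]), FixedOrderIndex([6, 2, 7]))
for section in (2, 6, 7):
    retriever.add_document({"id": section, "content": f"Section {section}"})

results = retriever.search("any query", k=3, k_rrf=1)
print([(doc["content"], round(score, 3)) for doc, score in results])
# [('Section 2', 0.833), ('Section 6', 0.75), ('Section 7', 0.583)]
```

Note that each index is asked for only k results, so a document ranked just outside the top k in one index gets no credit from it. Requesting more candidates per index before fusing is a common variation, though the lesson does not say whether it does this.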

Testing the Hybrid Approach

When tested with the query "what happened with INC-2023-Q4-011?", the hybrid approach delivers much better results than vector search alone. The results now correctly prioritize:

Section 10: Cybersecurity Analysis (the actual incident report)
Section 2: Software Engineering (relevant context)
Section 5: Legal Developments (less relevant but still related)

Benefits of the Hybrid Architecture

This design offers several advantages:

Modular design: Each search index is implemented independently with the same API
Easy extensibility: You can add new search methods by implementing the same search() and add_document() interface
Better accuracy: Combines semantic understanding with exact keyword matching
Flexible fusion: The RRF algorithm works regardless of how many search indexes you combine

The consistent API means you could easily add a third search index - perhaps one that specializes in named entity recognition or handles specific document types - and the Retriever would automatically incorporate its results into the final ranking.
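To make the extensibility point concrete, here is a sketch of what a third index might look like. IdentifierIndex is entirely hypothetical: it ranks documents by exact matches on ID-like tokens such as INC-2023-Q4-011, and it assumes documents are dicts with a "content" key and that search() returns (document, score) pairs. The sample document text is made up for illustration.

```python
import re


class IdentifierIndex:
    """Hypothetical third index: ranks documents by exact matches on ID-like tokens."""

    # Uppercase prefix followed by one or more hyphenated groups, e.g. INC-2023-Q4-011
    _ID_PATTERN = re.compile(r"[A-Z]+(?:-[A-Z0-9]+)+")

    def __init__(self):
        self._documents = []

    def add_document(self, document):
        self._documents.append(document)

    def search(self, query_text, k=1):
        wanted = set(self._ID_PATTERN.findall(query_text))
        scored = []
        for document in self._documents:
            found = set(self._ID_PATTERN.findall(document["content"]))
            overlap = len(wanted & found)
            if overlap:  # skip documents that share no identifier with the query
                scored.append((document, float(overlap)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = IdentifierIndex()
index.add_document({"content": "Cybersecurity analysis of incident INC-2023-Q4-011."})
index.add_document({"content": "General software engineering notes."})

hits = index.search("what happened with INC-2023-Q4-011?", k=2)
print([doc["content"] for doc, _ in hits])
# ['Cybersecurity analysis of incident INC-2023-Q4-011.']
```

Because the class exposes the same add_document() and search() methods, an instance could be passed to the Retriever constructor alongside the vector and BM25 indexes without any change to the fusion code.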

Because it combines semantic matching with exact keyword matching, this hybrid search foundation is more robust than either method alone, and it sets up your RAG pipeline to handle a wider range of query types.

📚 Source & attribution

Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-with-google-vertex/289193