BM25 lexical search

📖 Lesson content

Summary

When building a RAG pipeline, you'll quickly discover that semantic search alone doesn't always return the best results. Sometimes you need exact term matches that semantic search might miss. The solution is to combine semantic search with lexical search using a technique called BM25.

The Problem with Semantic Search Alone

Let's say you're searching for a specific incident ID like "INC-2023-Q4-011" in a document. While this exact term appears multiple times in relevant sections, semantic search might return unrelated sections that are semantically similar but don't actually contain the specific term you're looking for.

This happens because semantic search focuses on meaning rather than exact matches. When you need precise term matching, you need a different approach.

Hybrid Search Strategy

The solution is to run two searches in parallel and merge the results:

Semantic Search - Uses embeddings and vector databases for meaning-based matching
Lexical Search - Uses classic text search for exact term matching
Merge Results - Combines both result sets for better coverage

How BM25 Works

BM25 (Best Matching 25) is a popular ranking algorithm for lexical search in RAG pipelines. It processes a search query in these key steps:

Tokenize the query - Break the user's question into individual terms
Count term frequency - See how often each term appears across all documents
Weight terms by rarity - Terms used less frequently get higher importance scores
Find best matches - Return chunks that contain more instances of the higher-weighted terms
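
The lesson does not state the scoring formula. For reference, one commonly cited form of BM25 is given below. The symbols are not from the lesson: f(t, D) is the number of times term t occurs in chunk D, |D| is the chunk length in tokens, avgdl is the average chunk length, N is the number of chunks, n_t is the number of chunks containing t, and k_1 and b are tuning parameters (values around 1.2 to 2.0 and 0.75 are typical). Implementations differ in the exact IDF variant they use.

```latex
\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t)\,
  \frac{f(t, D)\,(k_1 + 1)}
       {f(t, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
```

The IDF factor is what makes rare terms count for more, and the denominator keeps long chunks from winning simply because they contain more words.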

The key insight is that rare terms like "INC-2023-Q4-011" are much more important for search than common words like "a" or "the".
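
To make the rarity weighting concrete, here is a minimal sketch that computes an inverse document frequency (IDF) weight for a rare term and a common one. The three-sentence corpus, the tokenize() helper, and the smoothed IDF variant are all assumptions chosen for illustration, not part of the lesson's implementation.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep hyphenated IDs such as "inc-2023-q4-011" as one token.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def idf(term: str, docs: list[str]) -> float:
    # Count the documents that contain the term at least once.
    n_t = sum(1 for d in docs if term in set(tokenize(d)))
    n = len(docs)
    # Smoothed IDF variant (an assumption); it stays positive for every term.
    return math.log(1 + (n - n_t + 0.5) / (n_t + 0.5))


# Invented toy corpus, for illustration only.
docs = [
    "The outage tracked as INC-2023-Q4-011 affected the billing service.",
    "The billing service processes the monthly invoices.",
    "The team reviewed the deployment checklist.",
]

rare = idf("inc-2023-q4-011", docs)   # appears in 1 of 3 documents
common = idf("the", docs)             # appears in all 3 documents
print(f"rare={rare:.3f} common={common:.3f}")
```

The incident ID appears in only one document and so gets a much larger weight than "the", which appears in all of them.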

Implementing BM25 Search

Here's how to set up a BM25 search system:


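# Assumes BM25Index (the lesson's BM25 implementation) is already defined
# and that chunks is the list of text chunks from your document.
store = BM25Index()

# Index each chunk as a document.
for chunk in chunks:
    store.add_document({"content": chunk})

# Ask for the 3 best-matching chunks.
results = store.search("What happened with INC-2023-Q4-011?", 3)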

The BM25 implementation provides the same API as your semantic search system - both have add_document() and search() methods, making them easy to use together.
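
If you want to see what could sit behind those two methods, the following is a minimal, self-contained sketch. It is a hypothetical stand-in, not the course's BM25Index: the tokenizer, the default k1 and b values, the IDF variant, and the (document, score) return shape are all assumptions, and the four chunks are invented for the demo.

```python
import math
import re
from collections import Counter


class BM25Index:
    """Hypothetical stand-in for the lesson's class; not the course implementation."""

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # strength of length normalization (assumed default)
        self.documents: list[dict] = []
        self.term_counts: list[Counter] = []
        self.doc_freq: Counter = Counter()  # number of documents containing each term

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Keep hyphenated IDs such as "inc-2023-q4-011" as a single token.
        return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

    def add_document(self, document: dict) -> None:
        counts = Counter(self._tokenize(document["content"]))
        self.documents.append(document)
        self.term_counts.append(counts)
        self.doc_freq.update(counts.keys())

    def search(self, query: str, k: int = 3) -> list[tuple[dict, float]]:
        n = len(self.documents)
        if n == 0:
            return []
        avg_len = sum(sum(c.values()) for c in self.term_counts) / n
        query_terms = set(self._tokenize(query))
        scored = []
        for doc, counts in zip(self.documents, self.term_counts):
            doc_len = sum(counts.values())
            score = 0.0
            for term in query_terms:
                tf = counts.get(term, 0)
                if tf == 0:
                    continue
                df = self.doc_freq[term]
                idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
                norm = tf + self.k1 * (1 - self.b + self.b * doc_len / avg_len)
                score += idf * tf * (self.k1 + 1) / norm
            if score > 0:
                scored.append((doc, score))
        # Highest score first; keep the top k.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Invented chunks, for illustration only.
chunks = [
    "The outage INC-2023-Q4-011 began with a config change.",
    "The finance team met with auditors about the budget.",
    "Follow-up actions for INC-2023-Q4-011 were assigned.",
    "The office move was agreed with the landlord.",
]

store = BM25Index()
for chunk in chunks:
    store.add_document({"content": chunk})

results = store.search("What happened with INC-2023-Q4-011?", 3)
for doc, score in results:
    print(f"{score:.3f}  {doc['content']}")
```

In this toy data the word "with" appears in three of the four chunks while the incident ID appears in two, so the chunks containing the ID rank above those that only match "with".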

Better Search Results

When you take the query that failed with semantic search alone and run it through BM25, you get much better results. Instead of returning irrelevant sections, BM25 prioritizes the sections that actually contain your specific search terms.

The algorithm correctly identifies that "INC-2023-Q4-011" is a rare, important term and ranks documents containing it much higher than documents with only common words from the query.

Next Steps

Now that you have both semantic and lexical search systems working independently, the next step is merging their results. This hybrid approach gives you the best of both worlds - the contextual understanding of semantic search combined with the precision of exact term matching from lexical search.

Both search systems use similar APIs, making it straightforward to query both in parallel and combine their results into a single, more comprehensive result set.
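
As a preview of what merging can look like, here is a minimal sketch using reciprocal rank fusion (RRF), one common way to combine ranked lists. The lesson does not say which merge method it uses, so RRF, the constant k = 60, and the chunk IDs below are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    # Each ranking lists chunk IDs from best to worst.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more credit the higher it appears in a list.
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from the two search systems.
semantic_ranking = ["chunk-7", "chunk-2", "chunk-5"]
lexical_ranking = ["chunk-2", "chunk-9", "chunk-7"]

merged = reciprocal_rank_fusion([semantic_ranking, lexical_ranking])
for chunk_id, score in merged:
    print(f"{score:.5f}  {chunk_id}")
```

Because RRF uses only rank positions, it avoids having to put embedding similarities and BM25 scores on a common scale. Chunks that both systems rank highly rise to the top of the merged list.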

📚 Source & attribution

Original Anthropic Academy lesson: https://anthropic.skilljar.com/claude-with-google-vertex/289195