singhsidhukuldeep posted an update 1 day ago
Although this might sound like another way to make money on LLM API calls...

Good folks at @AnthropicAI just introduced Contextual Retrieval, and it's a significant yet logical step up from simple Retrieval-Augmented Generation (RAG)!

Here are the steps to implement Contextual Retrieval based on Anthropic's approach:

1. Preprocess the knowledge base:
- Break down documents into smaller chunks (typically a few hundred tokens each).
- Generate contextual information for each chunk using Claude 3 Haiku with a specific prompt.
- Prepend the generated context (usually 50-100 tokens) to each chunk.
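
Step 1 can be sketched roughly as follows. The prompt template is adapted from Anthropic's published contextualizer prompt; `generate_context` is a placeholder for a real Claude 3 Haiku call, and the character-based chunker is a deliberate simplification (production systems split on tokens and sentence boundaries).

```python
# Sketch of step 1: chunk the document, then prepend model-generated context.
# The prompt is adapted from Anthropic's published example; generate_context
# stands in for an actual Claude 3 Haiku request.

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the
overall document for the purposes of improving search retrieval of the
chunk. Answer only with the succinct context and nothing else."""

def chunk_document(text: str, chunk_size: int = 800) -> list[str]:
    """Naive fixed-size chunking by characters (illustrative only)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def contextualize(document: str, chunk: str, generate_context) -> str:
    """Prepend the generated context (typically 50-100 tokens) to the chunk."""
    prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
    context = generate_context(prompt)  # e.g., a Claude 3 Haiku completion
    return f"{context}\n\n{chunk}"
```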

2. Create embeddings and a BM25 index:
- Use an embedding model (Gemini or Voyage recommended) to convert contextualized chunks into vector embeddings.
- Create a BM25 index using the contextualized chunks.
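
For the BM25 half of step 2, here is a minimal self-contained Okapi BM25 index over tokenized contextualized chunks (a sketch; in practice you would use a search engine or a library, and the embedding half would call an API such as Voyage's, which is not shown):

```python
import math
from collections import Counter

class BM25Index:
    """Minimal BM25 (Okapi) index over pre-tokenized contextualized chunks."""

    def __init__(self, docs: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.docs, self.k1, self.b = docs, k1, b
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        self.tfs = [Counter(d) for d in docs]           # per-doc term frequencies
        df = Counter(t for d in docs for t in set(d))   # document frequencies
        n = len(docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1)
                    for t, f in df.items()}

    def score(self, query: list[str], i: int) -> float:
        tf, dl, s = self.tfs[i], len(self.docs[i]), 0.0
        for t in query:
            if t not in tf:
                continue
            num = tf[t] * (self.k1 + 1)
            den = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf.get(t, 0.0) * num / den
        return s

    def search(self, query: list[str], k: int = 5) -> list[int]:
        """Return indices of the top-k chunks for the query."""
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]
```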

3. Set up the retrieval process:
- Implement a system to search both the vector embeddings and the BM25 index.
- Use rank fusion techniques to combine and deduplicate results from both searches.
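
Anthropic doesn't publish its exact fusion formula; reciprocal rank fusion (RRF) is a common choice that also deduplicates by construction:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked ID lists from BM25 and vector
    search into a single deduplicated ranking. k=60 is the customary constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```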

4. Implement reranking (optional but recommended):
- Retrieve the top 150 potentially relevant chunks initially.
- Use a reranking model (e.g., Cohere reranker) to score these chunks based on relevance to the query.
- Select the top 20 chunks after reranking.
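
The reranking step reduces to scoring and truncating. In this sketch `score_fn` is any callable `(query, chunk) -> float`; in production it would wrap a reranking service such as Cohere's rerank endpoint:

```python
def rerank(query: str, chunks: list[str], score_fn, top_k: int = 20) -> list[str]:
    """Score the ~150 retrieved chunks against the query and keep the best
    top_k. score_fn is pluggable; a real system would batch the chunks
    through a cross-encoder reranking API instead."""
    ranked = sorted(chunks, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:top_k]
```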

5. Integrate with the generative model:
- Add the top 20 chunks (or top K, based on your specific needs) to the prompt sent to the generative model.
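
Prompt assembly for step 5 might look like this (the wrapper text and `<chunk>` tags are illustrative, not a prescribed format):

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Place the top-K retrieved chunks ahead of the user's question."""
    context = "\n\n".join(f"<chunk>\n{c}\n</chunk>" for c in chunks)
    return (f"Use the following retrieved context to answer.\n\n{context}\n\n"
            f"Question: {query}")
```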

6. Optimize for your use case:
- Experiment with chunk sizes, boundary selection, and overlap.
- Consider creating custom contextualizer prompts for your specific domain.
- Test different numbers of retrieved chunks (5, 10, 20) to find the optimal balance.
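
Experimenting with chunk size and overlap is easiest when both are parameters of the chunker, as in this sliding-window sketch over a token list:

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding-window chunking: each window has `size` tokens and shares
    `overlap` tokens with its predecessor (step = size - overlap). Sweep
    both parameters against your retrieval metric to tune step 6."""
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```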

7. Leverage prompt caching:
- Use Claude's prompt caching feature to reduce costs when generating contextualized chunks.
- Cache the reference document once and reference it for each chunk, rather than passing it repeatedly.
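
A sketch of the caching pattern: mark the full reference document as cacheable in the system block, so each per-chunk call reuses it instead of re-sending it. Field names follow Anthropic's prompt-caching format as published, but verify them against the current API docs before relying on this:

```python
def build_cached_request(document: str, chunk: str) -> dict:
    """Build a Messages API payload (as a plain dict) that caches the
    reference document across per-chunk contextualization calls."""
    return {
        "model": "claude-3-haiku-20240307",
        "max_tokens": 200,
        "system": [
            {"type": "text",
             "text": f"<document>\n{document}\n</document>",
             "cache_control": {"type": "ephemeral"}},  # cached across calls
        ],
        "messages": [
            {"role": "user",
             "content": ("Here is the chunk we want to situate within the "
                         f"whole document:\n<chunk>\n{chunk}\n</chunk>\n"
                         "Give a short succinct context for this chunk.")},
        ],
    }
```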

8. Evaluate and iterate:
- Measure retrieval quality (e.g., how often the top-20 chunks miss the relevant passage) on a representative query set, then refine chunking, prompts, and K accordingly.
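
For the evaluation loop, a simple recall@K over labeled queries is enough to compare configurations (Anthropic reports the complementary top-20 retrieval failure rate):

```python
def recall_at_k(retrieved: list[list[str]],
                relevant: list[set[str]], k: int) -> float:
    """Fraction of queries for which at least one relevant chunk ID
    appears among the top-k retrieved IDs."""
    hits = sum(1 for got, want in zip(retrieved, relevant)
               if want & set(got[:k]))
    return hits / len(retrieved)
```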