Building a RAG system with Meilisearch: a comprehensive guide
Discover best practices for building a RAG system, with tips on optimizing documents, integrating AI, and why effective retrieval is key to success.
Retrieval Augmented Generation (RAG) has become an essential component of modern AI applications, enabling more accurate and controllable responses from Large Language Models (LLMs). While vector databases are the standard for RAG, Meilisearch stands out as a fast, open-source alternative with AI-powered search, exceptional relevancy, and remarkable speed.
This guide will walk you through building and optimizing a RAG system using Meilisearch.
Understanding RAG
RAG is a process that enhances LLM outputs by grounding them in external, retrievable data. Instead of relying solely on the model's trained knowledge, RAG systems first retrieve relevant information from a curated knowledge base, then use this context to generate responses.
The typical RAG workflow consists of three main steps:
- Retrieval: query the knowledge base to find relevant documents or passages
- Augmentation: combine the retrieved information with the user's query
- Generation: use an LLM to generate a response based on both the query and retrieved context
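To make the workflow concrete, here is a minimal sketch of the loop in Python. The functions search_knowledge_base and llm_complete are hypothetical placeholders for the retrieval and generation components covered in the rest of this guide:
def answer(question: str) -> str:
    # 1. Retrieval: query the knowledge base for relevant passages
    passages = search_knowledge_base(question, limit=5)  # hypothetical retriever

    # 2. Augmentation: combine the retrieved passages with the user's query
    context = "\n\n".join(p["content"] for p in passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    # 3. Generation: ask the LLM to answer using both query and context
    return llm_complete(prompt)  # hypothetical LLM client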
Key components of RAG
A RAG system comprises three essential components:
- External data source: External data sources are the foundation of a RAG system. These sources, such as knowledge bases or technical documentation, provide the information the LLM uses to generate responses. The quality of this data directly impacts performance; it must be well-organized and regularly updated for accuracy and relevancy.
- Vector store: The vector store serves as the bridge between raw data and the LLM. It converts text into vector embeddings (numerical representations of meaning). These vectors allow efficient similarity searches, enabling quick retrieval of relevant information. Modern tools like Meilisearch combine keyword search with semantic similarity to deliver fast and scalable results.
- Large Language Model: The LLM is the system's intelligence, responsible for understanding user queries and generating coherent, relevant responses. It combines user queries with context retrieved from the vector store to produce accurate replies. Models like GPT-4, Claude, or Llama 2 excel at creating human-like responses within the constraints of the provided context.
Why LLMs need RAG: overcoming key limitations
Large Language Models excel at general knowledge but face two significant limitations:
- they struggle with specialized domain-specific information
- they are constrained by their training cutoff, relying on outdated knowledge and often lagging months or even years behind current advancements.
RAG lets you tackle both challenges at once. For instance, a legal firm can enhance their LLM's capabilities by incorporating not only their historical case archives but also the latest court decisions and regulatory changes. A healthcare provider might integrate both established medical literature and recent clinical trials or updated treatment protocols.
The ability to continuously update your knowledge base ensures that your LLM-powered applications can provide accurate, up-to-date responses that combine deep domain expertise with the latest information in your field.
How to optimize document retrieval in RAG systems
Efficient information retrieval is crucial for RAG. Without precise and relevant document retrieval, even the most advanced LLMs can produce inaccurate or incomplete responses. The goal is to ensure that only the most relevant, contextually rich documents are retrieved in response to a query.
Choosing the right document retrieval system is a crucial step in this process. Meilisearch offers a fast, open-source search engine that supports keyword searches and more advanced AI-powered search approaches that combine exact word matching with semantic search. This dual capability makes it an ideal tool for RAG systems, where the goal is to retrieve not only documents that match keywords but also those that are semantically related.
Meilisearch offers a range of features specifically suited for RAG systems:
- Easy embedder integration: Meilisearch automatically generates vector embeddings, enabling high-quality semantic retrieval with minimal setup and flexibility to choose the latest embedder models.
- Hybrid search capabilities: Combine keyword and semantic (vector-based) search to deliver broader, more accurate document retrieval.
- Speed and performance: Meilisearch delivers ultra-fast response times, ensuring that retrieval is never a bottleneck in your LLM workflow.
- Customizable relevancy: Adjust ranking rules and sort documents based on attributes like freshness or importance to prioritize the most valuable results. Set a relevancy threshold to exclude less relevant results from the search.
Once you've established your retrieval system, the next step is to optimize how your data is stored, indexed, and retrieved. The following strategies—document chunking, metadata enrichment, and relevancy tuning—will ensure that every search query returns the most useful and contextually relevant information.
How to chunk documents to maximize relevancy
Breaking down documents into optimal-sized chunks is crucial for effective retrieval. Chunks should be large enough to maintain context but small enough to be specific and relevant. Consider semantic boundaries like paragraphs or sections rather than arbitrary character counts.
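As a simple illustration, the sketch below groups paragraphs into chunks with a size cap. The 1,000-character limit is an arbitrary assumption; tune it to your content and embedding model:
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Group paragraphs into chunks that respect semantic boundaries."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Flush the current chunk if adding this paragraph would exceed the cap
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks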
Enriching metadata to boost search precision
Enhance your documents with rich metadata to improve retrieval accuracy. Include categories, tags, timestamps, authors, and other relevant attributes. For example, tagging technical documentation with specific product versions can significantly improve retrieval quality.
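For instance, a chunk enriched with metadata might look like the following; the field names here are illustrative, not a required schema:
document = {
    "id": "docs-chunk-42",
    "content": "Use the rankingScoreThreshold parameter to exclude low-quality matches.",
    "category": "search-configuration",  # category used for filtering
    "tags": ["relevancy", "ranking"],    # tags to narrow queries
    "product_version": "v1.12",          # version-specific technical documentation
    "author": "docs-team",
    "updated_at": 1733986800             # timestamp for freshness-based sorting
}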
Adjusting relevancy for accurate results
Fine-tune your search parameters based on your specific use case. Adjust the hybrid search semantic ratio to balance conceptual understanding and exact matching based on the needs of your domain. Use the ranking score threshold to filter out low-quality matches, but be careful not to set it too high and miss valuable contextual information.
Setting up Meilisearch for RAG
The quality of the retrieval system directly impacts the accuracy and reliability of generated responses. Meilisearch stands out as a search engine for RAG implementations, thanks to its AI-powered search capabilities, customizable document processing, and advanced ranking controls.
Set up Meilisearch
Unlike traditional vector stores that rely solely on semantic search, Meilisearch combines vector similarity with full-text search, giving you the best of both worlds.
First, you need to create a Meilisearch project and activate the AI-powered search feature.
Then, you need to configure the embedder of your choice. We are going to use an OpenAI embedder, but Meilisearch also supports embedders from HuggingFace, Ollama, and any embedder accessible via a RESTful API:
import os
import meilisearch

client = meilisearch.Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))

# An index is where the documents are stored.
index = client.index('domain-data')

index.update_embedders({
    "openai": {
        "source": "openAi",
        "apiKey": "OPEN_AI_API_KEY",
        "model": "text-embedding-3-small",
        "documentTemplate": "A document titled '{{doc.hierarchy_lvl1}}'. Under the section '{{doc.hierarchy_lvl2}}'. This is further divided into '{{doc.hierarchy_lvl3}}'. It discusses {{doc.content}}."
    }
})
Note: You'll need to replace OPEN_AI_API_KEY with your OpenAI API key.
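Configuring an embedder is an asynchronous operation: update_embedders returns a task object. If you want to block until the embedder is ready before indexing, here is a minimal sketch, reusing client and index from the snippet above, with embedder_settings standing in for the configuration dict shown there:
# embedder_settings is the configuration dict passed to update_embedders above
task = index.update_embedders(embedder_settings)
client.wait_for_task(task.task_uid)  # blocks until Meilisearch has processed the task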
Smart document processing with Meilisearch's document template
Meilisearch’s document template allows you to customize embeddings for each document, ensuring only the most relevant fields are included.
Customizing your document processing helps you:
- Increase retrieval relevance with precise embeddings
- Lower costs by reducing unnecessary tokens
- Ensure consistency across different document types
- Support domain-specific needs for unique data formats
- Iterate and refine embedding strategies as your system evolves
Here’s an example document from the Meilisearch documentation:
{ "hierarchy_lvl1":"Filter expression reference" "hierarchy_lvl2":"Filter expressions" "hierarchy_lvl3":"Creating filter expressions with arrays" "content":"Inner array elements are connected by an OR operator. The following expression returns either horror or comedy films" "hierarchy_lvl0":"Filtering and sorting" "anchor":"creating-filter-expressions-with-arrays" "url":"https://www.meilisearch.com/docs/learn/filtering_and_sorting/filter_expression_reference#creating-filter-expressions-with-arrays" "objectID":"bbcce6ab00badb2a377b455ba16180d" "publication_date":"1733986800" }
To optimize the embeddings for this document, we’ve decided to focus on the most meaningful fields:
- Headings: The values of hierarchy_lvl0 to hierarchy_lvl3 will be included in the embeddings to retain document structure and context
- Content: The value of content will be embedded as it provides the essential text needed for semantic search
Other fields, like publication_date, will be excluded from embeddings but remain available for sorting. This allows Meilisearch to sort by date while keeping embeddings lean and focused on relevancy.
Meilisearch's customizable ranking rules
Meilisearch offers fine-grained control over result ranking, enabling you to customize how search results are ordered and prioritized. This control ensures that users see the most relevant content first, tailored to your specific business or domain needs.
Unlike fixed ranking systems, Meilisearch allows you to define your own ranking rules. This flexibility helps you prioritize certain types of content, promote newer or more relevant results, and create a search experience that aligns with user expectations.
For instance, in the example below we add a custom rule to the default ranking rules that promotes newer documents.
import os
import meilisearch

# Initialize the Meilisearch client
client = meilisearch.Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))

# An index is where the documents are stored.
index = client.index('domain-data')

# Configure settings
index.update_settings({
    'rankingRules': [
        "words",
        "typo",
        "proximity",
        "attribute",
        "sort",
        "exactness",
        "publication_date:desc",
    ],
    'searchableAttributes': [
        'hierarchy_lvl1',
        'hierarchy_lvl2',
        'hierarchy_lvl3',
        'content'
    ]
})
Index your documents
After setting up Meilisearch and preparing your data using best practices like document chunking and metadata enrichment, you can now push your data to Meilisearch.
Meilisearch accepts data in .json, .ndjson, and .csv formats. There are several ways to upload your documents:
- Drag and drop files into the Cloud UI.
- Use the API via the /indexes/{index_uid}/documents route.
- Call the method from your preferred SDK.
💡 Note: Your documents must have a unique identifier (id). This is crucial for Meilisearch to identify and update records correctly.
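If your identifier field isn't named id, you can tell Meilisearch which field to use by setting the primary key explicitly when creating the index. A minimal sketch, assuming the objectID field from the earlier example document serves as the unique identifier:
# `client` is the meilisearch.Client initialized as in the snippet below
client.create_index('domain-data', {'primaryKey': 'objectID'})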
Here’s how to upload documents using the Python SDK:
import os
import json
import meilisearch

# Initialize Meilisearch client
client = meilisearch.Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))

# Select or create the index
index = client.index('domain-data')

# Load the array of JSON objects as a Python list
with open('path/to/your/file.json', 'r') as file:
    documents = json.load(file)

# Add documents to Meilisearch
index.add_documents(documents)
Perform an AI-powered search
Perform AI-powered searches with q and hybrid to retrieve search results using the embedder you configured earlier. Meilisearch will return a mix of semantic and full-text matches, prioritizing results that match the query's meaning and context. You can fine-tune this balance using the semanticRatio parameter:
index.search(
    userQuery,
    {
        "hybrid": {
            "embedder": "openai",
            "semanticRatio": 0.7  # 70% semantic, 30% full-text
        }
    }
)
This flexible control lets you:
- Optimize the balance to fit your specific use case.
- Adapt in real-time based on query patterns.
- Combine the strengths of both methods, ensuring you don't miss key results.
This dual approach ensures you won't miss relevant results that might slip through the cracks of pure semantic search, while maintaining the benefits of semantic understanding.
Quality control with ranking score threshold
The rankingScoreThreshold parameter ensures that only high-quality results are included in the search response. It works in tandem with the ranking score, a numeric value ranging from 0.0 (poor match) to 1.0 (perfect match). Any result with a ranking score below the specified rankingScoreThreshold is excluded.
By setting a ranking score threshold, you can:
- Filter out low-relevance results to improve overall result quality
- Provide better context for RAG systems, ensuring LLMs work with higher-quality data
- Reduce noise in search results, minimizing irrelevant information
- Customize relevancy to align with your specific use case needs
The following query only returns results with a ranking score of at least 0.4:
index.search(
    userQuery,
    {
        "hybrid": {
            "embedder": "openai",
            "semanticRatio": 0.7  # 70% semantic, 30% full-text
        },
        "rankingScoreThreshold": 0.4
    }
)
Ready to build your RAG system? Now that Meilisearch is set up, we'll walk you through the steps to create a RAG system with it.
Implementing RAG with Meilisearch
We'll build a RAG system using the Meilisearch documentation as our example knowledge base, demonstrating how to retrieve, process, and generate accurate, context-aware responses.
Key technologies used
Our implementation leverages several key technologies:
- FastAPI: powers the API that handles user queries
- Meilisearch: retrieves the relevant content
- OpenAI's GPT-4: generates human-like, contextual responses
- LangChain: orchestrates the AI workflow by chaining the search and LLM response generation.
How the system works
When a user submits a question, the system follows these steps:
- User input: The user submits a query to the API
- Content retrieval: Meilisearch searches for the most relevant content using a combination of keyword and semantic search
- Context construction: the system builds a hierarchical context from the search results
- LLM generation: the context and user query are sent to GPT-4 to generate an accurate, practical response
- Response delivery: the system returns the LLM-generated answer along with the sources used to generate it
Setting up the environment
API keys and credentials are stored as environment variables in a .env file, which we load with dotenv.
Here's how key services are initialized:
- Meilisearch client: connects to the Meilisearch instance using the host and API key.
- OpenAI client: authenticates the GPT-4 LLM via an API key
- FastAPI application: sets up the web API for users to interact with the system
import os
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from meilisearch import Client
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# Initialize FastAPI application
app = FastAPI()
# Initialize Meilisearch client
client = Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))
# Initialize OpenAI
llm = ChatOpenAI(temperature=0, model="gpt-4o", api_key=os.getenv('OPENAI_API_KEY'))
Configuring CORS middleware
To ensure the system can handle requests from different origins (like frontend clients), we configure Cross-Origin Resource Sharing (CORS) for the FastAPI app. This allows cross-origin requests from any domain.
# Configure CORS middleware to allow cross-origin requests
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allows all origins
    allow_credentials=True,  # Allows credentials (cookies, authorization headers, etc.)
    allow_methods=["*"],  # Allows all HTTP methods
    allow_headers=["*"],  # Allows all headers
)
Defining the query data model
The Query class defines the data structure for incoming POST requests. This ensures that only queries with a valid question are accepted.
class Query(BaseModel):
    question: str
How it works:
- Input validation: FastAPI will automatically validate that incoming POST requests contain a valid question field of type string
- Data parsing: The incoming query is parsed into a Query object that can be used inside the endpoint
Defining the API endpoint
The API exposes a single POST endpoint (/query) where users send a query. This endpoint retrieves relevant content, constructs a context, and returns an answer from GPT-4.
@app.post("/query")
async def query_documents(query: Query):
"""Query documents and generate response using RAG."""
Querying Meilisearch for relevant documents
The system queries Meilisearch using a hybrid search approach that combines semantic search (70%) with keyword search (30%). It also enforces a rankingScoreThreshold of 0.4, ensuring only high-quality results are included.
    try:
        # Prepare search parameters
        search_params = {
            "hybrid": {
                "embedder": "openai",
                "semanticRatio": 0.7  # 70% semantic, 30% full-text
            },
            "limit": 5,  # restricts results to 5 documents
            "rankingScoreThreshold": 0.4
        }

        # Search Meilisearch
        search_results = client.index('domain-data').search(
            query.question,
            search_params
        )
Constructing the context for GPT-4
Once Meilisearch returns the search results, the system processes them to create a structured context. The context preserves the hierarchical structure of the documents, ensuring that headings and subheadings are retained.
Context construction process
- Extract hierarchical data: the system pulls hierarchical levels (hierarchy_lvl0, hierarchy_lvl1, etc.) from the search results.
- Concatenate context: the headings and main content are combined to create a clear, readable context.
- Separate sections: each document's context is separated using "---" to improve clarity for GPT-4.
        # Prepare context from search results
        contexts = []
        for hit in search_results['hits']:
            context_parts = []

            # Add hierarchical path
            for i in range(4):  # levels 0-3
                hierarchy_key = f'hierarchy_lvl{i}'
                if hit.get(hierarchy_key):
                    context_parts.append(f"{' ' * i}> {hit[hierarchy_key]}")

            # Add content
            if hit.get('content'):
                context_parts.append(f"\nContent: {hit['content']}")

            contexts.append("\n".join(context_parts))

        context = "\n\n---\n\n".join(contexts)
Generating a response with GPT-4
The assembled context is passed to GPT-4 along with the user's question. A precise prompt ensures responses are:
- practical and implementation-focused
- based on actual documentation
- clear about limitations when information isn't available
        # Create prompt template
        prompt_template = """You are a helpful Meilisearch documentation assistant. Use the following Meilisearch documentation to answer the question.
If you cannot find the answer in the context, say so politely and suggest checking Meilisearch's documentation directly.
Provide practical, implementation-focused answers when possible.
Context:
{context}
Question: {question}
Answer (be concise and focus on practical information):"""
Running the LLMChain with LangChain
- Create LLMChain: this links GPT-4 to the formatted prompt.
- Send input: the user query and context are sent to the LLM for processing.
- Return response: the LLM's response is returned to the user.
        prompt = PromptTemplate(
            template=prompt_template,
            input_variables=["context", "question"]
        )

        # Create and run chain
        chain = LLMChain(llm=llm, prompt=prompt)
        response = chain.run(context=context, question=query.question)
Assembling the final API response
The final API response includes:
- LLM-generated answer
- Sources (URLs and hierarchy of the documents used)
        return {
            "answer": response,
            "sources": [{
                'url': doc.get('url', ''),
                'hierarchy': [
                    doc.get(f'hierarchy_lvl{i}', '')
                    for i in range(4)
                    if doc.get(f'hierarchy_lvl{i}')
                ]
            } for doc in search_results['hits']]
        }
Handling errors and exceptions
To avoid system crashes, all exceptions are caught and returned as an error response.
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
Running the application
Finally, you can run the API locally using Uvicorn. This command starts the FastAPI app on localhost:8000.
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
At this point, your RAG system is live, able to retrieve relevant context and generate precise answers using Meilisearch and GPT-4.
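To try it out, you can send a question to the endpoint from any HTTP client. A minimal sketch using the requests library (the question text is just an example):
import requests

response = requests.post(
    "http://localhost:8000/query",
    json={"question": "How do I filter search results in Meilisearch?"}
)
data = response.json()
print(data["answer"])   # the LLM-generated answer
print(data["sources"])  # URLs and hierarchy of the source documents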
How to evaluate the performance of your RAG system
Ensuring high-quality content in RAG systems
Maintain high standards for your document base. Regularly audit and update your content to ensure accuracy and relevance. Remove duplicate or outdated information that might dilute search results. Establish a process for validating and updating information to maintain the knowledge base's integrity.
Monitoring performance to identify bottlenecks
Implement monitoring to track retrieval effectiveness. Watch for patterns in failed queries or consistently low-ranking results. Use this data to refine your document processing and search parameters. Monitor both technical metrics (like response times) and quality metrics (like relevancy scores) to ensure optimal performance. This can be easily done through the Meilisearch Cloud monitoring metrics and analytics dashboards.
Collecting user feedback
User feedback is one of the most valuable sources for improving the performance of your RAG system. While metrics like query latency or relevancy scores provide technical insight, user feedback reveals real-world problems.
By collecting and analyzing feedback, you can identify issues that are harder to detect with system metrics alone, such as:
- False positives: When irrelevant results are returned for a query
- Missed context: When the system fails to retrieve a document that users expected to see
- Slow responses: When users experience slow loading times or incomplete responses
User feedback can guide you in fine-tuning your Meilisearch configuration. It might highlight the need to adjust sorting to prioritize more recent documents, raise the rankingScoreThreshold to filter out low-relevance results, optimize the documentTemplate to embed more relevant context, or chunk large documents into smaller, more targeted sections to improve retrieval accuracy.
Key takeaways: maximizing RAG performance with Meilisearch
Implementing RAG with Meilisearch provides several key advantages:
- Flexibility: easily integrates with various data sources and LLMs.
- Performance: delivers fast retrieval times and efficient resource usage.
- Accuracy: combines keyword and semantic search for more precise results.
- Scalability: handles large, growing knowledge bases with ease.
Meilisearch's robust features and high performance make it a strong foundation for production-ready RAG implementations. To get the most out of your system, focus on:
- Data preparation and indexing: Ensure your knowledge base is clean, organized, and well-structured
- Domain-specific fine-tuning: Adjust ranking rules, relevance thresholds, and embedding strategies for your unique context
- Continuous evaluation: Use user feedback, system metrics, and LLM responses to optimize system performance
- Knowledge base updates: Regularly review and update content to keep responses accurate and relevant
As Meilisearch and LLM technology continue to evolve, future advancements will bring even greater efficiency, accuracy, and flexibility to RAG systems — making them an increasingly valuable approach for AI-powered applications.