Building a RAG system with Meilisearch: a comprehensive guide
Discover best practices for building a RAG system, with tips on optimizing documents, integrating AI, and why effective retrieval is key to success.
Retrieval Augmented Generation (RAG) has become an essential component of modern AI applications, enabling more accurate and controllable responses from Large Language Models (LLMs). While vector databases are the standard for RAG, Meilisearch stands out as a fast, open-source alternative with AI-powered search, exceptional relevancy, and remarkable speed.
This guide will walk you through building and optimizing a RAG system using Meilisearch.
Understanding RAG
RAG is a process that enhances LLM outputs by grounding them in external, retrievable data. Instead of relying solely on the model's trained knowledge, RAG systems first retrieve relevant information from a curated knowledge base, then use this context to generate responses.
The typical RAG workflow consists of three main steps:
- Retrieval: query the knowledge base to find relevant documents or passages
- Augmentation: combine the retrieved information with the user's query
- Generation: use an LLM to generate a response based on both the query and retrieved context
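To make the workflow concrete, here is a minimal sketch of the loop in Python. The functions search_knowledge_base and llm_complete are hypothetical placeholders for the retrieval and generation components covered in the rest of this guide:
def answer(question: str) -> str:
    # 1. Retrieval: query the knowledge base for relevant passages
    passages = search_knowledge_base(question, limit=5)  # hypothetical retriever

    # 2. Augmentation: combine the retrieved passages with the user's query
    context = "\n\n".join(p["content"] for p in passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    # 3. Generation: ask the LLM to answer using both query and context
    return llm_complete(prompt)  # hypothetical LLM client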
Key components of RAG
A RAG system comprises three essential components:
- External data source: External data sources are the foundation of a RAG system. These sources, such as knowledge bases or technical documentation, provide the information the LLM uses to generate responses. The quality of this data directly impacts performance; it must be well-organized and regularly updated for accuracy and relevancy.
- Vector store: The vector store serves as the bridge between raw data and the LLM. It converts text into vector embeddings (numerical representations of meaning). These vectors allow efficient similarity searches, enabling quick retrieval of relevant information. Modern tools like Meilisearch combine keyword search with semantic similarity to deliver fast and scalable results.
- Large Language Model: The LLM is the system's intelligence, responsible for understanding user queries and generating coherent, relevant responses. It combines user queries with context retrieved from the vector store to produce accurate replies. Models like GPT-4, Claude, or Llama 2 excel at creating human-like responses within the constraints of the provided context.
Why LLMs need RAG: overcoming key limitations
Large Language Models excel at general knowledge but face two significant limitations:
- they struggle with specialized domain-specific information
- they are constrained by their training cutoff, relying on outdated knowledge and often lagging months or even years behind current advancements.
RAG lets you tackle both challenges at once. For instance, a legal firm can enhance their LLM's capabilities by incorporating not only their historical case archives but also the latest court decisions and regulatory changes. A healthcare provider might integrate both established medical literature and recent clinical trials or updated treatment protocols.
The ability to continuously update your knowledge base ensures that your LLM-powered applications can provide accurate, up-to-date responses that combine deep domain expertise with the latest information in your field.
How to optimize document retrieval in RAG systems
Efficient information retrieval is crucial for RAG. Without precise and relevant document retrieval, even the most advanced LLMs can produce inaccurate or incomplete responses. The goal is to ensure that only the most relevant, contextually rich documents are retrieved in response to a query.
Choosing the right document retrieval system is a crucial step in this process. Meilisearch offers a fast, open-source search engine that supports keyword searches and more advanced AI-powered search approaches that combine exact word matching with semantic search. This dual capability makes it an ideal tool for RAG systems, where the goal is to retrieve not only documents that match keywords but also those that are semantically related.
Meilisearch offers a range of features specifically suited for RAG systems:
- Easy embedder integration: Meilisearch automatically generates vector embeddings, enabling high-quality semantic retrieval with minimal setup and flexibility to choose the latest embedder models.
- Hybrid search capabilities: Combine keyword and semantic (vector-based) search to deliver broader, more accurate document retrieval.
- Speed and performance: Meilisearch delivers ultra-fast response times, ensuring that retrieval is never a bottleneck in your LLM workflow.
- Customizable relevancy: Adjust ranking rules and sort documents based on attributes like freshness or importance to prioritize the most valuable results. Set a relevancy threshold to exclude less relevant results from the search.
Once you've established your retrieval system, the next step is to optimize how your data is stored, indexed, and retrieved. The following strategies—document chunking, metadata enrichment, and relevancy tuning—will ensure that every search query returns the most useful and contextually relevant information.
How to chunk documents to maximize relevancy
Breaking down documents into optimal-sized chunks is crucial for effective retrieval. Chunks should be large enough to maintain context but small enough to be specific and relevant. Consider semantic boundaries like paragraphs or sections rather than arbitrary character counts.
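As a simple illustration, the sketch below groups paragraphs into chunks with a size cap. The 1,000-character limit is an arbitrary assumption; tune it to your content and embedding model:
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Group paragraphs into chunks that respect semantic boundaries."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Flush the current chunk if adding this paragraph would exceed the cap
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks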
Enriching metadata to boost search precision
Enhance your documents with rich metadata to improve retrieval accuracy. Include categories, tags, timestamps, authors, and other relevant attributes. For example, tagging technical documentation with specific product versions can significantly improve retrieval quality.
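For instance, a chunk enriched with metadata might look like the following; the field names here are illustrative, not a required schema:
document = {
    "id": "docs-chunk-42",
    "content": "Use the rankingScoreThreshold parameter to exclude low-quality matches.",
    "category": "search-configuration",  # category used for filtering
    "tags": ["relevancy", "ranking"],    # tags to narrow queries
    "product_version": "v1.12",          # version-specific technical documentation
    "author": "docs-team",
    "updated_at": 1733986800             # timestamp for freshness-based sorting
}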
Adjusting relevancy for accurate results
Fine-tune your search parameters based on your specific use case. Adjust the hybrid search semantic ratio to balance conceptual understanding and exact matching based on the needs of your domain. Use the ranking score threshold to filter out low-quality matches, but be careful not to set it too high and miss valuable contextual information.
Setting up Meilisearch for RAG
The quality of the retrieval system directly impacts the accuracy and reliability of generated responses. Meilisearch stands out as a search engine for RAG implementations, thanks to its AI-powered search capabilities, customizable document processing, and advanced ranking controls.
Set up Meilisearch
Unlike traditional vector stores that rely solely on semantic search, Meilisearch combines vector similarity with full-text search, giving you the best of both worlds.
First, you need to create a Meilisearch project and activate the AI-powered search feature.
Then, you need to configure the embedder of your choice. We are going to use an OpenAI embedder, but Meilisearch also supports embedders from HuggingFace, Ollama, and any embedder accessible via a RESTful API:
import os
import meilisearch

client = meilisearch.Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))

# An index is where the documents are stored.
index = client.index('domain-data')

index.update_embedders({
    "openai": {
        "source": "openAi",
        "apiKey": "OPEN_AI_API_KEY",
        "model": "text-embedding-3-small",
        "documentTemplate": "A document titled '{{doc.hierarchy_lvl1}}'. Under the section '{{doc.hierarchy_lvl2}}'. This is further divided into '{{doc.hierarchy_lvl3}}'. It discusses {{doc.content}}."
    }
})
Note: You'll need to replace OPEN_AI_API_KEY with your OpenAI API key.
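Configuring an embedder is an asynchronous operation: update_embedders returns a task object. If you want to block until the embedder is ready before indexing, here is a minimal sketch, reusing client and index from the snippet above, with embedder_settings standing in for the configuration dict shown there:
# embedder_settings is the configuration dict passed to update_embedders above
task = index.update_embedders(embedder_settings)
client.wait_for_task(task.task_uid)  # blocks until Meilisearch has processed the task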
Smart document processing with Meilisearch's document template
Meilisearch’s document template allows you to customize embeddings for each document, ensuring only the most relevant fields are included.
Customizing your document processing helps you:
- Increase retrieval relevance with precise embeddings
- Lower costs by reducing unnecessary tokens
- Ensure consistency across different document types
- Support domain-specific needs for unique data formats
- Iterate and refine embedding strategies as your system evolves
Here’s an example document from the Meilisearch documentation:
{ "hierarchy_lvl1":"Filter expression reference" "hierarchy_lvl2":"Filter expressions" "hierarchy_lvl3":"Creating filter expressions with arrays" "content":"Inner array elements are connected by an OR operator. The following expression returns either horror or comedy films" "hierarchy_lvl0":"Filtering and sorting" "anchor":"creating-filter-expressions-with-arrays" "url":"https://www.meilisearch.com/docs/learn/filtering_and_sorting/filter_expression_reference#creating-filter-expressions-with-arrays" "objectID":"bbcce6ab00badb2a377b455ba16180d" "publication_date":"1733986800" }
To optimize the embeddings for this document, we’ve decided to focus on the most meaningful fields:
- Headings: The values of hierarchy_lvl0 to hierarchy_lvl3 will be included in the embeddings to retain document structure and context
- Content: The value of content will be embedded as it provides the essential text needed for semantic search
Other fields, like publication_date, will be excluded from embeddings but remain available for sorting. This allows Meilisearch to sort by date while keeping embeddings lean and focused on relevancy.
Meilisearch's customizable ranking rules
Meilisearch offers fine-grained control over result ranking, enabling you to customize how search results are ordered and prioritized. This control ensures that users see the most relevant content first, tailored to your specific business or domain needs.
Unlike fixed ranking systems, Meilisearch allows you to define your own ranking rules. This flexibility helps you prioritize certain types of content, promote newer or more relevant results, and create a search experience that aligns with user expectations.
For instance, in the example below we add a custom rule to the default ranking rules that promotes newer documents.
import os
import meilisearch

# Initialize the Meilisearch client
client = meilisearch.Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))

# An index is where the documents are stored.
index = client.index('domain-data')

# Configure settings
index.update_settings({
    'rankingRules': [
        "words",
        "typo",
        "proximity",
        "attribute",
        "sort",
        "exactness",
        "publication_date:desc",
    ],
    'searchableAttributes': [
        'hierarchy_lvl1',
        'hierarchy_lvl2',
        'hierarchy_lvl3',
        'content'
    ]
})
Index your documents
After setting up Meilisearch and preparing your data using best practices like document chunking and metadata enrichment, you can now push your data to Meilisearch.
Meilisearch accepts data in .json, .ndjson, and .csv formats. There are several ways to upload your documents:
- Drag and drop files into the Cloud UI.
- Use the API via the /indexes/{index_uid}/documents route.
- Call the method from your preferred SDK.
💡 Note: Your documents must have a unique identifier (id). This is crucial for Meilisearch to identify and update records correctly.
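If your identifier field isn't named id, you can tell Meilisearch which field to use by setting the primary key explicitly when creating the index. A minimal sketch, assuming the objectID field from the earlier example document serves as the unique identifier:
# `client` is the meilisearch.Client initialized as in the snippet below
client.create_index('domain-data', {'primaryKey': 'objectID'})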
Here’s how to upload documents using the Python SDK:
import os
import json
import meilisearch

# Initialize Meilisearch client
client = meilisearch.Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))

# Select or create the index
index = client.index('domain-data')

# Load the array of JSON objects as a Python list
with open('path/to/your/file.json', 'r') as file:
    documents = json.load(file)

# Add documents to Meilisearch
index.add_documents(documents)
Perform an AI-powered search
Perform AI-powered searches with q and hybrid to retrieve search results using the embedder you configured earlier. Meilisearch will return a mix of semantic and full-text matches, prioritizing results that match the query's meaning and context. You can fine-tune this balance using the semanticRatio parameter:
index.search(
    userQuery,
    {
        "hybrid": {
            "embedder": "openai",
            "semanticRatio": 0.7  # 70% semantic, 30% full-text
        }
    }
)
This flexible control lets you:
- Optimize the balance to fit your specific use case.
- Adapt in real-time based on query patterns.
- Combine the strengths of both methods, ensuring you don't miss key results.
This dual approach ensures you won't miss relevant results that might slip through the cracks of pure semantic search, while maintaining the benefits of semantic understanding.
Quality control with ranking score threshold
The rankingScoreThreshold parameter ensures that only high-quality results are included in the search response. It works in tandem with the ranking score, a numeric value ranging from 0.0 (poor match) to 1.0 (perfect match). Any result with a ranking score below the specified rankingScoreThreshold is excluded.
By setting a ranking score threshold, you can:
- Filter out low-relevance results to improve overall result quality
- Provide better context for RAG systems, ensuring LLMs work with higher-quality data
- Reduce noise in search results, minimizing irrelevant information
- Customize relevancy to align with your specific use case needs
The following query only returns results with a ranking score of at least 0.4:
index.search(
    userQuery,
    {
        "hybrid": {
            "embedder": "openai",
            "semanticRatio": 0.7  # 70% semantic, 30% full-text
        },
        "rankingScoreThreshold": 0.4
    }
)
Ready to build your RAG system? Now that Meilisearch is set up, we'll walk you through the steps to create a RAG system with it.
Implementing RAG with Meilisearch
We'll build a RAG system using the Meilisearch documentation as our example knowledge base, demonstrating how to retrieve, process, and generate accurate, context-aware responses.
Key technologies used
Our implementation leverages several key technologies:
- FastAPI: powers the API that handles user queries
- Meilisearch: retrieves the relevant content
- OpenAI's GPT-4: generates human-like, contextual responses
- LangChain: orchestrates the AI workflow by chaining the search and LLM response generation.
How the system works
When a user submits a question, the system follows these steps:
- User input: The user submits a query to the API
- Content retrieval: Meilisearch searches for the most relevant content using a combination of keyword and semantic search
- Context construction: the system builds a hierarchical context from the search results
- LLM generation: the context and user query are sent to GPT-4 to generate an accurate, practical response
- Response delivery: the system returns the LLM-generated answer along with the sources used to generate it
Setting up the environment
API keys and credentials are stored as environment variables in a .env file, which we load with dotenv.
Here's how key services are initialized:
- Meilisearch client: connects to the Meilisearch instance using the host and API key.
- OpenAI client: authenticates the GPT-4 LLM via an API key
- FastAPI application: sets up the web API for users to interact with the system
import os
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from meilisearch import Client
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# Initialize FastAPI application
app = FastAPI()
# Initialize Meilisearch client
client = Client(os.getenv('MEILI_HOST'), os.getenv('MEILI_API_KEY'))
# Initialize OpenAI
llm = ChatOpenAI(temperature=0, model="gpt-4o", api_key=os.getenv('OPENAI_API_KEY'))
Configuring CORS middleware
To ensure the system can handle requests from different origins (like frontend clients), we configure Cross-Origin Resource Sharing (CORS) for the FastAPI app. This allows cross-origin requests from any domain.
# Configure CORS middleware to allow cross-origin requests
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allows all origins
    allow_credentials=True,  # Allows credentials (cookies, authorization headers, etc.)
    allow_methods=["*"],  # Allows all HTTP methods
    allow_headers=["*"],  # Allows all headers
)
Defining the query data model
The Query class defines the data structure for incoming POST requests. This ensures that only queries with a valid question are accepted.
class Query(BaseModel):
    question: str
How it works:
- Input validation: FastAPI will automatically validate that incoming POST requests contain a valid question field of type string
- Data parsing: The incoming query is parsed into a Query object that can be used inside the endpoint
Defining the API endpoint
The API exposes a single POST endpoint (/query) where users send a query. This endpoint retrieves relevant content, constructs a context, and returns an answer from GPT-4.
@app.post("/query")
async def query_documents(query: Query):
"""Query documents and generate response using RAG."""
Querying Meilisearch for relevant documents
The system queries Meilisearch using a hybrid search approach that combines semantic search (70%) with keyword search (30%). It also enforces a rankingScoreThreshold of 0.4, ensuring only high-quality results are included.
    try:
        # Prepare search parameters
        search_params = {
            "hybrid": {
                "embedder": "openai",
                "semanticRatio": 0.7  # 70% semantic, 30% full-text
            },
            "limit": 5,  # restricts results to 5 documents
            "rankingScoreThreshold": 0.4
        }

        # Search Meilisearch
        search_results = client.index('domain-data').search(
            query.question,
            search_params
        )
Constructing the context for GPT-4
Once Meilisearch returns the search results, the system processes them to create a structured context. The context preserves the hierarchical structure of the documents, ensuring that headings and subheadings are retained.
Context construction process
- Extract hierarchical data: the system pulls hierarchical levels (hierarchy_lvl0, hierarchy_lvl1, etc.) from the search results.
- Concatenate context: the headings and main content are combined to create a clear, readable context.
- Separate sections: each document's context is separated using "---" to improve clarity for GPT-4.
        # Prepare context from search results
        contexts = []
        for hit in search_results['hits']:
            context_parts = []

            # Add hierarchical path
            for i in range(4):  # levels 0-3
                hierarchy_key = f'hierarchy_lvl{i}'
                if hit.get(hierarchy_key):
                    context_parts.append(f"{' ' * i}> {hit[hierarchy_key]}")

            # Add content
            if hit.get('content'):
                context_parts.append(f"\nContent: {hit['content']}")

            contexts.append("\n".join(context_parts))

        context = "\n\n---\n\n".join(contexts)
Generating a response with GPT-4
The assembled context is passed to GPT-4 along with the user's question. A precise prompt ensures responses are:
- practical and implementation-focused
- based on actual documentation
- clear about limitations when information isn't available
        # Create prompt template
        prompt_template = """You are a helpful Meilisearch documentation assistant. Use the following Meilisearch documentation to answer the question.
If you cannot find the answer in the context, say so politely and suggest checking Meilisearch's documentation directly.
Provide practical, implementation-focused answers when possible.
Context:
{context}
Question: {question}
Answer (be concise and focus on practical information):"""
Running the LLMChain with LangChain
- Create LLMChain: this links GPT-4 to the formatted prompt.
- Send input: the user query and context are sent to the LLM for processing.
- Return response: the LLM's response is returned to the user.
        prompt = PromptTemplate(
            template=prompt_template,
            input_variables=["context", "question"]
        )

        # Create and run chain
        chain = LLMChain(llm=llm, prompt=prompt)
        response = chain.run(context=context, question=query.question)
Assembling the final API response
The final API response includes:
- LLM-generated answer
- Sources (URLs and hierarchy of the documents used)
        return {
            "answer": response,
            "sources": [{
                'url': doc.get('url', ''),
                'hierarchy': [
                    doc.get(f'hierarchy_lvl{i}', '')
                    for i in range(4)
                    if doc.get(f'hierarchy_lvl{i}')
                ]
            } for doc in search_results['hits']]
        }
Handling errors and exceptions
To avoid system crashes, all exceptions are caught and returned as an error response.
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
Running the application
Finally, you can run the API locally using Uvicorn. This command starts the FastAPI app on localhost:8000.
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
At this point, your RAG system is live, able to retrieve relevant context and generate precise answers using Meilisearch and GPT-4.
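To try it out, you can send a question to the endpoint from any HTTP client. A minimal sketch using the requests library (the question text is just an example):
import requests

response = requests.post(
    "http://localhost:8000/query",
    json={"question": "How do I filter search results in Meilisearch?"}
)
data = response.json()
print(data["answer"])   # the LLM-generated answer
print(data["sources"])  # URLs and hierarchy of the source documents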
How to evaluate the performance of your RAG system
Ensuring high-quality content in RAG systems
Maintain high standards for your document base. Regularly audit and update your content to ensure accuracy and relevance. Remove duplicate or outdated information that might dilute search results. Establish a process for validating and updating information to maintain the knowledge base's integrity.
Monitoring performance to identify bottlenecks
Implement monitoring to track retrieval effectiveness. Watch for patterns in failed queries or consistently low-ranking results. Use this data to refine your document processing and search parameters. Monitor both technical metrics (like response times) and quality metrics (like relevancy scores) to ensure optimal performance. This can be easily done through the Meilisearch Cloud monitoring metrics and analytics dashboards.
Collecting user feedback
User feedback is one of the most valuable sources for improving the performance of your RAG system. While metrics like query latency or relevancy scores provide technical insight, user feedback reveals real-world problems.
By collecting and analyzing feedback, you can identify issues that are harder to detect with system metrics alone, such as:
- False positives: When irrelevant results are returned for a query
- Missed context: When the system fails to retrieve a document that users expected to see
- Slow responses: When users experience slow loading times or incomplete responses
User feedback can guide you in fine-tuning your Meilisearch configuration. It might highlight the need to adjust sorting to prioritize more recent documents, raise the rankingScoreThreshold to filter out low-relevance results, optimize the documentTemplate to embed more relevant context, or chunk large documents into smaller, more targeted sections to improve retrieval accuracy.
Key takeaways: maximizing RAG performance with Meilisearch
Implementing RAG with Meilisearch provides several key advantages:
- Flexibility: easily integrates with various data sources and LLMs.
- Performance: delivers fast retrieval times and efficient resource usage.
- Accuracy: combines keyword and semantic search for more precise results.
- Scalability: handles large, growing knowledge bases with ease.
Meilisearch's robust features and high performance make it a strong foundation for production-ready RAG implementations. To get the most out of your system, focus on:
- Data preparation and indexing: Ensure your knowledge base is clean, organized, and well-structured
- Domain-specific fine-tuning: Adjust ranking rules, relevance thresholds, and embedding strategies for your unique context
- Continuous evaluation: Use user feedback, system metrics, and LLM responses to optimize system performance
- Knowledge base updates: Regularly review and update content to keep responses accurate and relevant
As Meilisearch and LLM technology continue to evolve, future advancements will bring even greater efficiency, accuracy, and flexibility to RAG systems — making them an increasingly valuable approach for AI-powered applications.