Document template best practices
When using AI-powered search, Meilisearch generates prompts by filling in your embedder's documentTemplate
with each document's data. The better your prompt is, the more relevant your search results.
This guide shows you what to do and what to avoid when writing a documentTemplate
.
Sample document
Take a look at this document from a database of movies:
{
"id": 2,
"title": "Ariel",
"overview": "Taisto Kasurinen is a Finnish coal miner whose father has just committed suicide and who is framed for a crime he did not commit. In jail, he starts to dream about leaving the country and starting a new life. He escapes from prison but things don't go as planned...",
"genres": [
"Drama",
"Crime",
"Comedy"
],
"poster": "https://image.tmdb.org/t/p/w500/ojDg0PGvs6R9xYFodRct2kdI6wC.jpg",
"release_date": 593395200
}
Do not use the default documentTemplate
Use a custom documentTemplate
value in your embedder configuration.
The default documentTemplate
includes all searchable fields with non-null
values. In most cases, this adds noise and more information than the embedder needs to provide relevant search results.
Only include highly relevant information
Take a look at your document and identify the most relevant fields. A good documentTemplate
for the sample document could be:
"A movie called {{doc.title}} about {{doc.overview}}"
In the sample document, poster
and id
contain data that has little semantic importance and can be safely excluded. The data in genres
and release_date
is very useful for filters, but say little about this specific film.
This leaves two relevant fields: title
and overview
.
Keep prompts short
For the best results, keep prompts somewhere between 15 and 45 words:
"A movie called {{doc.title}} about {{doc.overview | truncatewords: 20}}"
In the sample document, the overview
alone is 49 words. Use Liquid's truncate
or truncatewords
to shorten it.
Short prompts do not have enough information for the embedder to properly understand the query context. Long prompts instead provide too much information and make it hard for the embedder to identify what is truly relevant about a document.
Conclusion
In this article you saw the main steps to generating prompts that lead to relevant AI-powered search results:
- Do not use the default
documentTemplate
- Only include relevant data
- Truncate long fields