03 Sept 2024

How to choose the best model for semantic search

Discover the best embedding model for semantic search. See our comparison of model performance, cost, and relevancy for building semantic search.

Quentin de Quelen, Co-founder & CEO at Meilisearch (@Quentin_dQ)

Semantic search transforms how we find information online by focusing on understanding the meaning behind the words and phrases in user searches rather than relying solely on word-for-word matching.

Using machine learning (ML) and natural language processing (NLP), it decodes intent, context, and relationships between words, thus allowing for more conversational queries. This makes it a vital tool for businesses needing fast, accurate, and relevant results from large or complex data sets.

Embedding models are ML techniques that render words and phrases into complex numerical representations, which are then classified based on their context and relationships. However, not all models are the same. Accuracy, speed, indexing efficiency, and pricing can vary significantly between them.

Choosing the right semantic search model is crucial for delivering precise, efficient, and scalable search experiences for specific tasks.

Understanding the trade-offs—whether for multilingual support, search latency, or computational efficiency—will help you implement a high-performing semantic search system for your brand.

What is semantic search?

Semantic search is an advanced AI-powered approach to information retrieval that focuses on NLP, ML, and knowledge representation to deduce the intent and contextual meaning of user queries.

Traditional keyword-based search engines match exact strings of words, but that’s not always enough. Semantic search goes further by incorporating named entity recognition, relationships between words, and contextual disambiguation to deliver more relevant results.

Semantic search engines can understand synonyms, paraphrased queries, and even infer implicit meanings because they use deep learning models such as transformers.

Given its ability to interpret queries beyond their literal keywords, AI-powered search is crucial for applications that require fast retrieval across large, diverse, or loosely structured data.

Why is semantic search important?

Semantic search is important because it improves search capability and relevance, thus helping users find information faster and with fewer irrelevant results. It also reduces ambiguity by interpreting natural language queries in context.

Like similarity search, it allows users to phrase their questions conversationally and intuitively rather than crafting specific (sometimes even long-tail) keywords. Think of it like this: if you can ask your neighbor a question on the bus, you can ask it of a semantic search engine.

Additionally, businesses benefit from semantic search by enhancing customer experiences, improving product discovery, and optimizing knowledge management systems.

For example, e-commerce stores like Bookshop.org can boost search-based sales by up to 43% by implementing semantic search tools like Meilisearch.

A core component of semantic search is the embedding model, which makes these similarity computations possible.

What are embedding models?

Embedding models are machine learning models that transform words, phrases, or documents into dense numerical representations called embeddings.

These vector representations encode word relationships, which allow search engines to compare semantic meaning and context rather than rely solely on exact similarity between words.

For instance, embeddings generated for words like “phone,” “mobile,” and “cell” will be more closely aligned in vector space than, say, “cell” and “bell,” even though the latter two are closer in letter-by-letter similarity.

This allows a search engine to retrieve relevant results even if the exact keyword is absent, underpinning various NLP and LLM (large language model) applications.
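To make this concrete, here is a minimal sketch using the open-source all-MiniLM-L6-v2 model with the sentence-transformers library (the word list is illustrative; exact scores will vary by model):

```python
from sentence_transformers import SentenceTransformer, util

# Load a small open-source embedding model (produces 384-dimensional vectors)
model = SentenceTransformer("all-MiniLM-L6-v2")

words = ["phone", "mobile", "cell", "bell"]
embeddings = model.encode(words)
print(embeddings.shape)  # (4, 384): one 384-dimensional vector per word

# Cosine similarity: semantically related words score higher
similarities = util.cos_sim(embeddings, embeddings)
print(f"phone vs mobile: {similarities[0][1].item():.2f}")  # high
print(f"cell  vs bell:   {similarities[2][3].item():.2f}")  # lower, despite similar spelling
```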

What’s the difference between semantic and search embeddings?

Semantic and search embeddings serve distinct purposes in NLP applications, particularly in semantic search.

Semantic embeddings capture the meaning and relationships between words, which is helpful for tasks like document classification, recommendation systems, language translation, and sentiment analysis. They do this by reflecting the words’ conceptual similarities, even if they are not used in the same context.

On the other hand, search embeddings are specifically optimized for retrieval tasks, thus ensuring queries and indexed documents align effectively in vector space to maximize search relevance.

Unlike generic semantic embeddings, search embeddings often incorporate domain-specific optimizations that fine-tune them for a particular retrieval task.

For example, a semantic embedding model might learn that "mac" is related to pasta, the makeup company, and the Apple laptop computer. However, a search embedding model trained on computers and hardware would prioritize the third meaning when processing search queries in that domain.

Embedding models serve as the backbone of semantic search by enabling similarity-based retrieval mechanisms that align with user intent.

What is the role of embedding models in semantic search?

Embedding models power semantic search by converting text into a structured format that can be efficiently indexed and retrieved.

Instead of relying on exact keyword matches, these models compare query embeddings with indexed document embeddings. This enables nearest-neighbor searches and significantly improves retrieval accuracy over traditional algorithms that depend on keyword matching.
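As an illustration of the retrieval principle, here is a minimal brute-force nearest-neighbor sketch in plain NumPy. Real engines like Meilisearch use optimized indexes, but the idea is the same: documents are embedded once at indexing time, and each query embedding is compared against them (the toy 3-dimensional vectors are stand-ins for real embeddings):

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the k documents most similar to the query."""
    # Normalize so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q
    return np.argsort(scores)[::-1][:k]

# Toy example: four "documents" in a 3-dimensional embedding space
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.9, 0.4],
    [0.1, 0.0, 0.95],
])
query_embedding = np.array([0.85, 0.15, 0.05])
print(cosine_top_k(query_embedding, doc_embeddings, k=2))  # [0 1]: the two closest documents
```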

These models often use transformer-based architectures such as BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), and their derivative neural networks to capture context-aware representations.

That’s how they can handle nuanced queries at scale. For instance, even though “cat” and “bat” sound (and look) similar, only one of them may be used in context as a conventional house pet.

Several fine-tuned and pre-trained models continue to emerge as industry standards for semantic search. Each improves precision, relevance, efficiency, and/or scalability in its own way.

What are the most commonly used models for semantic search?

Different embedding models vary in terms of vector dimensionality, context length, and certain other performance characteristics.

In the context of LLMs, dimensionality points to the number of components in a vector, with each corresponding to an attribute of the encoded variable. Additionally, context length relates to the amount of text—usually measured in tokens—a model can “remember” and refer to at any given time.

To assess these variations, we ran a series of benchmark tests using Meilisearch, evaluating each model's effectiveness in realistic search scenarios.

These tests measured factors such as retrieval accuracy, indexing speed, and query latency to evaluate each model's performance under specific search conditions.

Your choice of a model for semantic search will depend on a variety of factors, including, but not limited to, its accuracy, computational efficiency, and cost.

What factors should be considered when choosing the best model for semantic search?

1. Results relevancy

Relevance is the cornerstone of effective semantic search, especially in achieving optimal user experience. The right model should balance precision, recall, and speed to ensure users receive highly relevant results without excessive noise.

This balance becomes especially important when comparing hybrid approaches like vector and full-text search. When selecting an embedding model, think about the following:

  • multilingual support;
  • handling multi-modal data;
  • domain-specific performance.

In this context, bigger might not always mean better. While larger models often provide better accuracy, smaller models can offer competitive results at lower computational costs.

Moreover, effectively structuring data, such as with optimized document templates in Meilisearch, can improve search quality.
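For instance, here is a hedged sketch of configuring an OpenAI embedder with a document template through Meilisearch's settings API. The index name, embedder name, field names, and keys are placeholders, and exact option names may vary with your Meilisearch version:

```python
import requests

MEILI_URL = "http://localhost:7700"  # assumption: a local Meilisearch instance
HEADERS = {"Authorization": "Bearer <MEILI_MASTER_KEY>"}

settings = {
    "embedders": {
        "default": {
            "source": "openAi",
            "model": "text-embedding-3-small",
            "apiKey": "<OPENAI_API_KEY>",
            # The document template controls which fields get embedded,
            # keeping vectors focused on meaningful content
            "documentTemplate": "A product called {{doc.title}}, described as: {{doc.description}}",
        }
    }
}

resp = requests.patch(f"{MEILI_URL}/indexes/products/settings", json=settings, headers=HEADERS)
print(resp.json())  # returns a task you can poll until the settings are applied
```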

2. Search performance

Search latency is a critical factor in user experience. Search-as-you-type has become the standard for customer-facing applications as fast, responsive search results improve user engagement and retention.

Local embedding models are ideal to achieve lightning-fast performance, as they eliminate the need for round trips to external services and reduce latency. If you must rely on remote models, hosting your search service close to the embedding service can minimize delays and improve user experience.

We benchmarked latency for various local embedding models and embedding APIs, with all requests originating from a Meilisearch instance hosted in the AWS London data center; the per-model figures appear in the summary table at the end of this article.

These benchmarks highlight the stark differences in latency across models, with local models achieving response times as low as 10ms while some cloud-based services reach 800ms.
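Once an embedder is configured, Meilisearch's hybrid search lets you trade keyword matching against semantic matching with a single semanticRatio parameter. A minimal sketch, assuming the "default" embedder and local instance from earlier:

```python
import requests

resp = requests.post(
    "http://localhost:7700/indexes/products/search",
    headers={"Authorization": "Bearer <MEILI_MASTER_KEY>"},
    json={
        "q": "warm jacket for winter hiking",
        "hybrid": {
            "embedder": "default",  # the embedder configured in the index settings
            "semanticRatio": 0.8,   # 0.0 = pure keyword search, 1.0 = pure semantic search
        },
        "limit": 5,
    },
)
for hit in resp.json()["hits"]:
    print(hit.get("title"))
```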

3. Indexing performance

Efficient indexing is another critical factor for the scalability of your search solution. As expected, the time required to process and store embeddings varies widely among models. Notable metrics influencing processing time are API rate limits, batch processing capabilities, and model dimensions.

Local models without GPUs may experience slower indexing due to limited processing power, while third-party services vary in speed based on their infrastructure and agreements.

As mentioned earlier, minimizing data travel time between your application and the model can reduce latency and optimize indexing. Evaluating these factors ensures your chosen model and service can effectively meet your needs.

We also benchmarked indexing performance on a collection of 10k e-commerce documents (with automatic embedding generation); the per-model indexation times appear in the summary table at the end of this article.

Meilisearch benchmarks indicate that indexing times range from under a minute for optimized cloud-based solutions to several hours for certain local models without GPU acceleration.

Weigh these numbers against the frequency and volume of data updates in your application: they directly determine how quickly your system can absorb frequent or large updates, which is crucial for keeping your search solution performant and responsive.
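One practical way to evaluate this on your own stack is to time a batched document import yourself. A rough sketch, reusing the placeholder instance and index from earlier (Meilisearch processes document additions, including embedding generation, as asynchronous tasks you can poll):

```python
import time
import requests

MEILI_URL = "http://localhost:7700"
HEADERS = {"Authorization": "Bearer <MEILI_MASTER_KEY>"}

# Illustrative synthetic corpus; substitute your real documents
documents = [{"id": i, "title": f"Product {i}", "description": "..."} for i in range(10_000)]

start = time.time()
task = requests.post(
    f"{MEILI_URL}/indexes/products/documents", json=documents, headers=HEADERS
).json()

# Poll the task until indexing (and embedding generation) finishes
while True:
    status = requests.get(f"{MEILI_URL}/tasks/{task['taskUid']}", headers=HEADERS).json()["status"]
    if status in ("succeeded", "failed"):
        break
    time.sleep(1)

print(f"Indexed {len(documents)} documents in {time.time() - start:.1f}s (status: {status})")
```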

4. Pricing

Embedding model costs vary depending on the provider and usage model. While local models are free to run, they require computational resources, potentially necessitating GPU investment.

On the other hand, cloud-based services charge per million tokens (or thousand neurons for Cloudflare), with costs ranging from $0.02 to $0.18 per million tokens.

| Provider | Model | Price |
|---|---|---|
| Cohere | Embed 3 | $0.10 per million tokens |
| OpenAI | text-embedding-3-small | $0.02 per million tokens |
| OpenAI | text-embedding-ada-002 | $0.10 per million tokens |
| OpenAI | text-embedding-3-large | $0.13 per million tokens |
| Cloudflare | — | $0.011 per 1,000 neurons |
| Jina | — | $0.18 per million tokens |
| Mistral | — | $0.10 per million tokens |
| VoyageAI | voyage-3-lite | $0.02 per million tokens |
| VoyageAI | voyage-3 | $0.06 per million tokens |
| VoyageAI | voyage-multimodal-3 | $0.12 per million tokens |
| VoyageAI | voyage-code-3 | $0.18 per million tokens |
| VoyageAI | voyage-3-large | $0.18 per million tokens |
| Local model | — | Free |

As such, analyze cost-effectiveness against search demand and performance requirements. It's often a good idea to start with a well-known model that's easy to set up and has robust community support. You can migrate to a cloud provider like AWS for improved performance when necessary.

Alternatively, you can choose an open-source model to self-host, giving you more flexibility. Just be aware that optimizing local models for high volume may require scaling your infrastructure.
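A back-of-the-envelope cost estimate can be scripted from the per-token prices above. This sketch assumes a rough average token count per document, a figure you should measure on your own corpus:

```python
# Rough monthly embedding cost estimate; volumes and token counts are assumptions to adjust
PRICE_PER_MILLION_TOKENS = {
    "OpenAI text-embedding-3-small": 0.02,
    "Cohere Embed 3": 0.10,
    "Jina": 0.18,
}

docs_per_month = 500_000
avg_tokens_per_doc = 200  # assumption: measure this on your own data

for model, price in PRICE_PER_MILLION_TOKENS.items():
    total_tokens = docs_per_month * avg_tokens_per_doc
    cost = total_tokens / 1_000_000 * price
    print(f"{model}: ~${cost:.2f} for {docs_per_month:,} documents/month")
```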

5. Other optimization techniques

To maximize search performance, consider hybrid approaches that combine full-text and vector search. To refine semantic search further, consider the following optimizations:

  • Experiment with model presets, as some models allow tuning for query vs. document embedding, which can improve relevancy (see the sketch after this list).
  • Assess specialized models, especially those employing retrieval-augmented generation, as domain-specific models may offer superior results for particular use cases.
  • Explore models that provide reranking functions that further enhance search precision.
  • Test higher-tier accounts as premium tiers may provide faster processing and reduced rate limiting.
  • Optimize data transfer with quantization options to reduce API response sizes and improve efficiency.
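As an example of the first point, Cohere's v3 embed models accept an input_type parameter that tunes embeddings for queries versus documents. A minimal sketch, assuming Cohere's official Python SDK (the texts are illustrative):

```python
import cohere

co = cohere.Client("<COHERE_API_KEY>")

# Documents and queries get different presets so they align better in vector space
doc_response = co.embed(
    texts=["Lightweight waterproof jacket for alpine hiking."],
    model="embed-english-v3.0",
    input_type="search_document",  # preset for indexed documents
)
query_response = co.embed(
    texts=["warm jacket for winter hiking"],
    model="embed-english-v3.0",
    input_type="search_query",  # preset for user queries
)
print(len(doc_response.embeddings[0]))  # 1024-dimensional vectors
```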

Carefully evaluating these factors will help you select the best semantic search model to meet your needs.

The bottom line

Now that we’ve taken this journey together, let’s round up what we’ve discovered.

| Model/Service | Dimensions | Context length | Latency | Indexation time | Pricing (per million tokens) |
|---|---|---|---|---|---|
| Cohere embed-english-v3.0 | 1024 | 512 | ±170ms | 43s | $0.10 |
| Cohere embed-english-light-v3.0 | 384 | 512 | ±160ms | 16s | $0.10 |
| OpenAI text-embedding-3-small | 1536 | 8192 | ±460ms | 95s | $0.02 |
| OpenAI text-embedding-3-large | 3072 | 8192 | ±750ms | 151s | $0.13 |
| Mistral | 1024 | 8192 | ±200ms | 409s | $0.10 |
| VoyageAI voyage-2 | 1024 | 4000 | ±350ms | 330s | $0.10 |
| VoyageAI voyage-large-2 | 1536 | 16000 | ±400ms | 409s | $0.12 |
| Jina Colbert v2 | 128, 96, or 64 | 8192 | ±400ms | 375s | $0.18 |
| OSS all-MiniLM-L6-v2 | 384 | 512 | ±10ms | 880s | Free |
| OSS bge-small-en-v1.5 | 1024 | 512 | ±20ms | 3379s | Free |
| OSS bge-large-en-v1.5 | 1536 | 512 | ±60ms | 9132s | Free |

Choosing the best model for semantic search depends on your specific use case, budget, and performance requirements. For most cases, cloud-based solutions such as those provided by Cohere or OpenAI may be optimal.

As the needs of your organization grow, it might be worth the cost and effort to upgrade to a local or self-hosted solution. Understanding your own needs is paramount to making a well-informed decision.

If you’re unsure which model is best for you or if you’re looking for a tailored solution, contact a search expert at Meilisearch.

Frequently asked questions (FAQs)

How does semantic search differ from keyword-based search?

Semantic search focuses on understanding meaning, while keyword-based search relies on exact word matches.

What are the key features of a good semantic search model?

A good model should offer high accuracy, low latency, efficient indexing, and cost-effectiveness.

How do transformer-based models improve semantic search?

Transformers process text in context, capturing the relationships between words to enhance search relevance.

How do vector databases enhance semantic search?

Vector databases work to efficiently store embeddings, enabling fast and scalable search operations.

What are some open-source models for semantic search?

Popular open-source semantic search models include all-MiniLM-L6-v2 and Universal Sentence Encoder.
