
03 Sept 2024

Choosing the best model for semantic search

A comparison of model performance, cost, and relevancy in regard to building semantic search.

Quentin de Quelen, Co-founder & CEO at Meilisearch (@Quentin_dQ)

Semantic search is transforming search technology by providing more accurate and relevant results. However, with many embedding models available, choosing the right one can be challenging. This guide will help you understand the key factors to consider when selecting a model to build semantic search.

Overview

In this guide, we will use the open-source search engine Meilisearch to perform the semantic searches. For the purpose of the tests, we're using the entry tier of Meilisearch Cloud (i.e., the Build plan).

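This guide covers embedding models from Cohere, OpenAI, Mistral, VoyageAI, and Jina, along with open-source (OSS) local models. The individual models are listed in the comparison table in the Conclusion.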

Factors to consider

1. Results relevancy

Relevancy is crucial for effective search, as it ensures that users find the most pertinent results quickly. In semantic search, balancing relevancy and speed is essential to provide a seamless user experience. It's also important to consider the tradeoffs of vector search vs. full-text search.

When selecting a model, consider your specific use case, such as the need for multilingual support, handling multi-modal data, or addressing domain-specific requirements. If you have a highly specialized use case or need to support a particular language, it may be beneficial to explore models that can be trained on your data or opt for multilingual models.

The performance difference between a very small model and a large model is not always substantial. Smaller models are generally less expensive and faster, making them a practical choice in many scenarios. Therefore, it is often worth considering smaller models for their cost-effectiveness and speed.

Additionally, you should always consider the context you're providing to the model. In Meilisearch, this comes in the form of a document template. The more accurately the template describes the data, the better the search results will be, leading to a more satisfying user experience.
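As a rough illustration, the sketch below builds an embedder configuration with a document template and previews what text would be sent to the model. The settings field names (`source`, `model`, `apiKey`, `documentTemplate`) reflect our understanding of Meilisearch's embedder settings and should be checked against the current documentation. The product document and the `preview_template` helper are hypothetical: the helper only substitutes plain `{{doc.field}}` placeholders and is not a full template engine.

```python
import json
import re

# Hypothetical product document; the field names are illustrative only.
doc = {
    "id": 1,
    "title": "Trail running shoes",
    "description": "Lightweight shoes with a grippy sole.",
}

# Assumed embedder settings; verify field names against the Meilisearch docs.
settings = {
    "embedders": {
        "default": {
            "source": "openAi",
            "model": "text-embedding-3-small",
            "apiKey": "<OPENAI_API_KEY>",  # placeholder, not a real key
            "documentTemplate": (
                "A product named '{{doc.title}}' described as: {{doc.description}}"
            ),
        }
    }
}


def preview_template(template: str, document: dict) -> str:
    """Substitute plain {{doc.field}} placeholders (not a full template engine)."""

    def replace(match: re.Match) -> str:
        # Missing fields render as an empty string in this simplified preview.
        return str(document.get(match.group(1), ""))

    return re.sub(r"\{\{\s*doc\.(\w+)\s*\}\}", replace, template)


template = settings["embedders"]["default"]["documentTemplate"]
rendered = preview_template(template, doc)
print(rendered)
print(json.dumps(settings, indent=2))
```

Previewing the rendered text this way makes it easier to spot templates that include noisy fields or omit the ones users actually search on.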

2. Search performance

You read it everywhere now: time is money. The web is no different. Nowadays, search-as-you-type is the baseline for customer-facing applications. Saving users time greatly enhances their satisfaction and keeps them engaged with your platform.

To achieve lightning-fast search performance, consider using a local model to minimize latency by eliminating the need for round trips to the embedding service. If you need to use a remote model, then hosting your search service (e.g., your Meilisearch database) in close proximity to the embedding service can significantly reduce latency.
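To compare embedders on your own infrastructure, a small timing harness is often enough. The sketch below measures the median latency of an embedding call. Both `fake_local_embed` and `fake_remote_embed` are hypothetical stand-ins (the remote one simply sleeps to imitate a network round trip); in practice you would replace them with calls to your actual local model or embedding API.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(
    embed: Callable[[str], List[float]], query: str, runs: int = 20
) -> float:
    """Return the median wall-clock latency of embed(query) in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        embed(query)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def fake_local_embed(text: str) -> List[float]:
    """Stand-in for a local model: no network round trip."""
    return [float(len(text))] * 4


def fake_remote_embed(text: str) -> List[float]:
    """Stand-in for a remote API: sleeps to imitate a network round trip."""
    time.sleep(0.005)
    return [float(len(text))] * 4


local_ms = measure_latency_ms(fake_local_embed, "running shoes")
remote_ms = measure_latency_ms(fake_remote_embed, "running shoes")
print(f"local: {local_ms:.2f} ms, remote: {remote_ms:.2f} ms")
```

The median is used rather than the mean so that a single slow request (for example, a cold connection) does not dominate the result.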

The table below showcases latency benchmarks for various local embedding models and embedding APIs. All requests originate from a Meilisearch instance hosted on AWS (London datacenter).

Here you can see that there are some clear winners in terms of latency. Unfortunately, latency is not the same as throughput, so we also need to take a close look at the indexing time.

3. Indexing performance

Indexing performance is another critical aspect when comparing search solutions. The embedding model performance will directly impact the indexing speed of your search solution. And the speed at which your data can be indexed directly impacts the overall efficiency and scalability of your search solution.

Local models without GPUs may have slower indexing due to limited processing power. In contrast, third-party services offer varying speeds and limitations based on their infrastructure and service agreements. It is essential to evaluate these factors to ensure that your chosen model and service can meet your requirements effectively.

Several factors come into play when optimizing indexing. Again, latency plays a big role: reducing the time data takes to travel between your application and the model will always improve indexing speed. Additionally, the maximum request size the API accepts, the provider's rate limiting, and the number of dimensions the model supports can all influence the efficiency and scalability of the indexing process.
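To see how request size and rate limits interact, here is a minimal sketch that splits documents into batches and computes a lower bound on embedding time for sequential requests. The batch size, rate limit, and latency in the example are hypothetical values, not figures from any provider, and the estimate ignores retries, parallelism, and the indexing work done by the search engine itself.

```python
import math
from typing import Iterator, List, Sequence


def batched(items: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size documents."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def min_indexing_seconds(
    n_docs: int, batch_size: int, requests_per_minute: float, latency_s: float
) -> float:
    """Lower bound on embedding time for sequential requests under a rate limit."""
    n_requests = math.ceil(n_docs / batch_size)
    # Each request takes at least its latency, and the rate limit
    # forces at least 60 / requests_per_minute seconds between requests.
    seconds_per_request = max(latency_s, 60 / requests_per_minute)
    return n_requests * seconds_per_request


docs = [{"id": i} for i in range(10_000)]
batches = list(batched(docs, 100))

# Hypothetical limits: 100 documents per request, 100 requests per minute, 200 ms latency.
estimate = min_indexing_seconds(len(docs), 100, requests_per_minute=100, latency_s=0.2)
print(f"{len(batches)} requests, at least {estimate:.0f} s")
```

With these assumed numbers the rate limit, not the latency, is the bottleneck, which is why larger batches or a higher-tier account can matter more than a faster network path.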

The benchmark below compares the indexing of 10k e-commerce documents (with automatic embedding generation):

4. Pricing

While local embedders are free, most services charge per million tokens. Here's a breakdown of the pricing for each platform:

| Provider | Price |
| --- | --- |
| Cohere | $0.10 per million tokens |
| OpenAI | $0.13 per million tokens for text-embedding-3-large; $0.02 per million tokens for text-embedding-3-small |
| Cloudflare | $0.011 per 1,000 Neurons |
| Jina | $0.18 per million tokens |
| Mistral | $0.10 per million tokens |
| VoyageAI | $0.10 per million tokens for voyage-2; $0.12 per million tokens for voyage-large-2; $0.12 per million tokens for voyage-multilingual-2 |
| Local model | Free |
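To turn these prices into a budget, multiply your total token count by the per-million price. The sketch below does this for a dataset of 10,000 documents, assuming an average of 200 tokens per document; that average is a hypothetical figure, so measure your own data with the provider's tokenizer. Cloudflare is left out because it bills in Neurons rather than tokens.

```python
# Prices in USD per million tokens, copied from the pricing table above.
PRICE_PER_MILLION_USD = {
    "Cohere": 0.10,
    "OpenAI text-embedding-3-large": 0.13,
    "OpenAI text-embedding-3-small": 0.02,
    "Jina": 0.18,
    "Mistral": 0.10,
    "VoyageAI voyage-2": 0.10,
    "VoyageAI voyage-large-2": 0.12,
    "VoyageAI voyage-multilingual-2": 0.12,
}


def embedding_cost_usd(
    n_docs: int, avg_tokens_per_doc: float, price_per_million: float
) -> float:
    """Cost of embedding every document once."""
    total_tokens = n_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million


N_DOCS = 10_000
AVG_TOKENS_PER_DOC = 200  # hypothetical average, measure your own data

costs = {
    name: embedding_cost_usd(N_DOCS, AVG_TOKENS_PER_DOC, price)
    for name, price in PRICE_PER_MILLION_USD.items()
}
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f}")
```

Keep in mind that this covers a single indexing pass only. Re-indexing after a template or model change, and embedding each search query, add to the total.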

As your search needs grow and scale, it may become more cost-effective to invest in your own GPU machine. By having your own hardware, you can have greater control over the performance and scalability of your search solution and potentially reduce costs in the long run.
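A quick way to reason about this is a break-even calculation: the monthly token volume at which a fixed machine cost equals the API bill. The sketch below assumes a hypothetical GPU machine at $500 per month, a figure chosen purely for illustration. It also ignores engineering time, electricity, and the fact that a self-hosted model will usually be a different model with different relevancy.

```python
def break_even_tokens_per_month(
    gpu_monthly_cost_usd: float, price_per_million_usd: float
) -> float:
    """Monthly token volume at which the API bill equals the fixed machine cost."""
    return gpu_monthly_cost_usd / price_per_million_usd * 1_000_000


GPU_MONTHLY_COST_USD = 500.0  # hypothetical, for illustration only

# Compare against the $0.10 per million tokens tier from the pricing table.
tokens = break_even_tokens_per_month(GPU_MONTHLY_COST_USD, 0.10)
print(f"Break-even at about {tokens / 1e9:.1f} billion tokens per month")
```

Below that volume, a pay-per-token service is likely cheaper under these assumptions; above it, owning the hardware starts to pay off.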

It is often best to start with a well-known model from the list provided. They are generally easy to set up, and you will easily find community resources to help you. As the need arises, you can consider migrating the model to a cloud provider like AWS. Many services offer this option, allowing you to leverage their infrastructure for improved performance and scalability.

Alternatively, you can choose an equivalent open-source model to self-host, giving you even more flexibility and control over your search solution in the long term. Please note that optimizing local models for performance or high volume may require scaling your infrastructure accordingly.




Going further

While this article provides a comprehensive overview, we did not delve deeply into optimization techniques. There are several additional optimizations that can be explored to further enhance the performance of semantic search.

Here is a list of additional areas to investigate when choosing a model for your search experience:

  • Experiment with different presets (query vs. document) for models that offer this option to potentially improve relevancy
  • Evaluate specialized models for specific applications to assess their performance and suitability for your use case
  • Explore models that provide a reranking function to further refine search results
  • Test higher-tier accounts on each platform to check for improved performance and reduced rate limiting
  • Investigate parameters for receiving quantized data directly from the API to optimize data transfer and processing
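To give a sense of what quantized embeddings trade off, here is a minimal sketch of symmetric int8 scalar quantization in plain Python. It is a generic illustration of the idea, not how any particular provider implements it, and the four-dimensional `embedding` is made-up data. Each value shrinks from a 4-byte float to a 1-byte integer, at the cost of a small rounding error.

```python
from typing import List, Tuple


def quantize_int8(vector: List[float]) -> Tuple[List[int], float]:
    """Symmetric scalar quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    # Avoid dividing by zero for an all-zero vector.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vector], scale


def dequantize(values: List[int], scale: float) -> List[float]:
    """Approximate the original floats from the quantized integers."""
    return [v * scale for v in values]


embedding = [0.12, -0.5, 0.33, 0.0]  # made-up example vector
quantized, scale = quantize_int8(embedding)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))

print(quantized)
print(f"max rounding error: {max_error:.5f} (bounded by scale / 2 = {scale / 2:.5f})")
```

For a 1536-dimensional float32 vector, this kind of quantization reduces the payload from 6,144 bytes to 1,536 bytes, which is why receiving quantized vectors directly from the API can cut transfer and storage costs.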

Conclusion

| Model/Service | Dimensions | Context Length | Latency | Indexation Time | Pricing (per million tokens) |
| --- | --- | --- | --- | --- | --- |
| Cohere embed-english-v3.0 | 1024 | 512 | ±170ms | 43s | $0.10 |
| Cohere embed-english-light-v3.0 | 384 | 512 | ±160ms | 16s | $0.10 |
| OpenAI text-embedding-3-small | 1536 | 8192 | ±460ms | 95s | $0.02 |
| OpenAI text-embedding-3-large | 3072 | 8192 | ±750ms | 151s | $0.13 |
| Mistral | 1024 | 8192 | ±200ms | 409s | $0.10 |
| VoyageAI voyage-2 | 1024 | 4000 | ±350ms | 330s | $0.10 |
| VoyageAI voyage-large-2 | 1536 | 16000 | ±400ms | 409s | $0.12 |
| Jina Colbert v2 | 128, 96, or 64 | 8192 | ±400ms | 375s | $0.18 |
| OSS all-MiniLM-L6-v2 | 384 | 512 | ±10ms | 880s | Free |
| OSS bge-small-en-v1.5 | 1024 | 512 | ±20ms | 3379s | Free |
| OSS bge-large-en-v1.5 | 1536 | 512 | ±60ms | 9132s | Free |

Choosing the right model and service for semantic search involves carefully balancing several key factors: relevancy, search performance, indexation performance, and cost.

Each option presents its own set of trade-offs:

  • Cloud-based services like Cohere and OpenAI offer excellent relevancy and reasonable latency, with Cohere's embed-english-light-v3.0 standing out for its balance of speed and performance.
  • Local models provide the fastest search latency but may struggle with indexation speed on limited hardware.
  • Emerging services like Mistral and VoyageAI show promise with competitive pricing and performance.
  • Open-source models offer cost-effective solutions for those willing to manage their own infrastructure.
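One simple way to work through these trade-offs is to set a budget for each measured factor and shortlist the models that fit. The sketch below does this with the latency, indexation time, and pricing figures from the comparison table. The budgets in the two examples are hypothetical, the latencies are approximate, and relevancy is not captured here, so treat the output as a starting point for your own evaluation rather than a recommendation.

```python
from typing import Dict, List, Tuple

# (latency_ms, indexing_s, usd_per_million_tokens), copied from the comparison table.
MODELS: Dict[str, Tuple[float, float, float]] = {
    "Cohere embed-english-v3.0": (170, 43, 0.10),
    "Cohere embed-english-light-v3.0": (160, 16, 0.10),
    "OpenAI text-embedding-3-small": (460, 95, 0.02),
    "OpenAI text-embedding-3-large": (750, 151, 0.13),
    "Mistral": (200, 409, 0.10),
    "VoyageAI voyage-2": (350, 330, 0.10),
    "VoyageAI voyage-large-2": (400, 409, 0.12),
    "Jina Colbert v2": (400, 375, 0.18),
    "OSS all-MiniLM-L6-v2": (10, 880, 0.0),
    "OSS bge-small-en-v1.5": (20, 3379, 0.0),
    "OSS bge-large-en-v1.5": (60, 9132, 0.0),
}


def shortlist(
    models: Dict[str, Tuple[float, float, float]],
    max_latency_ms: float,
    max_indexing_s: float,
    max_price_usd: float,
) -> List[str]:
    """Keep models within every budget, ordered by fastest indexing first."""
    kept = [
        (indexing_s, name)
        for name, (latency_ms, indexing_s, price) in models.items()
        if latency_ms <= max_latency_ms
        and indexing_s <= max_indexing_s
        and price <= max_price_usd
    ]
    return [name for _, name in sorted(kept)]


# Hypothetical budgets: fast indexing matters most, price capped at $0.10.
fast_indexing = shortlist(MODELS, max_latency_ms=500, max_indexing_s=120, max_price_usd=0.10)

# Hypothetical budgets: strict search latency, free models only.
low_latency_free = shortlist(MODELS, max_latency_ms=100, max_indexing_s=1000, max_price_usd=0.0)

print(fast_indexing)
print(low_latency_free)
```

Hard budgets are used here instead of a weighted score because the indexation times span several orders of magnitude, which would let a single outlier dominate any normalized ranking.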

Ultimately, the best choice depends on your specific use case, budget, and performance requirements. For many applications, starting with a cloud-based service like Cohere or OpenAI provides a good balance of ease of use, performance, and cost. As your needs grow, consider exploring local or specialized models, or contact Meilisearch's sales team for tailored solutions.


Meilisearch is an open-source search engine enabling developers to build state-of-the-art experiences while enjoying simple, intuitive DX.

For more things Meilisearch, you can join the community on Discord or subscribe to the newsletter. You can learn more about the product by checking out its roadmap and participating in product discussions.
