Choosing the best model for semantic search
A comparison of model performance, cost, and relevancy when building semantic search.
Semantic search is transforming search technology by providing more accurate and relevant results. However, with many embedding models available, choosing the right one can be challenging. This guide will help you understand the key factors to consider when selecting a model to build semantic search.
Overview
In this guide, we will use the open-source search engine Meilisearch to perform the semantic searches. For these tests, we're using the entry tier of Meilisearch Cloud (i.e., the Build plan).
This guide will cover the following models:
| Model/Service | Dimensions | Context Length |
|---|---|---|
| Cohere embed-english-v3.0 | 1024 | 512 |
| Cohere embed-english-light-v3.0 | 384 | 512 |
| Cohere embed-multilingual-v3.0 | 1024 | 512 |
| Cohere embed-multilingual-light-v3.0 | 384 | 512 |
| OpenAI text-embedding-3-small | 1536 | 8192 |
| OpenAI text-embedding-3-large | 3072 | 8192 |
| Mistral | 1024 | 8192 |
| VoyageAI voyage-2 | 1024 | 4000 |
| VoyageAI voyage-large-2 | 1536 | 16000 |
| VoyageAI voyage-multilingual-2 | 1024 | 32000 |
| Jina ColBERT v2 | 128, 96, or 64 | 8192 |
| OSS all-MiniLM-L6-v2 | 384 | 512 |
| OSS bge-small-en-v1.5 | 384 | 512 |
| OSS bge-large-en-v1.5 | 1024 | 512 |
Factors to consider
1. Results relevancy
Relevancy is crucial for effective search: it ensures users find the most pertinent results quickly. For semantic search, striking a balance between relevancy and speed is essential to providing a seamless user experience. It's also important to consider the tradeoffs between vector search and full-text search.
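In practice, you can blend both approaches in a single query with Meilisearch's hybrid search. Below is a minimal sketch in Python using the `requests` library; it assumes a Meilisearch instance at `localhost:7700`, an index named `products`, and an embedder named `default` already configured. Exact parameter names may vary slightly between Meilisearch versions.

```python
import requests

MEILI_URL = "http://localhost:7700"  # assumed local Meilisearch instance
SEARCH_KEY = "MEILI_SEARCH_KEY"      # placeholder API key

# semanticRatio blends ranking signals: 0.0 is pure full-text, 1.0 is pure semantic.
response = requests.post(
    f"{MEILI_URL}/indexes/products/search",
    headers={"Authorization": f"Bearer {SEARCH_KEY}"},
    json={
        "q": "comfortable shoes for standing all day",
        "hybrid": {"semanticRatio": 0.8, "embedder": "default"},
        "limit": 10,
    },
)
print(response.json()["hits"])
```

Tuning `semanticRatio` is often the quickest way to trade keyword precision against semantic recall without changing models.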
When selecting a model, consider your specific use case, such as the need for multilingual support, handling multi-modal data, or addressing domain-specific requirements. If you have a highly specialized use case or need to support a particular language, it may be beneficial to explore models that can be trained on your data or opt for multilingual models.
The performance difference between a very small model and a large model is not always substantial. Smaller models are generally less expensive and faster, making them a practical choice in many scenarios. Therefore, it is often worth considering smaller models for their cost-effectiveness and speed.
Additionally, you should always consider the context you're providing to the model. In Meilisearch, this comes in the form of a document template. The more accurately the template describes the data, the better the search results will be, leading to a more satisfying user experience.
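As an illustration, here is how an embedder with a document template could be configured through the Meilisearch settings API. This is a minimal sketch in Python with `requests`, assuming an index named `products` and a placeholder OpenAI API key; the field names (`source`, `model`, `apiKey`, `documentTemplate`) follow the `embedders` setting and may vary with your Meilisearch version.

```python
import requests

MEILI_URL = "http://localhost:7700"  # assumed local Meilisearch instance
ADMIN_KEY = "MEILI_ADMIN_KEY"        # placeholder API key

# The documentTemplate (Liquid syntax) controls the text sent to the embedding model.
# The more accurately it describes each document, the better the resulting embeddings.
embedder_settings = {
    "default": {
        "source": "openAi",
        "model": "text-embedding-3-small",
        "apiKey": "OPENAI_API_KEY",  # placeholder
        "documentTemplate": (
            "An e-commerce product called '{{doc.title}}', "
            "described as: {{doc.description}}"
        ),
    }
}

resp = requests.patch(
    f"{MEILI_URL}/indexes/products/settings/embedders",
    headers={"Authorization": f"Bearer {ADMIN_KEY}"},
    json=embedder_settings,
)
print(resp.json())  # returns an asynchronous task you can poll for completion
```

Only the rendered template text is embedded, so a concise, descriptive template also keeps token usage (and therefore cost) down.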
2. Search performance
Time is money, and the web is no different. Search-as-you-type is now the baseline for customer-facing applications: saving users time greatly improves their satisfaction and keeps them engaged with your platform.
To achieve lightning-fast search performance, consider using a local model to minimize latency by eliminating the need for round trips to the embedding service. If you need to use a remote model, then hosting your search service (e.g., your Meilisearch database) in close proximity to the embedding service can significantly reduce latency.
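As a sketch of the local option, Meilisearch can run Hugging Face models in-process via the `huggingFace` embedder source, so queries skip the network round trip entirely. The snippet below reuses the assumed `products` index from earlier; the model name and field names are illustrative and may vary by version.

```python
import requests

MEILI_URL = "http://localhost:7700"  # assumed local Meilisearch instance
ADMIN_KEY = "MEILI_ADMIN_KEY"        # placeholder API key

# A local embedder runs inside Meilisearch (on CPU by default): no network hop at query time.
local_embedder = {
    "default": {
        "source": "huggingFace",
        "model": "BAAI/bge-small-en-v1.5",
        "documentTemplate": "{{doc.title}}: {{doc.description}}",
    }
}

resp = requests.patch(
    f"{MEILI_URL}/indexes/products/settings/embedders",
    headers={"Authorization": f"Bearer {ADMIN_KEY}"},
    json=local_embedder,
)
print(resp.json())  # returns an asynchronous task to poll
```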
The table below shows latency benchmarks for various local embedding models and embedding APIs. All requests originate from a Meilisearch instance hosted on AWS (London datacenter).
| Model/Service | Latency |
|---|---|
| Cloudflare bge-small-en-v1.5 | ±800ms |
| Cloudflare bge-large-en-v1.5 | ±500ms |
| Cohere embed-english-v3.0 | ±170ms |
| Cohere embed-english-light-v3.0 | ±160ms |
| Local gte-small | ±20ms |
| Local all-MiniLM-L6-v2 | ±10ms |
| Local bge-small-en-v1.5 | ±20ms |
| Local bge-large-en-v1.5 | ±60ms |
| Mistral | ±200ms |
| Jina ColBERT v2 | ±400ms |
| OpenAI text-embedding-3-small | ±460ms |
| OpenAI text-embedding-3-large | ±750ms |
| VoyageAI voyage-2 | ±350ms |
| VoyageAI voyage-large-2 | ±400ms |
Here you can see that there are some clear winners in terms of latency. Unfortunately, latency is not the same as throughput, so we also need to take a close look at the indexing time.
3. Indexing performance
Indexing performance is another critical aspect when comparing search solutions. The embedding model's performance directly impacts indexing speed, and the speed at which your data can be indexed determines the overall efficiency and scalability of your search solution.
Local models without GPUs may have slower indexing due to limited processing power. In contrast, third-party services offer varying speeds and limitations based on their infrastructure and service agreements. It is essential to evaluate these factors to ensure that your chosen model and service can meet your requirements effectively.
Several factors come into play when optimizing indexing. Again, latency plays a big role: reducing the time data spends traveling between your application and the model always improves the experience. The maximum payload size the embedding API accepts, the provider's rate limits, and the model's number of dimensions also influence the efficiency and scalability of the indexing process.
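One practical lever is to send documents in reasonably sized batches, so that each indexing task (and the embedding requests Meilisearch makes on its behalf) stays within the provider's payload and rate limits. Here is a minimal sketch, assuming the same `products` index and a list of document dicts; the batch size is an assumption to tune against your provider's limits.

```python
import requests

MEILI_URL = "http://localhost:7700"  # assumed local Meilisearch instance
ADMIN_KEY = "MEILI_ADMIN_KEY"        # placeholder API key
BATCH_SIZE = 1_000                   # assumed; tune to your provider's payload/rate limits

def index_in_batches(documents: list[dict]) -> None:
    """Push documents in chunks so each indexing task stays a manageable size."""
    for start in range(0, len(documents), BATCH_SIZE):
        batch = documents[start:start + BATCH_SIZE]
        resp = requests.post(
            f"{MEILI_URL}/indexes/products/documents",
            headers={"Authorization": f"Bearer {ADMIN_KEY}"},
            json=batch,
        )
        resp.raise_for_status()  # each call enqueues an asynchronous indexing task
```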
The benchmark below compares indexing 10,000 e-commerce documents (with automatic embedding generation):
| Model/Service | Indexation Time |
|---|---|
| Cohere embed-english-v3.0 | 43s |
| Cohere embed-english-light-v3.0 | 16s |
| OpenAI text-embedding-3-small | 95s |
| OpenAI text-embedding-3-large | 151s |
| Cloudflare bge-small-en-v1.5 | 152s |
| Cloudflare bge-large-en-v1.5 | 159s |
| Jina ColBERT v2 | 375s |
| VoyageAI voyage-large-2 | 409s |
| Mistral | 409s |
| Local all-MiniLM-L6-v2 | 880s |
| Local bge-small-en-v1.5 | 3379s |
| Local bge-large-en-v1.5 | 9132s |
4. Pricing
While local embedders are free, most services charge per million tokens. Here's a breakdown of the pricing for each platform (a rough cost estimate is sketched after the table):
| Provider | Price |
|---|---|
| Cohere | $0.10 per million tokens |
| OpenAI | $0.13 per million tokens for text-embedding-3-large |
| | $0.02 per million tokens for text-embedding-3-small |
| Cloudflare | $0.011 per 1,000 Neurons |
| Jina | $0.18 per million tokens |
| Mistral | $0.10 per million tokens |
| VoyageAI | $0.10 per million tokens for voyage-2 |
| | $0.12 per million tokens for voyage-large-2 |
| | $0.12 per million tokens for voyage-multilingual-2 |
| Local model | Free |
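To put these numbers in perspective, here is a back-of-the-envelope estimate; the average token count per document is an assumption you should measure on your own data.

```python
# Rough embedding cost for 10,000 documents at ~200 tokens each (assumed average)
docs = 10_000
avg_tokens_per_doc = 200
price_per_million = 0.10  # e.g., Cohere or Mistral, from the table above

total_tokens = docs * avg_tokens_per_doc
cost = total_tokens / 1_000_000 * price_per_million
print(f"{total_tokens:,} tokens -> ${cost:.2f}")  # 2,000,000 tokens -> $0.20
```

At this scale, embedding costs are negligible next to infrastructure costs; they only become a deciding factor at much higher document volumes or with frequent re-indexing.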
As your search needs grow and scale, it may become more cost-effective to invest in your own GPU machine. By having your own hardware, you can have greater control over the performance and scalability of your search solution and potentially reduce costs in the long run.
It is often best to start with a well-known model from the list above. These models are generally easy to set up, and you will easily find community resources to help you. As your needs evolve, you can consider migrating the model to a cloud provider like AWS. Many services offer this option, allowing you to leverage their infrastructure for improved performance and scalability.
Alternatively, you can choose an equivalent open-source model to self-host, giving you even more flexibility and control over your search solution in the long term. Please note that optimizing local models for performance or high volume may require scaling your infrastructure accordingly.
Going further
While this article provides a comprehensive overview, we did not delve deeply into optimization techniques. There are several additional optimizations that can be explored to further enhance the performance of semantic search.
Here is a list of additional areas to investigate when choosing a model for your search experience:
- Experiment with different presets (query vs. document) for models that offer this option to potentially improve relevancy
- Evaluate specialized models for specific applications to assess their performance and suitability for your use case
- Explore models that provide a reranking function to further refine search results
- Test higher-tier accounts on each platform to check for improved performance and reduced rate limiting
- Investigate parameters for receiving quantized data directly from the API to optimize data transfer and processing
Conclusion
| Model/Service | Dimensions | Context Length | Latency | Indexation Time | Pricing (per million tokens) |
|---|---|---|---|---|---|
| Cohere embed-english-v3.0 | 1024 | 512 | ±170ms | 43s | $0.10 |
| Cohere embed-english-light-v3.0 | 384 | 512 | ±160ms | 16s | $0.10 |
| OpenAI text-embedding-3-small | 1536 | 8192 | ±460ms | 95s | $0.02 |
| OpenAI text-embedding-3-large | 3072 | 8192 | ±750ms | 151s | $0.13 |
| Mistral | 1024 | 8192 | ±200ms | 409s | $0.10 |
| VoyageAI voyage-2 | 1024 | 4000 | ±350ms | 330s | $0.10 |
| VoyageAI voyage-large-2 | 1536 | 16000 | ±400ms | 409s | $0.12 |
| Jina ColBERT v2 | 128, 96, or 64 | 8192 | ±400ms | 375s | $0.18 |
| OSS all-MiniLM-L6-v2 | 384 | 512 | ±10ms | 880s | Free |
| OSS bge-small-en-v1.5 | 384 | 512 | ±20ms | 3379s | Free |
| OSS bge-large-en-v1.5 | 1024 | 512 | ±60ms | 9132s | Free |
Choosing the right model and service for semantic search involves carefully balancing several key factors: relevancy, search performance, indexation performance, and cost.
Each option presents its own set of trade-offs:
- Cloud-based services like Cohere and OpenAI offer excellent relevancy and reasonable latency, with Cohere's embed-english-light-v3.0 standing out for its balance of speed and performance.
- Local models provide the fastest search latency but may struggle with indexation speed on limited hardware.
- Emerging services like Mistral and VoyageAI show promise with competitive pricing and performance.
- Open-source models offer cost-effective solutions for those willing to manage their own infrastructure.
Ultimately, the best choice depends on your specific use case, budget, and performance requirements. For many applications, starting with a cloud-based service like Cohere or OpenAI provides a good balance of ease of use, performance, and cost. As your needs grow, consider exploring local or specialized models, or contact Meilisearch's sales team for tailored solutions.
Meilisearch is an open-source search engine enabling developers to build state-of-the-art experiences while enjoying simple, intuitive DX.
For more things Meilisearch, you can join the community on Discord or subscribe to the newsletter. You can learn more about the product by checking out its roadmap and participating in product discussions.