
26 Sept 2024

Searching across multiple languages

Discover how easy it can be to implement advanced multilingual search and give your users the seamless, relevant results they deserve—regardless of language.

Quentin de Quelen, Co-founder & CEO at Meilisearch (@Quentin_dQ)

In today's interconnected world, many applications need to handle content in multiple languages. Meilisearch offers powerful capabilities for multilingual search, combining intelligent language handling with specialized models for optimal relevance. This guide will walk you through setting up multilingual search with Meilisearch, covering both full-text and semantic search approaches.

Why multilingual search matters

Before we dive into the technical details, let's consider why multilingual search is crucial:

  1. Global reach: Expand your audience by catering to users in their native languages.
  2. Improved user experience: Provide relevant results regardless of the language used in queries or content.
  3. Increased engagement: Users are more likely to interact with content they can easily find and understand.

Now, let's explore how Meilisearch makes multilingual search a breeze.

Full-text search: intelligent language handling

At its core, Meilisearch excels at handling diverse languages out of the box. Our advanced tokenizer is designed to work with:

  • Languages that use whitespace to separate words (like English, Spanish, and French)
  • Languages that use other writing systems and do not separate words with whitespace (such as Chinese, Japanese, and Thai)

Automatic language detection

When you add documents to your Meilisearch index, the engine automatically detects the language of each field. This means you don't need to specify languages manually in most cases - Meilisearch will handle it for you.

For example, if your documents contain a mix of English and Spanish content:

[
  {
    "id": 1,
    "title": "The quick brown fox",
    "description": "El zorro marrón rápido salta sobre el perro perezoso"
  },
  {
    "id": 2,
    "title": "La luna y las estrellas",
    "description": "A beautiful night sky filled with twinkling stars"
  }
]

Meilisearch will automatically detect and properly handle the different languages in each field, ensuring accurate search results across your multilingual content.
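As a minimal sketch, here is how the documents above could be sent to the documents endpoint of an index using only Python's standard library. The instance URL, the index name `articles`, and the API key are placeholders chosen for illustration, not values from this guide. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request

# Assumed values for illustration only.
MEILI_URL = "http://localhost:7700"  # local Meilisearch instance (assumption)
INDEX_UID = "articles"               # hypothetical index name
API_KEY = "MASTER_KEY"               # placeholder API key

documents = [
    {
        "id": 1,
        "title": "The quick brown fox",
        "description": "El zorro marrón rápido salta sobre el perro perezoso",
    },
    {
        "id": 2,
        "title": "La luna y las estrellas",
        "description": "A beautiful night sky filled with twinkling stars",
    },
]


def build_add_documents_request(docs):
    """Build (but do not send) a POST request that adds documents to the index."""
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{INDEX_UID}/documents",
        data=json.dumps(docs, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_add_documents_request(documents)

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

No language is declared anywhere in the payload: each field is left to automatic detection.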

Specifying languages for specific fields

While automatic detection works well in most cases, you might want more control over language handling for certain fields. Meilisearch allows you to specify languages for particular attributes using the localizedAttributes setting.

Here's how you can set it up:

{
  "localizedAttributes": [
    {
      "locales": ["eng"],
      "attributePatterns": ["title_en", "*_english"]
    },
    {
      "locales": ["spa"],
      "attributePatterns": ["description_es", "*_spanish"]
    }
  ]
}

This configuration tells Meilisearch to treat the title_en field and any field ending with _english as English, and the description_es field and any field ending with _spanish as Spanish. This approach is particularly useful when you have a consistent naming convention for your multilingual fields or when you want to ensure specific treatment for certain attributes.
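The sketch below shows one way to apply this configuration through the index settings endpoint. The instance URL, index name, and API key are placeholders. The `locales_for` helper is hypothetical: it only approximates wildcard matching locally with Python's `fnmatch`, to show which attribute names each pattern would pick up, and it is not Meilisearch's own matching code.

```python
import fnmatch
import json
import urllib.request

# Assumed values for illustration only.
MEILI_URL = "http://localhost:7700"  # local Meilisearch instance (assumption)
INDEX_UID = "articles"               # hypothetical index name
API_KEY = "MASTER_KEY"               # placeholder API key

localized_attributes = [
    {"locales": ["eng"], "attributePatterns": ["title_en", "*_english"]},
    {"locales": ["spa"], "attributePatterns": ["description_es", "*_spanish"]},
]


def build_settings_request(rules):
    """Build (but do not send) a PATCH request that updates localizedAttributes."""
    body = {"localizedAttributes": rules}
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{INDEX_UID}/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="PATCH",
    )


def locales_for(attribute, rules):
    """Hypothetical helper: return the locales of the first rule whose pattern matches."""
    for rule in rules:
        if any(fnmatch.fnmatchcase(attribute, p) for p in rule["attributePatterns"]):
            return rule["locales"]
    return None  # no rule matched, so the field is left to automatic detection


request = build_settings_request(localized_attributes)

for name in ["title_en", "summary_english", "description_es", "body_spanish", "title"]:
    print(name, "->", locales_for(name, localized_attributes))
```

Fields that match no pattern, such as `title` here, are still handled by automatic detection.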

Semantic search: Enhancing multilingual relevance

While full-text search works well for many multilingual scenarios, semantic search can provide even better results, especially for complex queries or when dealing with nuanced meanings across languages.

Meilisearch supports various embedding models for semantic search, offering powerful auto-embedding capabilities. This means you can easily set up semantic search without having to manage the embedding process yourself. Meilisearch handles the generation and storage of embeddings automatically, seamlessly integrating them into your search experience.
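Once an embedder is configured, a search request can blend keyword and semantic results through the `hybrid` search parameter. The sketch below assumes an embedder named `multilingual` already exists on a hypothetical `articles` index; the URL, index name, and API key are placeholders, and the request is built without being sent.

```python
import json
import urllib.request

# Assumed values for illustration only.
MEILI_URL = "http://localhost:7700"  # local Meilisearch instance (assumption)
INDEX_UID = "articles"               # hypothetical index name
API_KEY = "MASTER_KEY"               # placeholder API key
EMBEDDER_NAME = "multilingual"       # hypothetical embedder name


def build_hybrid_search_request(query, semantic_ratio=0.5):
    """Build (but do not send) a hybrid search request.

    semantic_ratio goes from 0.0 (keyword results only) to 1.0 (semantic
    results only).
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    body = {
        "q": query,
        "hybrid": {"embedder": EMBEDDER_NAME, "semanticRatio": semantic_ratio},
    }
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{INDEX_UID}/search",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


# A Spanish query that should also surface the English description of a night sky.
request = build_hybrid_search_request("cielo nocturno con estrellas", semantic_ratio=0.7)
```

A higher ratio leans on meaning rather than exact words, which tends to matter most when the query and the content are in different languages.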

For multilingual applications, we recommend using specialized multilingual models rather than relying on general-purpose models. These models are designed to understand and represent meaning across multiple languages, leading to more relevant search results in multilingual contexts.

One excellent option for multilingual semantic search is the Cohere Multilingual embedding model. It offers a great balance of speed and relevance across a wide range of languages. Cohere's model is particularly adept at understanding context and nuance in different languages, making it an ideal choice for diverse, multilingual content. If you want to set it up right away, follow the dedicated guide.
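For orientation, the sketch below builds an embedder configuration that points a REST embedder at Cohere's embed endpoint. Treat every specific as an assumption to verify against the guide and Cohere's documentation: the embedder name `multilingual`, the model name `embed-multilingual-v3.0`, the 1024 dimensions, the document template, and the request and response templates. Exact field names and availability may also differ between Meilisearch versions.

```python
import json
import urllib.request

# Assumed values for illustration only.
MEILI_URL = "http://localhost:7700"   # local Meilisearch instance (assumption)
INDEX_UID = "articles"                # hypothetical index name
API_KEY = "MASTER_KEY"                # placeholder Meilisearch API key
COHERE_API_KEY = "COHERE_API_KEY"     # placeholder Cohere API key

embedder_settings = {
    "embedders": {
        "multilingual": {  # hypothetical embedder name
            "source": "rest",
            "url": "https://api.cohere.com/v1/embed",
            "apiKey": COHERE_API_KEY,
            "dimensions": 1024,  # assumed output size of the chosen model
            # Assumed template: which document fields are turned into text.
            "documentTemplate": "{{doc.title}} {{doc.description}}",
            # Shape of the request sent to Cohere (assumption).
            "request": {
                "model": "embed-multilingual-v3.0",
                "texts": ["{{text}}", "{{..}}"],
                "input_type": "search_document",
            },
            # Where the vectors are found in Cohere's response (assumption).
            "response": {"embeddings": ["{{embedding}}", "{{..}}"]},
        }
    }
}


def build_embedder_request(settings):
    """Build (but do not send) a PATCH request that registers the embedder."""
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{INDEX_UID}/settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="PATCH",
    )


request = build_embedder_request(embedder_settings)
```

With a configuration along these lines, Meilisearch would request embeddings from Cohere when documents are indexed, so no vectors need to be computed on your side.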

To explore different models and their performance, you can use the Meilisearch playground. This interactive tool allows you to test various embedding models and see how they perform with your specific multilingual queries and content.

Conclusion

We encourage you to experiment with different settings and models to find the optimal configuration for your specific use case. And remember, the Meilisearch community is always here to help - join us on Discord or check out our documentation for more detailed information and support.

Ready to get started with multilingual search? Try Meilisearch today with a 14-day trial!
