Configure a REST embedder

    You can integrate any text embedding generator with Meilisearch if your chosen provider offers a public REST API.

    The process of integrating a REST embedder with Meilisearch varies depending on the provider and the way it structures its data. This guide shows you where to find the information you need, then walks you through configuring your Meilisearch embedder based on the information you found.

    Find your embedder provider's documentation

    Each provider requires queries to follow a specific structure.

    Before beginning to create your embedder, locate your provider's documentation for embedding creation. This should contain the information you need regarding API requests, request headers, and responses.

    For example, Mistral's embeddings documentation is part of their API reference. In the case of Cloudflare's Workers AI, expected input and response are tied to your chosen model.

    Set up the REST source and URL

    Open your text editor and create an embedder object. Give it a name and set its source to "rest":

    {
      "EMBEDDER_NAME": {
        "source": "rest"
      }
    }
    

    Next, configure the URL Meilisearch should use to contact the embedding provider:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL"
      }
    }
    

    Setting an embedder name, a source, and a url is mandatory for all REST embedders.

    Configure the data Meilisearch sends to the provider

    Meilisearch's request field defines the structure of the input it will send to the provider. The way you must fill this field changes for each provider.

    For example, Mistral expects two mandatory parameters: model and input. It also accepts one optional parameter: encoding_format. Cloudflare instead only expects a single field, text.

    Choose a model

    In many cases, your provider requires you to explicitly set which model you want to use to create your embeddings. For example, in Mistral, model must be a string specifying a valid Mistral model.

    Update your embedder object adding this field and its value:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME"
        }
      }
    }
    

    In Cloudflare's case, the model is part of the API route itself and doesn't need to be specified in your request.

    The embedding prompt

    The prompt corresponds to the data that the provider will use to generate your document embeddings. Its specific name changes depending on the provider you chose. In Mistral, this is the input field. In Cloudflare, it's called text.

    Most providers accept either a string or an array of strings. A single string will generate one request per document in your database:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": "{{text}}"
        }
      }
    }
    

    {{text}} indicates Meilisearch should replace the contents of a field with your document data, as indicated in the embedder's documentTemplate.

    An array of strings allows Meilisearch to send up to 10 documents in one request, reducing the number of API calls to the provider:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": [
            "{{text}}", 
            "{{..}}"
          ]
        }
      }
    }
    

    When using array prompts, the first item must be {{text}}. If you want to send multiple documents in a single request, the second array item must be {{..}}. When using "{{..}}", it must be present in both request and response.

    When using other embedding providers, input might be called something else, like text or prompt:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "text": "{{text}}"
        }
      }
    }
    

    Provide other request fields

    You may add as many fields to the request object as you need. Meilisearch will include them when querying the embeddings provider.

    For example, Mistral allows you to optionally configure an encoding_format. Set it by declaring this field in your embedder's request:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": ["{{text}}", "{{..}}"],
          "encoding_format": "float"
        }
      }
    }
    

    The embedding response

    You must indicate where Meilisearch can find the document embeddings in the provider's response. Consult your provider's API documentation, paying attention to where it places the embeddings.

    Cloudflare's embeddings are located in an array inside response.result.data. Describe the full path to the embedding array in your embedder's response. The first array item must be "{{embedding}}":

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "text": "{{text}}"
        },
        "response": {
          "result": {
            "data": ["{{embedding}}"]
          }
        }
      }
    }
    

    If the response contains multiple embeddings, use "{{..}}" as its second value:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": [
            "{{text}}", 
            "{{..}}"
          ]
        },
        "response": {
          "data": [
            {
              "embedding": "{{embedding}}"
            },
            "{{..}}"
          ]
        }
      }
    }
    

    When using "{{..}}", it must be present in both request and response.

    It is possible the response contains a single embedding outside of an array. Use "{{embedding}}" as its value:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": "{{text}}"
        },
        "response": {
          "data": {
            "text": "{{embedding}}"
          }
        }
      }
    }
    

    It is also possible the response is a single item or array not nested in an object:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": [
            "{{text}}",
            "{{..}}"
          ]
        },
        "response": [
          "{{embedding}}",
          "{{..}}"
        ]
      }
    }
    

    The prompt data type does not necessarily match the response data type. For example, Cloudflare always returns an array of embeddings, even if the prompt in your request was a string.

    Meilisearch silently ignores response fields not pointing to an "{{embedding}}" value.

    The embedding header

    Your provider might also request you to add specific headers to your request. For example, Azure's AI services require an api-key header containing an API key.

    Add the headers field to your embedder object:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "text": "{{text}}"
        },
        "response": {
          "result": {
            "data": ["{{embedding}}"]
          }
        },
        "headers": {
          "FIELD_NAME": "FIELD_VALUE"
        }
      }
    }
    

    By default, Meilisearch includes a Content-Type header. It may also include an authorization bearer token, if you have supplied an API key.

    Configure remainder of the embedder

    source, request, response, and header are the only fields specific to REST embedders.

    Like other remote embedders, you're likely required to supply an apiKey:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": ["{{text}}", "{{..}}"],
          "encoding_format": "float"
        },
        "response": {
          "data": [
            {
              "embedding": "{{embedding}}"
            },
            "{{..}}"
          ]
        },
        "apiKey": "PROVIDER_API_KEY",
      }
    }
    

    You should also set a documentTemplate. Good templates are short and include only highly relevant document data:

    {
      "EMBEDDER_NAME": {
        "source": "rest",
        "url": "PROVIDER_URL",
        "request": {
          "model": "MODEL_NAME",
          "input": ["{{text}}", "{{..}}"],
          "encoding_format": "float"
        },
        "response": {
          "data": [
            {
              "embedding": "{{embedding}}"
            },
            "{{..}}"
          ]
        },
        "apiKey": "PROVIDER_API_KEY",
        "documentTemplate": "SHORT_AND_RELEVANT_DOCUMENT_TEMPLATE"
      }
    }
    

    Update your index settings

    Now the embedder object is complete, update your index settings:

    curl \
      -X PATCH 'MEILISEARCH_URL/indexes/INDEX_NAME/settings/embedders' \
      -H 'Content-Type: application/json' \
      --data-binary '{
        "EMBEDDER_NAME": {
          "source": "rest",
          "url": "PROVIDER_URL",
          "request": {
            "model": "MODEL_NAME",
            "input": ["{{text}}", "{{..}}"],
          },
          "response": {
            "data": [
              {
                "embedding": "{{embedding}}"
              },
              "{{..}}"
            ]
          },
          "apiKey": "PROVIDER_API_KEY",
          "documentTemplate": "SHORT_AND_RELEVANT_DOCUMENT_TEMPLATE"
        }
      }'
    

    Conclusion

    In this guide you have seen a few examples of how to configure a REST embedder in Meilisearch. Though it used Mistral and Cloudflare, the general steps remain the same for all providers:

    1. Find the provider's REST API documentation
    2. Identify the embedding creation request parameters
    3. Include parameters in your embedder's request
    4. Identify the embedding creation response
    5. Reproduce the path to the returned embeddings in your embedder's response
    6. Add any required HTTP headers to your embedder's header
    7. Update your index settings with the new embedder