Elasticsearch: one index with a custom type field to differentiate document schemas vs. multiple indices, one per document type?


I am not experienced with Elasticsearch (ES); my background is mostly in relational databases. I am trying to add a search bar to my web application that searches its entire content (or at least the content I choose to index in ES).

The architecture is Jamstack: a Gatsby application fetches content (sometimes at build time, sometimes at runtime) from a Strapi application (a headless CMS). In between, I developed a microservice that writes the documents created in Strapi to ES. At the moment there is a single index for all documents, regardless of type.

My problem is that as the application grows and new types of documents are created (sometimes very different from one another; for example, a news article and a hospital), I am having a hard time querying the index correctly, because I have to add many type-specific conditions to each query to cover every type of document.

I see two possible solutions. The first is to keep a single index, break the search down into several queries, run them all when the user hits the search button, and join the results before presenting them. The second is to split the single index into several, one per document type. That leads to another question: is it possible to query multiple indices at once and specify index-specific fields in the query?

Which is the best approach? I hope I have made myself clear.

Thanks in advance.

Best answer, by Amit

Based on your example, where one document type is news and another is hospital, it makes sense to create multiple indices (though it also depends on how many such types you have). Both approaches have pros and cons, and once you know them you can choose the one that fits your use case.

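Before listing the pros and cons, to answer your other question: yes, you can query multiple indices in a single request. The search API accepts a comma-separated list of indices (for example, GET /news,hospitals/_search), and the multi-search API (_msearch) lets you send a separate query per index in one round trip.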
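As a minimal sketch of the single-request option, the Python function below builds a search body for a request such as GET /news,hospitals/_search and applies a different multi_match clause per index by filtering on the _index metadata field. The index names (news, hospitals) and field names (title, body, name, address, specialties) are hypothetical; adjust them to your own mappings.

```python
import json

# Hypothetical per-index search fields; replace with your own mappings.
FIELDS_BY_INDEX = {
    "news": ["title^2", "body"],
    "hospitals": ["name^2", "address", "specialties"],
}


def build_multi_index_query(text, fields_by_index=FIELDS_BY_INDEX):
    """Build a search body for GET /<idx1>,<idx2>/_search.

    Each index gets its own clause: a term filter on the `_index`
    metadata field plus a multi_match over that index's own fields.
    A document matches if at least one per-index clause matches.
    """
    should = [
        {
            "bool": {
                "filter": [{"term": {"_index": index}}],
                "must": [{"multi_match": {"query": text, "fields": fields}}],
            }
        }
        for index, fields in fields_by_index.items()
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    indices = ",".join(FIELDS_BY_INDEX)  # "news,hospitals"
    print(f"GET /{indices}/_search")
    print(json.dumps(build_multi_index_query("cardiology"), indent=2))
```

This keeps one request and one ranked result list, while each index is still searched only on the fields that make sense for it.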

Pros of having a single index

  1. Less management overhead than multiple indices (this is why I asked how many document types you may have in your application).
  2. Searches over all content can be more performant, as the data lives in a single index.

Cons

  1. Since you are indexing different types of documents, you have to include complex filters to get the data you need.
  2. Relevance can suffer: mixing document types distorts the inverse document frequency (IDF) used by the default similarity algorithm (BM25), which in turn affects scoring.
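To make the IDF point concrete: Lucene's BM25 implementation, which Elasticsearch uses by default, computes IDF as ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number containing the term (by default these statistics are per shard). The sketch below compares a term in a hypothetical hospitals-only index with the same term after 10,000 news articles that never use it are added to the same index; the document counts are made up for illustration.

```python
import math


def bm25_idf(doc_count, docs_with_term):
    """IDF as in Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    n = docs_with_term
    return math.log(1 + (doc_count - n + 0.5) / (n + 0.5))


# Hypothetical counts: 1,000 hospital documents, 500 of which contain "emergency".
hospitals_only = bm25_idf(doc_count=1_000, docs_with_term=500)

# The same hospitals mixed with 10,000 news articles that never use the term.
mixed = bm25_idf(doc_count=11_000, docs_with_term=500)

print(f"hospitals-only IDF: {hospitals_only:.3f}")  # 0.693
print(f"mixed-index IDF:    {mixed:.3f}")           # 3.090
```

In the mixed index, a term that is common among hospitals looks rare, so it gets roughly 4.5 times the weight it would have in a hospitals-only index, which shifts scoring between document types.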

Pros of having separate indices (one per document type)

  1. Separating the data by its properties gives more relevant results, since term statistics are computed per index.
  2. Your search queries stay simple.
  3. If you have a very large amount of data, splitting it helps keep shard sizes optimal and improves performance.
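With one index per document type, the microservice that copies Strapi entries into Elasticsearch only needs to pick the target index for each entry. A minimal sketch, assuming each entry arrives with its Strapi content-type name and an id; the INDEX_BY_CONTENT_TYPE mapping and the index names are hypothetical. It builds a newline-delimited body for the _bulk API, where each document is an index action line followed by its source line.

```python
import json

# Hypothetical mapping from Strapi content type to Elasticsearch index.
INDEX_BY_CONTENT_TYPE = {
    "article": "news",
    "hospital": "hospitals",
}


def build_bulk_body(entries):
    """Return an NDJSON body for POST /_bulk, routing each entry by content type.

    `entries` is a list of (content_type, document) pairs; every document
    must carry an "id", which becomes its Elasticsearch _id.
    """
    lines = []
    for content_type, doc in entries:
        index = INDEX_BY_CONTENT_TYPE[content_type]  # raises KeyError for unknown types
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


if __name__ == "__main__":
    body = build_bulk_body([
        ("article", {"id": 1, "title": "New wing opens"}),
        ("hospital", {"id": 7, "name": "St. Mary", "specialties": ["cardiology"]}),
    ])
    print(body, end="")
```

Failing loudly on an unknown content type is a deliberate choice here: a new Strapi type then needs an explicit index (and mapping) before it can be indexed.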

Cons

  1. More management overhead.
  2. If you need to search across all indices, the search fans out to every index (via a multi-index search or the multi-search API) and has to wait for all of their results, which can be costly.
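For the multi-search route, a minimal sketch of the two pieces involved: building the NDJSON body for POST /_msearch (a header line naming the index, then a query line, for each search) and merging the per-index hits into one list. The index names, fields, and the merge-by-_score strategy are assumptions for illustration. BM25 scores from different indices are computed from different statistics, so they are not strictly comparable.

```python
import json

# Hypothetical per-index search fields; each search in the batch can differ.
FIELDS_BY_INDEX = {
    "news": ["title^2", "body"],
    "hospitals": ["name^2", "specialties"],
}


def build_msearch_body(text, size=10):
    """Return an NDJSON body for POST /_msearch: a header line and a query line per index."""
    lines = []
    for index, fields in FIELDS_BY_INDEX.items():
        lines.append(json.dumps({"index": index}))
        query = {"multi_match": {"query": text, "fields": fields}}
        lines.append(json.dumps({"query": query, "size": size}))
    return "\n".join(lines) + "\n"  # NDJSON bodies must end with a newline


def merge_responses(msearch_response, limit=10):
    """Flatten the `responses` array of an _msearch result, highest _score first.

    Naive on purpose: scores from different indices are not strictly comparable.
    """
    hits = []
    for response in msearch_response["responses"]:
        if "error" in response:  # failures are reported per search, not for the whole batch
            continue
        hits.extend(response["hits"]["hits"])
    hits.sort(key=lambda hit: hit["_score"], reverse=True)
    return hits[:limit]


if __name__ == "__main__":
    print(build_msearch_body("cardiology"), end="")
```

Showing the hits grouped by index (news, hospitals) instead of interleaving them sidesteps the score-comparability issue, at the cost of a less unified result list.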