Choosing the Right Solution for Search and Indexing

We are designing and developing a headless application. We are currently facing an **architectural question** that we need to answer before we can proceed with the system design. We are not experts in **search engines**, but we are researching this area.

Our tech stack is .NET Core and SQL Server, and in the future we may use RavenDB.

Instead of a content delivery API, we plan to use query-based content delivery, which is more flexible and reduces the overhead of developing an API for each front-end framework. We also decided to use indexes for the majority of data management, i.e. to reduce the DB load. So most content operations will be handled through the indexes.

The problem we observed with the search engine: initially we planned to use Elasticsearch, but then we ran into the following issue.

The system will have dynamic field management and field data management, i.e. users will edit fields and field values while the system is running. Each time this happens, we may need to rebuild the index to update the field in Elasticsearch (we are not experts in search engines). This would increase the network load, which may not be feasible for us in a large multitenant environment.

So we decided to go with Lucene.NET, but before proceeding we want to make sure the following two problems can be solved.

The first issue is updating fields dynamically without rebuilding the index each time. Does Lucene support this, or can we customize it to do so?

The second issue is managing separate indexes for each tenant in a distributed architecture.

We plan to have a partition for each tenant in production so that the data will not be in a single index. This way we don't need to put a high load on the web server for managing permission-based query results; instead, Lucene will do this. Query results will be returned based on the permissions of the user who ran the query, so it is better to have a separate index for each tenant to reduce the operations.

Is it possible to have a distributed Lucene implementation with an exclusive partition for each tenant?

Kindly help us find a solution to the two problems above that we are facing right now.

There is 1 answer

Answer by Amit (best answer)

Elasticsearch uses Lucene internally: every Elasticsearch index is made up of one or more shards, and each shard is a Lucene index. You can think of Elasticsearch as a distributed Lucene that can be scaled out across a large number of physical servers.

This should clear up any doubt about part 1 of your question: in Elasticsearch, all low-level operations, such as updating and deleting a document, are performed internally by Lucene.

Your first question

Q: Updating fields dynamically without rebuilding the index each time: does Lucene support this, or can we customize it to do so?

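You are just updating a single document, so it will not cause the entire index to be rebuilt. Internally, the old version of the document is marked as deleted and a new version is indexed; the other documents are untouched. The updated document becomes searchable within 1 second (the default refresh interval), or, if you need the updated document immediately, you can do an explicit refresh (not recommended).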
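As a rough illustration of the single-document update described above, the following Python sketch only builds the HTTP request for Elasticsearch's partial-update endpoint (`POST /<index>/_update/<id>` with a `doc` body); it does not send anything. The index name, document id, and field names are made up for the example, and the helper `build_partial_update` is hypothetical, not part of any client library.

```python
import json


def build_partial_update(index, doc_id, changed_fields, wait_for_refresh=False):
    """Return (method, path, body) for a partial update of one document.

    Hypothetical helper: it only assembles the request, it does not send it.
    Only the fields in `changed_fields` are sent; the rest of the document
    and the rest of the index are left alone.
    """
    path = f"/{index}/_update/{doc_id}"
    if wait_for_refresh:
        # Ask Elasticsearch to respond only after the change is searchable,
        # instead of forcing an explicit refresh on every write.
        path += "?refresh=wait_for"
    body = json.dumps({"doc": changed_fields})
    return "POST", path, body


# Example values are assumptions, not taken from the question.
method, path, body = build_partial_update(
    "tenant-acme-content", "42", {"title": "New title"}
)
print(method, path)
print(body)
```

The request can then be sent with whatever HTTP client or Elasticsearch client the application already uses. Leaving `wait_for_refresh` off relies on the default refresh interval mentioned above.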

Coming to your second question:

Q: Is it possible to have a distributed Lucene implementation with an exclusive partition for each tenant?

Answer: As explained, you can think of Elasticsearch as a distributed Lucene, and you can easily create a separate index for each tenant; the tenants will not interfere with each other's data. However, if you store multiple indices on the same Elasticsearch cluster, there is no isolation of infrastructure resources (CPU, memory, etc.), so you can run into the noisy-neighbor issue.
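One possible way to keep the index-per-tenant layout consistent is to derive every index name from the tenant id in a single place. The sketch below assumes a naming scheme of `tenant-<slug>-<kind>`; that scheme and the helpers `tenant_index` and `search_path` are illustrative assumptions, not an Elasticsearch convention. It lowercases the tenant id because Elasticsearch index names must be lowercase.

```python
import re


def tenant_index(tenant_id, kind="content"):
    """Map a tenant id to its own index name (hypothetical naming scheme)."""
    # Keep only lowercase letters and digits; collapse everything else to "-".
    # Note: different tenant ids can collapse to the same slug, so a real
    # system should use ids that are already unique after this step.
    slug = re.sub(r"[^a-z0-9]+", "-", tenant_id.lower()).strip("-")
    if not slug:
        raise ValueError("tenant id has no usable characters")
    return f"tenant-{slug}-{kind}"


def search_path(tenant_id, kind="content"):
    """Path of the search endpoint scoped to a single tenant's index."""
    return f"/{tenant_index(tenant_id, kind)}/_search"


# Example tenant id is an assumption, not taken from the question.
print(tenant_index("Acme Corp"))
print(search_path("Acme Corp"))
```

Because every query goes to a path that names exactly one tenant's index, a query for one tenant never reads another tenant's documents, although the indices still share the cluster's CPU and memory as noted above.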