Is there a way to find out which documents were updated/written to a Solr index during a day?


We have a product that acts as the source of reference data for various product teams within our organization. The data is stored in a Solr index, and we have exposed services to give clients access to it.

Now we have a requirement to provide a kind of event-driven mechanism so that clients get notified when something changes on the server side.

I know this is quite easy to implement with products such as Oracle Coherence, and that Solr is not the right product for this purpose, but it's no longer possible for us to go back and change the solution.

So, to meet the requirement at least partially, we have exposed a RESTful service that returns all the documents in a particular index, and client applications keep hitting this service until they have fetched the full dataset over a certain number of iterations.

I know this is not the best way, but we had limited options available as we didn't want another datastore just for this.

As an improvement to this approach, we want to expose another service that returns the inserts/updates/deletes made to the Solr index during a particular time frame, something like /companyIndex/itr/15, which gives the modifications made to the company index in the last 15 minutes. This will help clients reduce the volume of data they have to handle. Once a client has taken the full dataset from the index, it can work with the incremental updates from then on, and in this way the client's dataset stays in sync with the master dataset. There will still be some lag, but that is fine.

Is there a way to achieve this using Solr/Lucene itself? Does Solr maintain some kind of audit trail that can be exposed?

We could keep such information in our data-loading layer, but we wanted to know whether something available in Solr can be used.

Any suggestions/opinions?

MatsLindh On BEST ANSWER

There are several ways you can handle this. Lucene exposes information about commits through the IndexDeletionPolicy (see IndexCommits), which Solr uses to power its own replication. You can probably hook into the replication handler yourself to retrieve the current version of the index and the files that have changed in the meantime (see the HTTP API for replication).
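As a rough illustration of the replication HTTP API route, here is a minimal Python sketch. It assumes the core exposes the standard `/replication` handler and that the `indexversion` command returns `indexversion` and `generation` fields in its JSON response; the host, the core name and the helper names (`replication_url`, `parse_index_version`, `fetch_index_version`) are made up for the example. Keep in mind that this tells you that the index changed and, via `filelist`, which index files belong to a generation, not which individual documents were touched.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url, core, command, **params):
    """Build a URL for a command on the core's /replication handler."""
    query = urlencode({"command": command, "wt": "json", **params})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


def parse_index_version(payload):
    """Pull (indexversion, generation) out of an 'indexversion' response.

    Assumes the decoded JSON carries both keys at the top level.
    """
    return int(payload["indexversion"]), int(payload["generation"])


def fetch_index_version(base_url, core):
    """Ask a running Solr instance for its current index version and generation."""
    url = replication_url(base_url, core, "indexversion")
    with urlopen(url, timeout=10) as resp:
        return parse_index_version(json.load(resp))


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to your installation.
    base = "http://localhost:8983/solr"
    version, generation = fetch_index_version(base, "companyIndex")
    print("index version:", version, "generation:", generation)
    # URL listing the index files that make up that generation.
    print(replication_url(base, "companyIndex", "filelist", generation=generation))
```

Comparing the value returned by `fetch_index_version` between two polls is enough to tell whether a commit happened in between.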

If you want more details about each commit event, you'll have to dig a bit deeper, but I'm sure you can hook directly into Lucene to observe the events yourself (in the same way Solr's replication handler does) and then broadcast them through RabbitMQ or some other message queue to expose the information to several clients.
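The broadcasting side could look roughly like the sketch below. Note that it does not hook into Lucene: it swaps in plain polling of an (index version, generation) pair, which is simpler but only sees changes at poll time. `CommitWatcher`, `fetch_version` and `publish` are hypothetical names; `publish` stands in for whatever sends a message to RabbitMQ or another queue.

```python
from typing import Callable, Optional, Tuple

Version = Tuple[int, int]  # (index version, generation)


class CommitWatcher:
    """Detects index changes by comparing successive version readings."""

    def __init__(self, fetch_version: Callable[[], Version],
                 publish: Callable[[dict], None]) -> None:
        self._fetch_version = fetch_version  # e.g. a call to the replication API
        self._publish = publish              # e.g. a wrapper around a queue client
        self._last: Optional[Version] = None

    def poll_once(self) -> bool:
        """Read the current version; publish an event if it changed.

        The first reading only establishes a baseline and publishes nothing.
        Returns True when an event was published.
        """
        current = self._fetch_version()
        previous, self._last = self._last, current
        if previous is None or current == previous:
            return False
        self._publish({
            "event": "index_changed",
            "previous_generation": previous[1],
            "generation": current[1],
            "index_version": current[0],
        })
        return True


if __name__ == "__main__":
    # Canned readings instead of a live Solr instance, printed instead of queued.
    readings = iter([(100, 1), (100, 1), (250, 2)])
    watcher = CommitWatcher(lambda: next(readings), print)
    for _ in range(3):
        watcher.poll_once()
```

Clients that receive the event still have to fetch the changed documents themselves, for instance from your existing service.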

Hopefully that'll point you in the right direction!