Identifying documents by multiple unique keys in Solr


I have set up SOLR to automatically generate IDs for my documents by following this guide: https://wiki.apache.org/solr/UniqueKey, and that part is working as intended.

Now, when inserting a document, I would like to ensure that the url field (just a string) is unique across all documents in the index. Whenever a new document is added and a document with that particular url already exists, the existing document should simply be updated. The unique id is still needed because it identifies a document in another part of the system.

I have tried marking the url field as unique, but that setting is just ignored, so it is still possible to add a document with a non-unique url.

I'm using SOLR 4.10.2.

Any help is greatly appreciated!

Answer by spyk:

You could prevent duplicates from entering the index by using the "De-duplication" Solr feature. Please have a look at the wiki for configuration and more details: https://cwiki.apache.org/confluence/display/solr/De-Duplication
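Roughly, the feature is configured as an update processor chain in solrconfig.xml (`solr.processor.SignatureUpdateProcessorFactory`, with parameters such as `signatureField`, `fields` and `signatureClass`) that hashes the chosen fields into a "signature" stored on each document, so two documents with the same `url` end up with the same signature. Below is a minimal Python sketch of just that idea, not Solr code: it assumes the signature is an MD5 digest over the listed fields, and `compute_signature` is a hypothetical name. Solr's own signature classes encode the values differently, so the digests will not match what Solr stores.

```python
import hashlib


def compute_signature(doc, fields):
    """Return an MD5 hex digest over the values of `fields` in `doc`.

    Hypothetical helper: a simplified model of signature-based de-duplication,
    not Solr's exact encoding.
    """
    h = hashlib.md5()
    for name in fields:
        value = doc.get(name, "")
        # A separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
        h.update(str(value).encode("utf-8") + b"\x1f")
    return h.hexdigest()


doc_a = {"id": "1", "url": "http://example.com/page", "title": "First"}
doc_b = {"id": "2", "url": "http://example.com/page", "title": "Second"}
doc_c = {"id": "3", "url": "http://example.com/other", "title": "Third"}

# Only the url field feeds the signature, so id and title do not matter.
print(compute_signature(doc_a, ["url"]) == compute_signature(doc_b, ["url"]))  # True
print(compute_signature(doc_a, ["url"]) == compute_signature(doc_c, ["url"]))  # False
```

Because only `url` is listed, the auto-generated id plays no part in deciding what counts as a duplicate.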

There is also a flag, "overwriteDupes", that I believe makes Solr overwrite the existing document with the new one (effectively an update), although this is not clearly documented in the wiki.
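To illustrate what that flag would do, based on my understanding rather than the wiki: when `overwriteDupes` is true, an incoming document replaces any stored document with the same signature; when it is false, the signature is only stored and duplicates remain in the index. The Python below is a toy in-memory model of that behaviour, not Solr code. `ToyIndex` and `url_signature` are made-up names, and using the MD5 of the url as the signature is an assumption.

```python
import hashlib
import uuid


def url_signature(url):
    # Hypothetical helper: MD5 of the url string stands in for Solr's signature.
    return hashlib.md5(url.encode("utf-8")).hexdigest()


class ToyIndex:
    """In-memory stand-in for a Solr core, keyed by the uniqueKey ("id")."""

    def __init__(self, overwrite_dupes):
        self.overwrite_dupes = overwrite_dupes
        self.docs = {}  # id -> stored document

    def add(self, doc):
        doc = dict(doc)
        # Models an auto-generated uniqueKey: a fresh UUID when no id is given.
        doc.setdefault("id", str(uuid.uuid4()))
        doc["signature"] = url_signature(doc["url"])
        if self.overwrite_dupes:
            # Drop every stored document that shares the new document's signature.
            dupes = [i for i, d in self.docs.items() if d["signature"] == doc["signature"]]
            for old_id in dupes:
                del self.docs[old_id]
        # As with any add, a document with the same id is replaced.
        self.docs[doc["id"]] = doc
        return doc["id"]


dedup = ToyIndex(overwrite_dupes=True)
first = dedup.add({"url": "http://example.com/a", "title": "v1"})
second = dedup.add({"url": "http://example.com/a", "title": "v2"})
print(len(dedup.docs))              # 1
print(dedup.docs[second]["title"])  # v2
print(first in dedup.docs)          # False

plain = ToyIndex(overwrite_dupes=False)
plain.add({"url": "http://example.com/a", "title": "v1"})
plain.add({"url": "http://example.com/a", "title": "v2"})
print(len(plain.docs))              # 2
```

Note that in this model the surviving document carries the newly generated id, not the old one. Since the id is used elsewhere in your system, it may be worth checking how a real Solr setup behaves on this point before relying on it.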