OpenRefine - reconcile by second or third candidate

With the reconciliation service I often run into this problem: the candidate listed first isn't the correct one; the correct one is the second or third candidate (and it also has a better score), like this:

The third candidate is the correct one

How can I select the correct one in bulk? I've got thousands of records, and I'm running into lots of cases like this. I think there should be some way to do it other than one by one.

For instance, something that says "take the candidate with the best score, no matter what its position is".

Edit: as pintoch says, it could be a bug. In the meantime, it's possible to create two numeric facets: one on cell.recon.candidates[1].score (the second candidate) and the other on cell.recon.candidates[2].score (the third candidate). By adjusting them you can filter down to the rows where the second or third candidate has the best score. Each cell then still has to be matched one by one, but it's just a matter of clicking.
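To make the "take the best score, whatever its position" idea concrete, here is a minimal Python sketch of the selection logic. It runs outside OpenRefine on a plain list of candidates; the function name best_candidate_index and the sample ids, names, and scores are invented for illustration, and only the "score" field is assumed to exist on each candidate.

```python
from typing import Optional, Sequence


def best_candidate_index(candidates: Sequence[dict]) -> Optional[int]:
    """Return the position of the highest-scoring candidate, or None if there are none."""
    if not candidates:
        return None
    # max() returns the first maximum, so ties resolve to the earliest candidate.
    return max(range(len(candidates)), key=lambda i: candidates[i]["score"])


# Invented sample data: the third candidate (index 2) has the highest score.
candidates = [
    {"id": "Q1", "name": "Example A", "score": 62.0},
    {"id": "Q2", "name": "Example B", "score": 71.5},
    {"id": "Q3", "name": "Example C", "score": 98.0},
]

print(best_candidate_index(candidates))
```

The returned index is zero-based, like the indices in cell.recon.candidates[1] and cell.recon.candidates[2] above.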

There is 1 answer

pintoch (best answer)

I would say that this behaviour is a bug in the first place: the candidates should be sorted by decreasing score. The reconciliation service API does not specify that services should return their candidates in any particular order, but that is probably an oversight.

The quickest solution would be to contact the person running the reconciliation service that you are using and ask them to sort the candidates by decreasing score on their side.
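As a rough illustration of what sorting on the service side could look like, the sketch below reorders the candidates in a batch response by decreasing score before it is sent back. It assumes the usual batch shape where each query id maps to an object with a "result" list whose entries carry a "score"; the function name sort_by_score and the sample data are hypothetical, and a real service would do this inside its own code.

```python
def sort_by_score(response: dict) -> dict:
    """Return a copy of a batch response with each result list sorted by decreasing score."""
    return {
        query_id: {
            **body,
            # sorted() is stable, so candidates with equal scores keep their original order.
            "result": sorted(
                body.get("result", []),
                key=lambda candidate: candidate["score"],
                reverse=True,
            ),
        }
        for query_id, body in response.items()
    }


# Invented sample response: the best candidate is returned in third position.
response = {
    "q0": {
        "result": [
            {"id": "Q1", "name": "Example A", "score": 62.0, "match": False},
            {"id": "Q2", "name": "Example B", "score": 71.5, "match": False},
            {"id": "Q3", "name": "Example C", "score": 98.0, "match": False},
        ]
    }
}

sorted_response = sort_by_score(response)
print([c["id"] for c in sorted_response["q0"]["result"]])
```

The input is left untouched and a new dictionary is returned, which keeps the sketch easy to drop in just before the response is serialized.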

This also suggests improvements in OpenRefine itself: OpenRefine could always sort the results of a reconciliation service by decreasing score. I have opened a ticket about this.

More broadly, I agree that the current ways to match candidates based on specific criteria could be improved (but this might require redesigning important parts of the reconciliation system, which will take time).