How to restart a Solr search from a specific page using SolrJ?


I'm iterating over an entire Solr index using SolrJ. Solr returns pages of records with UUIDs, and I check each UUID against my Fedora Commons repository. I want to iterate over the whole index, which in my case can take up to a week. So far it ran for 3 days and then failed on an error unrelated to Solr.

So I'm asking: is there a way to run the search from a specific page of results? Say I always log my last page; then, when my program fails, I don't need to start from the beginning and can instead resume from the last page where it failed. Can anybody help? Thank you.

How I iterate over Solr:

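// Imports needed by this snippet:
import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

for (String model : models) {
    final String solrUrl = "http://localhost:1234/solr/test";
    // try-with-resources closes the client even if a query throws
    try (HttpSolrClient solr = new HttpSolrClient.Builder(solrUrl).build()) {
        solr.setParser(new XMLResponseParser());
        SolrQuery query = new SolrQuery();
        query.setQuery("fedora." + model);
        query.setRows(10);
        // Cursor paging needs a sort that includes the uniqueKey field
        // (PID is assumed to be the uniqueKey here)
        query.addSort("PID", SolrQuery.ORDER.asc);
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = solr.query(query);
            String nextCursorMark = rsp.getNextCursorMark();
            for (SolrDocument doc : rsp.getResults()) {
                // ... I do something with the result
            }
            // An unchanged cursorMark means there are no more results
            if (cursorMark.equals(nextCursorMark)) {
                done = true;
            }
            cursorMark = nextCursorMark;
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}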

There is 1 answer

MatsLindh On

If the index hasn't changed, the cursorMark value is still valid. As long as you keep the last cursorMark stored locally, you can restart the pagination by using that cursorMark.
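One way to keep the last cursorMark stored locally is a small checkpoint file. The following is only a sketch: the class name `CursorCheckpoint` and the file layout are hypothetical, not part of SolrJ, and it uses only the Java standard library. The literal `"*"` is the value of `CursorMarkParams.CURSOR_MARK_START`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: stores the last cursorMark in a small text file. */
final class CursorCheckpoint {
    // "*" is the value of CursorMarkParams.CURSOR_MARK_START in SolrJ.
    private static final String START = "*";
    private final Path file;

    CursorCheckpoint(Path file) {
        this.file = file;
    }

    /** Returns the saved cursorMark, or "*" if nothing has been saved yet. */
    String load() throws IOException {
        if (!Files.exists(file)) {
            return START;
        }
        String mark = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return mark.isEmpty() ? START : mark;
    }

    /**
     * Writes the mark to a temporary sibling file, then moves it over the
     * checkpoint, so a crash mid-write is less likely to leave a truncated file.
     */
    void save(String cursorMark) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, cursorMark.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In the loop from the question, this would mean initializing `cursorMark` from `load()` instead of `CURSOR_MARK_START`, and calling `save(nextCursorMark)` only after every document on the current page has been processed, so that a crash repeats at most one page. Since the question runs one query per model, it would presumably need one checkpoint file per model.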

The cursorMark indicates how far into the sorted result set you've progressed, so it's just as good as a page number in regular pagination.

If the index has changed, however, you can't reuse the same cursorMark and expect to get all results: when you sort on a field where new entries can land before the cursor position (anything other than an ever-increasing value such as a timestamp), documents added behind the cursor are skipped. That wouldn't be guaranteed with regular pagination either.
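The effect of a changed index can be illustrated with a simplified in-memory simulation. This is not Solr code: the class `CursorSimulation` is hypothetical, a `TreeSet` stands in for an index sorted on the unique key, and the "mark" is simply the last ID returned, whereas a real cursorMark is an opaque string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical simulation of cursor paging over a sorted set of IDs. */
final class CursorSimulation {
    /** Returns up to `rows` IDs that sort strictly after `mark` (null means start). */
    static List<String> page(TreeSet<String> index, String mark, int rows) {
        List<String> out = new ArrayList<>();
        for (String id : (mark == null ? index : index.tailSet(mark, false))) {
            if (out.size() == rows) {
                break;
            }
            out.add(id);
        }
        return out;
    }
}
```

If the index holds `uuid:b`, `uuid:d`, `uuid:f` and the first page of two rows ends at `uuid:d`, then adding `uuid:a` and `uuid:e` before resuming from that mark returns `uuid:e` and `uuid:f`. The entry added after the cursor position is picked up, while `uuid:a`, added before it, is never visited.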