I've just figured out how to run a complete Nutch crawl through the REST API in Nutch 2.3. You can see my post here. After running the crawl, I opened MongoVue to check the results, and the "status" and "baseUrl" fields, along with several others, are missing. If I run a normal crawl through Cygwin instead, I get all the fields. Is there some parameter I'm missing from the POST request for the UPDATEDB call?
Here is the body of the last call I make, the UPDATEDB job:
{
    "args": {
        "crawlId": "crawl-01",
        "batch": "1428526896161-4430"
    },
    "confId": "default",
    "crawlId": "crawl-01",
    "type": "UPDATEDB"
}
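For reference, here is a minimal Python sketch of sending that same body to the REST server. It assumes the Nutch 2.x server is running on localhost:8081 and accepts new jobs at POST /job/create; build_job_request and submit_job are hypothetical helper names, not part of Nutch.

```python
import json
import urllib.request

# Assumption: the Nutch 2.x REST server listens here and creates jobs
# at POST /job/create. Adjust to match your own setup.
NUTCH_SERVER = "http://localhost:8081"


def build_job_request(job_type, crawl_id, batch_id, conf_id="default"):
    """Build a job body with the same shape as the UPDATEDB body above."""
    return {
        "args": {"crawlId": crawl_id, "batch": batch_id},
        "confId": conf_id,
        "crawlId": crawl_id,
        "type": job_type,
    }


def submit_job(payload, server=NUTCH_SERVER):
    """POST the job body as JSON and return the raw response body as text."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        server + "/job/create",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    body = build_job_request("UPDATEDB", "crawl-01", "1428526896161-4430")
    print(json.dumps(body, indent=2))
    # Uncomment to send it to a running Nutch server:
    # print(submit_job(body))
```

Keeping the body builder separate from the HTTP call makes it easy to print and compare the exact JSON the REST crawl sends with what a working crawl expects.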
I figured it out. The timestamp used in the GenerateJob step was wrong: it had to be in a particular format, and my code wasn't producing it. I found a workaround.
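The answer doesn't say which timestamp format was required, so the following is only a hedged illustration of the general idea: create one timestamp-based batch id, then reuse it in every job body so that UPDATEDB refers to the same batch the generate step produced. The id copies the shape of the batch id in the question (epoch milliseconds, a hyphen, then a number). make_batch_id and job_bodies are hypothetical names, and both the job type list and passing "batch" to GENERATE are assumptions about the Nutch 2.3 REST API rather than something the post confirms.

```python
import random
import time


def make_batch_id(now_ms=None, rng=None):
    """Hypothetical helper: build a batch id shaped like '<epoch-ms>-<n>'.

    The shape mirrors "1428526896161-4430" from the question; whether
    Nutch requires exactly this shape is an assumption.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    if rng is None:
        rng = random.Random()
    return f"{now_ms}-{rng.randint(0, 9999)}"


def job_bodies(crawl_id, batch_id, conf_id="default"):
    """Build one body per step, all sharing the same batch id.

    Assumption: these are valid REST job types and each accepts "batch".
    """
    steps = ["GENERATE", "FETCH", "PARSE", "UPDATEDB"]
    return [
        {
            "args": {"crawlId": crawl_id, "batch": batch_id},
            "confId": conf_id,
            "crawlId": crawl_id,
            "type": step,
        }
        for step in steps
    ]


if __name__ == "__main__":
    batch = make_batch_id()
    for body in job_bodies("crawl-01", batch):
        print(body["type"], body["args"]["batch"])
```

Generating the id once, rather than recomputing a timestamp per request, rules out the later steps silently pointing at a batch that was never generated.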