FYI, this was cross-posted to the Apache Nutch mailing list.
I'm not sure whether this is a Nutch, Kibana or Elasticsearch problem. I'm using Nutch 2.3, HBase 0.94.14 and Elasticsearch 1.6 with Kibana 4.1.0 to crawl, archive and index.
I mostly followed the tutorial below. The only exception is that I'm running ES 1.6 instead of the tutorial's 1.4 (which I'm now wondering might be the problem).
https://gist.github.com/xrstf/b48a970098a8e76943b9
Following this tutorial, I am using the /bin/nutch script.
Almost everything works: Nutch follows my seed URLs, HBase stores the downloads and Elasticsearch seems to be indexing the content. However, I can't get Kibana to visualize the content coming from Nutch. Kibana recognizes the index and its fields but shows no content. I've loaded the index in Kibana both with and without time-based events, to no avail.
I have other indexes and 'types' in that Elasticsearch instance which Kibana can visualize, AND I can query Elasticsearch with cURL and get the Nutch results just fine. I just can't get Kibana to visualize the specific content from Nutch.
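For anyone who wants to reproduce the cURL check, here is a minimal Python sketch of the same idea. It is not my exact command: the index name `nutch`, the host `localhost:9200` and the helper names are assumptions for illustration. It counts the documents in the index and lists the top-level `date` fields in the mapping, since Kibana's time-based mode needs a date field to filter on.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: default Elasticsearch HTTP port
INDEX = "nutch"                   # assumption: hypothetical index name


def search_url(base, index, size=1):
    """Build a URI-search URL that matches every document in the index."""
    return "{}/{}/_search?q=*:*&size={}".format(base.rstrip("/"), index, size)


def mapping_url(base, index):
    """Build the URL of the index's mapping endpoint."""
    return "{}/{}/_mapping".format(base.rstrip("/"), index)


def total_hits(response):
    """Return the hit count from an Elasticsearch 1.x search response."""
    return response["hits"]["total"]


def date_fields(mapping):
    """List top-level fields of type 'date' in an ES 1.x mapping response.

    Nested object properties are not inspected in this sketch.
    """
    found = []
    for index_body in mapping.values():
        for type_body in index_body.get("mappings", {}).values():
            for name, spec in type_body.get("properties", {}).items():
                if spec.get("type") == "date":
                    found.append(name)
    return sorted(found)


def fetch_json(url):
    """GET a URL and decode the JSON body (needs a running Elasticsearch)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With Elasticsearch running, `total_hits(fetch_json(search_url(ES_URL, INDEX)))` gives the document count and `date_fields(fetch_json(mapping_url(ES_URL, INDEX)))` shows which fields could serve as a time field.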
I've tried two different ES + Kibana setups, simply redirecting the Nutch indexing output, and I have the same problem on both. I have also tried deleting the index and starting over, creating the index first and then running 'nutch index -all', and doing a clean Elasticsearch / Kibana install.
I even went so far as to deploy Elasticsearch 1.4. However, that requires downgrading Kibana to v3, and I am having difficulty getting that to work. I have confirmed (again) via cURL that the content is IN Elasticsearch.
My guess is that it has something to do with the difference in ES versions, though if that were a problem, wouldn't the Transport Client simply fail on inserting?
Below is the log output from Kibana, which doesn't appear to show anything interesting.
{
  "name": "Kibana",
  "hostname": "VirtualBeast",
  "pid": 6695,
  "level": 30,
  "req": {
    "method": "POST",
    "url": "\/elasticsearch\/_msearch?timeout=0&ignore_unavailable=true&preference=1434483458287",
    "headers": {
      "host": "localhost:5601",
      "connection": "keep-alive",
      "content-length": "732",
      "accept": "application\/json, text\/plain, *\/*",
      "origin": "http:\/\/localhost:5601",
      "user-agent": "Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/43.0.2357.125 Safari\/537.36",
      "content-type": "application\/json;charset=UTF-8",
      "referer": "http:\/\/localhost:5601\/",
      "accept-encoding": "gzip, deflate",
      "accept-language": "en-US,en;q=0.8"
    },
    "remoteAddress": "127.0.0.1",
    "remotePort": 51632
  },
  "res": {
    "statusCode": 200,
    "responseTime": 12,
    "contentLength": 4992
  },
  "msg": "POST \/_msearch?timeout=0&ignore_unavailable=true&preference=1434483458287 200 - 12ms",
  "time": "2015-06-16T19:39:57.372Z",
  "v": 0
}
Any help would be appreciated. Do I need to upgrade the indexer to match the Elasticsearch version?
Thanks!