Nutch 2.3 + Elasticsearch / results not visualizing in Kibana

FYI, this was cross-posted to the Apache Nutch mailing list.

I'm not sure whether this is a Nutch, Kibana or Elasticsearch problem. I'm using Nutch 2.3, HBase 0.94.14 and Elasticsearch 1.6 with Kibana 4.1.0 to crawl, archive and index.

I primarily followed the tutorial below. The only exception is that I upgraded to ES 1.6 from the tutorial's version 1.4 (which I now wonder might be a problem).

https://gist.github.com/xrstf/b48a970098a8e76943b9

Following this tutorial, I am using the /bin/nutch script.

Almost everything works: Nutch follows my seed URLs, HBase stores the downloads and Elasticsearch seems to be indexing the content. However, I can't get Kibana to visualize the content coming from Nutch. Kibana recognizes the index and its fields but shows no content. I've loaded the index in Kibana both with and without time-based events, to no avail.

I have other indexes and types in that Elasticsearch instance which Kibana can visualize, and I can query Elasticsearch with cURL and get the Nutch results just fine. I just can't get Kibana to visualize the content from Nutch.
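Since Kibana restricts searches to the selected time range when an index pattern is time-based, one way to narrow this down is to compare an unfiltered document count with a time-bounded one, sent directly to Elasticsearch. The following is a minimal Python sketch of that comparison, not a definitive diagnosis. The host `localhost:9200`, the index name `nutch` and the date field `tstamp` are assumptions for illustration, and the `filtered` query form is Elasticsearch 1.x syntax.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: local Elasticsearch node
INDEX = "nutch"                    # assumption: hypothetical index name
TIME_FIELD = "tstamp"              # assumption: hypothetical date field


def match_all_body():
    """Search body that matches every document and returns no hits, only the total."""
    return {"size": 0, "query": {"match_all": {}}}


def time_bounded_body(field, gte, lte):
    """ES 1.x 'filtered' query restricted to a time range on `field`."""
    return {
        "size": 0,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {field: {"gte": gte, "lte": lte}}},
            }
        },
    }


def total_hits(body, es_url=ES_URL, index=INDEX):
    """POST the body to /<index>/_search and return hits.total from the response."""
    req = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["hits"]["total"]
```

Calling `total_hits(match_all_body())` and `total_hits(time_bounded_body(TIME_FIELD, "now-15m", "now"))` against a running node gives the two counts. If the first is nonzero and the second is zero, the time field or the selected time range would be the first thing to check.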

I've tried two different ES + Kibana setups, redirecting the Nutch indexing output to each, and have the same problem on both. I have also tried deleting the index and starting over, creating the index first and then running 'nutch index -all', and trying a clean Elasticsearch / Kibana install.

I even went so far as to deploy Elasticsearch 1.4. However, that requires downgrading Kibana to v3, which I am having difficulty getting to work. I have confirmed (again) via cURL that the content is in Elasticsearch.

My guess is that it has something to do with the difference in ES versions, though if that were the problem, wouldn't the Transport Client simply fail on inserting?

Below are the logs from Kibana, which don't appear to show anything interesting.

{
  "name": "Kibana",
  "hostname": "VirtualBeast",
  "pid": 6695,
  "level": 30,
  "req": {
    "method": "POST",
    "url": "\/elasticsearch\/_msearch?timeout=0&ignore_unavailable=true&preference=1434483458287",
    "headers": {
      "host": "localhost:5601",
      "connection": "keep-alive",
      "content-length": "732",
      "accept": "application\/json, text\/plain, *\/*",
      "origin": "http:\/\/localhost:5601",
      "user-agent": "Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/43.0.2357.125 Safari\/537.36",
      "content-type": "application\/json;charset=UTF-8",
      "referer": "http:\/\/localhost:5601\/",
      "accept-encoding": "gzip, deflate",
      "accept-language": "en-US,en;q=0.8"
    },
    "remoteAddress": "127.0.0.1",
    "remotePort": 51632
  },
  "res": {
    "statusCode": 200,
    "responseTime": 12,
    "contentLength": 4992
  },
  "msg": "POST \/_msearch?timeout=0&ignore_unavailable=true&preference=1434483458287 200 - 12ms",
  "time": "2015-06-16T19:39:57.372Z",
  "v": 0
}
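The entry shows that the `_msearch` request itself succeeded (status 200, with a response body of 4992 bytes), so any problem is more likely in the content of the response than in connectivity. A small sketch for pulling those fields out of Kibana's JSON log entries follows, assuming each entry is a single JSON object shaped like the one above. The function name `summarize_kibana_entry` is hypothetical, and the sample is a trimmed copy of the entry with the headers omitted.

```python
import json


def summarize_kibana_entry(line):
    """Return the request/response fields of interest from one Kibana JSON log entry."""
    entry = json.loads(line)
    req = entry.get("req", {})
    res = entry.get("res", {})
    return {
        "method": req.get("method"),
        "url": req.get("url"),  # JSON "\/" escapes decode to a plain "/"
        "status": res.get("statusCode"),
        "response_ms": res.get("responseTime"),
        "content_length": res.get("contentLength"),
    }


# Trimmed copy of the log entry (headers omitted for brevity).
sample = r'''{"req": {"method": "POST", "url": "\/elasticsearch\/_msearch?timeout=0&ignore_unavailable=true&preference=1434483458287"}, "res": {"statusCode": 200, "responseTime": 12, "contentLength": 4992}}'''

summary = summarize_kibana_entry(sample)
print(summary)
```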

Any help would be appreciated. Do I need to upgrade the indexer to match the Elasticsearch version?

Thanks!
