How do I send a list of nested boolean filter queries to Elasticsearch using pycurl?


I am using Elasticsearch 1.5 with eight nodes and a separate client node. I send filter requests to the cluster from Python using pycurl, and I'm trying to build nested boolean filters. Each of two boolean filters works on its own, but when I combine them in a single request I get the following failure:

{u'error': u'SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[hT6TiTqoTpGaCr45SrjUtg][ships][0]: RemoteTransportException[[Kaitlyn][inet[/172.31.14.203:9300]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[ships][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"qu1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ufffd\ufffd\ufffd3\ufffd\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"must": 1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ufffd\ufffd\ufffd3\ufffd\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"melia"}1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ufffd\ufffd\ufffd3\ufffd\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00": {"sho1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ufffd\ufffd\ufffd3\ufffd\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}}, {"te1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ufffd\ufffd\ufffd3\ufffd\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 {"match1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00]]]; nested: JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in name\n at [Source: UNKNOWN; line: 1, column: 7]]; }

Can anyone tell me what I'm doing wrong?

BACKGROUND

The code further below works when data is defined as

data = {
  "query": {
    "filtered": {
      "query": {"match_all":{}},
      "filter": {
        "bool": {
          "must": [
            {"bool": {
                "should": [
                  { "term": {"stuff":"thingamabobs"}},
                  { "term": {"stuff":"trucs"}}
                ]
              }
            }
          ]
        }
      }
    }
  }
}

and when data is defined as

data = {
  "query": {
    "filtered": {
      "query": {"match_all":{}},
      "filter": {
        "bool": {
          "must": [
            {"bool": {
                "should": [
                  { "term": {"name":"melia"}},
                  { "term": {"name":"heli"}}
                ]
              }
            },
          ]
        }
      }
    }
  }
}

but not when data is defined as (the two filters combined, apparently incorrectly)

data = {
  "query": {
    "filtered": {
      "query": {"match_all":{}},
      "filter": {
        "bool": {
          "must": [
            {"bool": {
                "should": [
                  { "term": {"name":"melia"}},
                  { "term": {"name":"heli"}}
                ]
              }
            },
            {"bool": {
                "should": [
                  { "term": {"stuff":"thingamabobs"}},
                  { "term": {"stuff":"trucs"}}
                ]
              }
            }
          ]
        }
      }
    }
  }
}
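The combined structure can also be generated programmatically, which rules out a hand-typed bracket mismatch. This is only a minimal sketch: the helper names `should_terms` and `build_filtered_query` are hypothetical and not part of any library, and the field names are the ones from the examples above.

```python
import json

def should_terms(field, values):
    # Hypothetical helper: one bool/should group of term filters on a field.
    return {"bool": {"should": [{"term": {field: v}} for v in values]}}

def build_filtered_query(groups):
    # Hypothetical helper: wrap the groups in a filtered query with bool/must,
    # mirroring the hand-written documents above.
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": list(groups)}},
            }
        }
    }

data = build_filtered_query([
    should_terms("name", ["melia", "heli"]),
    should_terms("stuff", ["thingamabobs", "trucs"]),
])

# Pretty-print the document to compare it with the hand-written version.
print(json.dumps(data, indent=2))
```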

Here is the code snippet.

import pycurl
import json
from StringIO import StringIO
from pprint import pprint
import time

def handlePycurlResult(result):
  resultDict = json.loads(result)
  print 'handlePycurlResult', type(resultDict)
  pprint(resultDict)

data = ...

c = pycurl.Curl()
c.setopt(c.WRITEFUNCTION, handlePycurlResult)
c.setopt(c.URL, 'http://localhost:9200/owners/owner/_search?pretty')
c.setopt(pycurl.HTTPHEADER, ['Accept: application/json'])
c.setopt(c.POST, 1)
c.setopt(c.POSTFIELDS, json.dumps(data))
c.perform()
c.close()

With the exception of the JSON documents, which I construct from Python objects, the above code effectively copies the approach from the question "Convert curl example to pycurl".

There is 1 answer

Answer by Val

I don't see anything wrong with your query as written. However, the error complains about NUL characters (i.e. \x00) in the JSON that Elasticsearch received, and that is the issue:

Failed to parse source [{"qu1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ufffd\ufffd\ufffd3\ufffd\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"must"
...
JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in name 
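To check whether those characters are already present before the request leaves Python, you could scan the serialized body for raw control characters. This is just a sketch: `find_control_chars` is a name I made up, and the query is a shortened form of the one in the question.

```python
import json

def find_control_chars(payload):
    # Return (index, code point) pairs for every raw control character
    # (code point below 0x20) found in the payload string.
    return [(i, ord(ch)) for i, ch in enumerate(payload) if ord(ch) < 0x20]

query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"bool": {"must": [
                {"bool": {"should": [{"term": {"name": "melia"}},
                                     {"term": {"name": "heli"}}]}},
                {"bool": {"should": [{"term": {"stuff": "thingamabobs"}},
                                     {"term": {"stuff": "trucs"}}]}},
            ]}},
        }
    }
}

body = json.dumps(query)

# The body produced by json.dumps should contain no raw control characters.
print(find_control_chars(body))

# A fragment like the one quoted in the error is flagged immediately.
print(find_control_chars('{"qu1\x00\x00'))
```

If the first list is empty, the serialized query itself is clean, which would suggest the body is being altered somewhere between `json.dumps` and the server.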

If you're editing your query elsewhere and then copying and pasting it into your Python code, try validating it first (e.g. at http://jsonformatter.curiousconcept.com) and paste the resulting compact JSON (i.e. without whitespace) into your code to see what happens.
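The same validation and compaction can be done locally with Python's standard `json` module. A minimal sketch, where `compact_json` is a hypothetical helper name and `pasted` stands in for whatever text you copied:

```python
import json

def compact_json(text):
    # Parse the text as JSON, then re-serialize it without optional whitespace.
    # json.loads raises ValueError if the text is not valid JSON.
    return json.dumps(json.loads(text), separators=(',', ':'))

pasted = '''
{ "term": { "name": "melia" } }
'''

print(compact_json(pasted))
```

Note that a trailing comma inside a list, which a Python literal accepts, is rejected by `json.loads`, so a document that works as a Python dict may still fail this check when pasted as JSON text.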