How to search over all fields and return every document containing the search term in Elasticsearch?

I have a problem with searching in Elasticsearch. I have an index with multiple documents, each with several fields. I want to run a query over all the fields and have it return every document that contains the value specified in the query. I found that simple_query_string worked well for this. However, it does not return consistent results. In my index, several fields contain dates. For example:

"revisionDate" : "2008-01-01T00:00:00",
"projectSmirCreationDate" : "2008-07-01T00:00:00",
"changedDate" : "1971-01-01T00:00:00",
"dueDate" : "0001-01-01T00:00:00",

These are just a few examples. However, when I run a search such as:

GET new_document-20_v2/_search
{
  "size": 1000, 
  "query": {
    "simple_query_string" : {
        "query": "2008"
    }
  }
}

It returns only two documents. This is a problem because many more than two documents contain the value "2008" in their fields.

I also have a problem searching file names. Some fields in my index contain file names, like these:

"fileName" : "testPDF.pdf",
"fileName" : "demo.pdf",
"fileName" : "demo.txt",

When I query:

GET new_document-20_v2/_search
{
  "size": 1000, 
  "query": {
    "simple_query_string" : {
        "query": "demo"
    }
  }
}

I get no results, but if I query:

GET new_document-20_v2/_search
{
  "size": 1000, 
  "query": {
    "simple_query_string" : {
        "query": "demo.txt"
    }
  }
}

I get the proper result.

Is there a better way to search across all documents and fields than what I did? I want the search to return all the documents matching the query, not just two or none. Any help would be greatly appreciated.

Answer by ESCoder (best answer)

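Elasticsearch uses the standard analyzer for text fields if no analyzer is specified. Since no analyzer is specified on "fileName", demo.txt is kept as a single token, because the standard tokenizer (which follows the Unicode word-boundary rules) does not split on a period between two letters: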

{
  "tokens": [
    {
      "token": "demo.txt",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

So a search for demo returns no results, because no indexed token equals demo, while a search for demo.txt matches that token exactly.
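To check how a value is tokenized, you can send it to the _analyze API (GET _analyze) with a body like the one below. It uses only the built-in standard analyzer, so it does not depend on your index, and for demo.txt it should return the single token shown above.

```json
{
  "analyzer": "standard",
  "text": "demo.txt"
}
```

To test against the analyzer that is actually mapped on the field, send the request to new_document-20_v2/_analyze and use "field": "fileName" in place of "analyzer": "standard".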


You can instead use a wildcard query to find documents whose fileName starts with demo:

{
  "query": {
    "wildcard": {
      "fileName": {
        "value": "demo*"
      }
    }
  }
}

The search result will be:

"hits": [
      {
        "_index": "67303015",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "fileName": "demo.pdf"
        }
      },
      {
        "_index": "67303015",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "fileName": "demo.txt"
        }
      }
    ]
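If you would rather keep using simple_query_string, a trailing * in its query string acts as a prefix operator, so a request like the sketch below (sent to new_document-20_v2/_search) should match the same documents as the wildcard query. Restricting "fields" to fileName is an assumption made here to keep the prefix query off the date fields.

```json
{
  "size": 1000,
  "query": {
    "simple_query_string": {
      "query": "demo*",
      "fields": ["fileName"]
    }
  }
}
```

Like the wildcard query, this matches tokens that start with demo, so a file name such as mydemo.pdf would not be found.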

Since revisionDate, projectSmirCreationDate, changedDate and dueDate are all of type date, you cannot do a partial search on these dates.
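If the actual goal is to find documents whose dates fall within a given year, a range query on the date field itself is an alternative that needs no mapping change. A minimal sketch for revisionDate (one of the fields from the question); to cover several date fields, you would combine one such clause per field inside a bool query's should array:

```json
{
  "query": {
    "range": {
      "revisionDate": {
        "gte": "2008-01-01",
        "lt": "2009-01-01"
      }
    }
  }
}
```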

You can use multi-fields to add one more sub-field, of type text, to each of the above fields. Modify your index mapping as shown below:

{
  "mappings": {
    "properties": {
      "changedDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      },
      "projectSmirCreationDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      },
      "dueDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      },
      "revisionDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      }
    }
  }
}
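If the index already exists, new multi-fields can be added to existing fields through the update mapping API (PUT new_document-20_v2/_mapping) instead of recreating the index. The body below shows only revisionDate; the other date fields follow the same pattern. It assumes the existing revisionDate mapping has no custom format; if it does, repeat that format here, since the update must not change existing parameters.

```json
{
  "properties": {
    "revisionDate": {
      "type": "date",
      "fields": {
        "raw": {
          "type": "text"
        }
      }
    }
  }
}
```

Documents that are already indexed only get the new sub-field once they are indexed again, for example with POST new_document-20_v2/_update_by_query.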

Index Data:

{
  "revisionDate": "2008-02-01T00:00:00",
  "projectSmirCreationDate": "2008-02-01T00:00:00",
  "changedDate": "1971-01-01T00:00:00",
  "dueDate": "0001-01-01T00:00:00"
}
{
  "revisionDate": "2008-01-01T00:00:00",
  "projectSmirCreationDate": "2008-07-01T00:00:00",
  "changedDate": "1971-01-01T00:00:00",
  "dueDate": "0001-01-01T00:00:00"
}

Search Query:

{
  "query": {
    "multi_match": {
      "query": "2008"
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "67303015",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "revisionDate": "2008-01-01T00:00:00",
          "projectSmirCreationDate": "2008-07-01T00:00:00",
          "changedDate": "1971-01-01T00:00:00",
          "dueDate": "0001-01-01T00:00:00"
        }
      },
      {
        "_index": "67303015",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "revisionDate": "2008-02-01T00:00:00",
          "projectSmirCreationDate": "2008-02-01T00:00:00",
          "changedDate": "1971-01-01T00:00:00",
          "dueDate": "0001-01-01T00:00:00"
        }
      }
    ]
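Without a "fields" parameter, multi_match searches the fields given by the index.query.default_field setting, which defaults to *, i.e. all eligible fields. A variant that targets only the text sub-fields and fileName could list them explicitly, since multi_match accepts wildcards in field names; the *.raw pattern assumes the raw sub-field naming from the mapping above.

```json
{
  "query": {
    "multi_match": {
      "query": "2008",
      "fields": ["*.raw", "fileName"]
    }
  }
}
```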