Fields in documents are analyzed to create tokens. For example:
1. {"message":"hello world"} -> tokens: ["hello", "world"]
2. {"message":"hello"} -> tokens: ["hello"]
3. {"message":"world"} -> tokens: ["world"]
4. {"message":"hello java"} -> tokens: ["hello", "java"]
5. {"message":"java"} -> tokens: ["java"]
Is there a way to search for all documents in which a specific field contains a given token and one or more other tokens?
- For the token "hello", the result for the example above would be documents 1 and 4.
- For "world", it would be document 1.
As described in the termvectors documentation, one can access the tokens or statistics about them. However, this only works for specific documents, not as a search filter for a query or aggregation.
Would be nice if someone could help.
Yes, you can use the `token_count` type for this. For instance, in your mapping you can define `message` as a multi-field that contains the message itself (e.g. "hello", "hello world", etc.) and also the number of tokens in the message. Then you'll be able to include constraints on the word count in your queries.

So your mapping for `message` should declare the text field itself plus a `token_count` sub-field.

Then you can query for all documents having `hello` in the message, but only those whose `message` has more than one token. With a query that combines a match on `hello` with a range constraint of more than one on the token count, you'll only get `hello java` and `hello world`, but not `hello`.
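
For reference, the multi-field mapping described above could be written roughly as follows. This is a sketch, not a definitive mapping: the sub-field name `word_count`, the `standard` analyzer, and the index name `my_index` are assumptions chosen for illustration, and the typeless layout assumes a recent Elasticsearch version (older versions nest the properties under a type name and use `string` instead of `text`). The body would be sent with `PUT my_index`:

```json
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "word_count": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
```

With this in place, each indexed document stores the analyzed message under `message` and the number of tokens it produced under `message.word_count`. Documents indexed before the sub-field was added would need to be reindexed to get a count.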
Similarly, if you replace `hello` with `world` in that query, you'll only get `hello world`.
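
The filtered search described above can be sketched as a `bool` query with two filter clauses. The sub-field name `message.word_count` and the index name `my_index` are assumptions carried over for illustration (the sub-field must match whatever name the mapping actually uses), and `match` is used here as one reasonable way to test for the token. The body would be sent with `GET my_index/_search`:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "match": { "message": "hello" } },
        { "range": { "message.word_count": { "gt": 1 } } }
      ]
    }
  }
}
```

Swapping `"hello"` for `"world"` gives the second case from the question. Note that `token_count` counts tokens rather than distinct tokens, so a message such as "hello hello" would also pass the range filter.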