I have a custom class indexed in ES 2.5 with the following fields:
Title
DataSources
Content
Searching works fine, except on the middle field (DataSources), which is built and indexed as a list of numbers delimited by '|'.
For example: "|4|7|8|9|10|12|14|19|20|21|22|23|29|30"
I need to build a query that matches the search words in all fields AND matches at least one of the given numbers in the DataSources field.
So to summarize what I currently have:
QueryBase query = new SimpleQueryStringQuery
{
//DefaultOperator = !operatorOR ? Operator.And : Operator.Or,
Fields = LearnAboutFields.FULLTEXT,
Analyzer = "standard",
Query = searchWords.ToLower()
};
_boolQuery.Must = new QueryContainer[] {query};
That's the search words query.
QueryContainer queryContainer = null;
foreach (var datasource in dataSources)
{
    // Add DataSources with an OR
    queryContainer |= new WildcardQuery { Field = LearnAboutFields.DATASOURCE, Value = string.Format("*{0}*", datasource) };
}
// Add this Boolean Clause to our outer clause with an AND
_boolQuery.Filter = new QueryContainer[] { queryContainer };
That's the DataSources query. There can be multiple data sources.
It doesn't work: as soon as the filter query is added, it returns no results. I think I need to change the tokenizer/analyzer, but I don't know enough about ES to figure that out.
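To see why matching a data source with a `*{0}*` pattern is fragile, here is a minimal Python sketch that applies the same substring idea to the example value from the question, using plain string operations rather than Elasticsearch. Looking for data source `1` succeeds even though `1` is not in the list, because it occurs inside `10`, `12`, `14`, `19` and `21`. The helper names are hypothetical, and this is only an analogy: a real wildcard query is evaluated against the indexed terms, so its exact behaviour depends on how the field was analyzed.

```python
import fnmatch

RAW = "|4|7|8|9|10|12|14|19|20|21|22|23|29|30"

def wildcard_match(raw: str, datasource: str) -> bool:
    """Mimic a '*{0}*' pattern applied to the whole, untokenized string."""
    return fnmatch.fnmatchcase(raw, "*{0}*".format(datasource))

def exact_match(raw: str, datasource: str) -> bool:
    """Compare against the individual '|'-delimited numbers instead."""
    return datasource in [part for part in raw.split("|") if part]

if __name__ == "__main__":
    for ds in ["1", "4", "19"]:
        # Prints the data source, the substring result and the exact result.
        print(ds, wildcard_match(RAW, ds), exact_match(RAW, ds))
```

The substring test reports a false positive for `1`, while comparing whole numbers does not; that is the motivation for tokenizing the field on `|`.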
EDIT: Following Val's answer below, I have rewritten the index creation like this:
_elasticClientWrapper.CreateIndex(_DataSource, i => i
.Mappings(ms => ms
.Map<LearnAboutContent>(m => m
.Properties(p => p
.String(s => s.Name(lac => lac.DataSources)
.Analyzer("classic_tokenizer")
.SearchAnalyzer("standard")))))
.Settings(s => s
.Analysis(an => an.Analyzers(a => a.Custom("classic_tokenizer", ca => ca.Tokenizer("classic"))))));
var indexResponse = _elasticClientWrapper.IndexMany(contentList);
The index is created and populated successfully. However, the query still isn't working.
New query for DataSources:
QueryContainer queryContainer = null;
foreach (var datasource in dataSources)
{
    // Add DataSources with an OR
    queryContainer |= new TermQuery { Field = LearnAboutFields.DATASOURCE, Value = datasource };
}
// Add this Boolean Clause to our outer clause with an AND.
// Assigning Must here would replace the search-words clause set earlier, so use Filter.
_boolQuery.Filter = new QueryContainer[] { queryContainer };
And the resulting index metadata (mappings and settings) as JSON:
{
  "learnabout_index": {
    "aliases": {},
    "mappings": {
      "learnaboutcontent": {
        "properties": {
          "articleID": { "type": "string" },
          "content": { "type": "string" },
          "dataSources": {
            "type": "string",
            "analyzer": "classic_tokenizer",
            "search_analyzer": "standard"
          },
          "description": { "type": "string" },
          "fileName": { "type": "string" },
          "keywords": { "type": "string" },
          "linkURL": { "type": "string" },
          "title": { "type": "string" }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1483992041623",
        "analysis": {
          "analyzer": {
            "classic_tokenizer": {
              "type": "custom",
              "tokenizer": "classic"
            }
          }
        },
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "iZakEjBlRiGfNvaFn-yG-w",
        "version": { "created": "2040099" }
      }
    },
    "warmers": {}
  }
}
The query JSON request:
{
"size": 10000,
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"fields": [
"_all"
],
"query": "\"housing\"",
"analyzer": "standard"
}
}
],
"filter": [
{
"terms": {
"DataSources": [
"1"
]
}
}
]
}
}
}
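One thing worth checking: the mapping above stores the field as `dataSources` (NEST camel-cases property names by default), whereas this request filters on `DataSources`. Field names in Elasticsearch are case-sensitive, so a `terms` filter on `DataSources` refers to a field that does not exist in this mapping and matches nothing. The Python sketch below only builds the request body as a dictionary (no client is involved); `build_search_body` is a hypothetical helper, and the field names are taken from the mapping JSON above.

```python
import json

def build_search_body(search_words, data_sources, ds_field="dataSources"):
    """Build a bool query: the search words must match, and at least one
    requested data source must appear in the tokenized data sources field.
    ds_field defaults to the camel-cased name shown in the mapping."""
    return {
        "size": 10000,
        "query": {
            "bool": {
                "must": [
                    {
                        "simple_query_string": {
                            "fields": ["_all"],
                            "query": search_words.lower(),
                            "analyzer": "standard",
                        }
                    }
                ],
                # A terms filter keeps documents containing ANY of the listed values.
                "filter": [{"terms": {ds_field: [str(ds) for ds in data_sources]}}],
            }
        },
    }

if __name__ == "__main__":
    print(json.dumps(build_search_body('"Housing"', [1, 4]), indent=2))
```

Apart from the field name, the generated body has the same shape as the request above.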
One way to achieve this is to create a custom analyzer with a classic tokenizer, which will break your DataSources field into the numbers composing it, i.e. it will tokenize the field on each | character.
So when you create your index, you need to add this custom analyzer and then use it as the analyzer of your DataSources field.
As a result, if you index the string "|4|7|8|9|10|12|14|19|20|21|22|23|29|30", your DataSources field will effectively contain the following array of tokens: [4, 7, 8, 9, 10, 12, 14, 19, 20, 21, 22, 23, 29, 30]
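The tokenization step can be approximated in a few lines of Python. This is only a rough model of what the `classic` tokenizer does with this particular kind of input (digits separated by `|`, where each run of digits becomes one token and the separators are dropped); it is not a reimplementation of the tokenizer, and `approximate_tokens` is a hypothetical name.

```python
import re

RAW = "|4|7|8|9|10|12|14|19|20|21|22|23|29|30"

def approximate_tokens(raw: str) -> list:
    """Rough stand-in for the classic tokenizer on '|'-delimited numbers:
    every run of digits becomes one token; the '|' separators are dropped."""
    return re.findall(r"\d+", raw)

if __name__ == "__main__":
    print(approximate_tokens(RAW))
```

Because `1` is not a token on its own, an exact token match for `1` no longer hits `10`, `12` and so on.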
Then you can get rid of your WildcardQuery and simply use a TermsQuery instead.
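Conceptually, a `terms` query on a tokenized field is an OR over exact token matches: a document is kept if at least one requested value equals one of its tokens. Here is a minimal Python model of that rule, approximating the tokenizer by taking every run of digits as a token. `matches_any` is a hypothetical helper, and scoring and filter caching are not modelled.

```python
import re

RAW = "|4|7|8|9|10|12|14|19|20|21|22|23|29|30"

def matches_any(raw: str, wanted) -> bool:
    """Model of a terms filter: true if any wanted value is an exact token."""
    tokens = set(re.findall(r"\d+", raw))  # approximate the tokenizer output
    return any(str(value) in tokens for value in wanted)

if __name__ == "__main__":
    print(matches_any(RAW, [1]))      # "1" is not a token of RAW
    print(matches_any(RAW, [1, 19]))  # "19" is, so the OR succeeds
```

This is the same OR semantics the question builds by combining one term query per data source with `|=`; a single `terms` query simply expresses it in one clause.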