Filtration, aggregation and pagination for document array properties

26 views Asked by At

I'm trying to implement a flexible searcher for my frontend application.

Document structure

The structure of my documents looks like this:

{
  "student": {
    "id": 1,
    "name": "Joe"
  },
  "city": {
    "id": 102,
    "name": "London"
  },
  "tags": [
    { "id": 33, "name": "football" },
    { "id": 34, "name": "basketball" },
    { "id": 35, "name": "music" },
    ...
  ],
  "skills": [
    { "id": 302, "name": "Active listening" },
    { "id": 23, "name": "Collaboration" },
    { "id": 34, "name": "Communication" },
    ...
  ]
}

I upload multiple documents with the same structure but with different data. Various students might have different tags but also certain tag can be assigned to multiple users. The same relation applies to skills.

What I want to achieve?

Let's imagine that I have three <Selects />. The first one is used to select city, the second one to select tags and the third one to select skills. When I choose London from the first select the I want to get the list of all tags for students live in London. I would like to have a possibility to filter out tags (tags names) based on the text I enter to <Select />. What's more - that list should be paginated.

Example design

Mapping I currently use

{
  "mappings": {
    "properties": {
      "student": {
        "type": "nested",
        "properties": {
          "id": { "type": "integer" },
          "name": {
            "type": "keyword"
          }
        }
      },
      "city": {
        "type": "nested",
        "properties": {
          "id": { "type": "integer" },
          "name": {
            "type": "keyword"
          }
        }
      },
      "tags": {
        "type": "nested",
        "properties": { 
           "id": { "type": "integer" },
           "name": { "type": "keyword" }
      },
      "skills": {
        "type": "nested",
        "properties": {
            "id": { "type": "integer" },
            "name": { "type": "keyword" }
      }

My attempts

This is my first attempt to solve my case. It allows pagination but it doesn't support filtering out results based on text provided in <Select />

{
  "aggs": {
    "aggregatorField": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "aggregator": {
          "composite": {
            "size": 11,
            "sources": [
              {
                "aggregator": {
                  "terms": {
                    "field": "tags.id"
                  }
                }
              }
            ]
          },
          "aggs": {
            "item": {
              "top_hits": {
                "size": 1,
                "sort": [
                  {
                    "tags.id": {
                      "order": "desc"
                    }
                  }
                ],
                "_source": {
                  "include": [
                    "tags"
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

To solve add filtration I added normalizer to my mapping and applied it to tags and skills

"settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },

Now my search body looks like this:

{
  "aggs": {
    "aggregatorField": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "aggregator": {
          "terms": {
            "field": "tags.name",
            "include": ".*ball.*",
            "size": 10000
          },
          "aggs": {
            "item": {
              "top_hits": {
                "size": 1,
                "sort": [
                  {
                    "tags.id": {
                      "order": "desc"
                    }
                  }
                ],
                "_source": {
                  "include": [
                    "tags"
                  ]
                }
              }
            },
            "bucks": {
              "bucket_sort": {
                "from": 0,
                "size": 11
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

It supports searching by text but doesn't support pagination. Additionally, it doesn't look like an elegant query to me so I assume that maybe there is a better way to handle that case. Maybe there is a way to extract all tags and skills to a separate index and make something similar to sql join?

Thanks for the help!

1

There are 1 answers

0
G0l0s On

Your Way

I shortened your mapping

PUT /tags_by_city
{
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_asciifolding_normalizer": {
                    "type": "custom",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "city": {
                "type": "nested",
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "name": {
                        "type": "keyword"
                    }
                }
            },
            "tags": {
                "type": "nested",
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "name": {
                        "type": "keyword",
                        "normalizer": "lowercase_asciifolding_normalizer"
                    }
                }
            }
        }
    }
}

created some documents with my tags

PUT /tags_by_city/_bulk
{"create":{"_id":1}}
{"city":{"id":"301","name":"London"},"tags":[{"id":"1","name":"abcd"},{"id":"2","name":"bcde"},{"id":"4","name":"defg"},{"id":"5","name":"efgh"}]}
{"create":{"_id":2}}
{"city":{"id":"301","name":"London"},"tags":[{"id":"4","name":"cdef"},{"id":"4","name":"defg"},{"id":"5","name":"efgh"},{"id":"6","name":"fghi"},{"id":"7","name":"ghij"}]}
{"create":{"_id":3}}
{"city":{"id":"301","name":"New York"},"tags":[{"id":"4","name":"defg"},{"id":"5","name":"efgh"},{"id":"7","name":"ghij"},{"id":"8","name":"hijk"},{"id":"9","name":"ijkl"}]}

and shortened your aggregation query

GET /tags_by_city/_search?filter_path=aggregations.by_tags.by_tag_names.buckets.key
{
    "query": {
        "nested": {
            "path": "city",
            "query": {
                "term": {
                    "city.name": {
                        "value": "London"
                    }
                }
            }
        }
    },
    "aggs": {
        "by_tags": {
            "nested": {
                "path": "tags"
            },
            "aggs": {
                "by_tag_names": {
                    "terms": {
                        "field": "tags.name",
                        "include": ".*d.*",
                        "size": 10000
                    },
                    "aggs": {
                        "bucks": {
                            "bucket_sort": {
                                "sort": [
                                    {
                                        "_key": {
                                            "order": "desc"
                                        }
                                    }
                                ],
                                "from": 0,
                                "size": 4
                            }
                        }
                    }
                }
            }
        }
    }
}

This query paginates tags

"from": 0, "size": 2

{
    "aggregations" : {
        "by_tags" : {
            "by_tag_names" : {
                "buckets" : [
                    {
                        "key" : "defg"
                    },
                    {
                        "key" : "cdef"
                    }
                ]
            }
        }
    }
}

"from": 2, "size": 2

{
    "aggregations" : {
        "by_tags" : {
            "by_tag_names" : {
                "buckets" : [
                    {
                        "key" : "bcde"
                    },
                    {
                        "key" : "abcd"
                    }
                ]
            }
        }
    }
}

but sort tags only by name (or document count)

Separate Index

I don't know a way to extract all tags to a separate index only inside Elasticsearch. This way found a new problem - synchronization