How to store relational data in elasticsearch

Question

12.5k views Asked by mkalsi At 07 September 2017 at 16:16

What are the options for storing relational data in Elasticsearch? I know of the following approaches:

Nested object: I don't want to store the data in nested format, because I want to update one document without changing another, and with nested objects the child data is repeated in every parent document.
Parent-child: I don't want to store the data in a single index, but parent-child requires both document types to live in one index (as different types). I know this restriction will be removed in a future release, as mentioned in https://github.com/elastic/elasticsearch/issues/15613, but I want a solution that works with version 5.5.

Is there any approach other than the above?

There are 3 answers

**AudioBubble** · Answer 1 · 2017-09-08T00:04:45+00:00

There are two more approaches: denormalization, and running multiple queries to perform the join in your application.

Denormalization takes up more space and increases your write time, but you only need to run one query to retrieve your data, so your read time improves. Since you don't want to store the data in a single index, application-side joins might help you out.
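The "multiple queries" approach amounts to a join done in the application. Here is a minimal sketch, assuming children live in their own index and each child carries a `parent_id` field (the field name and both helper functions are hypothetical, not from the answer). The first search finds the parents, the second fetches their children with a `terms` query, and the application stitches the two result sets together:

```python
from collections import defaultdict


def build_children_query(parent_ids):
    """Build the body of the second search: all children of the given parents.

    Assumes each child document stores its parent's id in a `parent_id` field.
    """
    return {"query": {"terms": {"parent_id": sorted(set(parent_ids))}}}


def join_in_application(parents, children):
    """Attach each child to its parent, as the application would after both searches."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child["parent_id"]].append(child)
    return [
        dict(parent, childs=by_parent.get(parent["parent_id"], []))
        for parent in parents
    ]


# Stand-ins for the hits returned by the two searches.
parents = [{"parent_id": 1, "parent_name": "ABC"}]
children = [
    {"child_id": 1, "parent_id": 1, "child_name": "PQR"},
    {"child_id": 2, "parent_id": 1, "child_name": "XYZ"},
]

second_query = build_children_query(p["parent_id"] for p in parents)
joined = join_in_application(parents, children)
```

The cost is one extra round trip per join level, which is why this tends to work best when the first query returns a small set of ids.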

**Hatim Stovewala** · Answer 2 · 2017-09-08T21:20:25+00:00

Nested objects are a perfect fit for this. Child objects are not repeated in the parent document if you update them correctly. I use the same approach in one of my use cases, where I need to maintain relational data with a master-child one-to-many relationship. I wrote a Painless script for the Update API that adds new nested child objects to the parent document, or updates existing ones, without creating duplicates.

Updated Answer:

Below is the structure of the parent document with the nested child documents embedded in "childs".

{
    "parent_id": 1,
    "parent_name": "ABC",
    "parent_number": 123,
    "parent_addr": "123 6th St. Melbourne, FL 32904",
    "childs": [
      {
        "child_id": 1,
        "child_name": "PQR",
        "child_number": 456,
        "child_age": 10
      },
      {
        "child_id": 2,
        "child_name": "XYZ",
        "child_number": 789,
        "child_age": 12
      },
      {
        "child_id": 3,
        "child_name": "QWE",
        "child_number": 234,
        "child_age": 16
      }
    ]
}

Mapping would be as below:

PUT parent
{
  "mappings": {
    "parent": {
      "properties": {
        "parent_id": {
          "type": "long"
        },
        "parent_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "parent_number": {
          "type": "long"
        },
        "parent_addr": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "childs": {
          "type": "nested",
          "properties": {
            "child_id": {
              "type": "long"
            },
            "child_name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "child_number": {
              "type": "long"
            },
            "child_age": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

In an RDBMS, these two entities (parent, child) are two different tables with a one-to-many relation from parent to child. The parent's id is a foreign key in each child row. (An id is a must for both tables.)

In Elasticsearch, we need an id to index the parent document; in this case it is parent_id. The request below indexes the parent document with _id = 1, the same value as parent_id:

POST parent/parent/1
{
    "parent_id": 1,
    "parent_name": "ABC",
    "parent_number": 123,
    "parent_addr": "123 6th St. Melbourne, FL 32904"
}

Now add children to the parent. Each child document needs its own child id, and you must know the parent id to address the parent document. Below is the update request that adds a new child or updates one that is already present.

POST parent/parent/1/_update
{
    "script": {
        "lang": "painless",
        "inline": "if (!ctx._source.containsKey(\"childs\")) {
                ctx._source.childs = [];
                ctx._source.childs.add(params.child);
            } else {
                int flag=0;
                for(int i=0;i<ctx._source.childs.size();i++){
                    if(ctx._source.childs[i].child_id==params.child.child_id){
                        ctx._source.childs[i]=params.child;
                        flag++;
                    }
                }
                if(flag==0){
                    ctx._source.childs.add(params.child);
                }
            }",
        "params": {
            "child": {
                "child_id": 1,
                "child_name": "PQR",
                "child_number": 456,
                "child_age": 10
            }
        }
    }
}
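The script's logic is easier to check outside Elasticsearch. The following is a plain-Python mirror of it, meant only for reasoning about the behavior: it does not talk to a cluster, and `upsert_child` is a hypothetical name. It replaces the child whose `child_id` matches, and appends the child otherwise:

```python
def upsert_child(source, child):
    """Mirror of the Painless script: replace the child with the same child_id, else append."""
    # Create the "childs" list on first use, like the containsKey branch in the script.
    childs = source.setdefault("childs", [])
    replaced = False
    for i, existing in enumerate(childs):
        if existing["child_id"] == child["child_id"]:
            childs[i] = child
            replaced = True
    if not replaced:
        childs.append(child)
    return source


parent = {"parent_id": 1, "parent_name": "ABC"}
upsert_child(parent, {"child_id": 1, "child_name": "PQR", "child_age": 10})
upsert_child(parent, {"child_id": 2, "child_name": "XYZ", "child_age": 12})
# Same child_id again: the existing entry is overwritten, not duplicated.
upsert_child(parent, {"child_id": 1, "child_name": "PQR", "child_age": 11})
```

Running the same update twice leaves one entry per `child_id`, which is the "no duplicates" property described above.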

Give it a shot. Cheers!

Let me know if you need anything else.

**Nacho** · Answer 3 · 2023-01-04T21:58:33+00:00

There are four mechanisms that can be used to provide relational data modelling support. Each has its pros and cons, making them useful for different situations... here's a summary:

Inner Object

Easy, fast, performant
Only applicable when one-to-one relationships are maintained
No need for special queries
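The one-to-one restriction comes from how arrays of inner objects are indexed: each key becomes one multi-valued field, so the link between values of the same object is lost. The sketch below is a simplified model of that behavior in plain Python, not Elasticsearch code, and the helper names are hypothetical:

```python
def flatten_inner_objects(doc, field):
    """Approximate how an array of inner objects is indexed: one multi-valued field per key."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_all(flat, conditions):
    """True if every (field, value) condition is satisfied by some value of that field."""
    return all(value in flat.get(name, []) for name, value in conditions.items())


doc = {
    "childs": [
        {"child_name": "PQR", "child_age": 10},
        {"child_name": "XYZ", "child_age": 12},
    ]
}
flat = flatten_inner_objects(doc, "childs")

# No single child is named PQR *and* aged 12, yet the flattened form matches both conditions.
cross_match = matches_all(flat, {"childs.child_name": "PQR", "childs.child_age": 12})
```

With a single inner object per document there is nothing to mix up, which is why inner objects are safe for one-to-one data.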

Nested

Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.
Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs.
"Cross referencing" nested documents is impossible
Best suited for data that does not change frequently
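Nested fields are searched with the `nested` query, which evaluates its inner query against each nested document separately. A minimal sketch of such a request body, built as a Python dict and assuming a field named `childs` mapped as `nested` with `child_name` and `child_age` sub-fields (the function name is hypothetical):

```python
import json


def nested_child_query(name, min_age):
    """Body for a search where one and the same nested child must satisfy both conditions."""
    return {
        "query": {
            "nested": {
                "path": "childs",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"childs.child_name": name}},
                            {"range": {"childs.child_age": {"gte": min_age}}},
                        ]
                    }
                },
            }
        }
    }


body = nested_child_query("PQR", 10)
# The serialized body is what would be sent to the index's _search endpoint.
payload = json.dumps(body)
```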

Parent/Child

Children are stored separately from the parent, but are routed to the same shard. So parent/child is slightly less performant on read/query than nested
Parent/child mappings have a bit of extra memory overhead, since ES maintains a "join" list in memory
Updating a child doc does not affect the parent or any other children, which can potentially save a lot of indexing on large docs
Sorting/scoring can be difficult with Parent/Child since the Has Child/Has Parent operations can be opaque at times
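For the 5.x line the question targets, parent/child is declared with a `_parent` entry in the child type's mapping, children are indexed with a `parent` URL parameter, and parents are found through their children with `has_child`. The sketch below only builds those request pieces as Python values; the type names `parent` and `child`, the index name, and the helper functions are assumptions for illustration:

```python
def parent_child_mappings(parent_type="parent", child_type="child"):
    """Index-creation body (5.x style): the child type points at its parent type."""
    return {"mappings": {parent_type: {}, child_type: {"_parent": {"type": parent_type}}}}


def child_index_path(index, child_type, child_id, parent_id):
    """Request path for indexing a child; `parent` routes it to the parent's shard."""
    return f"/{index}/{child_type}/{child_id}?parent={parent_id}"


def has_child_query(child_type, child_query):
    """Search body returning parents that have at least one matching child."""
    return {"query": {"has_child": {"type": child_type, "query": child_query}}}


mappings = parent_child_mappings()
path = child_index_path("family", "child", 1, 1)
search = has_child_query("child", {"match": {"child_name": "PQR"}})
```

Because both types must live in the same index in 5.x, this does not lift the single-index restriction raised in the question.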

Denormalization

You get to manage all the relations yourself!
Most flexible, most administrative overhead
May be more or less performant depending on your setup
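"Managing the relations yourself" mostly means copying the parent fields you need into every child document and rewriting those copies when the parent changes. A minimal in-memory sketch of that bookkeeping, with hypothetical helper names and field choices:

```python
def denormalize(parent, child):
    """Build a standalone child document carrying a copy of the parent fields it needs."""
    return dict(child, parent_id=parent["parent_id"], parent_name=parent["parent_name"])


def propagate_parent_rename(child_docs, parent_id, new_name):
    """Return rewritten copies of every child doc that embeds the renamed parent."""
    return [
        dict(doc, parent_name=new_name)
        for doc in child_docs
        if doc["parent_id"] == parent_id
    ]


parent = {"parent_id": 1, "parent_name": "ABC"}
child_docs = [
    denormalize(parent, {"child_id": 1, "child_name": "PQR"}),
    denormalize(parent, {"child_id": 2, "child_name": "XYZ"}),
]

# Renaming the parent means reindexing every child that carries its name.
to_reindex = propagate_parent_rename(child_docs, 1, "ABC Ltd")
```

Against a real cluster the same fan-out could be done with bulk updates or the Update By Query API; either way, a change to one parent costs one write per child that copies it.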

For more info, please go to: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
