sub group distribution-key is strange


The following is from the Vespa documentation. Since documents are distributed to all groups, each sub-group is a replica. Why are the distribution-keys different among the sub-groups? They are 0,1,2 for group0 but 3,4,5 for group1.

In my understanding, the distribution-keys should also be 0,1,2 in group1 and group2.

9 nodes, 3 groups with 3 nodes per group: This example has 3 groups and each group index all of the documents over the 3 nodes in the group. With 3 groups there are 3 replicas in total of each document, and each replica is indexed and active. Losing a node does not reduce search coverage.

<services version="1.0">
  <container id="stateless-container-cluster" version="1.0">  
    <search/>
    <document-api/>
    <nodes>
      <node hostalias="container0"/>
      <node hostalias="container1"/>
      <node hostalias="container2"/>
    </nodes>
  </container>

  <content id="my-content" version="1.0">
    <documents>
      <document type="my-document" mode="index"/>
    </documents>
    <redundancy>3</redundancy>
    <engine>
      <proton>
        <searchable-copies>3</searchable-copies>
      </proton>
    </engine>
    <group name="top-group">
      <distribution partitions="*|*|*"/>
      <group name="group0" distribution-key="0">
        <node hostalias="searcher1" distribution-key="0"/>
        <node hostalias="searcher2" distribution-key="1"/>
        <node hostalias="searcher3" distribution-key="2"/>
      </group>
      <group name="group1" distribution-key="1">
        <node hostalias="searcher4" distribution-key="3"/>
        <node hostalias="searcher5" distribution-key="4"/>
        <node hostalias="searcher6" distribution-key="5"/>
      </group>
      <group name="group3" distribution-key="2">
        <node hostalias="searcher7" distribution-key="6"/>
        <node hostalias="searcher8" distribution-key="7"/>
        <node hostalias="searcher9" distribution-key="8"/>
      </group>
    </group>
  </content>
</services>

Tor Brede Vekterli

This is partly due to historical reasons, as support for groups was built on top of the protocols used for managing a non-hierarchical cluster. This means messages are routed to nodes based on their individual distribution keys, which therefore have to be unique within a content cluster and across groups.
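To make the uniqueness rule concrete, here is a minimal sketch built from the group1 block in the question. It is a fragment rather than a complete services.xml. The commented-out part shows the per-group numbering the question expected. Going by the explanation above, that numbering would give two nodes in the same content cluster the same key, so it would not work.

```xml
<!-- Expected in the question, but keys 0, 1 and 2 already belong to
     searcher1, searcher2 and searcher3 in group0:
<group name="group1" distribution-key="1">
  <node hostalias="searcher4" distribution-key="0"/>
  <node hostalias="searcher5" distribution-key="1"/>
  <node hostalias="searcher6" distribution-key="2"/>
</group>
-->

<!-- As in the documentation: node keys continue from group0, so every
     node in the content cluster has its own key. -->
<group name="group1" distribution-key="1">
  <node hostalias="searcher4" distribution-key="3"/>
  <node hostalias="searcher5" distribution-key="4"/>
  <node hostalias="searcher6" distribution-key="5"/>
</group>
```

Note that the `distribution-key` on a `<group>` element appears to be numbered separately from the node keys: in the question's configuration both group0 and searcher1 use 0.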

Distribution keys should also be stable for the node's lifetime, as they affect the paths used on disk for storing indices and document data etc. Enforcing unique distribution keys across groups means that you can move a node to another group without the risk of accidental data loss due to suddenly getting a new distribution key.
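As an illustration of the move scenario, the fragment below relocates searcher3 from group0 to group1, using the hostaliases from the question. This is only a sketch of the idea. The uneven group sizes are an artifact of the example, not a recommendation.

```xml
<group name="group0" distribution-key="0">
  <node hostalias="searcher1" distribution-key="0"/>
  <node hostalias="searcher2" distribution-key="1"/>
</group>
<group name="group1" distribution-key="1">
  <!-- searcher3 moved here from group0 and keeps distribution-key 2 -->
  <node hostalias="searcher3" distribution-key="2"/>
  <node hostalias="searcher4" distribution-key="3"/>
  <node hostalias="searcher5" distribution-key="4"/>
  <node hostalias="searcher6" distribution-key="5"/>
</group>
```

Because key 2 was never used by any other node in the cluster, the moved node does not collide with anything in group1 and does not need a new key. With per-group numbering it would have clashed with whichever node already held key 2 in the target group.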

I'll take a look at our documentation to see if I can make this more intuitive. It's very reasonable to expect distribution keys to be hierarchical, so it should be clearly explained why this is not the case.