Uneven load distribution after data import to DSE Search cluster


I am experimenting with DataStax Enterprise Search. I have a two-node cluster and I am importing data using the Dataimport capability of the Solr console. I have virtual nodes disabled (num_tokens = 1 in cassandra.yaml), as per the "Configuring Solr" doc (http://www.datastax.com/docs/datastax_enterprise3.2/solutions/dse_search_schema#configuring-solr). My simplified schema is as follows:

<schema name="spatial" version="1.1">

<types>
    <fieldType name="string" class="solr.StrField" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> 
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" omitNorms="true"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"/>
    <fieldType name="binary" class="solr.BinaryField"/>

    <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
</types>

  <fields>
      <field name="id"  type="string" indexed="true"  stored="true"/>
      <field name="objectid" type="tint" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="guwi" type="string" indexed="true" stored="true" required="false" multiValued="false" />
      <field name="country" type="string" indexed="true" stored="true" required="false" multiValued="false" />
      <field name="region" type="string" indexed="true" stored="true" required="false" multiValued="false" />
      <field name="latlong" type="location" indexed="true" stored="false"/>
  </fields>
  <defaultSearchField>objectid</defaultSearchField>
  <uniqueKey>id</uniqueKey>
</schema>

The data import succeeds. However, when I run "nodetool status" I can see that the load is not evenly distributed across my two nodes; it is all concentrated on the node I used to perform the import. I tried changing uniqueKey to a composite key, such as (id,latlong), or even to just latlong, but that does not seem to change the load distribution. Am I missing something?

Thanks, Leon

There is 1 answer

RussS

Your problem, as seen in the nodetool output, is that the two nodes have tokens that are too close together. Because of this, node (10.30.161.137) is responsible for 94% of the token range.
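To illustrate how closely spaced tokens produce this kind of skew, here is a minimal sketch that computes each token's share of the ring. It assumes the Murmur3Partitioner token space (-2^63 to 2^63 - 1) and that a node owns the range from the previous token up to its own token. The function name and the sample tokens are hypothetical; they are not the values from the cluster in the question.

```python
RING_SIZE = 2**64  # assumption: Murmur3Partitioner token space


def ownership(tokens):
    """Return {token: fraction of the ring owned} for distinct tokens."""
    ordered = sorted(tokens)
    shares = {}
    for idx, token in enumerate(ordered):
        # Index 0 wraps around to the last token in the ring.
        previous = ordered[idx - 1]
        width = (token - previous) % RING_SIZE
        if width == 0:
            # A single node owns the whole ring.
            width = RING_SIZE
        shares[token] = width / RING_SIZE
    return shares


# Hypothetical tokens that sit close together on the ring:
# token 0 owns 0.9375 of the ring, token 2**60 owns 0.0625.
print(ownership([0, 2**60]))

# Evenly spaced tokens: each owns 0.5.
print(ownership([-2**63, 0]))
```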

This is most likely because when you set num_tokens = 1 you did not also set initial_token. When initial_token isn't set, the nodes may be assigned undesirable token values.

initial_token (Default: disabled) Used in the single-node-per-token architecture, where a node owns exactly one contiguous range in the ring space. If you haven't specified num_tokens or have set it to the default value of 1, you should always specify this parameter when setting up a production cluster for the first time and when adding capacity. For more information, see this parameter in the Cassandra 1.1 Node and Cluster Configuration documentation.

Configuring Cassandra

A token calculator is available here: Token Generator
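As a rough sketch of what such a token calculator computes, the snippet below spaces tokens evenly around the ring. Which formula applies depends on the partitioner configured in cassandra.yaml, so both common cases are shown; the function name and the partitioner labels are hypothetical and only for illustration.

```python
def balanced_tokens(node_count, partitioner="murmur3"):
    """Return evenly spaced initial_token values, one per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if partitioner == "murmur3":
        # Murmur3Partitioner: tokens range from -2**63 to 2**63 - 1.
        return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]
    if partitioner == "random":
        # RandomPartitioner: tokens range from 0 to 2**127 - 1.
        return [(2**127 // node_count) * i for i in range(node_count)]
    raise ValueError("unknown partitioner: " + partitioner)


# Two-node cluster, Murmur3Partitioner:
print(balanced_tokens(2))            # [-9223372036854775808, 0]
# Two-node cluster, RandomPartitioner:
print(balanced_tokens(2, "random"))  # [0, 85070591730234615865843651857942052864]
```

Each value would go into initial_token in that node's cassandra.yaml. On a cluster that is already running, changing a node's token normally means moving the node (for example with nodetool move) rather than only editing the file.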