I am experimenting with DataStax Enterprise Search. I have a two-node cluster and I am importing data using the Dataimport feature of the Solr console. Virtual nodes are disabled (num_tokens = 1 in cassandra.yaml), as per the "Configuring Solr" doc (http://www.datastax.com/docs/datastax_enterprise3.2/solutions/dse_search_schema#configuring-solr). My simplified schema is as follows:
<schema name="spatial" version="1.1">
<types>
<fieldType name="string" class="solr.StrField" omitNorms="true"/>
<fieldType name="boolean" class="solr.BoolField" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" omitNorms="true"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"/>
<fieldType name="binary" class="solr.BinaryField"/>
<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
</types>
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="objectid" type="tint" indexed="true" stored="true" required="true" multiValued="false" />
<field name="guwi" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="region" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="latlong" type="location" indexed="true" stored="false"/>
</fields>
<defaultSearchField>objectid</defaultSearchField>
<uniqueKey>id</uniqueKey>
</schema>
The data import succeeds. However, when I run "nodetool status" I can see that the load is not evenly distributed across my two nodes; it is all concentrated on the node I used to perform the import. I tried changing uniqueKey to a composite key, such as (id,latlong), or even to just latlong, but that does not seem to change the load distribution. Am I missing something?
Thanks, Leon
Your problem, as seen in the nodetool output, is that the two nodes have tokens that are too close together. As a result, one node (10.30.161.137) is responsible for 94% of the token range.
This is most likely because when you set num_tokens = 1 you did not also set initial_token in cassandra.yaml. When initial_token isn't set, the token is chosen for you, and the assigned values may leave the ring badly unbalanced.
See the "Configuring Cassandra" documentation for details. A token calculator is also available: Token Generator.
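If you would rather compute the tokens yourself, here is a minimal sketch of the usual even-spacing calculation. The function names are my own (hypothetical, not part of any Cassandra tool), and it assumes you know which partitioner your cluster uses (the partitioner setting in cassandra.yaml): Murmur3Partitioner tokens range from -2^63 to 2^63 - 1, while RandomPartitioner tokens range from 0 to 2^127 - 1.

```python
def murmur3_tokens(node_count):
    """Evenly spaced tokens for Murmur3Partitioner (range -2**63 .. 2**63 - 1)."""
    step = 2**64 // node_count
    return [-(2**63) + i * step for i in range(node_count)]


def random_partitioner_tokens(node_count):
    """Evenly spaced tokens for RandomPartitioner (range 0 .. 2**127 - 1)."""
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]


if __name__ == "__main__":
    # Two nodes, as in the question; swap in random_partitioner_tokens
    # if your cluster uses RandomPartitioner instead.
    for node, token in enumerate(murmur3_tokens(2)):
        print(f"node {node}: initial_token: {token}")
```

Each node would get one of the printed values as its initial_token. Note that initial_token is only read the first time a node starts, so for nodes that already hold data you would likely need nodetool move to change the token instead.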