Getting error when trying to unload or count data from AWS Keyspace using dsbulk.
Error:
Operation COUNT_20221021-192729-813222 failed: Token metadata not present.
Command line:
$ dsbulk count/unload -k my_best_storage -t book_awards -f ./dsbulk_keyspaces.conf
Config:
datastax-java-driver {
basic.contact-points = [ "cassandra.us-east-2.amazonaws.com:9142"]
advanced.auth-provider {
class = PlainTextAuthProvider
username = "aw.keyspaces-at-XXX"
password = "XXXX"
}
basic.load-balancing-policy {
local-datacenter = "us-east-2"
}
basic.request {
consistency = LOCAL_QUORUM
default-idempotence = true
}
advanced {
request{
log-warnings = true
}
ssl-engine-factory {
class = DefaultSslEngineFactory
truststore-path = "./cassandra_truststore.jks"
truststore-password = "XXX"
hostname-validation = false
}
metadata {
token-map.enabled = false
}
}
}
dsbulk load - loading operator works fine...
I suspect the problem here is that your cluster is using the proprietary
com.amazonaws.cassandra.DefaultPartitionerpartitioner which most open-source tools and drivers don't recognise.The DataStax Bulk Loader (DSBulk) tool uses the Cassandra Java driver under the hood to connect to Cassandra clusters. The Java driver uses the partitioner to determine which nodes own tokens [ranges]. Only the following Cassandra partitioners are supported:
Murmur3PartitionerRandomPartitionerByteOrderedPartitionerSince the Java driver doesn't know about
DefaultPartitioner, it doesn't have a map of token range owners (token metadata) and so can't determine how to "split" the Cassandra ring to query the nodes.As you already figured out, this doesn't affect the
loadcommand because it simply sends writes to coordinators and lets the coordinators figure out how the data is partitioned. But forunloadandcountcommands which require reads, the Java driver can't determine which coordinators to pick for sub-range queries with an unsupported partitioner.Maybe as a workaround you can try to disable token-awareness with:
but I don't have an AWS Keyspaces cluster I could test and I'm doubtful it will work. In any case, you're welcome to try.
There is an outstanding DSBulk feature request to provide the ability to completely disable token-awareness (internal ticket ID DAT-622) but it is unassigned at the time of writing so I'm not in a position to provide any expectation on when it will be prioritised. Cheers!