cassandra search a row by secondary index returns null

525 views Asked by At

I have created a TABLE and index As follows

CREATE TABLE refresh_token (
    user_id bigint,
    refresh_token text,
    access_token text,
    device_desc text,
    device_type text,
    expire_time timestamp,
    org_id bigint,
    PRIMARY KEY (user_id, refresh_token)
) WITH CLUSTERING ORDER BY (refresh_token ASC)
CREATE INDEX i_access_token ON demodb.refresh_token (access_token);

After i insert or delete data about millions times.I'm found when i user the follow query can not return any data. Actually,there has this row in the data.

when i query by PRIMARY KEY

select * from refresh_token where user_id=405198 and refresh_token='E82B57D9D64BECDBD6B5602A72816BD19016323504F803116F66A32598E04298';

it returns data:

 select * from refresh_token where user_id=405198 and refresh_token='E82B57D9D64BECDBD6B5602A72816BD19016323504F803116F66A32598E04298';

 user_id | refresh_token                                                    | access_token                                                     | device_desc | device_type | expire_time              | org_id
---------+------------------------------------------------------------------+------------------------------------------------------------------+-------------+-------------+--------------------------+--------------
  405198 | E82B57D9D64BECDBD6B5602A72816BD19016323504F803116F66A32598E04298 | E82B57D9D64BECDB16D4F3F9F81AC0EF7AF2C4B460CB0F33C9CEFA5846BA7BE1 |        null |        null | 2016-06-07 14:09:52+0800 | 481036337156

but when i query by secondary index,it return null.

 select * from refresh_token where access_token ='E82B57D9D64BECDB16D4F3F9F81AC0EF7AF2C4B460CB0F33C9CEFA5846BA7BE1';

 user_id | refresh_token | access_token | device_desc | device_type | expire_time | org_id
---------+---------------+--------------+-------------+-------------+-------------+--------

thanks

1

There are 1 answers

0
Jim Meyer On BEST ANSWER

Secondary indexes are suggested only for fields with low cardinality. Your access_token field looks like it has very high cardinality (and may even be unique for all million rows). This is a known anti pattern in Cassandra.

High cardinality fields are good for things like partition keys because they will hash to a known location. But secondary indexes are not hashed and are found via local data structures on each node. These local data structures become cumbersome and inefficient when there are a lot of different values being indexed. I suspect you are hitting an internal timeout before the node with the matching access_token is finding the needle in the haystack.

If you need to find data by access_token, I'd suggest creating a second table where access_token is the partition key and use it to look up the corresponding user_id and refresh_token. That way you will be using access_token as a hash and will get reliable and quick look ups.