I want to select a sample of random 'n' bins from a set in the namespace. Is there a way to achieve this in Aerospike Query Language?
In Oracle, we achieve something similar with the following query:
SELECT * FROM <table-name> sample block(10) where rownum < 101
The above query fetches blocks of size of 10 rows from a sample size of 100.
Can we do something similar to this in Aerospike also?
Rows are like records in Aerospike, and columns are like bins. You don’t have a way to sample random columns from a table, do you?
You can sample random records from a set using
ScanPolicy.maxRecords
added to a scan of that set. Note the new (optional) set indexes in Aerospike version 5.6 may accelerate that operation.Each namespace has its data partitioned into 4096 logical partitions, and the records in the namespace evenly distributed to each of those using the characteristics of the 20-byte RIPEMD-160 digest. Therefore, Aerospike doesn't have a
rownum
, but you can leverage the data distribution to sample data.Each partition is roughly 0.0244% of the namespace. That's a sample space you can use, similar to the SQL query above. Next, if you are using the
ScanParition
method of the client, you can give it theScanPolicy.maxRecords
to pick a specific number of records out of that partition. Further you can start after an arbitrary digest (seePartitionFilter.after
) if you'd like.Ok, now let's talk data browsing. Instead of using the aql tool, you could be using the Aerospike JDBC driver, which works with any JDBC compatible data browser like DBeaver, SQuirreL, and Tableau. When you use
LIMIT
on aSELECT
statement it will basically do what I described above - use partition scanning and a max-records sample on that scan. I suggest you try this as an alternative.