Aerospike reporting requirement | Hourly frequent scans required


We have the Aerospike set data model below in production, and Aerospike is our ONLY data store. We now need to generate an hourly report for the sales team: the number of customers acquired in each hour.

@Document(collection = "cust")
public class Customer {

   @Id
   @Field(value = "PK")
   private String custId;

   @Field(value = "mobileNumber")
   private String mobileNumber;

   @Field(value = "status")
   private String customerStatus;

   @Field(value = "creationTime")
   private String creationTime;
 
   @Field(value = "corrDetails")
   private HashMap<String, Object> corrDetails;
 
}

Concerns we need help with:

a.) How can this be achieved without secondary indexes? We don't have any secondary indexes in production and would like to avoid them.

b.) Is there a way to generate this kind of report, given that we DON'T have MySQL or any other RDBMS replicating the data underneath?

c.) Do frequent Aerospike set scans lead to a deterioration in performance?


There is 1 answer

sunil On

Aerospike can scan or query for records whose last update time (LUT) is greater than a given value. Assuming the records in this set receive no other updates after they are created, you should be able to exploit this feature, because each record's LUT then matches the time it was created. It also seems that you only need the count, not the details of the users acquired in the last hour. In that case you can skip fetching bin data, which makes the scan/query even more efficient.
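The cutoff value for such an LUT comparison can be computed along these lines. This is a minimal sketch, assuming the filter compares against nanoseconds since the Unix epoch (worth confirming against the documentation for your client version). The class name `HourWindow` and its methods are hypothetical, and `countInWindow` is only an in-memory stand-in for the server-side comparison, not an Aerospike call.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class HourWindow {
    /** Start of the UTC hour containing t, in nanoseconds since the Unix epoch. */
    public static long hourStartNanos(Instant t) {
        Instant start = t.truncatedTo(ChronoUnit.HOURS);
        return Math.multiplyExact(start.getEpochSecond(), 1_000_000_000L);
    }

    /** Start of the last fully elapsed hour before now. */
    public static long previousHourStartNanos(Instant now) {
        return hourStartNanos(now.minus(1, ChronoUnit.HOURS));
    }

    /** In-memory stand-in for the filter: counts LUTs in [fromInclusive, toExclusive). */
    public static long countInWindow(long[] lutNanos, long fromInclusive, long toExclusive) {
        long count = 0;
        for (long lut : lutNanos) {
            if (lut >= fromInclusive && lut < toExclusive) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T10:17:30Z");
        long from = previousHourStartNanos(now); // 09:00:00Z
        long to = hourStartNanos(now);           // 10:00:00Z

        // Made-up LUT values around the window edges, for illustration only.
        long[] luts = { from - 1, from, from + 5, to - 1, to };
        System.out.println(countInWindow(luts, from, to)); // prints 3
    }
}
```

Using a half-open window per closed hour means a record is never counted in two consecutive reports. If the report only ever needs "everything since the cutoff", the upper bound can be dropped.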

A scan filtered on LUT is efficient because the LUT is part of the primary index, which is held in memory. However, each scan has to walk the entire in-memory primary index to compare LUTs, so it is not as efficient as a secondary index. It is possibly still the better tradeoff overall, given the other overheads that come with secondary indexes. Be careful not to overwhelm the system with too many scans, though. You could cache the summary in Aerospike itself and keep refreshing it.
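The caching idea can be sketched as follows, so that each closed hour is scanned at most once. This is an illustration under assumptions, not a definitive implementation: `HourlyCountCache` is a hypothetical class, the `LongUnaryOperator` passed in stands for whatever code runs the real scan, and an in-memory map stands in for the summary record that the answer suggests keeping in Aerospike.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongUnaryOperator;

public class HourlyCountCache {
    private static final long HOUR_NANOS = 3_600_000_000_000L;

    // In-memory stand-in for a summary stored in Aerospike (assumption).
    private final Map<Long, Long> countsByHourStart = new ConcurrentHashMap<>();

    // Hypothetical hook: given an hour start in epoch nanoseconds, runs the expensive scan.
    private final LongUnaryOperator scanCountForHour;

    public HourlyCountCache(LongUnaryOperator scanCountForHour) {
        this.scanCountForHour = scanCountForHour;
    }

    /** Returns the count for a fully elapsed hour, scanning only on the first request. */
    public long countFor(long hourStartNanos) {
        if (hourStartNanos % HOUR_NANOS != 0) {
            throw new IllegalArgumentException("not aligned to an hour boundary");
        }
        return countsByHourStart.computeIfAbsent(
                hourStartNanos, h -> scanCountForHour.applyAsLong(h));
    }

    public static void main(String[] args) {
        AtomicInteger scans = new AtomicInteger();
        // Fake scan that always reports 42 customers, for illustration only.
        HourlyCountCache cache = new HourlyCountCache(h -> {
            scans.incrementAndGet();
            return 42L;
        });

        long hour = 5 * HOUR_NANOS;
        cache.countFor(hour);
        cache.countFor(hour);
        System.out.println(scans.get()); // prints 1: the second call hits the cache
    }
}
```

Callers should pass only hours that have fully elapsed, since the count for the current hour is still changing and would be cached too early.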

You can take a look at the Java client example on how to do a scan with a predicate expression (a query without a where clause on a bin); refer to the runQuery2 function in that example. You do not need an end time for your use case. To avoid fetching bin data, set includeBinData to false in the QueryPolicy.