I am working on inserting 1 billion records into Aerospike using a Spark job. I am using the Spark Aerospike connector for this. The job is scheduled to run daily. Whenever the job runs, CPU on the Aerospike cluster shoots up (to 90% at peak). I was trying to understand how we can rate limit. The documentation suggests the following to rate limit writes: https://aerospike.com/docs/connect/spark/rate-limiting
Can anyone explain in more detail how rate limiting is applied if I am using a 16-core machine and, say, 100 as the transaction rate? What will the write QPS be in this case, and are there any other parameters I can tweak to limit my writes?
It seems that you would want to configure the `aerospike.transaction.rate` property (per the performance configuration section of the docs). From the description of that parameter, the total write QPS should be limited to `aerospike.transaction.rate` x the number of Spark partitions. Since you mention that you are running the Spark job on a 16-core machine, I am assuming that translates to 16 Spark partitions (I may be wrong here). If that is the case, I would expect the max write throughput to be 100 x 16 = 1,600 writes per second. Did you observe something different?

Having said this, the CPU spike may not be directly related to the throughput itself. It could also be caused, for example, by connections being established (or re-established) at an unnecessarily high rate (especially if you are using TLS). That could be tuned differently: more generous timeouts if any transactions are timing out, or different compression or encryption settings (if you are using those on the Aerospike cluster).
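To make the arithmetic concrete, here is a minimal Python sketch of how the per-partition rate combines with the partition count into a cluster-wide write ceiling. The option name `aerospike.transaction.rate` is from the linked docs; the helper function, the options dict, and the assumption that a 16-core machine yields 16 partitions are mine for illustration only — check `df.rdd.getNumPartitions()` in your actual job.

```python
def max_write_qps(transaction_rate: int, num_partitions: int) -> int:
    """Per the connector docs, rate limiting is applied per Spark partition,
    so the cluster-wide write ceiling is rate * number of partitions."""
    return transaction_rate * num_partitions

# Options you would pass to the connector write, e.g.
# df.write.format("aerospike").options(**opts)...  (illustrative, not verified)
opts = {
    "aerospike.transaction.rate": "100",  # per-partition writes/sec (from the docs)
}

# Assumption: 16 cores -> 16 partitions; verify with df.rdd.getNumPartitions()
print(max_write_qps(int(opts["aerospike.transaction.rate"]), 16))  # 1600
```

If 1,600 writes/sec is still too aggressive for your cluster, you could either lower the per-partition rate or repartition the DataFrame to fewer partitions before the write.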