Suppose I have a RESTful service X with an API getName(String id). My client code is deployed to 5 machines behind a load balancer; the same code runs on every machine, and at some point each one calls the getName(String) API on service X. The service enforces a limit of at most 3 calls per second. Given a turnaround time of about 200 ms per request, how do I ensure my clients collectively don't surpass the server's 3 TPS limit? My clients have no mechanism to communicate with each other. How do I avoid being throttled on the server side? And what happens if I grow my fleet from 5 machines to 10 or 15? Is there something I can do?
Would something like a truncated exponential backoff work for me?
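For reference, here is a minimal sketch of what a truncated (capped) exponential backoff with full jitter might look like on the client side; the class and constant names are illustrative, not from any library:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: truncated exponential backoff with full jitter.
// On a throttled response, wait computeDelayMs(attempt) before retrying.
public class Backoff {
    static final long BASE_DELAY_MS = 100;   // delay before the first retry
    static final long MAX_DELAY_MS = 5_000;  // truncation cap

    // Delay before the given retry attempt (0-based).
    static long computeDelayMs(int attempt) {
        // Double the base delay each attempt, capping the shift to avoid overflow.
        long exponential = BASE_DELAY_MS * (1L << Math.min(attempt, 20));
        long capped = Math.min(exponential, MAX_DELAY_MS);
        // Full jitter: pick uniformly in [0, capped] so clients don't retry in lockstep.
        return ThreadLocalRandom.current().nextLong(capped + 1);
    }
}
```

Note that backoff only reacts to throttling after the fact; it spreads out retries but does not by itself guarantee the fleet stays under 3 TPS.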
You just have to save the arrival times of the last 3 requests in the session. When a 4th request comes in, check whether it falls inside the 1-second window that started at the oldest saved request; if it does, delay or reject it, otherwise let it through and record its arrival time.
Assuming t0 is the 4th (most recent) request in this timeline (times in milliseconds), throttle when t0 - t3 < 1000:
...--t3-----t2-----t1----t0----
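The check above can be sketched as a small sliding-window limiter; the class and method names here are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window check: keep the arrival times of the last N requests
// and reject a new one while the oldest of them is still inside the window.
public class SlidingWindowLimiter {
    private final int maxRequests;                           // e.g. 3
    private final long windowMs;                             // e.g. 1000
    private final Deque<Long> arrivals = new ArrayDeque<>(); // oldest first

    public SlidingWindowLimiter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    // nowMs is passed in explicitly to keep the sketch testable.
    public synchronized boolean tryAcquire(long nowMs) {
        // Drop timestamps that have aged out of the window.
        while (!arrivals.isEmpty() && nowMs - arrivals.peekFirst() >= windowMs) {
            arrivals.pollFirst();
        }
        if (arrivals.size() >= maxRequests) {
            return false; // t0 - t3 < windowMs: this request would exceed the limit
        }
        arrivals.addLast(nowMs);
        return true;
    }
}
```

Each machine running this alone only caps its own rate, not the fleet's combined rate, which is why the shared-storage idea below matters.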
If you want to throttle across the load balancer: one solution is to keep the request count in some API/storage common to all the API instances (for example a shared cache such as Redis), so every machine checks and increments the same counter before calling the service.
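A minimal sketch of that shared-counter idea, assuming a fixed 1-second window. Here an in-process map stands in for the shared store (in production this would be something like Redis INCR plus EXPIRE); all names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Fixed-window counter over a shared store. Every instance derives the same
// window key from the clock, so as long as the store itself is shared,
// the count is fleet-wide regardless of how many machines are calling.
public class SharedWindowLimiter {
    private final ConcurrentHashMap<Long, AtomicInteger> store = new ConcurrentHashMap<>();
    private final int maxPerWindow;  // e.g. 3
    private final long windowMs;     // e.g. 1000

    public SharedWindowLimiter(int maxPerWindow, long windowMs) {
        this.maxPerWindow = maxPerWindow;
        this.windowMs = windowMs;
    }

    public boolean tryAcquire(long nowMs) {
        long windowKey = nowMs / windowMs; // same key on every instance
        int count = store.computeIfAbsent(windowKey, k -> new AtomicInteger())
                         .incrementAndGet();
        return count <= maxPerWindow;
    }
}
```

This also answers the fleet-growth question: since the limit lives in the shared store rather than on each machine, going from 5 to 10 or 15 instances needs no per-machine reconfiguration.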