Let's say I have a large set of users in a queue, and I periodically query each user against a rate-limited API. Once all users have been queried, the process restarts. The rate limit is strict enough that I cannot check every user within a reasonable amount of time (a full pass takes more than a day).
Each time I check a user, I can see when they were last active. Users who have been active recently (say, within the last few days) should have priority over users who have not been active at all (for more than a year). However, the probability of an inactive user being queried should still be greater than 0. Is there any existing research, or are there established methods, for managing this kind of queue efficiently?
My current idea is a simple priority queue in which each user's initial priority is the time they were last active. Whenever a user is queried, their entry in the queue is replaced with their last-active date plus a random offset drawn from some distribution, so that every user can eventually be checked.
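A minimal sketch of that priority-queue idea in Python. It assumes last-active times are plain floats (for example, days since some epoch), that the user with the largest key is queried first, and that the random offset is exponential with mean `boost_mean`. The class name, the parameter name, and the choice of an exponential offset are hypothetical illustrations, not part of the scheme described above.

```python
import heapq
import random


class ActivityQueue:
    """Max-priority queue keyed on last-active time plus a random boost.

    Hypothetical sketch: times are floats (e.g. days since some epoch) and the
    boost is exponential with mean `boost_mean` (an assumed distribution).
    """

    def __init__(self, boost_mean=30.0, seed=None):
        self._heap = []                  # entries: (-key, user_id); heapq is a min-heap
        self._rng = random.Random(seed)
        self.boost_mean = boost_mean

    def _boost(self):
        # Non-negative random offset; larger values let long-inactive users jump ahead.
        if self.boost_mean <= 0:
            return 0.0
        return self._rng.expovariate(1.0 / self.boost_mean)

    def push(self, user_id, last_active):
        # (Re-)insert a user; call this again after each query with the refreshed time.
        key = last_active + self._boost()
        heapq.heappush(self._heap, (-key, user_id))

    def pop(self):
        # Return the user with the largest key (most recently active, modulo noise).
        _, user_id = heapq.heappop(self._heap)
        return user_id

    def __len__(self):
        return len(self._heap)


# Usage: with no noise, users come out in order of most recent activity.
q = ActivityQueue(boost_mean=0.0)
for user, t in {"alice": 400.0, "bob": 10.0, "carol": 395.0}.items():
    q.push(user, t)
print([q.pop() for _ in range(len(q))])  # ['alice', 'carol', 'bob']
```

With `boost_mean=0.0` this reduces to a pure recency ordering; a larger `boost_mean` makes it more likely that a long-inactive user's key overtakes a recently active user's key.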
After some thought, I decided to use a Bayesian model to infer each user's time between events. I assume that the number of events a user generates within a fixed time period follows a Poisson distribution, so the time between consecutive events follows an exponential distribution. I place a gamma prior on the rate parameter of that exponential. Because the gamma prior is conjugate, the posterior over the rate is also gamma, and the posterior predictive distribution of the next inter-event time is a Lomax distribution. For every user I add to the queue, I sample from this posterior predictive and use the sample as their priority value. When a user has new activity, I update their user-specific hyperparameters and then resample a new priority value. This lets me adjust each user's priority based on data, and also set priors for new users I don't have any information on.
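The conjugate update described above can be sketched as follows, assuming a shape/rate parametrization of the gamma prior. If a user's prior on the rate is Gamma(α, β) and you then observe n inter-event gaps t₁, …, tₙ, the posterior is Gamma(α + n, β + Σtᵢ), and the next gap has a Lomax posterior predictive with shape α + n and scale β + Σtᵢ, whose CDF is F(t) = 1 − (1 + t/β)^(−α). The sketch samples that distribution by inverting its CDF. It also assumes that a smaller sampled gap means the user is checked sooner; the names `UserPosterior` and `BayesianActivityQueue` and the default prior (α = 2, β = 30 days) are hypothetical.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass
class UserPosterior:
    """Gamma(alpha, beta) posterior over one user's event rate (shape, rate)."""
    alpha: float = 2.0   # assumed prior shape for users with no history
    beta: float = 30.0   # assumed prior rate; prior predictive mean gap = beta/(alpha-1) = 30 days

    def observe_gaps(self, gaps):
        # Conjugate update: each observed inter-event gap adds 1 to alpha and the gap to beta.
        self.alpha += len(gaps)
        self.beta += sum(gaps)

    def sample_gap(self, rng):
        # Draw the next inter-event gap from the Lomax posterior predictive
        # by inverting its CDF: F(t) = 1 - (1 + t/beta)^(-alpha).
        u = 1.0 - rng.random()  # u in (0, 1], avoids division by zero
        return self.beta * (u ** (-1.0 / self.alpha) - 1.0)


class BayesianActivityQueue:
    """Min-heap keyed on a sampled next gap: smaller gap = checked sooner (assumption)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._heap = []        # entries: (sampled_gap, user_id)
        self.posteriors = {}   # user_id -> UserPosterior

    def add_user(self, user_id, prior=None):
        self.posteriors[user_id] = prior if prior is not None else UserPosterior()
        self._push(user_id)

    def _push(self, user_id):
        gap = self.posteriors[user_id].sample_gap(self._rng)
        heapq.heappush(self._heap, (gap, user_id))

    def pop(self):
        _, user_id = heapq.heappop(self._heap)
        return user_id

    def requeue(self, user_id, new_gaps=()):
        # After querying a user: fold in any newly observed gaps, then resample a priority.
        if new_gaps:
            self.posteriors[user_id].observe_gaps(list(new_gaps))
        self._push(user_id)


# Usage: a user with a ~1-day mean gap versus one with a ~1000-day mean gap.
bq = BayesianActivityQueue(seed=0)
bq.add_user("active", UserPosterior(alpha=50.0, beta=50.0))
bq.add_user("dormant", UserPosterior(alpha=50.0, beta=49000.0))
print(bq.pop())  # active
```

Sampling a gamma-distributed rate and then an exponential gap with that rate (`random.gammavariate(alpha, 1 / beta)` followed by `random.expovariate`) draws from the same Lomax distribution; the inverse-CDF form simply needs one uniform draw per sample.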