The application I need up-to-date data from does not send change notifications, so I have to poll it every second to get updates. Since there are thousands of items I need to refresh every second, I thought of designing an application with thousands of pollers (of course, I would welcome suggestions for a better solution).
I learned from this post that a Java VM on each server can support a lot of threads. That said, I'm not bound to any particular language.
Now I'm trying to figure out how to make this scalable and make it work in a distributed environment.
One idea is to have a master server that holds the list of thousands of items to poll. It hands a few of them to each slave server for polling and receives heartbeats from them. The other idea is for the servers to talk to each other and share index ranges (probably into a file in S3) describing who is working on which items. I'm not sure either of them would even work.
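To make the master/worker idea a bit more concrete, here is a rough sketch of how a master might split items among the workers that are still sending heartbeats. The names (`live_workers`, `owner`, `assign`), the 5-second timeout, and the choice of rendezvous hashing are all my own assumptions for illustration, not something taken from a framework.

```python
import hashlib

HEARTBEAT_TIMEOUT = 5.0  # seconds; an assumed value, tune for your setup


def live_workers(last_heartbeat, now):
    """Return workers whose last heartbeat is recent enough (hypothetical helper)."""
    return sorted(w for w, t in last_heartbeat.items()
                  if now - t <= HEARTBEAT_TIMEOUT)


def owner(item_id, workers):
    """Pick the worker for an item with rendezvous (highest-random-weight) hashing."""
    def score(worker):
        digest = hashlib.sha256(f"{worker}:{item_id}".encode()).hexdigest()
        return int(digest, 16)
    return max(workers, key=score)


def assign(items, workers):
    """Map every item to exactly one worker."""
    assignment = {w: [] for w in workers}
    for item in items:
        assignment[owner(item, workers)].append(item)
    return assignment
```

With this scheme, when a worker stops sending heartbeats and is dropped from the list, only the items it owned get reassigned; the items held by the remaining workers stay where they are.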
I couldn't find any frameworks that could help me with this. Or, as a newbie, I probably don't know what to look for.
What would be your suggestion? Any pointers would help. Really appreciate it.
Polling does not scale beyond a certain point; past it, you need to scale out (not up) to multiple consumers. You will have to consider a reactive (pub-sub) model, which allows you to add more consumers seamlessly when you need more throughput.
Your master server can publish an event on each state change. Consumers listen for those events and pick up and process the ones they are interested in. Consumers can also filter their subscriptions so that they only receive notifications for the messages they care about.
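As a rough illustration of that flow, here is a minimal in-memory sketch. `Broker` and `ChangePublisher` are hypothetical names invented for this example, and the broker is only a stand-in for a real message router: the poller publishes an event only when an item's value actually changes, and each consumer receives only the topics it subscribed to.

```python
from collections import defaultdict


class Broker:
    """Toy in-process broker; a stand-in for a real message router."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A consumer registers interest in a single topic (here: one item id).
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver only to consumers subscribed to this topic.
        for callback in self._subscribers.get(topic, ()):
            callback(topic, payload)


class ChangePublisher:
    """Turns polled values into events, publishing only on a state change."""

    def __init__(self, broker):
        self._broker = broker
        self._last_seen = {}

    def observe(self, item_id, value):
        if item_id in self._last_seen and self._last_seen[item_id] == value:
            return False  # unchanged since the last poll, nothing to publish
        self._last_seen[item_id] = value
        self._broker.publish(item_id, value)
        return True


broker = Broker()
broker.subscribe("item-1", lambda topic, value: print(topic, "changed to", value))
publisher = ChangePublisher(broker)
publisher.observe("item-1", 10)  # first value seen, event published
publisher.observe("item-1", 10)  # same value, no event
publisher.observe("item-2", 5)   # published, but nobody is subscribed
```

In a real deployment the topic would roughly correspond to a routing key, and the filtering would be done by the broker rather than in your own process.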
Have a look at something like RabbitMQ as an underlying message routing service.