To teach myself, I am learning Elixir and trying to wrap my head around the GenStage library.
I have read the documentation and understood most of it, but I have a couple of questions specific to my use case.
I am trying to build a web scraper that should run several times per day and do some scraping and post-processing.
First question
So, my topmost producer is a Stage that makes HTTP requests and hands them down to consumers.
How do I handle "wait 6 hours" here?
Should I just accept demands, but send empty events to consumers? That sounds like a waste of CPU cycles.
Maybe GenStage is not the right approach for this kind of event?
Second question
Sometimes I need to return an event back to the chain.
ProducerConsumerA loads page #n.
ProducerConsumerB parses the page and emits events for the items found on it to the next consumers. But it should also send an event to ProducerConsumerA for the next page (if the results are paginated).
I personally feel that GenStage may be a little overkill for what you're trying to do, especially if you're only scraping every 6 hours. Furthermore, if you're just learning Elixir, you may want to start with a more basic approach. You can always refactor later if you feel you need more flow control.
I would create one main GenServer that does the scraping serially. You can start it with the list of sites to scrape and have it work through the list. Once a site has been fetched, you could start a Task to process the data and then fetch the next site, starting a new processing Task each time.
When you're done scraping all the sites, you can use Process.send_after to send a wakeup message that will start the next cycle of scraping.
This will give you a good understanding of Elixir and OTP. Once you have this all working, you can then investigate GenStage further.
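As a rough illustration of the approach described above, here is a minimal sketch. The module name Scraper, the option names, and the injected fetch and process functions are all assumptions made for the example. In a real scraper, fetch would wrap whatever HTTP client you choose and process would do your post-processing.

```elixir
defmodule Scraper do
  use GenServer

  # Assumed default: the question mentions waiting about six hours between runs.
  @default_interval :timer.hours(6)

  # Hypothetical options:
  #   :sites    - list of URLs to scrape
  #   :fetch    - 1-arity function that downloads a URL (stand-in for an HTTP client)
  #   :process  - 1-arity function that post-processes a fetched page
  #   :interval - milliseconds to wait between scraping cycles
  def start_link(opts) do
    GenServer.start_link(__MODULE__, Map.new(opts))
  end

  @impl true
  def init(opts) do
    state = Map.put_new(opts, :interval, @default_interval)
    # Trigger the first cycle immediately without blocking init/1.
    send(self(), :scrape)
    {:ok, state}
  end

  @impl true
  def handle_info(:scrape, state) do
    Enum.each(state.sites, fn site ->
      # Fetch serially inside the GenServer process...
      page = state.fetch.(site)
      # ...and hand the post-processing to a separate, unlinked Task.
      Task.start(fn -> state.process.(page) end)
    end)

    # Schedule the wake-up message that starts the next cycle.
    Process.send_after(self(), :scrape, state.interval)
    {:noreply, state}
  end
end
```

Because Task.start/1 does not link the task to the caller, a crash during post-processing will not take the scraper down with it. If you later put this under a supervision tree, starting the tasks through a Task.Supervisor would be one option to consider.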