Elixir: GenStage topology


I'm learning Elixir for self-education and trying to wrap my head around the GenStage library.

I've read the documentation and understood most of it, but I have a couple of questions about my particular domain.

I'm trying to build a web scraper that runs several times per day and does some scraping and post-processing.

First question

So, my topmost producer is a Stage that makes HTTP requests and hands them down to consumers.

How do I handle "wait 6 hours" here?

Should I just accept demand but send empty events to consumers? That sounds like a waste of CPU cycles.

Or maybe GenStage is not the right approach for this kind of event?

Second question

Sometimes I need to feed an event back into the chain.

  • ProducerConsumerA loads page #n
  • ProducerConsumerB parses the page and emits events for the items found on it to the next consumers. It should also send an event to ProducerConsumerA for the next page (if the results are paginated)

Answer by Steve Pallen

I personally feel that GenStage may be a little overkill for what you're trying to do, especially if you're only scraping every 6 hours. Furthermore, if you're just learning Elixir, you may want to start with a more basic approach. You can always refactor later if you feel you need more flow control.

I would create one main GenServer that does the scraping serially. You can start it with the list of sites to scrape and have it work through the list. Once a site has been fetched, you could start a Task to process the data and then fetch the next site, starting a new processing Task each time.
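A minimal sketch of that idea might look like the following. The module name SerialScraper and the :sites, :fetch and :process options are hypothetical names chosen for illustration; the fetch function stands in for whatever HTTP client you end up using, and the tasks are left unsupervised to keep the example short.

```elixir
defmodule SerialScraper do
  use GenServer

  # Hypothetical options (not from any library):
  #   :sites   - list of sites to scrape, in order
  #   :fetch   - fn site -> body end, a stand-in for a real HTTP call
  #   :process - fn site, body -> any end, the post-processing step
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      sites: Keyword.fetch!(opts, :sites),
      fetch: Keyword.fetch!(opts, :fetch),
      process: Keyword.fetch!(opts, :process)
    }

    # Return quickly from init/1 and do the work in handle_continue/2.
    {:ok, state, {:continue, :next}}
  end

  @impl true
  def handle_continue(:next, %{sites: []} = state) do
    # Nothing left to fetch; the server simply idles.
    {:noreply, state}
  end

  def handle_continue(:next, %{sites: [site | rest]} = state) do
    # Fetching happens serially inside the GenServer process.
    body = state.fetch.(site)

    # Processing runs in its own Task so the next fetch can start right away.
    process = state.process
    {:ok, _pid} = Task.start(fn -> process.(site, body) end)

    {:noreply, %{state | sites: rest}, {:continue, :next}}
  end
end
```

You would start it with something like SerialScraper.start_link(sites: [...], fetch: &my_fetch/1, process: &my_process/2), where my_fetch and my_process are your own functions. In a real application you would probably want to run the tasks under a Task.Supervisor so that crashes are reported and isolated.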

When you're done scraping all the sites, you can use Process.send_after to send a wakeup message that will start the next cycle of scraping.
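As a rough sketch of the wakeup loop, assuming a hypothetical module name ScrapeScheduler and a hypothetical :run_cycle option that holds the function doing one full pass over the sites:

```elixir
defmodule ScrapeScheduler do
  use GenServer

  # Six hours in milliseconds, matching the interval from the question.
  @default_interval :timer.hours(6)

  # Hypothetical options (not from any library):
  #   :run_cycle - zero-arity function that performs one scraping cycle
  #   :interval  - milliseconds to wait between cycles
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      run_cycle: Keyword.fetch!(opts, :run_cycle),
      interval: Keyword.get(opts, :interval, @default_interval)
    }

    # Trigger the first cycle immediately without blocking init/1.
    send(self(), :wake_up)
    {:ok, state}
  end

  @impl true
  def handle_info(:wake_up, state) do
    # Run one cycle, then schedule the next wakeup message.
    state.run_cycle.()
    Process.send_after(self(), :wake_up, state.interval)
    {:noreply, state}
  end
end
```

Between wakeups the process just sits in its mailbox wait and uses no CPU. Because the timer is set after the cycle finishes, the interval is measured from the end of one cycle to the start of the next, so cycles cannot overlap.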

This will give you a good understanding of Elixir and OTP. Once you have this all working, you can then investigate GenStage further.