Scrapy on AWS EC2: where to write the items?


I have a working spider on my local machine, which writes items to a local postgres database.
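
For context, such a pipeline might look roughly like the sketch below. All names are hypothetical (the question doesn't include the actual code), and it assumes psycopg2 plus a custom DATABASE setting like the one shown further down:

```python
# pipelines.py -- minimal sketch of a pipeline that writes items to
# Postgres (hypothetical names; the real code isn't shown in the question)
import psycopg2


class PostgresPipeline:
    def __init__(self, db_params):
        self.db_params = db_params

    @classmethod
    def from_crawler(cls, crawler):
        # Read the custom DATABASE setting from settings.py
        return cls(crawler.settings.get("DATABASE"))

    def open_spider(self, spider):
        self.conn = psycopg2.connect(**self.db_params)
        self.cur = self.conn.cursor()

    def process_item(self, item, spider):
        self.cur.execute(
            "INSERT INTO items (title, url) VALUES (%s, %s)",
            (item.get("title"), item.get("url")),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cur.close()
        self.conn.close()
```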

I am now trying to run the same spider through scrapyd on an EC2 instance. This obviously won't work, because the code (models, pipelines, settings files) refers to a database on my local machine.
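
For illustration, the kind of settings.py that creates this problem (again, all names hypothetical) might be:

```python
# settings.py -- hypothetical local configuration; "localhost" only
# reaches the Postgres server while the spider runs on this machine
BOT_NAME = "myproject"
SPIDER_MODULES = ["myproject.spiders"]

ITEM_PIPELINES = {
    "myproject.pipelines.PostgresPipeline": 300,
}

# Custom setting read by the pipeline (not a built-in Scrapy setting)
DATABASE = {
    "host": "localhost",
    "dbname": "scrapy",
    "user": "scrapy",
    "password": "secret",
}
```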

What adaptations should I make to get this working?


1 Answer

S Leon (accepted answer):

Found it; the answer was easier than I thought. In settings.py, delete the ITEM_PIPELINES and DATABASE settings, then deploy the project to the EC2 instance through scrapyd.
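
Concretely, starting from a settings.py like the hypothetical one sketched in the question, the deployed version simply drops those two entries:

```python
# settings.py -- version deployed to EC2 (hypothetical names). With
# ITEM_PIPELINES and DATABASE removed, no pipeline tries to reach the
# Postgres server on the local machine.
BOT_NAME = "myproject"
SPIDER_MODULES = ["myproject.spiders"]

# Removed before deploying:
# ITEM_PIPELINES = {"myproject.pipelines.PostgresPipeline": 300}
# DATABASE = {"host": "localhost", "dbname": "scrapy", ...}
```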

By default, items will now be written as JSON lines. This can be overridden with the FEED_FORMAT and FEED_URI settings:

```
sudo curl http://xxxxxxxxx.us-west-2.compute.amazonaws.com:6800/schedule.json \
  -d project=xxxxxxxxxx \
  -d spider=xxxxxxxxx \
  -d setting=FEED_URI=/var/lib/scrapyd/items/xxxxxxxxxx.csv \
  -d setting=FEED_FORMAT=csv
```
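
The same export can also be configured once in the project's settings.py instead of on every schedule.json call (a sketch using the same pre-2.1 feed settings; the path is illustrative):

```python
# settings.py -- equivalent to the per-job "setting=" parameters above;
# %(name)s is expanded by Scrapy to the spider's name at runtime
FEED_FORMAT = "csv"
FEED_URI = "/var/lib/scrapyd/items/%(name)s.csv"
```

Either way, the resulting file can be copied off the instance (e.g. with scp), or served by scrapyd's web interface if its items_dir option points at that directory.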