Scrapy on AWS EC2: where to write the items?

I have a working spider on my local machine, which writes items to a local Postgres database.

I am now trying to run the same spider through scrapyd on an EC2 instance. This obviously won't work, because the code (models, pipelines, settings files) refers to a database on my local machine.
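For context, the wiring looks roughly like this; a minimal sketch with placeholder names (pipeline class, table, credentials), not my actual code. The point is that the connection details are hard-wired to my local machine:

    # pipelines.py (local) - sketch with placeholder names
    import psycopg2

    class PostgresPipeline:
        def open_spider(self, spider):
            # these connection details only make sense on my local machine
            self.conn = psycopg2.connect(
                host='localhost', dbname='scrapydb',
                user='scrapy', password='xxxxxxxx',
            )
            self.cur = self.conn.cursor()

        def process_item(self, item, spider):
            self.cur.execute(
                'INSERT INTO items (data) VALUES (%s)', [str(dict(item))]
            )
            self.conn.commit()
            return item

        def close_spider(self, spider):
            self.conn.close()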

What adaptations do I need to make to get this working?

1 Answer

Answered by S Leon (best answer):

Found it; the answer was easier than I thought. In the settings.py file, delete the ITEM_PIPELINES and DATABASE settings, then deploy the project through scrapyd on EC2.
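For reference, these are the kinds of blocks to delete; a sketch with placeholder names (DATABASE here is a custom setting read by the pipeline, not a built-in Scrapy setting):

    # settings.py - delete (or comment out) both of these before deploying
    ITEM_PIPELINES = {
        'myproject.pipelines.PostgresPipeline': 300,  # placeholder path
    }
    DATABASE = {
        'host': 'localhost',  # exactly what breaks on EC2
        'port': 5432,
        'user': 'scrapy',
        'password': 'xxxxxxxx',
        'dbname': 'scrapydb',
    }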

By default, items will now be written as JSON lines to scrapyd's items directory. This can be overridden with the FEED_FORMAT and FEED_URI settings when scheduling the spider:

sudo curl http://xxxxxxxxx.us-west-2.compute.amazonaws.com:6800/schedule.json -d project=xxxxxxxxxx -d spider=xxxxxxxxx -d setting=FEED_URI=/var/lib/scrapyd/items/xxxxxxxxxx.csv -d setting=FEED_FORMAT=csv
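
The same schedule call from Python, if you prefer it over curl; a sketch using the requests library, with the same placeholder host, project and spider names:

    # POST to scrapyd's schedule.json endpoint, equivalent to the curl
    # call above; repeated ('setting', ...) tuples mirror the repeated -d flags
    import requests

    resp = requests.post(
        'http://xxxxxxxxx.us-west-2.compute.amazonaws.com:6800/schedule.json',
        data=[
            ('project', 'xxxxxxxxxx'),
            ('spider', 'xxxxxxxxx'),
            ('setting', 'FEED_URI=/var/lib/scrapyd/items/xxxxxxxxxx.csv'),
            ('setting', 'FEED_FORMAT=csv'),
        ],
    )
    print(resp.json())  # scrapyd replies with a status and a job id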