json.dump doesn't capture the whole stream


I have a simple crawler that crawls three store locator pages and parses the store locations to JSON. When I print(app_data['stores']), all three pages of stores are printed. However, when I write the data out, only one of the three pages, seemingly at random, ends up in my JSON file. I'd like everything the spider yields to be written to the file. Any help would be great. Here's the code:

import scrapy
import json
import js2xml

from pprint import pprint

class StlocSpider(scrapy.Spider):
    name = "stloc"
    allowed_domains = ["bestbuy.com"]
    start_urls = (
        'http://www.bestbuy.com/site/store-locator/11356',
        'http://www.bestbuy.com/site/store-locator/46617',
        'http://www.bestbuy.com/site/store-locator/77521'
    )

    def parse(self, response):
        js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
        jstree = js2xml.parse(js)
        # print(js2xml.pretty_print(jstree))

        app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
        app_data = js2xml.make_dict(app_data_node)
        print(app_data['stores'])

        for store in app_data['stores']:
            yield store

        with open('stores.json', 'w') as f:
            json.dump(app_data['stores'], f, indent=4)

1 Answer

Accepted answer, by pault:

You are opening the file in write mode on every call to parse, which truncates it each time, so only the last response's stores survive; you want to append instead. Try changing the last part to this:

with open('stores.json', 'a') as f:
    json.dump(app_data['stores'], f, indent=4)

Here 'a' opens the file for appending instead of overwriting it.
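
A side note, not part of the accepted answer: appending with 'a' writes three separate JSON documents into one file, which is not itself a single valid JSON document. If you need one JSON array, one option is to collect the stores on the spider and dump them once when the spider closes. A minimal sketch along those lines, reusing the parsing from the question:

```python
import json

import js2xml
import scrapy


class StlocSpider(scrapy.Spider):
    name = "stloc"
    allowed_domains = ["bestbuy.com"]
    start_urls = (
        'http://www.bestbuy.com/site/store-locator/11356',
        'http://www.bestbuy.com/site/store-locator/46617',
        'http://www.bestbuy.com/site/store-locator/77521',
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.all_stores = []  # accumulates stores from every page

    def parse(self, response):
        js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
        jstree = js2xml.parse(js)
        app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
        app_data = js2xml.make_dict(app_data_node)

        self.all_stores.extend(app_data['stores'])
        for store in app_data['stores']:
            yield store

    def closed(self, reason):
        # Called once when the spider finishes: write everything as one JSON array.
        with open('stores.json', 'w') as f:
            json.dump(self.all_stores, f, indent=4)
```

Alternatively, since the spider already yields each store, you could drop the manual file handling entirely and let Scrapy's feed export write the output, e.g. scrapy crawl stloc -o stores.json.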