I'm having an issue deploying my scraper to Zyte (formerly Scrapinghub)

My spider has to read some data from an input.csv file. It runs fine locally, but when I deploy it to Zyte with shub deploy, input.csv is not included in the build.

So when I run it on the server, it produces the following error:

Traceback (most recent call last):
  File "<frozen zipimport>", line 177, in get_data
KeyError: 'webscrap/resources/input.csv'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/scrapy/core/engine.py", line 127, in _next_request
    request = next(slot.start_requests)
  File "/app/__main__.egg/webscrap/spiders/website_scraper.py", line 13, in start_requests
    zipcodes_csv = pkgutil.get_data("webscrap", "resources/input.csv")
  File "/usr/local/lib/python3.8/pkgutil.py", line 637, in get_data
    return loader.get_data(resource_name)
  File "<frozen zipimport>", line 179, in get_data
OSError: [Errno 0] : 'webscrap/resources/input.csv'
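The KeyError raised inside zipimport suggests that the project runs from an egg (a zip archive, /app/__main__.egg in the traceback) and that webscrap/resources/input.csv is simply not stored in it. A minimal sketch for checking an egg before deploying, assuming it has been built locally first (for example with python setup.py bdist_egg, which writes the egg into dist/); the helper name egg_contains is hypothetical:

```python
import zipfile


def egg_contains(egg, resource_path):
    """Return True if resource_path (e.g. 'webscrap/resources/input.csv')
    is stored in the egg.

    An .egg is an ordinary zip archive, so zipfile can list its members.
    egg may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(egg) as archive:
        return resource_path in archive.namelist()
```

Called with the path of the egg in dist/ and 'webscrap/resources/input.csv', this would be expected to return False for a build that reproduces the error above.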

Here is the code in start_requests that reads the file:

        zipcodes_csv = pkgutil.get_data("webscrap", "resources/input.csv")
        with io.TextIOWrapper(io.BytesIO(zipcodes_csv), encoding='utf-8') as file:
            csvreader = csv.DictReader(file)
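pkgutil.get_data asks the package's loader for the resource and returns it as bytes, which is why the code wraps it in io.TextIOWrapper; the same call works whether the package lives in a plain directory or in a zipped egg, as long as the file was actually packaged. A minimal self-contained sketch of the same pattern, assuming a throwaway package named demo_pkg and a helper read_package_csv (both hypothetical), which also shows that a missing resource surfaces as an OSError:

```python
import csv
import io
import pkgutil
import sys
import tempfile
from pathlib import Path


def read_package_csv(package, resource):
    """Read a CSV shipped inside an importable package and return its rows."""
    data = pkgutil.get_data(package, resource)  # bytes from the package's loader
    # newline="" lets the csv module handle line endings itself
    with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


# Build a throwaway package "demo_pkg" on disk and make it importable.
tmp = tempfile.mkdtemp()
res_dir = Path(tmp, "demo_pkg", "resources")
res_dir.mkdir(parents=True)
Path(tmp, "demo_pkg", "__init__.py").write_text("")
(res_dir / "input.csv").write_text("zipcode\n10001\n94105\n", encoding="utf-8")
sys.path.insert(0, tmp)

rows = read_package_csv("demo_pkg", "resources/input.csv")
print(rows)  # [{'zipcode': '10001'}, {'zipcode': '94105'}]

try:
    read_package_csv("demo_pkg", "resources/missing.csv")
except OSError as exc:
    # FileNotFoundError for a directory package; zipimport raises a plain OSError
    print("missing resource:", type(exc).__name__)
```

Because both failure modes are OSError subclasses, catching OSError covers the local run and the egg on Zyte alike.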

Here is my setup.py file:

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = webscrap.settings']},
    package_data={
        'project': ['resources/*.csv']
    },
    include_package_data=True,
)

Here is the directory structure of my project:

[Image: project directory structure]

Answer by Muhammad Ahmad:

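I fixed it by changing setup.py to the version below. The key change is in package_data: its keys must be the names of the packages that contain the data files (here webscrap), not an arbitrary project name. With 'project' as the key, no package matched, and include_package_data=True by default only adds data files listed in MANIFEST.in, so resources/input.csv was left out of the egg: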

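from setuptools import setup, find_packages

setup(
    name         = 'webscrap',
    version      = '2.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = webscrap.settings']},
    # Keys are package names; patterns are relative to that package's directory
    package_data={
        'webscrap': ['resources/*.csv']
    },
    include_package_data=True,
)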

I also fixed some dependency issues in requirements.txt and referenced that file in scrapinghub.yml.
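A package_data mismatch like the original 'project' key is easy to catch before running shub deploy. A minimal sketch of such a pre-deploy check, assuming setup.py sits in the project root next to the webscrap package; check_package_data is a hypothetical helper, not part of setuptools:

```python
from pathlib import Path


def check_package_data(root, package_data):
    """Return a list of problems found in a setup() package_data mapping.

    Each key should be the dotted name of a package under root (a directory
    with __init__.py), and each glob pattern should match at least one file
    relative to that package's directory. The special '' key, which
    setuptools applies to every package, is not handled by this sketch.
    """
    problems = []
    for package, patterns in package_data.items():
        pkg_dir = Path(root, *package.split("."))
        if not (pkg_dir / "__init__.py").is_file():
            problems.append(f"{package!r} is not a package under {root}")
            continue
        for pattern in patterns:
            if not any(p.is_file() for p in pkg_dir.glob(pattern)):
                problems.append(f"{package!r}: pattern {pattern!r} matches no files")
    return problems
```

Run from the project root, check_package_data('.', {'webscrap': ['resources/*.csv']}) would be expected to return an empty list, while the original {'project': ['resources/*.csv']} mapping would produce one problem.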