How to use pip to install middleware on Scrapinghub

Question

Asked by Haha TTpro on 02 September 2017 at 19:01

I have a Scrapy project that uses a middleware installed via pip, namely scrapy-random-useragent.

Settings file:

```python
# -*- coding: utf-8 -*-

# Scrapy settings for batdongsan project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#     http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#     http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'batdongsan'

SPIDER_MODULES = ['batdongsan.spiders']
NEWSPIDER_MODULE = 'batdongsan.spiders'
FEED_EXPORT_ENCODING = 'utf-8'  # write JSON output as human-readable UTF-8
CLOSESPIDER_PAGECOUNT = 10  # stop after crawling 10 pages
LOG_LEVEL = 'INFO'  # log less

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

# Enable or disable downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    #'batdongsan.middlewares.MyCustomDownloaderMiddleware': 543,
    # Disable the built-in User-Agent middleware (scrapy.contrib.* is the
    # deprecated pre-1.0 alias of scrapy.downloadermiddlewares.useragent)
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    # Provided by the pip package scrapy-random-useragent
    'random_useragent.RandomUserAgentMiddleware': 400
}
USER_AGENT_LIST = "agents.txt"  # text file with one user agent per line
```
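For context on what the `random_useragent.RandomUserAgentMiddleware` entry does, here is a minimal sketch of a downloader middleware in the same spirit. It is not the source of scrapy-random-useragent. It assumes, as the setting above suggests, that `USER_AGENT_LIST` is the path of a text file with one user agent per line, and it relies only on Scrapy's documented `from_crawler` and `process_request` hooks. The class does not import Scrapy, so it can be exercised on its own.

```python
import random


class RandomUserAgentMiddleware(object):
    """Hypothetical sketch: pick a random User-Agent for every request.

    Not the scrapy-random-useragent implementation; assumes USER_AGENT_LIST
    points to a text file with one user agent per line.
    """

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("user agent list is empty")
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds downloader middlewares through from_crawler, which
        # gives access to the project settings.
        path = crawler.settings.get("USER_AGENT_LIST")
        with open(path) as f:
            agents = [line.strip() for line in f if line.strip()]
        return cls(agents)

    def process_request(self, request, spider):
        # Overwrite the header; returning None lets Scrapy keep processing
        # the request through the remaining middlewares.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None
```

A class like this, placed inside the project (hypothetically in `batdongsan/middlewares.py` and enabled as `'batdongsan.middlewares.RandomUserAgentMiddleware': 400`), would ship with the code itself and so would not need a pip install; that is an alternative, not what the answer recommends.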

The Scrapy project runs fine on my machine.
I deployed it to Scrapinghub by linking the GitHub project, and the Scrapinghub logs show this error:

```
  File "/usr/local/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 168, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 172, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1445, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 77, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
exceptions.ImportError: No module named random_useragent
```

So the problem is clearly `No module named random_useragent`.

But I don't know how to install that module via pip on Scrapinghub.
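The traceback shows the failure happens when Scrapy's `load_object` imports the module path listed in `DOWNLOADER_MIDDLEWARES`. Note that the pip distribution name (`scrapy-random-useragent`) differs from the importable module name (`random_useragent`). A quick way to see which modules an environment is missing is a sketch like the one below. The helper name `missing_modules` is hypothetical, and `importlib.util.find_spec` requires Python 3.4+, so this is for checking a Python 3 environment (the logs above come from Python 2.7).

```python
import importlib.util


def missing_modules(names):
    """Hypothetical helper: return the top-level module names that cannot
    be found in the current Python environment."""
    missing = []
    for name in names:
        # find_spec returns None for a top-level module that is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # 'random_useragent' is the module named in DOWNLOADER_MIDDLEWARES;
    # the pip package that provides it is 'scrapy-random-useragent'.
    print(missing_modules(["scrapy", "random_useragent"]))
```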

**paul trmbrth** · Answer 1 · 2017-09-04T13:10:56+00:00

When linking a GitHub repository with Python dependencies on Scrapinghub, you need 2 files at the root of your repository (that is, at the same level as your `scrapy.cfg` file):

- `scrapinghub.yml`
- `requirements.txt`

Their contents should match what the `shub deploy` section of the Scrapinghub docs describes:

`scrapinghub.yml`:

```yaml
requirements:
  file: requirements.txt
```

`requirements.txt`:

```
scrapy-random-useragent
```
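To catch this kind of problem before deploying, a small pre-deploy check could verify the layout the answer describes. The helpers below (`requirement_names`, `check_deploy_layout`) are hypothetical and not part of shub. The `scrapinghub.yml` check is a plain substring test rather than a real YAML parse, and the requirements parsing is deliberately crude (not a full PEP 508 parser).

```python
import os
import re


def requirement_names(lines):
    """Extract normalized project names from requirements.txt lines.

    Crude sketch: skips comments, blank lines and pip options such as
    '-r other.txt'; it is not a full PEP 508 parser.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name ends at the first version specifier, extras, marker or space
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        names.append(name.lower().replace("_", "-"))
    return names


def check_deploy_layout(repo_root, package="scrapy-random-useragent"):
    """Return a list of problems with a repository linked to Scrapinghub."""
    problems = []
    # scrapinghub.yml and requirements.txt must sit next to scrapy.cfg
    for fname in ("scrapy.cfg", "scrapinghub.yml", "requirements.txt"):
        if not os.path.isfile(os.path.join(repo_root, fname)):
            problems.append("missing " + fname)

    yml_path = os.path.join(repo_root, "scrapinghub.yml")
    if os.path.isfile(yml_path):
        with open(yml_path) as f:
            # Substring check only: does the config mention requirements.txt?
            if "requirements.txt" not in f.read():
                problems.append("scrapinghub.yml does not reference requirements.txt")

    req_path = os.path.join(repo_root, "requirements.txt")
    if os.path.isfile(req_path):
        with open(req_path) as f:
            wanted = package.lower().replace("_", "-")
            if wanted not in requirement_names(f):
                problems.append(package + " not listed in requirements.txt")
    return problems
```

Run from the repository root, `check_deploy_layout(".")` should return an empty list once all three files are present, `scrapinghub.yml` points at `requirements.txt`, and `scrapy-random-useragent` is listed.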
