Why scrapy shell did not return an output?


I followed this tutorial because I wanted to learn web scraping. https://www.datacamp.com/tutorial/making-web-crawlers-scrapy-python

When I got to the point of using CSS Selectors for Extraction, I keyed in this code:

response.css(".product::text").extract_first()

as the tutorial told me to.

Instead of getting the extracted text back, I received a blank output.

What went wrong? Thanks in advance!

I tried printing it too, but the output was None:

print(response.css(".product::text").extract_first())

EDIT: This is what I get on my screen:

2024-03-25 23:34:20 [scrapy.utils.log] INFO: Scrapy 2.11.1 started (bot: scrapybot)
2024-03-25 23:34:20 [scrapy.utils.log] INFO: Versions: lxml 4.9.3.0, libxml2 2.10.4, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 23.10.0, Python 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 24.0.0 (OpenSSL 3.0.13 30 Jan 2024), cryptography 42.0.5, Platform Windows-10-10.0.19045-SP0
2024-03-25 23:34:20 [scrapy.addons] INFO: Enabled addons:
[]
2024-03-25 23:34:20 [py.warnings] WARNING: C:\Users\user\anaconda3\Lib\site-packages\scrapy\utils\request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)

2024-03-25 23:34:20 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2024-03-25 23:34:20 [scrapy.extensions.telnet] INFO: Telnet Password: 159a778fd19a4980
2024-03-25 23:34:20 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole']
2024-03-25 23:34:20 [scrapy.crawler] INFO: Overridden settings:
{'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'LOGSTATS_INTERVAL': 0}
2024-03-25 23:34:20 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-03-25 23:34:20 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-03-25 23:34:20 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-03-25 23:34:20 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x000001F203D63210>
[s]   item       {}
[s]   settings   <scrapy.settings.Settings object at 0x000001F204C42710>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
2024-03-25 23:34:21 [asyncio] DEBUG: Using proactor: IocpProactor
In [1]: fetch("https://pt.aliexpress.com/category/201005406/special-store.html")
2024-03-25 23:34:25 [scrapy.core.engine] INFO: Spider opened
2024-03-25 23:34:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://pt.aliexpress.com/category/201005406/special-store.html> (referer: None)

2024-03-25 23:34:27 [asyncio] DEBUG: Using proactor: IocpProactor
In [2]: response.css(".product::text").get()

2024-03-25 23:35:04 [asyncio] DEBUG: Using proactor: IocpProactor
In [3]:
1 Answer

Answer by SuperUser:

This is probably just an outdated tutorial (it still uses extract_first). The selectors from the tutorial are no longer correct.
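(For what it's worth, extract_first() itself still works; it is just the older name for .get(). An empty result means the selector simply doesn't match anything on the page.)

A quick sanity check in the shell (a hypothetical check, run against the page you fetched) is to count how many elements the bare selector matches:

len(response.css(".product"))   # 0 means the .product selector no longer matches anything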

You need to find the correct selectors yourself (use your browser's devtools, as the tutorial shows), for example:

In [6]: response.css(".card--out-wrapper  h3::text").get()
Out[6]: 'Magcubic-Projetor de Cinema Portátil, 4K, Android 11, 1080P nativo, 390ANSI, HY320, Dual Wifi6, BT5.0, 1920*1080P, importado, HY300'
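
Once you have a selector that actually matches, the rest of the tutorial's pattern still applies. Here is a minimal sketch of a spider using that selector (the spider name and output field are made up, and the selector may break again whenever AliExpress changes its markup):

import scrapy

class AliexpressTitlesSpider(scrapy.Spider):
    # Hypothetical spider name; pick whatever fits your project
    name = "aliexpress_titles"
    start_urls = [
        "https://pt.aliexpress.com/category/201005406/special-store.html",
    ]

    def parse(self, response):
        # .getall() returns every matching text node instead of just the first
        for title in response.css(".card--out-wrapper h3::text").getall():
            yield {"title": title.strip()}

You can run the same .getall() call in the shell first to confirm the selector returns a list of product names before moving it into a spider.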