I am just exploring scrapy with splash and I am trying to scrape all the product (pants) data with productid,name and price from one of the e-commerce site gap but I didn't see all the dynamic product data loaded when I see from splash web UI splash web UI (only 16 items are loading though for every request - no clue why) I tried with the following options but no luck
- Increasing wait time upto 20 sec
- By starting the docker with "--disable-private-mode"
- By using lua_script for page scrolling
- With view report full option splash:set_viewport_full()
lua_script2 = """ function main(splash)
local num_scrolls = 10
local scroll_delay = 2.0
local scroll_to = splash:jsfunc("window.scrollTo")
local get_body_height = splash:jsfunc(
"function() {return document.body.scrollHeight;}"
)
assert(splash:go(splash.args.url))
splash:wait(splash.args.wait)
for _ = 1, num_scrolls do
scroll_to(0, get_body_height())
splash:wait(scroll_delay)
end
return splash:html()
end"""
yield SplashRequest(
url,
self.parse_product_contents,
endpoint='execute',
args={
'lua_source': lua_script2,
'wait': 5,
}
)
Can anyone please shed some light on this behavior? p.s : I am using scrapy framework and I am able to parse the product information (itemid,name and price) from the render.html (but render.html has only 16 items information)
I updated the script to below
And ran it in my local splash, the png doesn't work fine but the HTML has the last product
The only issue was when the email subscribe popup is there it won't scroll, so I added code to close it