How can I scrape the correct number of URLs from an infinite-scroll webpage?


I am trying to scrape URLs from a webpage. I am using this code:

from bs4 import BeautifulSoup
import urllib2

url = urllib2.urlopen("http://www.barneys.com/barneys-new-york/men/clothing/shirts/dress/classic#sz=176&pageviewchange=true")
content = url.read()
soup = BeautifulSoup(content, "html.parser")

# Grab every product link on the page
links = soup.find_all("a", {"class": "thumb-link"})

for link in links:
    print(link.get('href'))

But what I'm getting as output is just 48 links instead of 176. What am I doing wrong?


1 Answer

Answered by heinst

I used Postman's interceptor feature to look at the request the website makes each time it loads the next set of 36 shirts, then replicated those calls in code. You can't fetch all 176 items in one request, so I paged through them 36 at a time, the same way the website does.

from bs4 import BeautifulSoup
import requests

urls = []

# The site serves 36 products per request, so five requests
# (start offsets 0, 36, 72, 108, 144) cover all 176 items.
for i in range(5):
    offset = 36 * i
    r = requests.get('http://www.barneys.com/barneys-new-york/men/clothing/shirts/dress/classic?start={}&format=page-element&sz=36'.format(offset))
    soup = BeautifulSoup(r.text, "html.parser")

    links = soup.find_all("a", {"class": "thumb-link"})

    for link in links:
        href = link.get('href')
        if href and href not in urls and len(urls) < 176:
            print(href)
            urls.append(href)
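
If you don't want to hard-code the number of pages, a small variation is to keep advancing the start offset until a request comes back with no product links. This is just a sketch of the same approach; the start/sz parameters and the thumb-link class come from the code above, and it assumes the endpoint returns an empty grid once the offset passes the last item:

from bs4 import BeautifulSoup
import requests

BASE = ('http://www.barneys.com/barneys-new-york/men/clothing/'
        'shirts/dress/classic?start={}&format=page-element&sz=36')

urls = []
offset = 0
while True:
    r = requests.get(BASE.format(offset))
    soup = BeautifulSoup(r.text, "html.parser")
    links = soup.find_all("a", {"class": "thumb-link"})
    if not links:
        # No more products: we've paged past the end of the catalog
        break
    for link in links:
        href = link.get('href')
        if href and href not in urls:
            urls.append(href)
    offset += 36

print(len(urls))

The upside is that the loop adapts if the catalog grows or shrinks; the downside is that it depends on the empty-page assumption, so a site that keeps echoing the last page back would loop forever without an extra safety cap.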