Python web scraping: title in a specific div and pages 1 to 15

Hey guys, I have the following problem. I want to scrape data from a website, but there are two issues:

  1. I have a setup to check pricing. That works very well, but only for pages 1 and 15. I want all pages from 1 to 15 (1, 2, 3, 4, 5, etc.).

  2. The product title sits in a div with the class "title". How can I extract that data? There are many other titles on the page, and I only want the name of the whisky.

Some code:

from lxml import html
import requests

urls = ['http://whiskey.de/shop/Aktuell/']

for url in urls:
    for number in range(1,15):
        page = requests.get(url+str(number))

tree = html.fromstring(page.text)

prices = tree.xpath('//div[@class="price "]/text()')
names = tree.xpath('//div[@class="column-inner infos"]/text()')

print 'Whiskey Preis: ', prices
print 'Whiskey Names: ', names

The site I want to scrape is this.

There is 1 answer

alecxe (best answer)

Here are the things I would fix/improve:

  • the code is not properly indented: the HTML-parsing code has to move into the loop body, otherwise only the last page fetched gets parsed
  • a URL such as whisky.de/shop/Aktuell/1 for page number 1 would not work; for the first page, leave the page number off: whisky.de/shop/Aktuell/
  • to get the prices and titles I would use CSS selectors (you can keep using XPath expressions, there is nothing wrong with that; this is just for the sake of an example and to learn something new)
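The page-number rule from the second point can be tried out without touching the network. This is only a minimal sketch: `page_urls` is a hypothetical helper name, and it assumes the listing has exactly 15 pages, as in the question. Note that `range()` excludes its end value, so reaching page 15 takes an upper bound of 16.

```python
def page_urls(base_url, last_page):
    """Yield the listing URL for every page from 1 to last_page inclusive."""
    # range() stops one short of its end value, hence last_page + 1
    for number in range(1, last_page + 1):
        # page 1 is the bare listing URL; later pages append their number
        yield base_url if number == 1 else base_url + str(number)


urls = list(page_urls('http://whiskey.de/shop/Aktuell/', 15))
print(len(urls))
print(urls[0])
print(urls[-1])
```

Each generated URL can then be handed to `requests.get()` inside the loop.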

The code with the applied improvements:

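from lxml import html
import requests


urls = ['http://whiskey.de/shop/Aktuell/']

for url in urls:
    # range() excludes its end value, so 16 is needed to reach page 15
    for number in range(1, 16):
        # page 1 has no number in its URL; later pages append it
        page_url = url + str(number) if number > 1 else url
        page = requests.get(page_url)

        # parse every page inside the loop, not once after it
        tree = html.fromstring(page.text)

        # restrict both selectors to the main content area
        prices = tree.cssselect('div#content div.price')
        names = tree.cssselect('div#content div.title a')

        print('Whiskey Preis: ', [price.text for price in prices])
        print('Whiskey Names: ', [name.text for name in names])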
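
If you would rather stay with XPath, roughly the same selection could look like the sketch below. The `SAMPLE` markup is made up for illustration (the real page structure may differ), and `has_class` and `extract` are hypothetical helper names. Matching on a class token instead of the exact attribute string means a trailing space, as in class="price ", no longer matters. As far as I know, `cssselect()` also depends on the separately installed `cssselect` package, while XPath works with lxml alone.

```python
from lxml import html

# Hypothetical markup standing in for a product listing; the real page may differ.
SAMPLE = """
<div id="page">
  <div id="content">
    <div class="title"><a href="/a">Example Whisky 12</a></div>
    <div class="price ">39,90 EUR</div>
    <div class="title"><a href="/b">Example Whisky 18</a></div>
    <div class="price ">89,90 EUR</div>
  </div>
  <div class="title"><a href="/x">Sidebar title</a></div>
</div>
"""


def has_class(name):
    # XPath 1.0 idiom for "the class attribute contains this token"
    return 'contains(concat(" ", normalize-space(@class), " "), " %s ")' % name


def extract(tree):
    # only look inside the element with id="content", like div#content in CSS
    names = tree.xpath('//div[@id="content"]//div[%s]//a' % has_class('title'))
    prices = tree.xpath('//div[@id="content"]//div[%s]' % has_class('price'))
    return ([n.text_content().strip() for n in names],
            [p.text_content().strip() for p in prices])


names, prices = extract(html.fromstring(SAMPLE))
print(names)
print(prices)
```

The title outside the content div is skipped, which is the point of scoping both expressions to the content area.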