Hey guys, I have the following problem. I want to scrape data from a website, but there are two issues:
I have a setup that checks pricing. It works very well, but only for pages 1 and 15. I want all pages from 1 to 15 (1, 2, 3, 4, 5, etc.).
The product title sits in a div with the class "title". How can I extract just that data? There are many other titles on the page, and I only want the name of the whisky.
Some code:
from lxml import html
import requests
urls = ['http://whiskey.de/shop/Aktuell/']
for url in urls:
    for number in range(1,15):
        page = requests.get(url+str(number))
        tree = html.fromstring(page.text)
prices = tree.xpath('//div[@class="price "]/text()')
names = tree.xpath('//div[@class="column-inner infos"]/text()')
print 'Whiskey Preis: ', prices
print 'Whiskey Names: ', names
The site I want to scrape is this.
Here are the things I would fix/improve:
Requesting whisky.de/shop/Aktuell/1 for page number 1 would not work; instead, don't specify the page number for the first page: whisky.de/shop/Aktuell/
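To make the URL rule concrete, here is a minimal sketch of how the page URLs could be built. It assumes pages 2 to 15 take a plain numeric suffix, as in the question's code, and that the site is reachable over http. The names `BASE_URL` and `page_urls` are made up for illustration. Note that `range()` leaves out its end value, so `range(1,15)` stops at 14.

```python
# Hypothetical base URL: host from the answer, scheme assumed from the question.
BASE_URL = 'http://whisky.de/shop/Aktuell/'

def page_urls(base_url, last_page):
    """Return the listing URLs for pages 1..last_page (inclusive)."""
    # Page 1 is the bare listing URL, with no number appended.
    urls = [base_url]
    # range() excludes its end value, so add 1 to include the last page.
    for number in range(2, last_page + 1):
        urls.append(base_url + str(number))
    return urls

urls = page_urls(BASE_URL, 15)
```

Each entry in `urls` can then be passed to `requests.get()`, with the XPath queries run inside the same loop so that every page is parsed, not just the last one fetched.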
The code with the applied improvements: