Scrape the price, title, and image of a product from a website

Scraping is a new topic for me and I am struggling to understand it. Note: I am using WordPress.

For example, say I want to display a FootLocker product on my shoe blog by scraping it. How would I extract the price, title, and image of a product from FootLocker?

From my research, it seems that DOMDocument in PHP or BeautifulSoup in Python may be used for this purpose, but I am unsure. For my situation (extracting the price, title, and image), which method will work?

Will DOMDocument work for this? I really need some guidance.

EDIT

Here is the sample HTML:

PRODUCT TITLE

<div class="title" data-info="product_title">
<h1 tabindex="698">Jordan Flight Origin 2 - Men's</h1>
</div>

PRODUCT PRICE

<div class="regular_price">
<span class="label" tabindex="-1"></span>
<span class="value">$114.99</span>
</div>

PRODUCT IMAGE

<div class="regular_price">
<span class="label" tabindex="-1"></span>
<span class="value">$114.99</span>
</div>

PRODUCT URL

http://www.footlocker.com/product/model:234353/sku:05155015/jordan-flight-origin-2-mens/grey/multicolor/?cm=newarrivalsshoessupercat

There is 1 answer

Tarun Venugopal Nair On
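import re
import urllib.request  # Python 3 replacement for urllib2

url = 'http://www.footlocker.com/product/model:234353/sku:05155015/jordan-flight-origin-2-mens/grey/multicolor/?cm=newarrivalsshoessupercat'
result = []

# Download the product page and decode the bytes to text.
response = urllib.request.urlopen(url)
html = response.read().decode('utf-8', errors='replace')

# The image URL is taken from the <link rel="image_src"> tag.
m = re.search(r'<link rel="image_src" href="(.+?)"', html)
if m:
    result.append(m.group(1))  # append only when the tag was found

# The product title is taken from the <meta name="title"> tag.
m = re.search(r'<meta name="title" content="(.+?)"', html)
if m:
    result.append(m.group(1))

print(result)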

I have not used BeautifulSoup here; this is just simple regex-based code to get the job done. I hope it works. Let me know if any changes are required. Frankly, I never thought about time-complexity issues related to BeautifulSoup.
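
If the regexes turn out to be too brittle, here is a minimal sketch of the same idea using only Python's standard-library html.parser, run against the sample HTML from the question. The class name ProductParser and the example.com image URL are made up for illustration, and the image lookup assumes the page exposes a <link rel="image_src"> tag as in the code above. It only remembers the most recently opened <div>, so deeply nested markup may need a more careful parser.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects title, price, and image URL (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.product = {}        # fields found so far
        self._div_class = None   # class of the most recently opened <div>
        self._field = None       # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class") or ""
        if tag == "div":
            self._div_class = cls
        elif tag == "h1" and self._div_class == "title":
            self._field = "title"
        elif tag == "span" and cls == "value" and self._div_class == "regular_price":
            self._field = "price"
        elif tag == "link" and attrs.get("rel") == "image_src":
            # Assumption: the image is exposed as <link rel="image_src" href="...">.
            self.product.setdefault("image", attrs.get("href"))

    def handle_data(self, data):
        # Keep the first non-empty text seen inside a tracked element.
        if self._field and data.strip():
            self.product.setdefault(self._field, data.strip())

    def handle_endtag(self, tag):
        if tag in ("h1", "span"):
            self._field = None


# Sample markup from the question; the <link> line is a made-up placeholder.
sample = """
<link rel="image_src" href="http://example.com/shoe.jpg">
<div class="title" data-info="product_title">
<h1 tabindex="698">Jordan Flight Origin 2 - Men's</h1>
</div>
<div class="regular_price">
<span class="label" tabindex="-1"></span>
<span class="value">$114.99</span>
</div>
"""

parser = ProductParser()
parser.feed(sample)
parser.close()
print(parser.product)
```

To use it on the live page, feed it the downloaded html string instead of sample. The class names it matches (title, regular_price, value) come straight from the sample HTML, so they will need updating if the site changes its markup.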