I am trying to scrape data from Yellow Pages. I have used this scraper successfully several times, but it has recently stopped working. I noticed a recent change on the Yellow Pages website where they have added a Sponsored Links table that contains three results. Since this change, the only thing my scraper picks up is the advertisement below this Sponsored Links table. It does not retrieve any of the results.
Where am I going wrong on this?
I have included my code below. As an example, it shows a search for 7 Eleven locations in Wisconsin.
import requests
from bs4 import BeautifulSoup
import csv

my_url = "https://www.yellowpages.com/search?search_terms=7-eleven&geo_location_terms=WI&page={}"

for link in [my_url.format(page) for page in range(1, 20)]:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    placeHolder = []
    for item in soup.select(".info"):
        try:
            name = item.select("[itemprop='name']")[0].text
        except Exception:
            name = ""
        try:
            streetAddress = item.select("[itemprop='streetAddress']")[0].text
        except Exception:
            streetAddress = ""
        try:
            addressLocality = item.select("[itemprop='addressLocality']")[0].text
        except Exception:
            addressLocality = ""
        try:
            addressRegion = item.select("[itemprop='addressRegion']")[0].text
        except Exception:
            addressRegion = ""
        try:
            postalCode = item.select("[itemprop='postalCode']")[0].text
        except Exception:
            postalCode = ""
        try:
            phone = item.select("[itemprop='telephone']")[0].text
        except Exception:
            phone = ""
        with open('yp-7-eleven-wi.csv', 'a') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow([name, streetAddress, addressLocality, addressRegion, postalCode, phone])
The Scraping Life... the struggle is real!
A quick inspection of the page shows that the information you're scraping is now housed in a different structure. The listings no longer carry the itemprop microdata attributes your selectors rely on. For example, for the address, rather than [itemprop='streetAddress'] you'd need the class selector .street-address, and so on for the other fields. For nested values such as the locality, you can keep using BeautifulSoup's built-in .select() method, which mimics CSS-style selectors.

In summary: fix those hard-coded attribute selectors to match the new class names and you should be back in business.
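As a sketch of what that swap looks like, here is a minimal parser using class-based selectors. The exact class names (.business-name, .street-address, .locality, .phones) are assumptions based on inspecting the markup described above, so verify them against the live page before relying on them:

```python
from bs4 import BeautifulSoup

def parse_listing(item):
    """Pull one result out of a .info block using class-based selectors.

    select_one() returns None when nothing matches, which replaces the
    try/except-per-field pattern in the original scraper.
    """
    def first_text(selector):
        match = item.select_one(selector)
        return match.get_text(strip=True) if match else ""

    return {
        "name": first_text(".business-name"),       # assumed class name
        "street": first_text(".street-address"),    # assumed class name
        "locality": first_text(".locality"),        # assumed class name
        "phone": first_text(".phones"),             # assumed class name
    }

# A small inline HTML sample mirroring the assumed new markup,
# so the function can be exercised without hitting the site:
sample = """
<div class="info">
  <a class="business-name"><span>7-Eleven</span></a>
  <div class="street-address">123 Main St</div>
  <div class="locality">Madison, WI 53703</div>
  <div class="phones phone primary">(608) 555-0100</div>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")
for item in soup.select(".info"):
    row = parse_listing(item)
    print(row["name"], row["street"], row["locality"], row["phone"])
```

Because select_one() degrades gracefully to an empty string, sponsored-link blocks that lack one of these children simply produce blank fields instead of crashing the row.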