I was trying to do some web scraping with the following code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

page = requests.get('https://www.google.com/search?q=phagwara+weather')
soup = BeautifulSoup(page.content, 'html-parser')
day = soup.find(id='wob_wc')

print(day.find_all('span'))

But I keep getting the following error:

 File "C:\Users\myname\Desktop\webscraping.py", line 6, in <module>
    soup = BeautifulSoup(page.content, 'html-parser')
  File "C:\Users\myname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\__init__.py", line 225, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser. Do you need to install a parser library?

I installed lxml and html5lib, but the issue persists.

There are 3 answers

αԋɱҽԃ αмєяιcαη (accepted answer)

You need to specify the tag, so instead of soup.find(id="wob_wc") it should be soup.find("div", id="wob_wc").

Also, the parser name is html.parser, not html-parser; the difference is the dot.
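
For reference, these are the parser names BeautifulSoup accepts; lxml and html5lib only work if those packages are installed (which you said they are):

from bs4 import BeautifulSoup

html = '<div id="wob_wc"><span>demo</span></div>'

soup = BeautifulSoup(html, 'html.parser')  # built into Python, no extra install
soup = BeautifulSoup(html, 'lxml')         # requires the lxml package
soup = BeautifulSoup(html, 'html5lib')     # requires the html5lib package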

Also, Google will usually give you a 200 response even when you're blocked, so the status code won't tell you whether the request went through; you usually have to check r.content yourself.
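
As a rough sketch of that check (the "unusual traffic"/CAPTCHA strings below are just an assumption about what a blocked page contains, not something from this question):

import requests

r = requests.get('https://www.google.com/search?q=phagwara+weather')
print(r.status_code)  # often 200 even when the real results are withheld

# Assumption: a blocked response typically contains a CAPTCHA / "unusual traffic" page
body = r.content.lower()
if b'unusual traffic' in body or b'captcha' in body:
    print('Google is likely blocking this request')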

I've included the headers and now it works:

import requests
from bs4 import BeautifulSoup

# Desktop User-Agent so Google serves the regular results page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
r = requests.get(
    "https://www.google.com/search?q=phagwara+weather", headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')  # html.parser, with a dot

print(soup.find("div", id="wob_wc"))
Sohaib

You need to change 'html-parser' to 'html.parser': soup = BeautifulSoup(page.content, 'html.parser')
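
Applied to the code from the question, that would be (as the accepted answer notes, you may also need a User-Agent header, otherwise day can come back as None):

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.google.com/search?q=phagwara+weather')
soup = BeautifulSoup(page.content, 'html.parser')  # dot, not dash
day = soup.find(id='wob_wc')

print(day.find_all('span'))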

Dmitriy Zub

Actually, you don't need to iterate over the whole "div #wob_wc" block: the current location, weather, date, temperature, precipitation, humidity, and wind each consist of a single element that doesn't repeat anywhere else, so you can use select() or find() instead.
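
For example, either of these grabs the single temperature element (a minimal sketch, assuming soup was built from the same Google response as in the full example below):

temperature = soup.select_one('#wob_tm').text  # CSS selector
temperature = soup.find(id='wob_tm').text      # equivalent find() call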

If you do want to iterate over something, the temperature forecast is a good candidate, for example:

for forecast in soup.select('.wob_df'):
  high_temp = forecast.select_one('.vk_gy .wob_t:nth-child(1)').text
  low_temp = forecast.select_one('.QrNVmd .wob_t:nth-child(1)').text
  print(f'High: {high_temp}, Low: {low_temp}')

'''
High: 67, Low: 55
High: 65, Low: 56
High: 68, Low: 55
'''

Have a look at the SelectorGadget Chrome extension, which lets you grab CSS selectors by clicking on the desired element in your browser, and at the CSS selectors reference.

Code and full example in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "phagwara weather",
  "hl": "en",
  "gl": "us"
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Weather condition: {weather_condition}\n'
      f'Temperature: {temperature}°F\n'
      f'Precipitation: {precipitation}\n'
      f'Humidity: {humidity}\n'
      f'Wind speed: {wind}\n'
      f'Current time: {current_time}\n')

for forecast in soup.select('.wob_df'):
  day = forecast.select_one('.QrNVmd').text
  weather = forecast.select_one('img.uW5pk')['alt']
  high_temp = forecast.select_one('.vk_gy .wob_t:nth-child(1)').text
  low_temp = forecast.select_one('.QrNVmd .wob_t:nth-child(1)').text
  print(f'Day: {day}\nWeather: {weather}\nHigh: {high_temp}, Low: {low_temp}\n')

---------
'''
Weather condition: Partly cloudy
Temperature: 87°F
Precipitation: 5%
Humidity: 70%
Wind speed: 4 mph
Current time: Tuesday 4:00 PM

Forecast temperature:
Day: Tue
Weather: Partly cloudy
High: 90, Low: 76
...
'''

Alternatively, you can achieve the same thing with the Google Direct Answer Box API from SerpApi. It's a paid API with a free plan.

The main difference is that you only need to iterate over data that has already been extracted, rather than doing everything from scratch or figuring out how to bypass blocks from Google.

Code to integrate:

import os, json
from serpapi import GoogleSearch

params = {
  "engine": "google",
  "q": "phagwara weather",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
  "gl": "us",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

forecast = results['answer_box']['forecast']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}°F\n{precipitation}\n{humidity}\n{wind}\n')

print(json.dumps(forecast, indent=2))



---------
'''
Phagwara, Punjab, India
Tuesday 4:00 PM
Partly cloudy
87°F
5%
70%
4 mph

[
  {
    "day": "Tuesday",
    "weather": "Partly cloudy",
    "temperature": {
      "high": "90",
      "low": "76"
    },
    "thumbnail": "https://ssl.gstatic.com/onebox/weather/48/partly_cloudy.png"
  }
...
]
'''

Disclaimer: I work for SerpApi.