Python Beautiful Soup Web Scraping Specific Numbers


On this page, the final score of each team sits in an element with the same class name, class="finalScore".

When I call the final score of the away team (on top) the code calls that number without a problem. If ... favLastGM = 'A'

When I try to call the final score of the home team (on bottom) the code gives me an error. If ... favLastGM = 'H'

Below is my code:

import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen

#Last Two Game info Home [H] or Away [A]
favLastGM = 'A' #Higher week number 2

#Game Info (Favorite) Last Game Played - CBS Sports (Change Every Week)
favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
favPrevGMInfoHtml = urlopen(favPrevGMInfoUrl).read()
favPrevGMInfoSoup = BeautifulSoup(favPrevGMInfoHtml)
if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]
else:
    print("***************************************************")
    print("NOT A VALID ENTRY - favLastGM  !")
    print("***************************************************")


print ("Enter: Total Points Allowed from Favored Team Defense for last game played: "),
print favScore[0].text

This is the error I get if favLastGM = 'H'

Traceback (most recent call last):
  File "C:/Users/jcmcdonald/Desktop/FinalScoreTest.py", line 26, in <module>
    print favScore[0].text
  File "C:\Python27\lib\site-packages\bs4\element.py", line 905, in __getitem__
    return self.attrs[key]
KeyError: 0


There are 3 answers

alecxe On BEST ANSWER

There are just two elements with class="finalScore": the first is the score of the away team, the second is the score of the home team:

>>> from urllib import urlopen
>>> from bs4 import BeautifulSoup
>>> 
>>> favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
>>> 
>>> favPrevGMInfoSoup = BeautifulSoup(urlopen(favPrevGMInfoUrl))
>>> score = [item.get_text() for item in favPrevGMInfoSoup.find_all("td", {"class": "finalScore"})]
>>> score
[u'30', u'7']

FYI, instead of .find_all("td", {"class": "finalScore"}), you can use a CSS selector: .select("td.finalScore").
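As a rough sketch of the selector form: the markup below is made up for illustration (it is not copied from the CBS page, only the class names come from this thread), and the code is written for Python 3 with the built-in html.parser rather than the Python 2 setup in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the box score markup; the real page is much larger.
SAMPLE_HTML = """
<table>
  <tr class="teamInfo awayTeam"><td>NE</td><td class="finalScore">30</td></tr>
  <tr class="teamInfo homeTeam"><td>MIN</td><td class="finalScore">7</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() takes a CSS selector and returns the matching tags in document order
scores = [td.get_text() for td in soup.select("td.finalScore")]

# the dict-based find_all() call matches the same cells
same_scores = [td.get_text() for td in soup.find_all("td", {"class": "finalScore"})]
```

On this sample both calls pick up the same two cells, so which one to use is mostly a matter of taste.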

mkkane On

In your code you are assigning different types of objects to favScore. So in the first case, where you have:

if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })

You end up with a list...

favScore = [<td class="finalScore">30</td>, <td class="finalScore">7</td>]

Whereas in the second case, where you have:

elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]

You end up with a single BeautifulSoup element (a Tag)...

favScore = <td class="finalScore">7</td>

You could fix this by doing (note the [0]):

if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[0]
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]

And then at the end do:

print favScore.text
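To see the mismatch concretely, here is a small sketch on made-up markup (Python 3, not the real page). find_all() hands back a list-like ResultSet, indexing it gives a single Tag, and subscripting a Tag looks up an HTML attribute, which would explain the KeyError: 0 in the traceback.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-cell table standing in for the box score
html = '<table><tr><td class="finalScore">30</td><td class="finalScore">7</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td", {"class": "finalScore"})  # list-like ResultSet
home_cell = cells[1]                                   # a single Tag

is_list = isinstance(cells, list)
is_tag = isinstance(home_cell, Tag)

# Subscripting a Tag reads its attributes, so an integer key is not found
try:
    home_cell[0]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

class_attr = home_cell["class"]   # attribute lookup works with a string key
home_text = home_cell.get_text()  # the score itself comes from the tag's text
```

So the [0] belongs on the find_all() result, not on the tag that comes out of it.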
jorgeh On

A slight extension to @alecxe's answer, with explicit selection of the home and away teams (instead of relying on the implicit ordering of the list):

from urllib import urlopen
from bs4 import BeautifulSoup

favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'

favPrevGMInfoSoup = BeautifulSoup(urlopen(favPrevGMInfoUrl))

home_score = favPrevGMInfoSoup.find("tr", {"class": "teamInfo homeTeam"}).find("td", {"class": "finalScore"}).get_text()
away_score = favPrevGMInfoSoup.find("tr", {"class": "teamInfo awayTeam"}).find("td", {"class": "finalScore"}).get_text()

print home_score, away_score
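One caveat, as far as I know: passing the full string "teamInfo homeTeam" matches the class attribute value exactly, so it depends on the classes appearing in that order. A CSS selector matches on individual classes instead. A possible sketch in Python 3, where the markup is made up and final_score is a hypothetical helper name, not something from bs4:

```python
from bs4 import BeautifulSoup

# Made-up markup; the row class names follow the ones used in the answer above
SAMPLE_HTML = """
<table>
  <tr class="teamInfo awayTeam"><td>NE</td><td class="finalScore">30</td></tr>
  <tr class="teamInfo homeTeam"><td>MIN</td><td class="finalScore">7</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def final_score(soup, side):
    """Return the score text for 'homeTeam' or 'awayTeam', or None if no such row."""
    # select_one() returns the first match or None, so a missing row does not raise
    cell = soup.select_one("tr.%s td.finalScore" % side)
    return cell.get_text() if cell is not None else None


home_score = final_score(soup, "homeTeam")
away_score = final_score(soup, "awayTeam")
missing = final_score(soup, "neutralTeam")
```

Returning None for a missing row is just one choice; raising an error may be better if a changed page layout should stop the script.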