Python Beautiful Soup Table Data Scraping Specific TD Tags


This webpage has multiple tables on it: http://www.nfl.com/player/tombrady/2504211/gamelogs

In the HTML, all of the tables have exactly the same opening tag:

<table class="data-table1" width="100%" border="0" summary="Game Logs For Tom Brady In 2014">

I can scrape data from the first table (Preseason), but I do not know how to skip it and scrape data from the second and third tables (Regular Season and Postseason).

I'm trying to scrape specific numbers, such as the quarterback rating from the row for a given week.

My code:

import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen

year = '2014'
lastWeek = '2'
favQB1 = "Tom Brady"

favQBurl2 = 'http://www.nfl.com/player/tombrady/2504211/gamelogs'
favQBhtml2 = urlopen(favQBurl2).read()
favQBsoup2 = BeautifulSoup(favQBhtml2)
favQBpass2 = favQBsoup2.find("table", { "summary" : "Game Logs For %s In %s" % (favQB1, year)})
favQBrows2 = []

for row in favQBpass2.findAll("tr"):
    if lastWeek in row.findNext('td'):  
        for item in row.findAll("td"):
            favQBrows2.append(item.text)
print ("Enter: Starting Quarterback QB Rating of Favored Team for the last game played (regular season): "),
print favQBrows2[15]

There are 2 answers

Answer by Vikas Ojha (best answer):

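The following should also work. It uses find_all() to collect every table with the matching summary and takes the second one (index 1), which is the regular season table: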

from bs4 import BeautifulSoup
from urllib import urlopen  # Python 2; in Python 3 this lives in urllib.request

year = '2014'
lastWeek = '2'
favQB1 = "Tom Brady"

favQBurl2 = 'http://www.nfl.com/player/tombrady/2504211/gamelogs'
favQBhtml2 = urlopen(favQBurl2).read()
favQBsoup2 = BeautifulSoup(favQBhtml2)
# find_all() returns every matching table in document order:
# index 0 is the preseason table, index 1 the regular season, index 2 the postseason.
favQBpass2 = favQBsoup2.find_all("table", { "summary" : "Game Logs For %s In %s" % (favQB1, year)})[1]
favQBrows2 = []

# Collect the text of every cell in the row whose next td holds the requested week.
for row in favQBpass2.findAll("tr"):
    if lastWeek in row.findNext('td'):
        for item in row.findAll("td"):
            favQBrows2.append(item.text)
print ("Enter: Starting Quarterback QB Rating of Favored Team for the last game played (regular season): "),
print favQBrows2[15]
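
To try the indexing idea without hitting the live page, here is a minimal sketch that runs against a small inline HTML string. The markup, the placeholder cell values, and the names SAMPLE_HTML and rows_for_week are invented for illustration; the real page has many more columns, so the index 15 used above does not apply to this sample. It is written to run on Python 3 with the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the real page: three tables
# that all share the same summary attribute. Cell values are placeholders.
SAMPLE_HTML = """
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Preseason</td></tr>
  <tr><td>2</td><td>OPP-A</td><td>10.0</td></tr>
</table>
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Regular Season</td></tr>
  <tr><td>1</td><td>OPP-B</td><td>20.0</td></tr>
  <tr><td>2</td><td>OPP-C</td><td>30.0</td></tr>
</table>
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Postseason</td></tr>
</table>
"""


def rows_for_week(table, week):
    """Return the cell texts of the first row whose first td equals `week`."""
    for row in table.find_all("tr"):
        first_cell = row.find("td")
        if first_cell is not None and first_cell.get_text(strip=True) == week:
            return [cell.get_text(strip=True) for cell in row.find_all("td")]
    return []


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() would return only the first match; find_all() returns all of them
# in document order, so the tables can be selected by position.
tables = soup.find_all("table", {"summary": "Game Logs For Tom Brady In 2014"})
regular_season = tables[1]

print(rows_for_week(regular_season, "2"))
```

Selecting by position is simple, but it depends on the page always listing the tables in the same order.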
Answer by alecxe:

Rely on the table title, which is located in the td element in the first table row:

def find_table(soup, label):
    return soup.find("td", text=label).find_parent("table", summary=True)

Usage:

find_table(soup, "Preseason")
find_table(soup, "Regular Season")
find_table(soup, "Postseason")

For details, see find_parent() in the Beautiful Soup documentation.
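
As a rough, self-contained illustration of this label-based lookup, the sketch below runs find_table() against an inline HTML string rather than the live page. The SAMPLE_HTML markup and its placeholder values are invented, and it assumes each title cell contains exactly the label text with no surrounding whitespace. It uses string=, the newer name for the text= argument in recent Beautiful Soup 4 releases, and adds a None check that the original helper does not have.

```python
from bs4 import BeautifulSoup

# Hypothetical simplified markup: each table's first row holds its title.
SAMPLE_HTML = """
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Preseason</td></tr>
  <tr><td>2</td><td>OPP-A</td></tr>
</table>
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Regular Season</td></tr>
  <tr><td>2</td><td>OPP-C</td></tr>
</table>
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Postseason</td></tr>
</table>
"""


def find_table(soup, label):
    """Return the table whose title cell text equals `label`, or None."""
    # Match a td whose only content is exactly the label string.
    cell = soup.find("td", string=label)
    if cell is None:
        return None
    # Walk up to the nearest enclosing table that has a summary attribute.
    return cell.find_parent("table", summary=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
all_tables = soup.find_all("table")

regular = find_table(soup, "Regular Season")
print(regular.find_all("tr")[1].get_text(" ", strip=True))
```

Unlike selecting by index, this keeps working if the order of the tables changes, as long as the title text stays the same.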