Scraping with BeautifulSoup: want to scrape entire column including header and title rows


Asked by user131983 on 09 June 2015 at 19:51 · 2k views

I'm trying to get hold of the data in the columns whose codes have the form "SVENYXX", where "XX" is a two-digit number (e.g. 01, 02), on the site using Python.

With the code below I can get the first row of data for all the columns I want. However, is there a way to include the column headers and row titles as well?

I know I already have the headers, but is there a way to include them in the output? Also, how could I include all the rows?

from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
soup = BeautifulSoup(page)

desired_table = soup.findAll('table')[2]

# Find the columns you want data from
headers = desired_table.findAll('th')
desired_columns = []
for th in headers:
    if 'SVENY' in th.string:
        desired_columns.append(headers.index(th))

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')

for row in rows[1:]:
    cells = row.findAll('td')
    for column in desired_columns:
        print(cells[column].text)


**double_j** · Accepted Answer · 2015-06-10T05:36:12+00:00

How's this?

I added th.getText() so that each entry in desired_columns is a list holding the column index and the column name, and then added row_name = row.findNext('th').getText() to get the row title.

from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(page, 'html.parser')

desired_table = soup.findAll('table')[2]

# Find the columns you want data from, keeping each column's index and name
headers = desired_table.findAll('th')
desired_columns = []
for index, th in enumerate(headers):
    # getText() always returns a string, whereas th.string can be None
    if 'SVENY' in th.getText():
        desired_columns.append([index, th.getText()])

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')

for row in rows[1:]:
    cells = row.findAll('td')
    # The first <th> found from the start of the row holds the row title
    row_name = row.findNext('th').getText()
    for column in desired_columns:
        print(cells[column[0]].text, row_name, column[1])
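
If you would rather end up with a table you can save than with printed lines, one possible approach is to collect the header row and each row title into lists and write them out with the csv module. The sketch below is not the code above: it runs on a small made-up HTML sample instead of the live page, it assumes every data row starts with a <th> row title followed by <td> values, and extract_columns is a hypothetical helper name. It also matches headers with startswith rather than `in`.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up stand-in for the real table (the values are invented for illustration):
# a header row of <th> cells, then one <th> row title plus <td> values per data row.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>BETA0</th><th>SVENY01</th><th>SVENY02</th></tr>
  <tr><th>2015-06-01</th><td>3.9</td><td>0.27</td><td>0.68</td></tr>
  <tr><th>2015-06-02</th><td>4.0</td><td>0.28</td><td>0.70</td></tr>
</table>
"""


def extract_columns(table, prefix):
    """Return a header row plus one list per data row for the matching columns."""
    rows = table.find_all('tr')
    header_cells = rows[0].find_all(['th', 'td'])
    # Positions are counted across the whole header row, row-title cell included.
    wanted = [i for i, cell in enumerate(header_cells)
              if cell.get_text(strip=True).startswith(prefix)]
    result = [[header_cells[0].get_text(strip=True)]
              + [header_cells[i].get_text(strip=True) for i in wanted]]
    for row in rows[1:]:
        # Collect <th> and <td> together so positions line up with the header row.
        cells = row.find_all(['th', 'td'])
        if len(cells) < len(header_cells):
            continue  # skip spacer or malformed rows
        result.append([cells[0].get_text(strip=True)]
                      + [cells[i].get_text(strip=True) for i in wanted])
    return result


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table_rows = extract_columns(soup.find('table'), 'SVENY')

# Write the rows as CSV text; swap the buffer for an open file to save to disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(table_rows)
print(buffer.getvalue())
```

On the real page you would pass desired_table instead of the sample table. Whether the header positions line up with the data cells there depends on the actual markup, so check a few rows by eye before relying on the output.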
