Grabbing text data from Baseball-Reference with Python


http://www.baseball-reference.com/players/split.cgi?id=aardsda01&year=2015&t=p

I would like to find out which arm this pitcher throws with. If the data were in a table I would be able to grab it, but I don't know how to get plain text like this.

David Aardsma    \ARDS-mah\

David Allan Aardsma (twitter: @TheDA53)

Position: Pitcher
Bats: Right, Throws: Right 
Height: 6' 3", Weight: 220 lb.

The text looks like this. I would like to get everything after Throws:.


There is 1 answer

alecxe (best answer)

If you were to solve it with BeautifulSoup, you would find the b tag whose text is Throws: and take its next sibling, which is the text node that follows the tag:

>>> from urllib2 import urlopen
>>> from bs4 import BeautifulSoup
>>>
>>> url = "http://www.baseball-reference.com/players/split.cgi?id=aardsda01&year=2015&t=p"
>>> soup = BeautifulSoup(urlopen(url))
>>> soup.find("b", text='Throws:').next_sibling.strip()
u'Right'
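
The session above is Python 2 (urllib2, the u'' prefix). For anyone without BeautifulSoup at hand, the same idea (locate the b tag holding the label, then take the text right after it) can be sketched with only the Python 3 standard library. This is a rough illustration, not the answer's method: it uses html.parser in place of BeautifulSoup, the names LabelValueParser and text_after_label are made up for the example, and SAMPLE_HTML is an assumed approximation of the page markup, not copied from the site.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Capture the text that directly follows <b>label</b> (hypothetical helper)."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_bold = False       # True while inside a <b> element
        self.capture_next = False  # True once the matching <b> text was seen
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_bold = True
            self.capture_next = False  # another <b> opened before any text followed

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_bold:
            if data.strip() == self.label:
                self.capture_next = True
        elif self.capture_next and self.value is None:
            # First text node after the matching <b>: the equivalent of next_sibling
            self.value = data.strip()
            self.capture_next = False


def text_after_label(html, label):
    """Return the stripped text following <b>label</b>, or None if not found."""
    parser = LabelValueParser(label)
    parser.feed(html)
    parser.close()
    return parser.value


# Assumed markup, modelled on the text quoted in the question.
SAMPLE_HTML = (
    "<p><b>Position:</b> Pitcher<br>"
    "<b>Bats:</b> Right, <b>Throws:</b> Right<br>"
    "<b>Height:</b> 6' 3\", <b>Weight:</b> 220 lb.</p>"
)

print(text_after_label(SAMPLE_HTML, "Throws:"))
```

To run this against the live page, the HTML string would come from urllib.request.urlopen(url).read().decode(...) instead of SAMPLE_HTML. Whether the current page still wraps the label in a b tag is not something this sketch checks.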