Finding a string in BeautifulSoup

I'm searching for the text "City:" that appears immediately before the tag I want, which holds the city and state string. Here is the HTML:

<b>City:</b>
  <a href="/city/New-York-New-York.html">New York, NY</a>

Here is the code:

import requests
from bs4 import BeautifulSoup

zipCode = str(11021)
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
# Look for text nodes that are exactly "City:"
main_body = soup.findAll(text="City:")
print(main_body)

All I get, however, are empty brackets. How do I search for the City: text and then get the string for the next tag?

There are 2 answers

2
Birei On

You can iterate over next_elements starting from the matching text node until you reach an <a> tag, then extract its text:

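from bs4 import BeautifulSoup
import sys

# Parse the HTML file named on the command line
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

# For each "City:" text node, walk forward to the first <a> tag
for t in soup.find_all(text="City:"):
    print(t)
    for e in t.next_elements:
        # Guard the attribute lookup, since text nodes are not tags
        if getattr(e, 'name', None) == 'a':
            print(e.string)
            break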

Run it like this (assuming htmlfile contains the test data from the question):

python3 script.py htmlfile

That yields:

City:
New York, NY
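
To try the same idea without saving a file first, here is a minimal sketch that parses the fragment from the question as an inline string. The helper name first_link_text_after is made up for illustration, and string= is used as the newer spelling of the text= argument (Beautiful Soup 4.4.0 and later); with an older version, text= should behave the same way.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# HTML fragment copied from the question
HTML = '''<b>City:</b>
  <a href="/city/New-York-New-York.html">New York, NY</a>'''


def first_link_text_after(soup, label):
    """Return the text of the first <a> after the text node equal to label.

    Hypothetical helper for illustration; returns None if nothing is found.
    """
    node = soup.find(string=label)
    if node is None:
        return None
    for element in node.next_elements:
        # next_elements yields text nodes as well as tags; only tags can be <a>
        if isinstance(element, Tag) and element.name == 'a':
            return element.get_text()
    return None


soup = BeautifulSoup(HTML, 'html.parser')
print(first_link_text_after(soup, "City:"))
```

Returning None when the label is missing makes it easy to tell "label not on the page" apart from "label found", which is the situation the question ran into.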
0
DBWeinstein On

The answers from @Birei and @JohnClements got me most of the way there, but here is the code that works for me. It falls back to the label "Cities:" when "City:" is not found:

import requests
from bs4 import BeautifulSoup

zipCode = "07928"
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
# Use the "City:" label if present, otherwise fall back to "Cities:"
cityNeeded = soup.findAll(text="City:") or soup.findAll(text="Cities:")
for t in cityNeeded:
    # find_next('a') returns the first <a> tag after the label
    print(t.find_next('a').string)
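
The two label lookups can also be folded into a single regular expression. The sketch below is only one way to do it: cities_from_html is a hypothetical helper, and the two HTML fragments are invented for illustration rather than copied from the real pages. The pattern also tolerates whitespace around the label, which an exact string match would reject.

```python
import re
from bs4 import BeautifulSoup

# Matches "City:" or "Cities:", allowing surrounding whitespace
LABEL = re.compile(r'^\s*(City|Cities):\s*$')


def cities_from_html(html):
    """Return the text of the first <a> after each City:/Cities: label."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for label in soup.find_all(string=LABEL):
        link = label.find_next('a')
        if link is not None:
            results.append(link.get_text(strip=True))
    return results


# Hypothetical fragments for illustration; not taken from the real pages
single = '<b>City:</b> <a href="/city/New-York-New-York.html">New York, NY</a>'
plural = '<b> Cities: </b> <a href="/city/a.html">Town A, NJ</a>'
print(cities_from_html(single))
print(cities_from_html(plural))
```

Note that this only picks up the first link after each label, as the code above does. If a "Cities:" label is followed by several links, the remaining ones would need to be collected separately.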