BeautifulSoup IndexError

I'm trying to scrape the text from the "Industry" row of this table:

<tr>
    <th style="padding-right: 0.5em;" scope="row">Industry</th>
    <td class="category" style="line-height: 1.35em;">
        <a title="Professional Services" href="/wiki/Professional_services">Professional services</a>
        <br></br>
        <a title="Technology Services" href="/wiki/Technology_services">Technology services</a>
    </td>
</tr>

My Python code is as follows (r being the table variable):

industry = r.find('th', text='Industry').findNext('td').find_all('a')[0].get_text()
print industry

The first one, "Professional services", gets printed, but then I get this error:

IndexError: list index out of range

There is 1 answer

Answered by alecxe

From what I can reproduce, it could be because of the differences between parsers:

>>> from bs4 import BeautifulSoup
>>> # data is assumed to hold the <tr> snippet from the question
>>> soup = BeautifulSoup(data, "html.parser")
>>> soup.find('th', text='Industry').findNext('td').find_all('a')[0].get_text()
u'Professional services'
>>> 
>>> soup = BeautifulSoup(data, "lxml")
>>> soup.find('th', text='Industry').findNext('td').find_all('a')[0].get_text()
u'Professional services'
>>> 
>>> soup = BeautifulSoup(data, "html5lib")
>>> soup.find('th', text='Industry').findNext('td').find_all('a')[0].get_text()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'findNext'

Note that I wasn't able to reproduce the exact error you got.
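
Whichever parser is used, the chained call breaks as soon as one step finds nothing: find() returns None when no matching th exists, and find_all('a')[0] raises IndexError when the td contains no links. If the line runs once per table or page (an assumption, since the question does not show a loop), a later table without links in its "Industry" row would produce exactly that error after the first value has printed. Below is a minimal defensive sketch, not a confirmed diagnosis. It assumes the row is wrapped in a table element, which the question's snippet is not. The helper name industries is hypothetical, and string= is the newer spelling of the text= argument used above.

```python
from bs4 import BeautifulSoup

# The row from the question, wrapped in <table> (an assumption) so that
# the <tr>/<th>/<td> tags appear in a valid table context.
HTML = """
<table>
<tr>
    <th style="padding-right: 0.5em;" scope="row">Industry</th>
    <td class="category" style="line-height: 1.35em;">
        <a title="Professional Services" href="/wiki/Professional_services">Professional services</a>
        <br></br>
        <a title="Technology Services" href="/wiki/Technology_services">Technology services</a>
    </td>
</tr>
</table>
"""


def industries(table):
    """Return the link texts of the "Industry" row, or [] if anything is missing."""
    th = table.find('th', string='Industry')
    if th is None:          # no "Industry" header in this table
        return []
    td = th.find_next('td')
    if td is None:          # header found, but no cell follows it
        return []
    # An empty find_all() result gives an empty list instead of an IndexError.
    return [a.get_text(strip=True) for a in td.find_all('a')]


soup = BeautifulSoup(HTML, "html.parser")
print(industries(soup))
```

Returning a list also picks up "Technology services", which the [0] index in the original line skips. An empty list tells the caller that the row or its links were missing, so a loop over many tables can continue instead of crashing.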