Why won't this Beautiful Soup code parse the text I am targeting?


I am trying to select the Properties section header in this 10-K filing. Once it is selected, I intend to grab the text in that section (i.e. all text between the Properties and Legal Proceedings section headers).

When I run the code below I get IndexError: 'list index out of range', but I don't understand why, since the text "PROPERTIES" appears to be within a 'p' tag. I have also tried using 'id="ITEM_2_PROPERTIES"' instead of text=, but that didn't work either.

Where am I going wrong?

import requests
from bs4 import BeautifulSoup


url = 'https://www.sec.gov/ix?doc=/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

properties_header = soup.find_all('p', text="PROPERTIES")[0]

print(properties_header)

There is 1 answer

baduker (best answer)

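It's because you're making a request to a JS-rendered page. The /ix?doc= URL returns a viewer that loads the filing with JavaScript, and requests doesn't run JavaScript, so the HTML you parse contains no p with the text PROPERTIES. find_all then returns an empty list, and indexing it with [0] raises the IndexError.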

However, if you request the document itself (the same URL without the ix?doc= part), the tag is there:

import requests
from bs4 import BeautifulSoup


url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

properties_header = soup.find_all('p', text="PROPERTIES")

print(properties_header)

Output:

[<p id="ITEM_2_PROPERTIES" style="margin-bottom:0pt;margin-top:0pt;font-weight:bold;font-style:normal;text-transform:none;font-variant: normal;font-family:Times New Roman;font-size:10pt;">PROPERTIES</p>]
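Once the header is found, the text between it and the next section header could be collected along these lines. This is only a minimal sketch: text_between is a hypothetical helper, the sample HTML is made up so the snippet runs offline, and the id ITEM_3_LEGAL_PROCEEDINGS is an assumption modelled on the Properties id, so check the real filing for the actual id. It also assumes both headers are siblings under the same parent, which may not hold in the real document.

```python
from bs4 import BeautifulSoup


def text_between(start, stop_id):
    """Collect the text of each sibling tag after `start`,
    stopping at the sibling whose id equals `stop_id`."""
    chunks = []
    for sibling in start.find_next_siblings():
        if sibling.get("id") == stop_id:
            break
        text = sibling.get_text(" ", strip=True)
        if text:  # skip spacer paragraphs that hold only whitespace
            chunks.append(text)
    return chunks


# Made-up stand-in for the filing; the second id is an assumption.
sample_html = """
<div>
<p id="ITEM_2_PROPERTIES">PROPERTIES</p>
<p>We are headquartered in Example City.</p>
<p> </p>
<p>We lease several example facilities.</p>
<p id="ITEM_3_LEGAL_PROCEEDINGS">LEGAL PROCEEDINGS</p>
<p>Not part of the Properties section.</p>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
header = soup.find("p", id="ITEM_2_PROPERTIES")
section = text_between(header, "ITEM_3_LEGAL_PROCEEDINGS")
print("\n".join(section))
```

If the stop id never appears among the siblings, the loop simply runs to the last sibling, so it is worth checking that the stop header exists before trusting the result.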

I got the new target URL from the browser's Developer Tools; the request for it shows up once you turn JS back on. So I'd suggest targeting that URL directly in your future requests.
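Judging by the two URLs in this thread, the document path is simply the value of the doc query parameter on the viewer link. A small sketch of that conversion, using only the standard library (raw_document_url is a hypothetical helper name, and the assumption that every viewer link follows this /ix?doc= pattern comes from this one example):

```python
from urllib.parse import urlsplit, parse_qs


def raw_document_url(viewer_url):
    """Return the plain document URL behind an /ix?doc=... viewer link."""
    parts = urlsplit(viewer_url)
    doc_paths = parse_qs(parts.query).get("doc")
    if parts.path != "/ix" or not doc_paths:
        # Not a viewer link (or no doc parameter): return it unchanged.
        return viewer_url
    return f"{parts.scheme}://{parts.netloc}{doc_paths[0]}"


viewer = ("https://www.sec.gov/ix?doc=/Archives/edgar/data/1318605/"
          "000156459020004475/tsla-10k_20191231.htm")
print(raw_document_url(viewer))
```

A URL that is already in the plain form passes through untouched, so the helper can be applied to any link before calling requests.get.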