I am trying to select the Properties section header in this 10-K filing. Once it is selected, I intend to grab the text in that section (i.e. all text between the Properties and Legal Proceedings section headers).
When I run the code below I get the IndexError 'list index out of range', but I don't understand why, since the text "PROPERTIES" appears to be within a 'p' tag. I have also tried using 'id="ITEM_2_PROPERTIES"' instead of text=, but that didn't work either.
Where am I going wrong?
import requests
from bs4 import BeautifulSoup
url = 'https://www.sec.gov/ix?doc=/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
properties_header = soup.find_all('p', text="PROPERTIES")[0]
print(properties_header)
It's because you're making a request to a JS-rendered site, so there's no such 'p' with the text "PROPERTIES". However, if you change your target URL, there's one:
Output:
I got the new target URL from the browser's Developer Tools. This comes up when you turn JS back on. So, I guess you should target that URL for your future requests.
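A minimal sketch of this approach follows. It assumes the underlying document can be reached by taking the path in the 'doc' query parameter of the viewer URL, and that sec.gov expects a descriptive User-Agent header. The helper names (static_url, fetch_soup, find_header, text_between) and the User-Agent string are hypothetical, chosen for illustration. The header lookup matches on the paragraph's visible text rather than text="PROPERTIES", since the heading may be wrapped in nested tags or carry an "ITEM 2." prefix.

```python
import re
from urllib.parse import urlparse, parse_qs

import requests
from bs4 import BeautifulSoup, NavigableString


def static_url(viewer_url):
    """Turn a viewer URL (/ix?doc=...) into the plain document URL."""
    parts = urlparse(viewer_url)
    doc_path = parse_qs(parts.query).get("doc", [None])[0]
    if doc_path is None:
        return viewer_url  # no 'doc' parameter, assume it is already a plain URL
    return f"{parts.scheme}://{parts.netloc}{doc_path}"


def fetch_soup(url):
    """Download and parse a page. The User-Agent value is a placeholder."""
    headers = {"User-Agent": "Your Name your.email@example.com"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


def find_header(soup, label):
    """Return the first <p> whose visible text contains label, or None."""
    pattern = re.compile(re.escape(label), re.IGNORECASE)
    for p in soup.find_all("p"):
        if pattern.search(p.get_text(" ", strip=True)):
            return p
    return None


def text_between(start, stop):
    """Join the text nodes that come after start and before stop."""
    chunks = []
    for node in start.next_elements:
        if node is stop:
            break
        # Skip the header's own text, which next_elements yields first.
        inside_start = any(parent is start for parent in node.parents)
        if isinstance(node, NavigableString) and not inside_start:
            text = node.strip()
            if text:
                chunks.append(text)
    return " ".join(chunks)


# Offline demonstration on a tiny hand-written sample (not the real filing).
sample_html = """
<html><body>
<p><b>ITEM 2. PROPERTIES</b></p>
<p>We own a factory.</p>
<p>ITEM 3. LEGAL PROCEEDINGS</p>
</body></html>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
properties = find_header(sample_soup, "PROPERTIES")
legal = find_header(sample_soup, "LEGAL PROCEEDINGS")
section_text = text_between(properties, legal)
print(section_text)
```

For the real filing, the same helpers would be called as fetch_soup(static_url(url)) with the url from the question. Note that find_header returns the first match, so if the table of contents also lists "PROPERTIES" inside a 'p' tag, that entry would be found first and would need to be skipped.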