Why isn't this Beautiful Soup code getting the targeted data?

Question

92 views · Asked by cfadr2021 on 21 October 2020 at 17:29

I am trying to use Beautiful Soup to grab the text in the Properties section of a 10-K SEC filing on EDGAR.

I can find the Properties section header and work my way up its parent nodes, but from there the next_sibling attribute does not return the next element (which in this case I believe contains the first paragraph of text in the section). Can someone tell me why this is not working and how to fix it?

Code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

properties_header = soup.find_all('p', text="PROPERTIES")[0]

print(properties_header.parent.parent.parent.parent.next_sibling)

Expected Result:

<p style="margin-top:4pt;margin-bottom:0pt;text-indent:5.24%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">We are headquartered in Palo Alto, California. Our principal facilities include a large number of properties in North America, Europe and Asia utilized for manufacturing and assembly, warehousing, engineering, retail and service locations, Supercharger sites, and administrative and sales offices. Our facilities are used to support both of our reporting segments, and are suitable and adequate for the conduct of our business. We primarily lease such facilities with the exception of some manufacturing facilities. The following table sets forth the location of our primary owned and leased manufacturing facilities.</p>

Original Q&A

There is 1 answer

**AudioBubble** · Accepted Answer · 2020-10-21T18:01:32+00:00

The first next_sibling is a NavigableString (typically the whitespace between the two tags), not a Tag. Chain next_sibling a second time to reach the following p element.

print(properties_header.parent.parent.parent.parent.next_sibling.next_sibling)
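To see why the first next_sibling comes back as text, here is a minimal sketch on a made-up fragment. The HTML string and the assumption that the header sits inside a table are hypothetical stand-ins, not the actual structure of the filing. It also shows find_parent() and find_next_sibling(), which may be less brittle than chaining .parent and .next_sibling a fixed number of times.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical miniature of the filing's layout; the real markup is far larger
# and may nest the header differently.
html = """<div>
<table><tr><td><p>PROPERTIES</p></td></tr></table>
<p>We are headquartered in Palo Alto, California.</p>
</div>"""

# html.parser is used here only so the sketch has no extra dependency.
soup = BeautifulSoup(html, "html.parser")
header = soup.find("p", string="PROPERTIES")

# Climb to the enclosing table by name rather than by counting .parent hops.
table = header.find_parent("table")

# .next_sibling returns the very next node, which here is the newline
# between </table> and <p>, i.e. a NavigableString rather than a Tag.
first = table.next_sibling
print(type(first).__name__, repr(first))

# find_next_sibling() skips text nodes and returns the next matching Tag.
paragraph = table.find_next_sibling("p")
print(paragraph.get_text())
```

If the real page puts the header in some other container, the name passed to find_parent() would need to change accordingly; the sibling behavior is the same either way.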
