regex findall in beautifulsoup -python 3

Question

Asked by reuben on 05 January 2017 at 15:53 · 278 views

I need to get the name, the value, and the contextref for every ix:nonfraction tag. Each tag looks like this:

<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>

The output I need is:

TangibleFixedAssets, FY1.END, 238,011

The string that the regex has to search contains many of these tags. Is there a way to keep the three outputs for each tag together (concatenated, or at the same index of a list)?

There is 1 answer

**宏杰李** · Accepted Answer · 2017-01-05T15:59:18+00:00

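import bs4

html = '''<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'''

# The 'lxml' parser requires the lxml package to be installed.
soup = bs4.BeautifulSoup(html, 'lxml')

# find_all returns every matching tag, so this also works when the string contains many of them.
ixs = soup.find_all('ix:nonfraction')
for ix in ixs:
    name = ix['name'].split(':')[-1]   # drop the namespace prefix, e.g. "uk-gaap:"
    contextref = ix['contextref']
    text = ix.text                     # the tag's value, e.g. "238,011"
    output = [name, contextref, text]  # keep the three fields together in one list
    print(output)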

out:

['TangibleFixedAssets', 'FY1.END', '238,011']
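
If you would rather stay with `re.findall`, as the question title asks, the following is a minimal sketch and not part of the original answer. With three capture groups, `re.findall` returns one tuple per match, so the name, contextref, and value of each tag stay at the same index of the list. It assumes that `name` appears before `contextref` in the opening tag and that the tag contains only text; the second sample tag (`CurrentAssets`, `12,345`) is made up for illustration. A parser such as BeautifulSoup is likely to be more robust if the attribute order varies.

```python
import re

# First tag is the one from the question; the second is a made-up example.
html = (
    '<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" '
    'unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" '
    'decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'
    '<ix:nonfraction name="uk-gaap:CurrentAssets" contextref="FY1.END" '
    'unitref="GBP" decimals="0" format="ixt:numcommadot">12,345</ix:nonfraction>'
)

# Assumption: name comes before contextref, and the tag body is plain text.
pattern = re.compile(
    r'<ix:nonfraction\b[^>]*?\bname="(?:[^":]*:)?([^"]*)"'  # name without its prefix
    r'[^>]*?\bcontextref="([^"]*)"[^>]*>'                   # contextref value
    r'([^<]*)</ix:nonfraction>',                            # text inside the tag
    re.IGNORECASE,
)

# One (name, contextref, value) tuple per tag.
rows = pattern.findall(html)
for name, contextref, value in rows:
    print(name, contextref, value, sep=', ')
```

Each element of `rows` can then be indexed or unpacked as a unit, for example `rows[0][2]` for the value of the first tag.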
