Beautifulsoup - Extract text from next div sub tag based on previous div sub tag

Question

465 views Asked by Ven At 21 September 2018 at 15:24

I'm trying to extract the text from the span inside the next div, based on the text of the span inside the previous div. Below is the HTML content:

<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:37px; top:161px; width:38px; height:13px;"><span style="font-family: b'Times-Bold'; font-size:13px">Name
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:85px; top:161px; width:58px; height:13px;"><span style="font-family: b'Helvetica'; font-size:13px">Ven
    <br></span></div>

I'm trying to find the span by its text using:

n_field = soup.find('span', text="Name\n")

And then trying to get the text from next sibling using,

n_field.next_sibling()

However, due to the "\n" in the field, I'm unable to find the span and then extract the next sibling's text.

In short, I'm trying to build a dict in the following format:

{"Name": "Ven"}

Any help or idea on this is appreciated.

There are 2 answers

jnvilo On 21 September 2018 at 16:02

I had a go at this and, for some reason, could not get next_sibling to work even after removing the \n, so I tried a different tactic: collect all the spans and pair the first one's text with the second one's.

from bs4 import BeautifulSoup

# Let's get rid of the \n
html = """<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:37px; top:161px; width:38px; height:13px;"><span style="font-family: b'Times-Bold'; font-size:13px">Name<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:85px; top:161px; width:58px; height:13px;"><span style="font-family: b'Helvetica'; font-size:13px">Ven<br></span></div>""".replace("\n", "")
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, "html.parser")
span_list = soup.find_all("span")
# Use the first span's text as the key and the second span's text as the value
result = {span_list[0].get_text(strip=True): span_list[1].get_text(strip=True)}

And that gives result as:

{'Name': 'Ven'}
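
A likely reason next_sibling does not help here is that the two spans are not siblings: each sits inside its own div, so it is the divs that are adjacent. In addition, each span holds a text node plus a <br>, so its .string is None and an exact text match on the span finds nothing. The following is a minimal sketch, assuming the label and its value always live in adjacent divs as in the question's HTML (shortened here). The helper name value_after_label is hypothetical and not part of bs4.

```python
from bs4 import BeautifulSoup

# Shortened version of the question's HTML, newlines kept on purpose
html = (
    '<div style="left:37px;"><span style="font-size:13px">Name\n<br></span></div>'
    '<div style="left:85px;"><span style="font-size:13px">Ven\n    <br></span></div>'
)


def value_after_label(soup, label):
    """Return the stripped text of the div that follows the div whose span reads `label`."""
    for span in soup.find_all("span"):
        # Compare on stripped text so the trailing "\n" does not matter
        if span.get_text(strip=True) == label:
            label_div = span.find_parent("div")
            if label_div is None:
                return None
            # find_next_sibling skips whitespace text nodes between the divs
            value_div = label_div.find_next_sibling("div")
            return value_div.get_text(strip=True) if value_div else None
    return None


soup = BeautifulSoup(html, "html.parser")
result = {"Name": value_after_label(soup, "Name")}
print(result)
```

Unlike indexing into the list of spans, this looks the label up by its text, so it should keep working if other spans appear before it.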

**dudko** · Accepted Answer · 2018-09-21T15:42:56+00:00

You could use re instead of bs4.

import re

html = """
    <div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:37px; top:161px; width:38px; height:13px;">
        <span style="font-family: b'Times-Bold'; font-size:13px">Name
            <br>
        </span>
    </div>
    <div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:85px; top:161px; width:58px; height:13px;">
        <span style="font-family: b'Helvetica'; font-size:13px">Ven
            <br>
        </span>
    </div>
    """

mo = re.search(r'(Name).*?<span.*?13px">(.*?)\n', html, re.DOTALL)
print(mo.groups())

# for consecutive cases use re.finditer or re.findall
html *= 5
mo = re.finditer(r'(Name).*?<span.*?13px">(.*?)\n', html, re.DOTALL)

for match in mo:
    print(match.groups())

for (key, value) in re.findall(r'(Name).*?<span.*?13px">(.*?)\n', html, re.DOTALL):
    print(key, value)
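
Since the question asks for a dict, the (key, value) pairs from re.findall can be passed through a dict comprehension. This is a minimal sketch that reuses the same pattern on a shortened copy of the question's HTML. It assumes the value span carries the 13px font size and that the value is followed by a newline, as in the original markup; a regex over HTML is fragile if that layout changes.

```python
import re

# Shortened copy of the question's HTML; the newlines after "Name" and "Ven" matter
html = """
<div style="left:37px;"><span style="font-size:13px">Name
<br></span></div><div style="left:85px;"><span style="font-size:13px">Ven
    <br></span></div>
"""

# Same pattern as in the answer: label, then the next span's text up to the newline
pattern = r'(Name).*?<span.*?13px">(.*?)\n'

# Strip each value in case of stray whitespace around it
result = {key: value.strip() for key, value in re.findall(pattern, html, re.DOTALL)}
print(result)
```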
