How to handle regex in BeautifulSoup / CSS selector?


I'm looking for a way to use a regex in BeautifulSoup to find elements containing the text HO #, allowing an optional space before the # and ignoring case. My current attempt uses a CSS selector:

check_ho_number3 = soup.select_one('td:-soup-contains("HO #")+ td')
print(check_ho_number3)

How can I integrate my existing regex into soup.select_one()?

check_regex = re.compile(r"HO\s?#", re.IGNORECASE)
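For illustration, here is a minimal sketch of why the selector above is not enough on its own. It assumes a hypothetical three-row table whose label cell is written in different cases and uses the built-in 'html.parser'. The :-soup-contains() pseudo-class (provided by soupsieve, which BeautifulSoup uses for select()) performs a literal, case-sensitive substring test, so only the exact text "HO #" matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the label cell is written three different ways.
html = (
    "<table>"
    "<tr><td>HO #</td><td>first value</td></tr>"
    "<tr><td>ho#</td><td>second value</td></tr>"
    "<tr><td>Ho #</td><td>third value</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

# :-soup-contains() is a literal, case-sensitive substring check,
# so only the row whose label contains exactly "HO #" is matched.
matches = soup.select('td:-soup-contains("HO #") + td')
print([td.get_text() for td in matches])  # ['first value']
```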

Best answer, by HedgeHog

CSS selectors only support CSS syntax and have no regex matching, but you can search tag contents with a regex through the string argument:

soup.find(string=re.compile(r"HO\s?#",re.IGNORECASE))
soup.find_all(string=re.compile(r"HO\s?#",re.IGNORECASE))

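find() and find_all() with string= return the matching text (a NavigableString), not the <td> itself, so to call findNextSibling('td') step back to the text's parent first.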
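As an alternative to the parent step, passing a tag name together with string= makes find_all() return the matching <td> tags themselves. A minimal sketch, assuming the same sample markup as the example in this answer and the built-in 'html.parser':

```python
import re
from bs4 import BeautifulSoup

pattern = re.compile(r"HO\s?#", re.IGNORECASE)

html = (
    "<table>"
    "<tr><td>HO #</td><td>I am the next sibling</td></tr>"
    "<tr><td>HO #</td></tr>"
    "<tr><td>ho#</td><td>I am the next sibling</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

# With a tag name plus string=, find_all() tests each <td>'s .string
# against the regex and returns the <td> Tag objects, so the next
# cell can be looked up directly without going through .parent.
results = []
for td in soup.find_all("td", string=pattern):
    next_td = td.find_next_sibling("td")
    results.append((td.get_text(), next_td.get_text() if next_td else None))

print(results)
```

Because this tests each tag's .string, a cell whose text is split across child tags (for example <td>HO <b>#</b></td>) has .string == None and will not match.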

Example
from bs4 import BeautifulSoup
import re

soup = BeautifulSoup('<table><tr><td>HO #</td><td>I am the next sibling</td></tr><tr><td>HO #</td></tr><tr><td>ho#</td><td>I am the next sibling</td></tr></table>', 'html.parser')

for e in soup.find_all(string=re.compile(r"HO\s?#",re.IGNORECASE)):
    print(e)
    print(e.parent.findNextSibling('td'))

Output

HO #
<td>I am the next sibling</td>
HO #
None
ho#
<td>I am the next sibling</td>
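If you want to keep a CSS selector in the pipeline, one option is to let select() narrow the candidates and apply the regex in Python afterwards. This is a sketch under assumptions: cells_after_label is a hypothetical helper name, and the markup is the same sample table as above, parsed with 'html.parser'.

```python
import re
from bs4 import BeautifulSoup

pattern = re.compile(r"HO\s?#", re.IGNORECASE)

html = (
    "<table>"
    "<tr><td>HO #</td><td>I am the next sibling</td></tr>"
    "<tr><td>HO #</td></tr>"
    "<tr><td>ho#</td><td>I am the next sibling</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")


def cells_after_label(soup, pattern):
    """Hypothetical helper: CSS picks the candidate <td> cells, the regex
    filters them in Python, and the following <td> in the same row is
    returned (None when the label cell has no following cell)."""
    return [
        td.find_next_sibling("td")
        for td in soup.select("td")
        if pattern.search(td.get_text())
    ]


results = cells_after_label(soup, pattern)
print([td.get_text() if td else None for td in results])
# ['I am the next sibling', None, 'I am the next sibling']

# Closest analogue of select_one('... + td'): the first label that
# actually has a following cell, or None if there is none.
first = next((td for td in results if td is not None), None)
```

Unlike string=, get_text() collects text from all descendants, so a label such as <td>HO <b>#</b></td> would still match here.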