Using BeautifulSoup and Python regex to search HTML for a string and add some tags

I am using BeautifulSoup to find a user-entered word on a specific page and highlight every occurrence of it. For example, I want to highlight every occurrence of the word 'Finance' on the page 'https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ'.

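#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page and parse it with an explicitly named parser.
html = urlopen('https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ').read()
soup = BeautifulSoup(html, 'html.parser')

# A plain string matches only text nodes whose entire content is 'Finance'.
matches = soup.body.find_all(string='Finance')
for match in matches:
    # Wrap each matching text node in a highlighted <span>.
    match.wrap(soup.new_tag('span', style="background-color:#FE00FE"))
print(soup)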

There is 1 answer

user2546252 On

I found this regex variant for word highlighting, but the resulting document contains broken JavaScript.
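
Before the full script, a small standalone check may help show what the regex matches on its own. This is only a sketch using Python 3's re module; the sample strings and the names PATTERN, REPLACEMENT and mark are made up for illustration, while the pattern and replacement strings are copied from the script below.

```python
import re

# Same pattern and replacement as in the script; the names are illustrative.
PATTERN = re.compile(r'(\w*)inance\b')
REPLACEMENT = r'<span style="background-color:#FF00FF">\1inance</span>'


def mark(text):
    """Wrap every word ending in 'inance' in a highlighted <span>."""
    return PATTERN.sub(REPLACEMENT, text)


# Hypothetical sample strings, not taken from the real page.
for sample in ["Google Finance", "refinance a loan", "financed by"]:
    print(mark(sample))
```

The pattern is broader than the single word 'Finance': it also matches 'finance' and 'refinance', because (\w*) accepts any leading word characters. It leaves 'financed' alone, since \b requires the word to end right after 'inance'.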

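import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ').read()
soup = BeautifulSoup(html, 'html.parser')

# find_all(string=True) returns every text node under <body>.
for text in soup.body.find_all(string=True):
    if re.search(r'inance\b', text):
        # Wrap each word ending in "inance" in a highlighted <span>.
        new_html = "<p>" + re.sub(r'(\w*)inance\b', r'<span style="background-color:#FF00FF">\1inance</span>', text) + "</p>"
        new_soup = BeautifulSoup(new_html, 'html.parser')
        # Replace the text node's parent element with the rebuilt <p>.
        text.parent.replace_with(new_soup.p)
print(soup)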
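
One likely cause of the broken JavaScript: the loop also visits the text inside script and style elements, and replacing the parent of such a node swaps the whole element for a p. A possible way around it is to skip those elements and replace only the text node itself, leaving its parent in place. The following is a minimal sketch under those assumptions, written for Python 3 with bs4; the function name highlight, the SKIP_PARENTS set and the sample markup are hypothetical, and it matches one whole word rather than using the regex above.

```python
import re
from bs4 import BeautifulSoup, Comment, NavigableString

# Elements whose text is not ordinary page text (a choice made for this sketch).
SKIP_PARENTS = {"script", "style", "textarea"}


def highlight(html, word, color="#FF00FF"):
    """Return html with each whole-word occurrence of word wrapped in a <span>."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(r"\b%s\b" % re.escape(word))
    root = soup.body or soup
    # find_all returns a list-like snapshot, so editing the tree in the loop is safe.
    for node in root.find_all(string=True):
        if isinstance(node, Comment) or node.parent.name in SKIP_PARENTS:
            continue
        text = str(node)
        pieces = []
        pos = 0
        for m in pattern.finditer(text):
            if m.start() > pos:
                # Plain text between the previous match and this one.
                pieces.append(NavigableString(text[pos:m.start()]))
            span = soup.new_tag("span", style="background-color:%s" % color)
            span.string = m.group(0)
            pieces.append(span)
            pos = m.end()
        if not pieces:
            continue  # no match in this text node
        if pos < len(text):
            pieces.append(NavigableString(text[pos:]))
        # Put the new pieces in front of the old text node, then drop the node.
        for piece in pieces:
            node.insert_before(piece)
        node.extract()
    return str(soup)


# Hypothetical sample markup, not the real page.
sample = '<p>Google Finance help</p><script>var s = "Finance";</script>'
print(highlight(sample, "Finance"))
```

Building the span with new_tag instead of concatenating HTML strings also avoids re-parsing page text as markup, so characters such as < or & in the text stay as text.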