Handle <nobr> tag in Python sgmllib

I'm trying to parse a page with my Python script, but the <nobr> tag together with a bare '&' is giving me trouble. Here is the actual HTML:

<A HREF="http://enpass.in/algo/c12.html" CLASS="style"> <NOBR>Simulation for 1st & 2nd path</NOBR></A>

My parser (built on sgmllib) does not handle the text inside this tag the way I expect. Here is its handle_data method:

def handle_data(self, data):
    # sgmllib calls this for each piece of character data it reports
    self.datainfo.append(data)

I expect the datainfo list to contain only one element, namely "Simulation for 1st & 2nd path".

However, when I print datainfo, it actually contains 7 elements:

datainfo -> ['', '', 'Simulation for 1st', '&', '2nd path', '', '']
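sgmllib only exists in Python 2 (it was removed in Python 3), so a minimal reproduction on Python 3 has to use a stand-in. The sketch below assumes the standard library's html.parser.HTMLParser with convert_charrefs=False, which, like the parser in the question, reports text in separate handle_data calls split at '&' and at tag boundaries. The class name ChunkRecorder is hypothetical; the point is only that one run of text can arrive as several handle_data calls.

```python
from html.parser import HTMLParser


class ChunkRecorder(HTMLParser):
    """Records every handle_data call as its own list element."""

    def __init__(self):
        # convert_charrefs=False stops HTMLParser from buffering text up to
        # the next tag, so text is reported in pieces split at '&' and '<'
        super().__init__(convert_charrefs=False)
        self.datainfo = []

    def handle_data(self, data):
        # Same logic as the question: one list element per call
        self.datainfo.append(data)


html = ('<A HREF="http://enpass.in/algo/c12.html" CLASS="style"> '
        '<NOBR>Simulation for 1st & 2nd path</NOBR></A>')
parser = ChunkRecorder()
parser.feed(html)
parser.close()
print(parser.datainfo)  # several pieces, split around the '&'
```

Joining the pieces gives back the full text, which suggests the data is not lost, just delivered in chunks.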

What's happening?

Answer by Bjorn

You need to encode the ampersand as &amp; for the markup to be valid HTML.
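Escaping the ampersand makes the markup valid, but a parser may still deliver one run of text through several handle_data calls (sgmllib, for example, routes entity references through handle_entityref, which in turn calls handle_data with the replacement text). A common way to get one string per element is to buffer the pieces between the start and end tag and join them when the element closes. The sketch below assumes Python 3's html.parser.HTMLParser in place of sgmllib; the class NobrTextParser and the helper nobr_texts are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class NobrTextParser(HTMLParser):
    """Collects the complete text of each <nobr> element as one string."""

    def __init__(self):
        super().__init__()      # convert_charrefs=True by default: &amp; -> &
        self.datainfo = []      # one joined string per <nobr> element
        self._chunks = None     # text pieces; None means "not inside <nobr>"

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <NOBR> arrives as "nobr"
        if tag == "nobr":
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "nobr" and self._chunks is not None:
            self.datainfo.append("".join(self._chunks))
            self._chunks = None

    def handle_data(self, data):
        # May run several times per text run; only buffer the pieces here
        if self._chunks is not None:
            self._chunks.append(data)


def nobr_texts(html):
    """Hypothetical helper: parse html and return the text of each <nobr>."""
    parser = NobrTextParser()
    parser.feed(html)
    parser.close()
    return parser.datainfo


raw = ('<A HREF="http://enpass.in/algo/c12.html" CLASS="style"> '
       '<NOBR>Simulation for 1st & 2nd path</NOBR></A>')
print(nobr_texts(raw))                            # ['Simulation for 1st & 2nd path']
print(nobr_texts(raw.replace(" & ", " &amp; ")))  # ['Simulation for 1st & 2nd path']
```

Because the text is joined only at the end tag, the result does not depend on how many handle_data calls the parser happens to make, and both the bare '&' and the escaped &amp; version yield a single element.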