Python Convert string to HTML char equivalent


Let's say we have a string

Bruce Wayne is Batman

When I convert this string to HTML characters, the output will be

&#66;&#114;&#117;&#99;&#101;&#32;&#87;&#97;&#121;&#110;&#101;&#32;&#105;&#115;&#32;&#66;&#97;&#116;&#109;&#97;&#110;

I am trying to find a way to do this in Python 2.7. Can anybody suggest how it can be done?

I have searched all over Stack Overflow, and all the answers I have found explain how to escape HTML special characters. I am not looking to escape special characters; I want to convert every character of a string into its HTML character equivalent. The HackBar add-on for Mozilla Firefox does this successfully, and I want to implement the same thing in Python.

The HTMLParser library can decode such a string using its unescape() method. Is there a library in Python that encodes a string the way shown above? I am not looking for external libraries like BeautifulSoup, but for a built-in one, so that the tool gains no extra dependencies.


There is 1 answer

Zero Piraeus On BEST ANSWER

To the best of my knowledge there's nothing in the standard library to do this (encoding every character as a numeric character reference is not a common thing to need to do), but a function to do the conversion is straightforward:

def entitify(text):
    return ''.join('&#%d;' % ord(c) for c in text)

>>> entitify('Bruce Wayne is Batman')
'&#66;&#114;&#117;&#99;&#101;&#32;&#87;&#97;&#121;&#110;&#101;&#32;&#105;&#115;&#32;&#66;&#97;&#116;&#109;&#97;&#110;'

>>> entitify(u'Rinôçérôse')
'&#82;&#105;&#110;&#244;&#231;&#233;&#114;&#244;&#115;&#101;'

The function uses the ord() builtin to get the code point of each character (or the byte value, if the input is a Python 2 byte string), wraps it in &#...;, and joins the results together.
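As a quick sanity check, the encoding can be round-tripped through the standard library's decoder. This is only a sketch: the import fallback is an assumption made so the same snippet runs on both Python 2.7 (HTMLParser) and Python 3.4+ (html.unescape), and the names encoded and decoded are illustrative.

```python
# -*- coding: utf-8 -*-
# Pick whichever standard-library decoder is available.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7
    unescape = HTMLParser().unescape


def entitify(text):
    # Encode every character as a decimal numeric character reference.
    return ''.join('&#%d;' % ord(c) for c in text)


original = u'Bruce Wayne is Batman'
encoded = entitify(original)
decoded = unescape(encoded)

print(encoded)
print(decoded == original)  # the round trip should give back the input
```

On Python 2.7, pass a unicode string rather than a byte string when the text contains non-ASCII characters. Otherwise ord() sees the individual bytes of the encoded text, and the references no longer correspond to the intended characters.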