HtmlCleaner not processing character references


A project I'm working on uses a very old version (2.1-gr12) of HtmlCleaner to extract information from HTML files. We found that HtmlCleaner does not decode character references such as &amp;, either in element text content or in attribute values. For example, given <span test="foo&amp;bar">stuff &amp; more stuff</span>, the value of the test attribute comes back as foo&amp;bar and the text content comes back as stuff &amp; more stuff, where we expected foo&bar and stuff & more stuff.
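To make the expected decoding concrete, here is a minimal sketch of what we would otherwise have to apply ourselves to the strings HtmlCleaner returns. It uses only the Java standard library and is not part of HtmlCleaner; the class name CharRefDecoder and the method decode are hypothetical names chosen for illustration. It assumes that only the five predefined named references plus numeric references matter, which is far short of the full HTML entity table.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an HtmlCleaner API: decodes character references
// in a string that was returned still encoded.
public class CharRefDecoder {
    // Assumption: only these five named references are needed.
    private static final Map<String, String> NAMED = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches &name;, decimal &#38; and hexadecimal &#x26; forms.
    private static final Pattern REF =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    public static String decode(String input) {
        Matcher m = REF.matcher(input);
        StringBuilder out = new StringBuilder();
        // Single left-to-right pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10, m.group());
            } else {
                // Unknown names are left exactly as they were.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts digits in the given radix to a string, or returns the
    // original reference text if the value is not a valid code point.
    private static String fromCodePoint(String digits, int radix, String fallback) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp)
                ? new String(Character.toChars(cp))
                : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("foo&amp;bar"));            // prints "foo&bar"
        System.out.println(decode("stuff &amp; more stuff")); // prints "stuff & more stuff"
    }
}
```

We would rather not maintain something like this ourselves if the library can do the decoding.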

Does HtmlCleaner not support character references at all? Was support added in a later version, or is there a setting we need to enable?

There are 0 answers