Using Google Apps Script, I would like to decode HTML, so that e.g.:
Some text &amp; text <br/> &cent;
is stored as:
Some text & text
¢
This is similar to the question "How to decode HTML entities".
I am posting it as a new question because the answer there does not work with HTML entity names, and because the supported GAS service has changed since then.
I use:
var str = 'Some text &amp; text <br/> &cent;';
var xml = XmlService.parse('<d>' + str + '</d>');
var strDecoded = xml.getRootElement().getText();
Logger.log(strDecoded);
The GAS error message when parsing:
TypeError: The entity "cent" was referenced, but not declared.
I am using &cent; as an example; I tested several other HTML entity names, all with the same result.
When I use the decimal character reference instead of the HTML entity name (in this case &#162; instead of &cent;), it works fine. The old GAS services behave the same way.
Any solution that can parse the above HTML in GAS is appreciated.
It appears to be a known issue: https://code.google.com/p/google-apps-script-issues/issues/detail?id=3565
To avoid the error you can prepend the doctype to the string, but note that this will filter out the HTML entities:
Workarounds are still welcome. At the moment I manually convert some of the frequently used HTML entity names to the decimal equivalent before parsing.
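For illustration, the manual conversion could look roughly like the sketch below. The function name `namedToDecimalEntities` and the `NAMED_ENTITIES` table are my own hypothetical names, not part of any GAS service, and the table lists only a handful of entities, so it would need to be extended for real input.

```javascript
// Hypothetical lookup table (an assumption for this sketch, not a GAS API).
// Maps HTML entity names to their decimal code points; extend as needed.
var NAMED_ENTITIES = {
  nbsp: 160,
  cent: 162,
  pound: 163,
  yen: 165,
  copy: 169,
  reg: 174,
  euro: 8364
};

// Replace known named entities (e.g. &cent;) with decimal references (e.g. &#162;).
// Names that are not in the table, including the ones XML already predefines
// (amp, lt, gt, quot, apos), are left untouched.
function namedToDecimalEntities(str) {
  return str.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
      ? '&#' + NAMED_ENTITIES[name] + ';'
      : match;
  });
}
```

The converted string can then be passed to the parser as before, e.g. `XmlService.parse('<d>' + namedToDecimalEntities(str) + '</d>')`. Any entity name missing from the table will still trigger the "referenced, but not declared" error.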