Java HtmlCleaner: Does not handle extended ascii characters

Question

1.4k views · Asked by anahnarciso on 16 May 2012 at 16:38

I'm using HtmlCleaner to clean an HTML file that contains characters like '€' (decimal 128) and '™' (decimal 153), that is, characters from the extended ASCII table.

HtmlCleaner cannot handle those characters and replaces them with the character '?' (ASCII decimal 63).

Is there any flag I can set in HTMLCleaner in order to process those chars?

Thanks in advance.

EDIT: The variable "encoding" is "iso-8859-1", just like the source file encoding.

    try {
        System.out.print("Parsing and cleaning:" + fileStr);
        URL url = new File(this.fileStr).toURI().toURL();
        // create an instance of HtmlCleaner
        HtmlCleaner cleaner = new HtmlCleaner();
        // default properties
        CleanerProperties props = cleaner.getProperties();
        // do parsing
        TagNode tagNode = new HtmlCleaner(props).clean(url);
        // serialize to XML file
        new PrettyXmlSerializer(props).writeToFile(tagNode, fileStr,
                encoding);
        System.out.println("Output: " + fileStr);
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

I've just figured this out. The line:

    TagNode tagNode = new HtmlCleaner(props).clean(url);

should be replaced by:

    TagNode tagNode = new HtmlCleaner(props).clean(url, encoding);

where 'encoding' is the name of the charset used by the content at the source URL.
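One plausible reading of why passing the charset helps (an assumption, not something confirmed in this thread): without an explicit charset, the input may be decoded with a different default such as windows-1252, where bytes 128 and 153 are '€' and '™'. Neither character exists in ISO-8859-1, so writing the output as iso-8859-1 substitutes '?'. The sketch below reproduces that effect with the JDK alone. It does not use HtmlCleaner, and `CharsetRoundTrip` and `transcode` are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetRoundTrip {

    /** Decodes raw bytes with one charset, then re-encodes the text with another. */
    static byte[] transcode(byte[] input, Charset readAs, Charset writeAs) {
        String text = new String(input, readAs);
        // String.getBytes replaces characters the target charset cannot
        // represent with that charset's replacement byte ('?' for ISO-8859-1).
        return text.getBytes(writeAs);
    }

    public static void main(String[] args) {
        // The two bytes from the question: decimal 128 and 153.
        byte[] source = {(byte) 0x80, (byte) 0x99};

        Charset latin1 = StandardCharsets.ISO_8859_1;
        // Assumption: windows-1252 stands in for whatever default was used
        // when no charset was passed to clean(url).
        Charset cp1252 = Charset.forName("windows-1252");

        // Mismatch: read as windows-1252, write as ISO-8859-1.
        byte[] mismatched = transcode(source, cp1252, latin1);
        // Match: read and write with the same charset.
        byte[] matched = transcode(source, latin1, latin1);

        System.out.println("read as windows-1252: " + Arrays.toString(mismatched));
        System.out.println("read as ISO-8859-1:   " + Arrays.toString(matched));
    }
}
```

When the read and write charsets match, both bytes survive the round trip unchanged. When the input is read as windows-1252 and written as ISO-8859-1, both come out as byte 63, which is the '?' described in the question.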

Thank you!

Original Q&A

There is 1 answer

**Has QUIT--Anony-Mousse** · Accepted Answer · 2012-05-16T16:43:14+00:00

Did you try setting the charset?
