JTidy issue with numbered list item


I am facing a weird problem with numbered list items while generating a PDF with iText. The item numbers do not increase by one when a <br/> tag is placed between the list items. Consider the following example:

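import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;
import org.xhtmlrenderer.pdf.ITextRenderer;

String withoutBrTag = "<html><head></head><body><p>this is order list</p>" +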
                "        <br>" +
                "        <ol>" +
                "          <li>lafjad</li>" +
                "          <li>alsdfkjla </li>" +
                "          <li>asdflkjadslfkj</li>" +
                "        </ol>" +
                "        <br>" +
                "        <p>list item ended</p>&nbsp;"+
                "</body></html>";

String withBrTag = "<html><head></head><body><p>this is order list</p>" +
                "        <br>" +
                "        <ol>" +
                "          <br>" +
                "          <li>lafjad</li>" +
                "          <br>" +
                "          <li>alsdfkjla </li>" +
                "          <br>" +
                "          <li>asdflkjadslfkj</li>" +
                "          <br>" +
                "        </ol>" +
                "        <br>" +
                "        <p>list item ended</p>&nbsp;"+
                "</body></html>";

Tidy tidy = new Tidy();
tidy.setXmlOut(true);
tidy.setQuiet(true);
tidy.setShowWarnings(false);

// Parse (and repair) the HTML into a DOM; the input stream is closed automatically.
Document doc;
try (InputStream inputStream = new ByteArrayInputStream(withBrTag.getBytes())) {
    doc = tidy.parseDOM(inputStream, null);
}

// Render the DOM to PDF; closing the stream ensures the file is fully written.
try (OutputStream outputStream = new FileOutputStream("test1.pdf")) {
    ITextRenderer renderer = new ITextRenderer();
    renderer.setDocument(doc, null);
    renderer.layout();
    renderer.createPDF(outputStream);
}

For the string withBrTag, the output is:

this is order list

2. list item one
4. list item two
6. list item three

list item ended

Note the numbering: 2, 4, 6! During tidy.parseDOM(inputStream, null), each <br/> tag is turned into an <li> node, so every stray tag consumes a number. It looks as if Tidy parses the HTML content incorrectly, and that causes the numbering issue.
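A minimal sketch of how the numbers 2, 4, 6 can arise, using only the JDK's own DOM parser rather than JTidy. It assumes that JTidy, like HTML Tidy, wraps each stray <br> inside an <ol> in an inferred <li style="list-style: none">: the style hides that item's marker, but the item presumably still advances the list counter. The XHTML string is a hand-written stand-in for Tidy's output, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ListNumberingDemo {

    /** Parses a well-formed XHTML string with the JDK parser (a stand-in for Tidy's output). */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xhtml, e);
        }
    }

    /**
     * Returns the number each <li> with visible text would get if the list counter
     * advances once per <li> child, whether or not that <li> contains text.
     */
    static List<Integer> numbersOfTextItems(Element ol) {
        List<Integer> numbers = new ArrayList<>();
        int counter = 0;
        for (Node child = ol.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && "li".equals(child.getNodeName())) {
                counter++; // an <li> holding only <br/> still consumes a number
                if (!child.getTextContent().trim().isEmpty()) {
                    numbers.add(counter);
                }
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Assumed shape of the repaired tree: each stray <br> wrapped in an inferred
        // <li style="list-style: none"> (marker hidden, item still counted).
        String stray = "<li style=\"list-style: none\"><br/></li>";
        String xhtml = "<html><body><ol>"
                + stray + "<li>lafjad</li>"
                + stray + "<li>alsdfkjla </li>"
                + stray + "<li>asdflkjadslfkj</li>"
                + stray
                + "</ol></body></html>";
        Element ol = (Element) parse(xhtml).getElementsByTagName("ol").item(0);
        System.out.println(numbersOfTextItems(ol)); // [2, 4, 6]
    }
}
```

Under that assumption the three real items land on 2, 4 and 6, which matches the PDF output above.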

If I use the string withoutBrTag instead, the generated output is as expected:

this is order list

1. list item one
2. list item two
3. list item three

list item ended

Can anyone explain why a <br/> tag is treated as an <li> element, and how this can be solved?

NOTE 1: The numbering changes not only for the <br/> tag but for any HTML tag, such as <p>, <i>, or <hr/>. In other words, any tag placed directly before or after an <li> tag affects the numbering.

NOTE 2: itextpdf-2.0.1 is used.
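One possible workaround, sketched under assumptions: after parsing and before rendering, remove every <li> child of an <ol> or <ul> that carries no text. This only addresses void tags such as <br/> and <hr/>; a stray <p> or <i> with text would end up inside an <li> that does have text, and an item containing only an image would also be dropped. The more robust fix is to keep each <br/> inside an <li>, because in HTML only <li> elements may be direct children of <ol> and <ul>. The class and method names below are hypothetical, the demo uses the JDK parser on a hand-written stand-in for Tidy's output, and applying it to JTidy's own DOM assumes that DOM supports removeChild.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class ListCleanup {

    /** Parses a well-formed XHTML string with the JDK parser (a stand-in for Tidy's output). */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xhtml, e);
        }
    }

    /** True if no text node below {@code node} contains anything but whitespace. */
    static boolean hasNoText(Node node) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                if (!c.getNodeValue().trim().isEmpty()) {
                    return false;
                }
            } else if (!hasNoText(c)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Walks the tree and removes every <li> child of an <ol> or <ul> that holds no
     * text (e.g. one that only wraps a <br/> or <hr/>). Returns how many were removed.
     */
    static int removeEmptyListItems(Node node) {
        String name = node.getNodeName();
        boolean isList = "ol".equalsIgnoreCase(name) || "ul".equalsIgnoreCase(name);
        int removed = 0;
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before the child is detached
            if (isList && "li".equalsIgnoreCase(child.getNodeName()) && hasNoText(child)) {
                node.removeChild(child);
                removed++;
            } else {
                removed += removeEmptyListItems(child);
            }
            child = next;
        }
        return removed;
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the tree Tidy is assumed to produce for withBrTag.
        Document doc = parse("<html><body><ol>"
                + "<li><br/></li><li>lafjad</li>"
                + "<li><br/></li><li>alsdfkjla </li>"
                + "<li><br/></li><li>asdflkjadslfkj</li>"
                + "<li><br/></li></ol></body></html>");
        int removed = removeEmptyListItems(doc);
        int left = doc.getElementsByTagName("li").getLength();
        System.out.println(removed + " removed, " + left + " left"); // 4 removed, 3 left
    }
}
```

In the question's code, the call would go right after the tidy.parseDOM(inputStream, null) line, i.e. ListCleanup.removeEmptyListItems(doc); before renderer.setDocument(doc, null). Note that removing the empty items also removes the vertical space the <br/> tags were meant to add.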
