PDF generation from HTML with multilingual text using Flying Saucer + iText: only the Chinese text is rendered

I am trying to convert an HTML page to PDF using iText and Flying Saucer. The HTML page is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml"><head>
 <title>中文測試</title>
 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 <style type="text/css">
     name
     {
         font-family: "Arial Unicode MS";
         color: blue;
         font-size: 48;
     }
 </style>
</head>
<body>  
  <name>名偵探小怪獸</name>
     <h1>भारतीय जनता पार्टी ने फिर कहा है कि बहुमत न होने के कारण वो दिल्ली में सरकार बनाने की
         इच्छुक नहीं है और दोबारा चुनाव के लिए तैयार है.
    </h1>
 <h1>Japanese 日本国</h1>
</body>
</html>

The Java code is:

import java.io.*;
import org.xhtmlrenderer.pdf.*;
import com.lowagie.text.pdf.*;

public class ChineseToPdf {
    public static void main(String[] args) {
        String inputFile = "chinese.html";
        String outputFile = "test.pdf";
        // try-with-resources closes the stream even if rendering fails
        try (OutputStream os = new FileOutputStream(outputFile)) {
            String url = new File(inputFile).toURI().toURL().toString();
            ITextRenderer renderer = new ITextRenderer();
            // Register Arial Unicode MS so the CSS can refer to it by family name
            ITextFontResolver resolver = renderer.getFontResolver();
            resolver.addFont("C:/Windows/Fonts/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
            renderer.setDocument(url);
            renderer.layout();
            renderer.createPDF(os);
        } catch (Exception e) {
            // Print the full stack trace rather than only the message
            e.printStackTrace();
        }
    }
}

In the output, only the Chinese text is rendered properly; the Hindi and Japanese text comes out as white space.

Please help me out.

There are 2 answers

obourgain (best answer)

The style you defined applies only to the name element, and the Hindi and Japanese text is outside that element. That text is rendered with the default font, which does not support all Unicode characters.

To fix this, change your style so that "Arial Unicode MS" is used for the whole document:

body{font-family: "Arial Unicode MS";}
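
A quick way to see which scripts the page needs a font for is to list the Unicode scripts in its text. The sketch below relies only on the JDK's Character.UnicodeScript; the names ScriptInventory and scriptsIn are illustrative and not part of Flying Saucer or iText. The sample strings are excerpts from the question's HTML, written as Unicode escapes so the file compiles under any source encoding.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative helper (hypothetical names, not a Flying Saucer or iText API):
// lists the Unicode scripts used in a string, so you can see what the
// font selected by your CSS has to cover.
final class ScriptInventory {

    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // COMMON and INHERITED cover spaces, punctuation, and marks shared
            // between scripts, so they say nothing about font coverage.
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // Text of the <name> element in the question
        System.out.println(scriptsIn("\u540d\u5075\u63a2\u5c0f\u602a\u7378")); // [HAN]
        // First word of the Hindi <h1>
        System.out.println(scriptsIn("\u092d\u093e\u0930\u0924\u0940\u092f")); // [DEVANAGARI]
        // Text of the second <h1>: "Japanese" followed by three kanji
        System.out.println(scriptsIn("Japanese \u65e5\u672c\u56fd")); // [LATIN, HAN]
    }
}
```

Every script reported for an element has to be covered by the font that the matching CSS rule selects, which is why the rule on body matters for the two h1 elements.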

filamoon

The accepted answer did work, but there is one more thing to point out:

The font-family value should start with "Arial Unicode MS". If it starts with a font that does not support CJK, the output PDF will still not display those characters.
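
If the font-family value is assembled in code, the ordering rule can be enforced before the stylesheet is written. This is a minimal sketch that assumes, as described above, that the first listed family is the one that decides whether the characters appear. FontFamilyOrder and withPreferredFirst are hypothetical names used only for illustration, not a Flying Saucer or iText API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical names): rewrites a CSS font-family list
// so that the Unicode-capable font comes first.
final class FontFamilyOrder {

    // Returns a font-family value with `preferred` first, followed by the
    // remaining families in their original order.
    static String withPreferredFirst(String families, String preferred) {
        List<String> ordered = new ArrayList<>();
        ordered.add(quoteIfNeeded(preferred));
        // Naive split: assumes no family name contains a comma.
        for (String part : families.split(",")) {
            String name = stripQuotes(part.trim());
            if (!name.isEmpty() && !name.equalsIgnoreCase(preferred)) {
                ordered.add(quoteIfNeeded(name));
            }
        }
        return String.join(", ", ordered);
    }

    // Removes one pair of matching single or double quotes, if present.
    private static String stripQuotes(String s) {
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return s.substring(1, s.length() - 1).trim();
            }
        }
        return s;
    }

    // Family names containing spaces are quoted; keywords such as
    // sans-serif are left bare.
    private static String quoteIfNeeded(String name) {
        return name.contains(" ") ? "\"" + name + "\"" : name;
    }

    public static void main(String[] args) {
        System.out.println(withPreferredFirst(
                "Helvetica, \"Arial Unicode MS\", sans-serif", "Arial Unicode MS"));
        // prints: "Arial Unicode MS", Helvetica, sans-serif
    }
}
```

The result can be dropped into a rule such as body { font-family: ...; }. The family name must match the one under which the font file was registered with the font resolver.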