PDF to Image using PDFBox 1.8.9 text overlapped

I am trying to convert a PDF to images using PDFBox 1.8.9, but the text in the output overlaps. I know this problem does not occur in the PDFBox 2.0 SNAPSHOT, but that cannot be used in production until it is officially released.

Below is the code:

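// Needs: java.awt.image.BufferedImage, java.io.File, java.util.List, javax.imageio.ImageIO,
//        org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.pdmodel.PDPage
PDDocument pdDocument = PDDocument.load(new File("test.pdf"));
try {
    List<PDPage> pages = pdDocument.getDocumentCatalog().getAllPages();
    int pageCounter = 1;
    for (PDPage page : pages) {
        // Render the page with PDFBox's default image type and resolution
        BufferedImage bufferedImage = page.convertToImage();
        File imageFile = new File(String.format("/tmp/pdf-image-%s.jpg", pageCounter));
        ImageIO.write(bufferedImage, "jpg", imageFile);
        pageCounter++;
    }
} finally {
    // Release the file handle and buffers held by the document
    pdDocument.close();
}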

In the resulting image the font looks similar to the one in the PDF, but the characters overlap.

When I convert the same PDF from the command line using pdfbox-app-1.8.9.jar, the image is generated with a different font. Does PDFBox have an option to specify a custom font from a .ttf file? If so, how do I specify it?

Actual PDF:

enter image description here

Converted Image using above code:

enter image description here

Converted Image using commandline:

enter image description here

There is 1 answer

jaghan On BEST ANSWER

PDFBox substitutes another font for fonts such as Helvetica and Times New Roman when they are not installed on the server. After I installed those fonts on my Linux machine, the problem was solved.
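To confirm whether the fonts are visible to the JVM before and after installing them, a small check like the one below may help. It is only a sketch: it assumes that the fonts PDFBox 1.8.x can find are the ones the JVM reports through AWT, and the class name `FontCheck`, the helper `missingFamilies`, and the two family names are illustrative, not part of PDFBox.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

class FontCheck {

    // Returns the required family names that do not appear in the installed
    // list. The comparison ignores case.
    static List<String> missingFamilies(Collection<String> required,
                                        Collection<String> installed) {
        List<String> installedLower = new ArrayList<String>();
        for (String name : installed) {
            installedLower.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<String>();
        for (String name : required) {
            if (!installedLower.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: these are the families the PDF references; replace
        // them with the font names used by your own documents.
        List<String> required = Arrays.asList("Helvetica", "Times New Roman");

        // Font families the JVM can see on this machine.
        String[] installed = GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();

        System.out.println("Missing font families: "
                + missingFamilies(required, Arrays.asList(installed)));
    }
}
```

Run it on the server that does the conversion. If a family is still listed as missing after installation, the JVM may need a restart (and on Linux the font cache may need refreshing) before it picks up the new fonts.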