Python Wand converts from PDF to JPG background is incorrect


I ran into something weird while converting a PDF to JPEG, and I'd like to figure out whether it is a small bug. In the converted JPG below, the background is entirely black. The image is here: www.shdowin.com/public/02.jpg

However, in the source PDF the background is the normal white. The image is here: www.shdowin.com/public/normal.jpg

I thought this might be my PDF file's fault; however, when I use Acrobat.pdf2image in a .NET environment, the converted JPG is displayed correctly.

Here is my code:

from wand.image import Image
from wand.color import Color
import os, os.path, sys

def pdf2jpg(source_file, target_file, dest_width, dest_height):
    RESOLUTION    = 300
    ret = True
    try:
        with Image(filename=source_file, resolution=(RESOLUTION,RESOLUTION)) as img:
            img.background_color = Color('white')
            img_width = img.width
            ratio     = float(dest_width) / img_width  # true division, also on Python 2
            img.resize(dest_width, int(ratio * img.height))
            img.format = 'jpeg'
            img.save(filename = target_file)
    except Exception as e:
        ret = False

    return ret

if __name__ == "__main__":
    source_file = "./02.pdf"
    target_file = "./02.jpg"

    ret = pdf2jpg(source_file, target_file, 1895, 1080)

Any suggestions for the issue?

I have uploaded the pdf to the url: 02.pdf

You can try...


There are 3 answers

0
cendy On

I found the answer myself. It is caused by the alpha channel. This PDF has a transparent background (I could see it after converting to PNG format), and when resizing, ImageMagick chooses the best resize filter, so a black background is displayed.

So, after a lot of experiments, I found that just adding "img.alpha_channel=False" inside the "with" statement (before img.save()) makes it work properly.

Thanks to VadimR for the advice; it was helpful.

0
hynekcer On

An easy solution is to change the order of the commands: change the format to jpeg first and then resize.

        img.format = 'jpeg'
        img.resize(dest_width, int(ratio * img.height))

It is also very easy to open the PDF at the exact target size by choosing the resolution tuple accordingly, because the resolution can be a float.
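To make the resolution idea concrete, here is a small sketch of the arithmetic. It relies on PDF page sizes being measured in points (72 per inch); the function name resolution_for_width and the A4 page width are my own assumptions for illustration, so check the real page size of your PDF. The rasterizer may still round the result by a pixel.

```python
POINTS_PER_INCH = 72.0  # PDF page dimensions are expressed in points


def resolution_for_width(dest_width_px, page_width_pt):
    """Return the DPI at which a page page_width_pt wide renders dest_width_px wide."""
    if page_width_pt <= 0:
        raise ValueError("page width must be positive")
    return dest_width_px * POINTS_PER_INCH / page_width_pt


# Assumption: an A4 portrait page (about 595.276 pt wide), not measured from 02.pdf.
A4_WIDTH_PT = 595.276
dpi = resolution_for_width(1895, A4_WIDTH_PT)
print(dpi)

# The value could then be used when opening the file, e.g.
#   Image(filename=source_file, resolution=(dpi, dpi))
```

With the page rendered at the target width directly, the separate resize step should no longer be needed.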

0
Martin On

For others who still have this problem I fixed it after googling and trying a couple of hours thanks to this question https://stackoverflow.com/a/40494320/2686243 by using this two lines:

img.background_color = Color("white")
img.alpha_channel = 'remove'
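Conceptually, these two lines flatten the page onto white: as far as I understand ImageMagick's 'remove' mode, it composites each pixel over the background color and then drops the alpha channel. Below is a pure-Python sketch of that per-pixel blend. The names drop_alpha and flatten_over are made up for illustration and are not Wand functions, and the sample pixel assumes that the transparent background stores black as its color.

```python
def drop_alpha(pixel):
    """Discard the alpha value and keep the stored RGB as-is."""
    r, g, b, _alpha = pixel
    return (r, g, b)


def flatten_over(pixel, background=(255, 255, 255)):
    """Composite an RGBA pixel (0-255 channels) over an opaque background."""
    r, g, b, alpha = pixel
    a = alpha / 255.0
    return tuple(
        int(round(c * a + bg * (1.0 - a)))
        for c, bg in zip((r, g, b), background)
    )


# Assumption: a fully transparent pixel whose stored color is black.
transparent = (0, 0, 0, 0)
opaque_text = (0, 0, 0, 255)

print(drop_alpha(transparent))    # (0, 0, 0): black once the alpha is ignored
print(flatten_over(transparent))  # (255, 255, 255): white page background
print(flatten_over(opaque_text))  # (0, 0, 0): opaque black text is unchanged
```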

Tried with Wand version 0.4.4
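Put into the function from the question, the fix might look roughly like this. It is only a sketch: it assumes Wand plus ImageMagick and Ghostscript are installed, I have not run it against the original 02.pdf, and the helper name scaled_height is mine.

```python
def scaled_height(dest_width, width, height):
    """Height that keeps the aspect ratio when scaling to dest_width."""
    return max(1, int(dest_width * height / float(width)))


def pdf2jpg(source_file, target_file, dest_width, resolution=300):
    # Imported here so scaled_height() can be used without Wand installed.
    from wand.color import Color
    from wand.image import Image

    with Image(filename=source_file, resolution=(resolution, resolution)) as img:
        # Flatten the transparent areas onto white before resizing and saving.
        img.background_color = Color('white')
        img.alpha_channel = 'remove'
        img.resize(dest_width, scaled_height(dest_width, img.width, img.height))
        img.format = 'jpeg'
        img.save(filename=target_file)


# Usage with the file names from the question:
#   pdf2jpg("./02.pdf", "./02.jpg", 1895)
```

Errors are left to propagate here instead of being swallowed by a bare except, which makes it easier to see why a conversion failed.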