How to convert a PDF to an image with PyMuPDF while retaining the original document shape?

I am trying to convert a PDF to an image using PyMuPDF. The conversion works, but it changes the size of the output image. I want the image to keep the same shape as the input PDF.

import os

import fitz  # PyMuPDF (newer releases can also be imported as "pymupdf")

def split_pdf_mu(src):
    doc = fitz.open(src)  # open document
    file_name = os.path.basename(src)
    dest = os.path.join(os.path.expanduser("~"), "tmp", "splitted")
    os.makedirs(dest, exist_ok=True)  # make sure the output folder exists
    split_paths = []
    for page_index, page in enumerate(doc):  # iterate through the pages

        # zoom = 300/72    # zoom factor
        # mat = fitz.Matrix(zoom, zoom)
        # pix = page.get_pixmap(matrix = mat)

        pix = page.get_pixmap()  # render page to an image (default: 72 dpi)

        dest_path = os.path.join(dest, f'{file_name}_page{page_index}.png')
        # pix.save(dest_path)
        pix.pil_save(dest_path, format="PNG", optimize=False)  # requires Pillow
        split_paths.append(dest_path)
    doc.close()
    return split_paths

I have tried using a zoom factor, but that doesn't seem to help. How can I convert a PDF and produce an image with the same dimensions as the original PDF document?

Answer by ASiD-0:

Using PyMuPDF, you have two options to control the dimensions of the output image:

1. Using the dpi (dots per inch) parameter:

    dpi = 90
    pix = page.get_pixmap(dpi=dpi)
    

Now let's look at what dpi actually means.

In general, PDFs have physical dimensions rather than a resolution (as images do). PDF page sizes are stored in points, where 1 point = 1/72 inch, so a page created at the standard A4 paper size is about 595 x 842 points, or about 8.3 x 11.7 inches.

To translate this into pixels you need a conversion unit, and that is what dpi means here. For example, if we render our A4 page at 90 dpi, the image will be about 8.3 * 90 x 11.7 * 90 pixels, i.e. 747 x 1053 pixels. Because the same scale is applied in both directions, the original aspect ratio is kept while the physical dimensions of the PDF are converted into image pixels.
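The arithmetic above can be sketched in plain Python. This is a minimal sketch, not PyMuPDF code: pixel_size_at_dpi is a hypothetical helper, it assumes the page size is given in points as page.rect reports it, and it rounds half up, whereas PyMuPDF's own rounding of a fractional pixel edge may differ by one pixel.

```python
import math

POINTS_PER_INCH = 72  # PDF unit: 1 point = 1/72 inch


def pixel_size_at_dpi(width_pt: float, height_pt: float, dpi: float) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi` (hypothetical helper).

    width_pt / height_pt are in PDF points, e.g. page.rect.width / page.rect.height.
    Rounds half up; PyMuPDF itself may round a fractional edge differently.
    """
    scale = dpi / POINTS_PER_INCH  # pixels per point
    return (math.floor(width_pt * scale + 0.5), math.floor(height_pt * scale + 0.5))


# A typical A4 page is about 595 x 842 points (roughly 8.3 x 11.7 inches).
print(pixel_size_at_dpi(595, 842, 72))  # (595, 842): one pixel per point
print(pixel_size_at_dpi(595, 842, 90))  # (744, 1053)
```

The width comes out at 744 rather than 747 only because the prose example rounds the A4 width to 8.3 inches; 595 points is closer to 8.26 inches.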

2. Using a zoom factor:

zoom = 2  # scale factor applied to the 72 dpi base resolution
magnify = fitz.Matrix(zoom, zoom)  # same factor horizontally and vertically
pix = page.get_pixmap(matrix=magnify)

With zoom, we still need to work out the image size from our input PDF. You will notice that it is the same conversion as before, except that the resolution is fixed at the default of 72 dpi and then multiplied by the zoom factor (so zoom = 2 is equivalent to dpi = 144). In our example the zoom is 2, so the output image will be

8.3 * 72 * 2 x 11.7 * 72 * 2

which results in an output image of about 1195 x 1685 pixels.
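Because both options scale from the same 72 dpi base, a zoom factor and a dpi value are interchangeable. A small sketch of the conversion; dpi_from_zoom and zoom_from_dpi are hypothetical names, and the resulting zoom is what you would pass to fitz.Matrix(zoom, zoom).

```python
POINTS_PER_INCH = 72  # PyMuPDF renders at 72 dpi when zoom == 1


def dpi_from_zoom(zoom: float) -> float:
    """dpi that gives the same output size as a uniform zoom factor."""
    return zoom * POINTS_PER_INCH


def zoom_from_dpi(dpi: float) -> float:
    """Uniform zoom factor that gives the same output size as `dpi`."""
    return dpi / POINTS_PER_INCH


# The worked example: an 8.3 x 11.7 inch page rendered with zoom = 2.
zoom = 2
width_px = 8.3 * dpi_from_zoom(zoom)
height_px = 11.7 * dpi_from_zoom(zoom)
print(dpi_from_zoom(zoom))                # 144
print(round(width_px), round(height_px))  # 1195 1685
```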

So if you need a specific output resolution, I would suggest the dpi approach, because you can adjust the dpi directly to get the pixel dimensions you want.

For example, if you want an output image of about 1700 x 2200 pixels from a PDF, you can work out the matching dpi:

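from math import sqrt

# page size in inches (page.rect is measured in points, 1/72 inch)
pdf_width, pdf_height = page.rect.width / 72, page.rect.height / 72

# whole-number dpi whose pixel count is close to 1700 * 2200 (at least 1)
dpi = max(round(sqrt(1700 * 2200 / (pdf_width * pdf_height))), 1)
pix = page.get_pixmap(dpi=dpi)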

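This rounds the dpi to a whole number so that the output pixel count (width x height) comes as close as possible to the one you asked for. Since the scaling keeps the page's aspect ratio, width and height can both be hit exactly only when the page has the same aspect ratio as the target; a US Letter page (8.5 x 11 inches) at 200 dpi, for instance, gives exactly 1700 x 2200.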
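The dpi formula can be wrapped in a small helper to check what it produces for different page sizes. A minimal sketch, assuming the page size is passed in points as page.rect reports it; dpi_for_target is a hypothetical name, not a PyMuPDF function.

```python
import math

POINTS_PER_INCH = 72  # page.rect is measured in points (1/72 inch)


def dpi_for_target(width_pt: float, height_pt: float, target_w: int, target_h: int) -> int:
    """Whole-number dpi whose pixel count is close to target_w * target_h (minimum 1)."""
    width_in = width_pt / POINTS_PER_INCH
    height_in = height_pt / POINTS_PER_INCH
    return max(round(math.sqrt(target_w * target_h / (width_in * height_in))), 1)


# US Letter is 612 x 792 points (8.5 x 11 inches), the same aspect ratio as 1700 x 2200.
dpi = dpi_for_target(612, 792, 1700, 2200)
print(dpi)                                                       # 200
print(612 / POINTS_PER_INCH * dpi, 792 / POINTS_PER_INCH * dpi)  # 1700.0 2200.0
```

For an A4 page (595 x 842 points) the same call returns 197, which gives roughly 1628 x 2304 pixels: about the requested pixel count, but a different shape, because A4 is narrower than 1700:2200.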