How to correctly redact a PDF file using Python?

I am trying to redact PDF files using Python. I have tried several libraries, including pdfrw, pdfminer, and PyPDF2, but none of them redacted the content properly: the library merges the PDFs as layers and places my redaction box on top of the original layer, so I can still select the content beneath the box (see the image below). Flattening the file did hide the content, but then I can no longer select, copy, or search any text in the document.

Issue with PDF

Please suggest how to correctly redact the contents of a PDF.

This is what I want to achieve: the redacted part is no longer accessible, but the rest of the file still is.

There is 1 answer

Marcus On

I was finally able to redact the contents properly. Get the PDF's content stream and find the text objects, which are delimited by the 'BT' and 'ET' operators, along with the 'TJ' operator that shows the text inside them. Then replace that text with an empty string or with whatever words you need. After that, you can also draw a box over the area.
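As a rough illustration of that idea, here is a minimal sketch that works on a content stream that has already been decompressed into raw bytes (reading the stream out of the PDF and writing it back is left to whichever library you use). The function name `redact_stream` and the sample data are made up for this example. It assumes the text is stored as literal strings such as `(Hello)` in a simple single-byte encoding, and it handles only the `Tj` and `TJ` operators. Hex strings, CID fonts, nested unescaped parentheses, and the `'` and `"` operators are not handled, so treat it as a starting point rather than a complete redaction tool.

```python
import re

# A PDF literal string such as (Hello), allowing backslash escapes.
# Assumption: no nested unescaped parentheses inside the string.
LITERAL = rb"\((?:\\.|[^\\()])*\)"

# Either "(text) Tj" or "[(te) -20 (xt)] TJ".
SHOW_OP = re.compile(
    rb"(?P<single>" + LITERAL + rb")\s*Tj"
    rb"|\[(?P<array>(?:" + LITERAL + rb"|[^\]()])*)\]\s*TJ"
)

# A text object: everything from BT to the next ET.
# Assumption: no string operand contains a standalone "ET" token.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)


def redact_stream(stream: bytes, target: bytes, replacement: bytes = b"") -> bytes:
    """Replace `target` inside text-showing operators of a decoded content stream.

    `replacement` must not contain unescaped parentheses or backslashes.
    """

    def fix_operator(match: re.Match) -> bytes:
        if match.group("single") is not None:
            text = match.group("single")[1:-1]
        else:
            # Join the string pieces of a TJ array so that words split by
            # kerning adjustments can still be found.
            pieces = re.findall(LITERAL, match.group("array"))
            text = b"".join(piece[1:-1] for piece in pieces)
        if target not in text:
            return match.group(0)  # leave untouched operators as they are
        # Kerning is dropped: the operator is rewritten as one plain Tj.
        return b"(" + text.replace(target, replacement) + b") Tj"

    def fix_text_object(match: re.Match) -> bytes:
        return SHOW_OP.sub(fix_operator, match.group(0))

    return TEXT_OBJECT.sub(fix_text_object, stream)


# Hypothetical decoded content stream, for illustration only.
sample = (
    b"BT /F1 12 Tf 72 700 Td (Name: John Smith) Tj ET\n"
    b"BT 72 680 Td [(Acc) -15 (ount 12345)] TJ ET\n"
    b"BT 72 660 Td (Public) Tj ET"
)

if __name__ == "__main__":
    print(redact_stream(sample, b"John Smith").decode("latin-1"))
```

Because the bytes themselves are removed from the stream, the redacted text can no longer be selected, copied, or searched, while the untouched text objects stay as they were. Drawing a box over the area afterwards is then purely cosmetic.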