I read a PDF file in Python, added a text box on top of the text that I'd like to redact, and saved the result as a new PDF file. When I search for that text in the redacted PDF file using a PDF reader, it can still be found.
Is there a way to save the PDF as a single layer file? Or is there a way to ensure that the text under the text box can be removed?
import io

from PyPDF2 import PdfReader, PdfWriter
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

# `files` is the path to the source PDF.
reader = PdfReader(files)

# Draw a filled white rectangle on an in-memory, single-page overlay PDF.
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=A4)
can.setFillColorRGB(1, 1, 1)  # set the fill colour before drawing the rectangle
can.rect(65, 750, 40, 30, stroke=1, fill=1)
can.save()
packet.seek(0)
new_pdf = PdfReader(packet)

# Merge the overlay onto the second page (index 1) and write the result.
# The original text stays in the page's content stream; it is only covered.
output = PdfWriter()
pageToOutput = reader.pages[1]
pageToOutput.merge_page(new_pdf.pages[0])
output.add_page(pageToOutput)
with open("NewFile.pdf", "wb") as outputStream:
    output.write(outputStream)
I used one of the solutions (pdf2image and PIL) from the link provided by @Matt Pitken, and it worked well.
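For anyone landing here, the following is a minimal sketch of what that pdf2image and PIL approach can look like. It is not the exact code from the linked answer. The function names `pdf_rect_to_pixels` and `redact_by_rasterizing`, the `rects_by_page` argument, and the default of 200 DPI are my own assumptions for illustration. The idea is to render each page to an image, paint over the region, and save the images back as a PDF, so no text layer survives. Note that pdf2image needs Poppler installed, and that the whole output becomes image-only (no text anywhere is searchable afterwards).

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height) in PDF points


def pdf_rect_to_pixels(x: float, y: float, width: float, height: float,
                       page_height_px: int, dpi: int) -> Tuple[int, int, int, int]:
    """Convert a PDF rectangle (points, origin bottom-left) into a
    PIL box (pixels, origin top-left) as (left, top, right, bottom)."""
    scale = dpi / 72.0  # 72 PDF points per inch
    left = x * scale
    right = (x + width) * scale
    top = page_height_px - (y + height) * scale  # flip the y axis
    bottom = page_height_px - y * scale
    return (round(left), round(top), round(right), round(bottom))


def redact_by_rasterizing(src_path: str, dst_path: str,
                          rects_by_page: Dict[int, List[Rect]],
                          dpi: int = 200) -> None:
    """Render every page to an image, paint black boxes over the given
    rectangles, and save the images as a new image-only PDF."""
    # Imported here so the helper above can be used without these packages.
    from pdf2image import convert_from_path  # requires Poppler
    from PIL import ImageDraw

    pages = convert_from_path(src_path, dpi=dpi)
    for index, image in enumerate(pages):
        draw = ImageDraw.Draw(image)
        for rect in rects_by_page.get(index, []):
            box = pdf_rect_to_pixels(*rect, image.height, dpi)
            draw.rectangle(box, fill="black")

    pages = [page.convert("RGB") for page in pages]
    pages[0].save(dst_path, "PDF", save_all=True,
                  append_images=pages[1:], resolution=dpi)


# Hypothetical usage, reusing the rectangle from the question on page index 1:
# redact_by_rasterizing("input.pdf", "NewFile.pdf", {1: [(65, 750, 40, 30)]})
```

The coordinate helper matters because reportlab measures from the bottom-left corner in points, while PIL measures from the top-left corner in pixels.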