Programmatically redact text and images on a page in a PDF file


Adobe Acrobat provides the ability to redact PDF files. I would like to use this feature programmatically: I provide a page number, and all text and/or images on that page are redacted.

Is there any way to do this programmatically?

Answer by Hussam Barouqa:

Adobe Acrobat is mostly a GUI-based application. Adobe does offer an Acrobat SDK for extending the application's functionality, which might be worth looking into if your main concern is using Acrobat specifically.

If you are looking for programmatic control of PDF redaction in general, consider using a programming library. For example, the LEADTOOLS Redaction SDK (which I am familiar with, since I work for the vendor) has features for what you describe.

The general approach (using C#) would look something like this:

First, load the PDF and parse all objects on the target page:

PDFDocument document = new PDFDocument(pdfFileName);
PDFParsePagesOptions options = PDFParsePagesOptions.All; 
document.ParsePages(options, 1, 1); // parses all objects for page 1

This populates each parsed page of the document with PDFObject objects that represent the text and images on that page.

You can then loop through these objects and use the bounds of each one to add a redaction annotation, which would look something like this:

AnnRedactionObject redactionObject = new AnnRedactionObject();
redactionObject.Rect = bounds; // corresponding coordinates
redactionObject.Fill = AnnSolidColorBrush.Create("Black");                    
annotations.Add(redactionObject);
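
The loop itself is not shown above, so here is a minimal sketch of how the rectangles to redact might be selected. It deliberately uses hypothetical stand-in types (ObjectKind, Box, PageObject, RedactionPlanner) rather than LEADTOOLS classes, so it only illustrates the filtering logic for "text and/or images". It assumes C# 9 or later for records.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the parsed page objects; these are NOT LEADTOOLS types.
public enum ObjectKind { Text, Image }
public sealed record Box(double Left, double Top, double Width, double Height);
public sealed record PageObject(ObjectKind Kind, Box Bounds);

public static class RedactionPlanner
{
    // Returns one rectangle per selected object, grown by `padding` on every side
    // so that the redaction fully covers glyph or image edges.
    public static List<Box> PlanRedactions(
        IEnumerable<PageObject> objects,
        bool includeText,
        bool includeImages,
        double padding = 0)
    {
        var result = new List<Box>();
        foreach (var obj in objects)
        {
            bool selected = (obj.Kind == ObjectKind.Text && includeText)
                         || (obj.Kind == ObjectKind.Image && includeImages);
            if (!selected)
            {
                continue;
            }

            var b = obj.Bounds;
            result.Add(new Box(
                b.Left - padding,
                b.Top - padding,
                b.Width + 2 * padding,
                b.Height + 2 * padding));
        }
        return result;
    }
}
```

Each returned rectangle would then be assigned to the Rect of a new redaction object as in the snippet above, after converting it to whatever coordinate system and units the SDK expects.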

Finally, with the redactions added, the last step is to realize the annotations by setting the redaction options for the document:

document.Annotations.RedactionOptions = new DocumentRedactionOptions(); 
document.Annotations.RedactionOptions.ViewOptions.Mode = DocumentRedactionMode.Apply; 
document.Annotations.RedactionOptions.ViewOptions.ReplaceCharacter = '*'; 
document.Annotations.RedactionOptions.ConvertOptions.Mode = DocumentRedactionMode.Apply; 
document.Annotations.RedactionOptions.ConvertOptions.ReplaceCharacter = '*'; 

The document will then be redacted appropriately when viewed, or it can be used to write a PDF containing the redaction.