I need to replace the images in a PDF document, which I can do. However, after I extract and replace an image resource object, the image becomes corrupted if it has a soft mask or a mask (stencil image): the mask is lost.
This is my code (PDFBox used from Python via JPype):
for page in pages:
    resources = page.getResources()
    if resources:
        pageImages = resources.getXObjectNames()
        for c in pageImages:
            imageObj = resources.getXObject(c)
            if isinstance(imageObj, PDImageXObject):
                # Extract the image to a temporary file
                bImage = imageObj.getImage()
                suffix = str(imageObj.getSuffix())
                tmp = tempfile.NamedTemporaryFile(suffix="." + suffix, mode="wb")
                out = FileOutputStream(tmp.name)
                if bImage.getColorModel().hasAlpha():
                    # Flatten the alpha channel onto an RGB image
                    target = BufferedImage(bImage.getWidth(), bImage.getHeight(), BufferedImage.TYPE_INT_RGB)
                    g = target.createGraphics()
                    g.fillRect(0, 0, bImage.getWidth(), bImage.getHeight())
                    g.drawImage(bImage, 0, 0, None)
                    g.dispose()
                    ImageIOUtil.writeImage(target, suffix, out)
                else:
                    ImageIOUtil.writeImage(bImage, suffix, out)
                # Close the stream so the file is complete before it is read back
                out.close()
                # Replace the original image resource with the re-created one
                newImage = PDImageXObject.createFromFile(tmp.name, pdf)
                resources.put(c, newImage)
                tmp.close()
In the example above, even if I don't modify the extracted image and immediately put it back, any mask the image had is gone afterwards. I can retrieve the soft mask like this to check:
imageObj.getSoftMask()
How can I replace the image and apply the original image's mask to the new PDImageXObject (newImage) using PDFBox? PDImageXObject has an "applyMask" function, but it is private.
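One direction that might work, sketched here as an assumption rather than a confirmed solution: instead of calling the private applyMask, copy the mask entries from the original image's stream dictionary to the new one. The helper name copy_masks is hypothetical. It relies only on getCOSObject(), getItem() and setItem(), and it assumes the keys passed in are COSName.SMASK and COSName.MASK from org.apache.pdfbox.cos.

```python
def copy_masks(old_image, new_image, mask_keys):
    """Copy mask entries from old_image's stream dictionary to new_image's.

    old_image and new_image are expected to expose getCOSObject(), as
    PDImageXObject does. mask_keys is a list of dictionary keys, assumed
    here to be [COSName.SMASK, COSName.MASK].
    Returns the list of keys that were actually copied.
    """
    old_dict = old_image.getCOSObject()
    new_dict = new_image.getCOSObject()
    copied = []
    for key in mask_keys:
        entry = old_dict.getItem(key)  # a Java null arrives as None under JPype
        if entry is not None:
            # Overwrites any mask entry the new image may already carry
            new_dict.setItem(key, entry)
            copied.append(key)
    return copied

# Hypothetical usage inside the loop above, before resources.put(c, newImage):
# copy_masks(imageObj, newImage, [COSName.SMASK, COSName.MASK])
```

Since getImage() returns the image with the mask already applied, it may also be worth extracting the pixels with getOpaqueImage() instead, if the PDFBox version in use provides it, so that the re-attached mask is not applied on top of already masked data.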