Extract XML from ZUGFeRD PDF with Ghostscript

We would like to automate the processing of ZUGFeRD invoices. Is there a way to extract and save the XML files embedded in the PDF using Ghostscript?

K J On

As mentioned by KenS, Ghostscript can help assemble ZUGFeRD files but cannot extract their contents. The screenshot below shows those contents in the source XML (lower part) and in a cooperative PDF (upper part, the PDF opened in WordPad), where the embedded XML is visible as plain text and can easily be extracted as text. However, scraping the PDF source like this is not reliable, since the format of one PDF is rarely the same as the next unless you make it so.

Many PDF readers can export such attachments as the original source file, and many PDF libraries allow the named file to be extracted from a script.
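As one possible illustration of the library route, here is a minimal Python sketch. It assumes the third-party pypdf package is installed and that your version exposes `PdfReader.attachments` as a mapping of attachment name to a list of byte strings. The helper names `select_xml` and `extract_invoice_xml` are made up for this example, and filtering on the `.xml` suffix is an assumption about how the invoice attachment is named in your files.

```python
from pathlib import Path


def select_xml(attachments):
    """Keep attachments whose name ends in .xml, using the first payload per name."""
    return {
        name: payloads[0]
        for name, payloads in attachments.items()
        if name.lower().endswith(".xml") and payloads
    }


def extract_invoice_xml(pdf_path, out_dir):
    """Write every embedded XML attachment of pdf_path into out_dir."""
    # Third-party dependency (assumption): pip install pypdf
    from pypdf import PdfReader

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in select_xml(PdfReader(pdf_path).attachments).items():
        # Use only the base name so an odd attachment name cannot escape out_dir.
        target = out_dir / Path(name).name
        target.write_bytes(data)
        written.append(target)
    return written
```

Calling `extract_invoice_xml("invoice.pdf", "xml_out")` (both paths are placeholders) would then return the list of files written, which is empty if the PDF carries no XML attachment.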

[Screenshot: the PDF opened in WordPad (upper part) and the source XML (lower part)]

The samples above are from the open-source Java application at https://www.mustangproject.org/, which is currently very up to date.

For very simple cross-platform use there is pdfdetach, which can save a single attachment by name or all attachments at once.
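For automation, pdfdetach can be driven from a script. The sketch below is in Python and assumes the Poppler build of pdfdetach is on the PATH: `-saveall` and `-o` are documented there, while `-savefile` (save by name) may not be available in other builds. The function names are made up for this example.

```python
import subprocess
from pathlib import Path


def pdfdetach_command(pdf_path, out_dir=None, name=None):
    """Build the pdfdetach argument list: all attachments, or one by name."""
    cmd = ["pdfdetach"]
    if name is None:
        cmd.append("-saveall")
        if out_dir is not None:
            cmd += ["-o", str(out_dir)]  # with -saveall, -o is a directory
    else:
        cmd += ["-savefile", name]  # assumption: Poppler's pdfdetach
        if out_dir is not None:
            cmd += ["-o", str(Path(out_dir) / name)]  # here -o is a file name
    cmd.append(str(pdf_path))
    return cmd


def save_attachments(pdf_path, out_dir=None, name=None):
    """Run pdfdetach; raises CalledProcessError if the tool reports failure."""
    if out_dir is not None:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfdetach_command(pdf_path, out_dir, name), check=True)
```

For example, `save_attachments("invoice.pdf", out_dir="xml_out")` would save every attachment of a placeholder `invoice.pdf` into `xml_out`. Running `pdfdetach -list invoice.pdf` first shows the attachment names, which is useful before saving one by name.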
