How to read form fragment references in a pdf document?

707 views Asked by At

I am working with adobe LiveCycle ES4 and I am attempting to make a custom LiveCycle component (in java) that counts references to a specified form fragment. However I am having some difficulty finding documentation on form fragments in pdf files. So my question is how would I go about reading form fragment references from a pdf document?

Also any documentation, API, or library that may help me with this task would be greatly appreciated.

--Form Fragments-- A form fragment is a collection of form objects (fields, buttons, shapes, tables, etc) as well as related style/formatting that is saved as a separate .xsd file in a form fragment library (usually a directory on the livecycle server that holds the .xsd files). References to form fragments can be inserted into a form when working in LiveCycle Designer.This is particularly useful for creating many similar forms (such as a form fragment with fields for contact information). When a form fragment is edited the changes are reflected across all forms that hold a reference to that fragment (when the pdf is opened and has access to the form fragment library.

2

There are 2 answers

13
David van Driessche On BEST ANSWER

The PDF specification currently is an official ISO standard - ISO 32000. You should be able to get the document from the ISO organisation or from your country's standards organisation.

However, before being an ISO standard, PDF was developed and maintained by Adobe and they still have the specification available on their site: http://www.adobe.com/devnet/pdf/pdf_reference.html

There are differences between this specification and the ISO 32000 specification document, but they are largely from an editorial manner so for your purpose I'd look at the Adobe document.

Using the additional information in the comments, your test file and a low level PDF document browser (pdfToolbox in this case - attention, I'm affiliated with this product), I found following information:

In the "Catalog" object for your test PDF, you'll find a key named "AcroForm" that points to a dictionary with information about your form.

In that "AcroForm" dictionary you'll find a key called "XFA" that contains what appears to be almost all information about the XFA form that LifeCycle designer generated.

That "XFA" key points to an array which seems to consist of pairs of information. Element 0 is a string called "preamble", element 1 seems to be the data belonging to that string. So each pair of elements is a bit of information.

The information in that array consists of "preamble", "config", "template", "localeset", "xmpmeta" and "postamble". If you look at the element for "template" (the 6th element in the array if you calculate 1-based), you'll find the data you are looking for. The data is stored as a FlateDecoded stream that you'll have to uncompress - then it's just XML data that should be fairly easy to parse. In it are three lines that for you should be of particular interest:

<subform x="6.35mm" y="6.35mm" name="TestFragment1"
<subform x="3.175mm" y="34.925mm" name="TestFragment2"
<subform x="0.125in" y="2.75in" name="TestFragment2"

I'm assuming the XFA specification pointed to by mkl contains more information about these things, but it seems like simply looking for "subform" elements in the XML should get you the references to the form fragments pretty easily.

0
mkl On

In the XFA xml data in the sample PDF you provided there are processing instructions like this:

<?templateDesigner expand 1?><?designerFragmentSource CjxzdWJmb3JtIHVzZWhyZWY9Ii4uXC4uXC4uXEFkb2JlXEFkb2JlIExpdmVDeWNsZSBFUzRcZm9y
bV9mcmFnbWVudHNcVGVzdEZyYWdtZW50MS54ZHAjc29tKCR0ZW1wbGF0ZS5mb3JtMS5UZXN0RnJh
Z21lbnQxKSIgeD0iNi4zNW1tIiB5PSI2LjM1bW0iIHhtbG5zPSJodHRwOi8vd3d3LnhmYS5vcmcv
c2NoZW1hL3hmYS10ZW1wbGF0ZS8zLjMvIgo+PD90ZW1wbGF0ZURlc2lnbmVyIGV4cGFuZCAxPz48
L3N1YmZvcm0KPg==?>

Decoding the base64 encoded argument in there one gets:

<subform usehref="..\..\..\Adobe\Adobe LiveCycle ES4\form_fragments\TestFragment1.xdp#som($template.form1.TestFragment1)" x="6.35mm" y="6.35mm" xmlns="http://www.xfa.org/schema/xfa-template/3.3/"
><?templateDesigner expand 1?></subform
>

So it looks like your

custom LiveCycle component (in java) that counts references to a specified form fragment

should parse the XFA XML, look for these designerFragmentSource processing instructions, and analyze them.

Please be aware, though, that these processing instructions most likely are proprietary Adobe stuff (I at least did not find them in the current XFA specification). Thus, as soon some third party tool touches the XFA XML, the PIs might not be accurate anymore. You actually even cannot be sure what happens in different Adobe software versions.