Extract XML metadata/column from JPEG2000 images using VBScript/VBA


I have routines that collect and analyze file/folder data in relation to database content. The system has been in place and working well for many years. These routines use VBScript/Access VBA to collect file information and prepare/load records into a SQL Server database. I'm not currently storing the filestreams in SQL Server, just the file paths and data about the files. Now I need to extract XML metadata from some of these files, which is something I haven't had to work with before.

The files are JPEG2000 derived from TIFFs. They are generated via batch and metadata from the original TIFFs is added to the JP2s. I can see the XML using JP2 Meta Editor:

[Screenshot: j2k tool]

The XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Originating Facility -->
<TIFF>
   <METADATA>
      <FILENAME>L145Y1921I001S0005.tif</FILENAME>
      <SEPARATOR>\</SEPARATOR>
      <PARENT>I:\Processing_Unit\L145\Box127</PARENT>
      <CANONICALPATH>I:\Processing_Unit\L145\Box127\L145Y1921I001S0005.tif</CANONICALPATH>
      <ABSOLUTEPATH>I:\Processing_Unit\L145\Box127\L145Y1921I001S0005.tif</ABSOLUTEPATH>
      <PATH>I:\Processing_Unit\L145\Box127\L145Y1921I001S0005.tif</PATH>
      <FILE>true</FILE>
      <DIRECTORY>false</DIRECTORY>
      <FILELENGTH>18462952</FILELENGTH>
      <HIDDEN>false</HIDDEN>
      <ABSOLUTE>true</ABSOLUTE>
      <URL>file:/I:/Processing_Unit/L145/Box127/L145Y1921I001S0005.tif</URL>
      <URI>file:/I:/Processing_Unit/L145/Box127/L145Y1921I001S0005.tif</URI>
      <READ>true</READ>
      <WRITE>true</WRITE>
      <EXTENSION>tif</EXTENSION>
      <MODIFIED>2009-04-02 11:17:31</MODIFIED>
      <DATE>20090402</DATE>
      <DATEPATTERN>yyyyMMdd</DATEPATTERN>
      <TIME>111731987</TIME>
      <TIMEPATTERN>HHmmssSSS</TIMEPATTERN>
      <TYPE>image/tiff</TYPE>
      <PID>null</PID>
      <OID>null</OID>
      <FID>null</FID>
      <PROCESSOR>unknown</PROCESSOR>
   </METADATA>
   <HEADER>
      <LITTLEENDIAN>true</LITTLEENDIAN>
      <VERSION>1.0</VERSION>
   </HEADER>
   <IMAGEFILEDIRECTORY>
      <ELEMENT>
         <NAME>NewSubfileType</NAME>
         <TAG>254</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>4</TYPE>
         <VALUE>0</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ImageWidth</NAME>
         <TAG>256</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2705</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ImageLength</NAME>
         <TAG>257</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2275</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>BitsPerSample</NAME>
         <TAG>258</TAG>
         <LENGTH>3</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>8</VALUE>
         <VALUE>8</VALUE>
         <VALUE>8</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Compression</NAME>
         <TAG>259</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>1</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>PhotometricInterpretation</NAME>
         <TAG>262</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>DocumentName</NAME>
         <TAG>269</TAG>
         <LENGTH>22</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>L145Y1921I001S0005.tif</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ImageDescription</NAME>
         <TAG>270</TAG>
         <LENGTH>6</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>paper</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Make</NAME>
         <TAG>271</TAG>
         <LENGTH>10</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>Phase One</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Model</NAME>
         <TAG>272</TAG>
         <LENGTH>6</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>P 30+</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Orientation</NAME>
         <TAG>274</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>1</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>SamplesPerPixel</NAME>
         <TAG>277</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>3</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>RowsPerStrip</NAME>
         <TAG>278</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2275</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>XResolution</NAME>
         <TAG>282</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>5</TYPE>
         <VALUE>300.0</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>YResolution</NAME>
         <TAG>283</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>5</TYPE>
         <VALUE>300.0</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>PlanarConfiguration</NAME>
         <TAG>284</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>1</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ResolutionUnit</NAME>
         <TAG>296</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Software</NAME>
         <TAG>305</TAG>
         <LENGTH>51</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>Capture One 4 Windows; Adobe Photoshop CS3 Windows</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>DateTime</NAME>
         <TAG>306</TAG>
         <LENGTH>20</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>2009:03:26 11:23:36</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Artist</NAME>
         <TAG>315</TAG>
         <LENGTH>33</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>Preservation Center</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Custom</NAME>
         <TAG>34665</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>4</TYPE>
         <VALUE>null</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Custom</NAME>
         <TAG>34675</TAG>
         <LENGTH>560</LENGTH>
         <TYPE>7</TYPE>
         <VALUE>null</VALUE>
      </ELEMENT>
   </IMAGEFILEDIRECTORY>
</TIFF>

I need to extract the original document name (the parent TIFF name) from each of these JP2 files.
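To make the target concrete, here is a rough, untested sketch of the lookup itself, assuming the XML text has already been pulled out of the JP2 as a string. The function name `GetParentTiffName` is my own placeholder, and I'm assuming MSXML 6.0 is installed on the machine. It tries `/TIFF/METADATA/FILENAME` first and falls back to the `DocumentName` element, since both hold the TIFF name in the sample above.

```vbscript
Option Explicit

' Hypothetical helper (name is a placeholder): returns the parent TIFF name
' from the embedded XML, or "" if the XML does not parse or no match is found.
Function GetParentTiffName(xmlText)
    Dim doc, node
    GetParentTiffName = ""

    ' Assumes MSXML 6.0 is available on the machine.
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.loadXML(xmlText) Then Exit Function

    ' First choice: the FILENAME element under METADATA.
    Set node = doc.selectSingleNode("/TIFF/METADATA/FILENAME")

    ' Fallback: the TIFF DocumentName tag recorded in the image file directory.
    If node Is Nothing Then
        Set node = doc.selectSingleNode( _
            "/TIFF/IMAGEFILEDIRECTORY/ELEMENT[NAME='DocumentName']/VALUE")
    End If

    If Not node Is Nothing Then GetParentTiffName = node.text
End Function

' Usage under Windows Script Host, with a cut-down version of the sample XML.
Dim sample
sample = "<TIFF><METADATA><FILENAME>L145Y1921I001S0005.tif</FILENAME></METADATA></TIFF>"
WScript.Echo GetParentTiffName(sample)
```

In Access VBA the same function body should work if the `WScript.Echo` line is swapped for `Debug.Print`.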

Is there a straightforward way to incorporate this into the existing file collection routine using VBA/VBScript? I need to process hundreds of thousands of existing file records to get this additional value, and to include the extraction in folder scans going forward.
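The direction I'm considering, as an untested sketch: a JP2 file is a sequence of boxes, each starting with a 4-byte big-endian length and a 4-byte type, so the routine could walk the top-level boxes and read the payload of the first one typed `xml `. This assumes the metadata sits in a top-level XML box (not a UUID box or a nested box) and is UTF-8 encoded, as the XML declaration suggests. All function names here are placeholders of mine, and `ADODB.Stream` is used only because VBScript has no native binary file access.

```vbscript
Option Explicit

Const adTypeBinary = 1
Const adTypeText = 2

' Reads a big-endian unsigned 32-bit value at the current stream position.
' Returned as a Double so values above 2^31-1 do not overflow.
Function ReadUInt32BE(stm)
    Dim b
    b = stm.Read(4)
    ReadUInt32BE = CDbl(AscB(MidB(b, 1, 1))) * 16777216 + _
                   CDbl(AscB(MidB(b, 2, 1))) * 65536 + _
                   CDbl(AscB(MidB(b, 3, 1))) * 256 + _
                   CDbl(AscB(MidB(b, 4, 1)))
End Function

' Reads the 4-character box type at the current stream position.
Function ReadFourCC(stm)
    Dim b, i, s
    b = stm.Read(4)
    s = ""
    For i = 1 To 4
        s = s & Chr(AscB(MidB(b, i, 1)))
    Next
    ReadFourCC = s
End Function

' Decodes a byte array as UTF-8 text via a second ADODB.Stream.
Function BytesToUtf8Text(bytes)
    Dim conv
    Set conv = CreateObject("ADODB.Stream")
    conv.Type = adTypeBinary
    conv.Open
    conv.Write bytes
    conv.Position = 0
    conv.Type = adTypeText
    conv.Charset = "utf-8"
    BytesToUtf8Text = conv.ReadText
    conv.Close
End Function

' Returns the text of the first top-level 'xml ' box, or "" if none is found.
Function ReadJp2XmlBox(jp2Path)
    Dim stm, boxStart, boxLen, boxType, hdrLen, payloadLen
    ReadJp2XmlBox = ""
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeBinary
    stm.Open
    stm.LoadFromFile jp2Path

    Do While stm.Position + 8 <= stm.Size
        boxStart = stm.Position
        boxLen = ReadUInt32BE(stm)
        boxType = ReadFourCC(stm)
        hdrLen = 8
        If boxLen = 1 Then
            ' Length 1 means a 64-bit extended length follows the type.
            If stm.Position + 8 > stm.Size Then Exit Do
            boxLen = ReadUInt32BE(stm) * 4294967296 + ReadUInt32BE(stm)
            hdrLen = 16
        ElseIf boxLen = 0 Then
            ' Length 0 means the box runs to the end of the file.
            boxLen = stm.Size - boxStart
        End If

        ' Stop on a malformed or truncated box rather than reading garbage.
        If boxLen < hdrLen Or boxStart + boxLen > stm.Size Then Exit Do

        If boxType = "xml " Then
            payloadLen = boxLen - hdrLen
            If payloadLen > 0 Then
                ReadJp2XmlBox = BytesToUtf8Text(stm.Read(CLng(payloadLen)))
            End If
            Exit Do
        End If

        ' Skip this box's payload and move to the next box header.
        stm.Position = CLng(boxStart + boxLen)
    Loop
    stm.Close
End Function
```

The returned string could then be handed to an XML parser such as MSXML. One concern at this scale: as far as I can tell, `LoadFromFile` pulls the whole file into the stream, so in Access VBA it may be cheaper to do the same box walk with `Open ... For Binary` and `Get`, reading only the box headers.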

Thanks in advance.
