How do I process a DXL filedata element when the file encoding is "none"?

331 views Asked by Jaccoud At 15 June 2020 at 21:28

I'm trying to extract attachments from Domino documents that were exported to DXL (the Domino XML schema). For elements with encoding="base64" I can handle the filedata content with ease. However, most of the files have encoding="none", which logically should mean the data is embedded directly. Instead of readable text, the container holds 76-character lines that look very much like base64. They are not valid base64 or uuencoded data, nor anything else I can recognize. Does anyone know what sort of arcane encoding IBM calls "none"? A typical segment looks like this:

<file hosttype='msdos' compression='none' flags='sign storedindoc' encoding='none' 
name='myfilename.doc' size='50688' storagesize='32519' desiredcompression='huffman'>
<created><datetime dst='true'>20061110T193351,87-02</datetime></created>
<modified><datetime dst='true'>20061110T193351,73-02</datetime></modified><filedata>
0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAXgAAAAAAAAAA
EAAAYAAAAAEAAAD+////AAAAAF0AAAD/////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////

(it goes on for hundreds of lines... up to)

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
</filedata></file>

It looks like some MIME encoding, but it is not base64. The number of bits does not add up and the decoder fails. (Yes, I removed the newlines before feeding the text to the decoder.)

How do I decode something that is supposedly not encoded? (According to the IBM magi.)

[post-script] I realized that the document does not conform to the DXL DTD, i.e. it is parseable but does not validate. Also, although encoding="none", the filedata content is indeed base64, though not necessarily padded with '=' characters at the end. In addition, the XML SAX parser was passing me arbitrary chunks of the text content instead of entire lines. Since base64 operates on groups of 4 characters (each producing 3 bytes), decoding each chunk on its own corrupted the output. If I ignore the DTD and force a carefully buffered base64 decoding, even when @encoding != "base64" (by the DTD), then all goes well. It looks like IBM does not bother to follow its own DTDs.
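
For anyone hitting the same problem, the buffered decoding described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code I actually used: the names `FileDataHandler` and `extract_filedata` are made up for the example, and it assumes a single `<filedata>` element whose content is base64 regardless of the `encoding` attribute. It strips whitespace from each SAX chunk, decodes only complete 4-character groups, carries the remainder over to the next chunk, and adds the missing '=' padding at the end.

```python
import base64
import xml.sax


class FileDataHandler(xml.sax.ContentHandler):
    """Hypothetical handler: decodes <filedata> text as base64, chunk by chunk."""

    def __init__(self):
        super().__init__()
        self.in_filedata = False
        self.pending = ""            # characters not yet forming a full 4-char group
        self.decoded = bytearray()   # decoded bytes collected so far

    def startElement(self, name, attrs):
        if name == "filedata":
            self.in_filedata = True
            self.pending = ""

    def characters(self, content):
        if not self.in_filedata:
            return
        # SAX may split the text anywhere, so drop all whitespace first
        self.pending += "".join(content.split())
        # Decode only whole groups of 4 characters; keep the rest for later
        usable = len(self.pending) - len(self.pending) % 4
        self.decoded += base64.b64decode(self.pending[:usable])
        self.pending = self.pending[usable:]

    def endElement(self, name):
        if name == "filedata":
            if self.pending:
                # Assumption: the exporter may omit the trailing '=' padding
                padded = self.pending + "=" * (-len(self.pending) % 4)
                self.decoded += base64.b64decode(padded)
            self.in_filedata = False
            self.pending = ""


def extract_filedata(dxl_text):
    """Parse a DXL fragment and return the decoded bytes of its <filedata>."""
    handler = FileDataHandler()
    xml.sax.parseString(dxl_text.encode("utf-8"), handler)
    return bytes(handler.decoded)
```

Decoding only the first 11 characters of the sample above this way yields the bytes D0 CF 11 E0 A1 B1 1A E1, which is the signature at the start of a legacy .doc (OLE2) file, consistent with the attachment name. A leftover of exactly one character cannot be padded into valid base64, so in that case `b64decode` raises an error instead of silently producing garbage.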


There are 0 answers
