Tools that I'm using for this:
Chrome
Notepad++
Sublime Text 3
Fiddler
WinMerge
Adobe Acrobat Reader X
Synopsis
I have downloaded a PDF twice: once through Chrome as an experimental control, and once through a raw GET request via Fiddler, which returns an octet-stream. So far, I can save the octet-stream as a PDF and get the proper page count and some of the page headers and numbers, but very little of the body content loads. When I open my file in Adobe Reader X, I get this error:
Cannot extract the embedded font 'LFIDTH+ArialMT'. Some characters may not display or print correctly
I cannot work out why the font can be extracted from the 'true' PDF but not from the one I am saving.
Details
For my manual pull of the file, I sent
Accept: application/pdf, application/x-pdf, application/x-gzpdf, application/x-bzpdf
The server sent back application/octet-stream with an attachment Content-Disposition.
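For comparison, the same request can be scripted. This is a minimal sketch, assuming Python's standard urllib; the function name fetch_raw and the URL in the usage note are hypothetical and not taken from the original request. The point it illustrates is that the body is read and written as bytes, so no newline or character-set translation can touch it.

```python
import urllib.request

# Same Accept header as in the manual request above.
ACCEPT = "application/pdf, application/x-pdf, application/x-gzpdf, application/x-bzpdf"


def fetch_raw(url, dest_path):
    """Download url and write the response body to dest_path byte for byte."""
    request = urllib.request.Request(url, headers={"Accept": ACCEPT})
    with urllib.request.urlopen(request) as response:
        body = response.read()  # bytes; never decoded as text
    # "wb" writes the bytes unchanged: no newline translation, no encoding.
    with open(dest_path, "wb") as out:
        out.write(body)
    return len(body)
```

Usage would look like fetch_raw("https://example.com/Foo.pdf", "Foo.pdf"), where the URL is a placeholder.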
So to recap:
- Valid Foo.pdf sitting on my hard drive
- HTTP Response with an octet-stream version of same file, in UTF-8 encoding (I assume)
Here is what I know:
I pulled the message body of the response from the server and dropped it to a file. I then ran a WinMerge comparison of it against the contents of the PDF, and every line mismatched on line endings. I re-encoded the EOLs for Unix and the diff shrank to ~1k lines out of 160k. A close inspection of the mismatches indicates that the valid PDF contains what looks like a NUL (00) character in places where my octet-stream contains literal spaces. Also, WinMerge reports the "true" PDF as EOL: LF 1252 Mixed, and my "raw" PDF as 1252 Unix. When I homogenize the 'true' PDF to 1252 Unix, I get the same issue as I described for the 'raw' one.
Is there anything I can do to get this mess of an octet-stream straightened out?
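One way to check the NUL-versus-space observation without a text-oriented diff tool is to count the relevant bytes directly. A minimal sketch in Python; byte_profile, compare_files, and any file names passed to them are hypothetical. If the saved copy shows no NUL bytes or CRLF pairs while the known-good PDF shows many, that would suggest some step handled the binary body as text.

```python
def byte_profile(data):
    """Count bytes that text-mode handling tends to alter in binary data."""
    crlf = data.count(b"\r\n")
    return {
        "size": len(data),
        "nul": data.count(b"\x00"),
        "crlf": crlf,
        # Line feeds and carriage returns that are not part of a CRLF pair.
        "bare_lf": data.count(b"\n") - crlf,
        "bare_cr": data.count(b"\r") - crlf,
    }


def compare_files(good_path, suspect_path):
    """Return the byte profiles of two files, both read in binary mode."""
    with open(good_path, "rb") as f:
        good = byte_profile(f.read())
    with open(suspect_path, "rb") as f:
        suspect = byte_profile(f.read())
    return good, suspect
```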
Note that the PDF downloaded through Chrome is historic. I have it on my machine, but I downloaded it "sometime in the past" and the request headers used for that GET are no longer available. Attempting to download through the browser "now" results in an error, but an explicit GET request against the resource through Fiddler returns the PDF as an octet-stream.
Well now....
In the Fiddler session, right-click the HTTP response with the application/octet-stream body | Save | Response | Response Body.
If Content-Disposition: attachment;filename has been set on the response, the File Save dialog will be prepopulated with the filename.
Easy after you know it's there.
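If the save is scripted rather than done through the Fiddler UI, the suggested filename can be read from the same header. A sketch assuming Python's standard email package for the parameter parsing; filename_from_disposition and the fallback name download.bin are hypothetical.

```python
from email.message import Message


def filename_from_disposition(header_value, default="download.bin"):
    """Extract the filename parameter from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when no filename parameter is present.
    return msg.get_filename() or default
```

Only the final path component of the result should be used when writing to disk, since the header value comes from the server.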