Downloaded an octet-stream and saved it as PDF; can't get the line endings worked out

Tools that I'm using for this:

Chrome, Notepad++, Sublime Text 3, Fiddler, WinMerge, Adobe Acrobat Reader X

Synopsis

I have downloaded a PDF twice: once through Chrome, as an experimental control, and once through a raw GET request via Fiddler, which returns an octet-stream. So far, I can save the octet-stream as a PDF and get the correct page count and some of the page headers and numbers, but very little of the body content loads. When I open my file in Adobe Reader X, I get this error:

Cannot extract the embedded font 'LFIDTH+ArialMT'. Some characters may not display or print correctly

I cannot work out why the font can be extracted from the 'true' PDF but not from the one I am saving.

Details

For my manual request of the file, I sent this header:

Accept: application/pdf, application/x-pdf, application/x-gzpdf, application/x-bzpdf

The server sent back an application/octet-stream body with Content-Disposition: attachment.
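For anyone wanting to reproduce the manual request outside Fiddler, here is a minimal Python sketch using only the standard library. It sends the same Accept header and streams the body to disk in binary mode ("wb"), so no decoding or newline translation can happen on the way. The URL and output path are placeholders, not the real resource, and this does not replicate whatever cookies or other headers the original server may require.

```python
import shutil
import urllib.request

# The Accept header from the question, sent verbatim.
ACCEPT = ("application/pdf, application/x-pdf, "
          "application/x-gzpdf, application/x-bzpdf")


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying the same Accept header as the question."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT}, method="GET")


def download_body(url: str, path: str) -> str:
    """Stream the response body to `path` byte-for-byte; return the Content-Type."""
    with urllib.request.urlopen(build_request(url)) as resp, open(path, "wb") as out:
        # Binary mode: bytes are copied as-is, with no charset decoding
        # and no line-ending conversion.
        shutil.copyfileobj(resp, out)
        return resp.headers.get("Content-Type", "")
```

Usage would be something like `download_body("https://example.com/Foo.pdf", "Foo.pdf")`, where the URL is hypothetical.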

So to recap:

  1. A valid Foo.pdf sitting on my hard drive
  2. An HTTP response with an octet-stream version of the same file, in UTF-8 encoding (I assume)
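On the UTF-8 assumption: application/octet-stream means arbitrary binary data with no charset, and a PDF that contains binary data usually starts with a comment line of high bytes (such as `%âãÏÓ`) that are not valid UTF-8. The sketch below, using a made-up sample header rather than the actual file, illustrates that decoding such bytes as UTF-8 either fails or silently changes them, while Latin-1 (which maps all 256 byte values) round-trips them unchanged.

```python
def utf8_roundtrip(data: bytes) -> bytes:
    """Decode as UTF-8 (invalid sequences become U+FFFD) and re-encode."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


def strict_utf8_ok(data: bytes) -> bool:
    """Return True only if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A PDF header followed by the kind of high-byte comment line PDFs often carry.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
print(strict_utf8_ok(sample))                                # False
print(utf8_roundtrip(sample) == sample)                      # False
print(sample.decode("latin-1").encode("latin-1") == sample)  # True
```

The safe choice is not to pick a text encoding at all and to keep the body as raw bytes end to end.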

Here is what I know:

I pulled the message body of the response from the server and saved it to a file. I then ran a WinMerge comparison of it against the contents of the PDF, and every line mismatched on line endings. I converted the EOLs to Unix and the diff shrank to ~1k lines out of 160k. A close inspection of the mismatches shows that the valid PDF contains what looks like a NUL (00) character in places where my octet-stream contains literal spaces. Also, WinMerge reports the "true" PDF as "EOL: LF 1252 Mixed" and my "raw" PDF as "1252 Unix". When I normalize the "true" PDF to 1252 Unix, I get the same problem as with the "raw" one.
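A byte-level comparison may be more telling than a line-based text diff here. A PDF's cross-reference table records byte offsets, and embedded fonts live in (usually compressed) binary streams, so converting CRLF to LF or turning NUL bytes into spaces would likely be enough to break font extraction. The sketch below finds the first differing byte offset between two buffers; `simulate_text_handling` is a hypothetical model of the two symptoms described above, not what WinMerge or Fiddler actually does internally.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the first byte offset at which a and b differ, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One buffer is a prefix of the other; they diverge where the shorter ends.
        return min(len(a), len(b))
    return None


def simulate_text_handling(data: bytes) -> bytes:
    """Hypothetical model of the symptoms in the question:
    CRLF line endings normalized to LF, and NUL bytes turned into spaces."""
    return data.replace(b"\r\n", b"\n").replace(b"\x00", b" ")


original = b"%PDF-1.4\r\nstream\r\n\x00\x01\r\nendstream"
damaged = simulate_text_handling(original)
print(first_difference(original, damaged))  # 8
print(len(original) - len(damaged))         # 3
```

Against real files this would be `first_difference(Path("Foo.pdf").read_bytes(), Path("raw.pdf").read_bytes())` with `from pathlib import Path`, where `raw.pdf` is a placeholder name for the saved octet-stream.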

Is there anything I can do to get this mess of an octet-stream straightened out?

Note that the PDF downloaded through Chrome is historic. I have it on my machine, but I downloaded it "sometime in the past", and the request headers used for that GET are no longer available. Attempting to download it through the browser "now" results in an error, but an explicit GET request against the resource through Fiddler returns the PDF as an octet-stream.

K. Alan Bates (best answer)

Well now....

In the Fiddler session list, right-click the session whose HTTP response has the application/octet-stream body and choose Save | Response | Response Body.

If the response sets a filename parameter in Content-Disposition: attachment, the File Save dialog is prepopulated with that filename.

Easy after you know it's there.
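If you are scripting the download instead of using Fiddler's Save menu, the same filename can be read from the Content-Disposition header. A minimal sketch, assuming Python's standard `email.message.Message`, whose `get_filename()` returns the `filename` parameter of that header (falling back to the `name` parameter of Content-Type, and returning None when neither is present). The helper name is hypothetical.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(value: str) -> Optional[str]:
    """Hypothetical helper: extract the filename parameter from a
    Content-Disposition header value, or None if it has none."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="Foo.pdf"'))  # Foo.pdf
print(filename_from_content_disposition("attachment;filename=Foo.pdf"))    # Foo.pdf
print(filename_from_content_disposition("attachment"))                     # None
```

Since the name comes from the server, it is worth passing it through `os.path.basename` before using it as a local path.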