Using sed to fix pdf files

Question

Asked by Maurizio Zocchi on 07 September 2017 at 16:59

I am running GNU sed version 4.2.1 on Windows. I have a huge number of PDF files whose last record consists of %%EOF, a newline, and then a long run of NUL characters.

See hexdump below.

0000b890: 25 25 45 4F 46 0D 0A 00 - 00 00 00 00 00 00 00 00 |%%EOF...........|
0000b8a0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 |................|

I need to change the last record to be %%EOF only. The expression ^%%EOF\x0d\x0a\x0{10,30000} matches the characters in Notepad++, but it seems it does not work in sed. Is anyone able to help? Many thanks.

There is 1 answer

**Stefan Hegny** · Answer 1 · 2017-09-08T21:43:39+00:00

Assuming your grep supports the byte-offset options used below (GNU grep does), do the following for a given input.pdf.

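Read the byte offset of the last %%EOF in the file into the variable offset (the -o option makes -b report the offset of the match itself rather than of the line that contains it):

offset=$( grep -a -b -o '%%EOF' input.pdf | tail -n 1 | cut -d: -f1 )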

Then copy only the first offset + 5 bytes of the original file (5 being the length of the string %%EOF) into output.pdf, which should then be what you wanted:

head -c "$(( offset + 5 ))" input.pdf > output.pdf

But depending on the nature of the PDF (e.g. no %%EOF at all at the end, or, as @mkl pointed out, data other than null bytes following the %%EOF), this might behave differently from what you want or cause other problems.
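To guard against the cases mentioned above, the two commands could be wrapped in a small function that refuses to touch a file when no %%EOF is found or when anything other than CR, LF, or NUL bytes follows the last %%EOF. This is only a sketch, not part of the original answer: the function name trim_pdf is hypothetical, and it assumes a POSIX shell with GNU grep and a head that supports -c (on Windows, for example, through Cygwin or MSYS).

```shell
#!/bin/sh
# trim_pdf IN OUT  (hypothetical helper, an assumption for illustration)
# Copies IN to OUT up to and including the last %%EOF, but only if every
# byte after that marker is CR, LF, or NUL.
trim_pdf() {
    in=$1
    out=$2

    # Byte offset of the last %%EOF. -a treats the binary file as text,
    # -b prints byte offsets, -o makes the offset refer to the match itself.
    offset=$(LC_ALL=C grep -a -b -o '%%EOF' "$in" | tail -n 1 | cut -d: -f1)
    if [ -z "$offset" ]; then
        echo "$in: no %%EOF found, skipping" >&2
        return 1
    fi
    end=$(( offset + 5 ))   # 5 = length of the string %%EOF

    # Count the bytes after the marker that are not NUL, CR, or LF.
    junk=$(tail -c +"$(( end + 1 ))" "$in" | tr -d '\000\r\n' | wc -c | tr -d ' ')
    if [ "$junk" -ne 0 ]; then
        echo "$in: data other than padding after %%EOF, skipping" >&2
        return 1
    fi

    # Keep only the first $end bytes.
    head -c "$end" "$in" > "$out"
}
```

For a large batch, the function could be called in a loop such as `for f in *.pdf; do trim_pdf "$f" "fixed/$f"; done`, where fixed/ is an assumed, already existing output directory. For the file in the question's hexdump, %%EOF starts at 0xb890 = 47248, so if that is the last %%EOF in the file, the function would keep the first 47253 bytes.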
