I am running GNU sed version 4.2.1 on Windows. I have a huge number of PDF files whose last record consists of %%EOF + newline + a lot of NUL chars.
See hexdump below.
0000b890: 25 25 45 4F 46 0D 0A 00 - 00 00 00 00 00 00 00 00 |%%EOF...........|
0000b8a0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 |................|
I need to change the last record to be %%EOF only. The expression ^%%EOF\x0d\x0a\x0{10,30000} matches the characters in Notepad++, but it does not seem to work in sed. Is anyone able to help? Many thanks.
Assuming your grep supports printing byte offsets, for a given input.pdf do the following:
Read the byte offset of the last %%EOF in the file into the variable offset.
Take the first offset + 5 bytes (5 being the length of the string %%EOF) from the original file and write them to output.pdf, which should then be what you wanted.
But depending on the nature of the PDF (e.g. no %%EOF at all at the end, or (edit, thx @mkl) data other than null bytes following the %%EOF), this might behave differently from what you want or cause a lot of other problems.
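
The two steps above (find the byte offset of the last %%EOF, then keep only the first offset + 5 bytes) can be sketched as a small shell function. This is a minimal sketch rather than the answer's original commands: the function name trim_after_eof and its return codes are made up for illustration, and it assumes GNU grep (for -a, -b and -o), head and tail with -c support, and a POSIX shell, which on Windows would mean something like Cygwin or MSYS. It also guards against the two caveats mentioned: a missing %%EOF, and data other than CR, LF or NUL after the marker.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): copy "$1" to "$2",
# cutting the file off right after its last %%EOF marker.
# Assumes GNU grep (-a, -b, -o) plus head/tail that accept -c.
trim_after_eof() {
    in=$1
    out=$2

    # -a: treat binary data as text, -b: print the byte offset,
    # -o: print only the match, so the offset is that of the match itself.
    offset=$(LC_ALL=C grep -a -b -o '%%EOF' "$in" | tail -n 1 | cut -d: -f1)

    # No %%EOF at all: refuse to touch the file.
    [ -n "$offset" ] || return 1

    # Count the bytes after the marker that are not CR, LF or NUL.
    extra=$(tail -c "+$((offset + 6))" "$in" | LC_ALL=C tr -d '\000\r\n' | wc -c)

    # Other data follows the marker: truncating would destroy it.
    [ "$extra" -eq 0 ] || return 2

    # Keep everything up to and including the 5 bytes of "%%EOF".
    head -c "$((offset + 5))" "$in" > "$out"
}
```

For a whole directory, a loop such as `for f in *.pdf; do trim_after_eof "$f" "trimmed_$f" || echo "skipped $f"; done` would write the trimmed copies next to the originals and report the files it left alone. Writing to a separate output file keeps the originals intact until the results have been checked.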