Why does inserting characters into an executable binary file cause it to "break"?

Why does inserting characters into an executable binary file cause it to "break"?

And, is there any way to add characters without breaking the compiled program?

Background

I've known for a long time that it is possible to use a hex editor to change code in a compiled executable file and still have it run as normal...

Example

As an example, the string Facebook in an application could be changed to Lacebook with a hex editor, and the program would still execute just fine.

But it breaks with new characters

I'm also aware that if new characters are added, the program breaks: it won't run, or it crashes immediately. For example, inserting My in front of Facebook has this effect.

What I know

What I don't know

  • I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
  • I don't understand why having extra characters in a text string of the binary file would cause problems

What I'd like to know

  1. Why do the extra characters cause the program to break?
  2. What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
  3. Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?

There are 3 answers

Answer by David Schwartz (best answer)

I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'

Modern operating systems just map the file into memory rather than reading it all up front. A page of the file isn't actually loaded until it's needed.
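As a rough analogy only, here is a sketch using Python's mmap module. It is not how the OS loader is implemented; it just shows the idea of mapping a file and touching only the part you need. The helper name read_via_mapping and the demo file contents are made up for illustration.

```python
import mmap
import os
import tempfile

def read_via_mapping(path, offset, length):
    """Map a file read-only and return `length` bytes starting at `offset`.

    The whole file is mapped, but only the pages backing the requested
    slice have to be brought into memory.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]

# Hypothetical demo file: 8192 filler bytes followed by a string.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 8192 + b"Facebook")
    demo_path = tmp.name

chunk = read_via_mapping(demo_path, 8192, 8)
os.remove(demo_path)
```

The function only works because the caller knows the offset of the string in advance, which is exactly the kind of knowledge that an insertion invalidates.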

Why do the extra characters cause the program to break?

Because they push everything after the insertion point to a different offset than the rest of the file expects, so the loader winds up loading the wrong things. Jumps in the code also wind up going to the wrong place, perhaps into the middle of an instruction.
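A minimal sketch of the difference between overwriting and inserting, using a made-up byte string rather than a real executable. The HEAD, CODE and DATA markers are placeholders I invented so the offsets are easy to see.

```python
# Made-up "file": a header marker, a NUL-terminated string, then two more markers.
image = b"HEAD" + b"Facebook\x00" + b"CODE" + b"DATA"

# Same-length overwrite: nothing after the string moves.
overwritten = image.replace(b"Facebook", b"Lacebook")

# Insertion: everything after the string moves by two bytes.
inserted = image.replace(b"Facebook", b"MyFacebook")

code_before = image.find(b"CODE")           # offset other parts of the file "remember"
code_overwritten = overwritten.find(b"CODE")
code_inserted = inserted.find(b"CODE")

print(code_before, code_overwritten, code_inserted)
```

Anything that still expects CODE at its original offset is fine after the overwrite and wrong after the insertion.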

What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?

It depends on exactly what gets damaged. For example, if you shift a header, the loader may notice that some of the header's fields now contain invalid data.

Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?

Probably not reliably. At a minimum, you'd need to reliably identify every section of code that needs to be adjusted. That can be surprisingly difficult, particularly if someone has deliberately tried to make it so.
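What does work is changing a string without changing the file's length, as in the Facebook to Lacebook case from the question. Here is a hedged sketch of that: it assumes the string is stored as a NUL-terminated C string and that the program does not keep its length somewhere else. The function name patch_string and the sample bytes are hypothetical.

```python
def patch_string(image, old, new):
    """Overwrite the first occurrence of `old` with `new` without changing
    the file length. A shorter `new` is padded with NUL bytes; a longer one
    is refused because it would require inserting bytes.
    """
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original string")
    pos = image.find(old)
    if pos < 0:
        raise ValueError("original string not found")
    padded = new + b"\x00" * (len(old) - len(new))
    return image[:pos] + padded + image[pos + len(old):]

# Made-up stand-in for an executable's bytes.
image = b"\x7fTOY" + b"Facebook\x00" + b"\xc3"

same_length = patch_string(image, b"Facebook", b"Lacebook")
shorter = patch_string(image, b"Facebook", b"Face")
```

Since every byte after the string stays where it was, no offsets need fixing. Making the string longer is the case that has no simple answer.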

Answer by Barmar

When a program is compiled into machine code, it includes many references to the addresses of instructions and data in the program's memory. The compiler and linker determine the layout of all the program's memory and put these addresses into the program. The executable file is also organized into sections, and there's a table of contents at the beginning that contains the number of bytes in each section.

If you insert something into the program, the address of everything after it is shifted up. But the parts of the program that contain references to code and data locations are not updated; they continue to point to the original addresses. Also, the table that contains the sizes of all the sections is no longer correct, because you increased the size of whichever section you modified.
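To make the stale table concrete, here is a toy file format sketched in Python. It is invented for illustration and is far simpler than real formats such as ELF or PE: a fixed header holding a magic value plus the offset and size of a text section and a code section. All names (HEADER, build, load, TOY1) are assumptions.

```python
import struct

# Toy header: magic, text offset, text size, code offset, code size.
HEADER = struct.Struct("<4sIIII")

def build(text, code):
    """Lay out header + text + code and record where each section lives."""
    text_off = HEADER.size
    code_off = text_off + len(text)
    header = HEADER.pack(b"TOY1", text_off, len(text), code_off, len(code))
    return header + text + code

def load(image):
    """Do what a loader does: trust the header and slice out the sections."""
    magic, text_off, text_size, code_off, code_size = HEADER.unpack_from(image, 0)
    if magic != b"TOY1":
        raise ValueError("bad magic")
    if code_off + code_size > len(image):
        raise ValueError("section runs past end of file")
    return (image[text_off:text_off + text_size],
            image[code_off:code_off + code_size])

original = build(b"Facebook\x00", b"\x90\x90\xc3")

# Hex-editor style insertion of "My" at the start of the text section,
# leaving the header untouched.
at = HEADER.size
edited = original[:at] + b"My" + original[at:]

good_text, good_code = load(original)
bad_text, bad_code = load(edited)
```

In this sketch the edited file still passes both checks, so the toy loader happily hands back a truncated string and "code" that starts in the tail of the string.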

Answer by Kaz

The format of a machine-language executable file is based on hard offsets, rather than on parsing a byte stream (like textual program source code). When you insert a byte somewhere, the file format continues to reference information which follows the insertion point at the original offsets.

Offsets may occur in the file format itself, such as in the header, which tells the loader where things are located in the file and how big they are.

Hard offsets also occur in machine language itself, such as in instructions which refer to the program's data or in branch instructions.

Suppose an instruction says "branch 200 bytes down from where we are now", and you insert a byte into those 200 bytes (because a character string happens to be there that you want to alter). Oops; the branch still covers 200 bytes, so it now lands one byte short of where it should.
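A toy illustration of that stale branch, with an encoding I made up: byte 0 holds a forward displacement measured from offset 0, and 0xC3 stands in for the instruction the branch is supposed to reach. No real instruction set is being modelled here.

```python
TARGET = 0xC3  # stand-in for the instruction the branch should land on

def make_image(text):
    """Byte 0 is the branch displacement; the string sits between the
    branch and its target."""
    displacement = 1 + len(text)
    return bytes([displacement]) + text + bytes([TARGET])

def landing_byte(image):
    """Follow the displacement stored in byte 0 and return what is there."""
    return image[image[0]]

original = make_image(b"Facebook\x00")

# Insert "My" in front of the string without touching the displacement.
edited = original[:1] + b"My" + original[1:]

print(hex(landing_byte(original)))  # the intended target
print(chr(landing_byte(edited)))    # a character from the string instead
```

After the edit the branch lands inside the string, so the CPU would try to execute text as if it were an instruction.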

On some machines, the branch couldn't be 201 bytes even if you fixed it up, because the target would be misaligned and cause a CPU exception; you would have to add, say, four bytes and patch the branch to 204 (along with myriad other things needed to make the file sane).
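The arithmetic behind that 204 can be sketched as follows. This assumes a 4-byte alignment requirement, an original displacement that was already aligned, and an insertion that falls between the branch and its target; the function names are hypothetical.

```python
def aligned_insert_size(needed, alignment=4):
    """Smallest multiple of `alignment` that is at least `needed` bytes."""
    return -(-needed // alignment) * alignment

def patched_displacement(old_displacement, needed, alignment=4):
    """New branch distance after inserting an aligned amount of padding
    between the branch and its target."""
    return old_displacement + aligned_insert_size(needed, alignment)

# One extra character needed, but four bytes actually inserted.
new_disp = patched_displacement(200, 1)
print(new_disp)
```

This only covers one branch; every other reference that crosses the insertion point would need the same treatment.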