Why does inserting characters into an executable binary file cause it to "break"?
And, is there any way to add characters without breaking the compiled program?
Background
I've known for a long time that it is possible to use a hex editor to change code in a compiled executable file and still have it run as normal...
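An edit like this survives because overwriting bytes in place changes no lengths and no offsets. Nothing that points past the edited bytes has to move. Below is a minimal Python sketch of that idea. A made-up byte string stands in for the executable, and the helper name patch_same_length is hypothetical.

```python
def patch_same_length(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Overwrite the first occurrence of `old` with `new`, keeping every offset intact."""
    if len(new) != len(old):
        raise ValueError("replacement must be exactly as long as the original")
    pos = blob.find(old)
    if pos < 0:
        raise ValueError("original bytes not found")
    # Slice and rejoin; total length and the position of every later byte are unchanged.
    return blob[:pos] + new + blob[pos + len(old):]


# Hypothetical stand-in for a compiled file: placeholder bytes, a C string, more bytes.
blob = b"\x55\x48\x89\xe5" + b"Facebook\x00" + b"\xc3"
patched = patch_same_length(blob, b"Facebook", b"Lacebook")
print(len(blob) == len(patched))                      # True
print(patched.index(b"\xc3") == blob.index(b"\xc3"))  # True: later bytes did not move
```

The length check is the whole point. A replacement of any other length would shift everything after it.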
Example
As an example, in the application below, "Facebook" could be changed to "Lacebook", and the program will still execute just fine:
But it breaks with new characters
I'm also aware that if new characters are added, it will break the program: either it won't run, or it will crash immediately. For example, adding "My" in front of "Facebook" would achieve this:
What I know
- I've done some work with C and understand that code is written in a human-readable form, compiled, and linked into an executable file.
- I've done introductory studies of assembly language and understand the concepts of data, commands, and pointers being moved around.
- I've written small programs for Windows, Mac and Linux
What I don't know
- I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type the name of the program and press Return, you are instructing the operating system to "execute" that file. That basically means loading the file into memory, setting the processor's pointer to it, and telling it "Go!"
- I don't understand why having extra characters in a text string of the binary file would cause problems.
What I'd like to know
- Why do the extra characters cause the program to break?
- What determines that the program is broken? The OS? Nowadays, does the OS also keep the program sandboxed so that it doesn't crash the whole system?
- Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
Answer
Regarding how programs are loaded: modern operating systems just map the file into memory. They don't bother loading a page of it until that page is needed.
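Here is a rough user-space illustration of mapping, using Python's standard mmap module. The file is mapped rather than read up front, and the OS typically brings pages in from disk as they are touched. This is not the OS loader itself, only the same kind of facility. The temporary file and the helper name first_bytes_via_mmap are assumptions made for the sketch.

```python
import mmap
import os
import tempfile


def first_bytes_via_mmap(path: str, n: int) -> bytes:
    """Map a file read-only and read its first n bytes through the mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; data is paged in on access, not copied up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:n]


# Hypothetical file standing in for an executable (ELF files begin with b"\x7fELF").
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x7fELF" + b"\x00" * 4096)
    path = tmp.name
try:
    print(first_bytes_via_mmap(path, 4))  # b'\x7fELF'
finally:
    os.remove(path)
```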
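The extra characters break the program because inserting them shifts everything after the insertion point to a new offset, while the file's headers and the code still refer to the old offsets. All the other information in the file ends up in the wrong place, so the loader winds up loading the wrong things. Jumps in the code also wind up going to the wrong place, perhaps into the middle of an instruction.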
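The sketch below shows why a shifted jump goes wrong, using one concrete encoding: the x86 "jmp rel32" instruction (opcode 0xE9). Its 4-byte signed displacement is counted from the end of the 5-byte instruction. The byte sequence and the helper name jmp_rel32_target are made up for illustration. Real code contains many other jump and call forms, plus absolute addresses.

```python
import struct


def jmp_rel32_target(code: bytes, at: int) -> int:
    """Decode an x86 'jmp rel32' (opcode 0xE9) at offset `at` and return its target offset."""
    if code[at] != 0xE9:
        raise ValueError("not a jmp rel32")
    (disp,) = struct.unpack_from("<i", code, at + 1)  # signed 32-bit, little-endian
    return at + 5 + disp  # relative to the end of the 5-byte instruction


# Placeholder code: a jump over 3 filler bytes (0x90) to a 'ret' (0xC3).
code = b"\xe9" + struct.pack("<i", 3) + b"\x90\x90\x90" + b"\xc3"
print(code[jmp_rel32_target(code, 0)] == 0xC3)  # True

# Insert two bytes between the jump and its target without fixing the displacement.
broken = code[:5] + b"My" + code[5:]
print(hex(broken[jmp_rel32_target(broken, 0)]))  # 0x90: lands on filler, not on the ret
```

The displacement still says "3 bytes ahead", but the target is now 5 bytes ahead, so execution continues from the wrong byte.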
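As for what determines that the program is broken, it depends on exactly what gets corrupted. If you move a header, the loader may notice that some parameters in the header contain invalid data and refuse to run the program. If the damage is elsewhere, the program may start and then crash when it reaches the misplaced data or code.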
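The sketch below shows the kind of header sanity check a loader can perform, using the 64-bit ELF header layout used on Linux and other Unix-like systems. The three checks are illustrative only. Real loaders check more fields, and Windows PE files have a different header entirely. The fake file and the helper names are hypothetical.

```python
import struct

ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64-byte ELF64 file header, little-endian


def check_elf64_header(blob: bytes):
    """Return None if a few basic ELF64 checks pass, else a description of the first failure."""
    if len(blob) < ELF64_HEADER.size:
        return "file too short for an ELF header"
    (ident, _type, _machine, _version, _entry, phoff, _shoff,
     _flags, _ehsize, phentsize, phnum, *_rest) = ELF64_HEADER.unpack_from(blob, 0)
    if ident[:4] != b"\x7fELF":
        return "bad magic number"
    if ident[4] != 2 or ident[5] != 1:  # ELFCLASS64, little-endian data
        return "not 64-bit little-endian"
    if phoff + phnum * phentsize > len(blob):
        return "program header table points past end of file"
    return None


def build_fake_elf() -> bytes:
    """Hypothetical minimal file: an ELF64 header plus one zeroed 56-byte program header."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    header = ELF64_HEADER.pack(ident, 2, 0x3E, 1, 0x401000, 64, 0, 0, 64, 56, 1, 0, 0, 0)
    return header + bytes(56)


good = build_fake_elf()
bad = good[:20] + b"My" + good[20:]  # insert two bytes inside the header
print(check_elf64_header(good))  # None
print(check_elf64_header(bad))   # program header table points past end of file
```

After the insertion, every later header field is read from bytes that used to belong to its neighbour. The "offset" of the program header table becomes a huge number that the check rejects.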
As for adding characters without breaking the application: probably not reliably. At a minimum, you'd need to reliably identify every section of code that needs to be adjusted. That can be surprisingly difficult, particularly if someone has deliberately tried to make it so.
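The sketch below shows what "adjusting" would mean in a made-up file format where every stored offset is listed up front. Inserting bytes is easy once you know where all the pointers live. A real executable offers no such list: offsets sit in headers, relocation tables, and instruction operands, which is exactly the hard part. The format and the helper name insert_with_fixups are hypothetical.

```python
import struct


def insert_with_fixups(blob: bytes, at: int, extra: bytes, pointer_slots: list[int]) -> bytes:
    """Insert `extra` at offset `at`, then adjust every known 4-byte little-endian offset.

    `pointer_slots` lists where offsets are stored; in this sketch all slots must sit
    before `at`. Offsets > at are shifted; an offset == at keeps pointing at the
    insertion point, i.e. at the start of the newly inserted bytes.
    """
    if any(slot + 4 > at for slot in pointer_slots):
        raise ValueError("this sketch only handles offsets stored before the insertion point")
    out = bytearray(blob[:at] + extra + blob[at:])
    for slot in pointer_slots:
        (target,) = struct.unpack_from("<I", out, slot)
        if target > at:
            struct.pack_into("<I", out, slot, target + len(extra))
    return bytes(out)


# Hypothetical format: an 8-byte header holding the offsets of a string and of some code.
data = b"Facebook\x00"
code = b"\x90\x90\xc3"  # placeholder bytes
header_size = 8
blob = struct.pack("<II", header_size, header_size + len(data)) + data + code

fixed = insert_with_fixups(blob, header_size, b"My", [0, 4])
str_off, code_off = struct.unpack_from("<II", fixed, 0)
print(fixed[str_off:fixed.index(b"\x00", str_off)])  # b'MyFacebook'
print(fixed[code_off:code_off + 3])                  # b'\x90\x90\xc3'
```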