I am trying to find references on different designs of metamorphic generators; can someone point me in the right direction? I have gone through some papers in the ACM digital library but couldn't find what I am looking for.
Metamorphic generator
If you refer to metamorphic engines, I unfortunately don't know about any good references. I think this stems from the subject still being taboo due to how it's usually used by virus writers. I think this is unjustified though, as the technique is interesting on its own merit. I've always been fascinated by self-modifying and self-repairing systems. And one could also say it is slightly related to the field of AI.
For the uninformed, a metamorphic engine is a program that rewrites every byte and instruction of itself, such that the new file content is unique compared to the previous generation while the overall algorithm stays the same. Anti-virus vendors had major trouble when the technique was first used, as identifying viruses by signature isn't effective when each generation is different. The introduction of polymorphic and metamorphic viruses marked the era where anti-virus software switched from identification by signatures to heuristics: instead of looking at the exact code or byte stream, you try to deduce what the code does.
One will run into several problems when implementing such a thing, depending on the executable format used and the CPU architecture:
Some RISC architectures can't encode full 32-bit immediates in a single instruction, so the code segment will inevitably hold literal pools for such constants, which are fetched with an extra load. That is a serious showstopper, because you need a way to separate code from data unambiguously, yet some data values are also legal instruction encodings, and vice versa, as the sketch below shows.
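To see why this is fundamentally ambiguous, here is a minimal sketch in Python over a made-up 32-bit toy ISA (not a real architecture): a naive linear-sweep disassembler walks the section word by word and happily "decodes" a literal-pool constant as if it were an instruction.

```python
import struct

# Hypothetical toy encodings: the top byte selects the opcode.
OPCODES = {0x01: "add", 0x02: "sub", 0x03: "load"}

def linear_sweep(section: bytes):
    """Naively decode every 32-bit word in the section as an instruction."""
    for off in range(0, len(section), 4):
        (word,) = struct.unpack_from("<I", section, off)
        op = OPCODES.get(word >> 24)
        if op:
            print(f"{off:04x}: {op} #{word & 0xFFFFFF:06x}")
        else:
            print(f"{off:04x}: .word 0x{word:08x}")

# The 'load' at offset 0 fetches the constant stored at offset 4 -- but that
# constant, 0x01000005, also decodes as a valid 'add' instruction, so a
# byte-level rewriter cannot tell code from data without tracking references.
section = struct.pack("<II", 0x03000004, 0x01000005)
linear_sweep(section)
```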
If your program links against dynamic libraries like, say, the C runtime, you also need to recalculate the relocation information, which is non-trivial; the sketch below shows what those entries look like on ELF64.
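For a feel of what is involved, here is a minimal sketch that decodes Elf64_Rela relocation entries with Python's struct module. It assumes you have already located the raw bytes of a .rela section in the file; the point is that every offset and addend has to be recomputed whenever the surrounding code moves.

```python
import struct

# Each Elf64_Rela entry is 24 bytes: r_offset, r_info, r_addend
# (ELF64, little-endian).
RELA = struct.Struct("<QQq")

def parse_rela(section_bytes: bytes):
    """Yield the relocation entries of a raw .rela section."""
    for off in range(0, len(section_bytes), RELA.size):
        r_offset, r_info, r_addend = RELA.unpack_from(section_bytes, off)
        yield {
            "offset": r_offset,           # where the fix-up is applied
            "symbol": r_info >> 32,       # index into the dynamic symbol table
            "type":   r_info & 0xFFFFFFFF,
            "addend": r_addend,           # must be recomputed if code moves
        }
```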
And the biggest problem is that such programs tend to grow exponentially in size with each new generation. If the initial "simplifier" algorithm (described below) does a poor job, more and more garbage code is added; "poor job" here means that it fails to reduce the code back to its original form flawlessly, so any extra bloat from the previous generation accumulates.
The general technique works as follows:
The application has to read itself, and parse the executable format (ELF, COFF, a.out, PE). Then for each group of N instructions, it tries to simplify the algorithm. For example, an addition of value X followed by a subtraction of value X is effectively a no-op and can be removed, and a*b + a*c can be simplified to a*(b+c), saving one instruction. This simplifier recovers the bare skeleton of the overall algorithm by undoing the metamorphism applied in the previous generation; a toy version is sketched below.
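A minimal sketch of such a simplifier pass, in Python over a made-up three-address instruction list (a real engine works on decoded machine instructions, but the idea is the same):

```python
# Toy peephole simplifier: collapse add/sub pairs that cancel out.
def simplify(code):
    out = []
    for ins in code:
        # add r, X immediately followed by sub r, X is a no-op: drop both.
        if (out and ins[0] == "sub" and out[-1][0] == "add"
                and ins[1:] == out[-1][1:]):
            out.pop()
            continue
        out.append(ins)
    return out

bloated = [("mov", "r0", 7), ("add", "r0", 3), ("sub", "r0", 3)]
print(simplify(bloated))   # [('mov', 'r0', 7)] -- the bare skeleton
```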
Following that, you obfuscate the code again by doing the reverse: take N instructions and replace them with something else that does the same thing. Other stages involve splitting data immediates into several parts, obfuscating strings, splitting code into several new functions, and moving the code around. All this is done while keeping track of code and data references. Then finally, the code is assembled and linked back into an executable file. A toy expansion pass in the same vein is sketched below.
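Here is a minimal sketch of that reverse pass, using the same made-up instruction format as the simplifier above: it randomly rewrites a mov into an equivalent mov/add pair so that every generation looks different.

```python
import random

# Toy expansion pass: replace instructions with equivalent sequences.
def expand(code):
    out = []
    for ins in code:
        if ins[0] == "mov" and random.random() < 0.5:
            # mov r, X  ==>  mov r, X-k ; add r, k   (same end state)
            k = random.randint(1, 9)
            out += [("mov", ins[1], ins[2] - k), ("add", ins[1], k)]
        else:
            out.append(ins)
    return out

print(expand([("mov", "r0", 7)]))
# e.g. [('mov', 'r0', 4), ('add', 'r0', 3)] -- one of many possible shapes
```

Note that simplify() from the previous sketch can't fold this mov/add pair back into a single mov; that is exactly the kind of leftover bloat that makes a poor simplifier grow the file with every generation.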
It's mind-bogglingly complex. For true hardcore assembly coders only. You have been warned.
Look for engines written by virus writers. Also search on Google for "Project Bukowski".