Encoding multiple instructions in the same machine code

I am curious whether this has been done before, not necessarily whether it has practical value (although the space savings would be obvious). Has anyone ever encoded multiple instructions in the same machine code, so that decoding the same bytes from a different offset yields a different but valid instruction stream? For example:

(This is completely made up)

0xAEA2 -> add R3 0xA2

0xEAE6 -> mov R1 0xE6

0xAAAA ...

Reinterpreting the same machine code one nibble later (that is, shifting the stream one nibble to the left) turns it into:

0xEA2E -> mov R1 0x2E

0xAE6A -> add R3 0x6A
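
A minimal Python sketch of the question's made-up instruction set follows. It assumes, as the example suggests, 16-bit instructions whose high byte selects the operation (0xAE for add R3, 0xEA for mov R1) and whose low byte is an 8-bit immediate. The `OPCODES` table and `decode` helper are hypothetical. The sketch decodes the same hex-digit stream twice, once from the start and once from a one-nibble offset.

```python
# Hypothetical ISA from the question: 16-bit words, high byte = operation,
# low byte = 8-bit immediate.
OPCODES = {0xAE: "add R3", 0xEA: "mov R1"}


def decode(stream: str, nibble_offset: int = 0) -> list[str]:
    """Decode a string of hex digits, starting `nibble_offset` digits in."""
    digits = stream[nibble_offset:]
    out = []
    for i in range(0, len(digits) - 3, 4):  # only whole 4-digit words
        word = int(digits[i:i + 4], 16)
        op, imm = word >> 8, word & 0xFF
        if op not in OPCODES:
            break  # undefined opcode (e.g. 0xAA): stop decoding
        out.append(f"0x{word:04X} -> {OPCODES[op]} 0x{imm:02X}")
    return out


stream = "AEA2EAE6AAAA"   # the question's three words, as hex digits
print(decode(stream))     # ['0xAEA2 -> add R3 0xA2', '0xEAE6 -> mov R1 0xE6']
print(decode(stream, 1))  # ['0xEA2E -> mov R1 0x2E', '0xAE6A -> add R3 0x6A']
```

Decoding stops at the first word whose high byte is not in the table, such as 0xAAAA.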

There are 3 answers

Leandro Caniglia

In the implementation of Smalltalk-78 (one of the predecessors of all modern Smalltalk dialects), there was a two-byte "store argument 0" bytecode whose second byte was the "load argument 0" bytecode. Code that jumped to the beginning of the instruction therefore executed a store, while a jump into its middle (!) executed a load. You can learn more about this in "Reviving Smalltalk-78" by Bert Freudenberg (IWST 2014), around minute 30.
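
The same idea can be sketched as a toy stack-machine interpreter in Python. The opcode values `STORE_ARG0` and `LOAD_ARG0` are hypothetical and are not the real Smalltalk-78 encoding. The only property borrowed from the answer is that the two-byte store's second byte is itself the one-byte load, so the entry offset decides which instruction runs.

```python
# Hypothetical opcode values, NOT the actual Smalltalk-78 bytecode numbers.
LOAD_ARG0 = 0x10   # one byte: push argument 0
STORE_ARG0 = 0x68  # two bytes: pop into argument 0 (2nd byte unused by the store)

# The store's second byte is chosen to equal the load opcode.
CODE = bytes([STORE_ARG0, LOAD_ARG0])


def run(code: bytes, pc: int, args: list, stack: list) -> list[str]:
    """Interpret `code` from offset `pc`, returning a trace of executed ops."""
    trace = []
    while pc < len(code):
        op = code[pc]
        if op == STORE_ARG0:
            args[0] = stack.pop()
            trace.append("store arg0")
            pc += 2  # skip the second byte; it is never executed on this path
        elif op == LOAD_ARG0:
            stack.append(args[0])
            trace.append("load arg0")
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc}")
    return trace


args, stack = [None], [42]
print(run(CODE, 0, args, stack), args)  # ['store arg0'] [42]
print(run(CODE, 1, [7], []))            # ['load arg0']
```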

Jim Mischel

This is not new. Years ago we used to hold small code-size optimization contests. For example, the code below, which I wrote in 1993, implements the C functions strcpy, strncpy, and stpcpy in just 42 bytes. It is 16-bit 8086 assembly language, C-callable, with parameters passed on the stack and the caller cleaning up the stack.

Note how the entry point for strcpy is just the first byte of an instruction (0B8h, mov ax,immediate). That instruction loads the next two bytes, which are the mov al,0Ch executed by stpcpy, into the AX register. Likewise, the db 3Ch is the first byte of a cmp al,immediate instruction. It consumes the next byte (0Ch, the opcode of or al), so execution continues with the STC instruction. STC is the second byte (0F9h) of the or al,0F9h instruction executed by strncpy.

It's instructive to create a listing file to get the opcodes, and then trace what happens at each of the three entry points.
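
Following that suggestion, here is a minimal Python sketch that decodes and executes the first nine bytes of the routine from each entry point. The byte values are the standard 8086 encodings of the listing's first instructions. The sketch models only the six opcodes that occur there (0B8h, 0B0h, 0B9h, 3Ch, 0Ch, 0F9h) and tracks only AL, CF, and ZF. The names `PROLOGUE`, `ENTRIES`, and `trace_prologue` are hypothetical. The starting value of AL does not affect the outcome.

```python
# First 9 bytes of the routine, as the assembler would encode them:
#   B8        db 0B8h        <- strcpy entry (offset 0)
#   B0 0C     mov al,0Ch     <- stpcpy entry (offset 1)
#   B9 FF FF  mov cx,0FFFFh
#   3C        db 3Ch
#   0C F9     or al,0F9h     <- strncpy entry (offset 7)
PROLOGUE = bytes([0xB8, 0xB0, 0x0C, 0xB9, 0xFF, 0xFF, 0x3C, 0x0C, 0xF9])
ENTRIES = {"strcpy": 0, "stpcpy": 1, "strncpy": 7}


def trace_prologue(entry: str, al: int = 0) -> tuple[list[str], bool, bool]:
    """Decode and execute from one entry point; return (listing, CF, ZF)."""
    pc, cf, zf, listing = ENTRIES[entry], False, False, []
    while pc < len(PROLOGUE):
        op = PROLOGUE[pc]
        if op == 0xB8:    # mov ax,imm16 (little-endian); AL gets the low byte
            imm = PROLOGUE[pc + 1] | PROLOGUE[pc + 2] << 8
            al, size, text = imm & 0xFF, 3, f"mov ax,0x{imm:04X}"
        elif op == 0xB0:  # mov al,imm8 (MOV does not touch flags)
            imm = PROLOGUE[pc + 1]
            al, size, text = imm, 2, f"mov al,0x{imm:02X}"
        elif op == 0xB9:  # mov cx,imm16 (CX is not modelled here)
            imm = PROLOGUE[pc + 1] | PROLOGUE[pc + 2] << 8
            size, text = 3, f"mov cx,0x{imm:04X}"
        elif op == 0x3C:  # cmp al,imm8: CF = unsigned borrow, ZF = equal
            imm = PROLOGUE[pc + 1]
            cf, zf, size, text = al < imm, al == imm, 2, f"cmp al,0x{imm:02X}"
        elif op == 0x0C:  # or al,imm8: clears CF, ZF = (result == 0)
            imm = PROLOGUE[pc + 1]
            al = al | imm
            cf, zf, size, text = False, al == 0, 2, f"or al,0x{imm:02X}"
        elif op == 0xF9:  # stc
            cf, size, text = True, 1, "stc"
        else:
            raise ValueError(f"opcode 0x{op:02X} is outside this tiny subset")
        listing.append(text)
        pc += size
    return listing, cf, zf


for name in ENTRIES:
    print(name, trace_prologue(name))
```

The three traces agree with the CF/ZF comments in the listing. strcpy ends with CF set and ZF clear, stpcpy with both set, and strncpy with both clear.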

These kinds of tricks came in handy when we were patching existing code in place. Sometimes we could make a binary patch to a .COM file without changing the addresses of any critical parts. That was important when we had things that must be 16-byte (or larger) aligned, and we didn't want to take the hit of wasting 15 bytes of dead space just so we could add another instruction. Oh, the games you'd play when you only had 64 K bytes to work with.

   Ideal
   Model Small,c
   CodeSeg

   Public strcpy,strncpy,stpcpy
 ;
 ; 42 bytes

 ;
 ; char * strcpy (char *dest, char *src);
 ;
 strcpy:
   db 0B8h       ;mov ax,immed
 ;
 ; char * stpcpy (char *dest, char *src);
 ;
 stpcpy:
   mov al,0Ch    ;0Ch is the opcode for OR AL,immediate
   mov cx,0ffffh ;make max count
   db 3Ch        ;cmp al,immediate
                 ;stpcpy  - CF set, ZF set
                 ;strcpy  - CF set, ZF clear
 ;
 ; char * strncpy (char *dest, char *src, unsigned len);
 ;
 strncpy:
   or al,0F9h    ;strncpy - CF clear, ZF clear
                 ;0F9h is the opcode for STC,
                 ;which is executed by strcpy and stpcpy
   pop dx        ;return address in DX
   pop bx        ;dest string in BX
   pop ax        ;source string in AX
   jc l0         ;if strncpy
   pop cx        ;then get length in CX
   push cx       ;and fixup stack
 l0:
   push ax       ;more stack fixup
   push bx       ;save return value
   push si       ;gotta save SI
   xchg si,ax    ;SI points to source string
   lahf          ;save flags for exit processing
 l1:
   lodsb         ;get character
 l2:
   jcxz Done     ;done if count = 0
   mov [bx],al   ;store character
   inc bx        ;bump dest
   or al,al      ;if character is 0 or
   loopnz l1     ;if at end of count, then done
   sahf          ;restore flags
   ja l2         ;for strncpy(), must loop until count reached
 Done:
   pop si        ;restore SI
   pop ax        ;return value in AX
   jnz AllDone   ;done if not stpcpy
   xchg ax,bx    ;otherwise return pointer to
   dec ax        ;end of string
 AllDone:
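   call dx       ;return to caller; CALL (rather than JMP) pushes a
                 ;word in place of the dest slot popped into AX above,
                 ;so the caller's stack cleanup stays balanced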

 End
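
The rest of the routine can be traced with a hypothetical Python model, which shows what happens once the entry flags are set. It mirrors the listing's control flow (lahf, jcxz, loopnz, sahf, ja, jnz) on a byte buffer. Several choices here are assumptions: the entry flags are taken from the listing's comments, buffer indices stand in for pointers, and untouched destination bytes are pre-filled with 0xFF so padding is visible. The names `ENTRY_FLAGS` and `model_copy` are made up for this sketch, which is meant for tracing and is not a C library implementation.

```python
# CF/ZF at entry, as documented in the listing's comments.
ENTRY_FLAGS = {"strcpy": (True, False), "stpcpy": (True, True), "strncpy": (False, False)}


def model_copy(entry: str, src: bytes, dest_size: int, length: int = 0) -> tuple[bytes, int]:
    """Return (dest buffer, index returned in AX); index 0 stands for dest."""
    cf, zf = ENTRY_FLAGS[entry]
    saved = (cf, zf)                          # lahf
    cx = length if entry == "strncpy" else 0xFFFF
    dest = bytearray(b"\xff" * dest_size)     # 0xFF marks untouched bytes
    bx, si = 0, 0
    al = src[si]                              # l1: lodsb
    si += 1
    while True:
        if cx == 0:                           # l2: jcxz Done
            break
        dest[bx] = al                         # mov [bx],al
        bx += 1                               # inc bx
        cf, zf = False, al == 0               # or al,al
        cx -= 1                               # loopnz l1 ...
        if cx != 0 and not zf:
            al = src[si]                      # ... back at l1: lodsb
            si += 1
            continue
        cf, zf = saved                        # sahf
        if not cf and not zf:                 # ja l2 (taken only for strncpy)
            continue
        break
    # Done: jnz AllDone returns dest; stpcpy (ZF set) returns the end pointer
    return bytes(dest), (bx - 1 if zf else 0)


print(model_copy("stpcpy", b"hi\0", 5))      # (b'hi\x00\xff\xff', 2)
print(model_copy("strncpy", b"hi\0", 5, 4))  # (b'hi\x00\x00\xff', 0)
```

In the model, strncpy keeps storing the zero left in AL until the count runs out, and it does not terminate a string that is truncated by the count. stpcpy returns the index of the terminator.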

I remember spending hours coming up with that one just to stump a friend of mine who was always beating me in these contests. He spent a few minutes looking it over and shaved another byte from it.

xmojmr

The beginners' guide to Redcode, Version 1.22, Copyright 1997-2004 Ilmari Karonen:

...

Core War (or Core Wars) is a programming game where assembly programs try to destroy each other in the memory of a simulated computer. The programs (or warriors) are written in a special language called Redcode, and run by a program called MARS (Memory Array Redcode Simulator)...

The fun of this game comes from reinterpreting machine code after modifying it, through operations similar to your "shifting of nibbles".
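
A classic illustration is the Imp, the one-instruction warrior `MOV 0, 1`. It copies itself into the next cell and then executes that copy. The Python sketch below is a heavily simplified, hypothetical MARS: it supports only `DAT` and `MOV` with relative operands on a tiny core, and it does not follow the full ICWS'94 semantics. The names `step` and `CORE_SIZE` are made up for this sketch.

```python
from typing import Optional

CORE_SIZE = 8  # tiny core for illustration; real MARS cores are much larger


def step(core: list[tuple[str, int, int]], pc: int) -> Optional[int]:
    """Execute one instruction; return the next PC, or None if the process dies."""
    op, a, b = core[pc]
    if op == "DAT":
        return None                      # executing DAT kills the process
    if op == "MOV":                      # copy the whole cell at pc+a to pc+b
        core[(pc + b) % CORE_SIZE] = core[(pc + a) % CORE_SIZE]
        return (pc + 1) % CORE_SIZE
    raise ValueError(f"{op} is outside this toy subset")


core = [("DAT", 0, 0)] * CORE_SIZE
core[0] = ("MOV", 0, 1)                  # the Imp
pc = 0
for _ in range(3):
    pc = step(core, pc)
print(pc, core[:5])
```

Each cell ahead of the Imp is overwritten by a copy of the MOV just before control reaches it, so whatever was originally stored there is never executed.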

See http://www.corewars.org/information.html for more...