[TL;DR: the following JVM bytecode instructions seems not to work:
iconst_0
istore 6
...sequential
iinc 6 1
jsr L42
...
; L42
iload 6
ifeq L53 ; Always branches!!!
astore 8
iinc 6 -1
; L53
LDC 100
ISUB ; ERROR, returnAddress is at the top of the stack
A test .class can be found here (with slightly more complex logic). If you want to know more about why I'm seeing these instructions, please keep reading.]
I'm writing a Whitespace compiler targeting JVM bytecode. Although being an esoteric language, Whitespace describes an interesting set of assembly instructions to a stack machine, which maps nicely to the JVM.
Whitespace has labels, which are both targets for jump (goto/jump-if-zero/jump-if-negative) and function calls. The relevant instructions (with names given by me, in the spec they are given as combinations of space, tabs and newlines) are:
mark <label>
: sets a label for the following instructionjump[-if-neg|-if-zero] <label>
: jumps unconditionally or conditionally to the given labelcall <label>
: call a function pointed by labelend <label>
: ends a function, returning to the caller.
My compiler outputs the whole Whitespace program in the main method of a class. The simplest way to implement call
and end
is using the JSR
and RET
opcodes, which are made to implement subroutines. After a JSR
operation the stack will contain a returnAddress
reference that should be stored in a variable for later use in end
.
However, as mark
can be either call
-ed or jump
-ed into, the stack may or may not contain the returnAddress
reference. I decided to use a boolean variable (call-bit, at address 6) to store how the mark was reached, and then test if it should store the top of the stack into a local variable (return-address, at address 8). The implementation for each instruction is as follows:
; ... initialization
iconst_0
istore 6 ; local variable #6 holds the call bit
# call
iinc 6 1 ; sets the call bit
jsr Lxxx ; jumps to the given label, pushing a returnAddress to the stack
# mark
; Lxxx
iload 6 ; loads the call bit
ifeq Lxxx-end ; SHOULD jump to mark's end if the call bit is not set
; call bit is set: mark was call-ed and returnAddress is in the stack
astore 8 ; stores returnAddress to local variable #8
iinc 6 -1 ; resets the call bit
; Lxxx-end
# end
ret 8 ; returns using the stored returnAddress
The problem: ifeq
ALWAYS branches. I also tried reversing the logic (call-bit -> jump-bit, ifeq->ifne), and even simply switching to ifne
(which would be wrong)... but the if always branches to the end. After a call, the returnAddress
stays in the stack and the next operation blows up.
I've used ASM's analyzer to watch the stack to debug all this, but have just asserted this behavior and can't find what I'm doing wrong. My one suspicion is that there's more to iinc
, or to ifeq
than my vain philosophy can imagine. I'll admit that I've only read the instruction set page and ASM's pertinent documentation for this project, but I'd hope that someone can bring a solution from the top of their mind.
In this folder there are the relevant files, including the executable class and the original Whitespace, as well as the output of javap -c
and ASM analysis.
Found a possible reason: the problem is not during execution, but with the verifier. When it seemed that it "always branched", was in fact the verifier testing all possible outcomes of an
if
so it could be sure the stack would look like the same. My code relies on a reference (returnAddress
) maybe or maybe not being present on the stack and the verifier can't check that.That said, the sample code does not run with the
-noverify
flag, but other, simpler examples that failed verification did execute correctly.