I'm writing an RPC library for AVR and need to pass a function address to some inline assembler code and call the function from within the assembler code. However the assembler complains when I try to call the function directly.
This minimal example test.cpp illustrates the issue (in the actual case I'm passing args and the function is an instantiation of a static member of templated class):
void bar () {
    return;
}
void foo() {
    asm volatile (
        "call %0" "\n"
        :
        : "p" (bar)
    );
}
Compiling with avr-gcc -S test.cpp -o test.S -mmcu=atmega328p
works fine but when I try to assemble with avr-gcc -c test.S -o test.o -mmcu=atmega328p
avr-as complains:
test.c: Assembler messages:
test.c:38: Error: garbage at end of line
I have no idea why it writes "test.c", the file it is referring to is test.S, which contains this on line 38:
call gs(_Z3barv)
I have tried every even remotely sensible constraint on the parameter to the inline assembler that I could find here, but none of those I tried worked.
I imagine if the gs() part was removed, everything should work, but all constraints seem to add it. I have no idea what it does.
The odd thing is that doing an indirect call like this assembles just fine:
void bar () {
    return;
}
void foo() {
    asm volatile (
        "ldi r30, lo8(%0)" "\n"
        "ldi r31, hi8(%0)" "\n"
        "icall" "\n"
        :
        : "p" (bar)
    );
}
The assembler produced looks like this:
ldi r30, lo8(gs(_Z3barv))
ldi r31, hi8(gs(_Z3barv))
icall
And avr-as doesn't complain about any garbage.
There are several issues with the code:
Issue 1: Wrong Constraint
The correct constraint for a call target is "i", i.e. an immediate value that is known at link time.
Issue 2: Wrong % print-modifier
In order to print an address suitable for a call, use %x, which prints a plain symbol without gs(). Generating a linker stub at this place by means of gs() is not valid syntax, hence "garbage at end of line". Apart from that, as you are calling bar directly, there is no need for a linker stub (at least not for this kind of symbol usage).
Issue 3: call instruction might not be available
To factor out whether a device supports call or just rcall, there is %~, which prints a single r if just rcall is available, and nothing if call is available.
Issue 4: The Call might clobber Registers or have other Side-Effects
It's unlikely that the call has no effects on registers or on memory whatsoever. If your description of the inline asm does not match the side effects of the code, it's likely that you will get wrong code sooner or later.
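As a rough sketch of how the four points above could be applied to the foo/bar example from the question (not a definitive implementation): the constraint becomes "i", the operand is printed with %x, the mnemonic is prefixed with %~, and the clobber list names the registers that avr-gcc treats as call-clobbered. The counter bar_calls, the extern "C" linkage and the non-AVR fallback branch are assumptions added here so that the snippet also builds and can be checked on a host compiler. They are not part of the original code.

```cpp
// Counter added for this sketch so that the call has an observable effect.
static volatile unsigned int bar_calls = 0;

// extern "C" is an assumption: it keeps the symbol name plain (bar, not _Z3barv).
extern "C" void bar() {
    bar_calls = bar_calls + 1;
}

void foo() {
#if defined(__AVR__)
    asm volatile (
        "%~call %x0" "\n\t"   // %~ prints "r" on devices without CALL; %x omits gs()
        :                      // no outputs
        : "i" (bar)            // call target: immediate, resolved at link time
        // bar follows the normal ABI here, so tell the compiler that the
        // call-clobbered registers and memory may change.
        : "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25",
          "r26", "r27", "r30", "r31", "memory"
    );
#else
    // Host fallback (assumption): a plain call, so the sketch can be tested off-target.
    bar();
#endif
}
```

If bar were a hand-written assembly routine with its own register usage, the clobber list would have to describe that routine instead of the standard ABI.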
Taking it all together
Let's assume you have a function bar written in assembly that takes two 16-bit operands in R22 and R26, and computes a result in R22. This function does not obey the avr-gcc C/C++ calling convention, so inline assembly is one way to interface to such a function. For bar we cannot write a correct prototype anyway, so we just provide a prototype so that we can use the symbol bar. Register X has the constraint "x", but R22 has no register constraint of its own, and therefore we have to use a local asm register:
Generated code for ATmega32 + optimization:
So what's that "generate stub" gs() thing?
Suppose the C/C++ code is taking the address of a function. The only sensible thing to do with it is to call that function, which will be an indirect call in general. Now an indirect call can target 64 KiW = 128 KiB at most, so that on devices with more than 128 KiB of code memory, special means must be taken to indirectly call a function beyond the 128 KiB boundary. The AVR hardware features an SFR named EIND for that purpose, but the problems with using it are obvious: you'd have to set it prior to a call and then reset it somehow, somewhere, and all kinds of evil things would be necessary.
avr-gcc takes a different approach: for each such address taken, the compiler generates gs(func). This will just resolve to func if the address is in the 128 KiB range. If not, gs() resolves to an address in section .trampolines, which is located close to the beginning of flash, i.e. in the lower 128 KiB. .trampolines contains a list of direct JMPs to targets beyond the 128 KiB range.
Take for example the following C code:
The __asm is used to keep the compiler from optimizing the indirect call to a direct one. Then run
For the sake of brevity, we just define the symbol far_func on the command line. The assembly dump in main.s shows that far_func might require a linker stub:
The final executable listing in main.lst then shows that the stub is actually generated and used:
main loads Z=0x0072, which is a word address for byte address 0x00e4, i.e. the code jumps indirectly to 0x00e4, and from there it jumps directly to 0x24680.
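The address arithmetic in this explanation can be checked with a small host-side sketch. It only models the numbers quoted above (a 16-bit word address in Z, 2 bytes per flash word). The helper names word_to_byte, indirect_reach_bytes and needs_stub are made up for illustration and are not part of avr-gcc or the linker.

```cpp
#include <cstdint>

// Flash is addressed in 16-bit words by indirect calls and jumps.
constexpr std::uint32_t kBytesPerWord = 2;
// Z (R31:R30) holds a 16-bit word address.
constexpr std::uint32_t kZRegisterBits = 16;

// Convert a word address, as loaded into Z, to the byte address shown in listings.
constexpr std::uint32_t word_to_byte(std::uint32_t word_addr) {
    return word_addr * kBytesPerWord;
}

// Number of bytes reachable through Z alone: 64 KiW = 128 KiB.
constexpr std::uint32_t indirect_reach_bytes() {
    return (std::uint32_t{1} << kZRegisterBits) * kBytesPerWord;
}

// Simplified model: a target at or beyond the reach needs a stub in the lower 128 KiB.
constexpr bool needs_stub(std::uint32_t byte_addr) {
    return byte_addr >= indirect_reach_bytes();
}

static_assert(indirect_reach_bytes() == 128u * 1024u, "64 KiW equals 128 KiB");
static_assert(word_to_byte(0x0072) == 0x00e4, "Z=0x0072 is byte address 0x00e4");
static_assert(needs_stub(0x24680), "far_func lies beyond the 128 KiB boundary");
static_assert(!needs_stub(0x00e4), "the stub itself lies in the lower 128 KiB");
```

This matches the listing described above: the stub at byte address 0x00e4 is reachable through Z, while far_func at 0x24680 is not.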