How do labels execute in Assembly?


I am pretty new to assembly, so please go easy on me, but I am confused about labels. I just don't get how they work. Are they like functions, or does one end and the other begin? What I mean is, say there are 2 labels:

Label1: 
    ; random code
Label2:
    ; some more random code

Would Label1 execute and then end, or does execution then move on to Label2 and execute that as well? Sorry if this is an extremely basic question, but I just don't really understand it.


There are 2 answers

old_timer On

Labels are addresses (and note that function names in high level languages are labels basically).

loop:
  nop
  nop
  jump loop

Labels solve two problems. The first is the address itself: is loop at 0x12345, at 0x8000, somewhere else? The second depends on how my pseudo-code instruction set encodes the jump. If the jump takes a fixed (absolute) address, the programmer would need to know 0x12345 or 0x8000 to fully encode the instruction. If it is relative, then without labels the programmer would have to count instruction bytes. Say the nops are one byte each, the jump is three bytes, and the offset is relative to the end of the jump instruction: you would have to write jump -5 in your code, and every time you added or removed instructions in the loop you would have to re-calculate the jump offsets with no mistakes.
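To make the byte counting concrete, here is a minimal Python sketch of what an assembler does with a label. It assumes the made-up instruction set from the paragraph above (nop is one byte, jump is three bytes, offsets are measured from the end of the jump); the names SIZES and assemble are hypothetical, not part of any real tool.

```python
# Hypothetical pseudo-ISA from the text: nop = 1 byte, jump = 3 bytes,
# jump offset measured from the end of the jump instruction.
SIZES = {"nop": 1, "jump": 3}

def assemble(lines):
    """Two-pass sketch: collect label addresses, then resolve jump offsets."""
    labels, addr = {}, 0
    for line in lines:                      # pass 1: labels take up no bytes
        if line.endswith(":"):
            labels[line[:-1]] = addr
        else:
            addr += SIZES[line.split()[0]]
    out, addr = [], 0
    for line in lines:                      # pass 2: emit resolved instructions
        if line.endswith(":"):
            continue
        op, *args = line.split()
        addr += SIZES[op]                   # addr is now the end of this instruction
        if op == "jump":
            out.append((op, labels[args[0]] - addr))
        else:
            out.append((op,))
    return labels, out

labels, code = assemble(["loop:", "nop", "nop", "jump loop"])
print(labels, code)   # {'loop': 0} [('nop',), ('nop',), ('jump', -5)]
```

Add a third nop to the loop and the same call yields -6 without anyone recounting bytes by hand.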

Another case, that is again an address thing, is for external references:

fun:
  call more_fun
  ...

These days you can have more than one source file, each assembled into an object, and the objects then get linked. (When I started it was more common to have a single asm file with .org etc. in it and no external references, only forward vs backward references; some tools still support this.) With multiple files you can have external references. Whether the instruction set uses absolute or relative addressing, that address needs to be resolved to complete the machine code for this call. So using a label makes this much easier on the programmer, and as with local references the tools can compute all of these offsets or addresses for you.

Edit

.thumb

    nop
    nop
    nop
loop:
    nop
    nop
    nop
    b loop
    
Disassembly of section .text:

00000000 <loop-0x6>:
   0:   46c0        nop         ; (mov r8, r8)
   2:   46c0        nop         ; (mov r8, r8)
   4:   46c0        nop         ; (mov r8, r8)

00000006 <loop>:
   6:   46c0        nop         ; (mov r8, r8)
   8:   46c0        nop         ; (mov r8, r8)
   a:   46c0        nop         ; (mov r8, r8)
   c:   e7fb        b.n 6 <loop>

So the second column of numbers (both columns are in hex) is the machine code. You can see that there is no machine code for the label. It is just a label: a marking on a box that describes what is in the box, not part of the contents of the box.

And I will just tell you that in the encoding of the branch to loop the lower bits are an offset, and there are a lot of ones in that value because it is a negative number (a pc-relative branch backward).
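If you want to check that by hand, here is a rough Python sketch that decodes the 16-bit Thumb unconditional branch from the listing. It assumes the usual layout for this encoding (top five bits 11100, then an 11-bit signed offset counted in halfwords, applied to the instruction address plus 4); decode_thumb_b is a name made up for this example.

```python
def decode_thumb_b(halfword, address):
    """Decode a 16-bit Thumb unconditional branch (pattern 11100 imm11)."""
    if halfword >> 11 != 0b11100:
        raise ValueError("not an unconditional Thumb branch")
    imm11 = halfword & 0x7FF
    if imm11 & 0x400:           # sign bit set: the offset is negative
        imm11 -= 0x800
    offset = imm11 * 2          # the offset is stored in halfwords
    pc = address + 4            # Thumb reads pc as instruction address + 4
    return offset, pc + offset

offset, target = decode_thumb_b(0xE7FB, 0xC)
print(offset, hex(target))      # -10 0x6
```

The same function applied to e7fc at address 8 gives a target of 4, matching the second listing.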

The above addresses are unlinked.

One file:

.thumb
    nop
    nop
loop:
    bl more_fun
    b loop

Another:

.thumb
.thumb_func
.globl more_fun
more_fun:
    bx lr

Unlinked, we see that the disassembly of the call (bl, branch link in this instruction set) has basically a placeholder for an offset.

Disassembly of section .text:

00000000 <loop-0x4>:
   0:   46c0        nop         ; (mov r8, r8)
   2:   46c0        nop         ; (mov r8, r8)

00000004 <loop>:
   4:   f7ff fffe   bl  0 <more_fun>
   8:   e7fc        b.n 4 <loop>

once linked

Disassembly of section .text:

00001000 <loop-0x4>:
    1000:   46c0        nop         ; (mov r8, r8)
    1002:   46c0        nop         ; (mov r8, r8)

00001004 <loop>:
    1004:   f000 f801   bl  100a <more_fun>
    1008:   e7fc        b.n 1004 <loop>

0000100a <more_fun>:
    100a:   4770        bx  lr

the bl instruction has changed to have the pc relative offset.
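A sketch of the same check for bl, in Python. This assumes the classic two-halfword Thumb bl layout (11110 plus the high 11 offset bits, then 11111 plus the low 11 offset bits, giving a signed 23-bit byte offset from the instruction address plus 4). Newer cores reinterpret two bits of the second halfword, so treat this as valid only for the encodings shown here; decode_thumb_bl is a hypothetical helper name.

```python
def decode_thumb_bl(hi, lo, address):
    """Decode a classic Thumb bl pair into (offset, target address)."""
    if hi >> 11 != 0b11110 or lo >> 11 != 0b11111:
        raise ValueError("not a classic Thumb bl pair")
    offset = ((hi & 0x7FF) << 12) | ((lo & 0x7FF) << 1)
    if offset & (1 << 22):      # sign-extend the 23-bit offset
        offset -= 1 << 23
    return offset, address + 4 + offset

# Linked: f000 f801 at 0x1004
offset, target = decode_thumb_bl(0xF000, 0xF801, 0x1004)
print(offset, hex(target))      # 2 0x100a

# Unlinked: f7ff fffe at 0x4, the offset field is only a placeholder
print(decode_thumb_bl(0xF7FF, 0xFFFE, 0x4)[0])   # -4
```

Under this reading the linked pair lands on 0x100a, the address of more_fun, while the unlinked pair just carries a filler offset of -4 that the linker overwrites.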

All of this magic is done by the tools and we only have to keep track of labels. This is the same for the instructions themselves: we can use human readable/writeable mnemonics:

bx lr

instead of machine code:

0x4770

In our programs.

Peter Cordes On

Labels don't execute, they're metadata that lets you reference that location from somewhere else. e.g. as a branch target, or to load data from there. They're not part of the machine code, and CPUs don't know about them. Think of them as zero-width markers that let you refer to this byte-position from elsewhere.

They do not have any implicit association with the following bytes, or the interval until the next label. You can even have multiple labels in the same place if you want. (Compilers may do that when auto-generating labels, like in a simple function whose body reduces to just a loop, they'll

Execution will simply fall through a label, exactly like a C goto label inside a C function. Or a case 'x': label inside a switch - remember you need a break to not fall through to the next case.
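To see the fall-through with the question's Label1 / Label2 layout, here is a toy Python model. It is only a sketch: the "instructions" a, b, c are placeholders and run is a made-up helper, but it mirrors the idea that labels are stripped out into a name-to-position table and the execution loop never looks at them.

```python
def run(program, start=0):
    """Execute a toy program; labels are only names for positions."""
    labels, instrs = {}, []
    for item in program:
        if item.endswith(":"):
            labels[item[:-1]] = len(instrs)   # label = index of the next instruction
        else:
            instrs.append(item)
    trace, pc = [], start
    while pc < len(instrs):
        trace.append(instrs[pc])              # "execute" by recording the instruction
        pc += 1                               # nothing stops or returns at a label
    return labels, trace

program = ["Label1:", "a", "b", "Label2:", "c"]
labels, trace = run(program)
print(labels, trace)   # {'Label1': 0, 'Label2': 2} ['a', 'b', 'c']
```

Starting at labels["Label2"] instead runs only c, and two labels placed back to back simply map to the same index.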

Functions (and scopes) are high-level concepts. Labels (to define symbols) are one of the tools that asm provides to make it possible to implement functions. (Along with instructions like call and ret to jump and save a return address.) As opposed to a big pile of spaghetti code that just jumps around between arbitrary points, like gotos within one huge function - apparently this was typical in the bad old days before proponents of "structured programming" pointed out how much easier it was to engineer larger programs in terms of functions and if/else blocks, restricting the way you use jumps in asm to line up with those concepts. "Function" isn't a first-class concept in raw machine code, or in most assembly languages. (MASM has a proc keyword you can use instead of just a label.)


For data: In C, an array like static char foo[] = {0,1,2,3}; will compile to something like this:

 foo:
    .byte 0, 1, 2, 3

Notice that the label has the same address as the first element, and foo+1 is the address of the 2nd byte.

But equivalently,

foo: .byte 0, 1
foo2: .byte 2
      .byte 3

Doing it this way puts a label on &foo[2] so you can reference that directly if you want to, but you could also consider the whole 0..3 range of bytes as being one array. This can be more useful for strings where you might want to address the suffix of one string separately. e.g. instead of a separate .asciz "\n" you might just stick a label on the newline at the end of another string.
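A small Python sketch of that equivalence, assuming a made-up layout helper that just concatenates bytes and records each label as the offset of the next byte. Both ways of writing the data produce the same four bytes; the only difference is which offsets have names.

```python
def layout(directives):
    """Lay out (label_or_None, data) pairs into one flat byte image."""
    image, labels = bytearray(), {}
    for label, data in directives:
        if label is not None:
            labels[label] = len(image)   # label = offset of the next byte
        image += bytes(data)
    return bytes(image), labels

one, l1 = layout([("foo", [0, 1, 2, 3])])
two, l2 = layout([("foo", [0, 1]), ("foo2", [2]), (None, [3])])
print(one == two, l2)   # True {'foo': 0, 'foo2': 2}

# Labelling the suffix of a string: "newline" names the tail of "msg"
text, l3 = layout([("msg", b"hi"), ("newline", b"\n\x00")])
print(text[l3["newline"]:])   # b'\n\x00'
```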


Related Q&As: examples of fall-through of labels