Function that takes a char array and 2 indices; swapping the chars in those indices

402 views Asked by At

This is my function prototype:

char* swap(char* array, int index1, int index2);

This is my assembly code:

segment .text
   global swap

swap:
   mov r14,[rdi+rsi]
   mov r15,[rdi+rdx]
   
   mov [rdi+rsi],r15        ;this line segfaults
   mov [rdi+rdx],r14
   
   mov rax,rdi
   
   ret

The lines mov [rdi+rsi],r15 and mov [rdi+rdx],r14 give me a segfault; I'm not sure where I'm going wrong

The calling function:

 #include <stdio.h>
 #include <stdlib.h>

 extern char* swapLetters(char* str, int indexA, int indexB);

 int main()
 {    
    char* st= "E X A M P L E";
    printf("Before swap: \t%s\n", st);

    char * res = swap(st, 2 ,10);

    printf("After swap: \t%s\n", res);
    return 0;
}

Expected output:

Before swap: E X A M P L E

After swap: E L A M P X E

2

There are 2 answers

2
Michael Petch On BEST ANSWER

The primary problem is that your st variable is defined as a pointer to a string literal.

char* st= "E X A M P L E";

String literals in the C language are considered read-only. To modify such a string is undefined behaviour. What happens is unknown and will be specific to the compiler and the environment it runs in. Your environment is raising an exception when you go to write that memory in the assembly code. On most modern OSes using modern compilers the string literals are placed in memory that isn't writeable so that it will generate an exception, which is what happened in your case.

If you wish to create a character array in writeable memory you can define st this way:

char st[] = "E X A M P L E";

Issues with the Assembly Code

One issue is that your indices to the function swap are int. In 64-bit GCC/CLANG int is 32-bits. If you pass 32-bit signed int to the assembly code the top 32-bits may have garbage in them. Given that your indices are never negative you should use an unsigned type and preferably one that is 64-bit. I would recommend the size_t type instead. size_t will be unsigned and 64-bit in size in x86-64 code, so when passed to the assembly code you don't need to sign/zero extend the index values to 64-bits before using them. I'm recommending changing swap to be:

char* swap(char* array, size_t index1, size_t index2)

If you keep index1 and index2 as signed integers (int) the beginning of your assembly code would have to use MOVSX on both ESI and EDX registers. That code would look like:

swap:
    movsx rsi, esi        ; Sign extend 32-bit index1 parm in ESI to 64-bits
    movsx rdx, edx        ; Sign extend 32-bit index2 parm in EDX to 64-bits
    ; rest of function here

If you were to have used 32-bit unsigned int for index and index2 you would have had to zero extend the 32-bit values with:

    mov esi, esi          ; Zero extend 32-bit index1 parm in ESI to 64-bits
    mov edx, edx          ; Zero extend 32-bit index2 parm in EDX to 64-bits
    ; rest of function here

When the destination of an operation is a 32-bit register in 64-bit mode, the CPU automatically zeros the upper 32-bits of the destination register. Moving a 32-bit register like ESI to itself will clear the upper 32-bits of RSI. This is the same for all the general purpose registers.


RBX, RBP, and R12–R15 are non-volatile registers according to the x86-64 System V ABI. If your function modifies them their contents have to be preserved. You can push them on the stack and pop their original values off the stack when finished. The preferred way is to use one of the volatile registers that don't need to preserved like R8-R11, RAX, RCX, RDX, RDI, RSI.


When you move data to/from memory using a 64-bit register then 64 bits (8 bytes) will be transferred. As an example:

mov r14,[rdi+rsi]

Moves the 8 bytes starting at memory address [rdi+rsi] and moves it to 64-bit register R14. The write later on does something similar but updates 8 bytes in memory rather than one byte. Updating 8 bytes of data could smash the stack if the array of characters were placed on the stack, which happens to be the case in your code and environment.

When using the numbered registers R8 to R15 you can reference the low 8 bits by placing a b suffix on the end of the register name (w is for 16-bit word, d is for 32-bit double word). A complete chart of all the registers names in NASM/YASM syntax for 64-bit mode are:

enter image description here

mov r14,[rdi+rsi] would be written as mov mov r14b,[rdi+rsi] to move a single byte. You would have to make that change to each of the other moves as well.


Assuming you change index1 and index2 to have type size_t (or uin64_t) your assembly code could have been written as :

segment .text
global swap

swap:
   push r14           ; Save non-volatile registers we overwrite
   push r15

   mov r14b,[rdi+rsi] ; Move one byte from [rdi+rsi] to R14B. R14B is lower 8 bits of R14
   mov r15b,[rdi+rdx] ; Move one byte from [rdi+rdx] to R15B. R15B is lower 8 bits of R15
   mov [rdi+rsi],r15b ; Move the byte in R15B to [rdi+rsi]
   mov [rdi+rdx],r14b ; Move the byte in R14B to [rdi+rdx]
   mov rax,rdi

   pop r15            ; Restore non-volatile registers
   pop r14
   ret

If you were to use the other volatile registers rather than the non-volatile ones the code could have been simplified to:

segment .text
global swap

swap:    
   mov al,[rdi+rsi]   ; Move one byte from [rdi+rsi] to AL. AL is lower 8 bits of RAX
   mov cl,[rdi+rdx]   ; Move one byte from [rdi+rdx] to CL. CL is lower 8 bits of RCX
   mov [rdi+rsi],cl   ; Move the byte in CL to [rdi+rsi]
   mov [rdi+rdx],al   ; Move the byte in AL to [rdi+rdx]
   mov rax,rdi

   ret

In this case we use the lower 8 bits of the volatile registers RAX(AL) and RCX(CL) to do the swap. Since we don't have to preserve these registers there is no need to save and restore them.

6
ryyker On

Part of the problem here is that an area of non-writable memory is being used to write to, it will not work. (There are also other correctness problems with the asm, see @MichaelPetch's answer.)

When this is created:

char* st= "E X A M P L E";

Because it creates a string literal, the pointer st refers to a memory location that is not writable.

If created as:

char st[] = "E X A M P L E";

st stored in writable memory and its contents are the characters, instead of just holding a pointer to a read-only string literal.