Overhead when creating/expanding structure in function call


I have to create two versions of the same function: one with all parameters listed, one with parameters passed as a struct. The number of parameters is arbitrary. I implement the functionality in only one of them, the other is just calling it with expanded parameters or initialized structure.

Is there a difference in the overhead between the two versions below?

Version 1

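/* Assumed definition; the question does not show MyStructure. */
typedef struct {
    int Myparam1;
    int Myparam2;
} MyStructure;

/* Declared up front so the wrapper below can call it. */
int functionWithMultipleParams(int param1, int param2);

int functionWithStructure(MyStructure a)
{
    return functionWithMultipleParams(a.Myparam1, a.Myparam2);
}

int functionWithMultipleParams(int param1, int param2)
{
    return /* implement something */;
}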

Version 2

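/* Declared up front so the wrapper below can call it. */
int functionWithStructure(MyStructure a);

int functionWithMultipleParams(int param1, int param2)
{
    /* A compound literal builds the structure from the parameters. */
    return functionWithStructure((MyStructure) {param1, param2});
}

int functionWithStructure(MyStructure a)
{
    return /* implement something */;
}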

There is 1 answer

Answer by Jan Schultke

You can't say that one version is always better than the other. Sometimes it is better to pack parameters into a struct, and sometimes it is worse.

In the x86-64 System V ABI (the calling convention used on Linux and macOS), passing two int parameters differs from passing a single struct parameter that holds two ints:

  • in the former case, each int is passed in its own register: edi and esi
  • in the latter case, both struct members are packed into the single 64-bit register rdi, the first member in the low half and the second in the high half
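
As a rough illustration of the packing described above, the following sketch copies a two-int struct into a 64-bit integer. The names `struct pair` and `register_image` are hypothetical and only serve this example. It assumes a 4-byte int and a little-endian target, which holds for x86-64. It shows the bit layout of the struct in memory; it does not inspect an actual register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-int struct, standing in for the one in the question. */
struct pair {
    int first;
    int second;
};

/* Assumption: int is 4 bytes, so the whole struct occupies 8 bytes
   and fits into one 64-bit general-purpose register. */
static_assert(sizeof(struct pair) == 8, "struct pair must be 8 bytes");

/* Returns the 8 bytes of the struct as one 64-bit value.
   On a little-endian target, `first` ends up in the low 32 bits
   and `second` in the high 32 bits. */
uint64_t register_image(struct pair p)
{
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);
    return bits;
}
```

Under those assumptions, `register_image((struct pair){1, 2})` yields a value with 1 in the low half and 2 in the high half.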

As a rule of thumb, a struct is better when we perform operations with the whole struct (like passing it to other functions), whereas separate parameters are better when using them in separate ways.

Positive Cost struct

struct point {
    int x;
    int y;
};

int sum(int x, int y) {
    return x + y;
}

int struct_sum(struct point p) {
    return p.x + p.y;
}

This produces the following (GCC 13, -O2):

sum:
        lea     eax, [rdi+rsi]
        ret
struct_sum:
        mov     rax, rdi
        shr     rax, 32
        add     eax, edi
        ret

You can see that sum simply adds rdi and rsi with a single lea, whereas struct_sum first has to unpack the operands: both members arrive in rdi, so the upper one must be shifted down into its own register before the addition.
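
The unpacking step can be written out in C. This is only a sketch of what the generated instructions do, not something the compiler needs from you; `unpack_and_sum` is a hypothetical name, and the 64-bit argument stands in for the contents of rdi.

```c
#include <stdint.h>

/* Mirrors the three instructions of struct_sum:
   `packed` plays the role of rdi, holding p.x in the low 32 bits
   and p.y in the high 32 bits. */
int unpack_and_sum(uint64_t packed)
{
    uint32_t low  = (uint32_t)packed;          /* p.x, i.e. edi          */
    uint32_t high = (uint32_t)(packed >> 32);  /* p.y, after shr rax, 32 */

    /* Unsigned addition wraps modulo 2^32, like add eax, edi. */
    return (int)(low + high);
}
```

The extra shift is the whole cost of the struct version in this example: one additional instruction compared with adding two separate registers.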

Negative Cost struct

struct point {
    int x;
    int y;
};

struct point lowest_bit(int x, int y) {
    return (struct point) {x & 1, y & 1};
}

struct point struct_lowest_bit(struct point p) {
    return (struct point) {p.x & 1, p.y & 1};
}

This produces the following (clang trunk, -O2):

lowest_bit:
        and     edi, 1
        and     esi, 1
        shl     rsi, 32
        lea     rax, [rdi + rsi]
        ret
struct_lowest_bit:
        movabs  rax, 4294967297
        and     rax, rdi
        ret

Note: GCC doesn't find this optimization for some reason.

In this case, it's better for both members to be packed into rdi: a single 64-bit and with the mask 4294967297 (0x100000001) applies & 1 to both members at once, and the result is already in the packed form needed for the return value.
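
To see why one mask is enough, here is a small sketch that compares the packed approach with the member-by-member approach. The function names and the `LOW_BIT_MASK` macro are hypothetical; the 64-bit values stand in for register contents, with x in the low half and y in the high half.

```c
#include <stdint.h>

/* 0x0000000100000001 == 4294967297: bit 0 of each 32-bit half is set. */
#define LOW_BIT_MASK UINT64_C(0x0000000100000001)

/* Packed version: one and handles both halves, as in struct_lowest_bit. */
uint64_t lowest_bits_packed(uint64_t packed)
{
    return packed & LOW_BIT_MASK;
}

/* Separate version: mask each value, then pack the results by hand,
   as in lowest_bit. */
uint64_t lowest_bits_separate(uint32_t x, uint32_t y)
{
    return ((uint64_t)(y & 1u) << 32) | (x & 1u);
}
```

Both functions return the same packed result for the same inputs, but the second one needs two masks, a shift, and a combine step to get there.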


Also see: C++ Weekly - Ep 119 - Negative Cost Structs (C++ video, but equally applies to C due to similar ABI).