Something like the following:
struct Vec2
{
    int x, y;
};

struct Bounds
{
    int left, top, right, bottom;
};

int main()
{
    Vec2 topLeft = { 5, 5 };
    Vec2 bottomRight = { 10, 10 };
    Bounds bounds;

    // Here is the copy operation
    // Note they're not in contiguous order; harder for the compiler?
    bounds.left = topLeft.x;
    bounds.bottom = bottomRight.y;
    bounds.top = topLeft.y;
    bounds.right = bottomRight.x;
}
Those four assignments could be done like so:
memcpy(&bounds, &topLeft, sizeof(Vec2));
memcpy(&bounds.right, &bottomRight, sizeof(Vec2));
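A minimal sketch for checking that the two forms really produce the same Bounds. The function names copyByMember and copyByMemcpy are made up for this illustration, and the static_asserts spell out the layout assumptions the memcpy version depends on (no padding, members laid out in declaration order):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Vec2 { int x, y; };
struct Bounds { int left, top, right, bottom; };

// Layout assumptions: if any of these fail, the memcpy version is not valid.
static_assert(sizeof(Vec2) == 2 * sizeof(int), "Vec2 must have no padding");
static_assert(sizeof(Bounds) == 2 * sizeof(Vec2), "Bounds must have no padding");
static_assert(offsetof(Bounds, right) == sizeof(Vec2), "right must follow top directly");

// The four individual assignments, in the same order as in the question.
Bounds copyByMember(const Vec2& topLeft, const Vec2& bottomRight)
{
    Bounds b;
    b.left = topLeft.x;
    b.bottom = bottomRight.y;
    b.top = topLeft.y;
    b.right = bottomRight.x;
    return b;
}

// The two pair copies. The destination is addressed through a byte pointer;
// dst + offsetof(Bounds, right) is the same address as &b.right.
Bounds copyByMemcpy(const Vec2& topLeft, const Vec2& bottomRight)
{
    Bounds b;
    unsigned char* dst = reinterpret_cast<unsigned char*>(&b);
    std::memcpy(dst, &topLeft, sizeof(Vec2));
    std::memcpy(dst + offsetof(Bounds, right), &bottomRight, sizeof(Vec2));
    return b;
}
```

Calling both functions with the same two Vec2 values and comparing the four members of the results should show no difference on any platform where the static_asserts hold.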
I'm wondering two things:
- Are compilers usually able to optimise in this way?
- Do four int copies cost the same as two int-pair copies, given that copying memory is O(n)?
I got the following disassembly results for the four copies:
bounds.left = topLeft.x;
00007FF642291034 mov dword ptr [bounds],5
bounds.bottom = bottomRight.y;
00007FF64229103C mov dword ptr [rsp+2Ch],0Ah
bounds.top = topLeft.y;
00007FF642291044 mov dword ptr [rsp+24h],5
bounds.right = bottomRight.x;
00007FF64229104C mov dword ptr [rsp+28h],0Ah
Confusingly, the two memcpy calls produce different instructions: the first appears to compile to a single instruction and the second to six. I don't understand this:
memcpy(&bounds, &topLeft, sizeof(Vec2));
00007FF64229105E mov rbx,qword ptr [topLeft] // This is only one instruction
memcpy(&bounds.right, &bottomRight, sizeof(Vec2));
00007FF642291063 mov rdi,qword ptr [bottomRight] // Compared to 6?
00007FF642291068 mov qword ptr [bounds],rbx
00007FF64229106D mov qword ptr [rsp+28h],rdi
00007FF642291072 jmp main+7Eh (07FF64229107Eh)
00007FF642291074 mov rdi,qword ptr [rsp+28h]
00007FF642291079 mov rbx,qword ptr [bounds]
Any modern compiler that supports threads already has to track dependencies between instructions and decide which ones may be reordered. With that analysis in place, it will quickly discover that the four assignments do not depend on one another, which means the stores can be reordered into linear memory order and then combined into wider ones.
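The reorder-then-combine idea can be sketched by hand. This is only an illustration of the transformation, not what any particular compiler is guaranteed to emit. The function names are hypothetical, and the sketch assumes a 32-bit int so that a pair of ints fits one 64-bit value:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Vec2 { int x, y; };
struct Bounds { int left, top, right, bottom; };

// Assumption: int is 32 bits and neither struct contains padding.
static_assert(sizeof(Vec2) == sizeof(std::uint64_t), "expected two 32-bit ints");
static_assert(sizeof(Bounds) == 2 * sizeof(std::uint64_t), "expected four 32-bit ints");

// Step 1: the same four independent stores, sorted by destination offset.
Bounds storeInMemoryOrder(const Vec2& topLeft, const Vec2& bottomRight)
{
    Bounds b;
    b.left = topLeft.x;        // offset 0
    b.top = topLeft.y;         // offset 4
    b.right = bottomRight.x;   // offset 8
    b.bottom = bottomRight.y;  // offset 12
    return b;
}

// Step 2: adjacent 32-bit stores merged into two 64-bit loads and stores.
Bounds storeCombined(const Vec2& topLeft, const Vec2& bottomRight)
{
    std::uint64_t lo, hi;
    std::memcpy(&lo, &topLeft, sizeof lo);      // one 64-bit load
    std::memcpy(&hi, &bottomRight, sizeof hi);  // one 64-bit load

    Bounds b;
    unsigned char* dst = reinterpret_cast<unsigned char*>(&b);
    std::memcpy(dst, &lo, sizeof lo);              // one 64-bit store
    std::memcpy(dst + sizeof lo, &hi, sizeof hi);  // one 64-bit store
    return b;
}
```

The second function has the same shape as the qword loads into rbx and rdi followed by the two qword stores in the memcpy disassembly from the question.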
Not that it is likely to matter: the CPU cache loads the whole cache line on first access and writes the whole line back at some later point. Those memory transfers are what take time, not the individual store instructions.
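To make the cache-line point concrete, here is a small sketch that checks whether an object lies within a single cache line. The 64-byte line size is an assumption (common on current x86-64 parts, but not universal), and kAssumedCacheLine, AlignedBounds and fitsInOneCacheLine are names invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Check your target CPU before relying on this.
constexpr std::size_t kAssumedCacheLine = 64;

// Same members as Bounds, but aligned to its own size so that it can never
// straddle a boundary that is a multiple of 16 bytes.
struct alignas(16) AlignedBounds { int left, top, right, bottom; };
static_assert(sizeof(AlignedBounds) == 16, "expected four 32-bit ints");

// True if the byte range [p, p + size) lies within one assumed cache line.
bool fitsInOneCacheLine(const void* p, std::size_t size)
{
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr / kAssumedCacheLine == (addr + size - 1) / kAssumedCacheLine;
}
```

Under that assumption, all four stores into an AlignedBounds touch the same line, so whether they are issued as four 32-bit moves or two 64-bit moves, only one line has to be fetched and later written back.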