The following is the Microsoft CRT implementation of memcmp:
int memcmp(const void* buf1,
const void* buf2,
size_t count)
{
if(!count)
return(0);
while(--count && *(char*)buf1 == *(char*)buf2 ) {
buf1 = (char*)buf1 + 1;
buf2 = (char*)buf2 + 1;
}
return(*((unsigned char*)buf1) - *((unsigned char*)buf2));
}
It basically performs a byte by byte comparision.
My question is in two parts:
- Is there any reason to not alter this to an int by int comparison until
count < sizeof(int)
, then do a byte by byte comparision for what remains? - If I were to do 1, are there any potential/obvious problems?
Notes: I'm not using the CRT at all, so I have to implement this function anyway. I'm just looking for advice on how to implement it correctly.
Another idea is to optimize for the processor cache and fetching. Processors like to fetch in large chunks rather than individual bytes at random times. Although the internal workings may already account for this, it would be a good exercise anyway. Always profile to determine the most efficient solution.
Psuedo code:
For more information, search the web for "Data Driven Design" and "data oriented programming".
Some processors, such as the ARM family, allow for conditional execution of instructions (in 32-bit, non-thumb) mode. The processor fetches the instructions but will only execute them if the conditions are satisfied. In this case, try rephrasing the comparison in terms of boolean assignments. This may also reduce the number of branches taken, which improves performance.
See also loop unrolling.
See also assembly language.
You can gain a lot of performance by tailoring the algorithm to a specific processor, but loose in the portability area.