I was looking at functions from GNU C library, glibc-2.18, and this is the code I found for strncmp.c
Looking at it I don't understand why it's written this way. Is this loop unrolling? Why not use 5 or 10 instead of 4? And why did they write like this instead of using a more straightforward approach?
/* Compare no more than N characters of S1 and S2,
returning less than, equal to or greater than zero
if S1 is lexicographically less than, equal to or
greater than S2. */
int
STRNCMP (const char *s1, const char *s2, size_t n)
{
unsigned char c1 = '\0';
unsigned char c2 = '\0';
if (n >= 4)
{
size_t n4 = n >> 2;
do
{
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
} while (--n4 > 0);
n &= 3;
}
while (n > 0)
{
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
n--;
}
return c1 - c2;
}
Can someone explain the logic behind the code?
Thanks.
Yes it is a partially unrolled loop.
The 4:
1) The general size: There is always a trade-off between more branching (small number) and larger code-size (large number).
2) The specific size (4, not 5): As Joachim Isaksson points out the
>>
before the loop would need to be changed to a division if chosing 5 instead of 4. He says that on some (eg embedded) CPU's that becomes significant. (Often we only think about what the loop costs, not its setup, but he is probably right because most strings tend to be small!)Note that if you wanted to unroll a loop to any number (regardless if power of 2 or not) then you could - instead of doing a division and maintaining the
n4
- you could compute the end-pointer ofs1
and checks1
versus it at the end: