Can someone explain why this function is the way it is?

109 views Asked by At

I was looking at functions from GNU C library, glibc-2.18, and this is the code I found for strncmp.c

Looking at it I don't understand why it's written this way. Is this loop unrolling? Why not use 5 or 10 instead of 4? And why did they write like this instead of using a more straightforward approach?

/* Compare no more than N characters of S1 and S2,
   returning less than, equal to or greater than zero
   if S1 is lexicographically less than, equal to or
   greater than S2.  */
int
STRNCMP (const char *s1, const char *s2, size_t n)
{
  unsigned char c1 = '\0';
  unsigned char c2 = '\0';

  if (n >= 4)
    {
      size_t n4 = n >> 2;
      do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
    } while (--n4 > 0);
      n &= 3;
    }

  while (n > 0)
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
    return c1 - c2;
      n--;
    }

  return c1 - c2;
}

Can someone explain the logic behind the code?

Thanks.

3

There are 3 answers

4
Bernd Elkemann On

Yes it is a partially unrolled loop.

The 4:

1) The general size: There is always a trade-off between more branching (small number) and larger code-size (large number).

2) The specific size (4, not 5): As Joachim Isaksson points out the >> before the loop would need to be changed to a division if chosing 5 instead of 4. He says that on some (eg embedded) CPU's that becomes significant. (Often we only think about what the loop costs, not its setup, but he is probably right because most strings tend to be small!)

Note that if you wanted to unroll a loop to any number (regardless if power of 2 or not) then you could - instead of doing a division and maintaining the n4 - you could compute the end-pointer of s1 and check s1 versus it at the end:

char* s1end = s1 + n;
do {...} while(s1 < s1end);
0
randomusername On

This way there are fewer branches (if statements) and it allows for the compiler to optimize for pipe-lining.

0
luk32 On

Yes this is loop unrolling.

Why 4 and not 5 or 10. Probably because this is intended to be cache friendly and make sure the calls are aligned to memory pages/rows.

Question could be why not 8, 16 or 32. Maybe code size, maybe it's easier to generate optimized assembly for some older processors when it's 4. Or the actual code would be too large to fit typical instruction cache.