The following code counts the number of bits of value '1' in the array 'buf':
long bit_count(long *arr, int size)
{
int w,b;
unsigned long word=0;
long n = 0;
for(w = 0; w<size ; w++){
word = arr[w];
while(word != 0){
n += word & 1;
word >>= 1;
}
}
return n;
}
int main() {
char buf[] = {"There is always one more!"}; //(1)
int length = strlen(buf)>>3; //(2)
long *a = (long*) (buf + length); //(3)
//char * b = (char *)a; // this is a comment. //(4)
int len = strlen((char*)a); //(5)
long total_count = bit_count((long*) buf, length); //(6)
*a = (*a) & (((long)1<<(len*8))-1); //(7)
char *c = (char*)a; //(8)
total_count += bit_count(a,1); //(9)
printf("bits number %ld\n",total_count); //(10)
return 0;
}
That's my understanding of the code: (please correct me if I'm wrong, I want to understand it at the deepest level)
In line (2) they compute the number of letters in buf which is 25 and it's equivalent to 25 bytes. cause the size of char is 1 byte. then by shifting that number 3 bits right they dividing it by 8, 8 defines 8 bytes = 1 long type. so how many "longs" this buf array contains.
line (3) - 'a' points to a value that contains the casting of the remainder of buf to (long *)
- in line (4) I saw that b* value will be "!" which is exactly the remainder of 1 byte in the array. but I don't really understand the casting to (long *)
the value I'm getting is different if I check
long check = (long) (buf + length);
so what exactly happens when I do - {(long*)buf}
is my first question.
in line (6) I count the bits in the first part of the array (by longs) and in line (9) I count the bits in the remainder.
what I don't understand is - what happens in line (7)?
There is nothing tricky about the code, but it is not very good. Actually, it is very bad, and not because of style but because it corrupts memory.
This line calculates the whole number of
long
objects that fit in the string, assuming thatlong
is eight bytes (/ sizeof(long)
should be used instead of>>3
) and thatint
is big enough to hold the number (size_t
should be used instead ofint
).This seems to intend to set
a
to point to the fragment at the end of the string that is not long enough to contain anotherlong
object. However, becauselength
is a number oflong
objects andbuf
acts as a pointer tochar
, its arithmetic is incorrect. It calculatesbuf
pluslength
char
objects when it intends to calculatebuf
pluslength
long
objects. It ought to use(long *) buf + length
. Also, it assumes that achar *
with arbitrary alignment can be sensibly converted to along *
, which is not guaranteed by the C standard.This calculates the number of bytes in the fragment. It is a waste because the code previously calculated the length of the string. It could have saved that length and taken the quotient and remainder modulo the size of
long
rather than callingstrlen
again.This calculates the number of bits set in the main portion of the buffer, that is, the portion that is treated as some whole number of
long
objects. As with the earlier conversion, it assumes that achar *
can be converted to along *
, and, further, the routine it calls uses that pointer to readlong
objects from an object not defined as an array oflong
, thus violating C aliasing rules.Recall that
a
points to some number of trailing bytes in the string.(long)1<<(len*8)
moves a bit to just above that number of bytes, assuming bytes are eight bits (CHAR_BIT
should be used instead of8
). (For example, if there are two bytes, this creates the value 1000016.) Then subtracting one creates a bit mask for the bytes (FFFF16 in the example). Then*a
attempts to read along
from end of the string, after which applying the mask with&
reduces the value read to just the bytes in the string. This not only violates C’s aliasing rules again but also reads outside the memory ofbuf
, which has undefined behavior. Finally, the statement writes the updated value to*a
, which is even worse—if you got away with reading outside ofbuf
, now the code is writing outside ofbuf
, causing corruption somewhere unknown.This line is useless since
c
is never used.This counts the number of bits set in the
long
that was updated in (7).In the
bit_count
routine, we find the code is counting bits one-by-one. This is absurd. Violating the C rules about aliasing might have made sense to gain some performance benefit of processing wholelong
objects instead of individualchar
objects if the specific C implementation being used allowed the aliasing with unaligned addresses. But this code just does bit-by-bit processing; it does not gain any advantage from some bit-count (also known as population-count) instruction on the processor.Furthermore, the
bit_count
routine recognizes that it ought to useunsigned long
for the bit count, as it definesword
asunsigned long
. But it fails to recognize it ought to have usedunsigned long
throughout, as thelong
could have a trap representation that would cause the code to fail.It also defines
b
without ever using it.