Java BitSet wrong conversion from/to byte array


Working with BitSets I have a failing test:

BitSet bitSet = new BitSet();
bitSet.set(1);
bitSet.set(100);
logger.info("BitSet: " + BitSetHelper.toString(bitSet));
BitSet fromByteArray = BitSetHelper.fromByteArray(bitSet.toByteArray());
logger.info("fromByteArray: " + BitSetHelper.toString(bitSet));
Assert.assertEquals(2, fromByteArray.cardinality());
Assert.assertTrue(fromByteArray.get(1));   // <-- assertion fails
Assert.assertTrue(fromByteArray.get(100)); // <-- assertion fails

Stranger still, when I log the String representation of both BitSets:

17:34:39.194 [main] INFO  c.i.uniques.helper.BitSetHelperTest - BitSet: 00000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000
17:34:39.220 [main] INFO  c.i.uniques.helper.BitSetHelperTest - fromByteArray: 00000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000

They are equal! What is happening here?

The methods used in this example are:

public static BitSet fromByteArray(byte[] bytes) {
    BitSet bits = new BitSet();
    for (int i = 0; i < bytes.length * 8; i++) {
        if ((bytes[bytes.length - i / 8 - 1] & (1 << (i % 8))) > 0) {
            bits.set(i);
        }
    }
    return bits;
}

And the method used to get the String representation:

public static String toString(BitSet bitSet) {
    StringBuilder buffer = new StringBuilder();
    for (byte b : bitSet.toByteArray()) {
        // Each byte is printed as 8 binary digits, most significant bit first
        buffer.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
    }
    return buffer.toString();
}

Could someone explain what's going on here?

Answer by Sotirios Delimanolis (best answer)

Note that BitSet has a valueOf(byte[]) that already does this for you.
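As a minimal sketch of that suggestion, the round trip from the question can be written with the standard BitSet.valueOf(byte[]) instead of a hand-written loop. The class name ValueOfRoundTrip and the helper roundTrip are made up for this illustration.

```java
import java.util.BitSet;

// Hypothetical demo class, not part of the question's BitSetHelper.
class ValueOfRoundTrip {
    // Converts a BitSet to bytes and back using only JDK methods.
    static BitSet roundTrip(BitSet original) {
        return BitSet.valueOf(original.toByteArray());
    }

    public static void main(String[] args) {
        BitSet bitSet = new BitSet();
        bitSet.set(1);
        bitSet.set(100);

        BitSet copy = roundTrip(bitSet);
        System.out.println(copy);                // {1, 100}
        System.out.println(copy.equals(bitSet)); // true
    }
}
```

If the helper exists only to undo toByteArray(), this is probably all that is needed.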

Inside your fromByteArray method

for (int i = 0; i < bytes.length * 8; i++) {
    if ((bytes[bytes.length - i / 8 - 1] & (1 << (i % 8))) > 0) {
        bits.set(i);
    }
}

you're traversing your byte[] in reverse. On the first iteration,

bytes.length - i / 8 - 1

will evaluate to

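13 - (0 / 8) - 1

which is 12 (toByteArray() returns 13 bytes here, because the highest set bit is 100), so the first access reads the last, most significant byte. This is the byte containing bit 100 of your original BitSet, stored at bit position 4 within that byte. Because your loop treats it as the first byte, it sets bit 4 in the generated BitSet; in the same way, bit 1 from the first byte ends up at bit 97. If you check the bits set in your generated BitSet, you'll find exactly those two zero-based indices, 4 and 97, instead of 1 and 100.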
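To see the effect of the reversed traversal concretely, the sketch below copies the loop from the question into a hypothetical class ReversedReadDemo and runs it on the question's input. For this input toByteArray() returns 13 bytes, so the first iteration reads index 12.

```java
import java.util.BitSet;

// Hypothetical demo class reproducing the question's loop unchanged.
class ReversedReadDemo {
    // Walks the byte[] from the last byte to the first, as in the question.
    static BitSet reversedRead(byte[] bytes) {
        BitSet bits = new BitSet();
        for (int i = 0; i < bytes.length * 8; i++) {
            if ((bytes[bytes.length - i / 8 - 1] & (1 << (i % 8))) > 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet bitSet = new BitSet();
        bitSet.set(1);
        bitSet.set(100);

        byte[] bytes = bitSet.toByteArray();
        System.out.println(bytes.length);        // 13
        System.out.println(reversedRead(bytes)); // {4, 97}
    }
}
```

The cardinality is still 2, which is why only the get(1) and get(100) assertions fail.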

But the byte[] returned by toByteArray() contains

a little-endian representation of all the bits in this bit set
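A small sketch of what little-endian means for the question's input: bit n lives in bytes[n / 8] at bit position n % 8, so bit 1 is in the first byte and bit 100 is in the last. The class ByteLayoutDemo and the method nonZeroBytes are hypothetical names used only here.

```java
import java.util.BitSet;

// Hypothetical demo class that lists the non-zero bytes of toByteArray().
class ByteLayoutDemo {
    static String nonZeroBytes(BitSet bitSet) {
        byte[] bytes = bitSet.toByteArray();
        StringBuilder sb = new StringBuilder("length=" + bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] != 0) {
                // Mask with 0xFF so the byte is formatted as an unsigned value
                sb.append(String.format(" bytes[%d]=0x%02x", i, bytes[i] & 0xFF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bitSet = new BitSet();
        bitSet.set(1);   // byte 0, bit position 1  -> 0x02
        bitSet.set(100); // byte 12, bit position 4 -> 0x10
        System.out.println(nonZeroBytes(bitSet));
        // length=13 bytes[0]=0x02 bytes[12]=0x10
    }
}
```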

You need to read the byte[] in the matching order, from the first byte to the last:

for (int i = 0; i < bytes.length * 8; i++) {
    if ((bytes[i / 8] & (1 << (i % 8))) > 0) {
        bits.set(i);
    }
}
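Put together, a corrected helper might look like the sketch below. The class name BitSetHelperSketch is an assumption standing in for the question's BitSetHelper, and the comparison uses != 0 rather than > 0 as a stylistic choice; both behave the same for these single-bit masks.

```java
import java.util.BitSet;

// Hypothetical stand-in for the question's BitSetHelper.
class BitSetHelperSketch {
    // Reads the bytes in little-endian order, matching BitSet.toByteArray().
    static BitSet fromByteArray(byte[] bytes) {
        BitSet bits = new BitSet(bytes.length * 8);
        for (int i = 0; i < bytes.length * 8; i++) {
            if ((bytes[i / 8] & (1 << (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet bitSet = new BitSet();
        bitSet.set(1);
        bitSet.set(100);

        BitSet fromByteArray = fromByteArray(bitSet.toByteArray());
        System.out.println(fromByteArray);                // {1, 100}
        System.out.println(fromByteArray.equals(bitSet)); // true
    }
}
```

With this ordering the helper should agree with BitSet.valueOf(byte[]) on any input, so the three assertions in the original test pass.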