NAN Box Negative Int


I have been following this article, which explains NaN boxing (https://piotrduperas.com/posts/nan-boxing), and tried to implement it in my own "language".

typedef union {
    uint64_t as_uint;
    double as_double;
} Atom;

#define NANISH      0x7ffc000000000000 /* distinguish "our" NAN with one additional bit */
#define NANISH_MASK 0xffff000000000000 /* [SIGN/PTR_TAG] + 11*[EXP] + 2*[NANISH] + 2*[TAG] */

#define BOOL_MASK   0x7ffe000000000002  /* 2 ms + and 2 ls */
#define NULL_VALUE  0x7ffe000000000000  /* 0b*00 */
#define TRUE_VALUE  (BOOL_MASK | 3)     /* 0b*11 */
#define FALSE_VALUE (BOOL_MASK | 2)     /* 0b*10 */

#define INT_MASK 0x7ffc000000000000 /* use all of the mantissa bits for the integer */
#define SYM_MASK 0xfffc000000000000 /* pointers have sign bit set */
#define STR_MASK 0xfffe000000000000 /* on x86-64 ptr* is at max 48 bits long */
#define OBJ_MASK 0xfffd000000000000 /* which is small enough to fit in the mantissa */
#define PTR_MASK 0xf000000000000000

/* predicates */
#define DOUBLP(v) ((v.as_uint & NANISH) != NANISH)
#define NULLP(v)  (v.as_uint == NULL_VALUE)
#define BOOLP(v)  ((v.as_uint & BOOL_MASK) == BOOL_MASK)
#define PTRP(v)   ((v.as_uint & PTR_MASK) == PTR_MASK)
#define INTP(v)   ((v.as_uint & NANISH_MASK) == INT_MASK)
#define STRP(v)   ((v.as_uint & NANISH_MASK) == STR_MASK)
#define SYMP(v)   ((v.as_uint & NANISH_MASK) == SYM_MASK)
#define OBJP(v)   ((v.as_uint & NANISH_MASK) == OBJ_MASK)

/* get value */
#define AS_DOUBL(v) (v.as_double)
#define AS_BOOL(v)  ((char)(v.as_uint & 0x1))
#define AS_INT(v)   ((int32_t)(v.as_uint))
#define AS_PTR(v)   ((char *)((v).as_uint & 0xFFFFFFFFFFFF))

/* add tag mask */
#define TO_VEC(p) ((uint64_t)(p) | VEC_MASK)
#define TO_STR(p) ((uint64_t)(p) | STR_MASK)
#define TO_SYM(p) ((uint64_t)(p) | SYM_MASK)
#define TO_MAP(p) ((uint64_t)(p) | MAP_MASK)
#define TO_SET(p) ((uint64_t)(p) | SET_MASK)
#define TO_INT(i) ((uint64_t)(i) | INT_MASK)

There are some additional objects that I added for my own usage but the idea should be the same.

int main() {
    Atom atom;
    atom.as_uint = TO_INT(-3);
    printf("%d\n", AS_INT(atom));
    printf("%d\n", INTP(atom));
    printf("%x\n", AS_INT(atom));

}

output:

-3
0
fffffffd

As I understand it, the negative int is stored in two's complement (U2), so after the cast to uint64_t its upper bits are all set and the result no longer matches INT_MASK. I thought about changing INT_MASK to 0xfffff, but that conflicts with the representation of non-negative ints (and with the other masks). Have I misunderstood something from the article? What is the correct value for INT_MASK?


There is 1 answer

anatolyg On BEST ANSWER

I guess the real question here is:

The INTP doesnt work properly and return 0 for negative ints

It doesn't work because of a bug. The author of the article didn't care about negative integers, he only verified that the idea works — that is, there is space to hold 32 bits of data. Negative numbers interfere with the code because they already have some of the tag-bits set to 1. To set tag-bits to a desired value, first clear them to 0, and then do bitwise-OR with the value.
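
To make the failure concrete, here is a minimal sketch that reproduces the question's boxing and type check as plain functions. The names `box_int_naive` and `is_int` are made up for illustration; the two masks are copied from the question.

```c
#include <stdint.h>

#define NANISH_MASK 0xffff000000000000ULL /* top 16 bits: sign, exponent, tag */
#define INT_MASK    0x7ffc000000000000ULL /* tag pattern for boxed integers */

/* Same as the question's TO_INT: OR the tag onto the converted value.
 * Converting a negative int32_t to uint64_t sets all 32 upper bits,
 * so the tag area is already full of ones before the OR. */
static uint64_t box_int_naive(int32_t i) {
    return (uint64_t)i | INT_MASK;
}

/* Same as the question's INTP: compare the top 16 bits with the tag. */
static int is_int(uint64_t bits) {
    return (bits & NANISH_MASK) == INT_MASK;
}
```

With these definitions, `box_int_naive(3)` is `0x7ffc000000000003` and passes `is_int`, while `box_int_naive(-3)` is `0xfffffffffffffffd`. Its top 16 bits are `0xffff`, not `0x7ffc`, so `is_int` returns 0. The low 32 bits still hold -3, which is why AS_INT printed the right value in the question.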

The same is actually true for pointers — they can have binary 1111 in most significant bits, but such pointers are typically reserved to the OS kernel. Just like for integers — the bug exists, but manifests only for less-used values.
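
The pointer case can be sketched the same way. The addresses used in the note after the block are synthetic examples (not real allocations), and the function names and `PAYLOAD_MASK` are hypothetical; STR_MASK and the 48-bit payload width come from the question.

```c
#include <stdint.h>

#define NANISH_MASK  0xffff000000000000ULL /* top 16 bits hold the tag */
#define STR_MASK     0xfffe000000000000ULL /* tag pattern for strings */
#define PAYLOAD_MASK 0x0000ffffffffffffULL /* low 48 bits, as used by AS_PTR */

/* Original style: OR the tag straight onto the address. */
static uint64_t box_str_naive(uint64_t addr) {
    return addr | STR_MASK;
}

/* Clear the 16 tag bits first, then OR in the tag. */
static uint64_t box_str(uint64_t addr) {
    return (addr & ~NANISH_MASK) | STR_MASK;
}

/* Type check in the style of STRP. */
static int is_str(uint64_t bits) {
    return (bits & NANISH_MASK) == STR_MASK;
}

/* Recover the 48-bit payload, as AS_PTR does. */
static uint64_t unbox_addr(uint64_t bits) {
    return bits & PAYLOAD_MASK;
}
```

For a typical user-space address such as `0x00007f0000001000`, both boxing functions give the same result and the address round-trips. For a high-half address such as `0xffff800000001000`, `box_str_naive` produces a value that fails `is_str`. `box_str` produces a correctly tagged value, but `unbox_addr` then returns `0x0000800000001000`: clearing the tag bits fixes the type check, yet the upper 16 bits of the address are lost, so this layout only suits pointers that fit in 48 bits.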

#define CLEAR_TAG_BITS(x) ((uint64_t)(x) & ~NANISH_MASK)
#define TO_STR(p) (CLEAR_TAG_BITS(p) | STR_MASK)
#define TO_INT(i) (CLEAR_TAG_BITS(i) | INT_MASK)
...
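
As a rough check of the clear-then-tag approach for integers, the sketch below writes it as functions and round-trips a value. The names `box_int`, `is_int` and `unbox_int` are hypothetical; the masks are the question's. The final cast from `uint32_t` to `int32_t` is implementation-defined in ISO C for values above `INT32_MAX`, and the sketch assumes the usual wraparound behaviour of mainstream compilers.

```c
#include <stdint.h>

#define NANISH_MASK 0xffff000000000000ULL /* top 16 bits: sign, exponent, tag */
#define INT_MASK    0x7ffc000000000000ULL /* tag pattern for boxed integers */

/* Clear the tag area left dirty by sign extension, then set the tag. */
static uint64_t box_int(int32_t i) {
    return ((uint64_t)i & ~NANISH_MASK) | INT_MASK;
}

/* Type check in the style of INTP. */
static int is_int(uint64_t bits) {
    return (bits & NANISH_MASK) == INT_MASK;
}

/* Keep only the low 32 bits and reinterpret them as a signed value.
 * Assumption: the compiler wraps modulo 2^32 on this conversion. */
static int32_t unbox_int(uint64_t bits) {
    return (int32_t)(uint32_t)bits;
}
```

Here `box_int(-3)` is `0x7ffcfffffffffffd`: the top 16 bits now equal INT_MASK, so `is_int` succeeds, and `unbox_int` gives back -3. INT_MASK itself does not need to change.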