I have been following this article, which explains NaN boxing (https://piotrduperas.com/posts/nan-boxing), and tried to implement it in my own "language".
#include <stdint.h>
#include <stdio.h>

typedef union {
    uint64_t as_uint;
    double as_double;
} Atom;
#define NANISH 0x7ffc000000000000 /* distinguish "our" NAN with one additional bit */
#define NANISH_MASK 0xffff000000000000 /* [SIGN/PTR_TAG] + 11*[EXP] + 2*[NANISH] + 2*[TAG] */
#define BOOL_MASK 0x7ffe000000000002 /* 2 most significant tag bits and 2 least significant bits */
#define NULL_VALUE 0x7ffe000000000000 /* 0b*00 */
#define TRUE_VALUE (BOOL_MASK | 3) /* 0b*11 */
#define FALSE_VALUE (BOOL_MASK | 2) /* 0b*10 */
#define INT_MASK 0x7ffc000000000000 /* use the mantissa bits for the integer */
#define SYM_MASK 0xfffc000000000000 /* pointers have sign bit set */
#define STR_MASK 0xfffe000000000000 /* on x86-64 a pointer is at most 48 bits */
#define OBJ_MASK 0xfffd000000000000 /* which is small enough to fit in the mantissa */
#define PTR_MASK 0xf000000000000000
/* predicates */
#define DOUBLP(v) ((v.as_uint & NANISH) != NANISH)
#define NULLP(v) (v.as_uint == NULL_VALUE)
#define BOOLP(v) ((v.as_uint & BOOL_MASK) == BOOL_MASK)
#define PTRP(v) ((v.as_uint & PTR_MASK) == PTR_MASK)
#define INTP(v) ((v.as_uint & NANISH_MASK) == INT_MASK)
#define STRP(v) ((v.as_uint & NANISH_MASK) == STR_MASK)
#define SYMP(v) ((v.as_uint & NANISH_MASK) == SYM_MASK)
#define OBJP(v) ((v.as_uint & NANISH_MASK) == OBJ_MASK)
/* get value */
#define AS_DOUBL(v) (v.as_double)
#define AS_BOOL(v) ((char)(v.as_uint & 0x1))
#define AS_INT(v) ((int32_t)(v.as_uint))
#define AS_PTR(v) ((char *)((v).as_uint & 0xFFFFFFFFFFFF))
/* add tag mask */
#define TO_VEC(p) ((uint64_t)(p) | VEC_MASK)
#define TO_STR(p) ((uint64_t)(p) | STR_MASK)
#define TO_SYM(p) ((uint64_t)(p) | SYM_MASK)
#define TO_MAP(p) ((uint64_t)(p) | MAP_MASK)
#define TO_SET(p) ((uint64_t)(p) | SET_MASK)
#define TO_INT(i) ((uint64_t)(i) | INT_MASK)
There are some additional object types that I added for my own use, but the idea should be the same.
int main() {
    Atom atom;
    atom.as_uint = TO_INT(-3);
    printf("%d\n", AS_INT(atom));
    printf("%d\n", INTP(atom));
    printf("%x\n", AS_INT(atom));
}
output:
-3
0
fffffffd
So from my understanding the negative int is stored as two's complement, which explains why all the bits got inverted and why the boxed value no longer matches INT_MASK. Dumping the raw bits seems to confirm this:
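For reference, two extra printfs right after atom.as_uint = TO_INT(-3); in main() show the whole 64-bit pattern (nothing new here, just the macros from above and a raw hex dump):

printf("%016llx\n", (unsigned long long)atom.as_uint);                 /* fffffffffffffffd */
printf("%016llx\n", (unsigned long long)(atom.as_uint & NANISH_MASK)); /* ffff000000000000, not 7ffc000000000000 */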
I was thinking about changing the INT_MASK to 0xfffff, but that then conflicts with the original representation of unsigned int (and with the other masks). Have I misunderstood something from the article? I guess the real question here is: what's the correct value for INT_MASK?
It doesn't work because of a bug. The author of the article didn't care about negative integers; he only verified that the idea works, i.e. that there is space to hold 32 bits of data. Negative numbers interfere with the code because, after sign extension to 64 bits, they already have some of the tag bits set to 1. To set the tag bits to a desired value, first clear them to 0 and then do a bitwise OR with the tag.
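For the integer case that boils down to something like this (same macro names as in the question; the cast through uint32_t is one way to throw away the sign-extended upper bits before the tag is OR-ed in, and the matching AS_INT truncates back to 32 bits):

#define TO_INT(i) (((uint64_t)(uint32_t)(i)) | INT_MASK)  /* keep only the low 32 bits, then tag */
#define AS_INT(v) ((int32_t)(uint32_t)((v).as_uint))      /* drop the tag by truncating to 32 bits */

With that change INTP(atom) should report 1, and AS_INT(atom) still gives -3, because the low 32 bits keep the two's-complement pattern.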
The same is actually true for pointers: they can have binary 1111 in the most significant bits, but such addresses are typically reserved for the OS kernel. Just like with integers, the bug exists but only manifests for less commonly used values.
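If you want the pointer tags to be defensive about that as well, the same clear-then-OR idea applies. A sketch for one of the macros (the 48-bit mask matches what AS_PTR already assumes; the other TO_* macros would follow the same pattern):

#define TO_STR(p) ((((uint64_t)(uintptr_t)(p)) & 0xFFFFFFFFFFFFULL) | STR_MASK)  /* keep the low 48 bits, then tag */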