Purpose of integer literal suffix in left shift


In C, many operations involve bit shifting, and the value being shifted is often an integer literal. For example, consider the following code snippet:

#define test_bit(n, flag) (1UL << (n) & (flag))

As far as I know, the integer literal suffix UL is supposed to suppress unwanted behavior in a shift, e.g. sign extension of a signed integer, which may result in multiple bits being set. However, if only a left shift is performed, as shown above, do we still need the suffix?

Since a left shift shouldn't cause unintended behavior, I can't figure out what its purpose is. Code like the above often appears in projects such as the Linux kernel, which makes me think there must be a need for it. Does anyone know the purpose of the UL suffix in this case?


There are 3 answers

ikegami (best answer)

Sign extending only applies to right shifts, so that's not applicable.


<< is defined as follows:

C23 §6.5.7 ¶4: The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, wrapped around. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

There are two ways in which left-shifting values can result in undefined behaviour based on E1:[1]

  • E1 has a signed type and negative value.
  • E1 has a signed type and nonnegative value, and E1 × 2^E2 is unrepresentable.

In our case, E1 is a positive value, so the former isn't applicable. However, the latter could apply depending on the type of E1.
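
For instance, here is a minimal sketch of both cases, assuming a 32-bit int:

#include <stdio.h>

int main(void)
{
    unsigned long ok = 1UL << 31;     /* unsigned left operand: well defined            */
    printf("%lu\n", ok);              /* prints 2147483648                              */

    /* int bad1 = -1 << 3; */         /* E1 signed and negative: undefined behaviour    */
    /* int bad2 =  1 << 31; */        /* E1 signed, 1 * 2^31 does not fit in a 32-bit   */
                                      /* int: undefined behaviour (the case the UL      */
                                      /* suffix avoids in test_bit)                     */
    return 0;
}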

Let's look at what results we get for different types on two systems.

  • System "L" has a 32-bit int and a 64-bit long (e.g. Linux on x86-64).
  • System "W" has a 32-bit int and a 32-bit long (e.g. Windows on x86-64).

Implementation    Usage                 Result on "L"                 Result on "W"
1 << (n)          test_bit(31, flag)    Undefined behaviour           Undefined behaviour
1L << (n)         test_bit(31, flag)    ok (since long is 64 bits)    Undefined behaviour
1U << (n)         test_bit(31, flag)    ok                            ok
1U << (n)         test_bit(63, flag)    Incorrect result
1L << (n)         test_bit(63, flag)    Undefined behaviour
1UL << (n)        test_bit(63, flag)    ok

So, assuming you want to be able to test any of the bits of flag:

  • 1U is needed if flag can be a signed int or an unsigned int or shorter.
  • 1UL is needed if flag can also be a signed long or an unsigned long.

  1. Undefined behaviour can also result from the value of E2: the shift is undefined if E2 is negative, or greater than or equal to the width of the (promoted) E1. This puts a constraint on the valid values for test_bit's first argument.
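
To make the table concrete, here is a minimal sketch of the "ok" rows, assuming system "L" (32-bit int, 64-bit long):

#include <stdio.h>

#define test_bit(n, flag) (1UL << (n) & (flag))

int main(void)
{
    unsigned long flag = (1UL << 31) | (1UL << 63);   /* bits 31 and 63 set */

    printf("%d\n", test_bit(31, flag) != 0);   /* prints 1: 1UL is wide enough for bit 31 */
    printf("%d\n", test_bit(63, flag) != 0);   /* prints 1: and for bit 63                */
    printf("%d\n", test_bit(0,  flag) != 0);   /* prints 0: bit 0 is not set              */

    /* A plain 1, 1U, or 1L mask would fail for one of these calls,
       as the table above shows. */
    return 0;
}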
Nate Eldredge

If your int is 32 bits, and you have

#define test_bit(n, flag) ((1 << (n)) & (flag))

then test_bit(31, flag) has undefined behavior because of signed integer overflow (see: Why is unsigned integer overflow defined behavior but signed integer overflow isn't?).

Making the 1 an unsigned type avoids the UB. Making it unsigned long (which is what 1UL achieves) allows the same macro to be used for masks that are wider. For instance, on a system where int is 32 bits and long is 64, by using 1UL you can safely use bits up through test_bit(63, flag).
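
As a sketch of that point, under the same assumptions (32-bit int, 64-bit long), one macro then serves flag words of either width:

#include <stdio.h>

#define test_bit(n, flag) ((1UL << (n)) & (flag))

int main(void)
{
    unsigned int  narrow = 1U  << 20;   /* a 32-bit flag word */
    unsigned long wide   = 1UL << 40;   /* a 64-bit flag word */

    printf("%d\n", test_bit(20, narrow) != 0);   /* prints 1 */
    printf("%d\n", test_bit(40, wide)   != 0);   /* prints 1 */
    return 0;
}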

John Bollinger

the integer literal "UL"

... is not called an "integer literal". That term is not used in the C language specification at all, and it risks confusion with "integer constant", of which the whole 1UL would be an example. Informally, you might call a (whole) integer constant an "integer literal", but not the "UL" part alone. The "UL" by itself can be called a "suffix", and if we draw names from the formal grammar in the language spec, we could call it an "integer suffix" more specifically.

IMHO, the integer literal "UL" is suppose to suppress unwanted shift, e.g. sign-extending a signed integer may result in multiple bits being set.

The primary purpose of expressing an integer constant with a suffix is to control its data type. 1UL has type unsigned long int, whereas unsuffixed 1 has type int.
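
A small illustration of my own (using C11's _Generic) makes the types visible:

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),              \
        int:           "int",                   \
        unsigned long: "unsigned long",         \
        default:       "something else")

int main(void)
{
    puts(TYPE_NAME(1));     /* prints "int"            */
    puts(TYPE_NAME(1UL));   /* prints "unsigned long"  */
    return 0;
}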

However, if the case is, doing logical left shift only, as shown above, do we still need the integer literal?

In a bitwise shift operation, the type of the result is the type of the (promoted) left operand, and that type can affect the value of the result. It can also affect whether evaluating the expression has defined behavior at all.
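
For instance, here is a sketch assuming a 32-bit unsigned int and a 64-bit unsigned long; the complement is added only to make the width difference visible:

#include <stdio.h>

int main(void)
{
    unsigned long mask_int  = ~(1U  << 4);   /* complement computed in 32 bits, then widened */
    unsigned long mask_long = ~(1UL << 4);   /* complement computed in 64 bits               */

    printf("%#lx\n", mask_int);    /* prints 0xffffffef          */
    printf("%#lx\n", mask_long);   /* prints 0xffffffffffffffef  */
    return 0;
}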

In the case of the particular macro you present, whether it is important to use 1UL instead of just 1 depends on how the macro is used. But there are wholly reasonable uses for which using 1UL produces the desired effect but just 1 does not.

As a left shift won't cause un-intended behavior,

Did you mean undefined behavior? A left shift of a value of a signed type (even a positive one) absolutely can have undefined behavior as far as C is concerned. And in cases where the behavior is undefined, you are totally unjustified in assuming the result will be what you expect or intend.

I can't figure what is its purpose then.

Even if we ignore questions around signedness, if int and long int are different sizes, as they often are, then 1U << n may yield a different, well-defined result than 1UL << n does. In such a case, it is unreasonable to expect 1 << n to evaluate to the same result as 1UL << n, undefined behavior notwithstanding.
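
As a sketch (again assuming a 32-bit int and a 64-bit long), for shift counts above 31 only the 1UL form is even well defined:

#include <stdio.h>

int main(void)
{
    int n = 40;

    /* 1  << n */   /* undefined: the shift count reaches past int's 32 bits */
    /* 1U << n */   /* undefined for the same reason                         */

    printf("%lu\n", 1UL << n);   /* well defined: prints 1099511627776 (bit 40) */
    return 0;
}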