Integral promotion/conversion: why should I care about the name of the resulting type?

I have been trying to wrap my head around the C99 rules of integral promotion and usual arithmetic conversions of integral types. After burning a few neurons, I came up with a set of rules of my own, which are a lot simpler and yet, I believe, equivalent to the official ones:

Update: for the purpose of this question, I start by defining “physical type” as follows

Definition: two integral types are the same physical type if they have the same size and signedness.

If you think there is something wrong with this definition, then you probably have a good answer for question 2 below.

Simplified promotion/conversion rules

type ranking: among two integral types T1 and T2, the "best" is:

  • whichever is larger
  • if they have the same size, whichever is unsigned
  • if they have the same size and signedness, either of them, as they are physically the same anyway.

integral promotion: a value of type T should be promoted to

promoted(T) = best(T, int)

usual arithmetic conversions of integral types: before evaluating a binary operator on types T1 and T2, the arguments should be converted to a suitable common type which is:

common(T1, T2) = best(T1, T2, int)
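The three rules above can be modeled in a few lines of C. This is only a sketch of my simplified rules, not of the official ones: struct phys_type and the function names best, promoted and common are hypothetical names chosen to mirror the notation used here, and a "physical type" is reduced to a size in bytes plus a signedness flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a "physical type": size in bytes plus signedness. */
struct phys_type {
    size_t size;
    bool   is_unsigned;
};

/* best(T1, T2): the larger type wins; at equal size, the unsigned one wins. */
struct phys_type best(struct phys_type a, struct phys_type b)
{
    assert(a.size > 0 && b.size > 0);   /* sizes must be meaningful */
    if (a.size != b.size)
        return a.size > b.size ? a : b;
    return a.is_unsigned ? a : b;
}

/* promoted(T) = best(T, int) */
struct phys_type promoted(struct phys_type t)
{
    const struct phys_type int_type = { sizeof(int), false };
    return best(t, int_type);
}

/* common(T1, T2) = best(T1, T2, int) */
struct phys_type common(struct phys_type a, struct phys_type b)
{
    return promoted(best(a, b));
}
```

Because best picks the maximum of a total order (size first, then signedness), applying it pairwise gives the same answer as applying it to all three types at once.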

Caveat: although I believe my rules give the correct physical type, they may not provide the correct type name in cases where a single type has different names. For example, on systems where int==long, the official rules say

common(unsigned int, long) = unsigned long

whereas my rules say it's unsigned int (which is physically the same anyway). But this should not be a problem, since the names do not really matter. Or do they?

After this long prelude, here comes the real question, which is two-fold:

Question 1: Are my rules correct?

I read the official ones several times, but I still find them confusing. Thus, I may have misunderstood something. If I am wrong, please provide an example where the official rules and my rules yield different types. I mean: different physical types, not just different types that are physically the same.

A real world example would be preferred. If none can be found, a theoretical example would be OK if the hypothetical C environment is described with enough detail to be convincing (sizes of the relevant types, etc.).

If I am correct here, then the second question becomes relevant.

Question 2: Why should I care about the names of the promoted/converted types?

If I am correct, then the obvious question is "Why did the people in the standards committee write such complicated rules?". The only answer I can think of is that they wanted to specify not only the physical types yielded by the promotion/conversion, but also the proper way to name those types. But then, why did they care? Those types are only used internally by the compiler, so it does not matter how we name them as long as we understand what they physically are. Is there something wrong with this reasoning? Can you think of a situation where T1 and T2 are physically the same and yet it matters to know whether things are automatically promoted or converted to T1 rather than T2? Again, a real world example would be preferred, otherwise a theoretical example would do if it is detailed enough.

Rationale

(section added on 2014-11-10, to address some comments)

When searching for these topics, I have always seen them discussed in the context of the behavior of arithmetic operators, and more specifically the results returned by those operators. For example, the expression -1L < 1U is problematic, because it is true on systems where longs are really longer than ints, but false otherwise. I believe it is a good thing to understand this sort of problem, yet a bad thing to need a complex ruleset to do so. Hence this effort to build a simpler ruleset that reliably gives the same results.
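To make the -1L < 1U example concrete, here is a small sketch that evaluates the comparison and checks it against the prediction "true exactly when long can hold every unsigned int value". The function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The comparison from the text, evaluated with the platform's own types. */
bool minus_one_long_lt_one_unsigned(void)
{
    return -1L < 1U;
}

/* True when long can represent every unsigned int value, i.e. when
 * long is "really longer" than int. Both operands are positive, so the
 * conversion applied to this comparison does not distort it. */
bool long_holds_all_unsigned_ints(void)
{
    return LONG_MAX >= UINT_MAX;
}

/* If long is wider, 1U is converted to long and -1 < 1 holds.
 * Otherwise both operands become unsigned long and -1 wraps to ULONG_MAX. */
void check_prediction(void)
{
    assert(minus_one_long_lt_one_unsigned() == long_holds_all_unsigned_ints());
}
```

Both the official rules and the simplified ones predict the same outcome here, which is the kind of agreement I am after.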

I fully understand that my rules are useless to anyone who finds the real ones simple enough. I also understand, and respectfully disagree with, those who express the opinion that relying on anything but the official rules is inherently bad. My rules would nevertheless have their usefulness, should they help nobody but me.

About my personal bias: As a physicist, I value simplicity very highly. I am used to dealing with theories that are not, and are not meant to be, the ultimate truth, yet they prove immensely useful, and safe to use as long as you understand their limits of applicability. In any given situation, the best theory is not the most complete: it's the simplest one that is still applicable. For example: I would not use quantum gravity to compute the period of a simple pendulum. My posting this question here is an attempt to get expert opinion on the limits of applicability of the rules above.

So far what I have is:

  • the varargs case (thanks, mafso), which seems to be the only situation in C99 where these rules are, at least in principle, not applicable
  • the _Generic keyword (thanks, Pascal Cuoq) which, being a C11 feature, is slightly out of scope
  • the C++11 auto keyword, which is further out of scope, but interesting nonetheless in that it would bring to the table the (otherwise irrelevant) concerns about aliasing rules.

There are 2 answers

1
Pascal Cuoq On

Regarding your first question, I think the answer is “yes”: on all normal or even slightly exotic platforms, your proposed rules yield a type of the same representation as the standard's rules.

Regarding your second question, here are two situations where the “names” of the types matter (and I am only using your terminology for clarity; in the phraseology of the standard, long and int are incompatible types even when they happen to be the same size):

C11's _Generic construct: a long expression does not match the int case even if both are 32-bit representations of integers.
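A minimal sketch of that first point, assuming a C11 compiler: TYPE_NAME is a hypothetical helper macro (not a standard facility) that maps an expression to the name of its type. Since int and long are always distinct types, both can be listed as associations, whatever their sizes.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: select a string according to the type of x. */
#define TYPE_NAME(x) _Generic((x),      \
    int:           "int",               \
    unsigned int:  "unsigned int",      \
    long:          "long",              \
    unsigned long: "unsigned long",     \
    default:       "other")

const char *name_of_int_literal(void)  { return TYPE_NAME(1); }
const char *name_of_long_literal(void) { return TYPE_NAME(1L); }
const char *name_of_mixed_sum(void)    { return TYPE_NAME(1U + 1L); }

void check_type_names(void)
{
    /* The two literals are told apart even where int and long have the same size. */
    assert(strcmp(name_of_int_literal(), "int") == 0);
    assert(strcmp(name_of_long_literal(), "long") == 0);
    /* unsigned int + long: long if it can hold every unsigned int value,
     * otherwise unsigned long (never unsigned int). */
    assert(strcmp(name_of_mixed_sum(),
                  LONG_MAX >= UINT_MAX ? "long" : "unsigned long") == 0);
}
```

On a platform where int and long have the same size, the last check is precisely the case where the official rules name the result unsigned long while the simplified rules would call it unsigned int.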

Strict aliasing: the compiler is allowed to generate code that assumes that an int variable does not change when you modify a long lvalue. In particular, statements 1 and 3 in the code below can be optimized to return 1;:

{
  long *p;
  int x;
  …
  x = 1; /* 1 */
  *p = 2;
  return x; /* 3 */
}
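For experimenting, the fragment can be turned into a compilable function. This is a sketch under one assumption of mine: the pointers are passed in as parameters, because the fragment above elides how p is initialized. The names store_and_reload and demo_distinct_objects are hypothetical.

```c
#include <assert.h>

/* The compiler may assume that *p (a long) never overlaps *x (an int),
 * so the final load is allowed to be replaced by the constant 1. */
int store_and_reload(int *x, long *p)
{
    *x = 1;     /* 1 */
    *p = 2;
    return *x;  /* 3 */
}

/* Well-defined use: the two pointers refer to distinct objects. */
void demo_distinct_objects(void)
{
    int  x = 0;
    long y = 0;
    assert(store_and_reload(&x, &y) == 1);
    assert(y == 2);
}
```

Calling the function with an int pointer cast to long * so that both arguments designate the same object would be undefined behavior, even on a platform where int and long have the same size; that is exactly why the distinction between the two names matters here.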

Incidentally, the standard does not allow printf("%d", 1L) or printf("%ld", 1) either, even when both types are the same size, although it will happen to work on most platforms. (I do not include this as a significant example because, unlike the two examples above, it would not be a significant change to the standard to decree that it should work when the types have the same representation.)
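The portable way around this is to make the argument type match the conversion specifier, for instance with an explicit cast. A small sketch, where the function name format_int_as_long is hypothetical and snprintf is used instead of printf only so that the result can be inspected:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "%ld" requires a long argument, so the int is converted explicitly
 * rather than relying on int and long happening to have the same size. */
int format_int_as_long(char *buf, size_t n, int v)
{
    return snprintf(buf, n, "%ld", (long)v);
}

void check_formatting(void)
{
    char buf[32];
    int written = format_int_as_long(buf, sizeof buf, 42);
    assert(written == 2);
    assert(strcmp(buf, "42") == 0);
}
```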

0
supercat On

Because various platforms started to use C as a language before efforts were made to officially standardize it, C compilers for various unusual platforms implemented things in various interesting ways. Rather than compel compilers for such platforms to use new rules which would be incompatible with already-existing code, the authors of the standard attempted to write in lots of wrinkles and nuances to accommodate the behaviors of all those weird architectures.

In practice, on most platforms one could get by with a vastly simplified version of the rules. There is no way to assert at compile time that types with identical ranges are equivalent (e.g. I know of nothing in the Standard that would require 32-bit unsigned int values to use the same byte ordering as 32-bit unsigned long), and pointers to distinct types must be considered semantically distinct even if the types they point to are physically identical. As noted elsewhere, however, the only situation where a compiler would be allowed to have rvalues of types unsigned int and unsigned long behave differently when both have the same representation is when those values are passed as variadic arguments. That situation is a lot messier than it should be, and it led to the downfall of the long double type.

If variadic functions' prototypes could have specified "convert all integer values to X, and all floating-point values to type Y", and "varargs.h" and "stdarg.h" had remained as distinct features (the former being used for functions without such prototypes and the latter for functions with them), that would have avoided the need to worry about whether an int32_t is an int or a long. It is too late to avoid the problems resulting from the lack of that feature, though it might nonetheless be worth adding to alleviate problems going forward.

Perhaps someday someone will split off a separate "Normative C" language which uses simplified rules that yield the same behaviors as C on modern platforms, but could only be implemented on obscure platforms if they conform to modern usage (e.g. machines with sign/magnitude hardware would be required to compute i+j as (int)(((i ^ 0x80000000u)+j)^0x80000000u)), in which case aspects of the standard which are only applicable to quirky machines could be omitted.