I have been trying to wrap my head around the C99 rules of integral promotion and usual arithmetic conversions of integral types. After burning a few neurons, I came up with a set of rules of my own, which are a lot simpler and yet, I believe, equivalent to the official ones.
Update: for the purpose of this question, I start by defining “physical type” as follows:
Definition: two integral types are the same physical type if they have the same size and signedness.
If you think there is something wrong with this definition, then you probably have a good answer for question 2 below.
Simplified promotion/conversion rules
type ranking: among two integral types T1 and T2, the "best" is:
- whichever is larger
- if they have the same size, whichever is unsigned
- if they have the same size and signedness, either of them, as they are physically the same anyway.
integral promotion: a value of type T should be promoted to
promoted(T) = best(T, int)
usual arithmetic conversions of integral types: before evaluating a binary operator on types T1 and T2, the arguments should be converted to a suitable common type which is:
common(T1, T2) = best(T1, T2, int)
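The three rules above can be sketched in C as follows. This is only a model of the simplified rules from this question, not of the standard's wording; `struct phys_type`, `PHYS`, `best`, `promoted` and `common` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* A "physical type" in the sense defined above: size plus signedness. */
struct phys_type {
    size_t size;       /* sizeof the type, in bytes */
    bool is_unsigned;  /* true for unsigned types */
};

/* Hypothetical helper: describe a real C integer type T. */
#define PHYS(T) ((struct phys_type){ sizeof(T), (T)-1 > (T)0 })

/* best(T1, T2): the larger type wins; at equal size, the unsigned one wins. */
static struct phys_type best(struct phys_type a, struct phys_type b)
{
    if (a.size != b.size)
        return a.size > b.size ? a : b;
    return a.is_unsigned ? a : b;
}

/* promoted(T) = best(T, int) */
static struct phys_type promoted(struct phys_type t)
{
    return best(t, PHYS(int));
}

/* common(T1, T2) = best(T1, T2, int) */
static struct phys_type common(struct phys_type t1, struct phys_type t2)
{
    return best(best(t1, t2), PHYS(int));
}
```

For instance, `common(PHYS(unsigned char), PHYS(short))` describes a signed type of size `sizeof(int)`, which can be compared against what a compiler reports for `sizeof((unsigned char)1 + (short)1)`.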
Caveat: although I believe my rules give the correct physical type, they may not provide the correct type name in cases where a single type has different names. For example, on systems where `int==long`, the official rules say `common(unsigned int, long) = unsigned long` whereas my rules say it's `unsigned int` (which is physically the same anyway). But this should not be a problem, since the names do not really matter. Or do they?
After this long prelude, here comes the real question, which is two-fold:
Question 1: Are my rules correct?
I read the official ones several times, but I still find them confusing. Thus, I may have misunderstood something. If I am wrong, please provide an example where the official rules and my rules yield different types. I mean: different physical types, not just different types that are physically the same.
A real world example would be preferred. If none can be found, a theoretical example would be OK if the hypothetical C environment is described with enough detail to be convincing (sizes of the relevant types, etc.).
If I am correct here, then the second question becomes relevant.
Question 2: Why should I care about the names of the promoted/converted types?
If I am correct, then the obvious question is "Why did the people in the standards committee write such complicated rules?". The only answer I can think of is that they wanted to specify not only the physical types yielded by the promotion/conversion, but also the proper way to name those types. But then, why did they care? Those types are only used internally by the compiler; it does not matter how we name them as long as we understand what they physically are. Is there something wrong with this reasoning? Can you think of a situation where T1 and T2 are physically the same and yet it matters to know whether things are automatically promoted or converted to T1 rather than T2? Again, a real world example would be preferred, otherwise a theoretical example would do if it is detailed enough.
Rationale
(section added on 2014-11-10, to address some comments)
When searching those topics, I have always seen them discussed in the context of the behavior of arithmetic operators, and more specifically the results returned by those operators. For example, the expression `-1L < 1U` is problematic, because it is true on systems where longs are really longer than ints, but false otherwise. I believe it is a good thing to understand this sort of problem, yet a bad thing to need a complex ruleset to do so. Hence this effort to build a simpler ruleset that reliably gives the same results.
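The `-1L < 1U` case can be checked with a small sketch like the one below. The function name is hypothetical, and the comment assumes an ordinary platform where `long` has no padding bits.

```c
#include <limits.h>

/* Hypothetical helper: evaluates the expression discussed above. */
int minus_one_long_lt_one_unsigned(void)
{
    /* If long can represent every unsigned int value, 1U is converted to
       long and the comparison is -1 < 1, which is true.
       Otherwise both operands are converted to unsigned long, -1L becomes
       ULONG_MAX, and the comparison is false. */
    return -1L < 1U;
}
```

The result is expected to be 1 where `LONG_MAX >= UINT_MAX` holds (for example, typical 64-bit Linux) and 0 where `long` and `int` have the same width.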
I fully understand that my rules are useless to anyone who finds the real ones simple enough. I also understand, and respectfully disagree with, those who express the opinion that relying on anything but the official rules is inherently bad. My rules would nevertheless have their usefulness, should they help nobody but me.
About my personal bias: As a physicist, I value simplicity very highly. I am used to dealing with theories that are not, nor are meant to be, the ultimate truth, yet they prove immensely useful, and safe to use as long as you understand their limits of applicability. In any given situation, the best theory is not the most complete: it's the simplest one that is still applicable. For example: I would not use quantum gravity to compute the period of a simple pendulum. My posting this question here is an attempt to get expert opinion on the limits of applicability of the rules above.
So far what I have is:
- the varargs case (thanks, mafso), which seems to be the only situation in C99 where these rules are, at least in principle, not applicable
- the `_Generic` keyword (thanks, Pascal Cuoq) which, being a C11 feature, is slightly out of scope
- the C++11 `auto` keyword, which is further out of scope, but interesting nonetheless in that it would bring to the table the (otherwise irrelevant) concerns about aliasing rules.
Regarding your first question, I think the answer is “yes”: on all normal or even slightly exotic platforms, your proposed rules yield a type of the same representation as the standard's rules.
Regarding your second question, here are two situations where the “names” of the types matter (and I am only using your terminology for clarity; in the phraseology of the standard, `long` and `int` are incompatible types even when they happen to be the same size):
- C11's `_Generic` construct: a `long` expression does not match the `int` case even if both are 32-bit representations of integers.
- Strict aliasing: the compiler is allowed to generate code that assumes that an `int` variable does not change when you modify a `long` lvalue. In particular, statements 1 and 3 in the code below can be optimized to `return 1;`.
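A minimal sketch of the kind of function the strict-aliasing point describes. The names `f`, `pi` and `pl` are hypothetical and chosen only for illustration:

```c
/* Hypothetical function: pi and pl are illustrative names. */
int f(int *pi, long *pl)
{
    *pi = 1;     /* statement 1: store through an int lvalue */
    *pl = 2;     /* statement 2: store through a long lvalue */
    return *pi;  /* statement 3: the compiler may assume *pi is still 1 */
}
```

Because `int` and `long` are incompatible types, a compiler may assume that the store through `pl` cannot modify `*pi`, so in this sketch the final read can be folded into `return 1;` without reloading `*pi`, even on a platform where both types are 32 bits wide. Calling `f` with two pointers to the same object would violate the aliasing rules.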
Incidentally, the standard does not allow `printf("%d", 1L)` or `printf("%ld", 1)` either, even when both types are the same size, although it will happen to work on most platforms (I do not include this as a significant example because it would not be a significant change to the standard to decree that it should work when the types have the same representation, unlike the two examples above).
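The `_Generic` point above can be illustrated with a short sketch, assuming a C11 compiler. `IS_INT` is a hypothetical macro, not a standard facility:

```c
/* Hypothetical macro: yields 1 if the expression has type int, 0 otherwise. */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

/* Hypothetical helpers wrapping the two cases of interest. */
int int_literal_is_int(void)  { return IS_INT(1); }   /* 1 has type int   */
int long_literal_is_int(void) { return IS_INT(1L); }  /* 1L has type long */
```

Here `IS_INT(1L)` selects the `default` branch regardless of whether `long` and `int` have the same size, because selection is based on type compatibility rather than on representation.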