Where does the word "dereferencing" come from?


This question will draw information from the draft N1570, so C11 basically.

Colloquially, to dereference a pointer means to apply the unary * operator to a pointer. There is only one place where the word "dereferencing" exists in the draft document (no instance of "dereference"), and it is in a footnote:

102) [...]

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime

As far as I can see, the unary * operator is actually called the "indirection operator", as evidenced by §6.5.3.2:

6.5.3.2 Address and indirection operators

4 The unary * operator denotes indirection. [...]

Similarly, it is explicitly called the indirection operator in Annex §J.2:

— The value of an object is accessed by an array-subscript [], member-access . or −>, address &, or indirection * operator or a pointer cast in creating an address constant (6.6).

So is it correct to talk about "dereferencing pointers" in C, or is this being excessively pedantic? Where does the terminology come from? (I can kinda give a pass on [] being called "dereferencing" due to §6.5.2.1)
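To illustrate the §6.5.2.1 remark: that section defines E1[E2] as identical to (*((E1)+(E2))), so subscripting is indirection in disguise. A minimal sketch, in which both function names are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, named here only for illustration. */

/* Reads element i using the subscript operator. */
int get_by_subscript(const int *a, size_t i)
{
    assert(a != NULL);
    return a[i];
}

/* Reads element i using pointer arithmetic plus the unary * operator. */
int get_by_indirection(const int *a, size_t i)
{
    assert(a != NULL);
    return *(a + i);
}
```

For any valid index, both functions should return the same element.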

4

There are 4 answers

4
Peter - Reinstate Monica On

Kernighan and Ritchie, The C Programming Language, 2nd ed., 5.1:

The unary operator * is the indirection or dereferencing operator; [...] ''pointer to void'' is used to hold any type of pointer but cannot be dereferenced itself.

0
John Bode On

I do not know the exact etymology, but one can consider a pointer value (in the generic sense, not the C/C++-specific meaning) as "referencing" another object in memory; that is, p refers to x. When we use p to obtain the value stored in x, we are bypassing that reference, or de-referencing p.
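A small C sketch of that reading, with p referring to x. The helper names read_through and write_through are made up for the example and are not standard functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the value of the int that p refers to. */
int read_through(const int *p)
{
    assert(p != NULL);  /* a null pointer is an invalid value for unary * */
    return *p;          /* follow the reference, i.e. "de-reference" p */
}

/* Hypothetical helper: stores v in the object that p refers to. */
void write_through(int *p, int v)
{
    assert(p != NULL);
    *p = v;             /* *p is an lvalue designating the referenced object */
}
```

With int x = 5; and int *p = &x;, read_through(p) yields the value of x, and write_through(p, 7) changes x itself.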

0
Morpfh On

K&R v1

If one looks at the first edition of The C Programming Language (1978), the term “indirection” is used.

Examples

2.12 Precedence and Order of Evaluation

[…]

Chapter 5 discusses * (indirection) and & (address of).

,

7.2 Unary operators

[…]

The unary * operator means indirection: the expression must be a pointer, and the result is an lvalue referring to the object to which the expression points.

It is also listed in the index, e.g.

* indirection operator 89, 187

A longer excerpt from section 5.1

5.1 Pointers and Addresses

      Since a pointer contains the address of an object, it is possible to access the object “indirectly” through the pointer. Suppose that x is a variable, say an int, and that px is a pointer, created in some as yet unspecified way. The unary operator & gives the address of an object, so the statement

px = &x;

assigns the address of x to the variable px; px is now said to “point to” x. The & operator can be applied only to variables and array elements; constructs like &(x+1) and &3 are illegal. It is also illegal to take the address of a register variable.

    The unary operator * treats its operand as the address of the ultimate target, and accesses that address to fetch the contents. Thus if y is also an int,

y = *px;

assigns to y the contents of whatever px points to. So the sequence

px = &x;
y = *px;

assigns the same value to y as does

y = x;

K&R v2

In the second edition, the term “dereferencing” appears.

5.1 Pointers and Addresses

The unary operator * is the indirection or dereferencing operator; when applied to a pointer, it accesses the object the pointer points to. Suppose that x and y are integers and ip is a pointer to int. This artificial sequence shows how to declare a pointer and how to use & and *:

[…]


Prior usage

The term is, however, considerably older, as can be seen in e.g.

A survey of some issues concerning abstract data types, 1974, e.g. pp. 24/25. There it is used in connection with ALGOL 68, PASCAL, and SIMULA 67.

The mechanism by which pointers are transformed into values by a language is known as 'dereferencing', a form of coercion (discussed later). Consider the statement

 p := q;

Depending upon the types of p and q, there are several possible interpretations.

Let '@' be a dereferencing operator (i.e. if p points to j, then @p is the same as j) and '#' be a referencing operation (i.e. if p points to j, then p is the same as #j). The following table indicates the possible actions a language might take to perform the assignment:

                       |                                         
                       |   type of p                             
                       |                                         
                       |   t         ref t     ref ref t . . .   
                       |                                         
        ---------------------------------------------------------
                       |                                         
           t           |  p←q        p←#q       p←##q            
                       |             @p←q       @p←#q            
                       |                        @@p←q            
type                   |                                         
of                     |                                         
q          ref t       |  p←@q       p←q        p←#q             
                       |             @p←@q      @p←q             
                       |                        @@p←@q           
                       |                                         
                       |                                         
           ref ref t   |  p←@@q      p←@q       p←q              
             .         |             @p←@@q     @p←@q            
             .         |                        @@p←@@q          
             .         |                                         
                       |                                         
                       |                                         

[…]


Coining

There are several other examples of its usage. Exactly where and when it was coined I am not able to find though (at least not yet). (The 1974 paper is at least interesting.)


For the fun of it, it can also be useful to look at mailing lists such as net.unix-wizards. An example from Peter Lamb at Melbourne Uni (11/28/83):

Dereferencing NULL pointers is yet another example of idiots who write 'portable' code, assuming however, that THEIR machine is the only one on which it will ever run: the same sorts of people who designed cpio with binary headers. Even on a VAX, dereferencing NULL will get you garbage: sure, *(char *)NULL and *(short *)NULL return you 0, but *(int *)NULL will give you 1024528128 !!!!.

[…]


Ed1. Addition

Although it does not mention “dereferencing”, an interesting read is Ritchie's The Development of the C Language.

Here the term “indirection” is also used consistently, and the connections between the languages are described in some detail. The use of the term is thus interesting in view of papers like the 1974 one mentioned above.

For an example of indirection as a concept and of its syntax, see e.g. pp. 12 ff.

    An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing.

[…]

There are two effects occurring. Most important, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C – Algol 68, for example – describe objects equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an ‘inside-out’ style that many find difficult to grasp [Anderson 80].


In this connection, it is also worth mentioning ANSI C89, which contains passages like:

3.1.2.5 Types

A pointer to void may not be dereferenced, although such a pointer may be converted to a normal pointer type which may be dereferenced.

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, or the address of an object that has automatic storage duration when execution of the block in which the object is declared and of all enclosed blocks has terminated.
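The first C89 sentence above can be sketched in C as follows. This is only an illustration: the function name int_from_void is hypothetical, and it assumes the caller really passes the address of an int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: vp is assumed to hold the address of an int. */
int int_from_void(const void *vp)
{
    assert(vp != NULL);  /* a null pointer is among the invalid values */

    /* No value can be read through vp directly, because void has no
       values. Convert to an object pointer type first. */
    const int *ip = vp;  /* implicit conversion from void * is allowed in C */

    return *ip;          /* the converted pointer may be dereferenced */
}
```

Given int n = 42;, the call int_from_void(&n) reads n through the converted pointer.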

(I have to re-read some of these documents now.)

3
Serge Ballesta On

Because in the good old days of K&R C, the language only passed parameters by value, so pointers were used to simulate passing parameters by reference. And people (incorrectly) spoke of taking a reference to a variable when constructing a pointer to it.

And the dereferencing of a pointer was the opposite operation.
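A minimal sketch of that idiom, using the classic swap as an example. The name swap_ints is chosen here for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Simulated pass by reference: the pointers themselves are passed by
   value, and the callee dereferences them to reach the caller's objects. */
void swap_ints(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    int tmp = *a;  /* read the caller's first object through a */
    *a = *b;       /* overwrite it with the second object's value */
    *b = tmp;      /* store the saved value in the second object */
}
```

The caller "takes a reference" with &, as in swap_ints(&x, &y);, and the function undoes that step with *.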

Now C++ uses true references that are distinct from pointers, but the word dereference is still used (even if it is not really correct).