I'm reading Bjarne's paper: Multiple Inheritance for C++.
In section 3, page 370, Bjarne said that "The compiler turns a call of a member function into an "ordinary" function call with an "extra" argument; that "extra" argument is a pointer to the object for which the member function is called."
I'm confused by the extra this argument. Please see the following two examples:
Example 1:(page 372)
class A {
int a;
virtual void f(int);
virtual void g(int);
virtual void h(int);
};
class B : A {int b; void g(int); };
class C : B {int c; void h(int); };
An object of class C looks like:
C:
-----------                  vtbl:
+0:  vptr -------------->    -----------
+4:  a                       +0: A::f
+8:  b                       +4: B::g
+12: c                       +8: C::h
-----------                  -----------
A call to a virtual function is transformed into an indirect call by the compiler. For example,
C* pc;
pc->g(2)
becomes something like:
(*(pc->vptr[1]))(pc, 2)
That is the conclusion I take from Bjarne's paper: the this pointer passed here is a C*.
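To make the transformed call concrete, here is a minimal hand-rolled sketch of the same idea: a table of plain function pointers and an explicit object argument. The names Obj, Fn, A_f, B_g, C_h, vtbl_for_C and call_g are made up for illustration, and a real compiler's layout and calling sequence may differ.

```cpp
#include <cassert>

// Hand-rolled stand-in for the layout in example 1. All names are illustrative.
struct Obj;
using Fn = int (*)(Obj* self, int arg);  // "ordinary" function with an extra this-like argument

struct Obj {
    const Fn* vptr;  // pointer to the table of function pointers
    int a;           // A's member
    int b;           // B's member
    int c;           // C's member
};

// Stand-ins for A::f, B::g and C::h; each receives the object pointer explicitly.
int A_f(Obj* self, int x) { return self->a + x; }
int B_g(Obj* self, int x) { return self->b + x; }
int C_h(Obj* self, int x) { return self->c + x; }

// The table for class C: slot 0 is A::f, slot 1 is B::g, slot 2 is C::h.
const Fn vtbl_for_C[] = { A_f, B_g, C_h };

// pc->g(2) written out by hand in the shape (*(pc->vptr[1]))(pc, 2).
int call_g(Obj* pc, int x) { return (*(pc->vptr[1]))(pc, x); }

// Small self-check: the call lands in B_g, which reads b through the same pointer.
void demo() {
    Obj c{ vtbl_for_C, 10, 20, 30 };
    assert(call_g(&c, 2) == 22);
}
```

Calling demo() from main runs the check. The point of the sketch is only that the same pointer value serves as the object pointer for every function in the table, which is what example 1 relies on.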
The next example tells a different story, which totally confused me!
Example 2:(page 373)
Given two classes
class A {...};
class B {...};
class C: A, B {...};
An object of class C can be laid out as a contiguous object like this:
pc-->              -----------
                   A part
B::bf's this-->    -----------
                   B part
                   -----------
                   C part
                   -----------
Calling a member function of B given a C*:
C* pc;
pc->bf(2); //assume that bf is a member of B and that C has no member named bf.
Bjarne wrote: "Naturally, B::bf() expects a B* (to become its this pointer)." The compiler transforms the call into:
bf__F1B((B*)((char*)pc+delta(B)), 2);
Why do we need a B* as the this pointer here? If we just passed a C* as this, I think we could still access the members of B correctly. For example, to get a member of class B inside B::bf(), we would just need something like *(this+offset), where the offset is known to the compiler. Is this right?
Follow up questions for example 1 and 2:
(1) With a linear chain of derivation (example 1), why can the C object be expected to be at the same address as its B and, in turn, A sub-objects? Is there no problem with using a C* to access class B's members inside the function B::g in example 1? For example, if we want to access the member b, what happens at runtime? *(pc+8)?
(2) Why can't we use the same memory layout (as for linear chain derivation) for multiple inheritance? Assume that in example 2, classes A, B and C have exactly the same members as in example 1. A: int a and f; B: int b and bf (or call it g); C: int c and h. Why not just use a memory layout like:
-----------
+0: a
+4: b
+8: c
-----------
(3) I've written some simple code to test the differences between linear chain derivation and multiple inheritance.
class A {...};
class B : A {...};
class C: B {...};
C* pc = new C();
B* pb = NULL;
pb = (B*)pc;
A* pa = NULL;
pa = (A*)pc;
cout << pc << " " << pb << " " << pa << endl;
It shows that pa, pb and pc have the same address.
class A {...};
class B {...};
class C: A, B {...};
C* pc = new C();
B* pb = NULL;
pb = (B*)pc;
A* pa = NULL;
pa = (A*)pc;
Now pc and pa have the same address, while pb is at some offset from pa and pc.
Why does the compiler make this difference?
Example 3:(page 377)
class A {virtual void f();};
class B {virtual void f(); virtual void g();};
class C: A, B {void f();};
A* pa = new C;
B* pb = new C;
C* pc = new C;
pa->f();
pb->f();
pc->f();
pc->g();
(1) The first question is about pc->g(), which relates to the discussion in example 2. Does the compiler do the following transformation:
pc->g() ==> g__F1B((B*)((char*)pc+delta(B)))
Or do we have to wait until runtime to do this?
(2) Bjarne wrote: "On entry to C::f, the this pointer must point to the beginning of the C object (and not to the B part). However, it is not in general known at compile time that the B pointed to by pb is part of a C so the compiler cannot subtract the constant delta(B)."
Why can't we know at compile time that the B object pointed to by pb is part of a C? Based on my understanding, with B* pb = new C, pb points to a newly created C object and C inherits from B, so the B pointer pb points to part of a C.
(3) Assume that we do not know at compile time that the B pointed to by pb is part of a C. Then delta(B) has to be stored for use at runtime, and it is actually stored in the vtbl. So a vtbl entry now looks like:
struct vtbl_entry {
void (*fct)();
int delta;
};
Bjarne wrote:
pb->f() // call of C::f:
register vtbl_entry* vt = &pb->vtbl[index(f)];
(*vt->fct)((B*)((char*)pb+vt->delta)) //vt->delta is a negative number I guess
I'm totally confused here. Why is it (B*) and not (C*) in (*vt->fct)((B*)((char*)pb+vt->delta))? Based on my understanding, and on Bjarne's first sentence of section 5.1 on page 377, we should be passing a C* as this here!
Right after the above code snippet, Bjarne continues: "Note that the object pointer may have to be adjusted to point to the correct sub-object before looking for the member pointing to the vtbl."
Oh, man! I have no idea what Bjarne is trying to say here. Can you help explain it?
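One way to see what the delta stored in the vtbl buys is to model the scheme by hand. The sketch below is only an illustration under stated assumptions: APart, BPart, CObj, C_f, B_f, call_f and both tables are invented names, the object pointer is passed as an untyped address, and no claim is made that any particular compiler generates code of this shape.

```cpp
#include <cassert>
#include <cstddef>

// Hand-rolled model of a vtbl entry that carries an adjustment. Names are illustrative.
struct vtbl_entry {
    int (*fct)(void* self);  // function expecting a pointer to the object it was written for
    std::ptrdiff_t delta;    // byte adjustment applied to the caller's pointer first
};

struct APart { const vtbl_entry* vtbl; int a; };
struct BPart { const vtbl_entry* vtbl; int b; };
struct CObj  { APart a_part; BPart b_part; int c; };  // C: A, B laid out contiguously

// Stand-in for C::f: expects self to point at the start of the whole C object.
int C_f(void* self) { return static_cast<CObj*>(self)->c; }
// Stand-in for B::f: expects self to point at a B part.
int B_f(void* self) { return static_cast<BPart*>(self)->b; }

// Distance from the start of a C to its B part, i.e. the model's delta(B).
const std::ptrdiff_t delta_B = offsetof(CObj, b_part);

// Table reached through the B part of a C: f resolves to C_f, with -delta(B)
// so that the pointer to the B part is moved back to the start of the C.
const vtbl_entry vtbl_B_in_C[] = { { C_f, -delta_B } };
// Table of a standalone B: no adjustment needed.
const vtbl_entry vtbl_B[] = { { B_f, 0 } };

// pb->f(): the call site only knows it holds a pointer to some B part.
int call_f(BPart* pb) {
    const vtbl_entry* vt = &pb->vtbl[0];
    return (*vt->fct)(reinterpret_cast<char*>(pb) + vt->delta);
}

// Small self-check covering both dynamic types behind the same call site.
void demo() {
    CObj c{ { nullptr, 1 }, { vtbl_B_in_C, 2 }, 3 };
    BPart b{ vtbl_B, 7 };
    assert(call_f(&c.b_part) == 3);  // adjusted back to the whole C, reads c
    assert(call_f(&b) == 7);         // plain B, delta is 0
}
```

The same call_f code handles both objects without knowing which one it was given; the table entry found at runtime supplies both the function and the adjustment.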
Consider B in isolation: the compiler needs to be able to compile code a la B::bf(B* this). It doesn't know what classes might be further derived from B (and the derived code might not be introduced until long after B::bf is compiled). The code for B::bf won't magically know how to transform a pointer of some other type (e.g. C*) into a B* it can use to access data members and runtime type information (RTTI / virtual dispatch table, typeinfo).

Instead, the caller has the responsibility of extracting a valid B* to the B sub-object in whatever actual runtime type is involved (e.g. C). In this case, the C* holds the address of the start of the overall C object, which likely matches the address of the A sub-object, and the B sub-object is at some fixed but non-zero offset further into memory: it's that offset (in bytes) that must be added to the C* in order to get a valid B* with which to call B::bf. That adjustment is done when the pointer is cast from C* type to B* type.

Linear derivation B : A and C : B can be thought of as successively tacking B-specific fields on the end of A, then C-specific fields on the end of B (which is still B-specific fields tacked on the end of A). So the whole thing is laid out as A's fields, then the B-specific fields, then the C-specific fields, in one contiguous block.
Then, when we talk about a "B" we're talking about all the embedded A fields as well as the additions, and for "C" there's still all the A and B fields: they all start at the same address.
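A quick way to check this on a given compiler is to compare the converted pointers. This is a small sketch with made-up helper names (same_start, read_b); the language standard does not promise the result for classes like these, but common implementations place a single non-virtual base at offset 0.

```cpp
#include <cassert>

// Linear chain: each class tacks its own field on the end of its base.
struct A { int a = 1; };
struct B : A { int b = 2; };
struct C : B { int c = 3; };

// True when the A and B sub-objects start where the C object starts.
bool same_start(C* pc) {
    B* pb = pc;  // implicit derived-to-base conversion
    A* pa = pc;
    return static_cast<void*>(pa) == static_cast<void*>(pc)
        && static_cast<void*>(pb) == static_cast<void*>(pc);
}

// Code written only in terms of B still works on the B view of a C.
int read_b(B* self) { return self->b; }

// Small self-check; the first assertion reflects typical layouts, not a guarantee.
void demo() {
    C c;
    assert(same_start(&c));
    assert(read_b(&c) == 2);
}
```

Because no adjustment is needed, a C* can be handed to code expecting a B* or an A* with its value unchanged.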
Regarding *(pc+8): that's right (given the understanding that we're adding 8 bytes to the address, and not the usual C++ behaviour of adding multiples of the pointee's size).

No reason: that's exactly what happens... the same memory layout. The difference is that the B sub-object doesn't consider A to be a part of itself. In the layout from your question (a at +0, b at +4, c at +8), the A sub-object and the C object as a whole start at +0, while the B sub-object starts at +4.

So when you call B::bf it wants to know where the B object starts: the this pointer you provide should be at "+4" in that layout; if you call B::bf using a C* then the compiler-generated calling code will need to add that 4 to form the implicit this parameter to B::bf(). B::bf() can't simply be told where A or C start at +0: B::bf() knows nothing about either of those classes and doesn't know how to reach b or its RTTI if you give it a pointer to anything other than its own +4 address.
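The adjustment can be observed directly. The sketch below uses invented helper names (delta_B, call_bf) and assumes a typical layout in which the non-empty A sub-object comes first; the exact offset is implementation-specific, so the code only checks that it is positive.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a = 1; };
struct B { int b = 2; int bf(int x) { return b + x; } };
struct C : A, B { int c = 3; };

// Byte distance from the start of the C object to its B sub-object,
// i.e. what the paper calls delta(B).
std::ptrdiff_t delta_B(C* pc) {
    B* pb = pc;  // the conversion itself performs the adjustment
    return reinterpret_cast<char*>(pb) - reinterpret_cast<char*>(pc);
}

// Roughly what pc->bf(2) amounts to: adjust to a B* first, then call.
int call_bf(C* pc, int x) {
    B* adjusted = static_cast<B*>(pc);
    return adjusted->bf(x);
}

// Small self-check; the offset assertion reflects typical layouts, not a guarantee.
void demo() {
    C c;
    assert(delta_B(&c) > 0);      // the B part does not start where the C starts
    assert(call_bf(&c, 2) == 4);  // B::bf reads b through the adjusted pointer
}
```

B::bf is compiled once, without any knowledge of C, and keeps working because every caller hands it a pointer to a B sub-object.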