Inheritance and method overriding in C - how to make it defined behaviour


I have my custom little OOP-esque inheritance functionality, something like this:

// base class
struct BaseTag;

typedef struct {
    int (*DoAwesomeStuff)(struct BaseTag* pInstance);
} S_BaseVtable;

typedef struct BaseTag{
    S_BaseVtable* pVtable;
    int AwesomeValue;
} S_Base;

// child class
struct ChildTag;

typedef struct {
    S_BaseVtable Base;
    void (*SomeOtherStuff)(struct ChildTag* pInstance);
} S_ChildVTable;

typedef struct ChildTag {
    S_Base BaseClass;
    int EvenAwesomerValue;
} S_Child;

Now let's say I have a Child class constructor where the Base class vtable is overridden with the child vtable:

void Child_ctor(S_Child* pInstance) {
    Base_ctor((S_Base*) pInstance);
    pInstance->BaseClass.pVtable = (S_BaseVtable*) &MyChildVTable;
}

Also in this child vtable, I want to override the DoAwesomeStuff() method from base class with a method like this:

int Child_DoAwesomeStuff(struct BaseTag* pInstance) {
    S_Child* pChild = (S_Child*) pInstance; // undefined behaviour
    return pChild->EvenAwesomerValue;
}

I have seen this pattern in variations occasionally, but I see some problems with it. My main questions are

  • How can I access the S_ChildVTable from a child instance that is hidden behind an S_BaseVtable pointer?
  • How can I properly cast the pInstance argument of Child_DoAwesomeStuff() to an S_Child* type?

As far as I understand the C standard, casting from S_Child* to S_Base* (and the corresponding vtable types) is okay as the first member of S_Child is an S_Base instance. But vice versa it is undefined behaviour.

Would something like S_Child* pChild = (S_Child*)((char*) pInstance) be legal and defined?


Edit

My question was a bit unclear and misleading. It's not the cast itself that I think is UB, but dereferencing pChild after it was cast from pInstance.

I browsed through the C11 standard again to find some reference, but now it's not so clear to me anymore.

6.3.2.3/7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned (68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

So I guess my question really is - What mechanics need to be in place so that it is ensured that S_Base and S_Child are correctly aligned?


There are 3 answers

11
Gabriel Staples On

Undefined behavior: memory usage in C, when it is or is not undefined behavior

1st, some background study: let us understand what is and is not undefined behavior when managing memory in C

As is frequently the case in programming, there are a lot of nuances to discuss. So, let me try to address the edits to your question.

My question was a bit unclear and misleading. It's not the cast itself that I think is UB, but dereferencing pChild after it was cast from pInstance.

In C, a cast can itself be undefined behavior for a variety of reasons, but the casts in your question are not among them. See the comments below this answer for more insight.

Dereferencing is undefined behavior for a few reasons as well, including these two main ones I will talk about which may be most relevant to your question:

  1. you are dereferencing out-of-bounds memory that is unowned by your program/object, or
  2. you are reading uninitialized memory/values (even if your program does properly own that memory)

Consider the following examples:

  1. Example 1: pointing to memory our program does not own is undefined behavior

    1. Undefined behavior: on any machine

      // arbitrarily point to some address in memory, and assume it's an 8-bit
      // unsigned integer
      uint8_t * p = (uint8_t*)0x1234; // the integer-to-pointer conversion is
                                      // implementation-defined; the trouble
                                      // starts once the pointer is used
      
      // now dereference this pointer and assign a value to this integer
      *p = 1; // undefined behavior (whether reading OR writing here) because
              // you are accessing memory that your program does not own nor
              // control!
      
    2. NOT undefined behavior: on an ATmega328 8-bit microcontroller (ex: Arduino Uno)

      uint8_t * p = (uint8_t*)0x23; // not undefined behavior, because this 
                                    // address belongs to a well-defined
                                    // hardware register used by this mcu
      
      // now dereference this pointer and assign a value to this integer
      *p = 1; // NOT undefined behavior because the ATmega328 datasheet 
              // (https://ww1.microchip.com/downloads/aemDocuments/documents/MCU08/ProductDocuments/DataSheets/40001906C.pdf)
              // indicates on p445 that address 0x23 is the PINB hardware
              // register, which allows you to read from or toggle IO pins.
              // Writing a 1 here actually toggles the output of GPIO pin B0. 
      

      Note that the proper way to do the above is this (example file: "/Arduino 1.8.13/hardware/tools/avr/avr/include/avr/iom328pb.h"):

      #define PINB    (*(volatile uint8_t *)(0x23))
      #define PINB7   7
      #define PINB6   6
      #define PINB5   5
      #define PINB4   4
      #define PINB3   3
      #define PINB2   2
      #define PINB1   1
      #define PINB0   0
      
      PINB = 1 << PINB0;
      
  2. Example 2: using memory we don't own, and/or that is uninitialized, is undefined behavior

    1. Undefined behavior: on any machine
      uint32_t * pu32 = (uint32_t*)0x1234; // ok
      uint32_t u1;
      
      u1 = *pu32; // Undefined behavior! Reading memory our program doesn't 
                  // own
      
      *pu32 = 0;  // Undefined behavior! Writing to memory our program doesn't
                  // own
      
      pu32 = &u1; // ok: pointing our pointer to valid memory our program owns
      
      uint32_t u2;
      u2 = u1;    // Undefined behavior! Reading an undefined value from u1.
      *pu32 = u1; // Undefined behavior! Reading an undefined value from u1.
      
      u1 = *pu32; // Undefined behavior! Our program DOES own this memory 
                  // that pu32 points to now, but the value stored there is
                  // undefined/uninitialized.
      
    2. NOT undefined behavior: on any machine
      uint32_t * pu32;
      uint32_t u1;
      pu32 = &u1; // ok: our ptr now points to valid memory
      *pu32 = 7;  // set u1 to 7
      u1 = 8;     // set u1 to 8
      uint32_t u2 = u1;     // set u2 to 8
      uint32_t u3 = *pu32;  // set u3 to 8 (since pu32 points to u1)
      
  3. Example 3: using a memory pool our program does own is not undefined behavior

    1. NOT undefined behavior: on any machine
      uint8_t memory_pool_of_bytes[4]; // ok
      // ok: pointing our uint32_t* pointer to use this memory pool of bytes
      uint32_t * pu32 = (uint32_t *)memory_pool_of_bytes; 
      
      *pu32 = 1000000; // ok; our program owns this memory!
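One caveat on the memory-pool example above: owning the bytes is not the only requirement. The effective-type rule (C11 6.5/7) and the alignment rule (C11 6.3.2.3/7) also apply, so accessing a uint8_t array through a uint32_t* is not guaranteed to be defined even though the program owns the memory. A minimal sketch of a more conservative variant that copies through memcpy instead; the helper names pool_store_u32, pool_load_u32 and pool_round_trip are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a uint32_t into a byte pool without ever
 * forming a uint32_t lvalue that aliases the uint8_t array. */
static void pool_store_u32(uint8_t *pool, uint32_t value)
{
    assert(pool != NULL);
    memcpy(pool, &value, sizeof value);
}

/* Hypothetical helper: read the value back the same way. */
static uint32_t pool_load_u32(const uint8_t *pool)
{
    uint32_t value;
    assert(pool != NULL);
    memcpy(&value, pool, sizeof value);
    return value;
}

/* Round-trips a value through a pool that is exactly large enough. */
static uint32_t pool_round_trip(uint32_t value)
{
    uint8_t memory_pool_of_bytes[sizeof(uint32_t)];
    pool_store_u32(memory_pool_of_bytes, value);
    return pool_load_u32(memory_pool_of_bytes);
}
```

Compilers typically turn these small fixed-size memcpy calls into plain loads and stores, so this usually costs nothing at run time.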
      

Now, with the above knowledge learned, let's look back at your question:

My question was a bit unclear and misleading. It's not the cast itself that I think is UB, but dereferencing pChild after it was cast from pInstance.

The answer to this is: it depends on whether you are dereferencing valid memory (owned, and already initialized if you are reading it) or invalid memory (not owned, or not initialized).

Consider the following:

// create a base
S_Base base;
Child_DoAwesomeStuff(&base); // Undefined behavior inside this func??? Maybe!

// vs:

// create a child
S_Child child; 
Child_DoAwesomeStuff((S_Base*)&child); // Undefined behavior inside this func??? 
                                       // No! This is fine. 

Let's go deeper to explore the 1st case where there is maybe undefined behavior.

S_Base base;            // ok: statically allocate a chunk of memory large 
                        // enough to hold an `S_Base` type.
S_Base* pBase = &base;  // ok: create a pointer to point to our memory above.
S_Child* pChild = (S_Child*)pBase; // **technically** ok, but a very bad idea 
                                   // because it **could lead to** undefined
                                   // behavior later! `pChild` does NOT point
                                   // to a "valid complete object of the target
                                   // type".
pChild->BaseClass.AwesomeValue = 7; // fine, because this is owned memory!
pChild->EvenAwesomerValue; // UNDEFINED BEHAVIOR! This is NOT owned memory! We
                           // just read outside the memory we statically 
                           // allocated in the first line above!

So, is the (S_Child*)pBase; cast undefined behavior? No! But it is dangerous! Is accessing owned memory within pChild undefined behavior? No! We own it. Our program allocated it. But, is accessing memory outside what our program owns (ex: pChild->EvenAwesomerValue) undefined behavior? Yes! We do not own that memory. It is similar to the many undefined cases I went through above.

C++ has solved the dangerous behavior above by having the dynamic_cast<>() conversion which will allow casting a parent type to a child type. It will then dynamically, at run-time, check to see if the resulting object "is a valid complete object of the target type". If it discovers it is not, it sets the resulting pointer to nullptr to notify you of that. In C, you have to just track these things manually yourself.
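As a rough illustration of tracking this manually in C, here is a minimal sketch of a checked downcast driven by a type tag stored in the base. The tag enum, the S_Tagged* types and Child_from_base() are hypothetical names introduced for this sketch; they are not part of the code in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run-time type tag; not part of the original structs. */
typedef enum { TYPE_BASE, TYPE_CHILD } E_TypeTag;

typedef struct {
    E_TypeTag Tag;          /* set once by the constructor */
    int AwesomeValue;
} S_TaggedBase;

typedef struct {
    S_TaggedBase BaseClass; /* must stay the first member */
    int EvenAwesomerValue;
} S_TaggedChild;

static void TaggedBase_ctor(S_TaggedBase *p)
{
    assert(p != NULL);
    p->Tag = TYPE_BASE;
    p->AwesomeValue = 0;
}

static void TaggedChild_ctor(S_TaggedChild *p)
{
    assert(p != NULL);
    TaggedBase_ctor(&p->BaseClass);
    p->BaseClass.Tag = TYPE_CHILD; /* override the tag set by the base ctor */
    p->EvenAwesomerValue = 0;
}

/* Returns NULL unless the object was really constructed as a child,
 * loosely mimicking what dynamic_cast reports in C++. */
static S_TaggedChild *Child_from_base(S_TaggedBase *p)
{
    if (p == NULL || p->Tag != TYPE_CHILD) {
        return NULL;
    }
    return (S_TaggedChild *)p;
}
```

A caller then checks the result against NULL before touching any child-only member, much like the nullptr check after dynamic_cast.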

"What mechanics need to be in place so that it is ensured that S_Base (parent) and S_Child are correctly aligned?"

This one's easy: just put your S_Base struct at the very beginning of your S_Child struct and they are automatically aligned. Now, a pointer to your S_Child object points to the exact same address as a pointer to the S_Base object within it, since the child contains the base object.

They are automatically aligned so long as you don't use any alignment or padding keywords or compiler extensions to change things. Padding is automatically added by the compiler after struct members, as needed, never before the first member. See more on that here: Structure padding and packing.
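If you want the compiler to enforce this layout assumption, a small sketch using C11 static assertions could look like the following. The structs are simplified copies of the ones in the question (the vtable pointer is left out), and base_and_child_share_address() is a made-up helper for illustration.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* offsetof */

typedef struct {
    int AwesomeValue;
} S_Base;

typedef struct {
    S_Base BaseClass;        /* must be the first member */
    int EvenAwesomerValue;
} S_Child;

/* Fails to compile if someone later moves BaseClass away from offset 0. */
static_assert(offsetof(S_Child, BaseClass) == 0,
              "S_Base must be the first member of S_Child");

/* The child can never be less strictly aligned than the base it contains. */
static_assert(_Alignof(S_Child) >= _Alignof(S_Base),
              "S_Child must be at least as strictly aligned as S_Base");

/* Run-time illustration: both pointers refer to the same address. */
static int base_and_child_share_address(const S_Child *pChild)
{
    assert(pChild != NULL);
    return (const void *)pChild == (const void *)&pChild->BaseClass;
}
```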

Simple example (without any virtual table polymorphism function stuff):

typedef struct parent_s
{
    int i;
    float f;
} parent_t;

typedef struct child_s 
{
    parent_t parent; // parent (base) member MUST be 1st within the child
                     // to be properly aligned with the start of the child!
    int i;
    float f;
} child_t;

child_t child;
parent_t parent;

parent_t* p_parent = (parent_t*)&child; // ok; p_parent points to a "valid
                             // complete object of the target [parent] type",
                             // since the child's allocated memory blob does
                             // indeed encompass the parent's. (Writing
                             // `&child.parent` is equivalent and needs no
                             // cast.)
child_t* p_child = &child; // ok; p_child points to a "valid complete object of
                           // the target [child] type"
child_t* p_bad = (child_t*)&parent;   // DANGEROUS! Technically this cast is 
                                      // *not* undefined behavior *yet*, but it
                                      // could lead to it if you try to access
                                      // child members outside the memory blob 
                                      // created for the parent. 
                                      // 
                                      // p_bad does NOT point to a "valid
                                      // complete object of the target [child]
                                      // type".

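For the last (dangerous) cast above, C++ offers a checked alternative: dynamic_cast, which reports failure at run time by returning nullptr. It only applies to polymorphic class types related by inheritance (not to plain structs like the ones above), and you still have to check the result yourself, like this: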

child_t* p_child = dynamic_cast<child_t*>(&parent);
if (p_child == nullptr)
{
    printf("Error: dynamic cast failed. p_child is NOT a \"valid complete "
           "object of the target [child_t] type.\"");
    // do error handling here
}

Key takeaway:

Once you have alignment sorted out by putting the parent right at the beginning of the child, basically just think of each object as a memory blob, or memory pool. If the memory pool you are pointing to is at least as large as the size expected by the pointer type pointing to it, you're fine! Your program owns that memory. But if the memory pool you are pointing to is smaller than the size expected by the pointer type pointing to it, you're not fine! Accessing memory outside your allocated memory blob is undefined behavior.

In the case of OOP and parent/child relationships, the child object must always be larger than the parent object because it contains a parent object within it. So, casting a child to a parent type is fine, since the child type is larger than the parent type and the child type holds the parent type first in its memory, but casting a parent type to a child type is not fine unless the memory blob being pointed to was created initially as a child of that child type.

Now, let's look at this in C++ and compare to your C example.

Inheritance and parent <--> child type casting in C++ and C

So long as the pInstance pointer being passed to Child_DoAwesomeStuff() was actually constructed initially as an S_Child object, then casting the pointer back to an S_Child pointer (S_Child*) is not undefined behavior. It would only be undefined behavior if you attempt to cast a pointer to an object that was constructed originally as a struct BaseTag (aka S_Base) type to a child pointer type.

This is how C++ works too, with dynamic_cast<>() (which I mention in my answer here).

Example C++ code from https://cplusplus.com/doc/tutorial/typecasting/ under the "dynamic_cast" section is below.

In the C++ code below, notice that both pba and pbb are pointers to the base type (Base *), yet, pba is actually constructed as a Derived (child) type via new Derived, whereas pbb is actually constructed as a Base (base, or parent) type via new Base.

Therefore, casting pba to Derived* is perfectly valid, since it truly is that type, but casting pbb to Derived* is not valid, since it is not truly that type. The dynamic_cast<Derived*>(pbb) call detects at run time that pbb does not point to a Derived object and returns nullptr, which compares equal to 0, so you get the print that says Null pointer on second type-cast.

Here is that C++ code:

// dynamic_cast
#include <iostream>
#include <exception>
using namespace std;

class Base { virtual void dummy() {} };
class Derived: public Base { int a; };

int main () {
  try {
    Base * pba = new Derived;
    Base * pbb = new Base;
    Derived * pd;

    pd = dynamic_cast<Derived*>(pba);
    if (pd==0) cout << "Null pointer on first type-cast.\n";

    pd = dynamic_cast<Derived*>(pbb);
    if (pd==0) cout << "Null pointer on second type-cast.\n";

  } catch (exception& e) {cout << "Exception: " << e.what();}
  return 0;
}

Output:

Null pointer on second type-cast.

Similarly, your C code has the same behavior.

Doing this is valid:

// create a child
S_Child child; 
// treat it like a base (ok since `S_Base` is at the beginning of it--since the
// child contains a base object)
S_Base* pBase = (S_Base*)&child;
// Now obtain the child back from the base pointer
S_Child* pChild = (S_Child*)pBase; // ok, since pBase really points to a 
                                   // child object

But doing this is not ok:

// create a base
S_Base base;
// Get a pointer to it
S_Base* pBase = &base;
// Now try to magically obtain a child from a base object
S_Child* pChild = (S_Child*)pBase; // NOT ok! **May lead to** undefined behavior 
                                   // when dereferencing, since pBase really
                                   // points to a base object!

So, for your specific function:

// Note: I replaced `struct BaseTag*` with `S_Base*` for readability
int Child_DoAwesomeStuff(S_Base* pInstance) {
    S_Child* pChild = (S_Child*) pInstance;
    return pChild->EvenAwesomerValue;
}

This is fine:

// create a child
S_Child child; 

Child_DoAwesomeStuff((S_Base*)&child); // ok

But this is not!:

// create a base
S_Base base;

Child_DoAwesomeStuff(&base); // NOT ok! **May lead to** undefined behavior 
                             // inside this func!

My thoughts on enforcing OOP (Object Oriented Programming) and inheritance in C

Just a warning though: passing around pointers and storing pointers to vtables and functions and things inside C structs will make tracing your code and trying to understand it very difficult! No indexer that I am aware of (Eclipse included, and Eclipse has the best indexer I've ever seen), can trace back to which function or type was assigned to a pointer in your code. Unless you're doing this stuff just for a learning exercise, or to bootstrap your own C++ language from scratch in C (again, for learning), I recommend against these patterns.

If you want "object-oriented" C with inheritance and all, don't do it. If you want "object-based" C, via opaque pointers/structs for basic private-member encapsulation and data hiding, that's just fine! Here's how I prefer to do that: Option 1.5 ("Object-based" C Architecture).

Last note: you probably know more about virtual tables (vtables) than I do. At the end of the day, it's your code, so do whichever architecture you want, but I don't want to be working in that code base :).

See also

  1. https://cplusplus.com/doc/tutorial/typecasting/ - excellent article on typecasting! See in particular the "dynamic_cast" section, and the code snippet therein.
  2. Structure padding and packing
  3. [my answer] When should static_cast, dynamic_cast, const_cast, and reinterpret_cast be used?
  4. https://en.wikipedia.org/wiki/Undefined_behavior
0
John Bollinger On

So I guess my question really is - What mechanics need to be in place so that it is ensured that S_Base and S_Child are correctly aligned?

TL;DR: no special mechanics are required to cover conversions between pointers to those types that are valid in your inheritance framework.


Alignment is described in C17 6.2.8, "Alignment of objects", and touched on in many other places in the spec.

Although the language spec does not explicitly speak to the question, we can observe that the alignment requirement of a structure type must be at least as strict as that of its most strictly-aligned member, else the implementation cannot ensure that all members of all instances will be correctly aligned. Because your S_Child has a member of type S_Base, the former cannot have a weaker alignment requirement than the latter, so conversion of a valid S_Child * to type S_Base * can never run afoul of incorrect alignment.

It is possible for S_Child to have a stricter alignment requirement than S_Base does, but this is not an issue you need to worry about in practice. The only case for conversion of an S_Base * to type S_Child * that is semantically valid in your inheritance system is when the original S_Base * points to the first member of an S_Child. In that case, you can rely on the fact that

A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.

(C17 6.7.2.1/15)

Of course, this applies in both directions, so it provides additional (even better) support for the S_Child * to S_Base * case.

Pretty much the same thing applies on the side of your vtables, since you are structuring them analogously to the data-member structures.
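To make the vtable side concrete, here is a minimal sketch of the whole pattern, including how to get back to the child vtable from the base vtable pointer stored in the instance. It follows the declarations in the question, with a few assumptions: the vtable pointer is made const, Base_ctor is left out, the function bodies and the initial value 41 are placeholders, and Child_vtable() is a made-up accessor name.

```c
#include <assert.h>
#include <stddef.h>

struct BaseTag;
struct ChildTag;

typedef struct {
    int (*DoAwesomeStuff)(struct BaseTag *pInstance);
} S_BaseVtable;

typedef struct BaseTag {
    const S_BaseVtable *pVtable;  /* const is an assumption of this sketch */
    int AwesomeValue;
} S_Base;

typedef struct {
    S_BaseVtable Base;            /* first member, mirrors S_Child/S_Base */
    void (*SomeOtherStuff)(struct ChildTag *pInstance);
} S_ChildVTable;

typedef struct ChildTag {
    S_Base BaseClass;             /* first member */
    int EvenAwesomerValue;
} S_Child;

static int Child_DoAwesomeStuff(struct BaseTag *pInstance)
{
    /* Fine as long as this function is only reachable through
     * MyChildVTable, which only Child_ctor installs. */
    S_Child *pChild = (S_Child *)pInstance;
    return pChild->EvenAwesomerValue;
}

static void Child_SomeOtherStuff(struct ChildTag *pInstance)
{
    pInstance->EvenAwesomerValue += 1;  /* placeholder behaviour */
}

static const S_ChildVTable MyChildVTable = {
    { Child_DoAwesomeStuff },
    Child_SomeOtherStuff
};

static void Child_ctor(S_Child *pInstance)
{
    assert(pInstance != NULL);
    /* Taking the address of the first member avoids a cast entirely. */
    pInstance->BaseClass.pVtable = &MyChildVTable.Base;
    pInstance->BaseClass.AwesomeValue = 0;
    pInstance->EvenAwesomerValue = 41;  /* placeholder initial value */
}

/* The stored pointer addresses the first member of an S_ChildVTable, so
 * converting it back yields a pointer to the enclosing child vtable. */
static const S_ChildVTable *Child_vtable(const S_Child *pInstance)
{
    return (const S_ChildVTable *)pInstance->BaseClass.pVtable;
}
```

Code holding only an S_Base * dispatches with pBase->pVtable->DoAwesomeStuff(pBase), while code that knows it has an S_Child can reach the extra slot with Child_vtable(&child)->SomeOtherStuff(&child).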

Would something like S_Child* pChild = (S_Child*)((char*) pInstance) be legal and defined?

The cast to char * is valid if pInstance is a valid pointer to any object type, but it does not gain you anything with respect to the conversion of the result to type S_Child *. You might as well just write

S_Child *pChild = (S_Child *) pInstance;

, which is perfectly fine in all the cases you (should) care about.

0
supercat On

The C Standard treats support for many "inheritance-style" idioms as a quality-of-implementation issue. Implementations which are intended solely for tasks which would not involve such inheritance need not support it, but all or nearly all implementations can be configured to support such constructs. In clang and gcc, they may be supported by using the -fno-strict-aliasing compilation option.

Note that under C89, the idiomatic way of allowing structures to be used interchangeably was to have them start with a common initial sequence. While some people may argue that C99 was intended to break code using this idiom, that would imply that C99 was written in gross violation of the Committee's charter. If the authors of C99 were intending to uphold their charter, they would have intended that programs that would benefit from the CIS guarantees would be processed in a manner that supports it, and implementations that don't support it would only be used for tasks that wouldn't benefit from it.

Using the Common Initial Sequence approach, derived structures would start with the same members as their parent structures. If a structure type and all structures that are derived from it all start with a member having the same name and a distinctive type, then functions which would expect a pointer to a type compatible with the structure type could pass it with a consistent syntax e.g. &foo->header. It may be useful to have macros which would syntactically accept a pointer to any struct following the pattern and wrap it to call an actual function e.g.

struct woozle { struct woozleHeader *woozle_hdr; int x, y; };
struct derived_woozle { struct woozleHeader *woozle_hdr; int x, y; double z; };
int do_use_woozle(struct woozleHeader **p, int x, int y);
#define use_woozle(it, x, y) do_use_woozle(&(it)->woozle_hdr, (x), (y))

Using macros in this way is a bit ugly, but it will allow code to say use_woozle(ptr, x, y); when ptr is a pointer to any object which is derived from woozle and follows the pattern, while rejecting attempts to pass other things. By contrast, using void* arguments or casting arguments to a struct woozle* would bypass type checking that would otherwise usefully catch many mistakes, such as passing pointers with the wrong level of indirection.
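A minimal compilable sketch of that macro pattern follows. The contents of struct woozleHeader and the body of do_use_woozle() are placeholders invented for illustration, since the snippet above leaves both undefined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header type; the original snippet does not define it. */
struct woozleHeader { int use_count; };

struct woozle { struct woozleHeader *woozle_hdr; int x, y; };
struct derived_woozle { struct woozleHeader *woozle_hdr; int x, y; double z; };

/* Accepts the address of the woozle_hdr member of any conforming struct.
 * It only touches the header, never the rest of the enclosing struct. */
static int do_use_woozle(struct woozleHeader **p, int x, int y)
{
    assert(p != NULL && *p != NULL);
    (*p)->use_count++;
    return x + y;   /* placeholder result */
}

/* Compiles only if `it` points to a struct with a member named woozle_hdr
 * of type struct woozleHeader *. */
#define use_woozle(it, x, y) do_use_woozle(&(it)->woozle_hdr, (x), (y))
```

Passing, say, an int * or a struct woozle ** to use_woozle() fails to compile because neither has a woozle_hdr member to take the address of, which is the type checking a void* parameter would give up.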