c++ creating cyclically dependent objects with raw pointers

194 views Asked by At

I'm trying to create a connected graph and to perform certain computations on it. To do that, from each node in this graph, I need to access its neighbors and to access its neighbor's neighbors from its neighbor and so forth. This inevitably creates many (useful) cyclic dependencies.

Below is a simplified example with 3 mutually connected nodes (like the 3 vertices of a triangle), and I'm not sure if this method is a good way to do it, particularly if the clean-up leaves any memory leaks :

#include <iostream>
#include <vector>

class A {
public:
    int id;
    std::vector<A*> partners;
 
    A(const int &i) : id(i) {
        std::cout << id << " created\n";
    }
    ~A() {
        std::cout << id << " destroyed\n";
    }
};

bool partnerUp(A *a1, A *a2) {
    if (!a1 || !a2)
        return false;

    a1->partners.push_back(a2);
    a2->partners.push_back(a1);

    std::cout << a1->id << " is now partnered with " << a2->id << "\n";

    return true;
}

int main() {
    std::vector<A*> vecA;
    vecA.push_back(new A(10));
    vecA.push_back(new A(20));
    vecA.push_back(new A(30));
 
    partnerUp(vecA[0], vecA[1]);
    partnerUp(vecA[0], vecA[2]);
    partnerUp(vecA[1], vecA[2]);

    for (auto& a : vecA) {
        delete a;
        a = nullptr;
    }
    vecA.clear();
 
    return 0;
}

I'm also aware that I can use shared_ptr + weak_ptr to complete the task, but smart pointers come with an overhead and I'd love to avoid that whenever possible (I also hate to use .lock() all the time to access the data, but that doesn't really matter). I rewrote the code using smart pointers as follows, and I'd like to know what are the differences between the 2 pieces of code (outputs of the two codes are identical).

#include <iostream>
#include <vector>
#include <memory>

using namespace std;

class A {
public:
    int id;
    vector<weak_ptr<A>> partners;
 
    A(const int &i) : id(i) {
        cout << id << " created\n";
    }
    ~A() {
        cout << id << " destroyed\n";
    }
};

bool partnerUp(shared_ptr<A> a1, shared_ptr<A> a2) {
    if (!a1 || !a2)
        return false;

    a1->partners.push_back(a2);
    a2->partners.push_back(a1);

    cout << a1->id << " is now partnered with " << a2->id << "\n";

    return true;
}

int main() {
    vector<shared_ptr<A>> vecA;
    vecA.push_back(make_shared<A>(10));
    vecA.push_back(make_shared<A>(20));
    vecA.push_back(make_shared<A>(30));
 
    partnerUp(vecA[0], vecA[1]);
    partnerUp(vecA[0], vecA[2]);
    partnerUp(vecA[1], vecA[2]);

    return 0;
}
1

There are 1 answers

11
dyp On BEST ANSWER

You can prevent memory leaks by using a principle of ownership: At every point, there needs to be an owner who is responsible for freeing the memory.

In the first example, the owner is the main function: It undoes all the allocations.

In the second example, each graph node has shared ownership. Both vecA and the linked nodes share ownership. They are all responsible in the sense that they all call free if necessary.

So in this sense, both versions have a relatively clear ownership. The first version is even using a simpler model. However: The first version has some issues with exception safety. Those are not relevant in this small program, but they will become relevant once this code is embedded into a larger application.

The issues come from transfer of ownership: You perform an allocation via new A. This does not clearly state who the owner is. We then store this into the vector. But the vector itself won't call delete on its elements; it merely call destructors (no-op for a pointer) and deletes its own allocation (the dynamic array/buffer). The main function is the owner, and it frees the allocations only at some point, in the loop at the end. If the main function exits early, for example due to exception, it won't perform its duties as the owner of the allocations - it won't free the memory.

This is where the smart pointers come into play: They clearly state who the owner is, and use RAII to prevent issues with exceptions:

class A {
public:
    int id;
    vector<A*> partners;
 
    // ...
};

bool partnerUp(A* a1, A* a2) {
    // ...
}

int main() {
    vector<unique_ptr<A>> vecA;
    vecA.push_back(make_unique<A>(10));
    vecA.push_back(make_unique<A>(20));
    vecA.push_back(make_unique<A>(30));
 
    partnerUp(vecA[0].get(), vecA[1].get());
    partnerUp(vecA[0].get(), vecA[2].get());
    partnerUp(vecA[1].get(), vecA[2].get());

    return 0;
}

The graph can still use raw pointers, since the ownership is now solely the responsibility of the unique_ptr, and those are owned by vecA, and that is owned by main. Main exits, destroys vecA, and this destroys each of its elements, and those destroy the graph nodes.

This is still not ideal, though, because we use one indirection more than necessary. We need to keep the address of the graph nodes stable, since they're being pointed to from the other graph nodes. Hence we should not use vector<A> in main: if we resize that via push_back, this changes the addresses of its elements - the graph nodes - but we might have stored those addresses as graph relations. That is, we can use vector but only as long as we haven't created any links.

We can use deque even after creating links. A deque keeps the addresses of the elements stable during a push_back.

class A {
public:
    int id;
    vector<A*> partners;

    // ...

    A(A const&) = delete; // never change the address, since it's important!

    // ...
};

bool partnerUp(A* a1, A* a2) {
    // ...
}

int main() {
    std::deque<A> vecA;
    vecA.emplace_back(10);
    vecA.emplace_back(20);
    vecA.emplace_back(30);
 
    partnerUp(&vecA[0], &vecA[1]);
    partnerUp(&vecA[0], &vecA[2]);
    partnerUp(&vecA[1], &vecA[2]);
 
    return 0;
}

The actual problem of deletion in a graph is when you don't have a data structure like your vector in main: It is possible to just keep pointers to one or several nodes from which you can reach all other nodes in main. In that case, you need graph traversal algorithms to delete all nodes. This is where it gets more complicated and hence more error prone.

In terms of ownership, here the graph itself would have ownership of its nodes, and main has ownership of just the graph.

int main() {
    A* root = new A(10);
 
    partnerUp(root, new A(20));
    partnerUp(root, new A(30));
    partnerUp(root.partners[0], root.partners[1]);

    // now, how to delete all nodes?
 
    return 0;
}

Why would the second approach be recommended?

Because it follows a widespread, simple pattern that reduces the likelyhood of a memory leak. If you always use smart pointers, there'll always be an owner. There's just no opportunity for a bug that drops ownership.

However, with shared pointers, you can form cycles where multiple elements are kept alive because they own each other in a cycle. E.g. A owns B and B owns A.

Therefore, the typical rule-of-thumb recommendations are:

  • Use a stack object, or if not possible, use a unique_ptr or if not possible, use a shared_ptr.
  • For multiple elements, use a container<T>, or container<unique_ptr<T>> or container<shared_ptr<T>> in that order.

These are rules of thumb. If you have time to think about it, or some requirements like performance or memory consumption, it can make sense to define a custom ownership model. But then you also need to invest the time to make that safe and test it. So it should really give you a great benefit to be worth all the effort needed to make it safe. I would recommend against assuming that shared_ptr is too slow. This needs to be seen in the context of the application and usually measured. It's just too tricky to get custom ownership concepts right. In one of my examples above, you need to be very careful with resizing the vector, for example.