I'm trying to create a connected graph and to perform certain computations on it. To do that, from each node in this graph, I need to access its neighbors and to access its neighbor's neighbors from its neighbor and so forth. This inevitably creates many (useful) cyclic dependencies.
Below is a simplified example with 3 mutually connected nodes (like the 3 vertices of a triangle), and I'm not sure if this method is a good way to do it, particularly if the clean-up leaves any memory leaks :
#include <iostream>
#include <vector>
class A {
public:
int id;
std::vector<A*> partners;
A(const int &i) : id(i) {
std::cout << id << " created\n";
}
~A() {
std::cout << id << " destroyed\n";
}
};
bool partnerUp(A *a1, A *a2) {
if (!a1 || !a2)
return false;
a1->partners.push_back(a2);
a2->partners.push_back(a1);
std::cout << a1->id << " is now partnered with " << a2->id << "\n";
return true;
}
int main() {
std::vector<A*> vecA;
vecA.push_back(new A(10));
vecA.push_back(new A(20));
vecA.push_back(new A(30));
partnerUp(vecA[0], vecA[1]);
partnerUp(vecA[0], vecA[2]);
partnerUp(vecA[1], vecA[2]);
for (auto& a : vecA) {
delete a;
a = nullptr;
}
vecA.clear();
return 0;
}
I'm also aware that I can use shared_ptr
+ weak_ptr
to complete the task, but smart pointers come with an overhead and I'd love to avoid that whenever possible (I also hate to use .lock() all the time to access the data, but that doesn't really matter). I rewrote the code using smart pointers as follows, and I'd like to know what are the differences between the 2 pieces of code (outputs of the two codes are identical).
#include <iostream>
#include <vector>
#include <memory>
using namespace std;
class A {
public:
int id;
vector<weak_ptr<A>> partners;
A(const int &i) : id(i) {
cout << id << " created\n";
}
~A() {
cout << id << " destroyed\n";
}
};
bool partnerUp(shared_ptr<A> a1, shared_ptr<A> a2) {
if (!a1 || !a2)
return false;
a1->partners.push_back(a2);
a2->partners.push_back(a1);
cout << a1->id << " is now partnered with " << a2->id << "\n";
return true;
}
int main() {
vector<shared_ptr<A>> vecA;
vecA.push_back(make_shared<A>(10));
vecA.push_back(make_shared<A>(20));
vecA.push_back(make_shared<A>(30));
partnerUp(vecA[0], vecA[1]);
partnerUp(vecA[0], vecA[2]);
partnerUp(vecA[1], vecA[2]);
return 0;
}
You can prevent memory leaks by using a principle of ownership: At every point, there needs to be an owner who is responsible for freeing the memory.
In the first example, the owner is the
main
function: It undoes all the allocations.In the second example, each graph node has shared ownership. Both
vecA
and the linked nodes share ownership. They are all responsible in the sense that they all call free if necessary.So in this sense, both versions have a relatively clear ownership. The first version is even using a simpler model. However: The first version has some issues with exception safety. Those are not relevant in this small program, but they will become relevant once this code is embedded into a larger application.
The issues come from transfer of ownership: You perform an allocation via
new A
. This does not clearly state who the owner is. We then store this into the vector. But the vector itself won't call delete on its elements; it merely call destructors (no-op for a pointer) and deletes its own allocation (the dynamic array/buffer). Themain
function is the owner, and it frees the allocations only at some point, in the loop at the end. If the main function exits early, for example due to exception, it won't perform its duties as the owner of the allocations - it won't free the memory.This is where the smart pointers come into play: They clearly state who the owner is, and use RAII to prevent issues with exceptions:
The graph can still use raw pointers, since the ownership is now solely the responsibility of the
unique_ptr
, and those are owned byvecA
, and that is owned bymain
. Main exits, destroysvecA
, and this destroys each of its elements, and those destroy the graph nodes.This is still not ideal, though, because we use one indirection more than necessary. We need to keep the address of the graph nodes stable, since they're being pointed to from the other graph nodes. Hence we should not use
vector<A>
in main: if we resize that viapush_back
, this changes the addresses of its elements - the graph nodes - but we might have stored those addresses as graph relations. That is, we can usevector
but only as long as we haven't created any links.We can use
deque
even after creating links. Adeque
keeps the addresses of the elements stable during apush_back
.The actual problem of deletion in a graph is when you don't have a data structure like your
vector
in main: It is possible to just keep pointers to one or several nodes from which you can reach all other nodes in main. In that case, you need graph traversal algorithms to delete all nodes. This is where it gets more complicated and hence more error prone.In terms of ownership, here the graph itself would have ownership of its nodes, and main has ownership of just the graph.
Why would the second approach be recommended?
Because it follows a widespread, simple pattern that reduces the likelyhood of a memory leak. If you always use smart pointers, there'll always be an owner. There's just no opportunity for a bug that drops ownership.
However, with shared pointers, you can form cycles where multiple elements are kept alive because they own each other in a cycle. E.g. A owns B and B owns A.
Therefore, the typical rule-of-thumb recommendations are:
unique_ptr
or if not possible, use ashared_ptr
.container<T>
, orcontainer<unique_ptr<T>>
orcontainer<shared_ptr<T>>
in that order.These are rules of thumb. If you have time to think about it, or some requirements like performance or memory consumption, it can make sense to define a custom ownership model. But then you also need to invest the time to make that safe and test it. So it should really give you a great benefit to be worth all the effort needed to make it safe. I would recommend against assuming that
shared_ptr
is too slow. This needs to be seen in the context of the application and usually measured. It's just too tricky to get custom ownership concepts right. In one of my examples above, you need to be very careful with resizing the vector, for example.