Google Sparsehash uses realloc() on type which is not trivially copyable

802 views Asked by At

Consider this simple program:

#include <string>
#include <sparsehash/dense_hash_map>

int main()
{
    google::dense_hash_map<std::string, int> map;
    map["foo"] = 0;
}

Compiling with GCC 8.2 and -Wclass-memaccess (or -Wall) produces a warning:

sparsehash/internal/libc_allocator_with_realloc.h:68:40: warning:
‘void* realloc(void*, size_t)’ moving an object of non-trivially copyable type
    ‘struct std::pair<const std::__cxx11::basic_string<char>, int>’;
use ‘new’ and ‘delete’ instead [-Wclass-memaccess]
    return static_cast<pointer>(realloc(p, n * sizeof(value_type)));
                                ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

The questions are:

  1. Is it undefined behavior?
  2. Can you suggest a fix or workaround which can be applied to the application code (not by changing Sparsehash or avoiding its use)?
  3. (Bonus points) can you construct a program which actually misbehaves due to this (using std::string or your own nontrivial type)? So far I haven't seen any problems in code using std::string as the key type, despite the fact that std::string must be quite a commonly used key type.

I've filed an issue here: https://github.com/sparsehash/sparsehash/issues/149

3

There are 3 answers

0
Deduplicator On BEST ANSWER
  1. Yes, it is undefined behavior.
    But don't despair yet, iff std::string doesn't store any internal pointers in your implementation, nor registers them anywhere, it will "work" anyway; making a bitwise copy will be equivalent to move-construction at the destination and destructing the source.
    Such is the case for most (not all) string-implementations, SSO or not.

  2. If you might use a type not guaranteed to be trivially destructively-moveable, use a different allocator (last template-argument) to avoid bitwise-moves.

  3. Making a program blow up due to invalid moveing by bitwise copy is trivial.
    Use this type with google::dense_hash_map:

    class bang {
        bang* p;
    public:
        bang() : p(this) {}
        bang(bang const&) : bang() {}
        bang& operator=(bang const&) { return *this; }
        ~bang() { if (p != this) std::abort(); }
    };
    
10
Oliv On

I suppose this code anticipates the maybe c++20 class attribute trivially relocatable. In essence, this is an object which memory location can safely be changed. In c++ parlance, this is an object that can safely be copied by coping the object representation and the program keeps the expected behavior as long as the copied object is not accessed any more, not even for destruction.

For example, this code might not be specified as "undefined behavior" by the C++20 standard:

alignas(string) unsigned char buffer[sizeof(string)];
auto p = new(buffer) string{"test"};
alignas(string) unsigned char buffer2[sizeof(string)];
memcpy(buffer,buffer2,sizeof(string));//create a new string object copy of *p;
auto np = reinterpret_cast<string*>(buffer2);
(*np)[0]="r";
// the object at p shall not be accessed, not even destroyed.

A type should not be trivially relocatable if it has a non static data member that refers to any part of itself:

struct fail{
    int a;
    int b;
    int* selected;
    fail(bool is_a,int i){
       if (is_a){ a=i; selected=&a;}
       else { b=i; selected=&b;}
       }
     };

Some implementation of linked list container could also not be trivially relocatable, for example if the container hold a member which is the root node. So dense_hash_map shall just not be used with those kind of self memory referencing types.

0
SJHowe On

1. Is it undefined behavior? Yes. You should never copy objects using realloc(), because sometimes they have internal pointers that point to a resource. The problem comes later when 2 different objects have their destructors run. Now a double deallocation occurs of the same resource, a complete no no.

2. Can you suggest a fix or workaround which can be applied to the application code (not by changing Sparsehash or avoiding its use)?

Try

#include <memory>

and change line

google::dense_hash_map<std::string, int> map;

to

google::dense_hash_map<std::string, int, std::hash<std::string>, std::equal_to<std::string>, std::allocator> map;

Now, it won't use google's allocator libc_allocator_with_realloc

3. (Bonus points) can you construct a program which actually misbehaves due to this (using std::string or your own nontrivial type)? So far I haven't seen any problems in code using std::string as the key type, despite the fact that std::string must be quite a commonly used key type.

Not easily. Because you are trying to cause undefined behavour. In your test program, I would feed strings that are at least 32 characters in length, so the small string optimisation does not kick in. And there are tests that can be done in gcc's heap to see if it has gone corrupt. See 1