Can I do a zero-copy std::string allocation in C++ from a const char * array?

5.2k views Asked by At

Profiling of my application reveals that it is spending nearly 5% of CPU time in string allocation. In many, many places I am making C++ std::string objects from a 64MB char buffer. The thing is, the buffer never changes during the running of the program. My analysis of std::string(const char *buf,size_t buflen) calls is that that the string is being copied because the buffer might change after the string is made. That isn't the problem here. Is there a way around this problem?

EDIT: I am working with binary data, so I can't just pass around char *s. Besides, then I would have a substantial overhead from always scanning for the NULL, which the std::string avoids.

5

There are 5 answers

3
phonetagger On BEST ANSWER

Binary data? Stop using std::string and use std::vector<char>. But that won't fix your issue of it being copied. From your description, if this huge 64MB buffer will never change, you truly shouldn't be using std::string or std::vector<char>, either one isn't a good idea. You really ought to be passing around a const char* pointer (const uint8_t* would be more descriptive of binary data but under the covers it's the same thing, neglecting sign issues). Pass around both the pointer and a size_t length of it, or pass the pointer with another 'end' pointer. If you don't like passing around separate discrete variables (a pointer and the buffer’s length), make a struct to describe the buffer & have everyone use those instead:

struct binbuf_desc {
    uint8_t* addr;
    size_t len;
    binbuf_desc(addr,len) : addr(addr), len(len) {}
}

You can always refer to your 64MB buffer (or any other buffer of any size) by using binbuf_desc objects. Note that binbuf_desc objects don’t own the buffer (or a copy of it), they’re just a descriptor of it, so you can just pass those around everywhere without having to worry about binbuf_desc’s making unnecessary copies of the buffer.

3
Ben Voigt On

There is no portable solution. If you tell us what toolchain you're using, someone might know a trick specific to your library implementation. But for the most part, the std::string destructor (and assignment operator) is going to free the string content, and you can't free a string literal. (It's not impossible to have exceptions to this, and in fact the small string optimization is a common case that skips deallocation, but these are implementation details.)

A better approach is to not use std::string when you don't need/want dynamic allocation. const char* still works just fine in modern C++.

4
James McNellis On

If the string isn't going to change and if its lifetime is guaranteed to be longer than you are going to use the string, then don't use std::string.

Instead, consider a simple C string wrapper, like the proposed string_ref<T>.

0
shargors On

Seems that using const char * instead of std::string is the best way to go for you. But you should also consider how you are using strings. It may be possible that there could be going on implicit conversion from char pointers to std::string objects. This could happen during function calls, for example.

3
armel On

Since C++17, std::string_view may be your way. It can be initialized both from a bare C string (with or without a length), or a std::string

There is no constraint that the data() method returns a zero-terminated string though.

If you need this "zero-terminated on request" behaviour, there are alternatives such as str_view from Adam Sawicki that looks satisfying (https://github.com/sawickiap/str_view)