Are basic_string literals faster or handled better at compile-time?


While skimming the draft of C++14/C++1y (n3690), I noticed the introduction of the basic_string literal suffixes in §21.7:

inline namespace literals {
inline namespace string_literals {
  // 21.7, suffix for basic_string literals:
  string operator "" s(const char *str, size_t len);
  u16string operator "" s(const char16_t *str, size_t len);
  u32string operator "" s(const char32_t *str, size_t len);
  wstring operator "" s(const wchar_t *str, size_t len);
}
}

My questions are:

  • Is there a possibility to be faster at run-time with basic_string literals?
  • Is my "naive" implementation totally wrong?
  • Can the layout of data in ROM be different with basic_string literals, or any other difference at compile-time versus run-time?

Background

I know that this allows the direct use of string literals like this:

std::string s1 = "A fabulous string"s;

void sfunc(std::string arg);

int main() {
    sfunc("argument"s);
}

But what is the advantage of that over relying on the conversion constructor string(const char*)?

The "old" code would look like this:

std::string s1 = "A fabulous string";  // c'tor string(const char*)

void sfunc(std::string arg);

int main() {
    sfunc("argument");   // auto-conversion via same c'tor
}

As far as I can see the implementation of operator "" s() would basically look like this:

std::string operator "" s(const char* lit, size_t sz) {
    return std::string(lit, sz);
}

So, just the use of the same c'tor. And my guess is that this has to be done at run-time; am I wrong?

Edit: As Nicol Bolas correctly pointed out below, my example does not use the same constructor but the one that takes an additional length, which is obviously very useful for the construction. This leaves me with the question: does this make it easier for the compiler to put string literals into ROM, or to do something similar at compile-time?


There are 3 answers

Jonathan Wakely (best answer)
  • Is there a possibility to be faster at run-time with basic_string literals?

Yes. As already stated, the string length is known at compile time and is passed to the constructor automatically, so no run-time scan for the terminator is needed.

  • Is my "naive" implementation totally wrong?

No, it's correct.

  • Can the layout of data in ROM be different with basic_string literals, or any other difference at compile-time versus run-time?

Probably not. The relevant basic_string constructor is not constexpr, so the object is not eligible for static initialization; it probably can't be put in ROM and has to be constructed at run-time.

Nicol Bolas

So, just the use of the same c'tor.

OK, let's see how that would look:

string fromLit = "A fabulous string"s;
string fromBare = string("A fabulous string");

See anything missing in fromBare? Let me spell it out for you:

string fromBare = string("A fabulous string"/*, NOTHING*/);

Yeah, you can't get the length of the string without... getting its length. This means that fromBare will have to iterate through the literal to find the \0 character. At runtime. fromLit will not; the compiler provides the string's length as a parameter determined at compile time. Any compiler worth using will just bake the length into the executable code.
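To make the point about the length concrete, here is a minimal sketch. The suffix `_len` and the helpers `from_bare`, `from_sized`, and `demo` are hypothetical names invented for this illustration; they are not part of the standard library. The literal operator simply hands back the length the compiler supplies, which shows that the value is available as a constant expression.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical literal operator: returns the length the compiler passes in,
// so that the value can be inspected directly.
constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}

// The length is a constant expression and excludes the terminating '\0'.
static_assert("A fabulous string"_len == 17,
              "the compiler supplies the literal's length");

// Pointer-only constructor: it has to locate the terminator itself,
// conceptually a strlen call (an optimizer may still fold it for a literal).
inline std::string from_bare() {
    return std::string("A fabulous string");
}

// Pointer-plus-length constructor, which is what the s suffix ends up calling.
inline std::string from_sized() {
    return std::string("A fabulous string", "A fabulous string"_len);
}

// Both routes produce the same string; only the way the length is obtained differs.
inline void demo() {
    assert(from_bare() == from_sized());
}
```

Calling `demo()` from any function exercises both paths. How much run-time difference remains in practice depends on the compiler and optimization level, so treat this as an illustration of the mechanism rather than a benchmark.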

And even if that weren't the case, it's still better for other reasons. Consider this:

void SomeFunc(const std::string &);
void SomeFunc(const char *);

SomeFunc("Literal");
SomeFunc("Literal"s);
SomeFunc(std::string("Literal"));

The last two do the same thing (minus the point I made before), but one of them is much shorter. Even if you employ using std::string (or foolishly using namespace std;), the second one is still shorter. Yet it's clear exactly what's going on.
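The overload selection described above can be checked with a small sketch. The bodies of `SomeFunc` and the `Picked` enum are assumptions added purely so that each call can report which overload was chosen; the original answer declares the functions without defining them.

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;  // makes the s suffix available (C++14)

// Hypothetical return type, used only to report which overload ran.
enum class Picked { StdString, CharPointer };

inline Picked SomeFunc(const std::string&) { return Picked::StdString; }
inline Picked SomeFunc(const char*)        { return Picked::CharPointer; }

inline void demo_overloads() {
    // A bare literal decays to const char*, which is an exact match.
    assert(SomeFunc("Literal") == Picked::CharPointer);
    // The s suffix already yields a std::string, so the first overload wins.
    assert(SomeFunc("Literal"s) == Picked::StdString);
    // The explicit construction selects the same overload, just more verbosely.
    assert(SomeFunc(std::string("Literal")) == Picked::StdString);
}
```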

David Stone

It provides more compile-time safety.

Consider the question "How do you construct a std::string with an embedded null?"

Without the suffix, the only ways to construct a std::string from a string literal that contains a null character are to specify the size of the string literal (error prone), to use initializer_list syntax (verbose), or to write some sort of loop with multiple calls to push_back (even more verbose). With the literal operator, however, the size is passed in for you automatically, which removes a possible source of error.
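A short sketch of the embedded-null case, assuming a C++14 compiler. The function names `truncated`, `complete`, and `demo_embedded_null` are made up for this example.

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;  // makes the s suffix available (C++14)

// The const char* constructor stops at the first '\0', so "cd" is silently lost.
inline std::string truncated() { return std::string("ab\0cd"); }

// The s suffix passes the full literal length, so the embedded null survives.
inline std::string complete() { return "ab\0cd"s; }

inline void demo_embedded_null() {
    assert(truncated().size() == 2);
    assert(complete().size() == 5);
    assert(complete()[2] == '\0');
}
```

The manual alternative, `std::string("ab\0cd", 5)`, gives the same result as `complete()` but relies on the hand-counted 5 staying in sync with the literal.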