What is string_view?


string_view was a proposed feature within the C++ Library Fundamentals TS (N3921) that was then added to C++17.

As far as I understand, it is a type that represents some kind of string "concept": a view of any type of container that could store something viewable as a string.

  • Is this right?
  • Should the canonical const std::string& parameter type become string_view?
  • Is there another important point about string_view to take into consideration?

There are 2 answers

Kerrek SB (best answer)

The purpose of any and all kinds of "string reference" and "array reference" proposals is to avoid copying data which is already owned somewhere else and of which only a non-mutating view is required. The string_view in question is one such proposal; there were earlier ones called string_ref and array_ref, too.

The idea is always to store a pair of pointer-to-first-element and size of some existing data array or string.

Such a view-handle class could be passed around cheaply by value and would offer cheap substringing operations (which can be implemented as simple pointer increments and size adjustments).
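To make the pointer-plus-size idea concrete, here is a minimal sketch. CharView is a hypothetical name used only for illustration; it is not the proposed class, and std::string_view offers far more (bounds checks, comparisons, searching).

```cpp
#include <cstddef>

// Hypothetical minimal view type, for illustration only.
struct CharView {
    const char* data;   // pointer to the first character (not owned)
    std::size_t size;   // number of characters in the view

    // Substring = pointer increment + size adjustment; nothing is copied.
    // Assumes pos <= size; count is clamped to what remains after pos.
    constexpr CharView substr(std::size_t pos, std::size_t count) const {
        const std::size_t remaining = size - pos;
        return CharView{data + pos, count < remaining ? count : remaining};
    }
};
```

Since the whole object is just two words, passing it by value costs about as much as passing a pointer and a length separately.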

Many uses of strings don't require actual owning of the strings, and the string in question will often already be owned by someone else. So there is a genuine potential for increasing the efficiency by avoiding unneeded copies (think of all the allocations and exceptions you can save).

The original C strings suffered from the problem that the null terminator was part of the string APIs, so you couldn't easily create substrings without mutating the underlying string (a la strtok). In C++, this is easily solved by storing the length separately and wrapping the pointer and the size into one class.
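As a rough sketch of the contrast with strtok, a field can be picked out of delimited text with C++17's std::string_view without writing a single byte to the input. The function name nth_field is made up for this example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the n-th (0-based) field of `text`, or an
// empty view if there are fewer than n + 1 fields. The input is never
// modified; every step only moves a pointer and shrinks a length.
constexpr std::string_view nth_field(std::string_view text, char delim,
                                     std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t pos = text.find(delim);
        if (pos == std::string_view::npos) return {};  // ran out of fields
        text.remove_prefix(pos + 1);                   // skip field + delimiter
    }
    // find() returns npos when this is the last field, and substr(0, npos)
    // then yields everything that is left.
    return text.substr(0, text.find(delim));
}
```

For instance, nth_field("a,bb,ccc", ',', 1) yields a view of "bb" that points into the original literal. Unlike strtok, no null terminators are written into the input, so it also works on read-only data.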

The one major obstacle and divergence from the C++ standard library philosophy that I can think of is that such "referential view" classes have completely different ownership semantics from the rest of the standard library. Basically, everything else in the standard library is unconditionally safe and correct (if it compiles, it's correct). With reference classes like this, that's no longer true. The correctness of your program depends on the ambient code that uses these classes. So that's harder to check and to teach.

Note that if a C++17 std::string_view is created from a std::string, the view is only valid while that std::string is alive and its buffer is unchanged: once the std::string goes out of scope (or reallocates), using the std::string_view is undefined behavior.
Also, the Qt framework replaced QStringRef with QStringView, and both Qt classes carry the same lifetime hazard as std::string_view: a view that outlives the QString it refers to is left dangling.
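A small sketch of the lifetime rule, assuming C++17. make_greeting and comma_position are hypothetical names for this example. The dangling case is left commented out because actually using such a view would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper that returns an owning string by value.
std::string make_greeting() { return "hello, world"; }

// DANGER, left commented out on purpose: the temporary std::string is
// destroyed at the end of the statement, so the view would dangle.
//   std::string_view dangling = make_greeting();

// Fine: `owner` outlives `view` and is not modified while the view is used.
std::size_t comma_position() {
    const std::string owner = make_greeting();
    const std::string_view view = owner;    // no copy is made
    assert(view.data() == owner.data());    // the view points into owner's buffer
    return view.find(',');
}
```

The compiler accepts both forms without complaint, which is exactly why this is harder to check than ordinary value types.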

tmaj

(Educating myself in 2021)

From Microsoft's <string_view>:

The string_view family of template specializations provides an efficient way to pass a read-only, exception-safe, non-owning handle to the character data of any string-like objects with the first element of the sequence at position zero. (...)

From Microsoft's C++ Team Blog std::string_view: The Duct Tape of String Types from August 21st, 2018 (retrieved 2021 Apr 01):

string_view solves the “every platform and library has its own string type” problem for parameters. It can bind to any sequence of characters, so you can just write your function as accepting a string view:

void f(wstring_view); // string_view that uses wchar_t's

and call it without caring what stringlike type the calling code is using (and for (char*, length) argument pairs just add {} around them) (...)
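(Not part of the quoted post.) A sketch of what that looks like in practice, using the char-based std::string_view rather than wstring_view. count_spaces and demo are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical function: one signature serves every caller below.
constexpr std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

std::size_t demo() {
    const std::string owned = "a b c";
    const char* c_str = "d e";
    const char buffer[] = {'f', ' ', 'g', ' ', 'h', ' ', 'i'};  // no terminator

    return count_spaces(owned)                     // std::string converts implicitly
         + count_spaces(c_str)                     // so does a const char*
         + count_spaces("literal text")            // and a string literal
         + count_spaces({buffer, sizeof buffer});  // (char*, length) pair: add {}
}
```

None of these calls allocates or copies the character data; each one only builds a pointer-and-length pair at the call site.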

(...)

Today, the most common “lowest common denominator” used to pass string data around is the null-terminated string (or as the standard calls it, the Null-Terminated Character Type Sequence). This has been with us since long before C++, and provides clean “flat C” interoperability. However, char* and its support library are associated with exploitable code, because length information is an in-band property of the data and susceptible to tampering. Moreover, the null used to delimit the length prohibits embedded nulls and causes one of the most common string operations, asking for the length, to be linear in the length of the string.
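(Not part of the quoted post.) The two points about in-band length can be seen in a few lines, assuming C++17. The names raw, whole and up_to_null are made up for this illustration.

```cpp
#include <string_view>

// Seven characters with a null in the middle; the literal itself appends
// an eighth, terminating null.
constexpr char raw[] = "abc\0def";

// Length carried out-of-band: size() is O(1) and the embedded null survives.
constexpr std::string_view whole{raw, sizeof raw - 1};

// Built from a bare pointer, the view has to fall back on the C convention
// and scan for the first null, so it stops after "abc".
constexpr std::string_view up_to_null{raw};
```

Here whole.size() is 7 while up_to_null.size() is 3, even though both views start at the same address.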

(...)

Each programming domain makes up their own new string type, lifetime semantics, and interface, but a lot of text processing code out there doesn’t care about that. Allocating entire copies of the data to process just to make differing string types happy is suboptimal for performance and reliability.