Why does std::optional not use a sentinel value to represent an empty optional?

298 views Asked by At

I know this ship has sailed due to ABI breakage it would require, but I wonder why originally implementations did not decide to use some magic bit pattern for std::string, std::vector, etc... to signal empty optional value. This obviously requires some "free" value in the bit representation of the type, so for example it would not work for int, size_t but many complex types have certain representations that can never occur for valid instance.

C++ could even specify some way to opt in for your type, e.g. by specializing

std::nullopt_sentinel

for user defined types.

My best guess is that complexity of specifying/implementing this(because you would need to implement sentinels for many many types in std::) was much greater than just using a bool always, but I wonder if there are other reasons.

Similar answer for Rust that seems to implement this sometimes

https://stackoverflow.com/a/16515488/700825

2

There are 2 answers

8
user17732522 On

The standard library and any user specializing std::optional for their own types are already free to implement it that way for types where it is possible (see [namespace.std] p2).

Nothing in the specification of std::optional mandates use of a bool flag to indicate the presence of a value. How the empty state is identified is an implementation detail. The user of std::optional can't rely on observing anything else about the internal state if the optional is empty.

0
Nicol Bolas On

Let's say that a type can identify that it has this "magic bit pattern" that represents "not a value". And let's say that optional<T> will be implemented differently depending on whether T advertises this.

So here's a question: is this "magic bit pattern" actually a valid value of type T?

If the answer to this question is "yes", then we have a problem. Because it is a valid value of type T, a user can create an object of type T with that value, right? So... what happens if they do optional<T> t(invalid_value);? Is t considered to be "engaged, but containing a value" or is t considered to be an unengaged optional?

If it's the former, then how can optional<T> recognize the difference between optional<T> t{} and optional<T> t(invalid_value);? The bit pattern of these two ts is identical, so they must behave identically, right?

So if it's the latter, then you're telling me that this code:

T t = something();
optional<T> ot(t);
if(ot)
{
  //Do something with ot
}

It is possible for you to never do something with ot based on what something returns? That's nonsensical. From the structure of the code, you handed it a t, and optional<T> is supposed to store a T if you give it one. That's what the type is for.

So, going back to the question, now this "magic bit pattern" must be considered to be not a valid value of type T. That is, you cannot create a T that has this "magic bit pattern."

Now, how do you implement this? When you construct an unengaged optional<T>, it has to treat the storage of T as being just a bag of bits. It must initialize this bag of bits with a value acquired from some T-based interface. And then, to test if the optional<T> is engaged, it has to compare all of those bits to the unengaged bits.

This now requires that T must not be trivially copyable. Or trivially default constructible. Otherwise, a T could be created that has the invalid value.

But more than that, you're missing one of the key reasons to even want this: space efficiency.

See, most of the "many complex types [that] have certain representations that can never occur for valid instance" are types that are substantially bigger than a bool. Even a vector<int> is 3x the size of a bool (including the padding for 64-bit pointers), let alone a std::string with small-string optimization. Because most of these types are significantly larger than the tracking data used to represent them, the space efficiency argument isn't particularly convincing.

Most of the cases where you want this space efficiency are things like non-nullable pointers or enumerators. That is, you want a null pointer value to act like an unengaged optional<T*>, and you want an enumeration value that isn't one of the enumerators to act like an unengaged optional<E>. These are small types, comparable to the size of bool, so the current optional implementation would double their size. The space savings is significant.

The larger the T, the less meaningful the space savings is.

But you'll notice one other thing. The larger the T is, the longer it takes to test if optional<T> is unengaged. If T is 24 bytes in size, you have to compare 3 8-byte words to see if it is engaged.

Again, doing this only makes sense if T is small. The bigger the T, the worse it gets.