Why (if that is the case) does the standard say that copying uninitialized memory with memcpy is UB?


When a class member cannot have a sensible meaning at the moment of construction, I don't initialize it. Obviously that only applies to POD types; an object whose type has constructors is always initialized.

The advantage of that, apart from saving the CPU cycles spent initializing something to a meaningless value, is that I can detect erroneous use of these variables with valgrind, which is not possible if I just give those variables some arbitrary value.

For example,

struct MathProblem {
  bool finished;
  double answer;

  MathProblem() : finished(false) { }
};

Until the math problem is solved (finished) there is no answer. It makes no sense to initialize answer in advance (to, say, zero) because that might not be the answer. answer only has a meaning after finished has been set to true.

Using answer before it is initialized is therefore an error, and it is perfectly fine for that to be UB.

However, a trivial copy of answer before it is initialized is currently ALSO UB (if I understand the standard correctly), and that doesn't make sense. The default copy and move constructors should simply be able to make a trivial copy (i.e., as if using memcpy), whether the member is initialized or not. For example, I might want to move this object into a container:

v.push_back(MathProblem());

and then work with the copy inside the container.

Is moving an object with an uninitialized, trivially copyable member indeed defined as UB by the standard? And if so, why? It doesn't seem to make sense.


There are 2 answers

8
Answer by eerorika (best answer)

"Is moving an object with an uninitialized, trivially copyable member indeed defined as UB by the standard?"

Depends on the type of the member. Standard says:

[basic.indet]

When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ([expr.ass]).

If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

  • If an indeterminate value of unsigned ordinary character type ([basic.fundamental]) or std::byte type ([cstddef.syn]) is produced by the evaluation of:

    • the second or third operand of a conditional expression,
    • the right operand of a comma expression,
    • the operand of a cast or conversion ([conv.integral], [expr.type.conv], [expr.static.cast], [expr.cast]) to an unsigned ordinary character type or std::byte type ([cstddef.syn]), or
    • a discarded-value expression,

    then the result of the operation is an indeterminate value.

  • If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the right operand of a simple assignment operator ([expr.ass]) whose first operand is an lvalue of unsigned ordinary character type or std::byte type, an indeterminate value replaces the value of the object referred to by the left operand.

  • If an indeterminate value of unsigned ordinary character type is produced by the evaluation of the initialization expression when initializing an object of unsigned ordinary character type, that object is initialized to an indeterminate value.

  • If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the initialization expression when initializing an object of std::byte type, that object is initialized to an indeterminate value.

None of the exceptional cases apply to your example object, so UB applies.


"... with memcpy is UB?"

It is not. std::memcpy copies the object as an array of bytes (unsigned char), which falls under the exceptions above, so the copy itself is not UB. You still have UB if you attempt to read the indeterminate member of the copy (unless the exceptions above apply).
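
As a minimal sketch of that distinction, using the MathProblem struct from the question (the helper name copy_bytes is hypothetical): the bytes of the whole object, including the uninitialised answer, are copied with std::memcpy, and afterwards only the initialised member is read.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct MathProblem {
  bool finished;
  double answer;  // deliberately left uninitialised until the problem is solved

  MathProblem() : finished(false) { }
};

// std::memcpy is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<MathProblem>::value,
              "MathProblem must be trivially copyable");

// Hypothetical helper: copies the object representation byte by byte.
// An output parameter is used so that no copy or move constructor runs.
void copy_bytes(MathProblem& dst, const MathProblem& src) {
  std::memcpy(&dst, &src, sizeof dst);
}
```

After copy_bytes(b, a), reading b.finished is fine because that member was initialised in a. Reading b.answer would still be UB until a value has been assigned to it.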


"why?"

The C++ standard doesn't include a rationale for most rules. This particular rule has existed since the first standard. It is slightly stricter than the related C rule, which is about trap representations. To my understanding, there is no established convention for trap handling, and the authors didn't wish to restrict implementations by specifying one, so they opted to make it UB instead. This also has the effect of allowing the optimiser to assume that indeterminate values will never be read.


"I might want to move this object into a container:"

Moving an uninitialised object into a container is typically a logic error. It is unclear why you might want to do such a thing.
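
One way to sidestep the problem, which is neither what the question does nor something the standard text above prescribes, is to model "no answer yet" explicitly. The sketch below assumes C++17 and swaps the bool/double pair for std::optional<double>; the helper make_problems is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Variant of MathProblem that makes "no answer yet" an explicit state.
struct MathProblem {
  std::optional<double> answer;  // empty until the problem is solved

  bool finished() const { return answer.has_value(); }
};

// Hypothetical helper reproducing the container use case from the question.
std::vector<MathProblem> make_problems() {
  std::vector<MathProblem> v;
  v.push_back(MathProblem());  // copying or moving an empty optional is well-defined
  return v;
}
```

The trade-off is that valgrind will no longer flag a premature read. Instead, calling answer.value() on an unsolved problem throws std::bad_optional_access.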

0
Answer by supercat

The design of the C++ Standard was heavily influenced by the C Standard, whose authors (according to the published Rationale) intended and expected that implementations would, on a quality-of-implementation basis, extend the semantics of the language by meaningfully processing programs in cases where it was clear that doing so would be useful, even if the Standard didn't "officially" define the behavior of those programs. Consequently, both standards place more priority upon ensuring that they don't mandate behaviors in cases where doing so might make some implementations less useful, than upon ensuring that they mandate everything that should be supported by quality general-purpose implementations.

There are many cases where it may be useful for an implementation to extend the semantics of the language by guaranteeing that using memcpy on any valid region of storage will, at worst, behave in a fashion consistent with populating the destination with some possibly-meaningless bit pattern with no outside side effects, and few if any where it would be either easier or more useful to have it do something else. The only situations where anyone should care about whether the behavior of memcpy is defined in a particular situation involving valid regions of storage would be those in which some alternative behavior would be genuinely more useful than the commonplace one. If such situations exist, compiler writers and their customers would be better placed than the Committee to judge which behavior would be most useful.

As an example of a situation where an alternative behavior might be more useful, consider code which uses memcpy to copy a partially-written structure, and then uses that copy to make two further copies of the structure. In some cases, having the compiler write only those parts of the two destination structures which had been written in the original may improve efficiency, but that behavior would be observably different from having the first memcpy behave as though it stores some bit pattern to its destination. Note that while such a change would not adversely affect a program's overall behavior if no copies of the uninitialized parts of the structure are ever used in a way that would affect behavior, the Standard has no nice way of distinguishing scenarios that could or could not occur under such a model, and thus leaves all such scenarios undefined.
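
The scenario can be sketched roughly as follows. The Record type and the copy_twice function are hypothetical names chosen for illustration, and only the member that was actually written is read afterwards.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical structure of which only one member gets written.
struct Record {
  int written;
  int unwritten;
};

// Copies a partially written Record into a temporary, then copies the
// temporary into two destinations.
void copy_twice(const Record& src, Record& d1, Record& d2) {
  Record tmp;
  std::memcpy(&tmp, &src, sizeof tmp);  // src.unwritten may be indeterminate here
  std::memcpy(&d1, &tmp, sizeof d1);
  std::memcpy(&d2, &tmp, sizeof d2);
  // If the first memcpy behaves as though it stores some definite bit pattern,
  // d1.unwritten and d2.unwritten end up holding identical bytes.
  // An implementation that copies only the written parts would instead leave
  // whatever each destination held before, and those bytes may differ.
}
```

A caller that sets src.written and later inspects only d1.written and d2.written cannot tell the two strategies apart, which is the case the last sentence above describes.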