C++20 deprecates std::filesystem::u8path:
#include <filesystem>

std::string foo();

int main()
{
    auto path = std::filesystem::u8path(foo());
}
libstdc++ 13 has a deprecation warning in place:
<source>:7:40: warning: 'std::filesystem::__cxx11::path std::filesystem::__cxx11::u8path(const _Source&) [with _Source = std::__cxx11::basic_string<char>; _Require = path; _CharT = char]' is deprecated: use 'path((const char8_t*)&*source)' instead [-Wdeprecated-declarations]
    7 |     auto path = std::filesystem::u8path(foo());
      |                 ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
The proposed cast path((const char8_t*)&*source) looks like an outright strict aliasing violation to me, and hence UB.
Is that correct? Is GCC making any additional guarantees that make this legal?
And lastly, is there a better workaround if my path is stored in std::string and I don't want to rewrite everything to std::u8string?
In short, there is undefined behavior in your example. However, the actual cause is not a strict aliasing violation, but a precondition violation because of a hypothetical strict aliasing violation.
No undefined behavior due to strict aliasing
There is no strict aliasing violation ([basic.lval] p11) because any access of the characters would happen within the constructor of std::filesystem::path or other parts of the filesystem library, and those could be permitted to type-pun in ways that the user can't.
(const char8_t*)&* is essentially a reinterpret_cast<const char8_t*> of your data. reinterpret_cast on its own is valid, even if accessing objects through the pointer wouldn't be. With the resulting pointer, you would call constructor 3 of std::filesystem::path, as listed in [fs.class.path].
The format detection, argument format conversions, and type and encoding conversions for the path are all defined mathematically or through prose. For example, the encoding conversion is defined in [fs.path.type.cvt] p3.
The implementation has a lot of freedom when it comes to implementing this. The std::filesystem::path constructor could have relaxed aliasing rules, for instance.
Undefined behavior due to precondition violation
The issue lies in the use of the value type.
Your iterator would be of type const char8_t*, and indirection (*i) would not be valid for it because it would hypothetically violate strict aliasing. Therefore, what you're passing to the path constructor has no value type, and the behavior is undefined because of a precondition violation.
GCC strict aliasing relaxations between character types
I was unable to find details about this in the GCC documentation, but char8_t appears to be able to alias char (see Compiler Explorer).
Presumably, you are thus relying on compiler extensions.