Why does std::wofstream not write the whole wstring to a file?


I have a std::wstring whose size is 139,580,199 characters.

For debugging, I wrote it to a file with this code:

std::wofstream f(L"C:\\some file.txt");
f << buffer;
f.close();

After that, I noticed that the end of the string is missing. The created file is 109,592,584 bytes (and the "size on disk" is 109,596,672 bytes).

I also checked whether buffer contains null characters, like this:

size_t pos = buffer.find(L'\0');

I expected the result to be std::wstring::npos, but it is 18446744073709551615. My string doesn't have a null character at the end, though, so that is probably OK.

Can somebody explain why the whole string was not written to the file?


There is 1 answer

James Kanze (best answer)

A lot depends on the locale, but typically, files on disk will not use the same encoding form (or even the same encoding) as that used by wchar_t; the filebuf, which does the actual reading and writing, translates between the encodings according to its imbued locale. And there is only a vague relationship between the lengths of a string in different encodings or encoding forms. (And the size the system sees doesn't correspond directly to the number of bytes you can read from the file.)
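To illustrate how loosely the element count and the byte count are related, here is a minimal sketch that computes how many bytes a std::wstring would occupy if it were encoded as UTF-8. The helper name utf8_length is hypothetical, and the sketch assumes that wchar_t holds either UTF-32 code points or UTF-16 code units; UTF-8 is only one possible external encoding, not necessarily the one your locale uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes `text` would occupy when encoded as UTF-8.
// Assumes wchar_t holds UTF-32 code points or UTF-16 code units.
std::size_t utf8_length(const std::wstring& text)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(text[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.size()) {
            unsigned long low = static_cast<unsigned long>(text[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        // UTF-8 uses 1 to 4 bytes depending on the code point's magnitude.
        if (cp < 0x80)         bytes += 1;
        else if (cp < 0x800)   bytes += 2;
        else if (cp < 0x10000) bytes += 3;
        else                   bytes += 4;
    }
    return bytes;
}
```

With this sketch, the three-character string L"abc" needs 3 bytes, while the single-character string L"\u20AC" (the euro sign) also needs 3 bytes.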

To see if everything was written, check the status of f after the close, i.e.:

f.close();
if ( !f ) {
    //  Something went wrong...
}

One thing that can go wrong is that the external encoding doesn't have a representation for one of the characters. If you're in the "C" locale, this could occur for any character outside of the basic execution character set.

If there is no error above, there's no reason offhand to assume that not all of the string has been written. What happens if you try to read it in another program? Do you get the same number of characters or not?

For the rest, nul characters are characters like any others in a std::wstring; there's nothing special about them, including when they are output to a stream. And 18446744073709551615 looks very much like the value I would expect for std::wstring::npos on a 64-bit machine.
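As a small sketch of that point: npos is defined as size_type(-1), which is the largest value size_type can hold, so with a 64-bit size_type it prints as 18446744073709551615. The helper name contains_nul below is hypothetical; it simply compares the result of find against npos rather than against a printed number.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// npos is size_type(-1), i.e. the largest value size_type can hold.
static_assert(std::wstring::npos ==
                  std::numeric_limits<std::wstring::size_type>::max(),
              "npos is the maximum value of size_type");

// Hypothetical helper: true if `text` contains an embedded nul character.
bool contains_nul(const std::wstring& text)
{
    return text.find(L'\0') != std::wstring::npos;
}
```

In the question, find returned npos, so the buffer contains no embedded nul characters.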

EDIT:

Following up on Mat Petersson's comment: it's actually highly unlikely that the file ends up with fewer bytes than there are elements in the std::wstring. (std::wstring::size() returns the number of wchar_t elements, not bytes.) I was thinking in terms of bytes, not in terms of what std::wstring::size() returns. So the most likely explanation is that you have some characters in your string which aren't representable in the target encoding (which probably only supports characters with code points 32-126, plus a few control characters, by default).
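If unrepresentable characters are indeed the cause, one possible way around it is to imbue the stream with a locale whose codecvt facet can encode every character. The following is only a sketch under assumptions: the helper name write_utf8 is hypothetical, UTF-8 is assumed to be an acceptable file encoding, and std::codecvt_utf8 is deprecated since C++17 (it is used here because major implementations still ship it).

```cpp
#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-8 and reports success.
bool write_utf8(const std::string& path, const std::wstring& text)
{
    std::wofstream f;
    // Imbue before opening so the filebuf uses the UTF-8 facet from the start.
    // The locale takes ownership of the facet object.
    f.imbue(std::locale(std::locale::classic(),
                        new std::codecvt_utf8<wchar_t>));
    f.open(path, std::ios::binary);
    f << text;
    f.close();
    // As suggested above, check the stream state after close().
    return !f.fail();
}
```

The return value reflects the stream state after close(), so a conversion or I/O failure is reported to the caller instead of silently producing a truncated file.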