C++ - Unicode Newline


I'm having an increasingly frustrating problem: I can't seem to print a Unicode character (in this case, some braille dots), move to a new line, and print more braille dots. I've been looking for answers for a few hours now, and I'm about at my wit's end.

I've tried changing the format of my Unicode characters, changing locales, changing the order, using multiple fstreams (one wide and one normal), and using countless supposed Unicode newline escape sequences. The code below is repeated once for each character in a row, and each row needs to end with a newline.

    wout.open((inputstring + "2.txt"), wofstream::binary | wofstream::trunc); //this only happens once

    _setmode(_fileno(stdout), _O_U16TEXT);

    switch (i) //will be expanded for more cases
    {
    case (63):
        cout << "\xFF\xFE"; // UTF-16 BOM
        cout << "\x0A\x28";

    }

    _setmode(_fileno(stdout), _O_TEXT);

I'm using setmode to switch to and from U16 because other parts of the program use text mode. If this is a problem, I can work around it. When I tried to use

    wout << "\n";

at the end of each row, it changes the output to be half braille characters like I'd expect, half gibberish like "*૾H૾H૾H૾H૾H૾H૾H૾H૾H". When I remove any part to do with printing the braille characters, it prints newlines just fine. I'm at a loss.

Answered by 1201ProgramAlarm

The entire file is read as either 8-bit or 16-bit characters, as determined by the UTF-16 BOM in the first two bytes. You can't switch between the two partway through. When you write out an 8-bit newline character, it shifts everything after it by one byte: the newline byte is combined with the next byte in the file to form a 16-bit character, and every character that follows is misaligned in the same way.

If we look at the first few words of your misprinted text string, we have

0020 0022 ff0a 0afe ff28 0afe ff28 0afe

In the (little endian) binary file, these would be ordered as

20 00 22 00 0a ff fe 0a 28 ff fe 0a 28 ff fe 0a

and you can see how that one byte newline combines with the following two byte characters to make unexpected output.
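To check the regrouping by hand, here is a minimal sketch of how a UTF-16LE reader pairs up bytes. The function name group_le16 is made up for this illustration and is not part of any library; it simply combines each pair of bytes, low byte first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: group a raw byte stream into little-endian 16-bit
// code units, the way a UTF-16LE reader would. A trailing odd byte is
// ignored in this sketch.
std::vector<std::uint16_t> group_le16(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Low byte comes first in the file, high byte second.
        units.push_back(static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return units;
}
```

Feeding it the sixteen bytes listed above gives back the eight words 0020 0022 ff0a 0afe ff28 0afe ff28 0afe, while the two bytes 0a 28 on their own, correctly aligned, give the single braille character 280a.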

The fix is to always write 16-bit characters to the file.