Don't ignore whitespaces when using wscanf for UTF-8

I am trying to read wide characters into an array of wchar_t from stdin. However, the negated scanset specifier ([^characters]) for ls does not work as expected.

The goal is to have every whitespace character read into str instead of being skipped. I therefore tried [^\n], but with no luck: the program loops endlessly and keeps printing garbled text to stdout.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <wchar.h>
#include <wctype.h>
#include <locale.h>

int main(void)
{
    wchar_t str[8];

    if (setlocale(LC_ALL, "en_US.UTF-8") == NULL)  {
        fprintf(stderr, "Failed to set locale LC_ALL = en_US.UTF-8.\n");
        exit(EXIT_FAILURE);
    }

    // correct (but not what I want)
    // whitespaces and EOLs are ignored
    // while (wscanf(L"%7ls", str) != EOF)  {
    //     wprintf(L"%ls", str);
    // }

    // incorrect
    // whitespaces (except EOLs) are properly read into str (what I want)
    // input: 不要忽略白空格 (for instance)
    // output: endless loop (garbled text)
    while (wscanf(L"%7[^\n]ls", str) != EOF)  {
        if (ferror(stdin) && errno == EILSEQ)  {
            fprintf(stderr, "Encountered an invalid wide character.\n");
            exit(EXIT_FAILURE);
        }
        wprintf(L"%ls", str);
    }
}

chux - Reinstate Monica On BEST ANSWER

Don't ignore whitespaces ...
... trying to read wide characters into an array of wchar_t
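
As for why the posted loop misbehaves: in `%7[^\n]ls` the `l` comes after the scanset, so the conversion is a plain `%[` (which stores multibyte `char` data, not `wchar_t`) followed by the literal characters `ls`. In addition, a scanset that matches nothing makes wscanf return 0 rather than EOF, so the loop never gets past the first '\n'. A minimal sketch of a scanset loop that avoids both problems, assuming the wide form `%7l[^\n]`; the helper name `copy_preserving_spaces` is hypothetical and not from the question:

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper: copy `in` to `out` in chunks of at most 7 wide
 * characters, keeping every space and newline.
 * Returns the number of wide characters copied.
 */
long copy_preserving_spaces(FILE *in, FILE *out)
{
    wchar_t str[8];
    long total = 0;

    assert(in != NULL && out != NULL);

    for (;;) {
        /* The l length modifier goes before '[' to request wchar_t storage. */
        int n = fwscanf(in, L"%7l[^\n]", str);

        if (n == EOF) {
            break;                      /* end of input or read error */
        }
        if (n == 1) {
            fputws(str, out);           /* a chunk without any newline */
            total += (long)wcslen(str);
        } else {
            /* n == 0: the scanset matched nothing, so the next
               character is '\n'; consume it explicitly. */
            wint_t c = fgetwc(in);
            if (c == WEOF) {
                break;
            }
            fputwc((wchar_t)c, out);
            total++;
        }
    }
    return total;
}
```

Once `setlocale(LC_ALL, "en_US.UTF-8")` has succeeded, as in the question, calling `copy_preserving_spaces(stdin, stdout)` should echo the input with its spaces intact.
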

To read a line of text (all characters, including whitespace, up to '\n') into a wide character string, use fgetws():

#define STR_SIZE 8
wchar_t str[STR_SIZE];

while (fgetws(str, STR_SIZE, stdin)) {
  // lop off the potential \n if desired
  size_t len = wcslen(str);
  if (len > 0 && str[len-1] == L'\n') {
    str[--len] = L'\0';
  }
  ...
}
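
One possible way to flesh out this approach is sketched below as two small functions. The names `chomp_wide` and `dump_chunks` are hypothetical, and the bracketed output format is only an assumption chosen to make the preserved whitespace visible. Since fgetws() stores at most STR_SIZE - 1 characters per call, a longer line arrives in several chunks.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

#define STR_SIZE 8

/* Remove one trailing L'\n', if present, and return the new length. */
size_t chomp_wide(wchar_t *s)
{
    assert(s != NULL);

    size_t len = wcslen(s);
    if (len > 0 && s[len - 1] == L'\n') {
        s[--len] = L'\0';
    }
    return len;
}

/*
 * Read `in` chunk by chunk with fgetws() and write each chunk to `out`
 * wrapped in brackets, one per line. Leading, trailing, and inner
 * spaces are kept; only the newline is stripped.
 * Returns the number of chunks read.
 */
int dump_chunks(FILE *in, FILE *out)
{
    wchar_t str[STR_SIZE];
    int chunks = 0;

    while (fgetws(str, STR_SIZE, in) != NULL) {
        chomp_wide(str);
        fwprintf(out, L"[%ls]\n", str);
        chunks++;
    }
    return chunks;
}
```

A caller would set the locale first, for example with the `setlocale(LC_ALL, "en_US.UTF-8")` call from the question, and then invoke `dump_chunks(stdin, stdout)`.
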