Printing a wchar_t as part of a wchar_t* string does not terminate


So, I found a bug in glibc that I would like to report. The issue is that printf() computes the wrong width for the grouping character in the nb_NO.utf8 locale and therefore does not add enough padding to the left of the string. I originally spotted this in the shell utility printf, but it seems to originate from the printf function in libc, which I have verified using a small test program.

I haven't written C since university, so I am a bit rusty when it comes to creating a test case. My only issue so far is that when I use this grouping character as part of a string (a wchar_t array), the string does not seem to be terminated when printed, and I am not sure what I am doing wrong.

This is the output of my little test driver:

$ gcc printf-test.c && ./a.out 
Using locale nb_NO.utf8
<1 234> (length 7 according to strlen)
<1 234> (length -1 according to wcswidth)

Using locale en_US.utf8
<  1,234> (length 7 according to strlen)
<  1,234> (length 7 according to wcswidth)

Width of character e280af: -1

Width of s0  4: (ABCD)
Width of s1  4: (ABCD)
Width of s2 -1: (

Clearly, something fishy is going on when the final string is printed, and it is somehow related to how I try to print a string containing the multi-byte grouping character used in the nb_NO locale.

The full source:

#define _XOPEN_SOURCE       /* See feature_test_macros(7) */
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
#include <string.h>


void print_num(char *locale){ 
    printf("Using locale %s", locale);
    setlocale(LC_NUMERIC, locale);
    char buf[40];
    sprintf(buf,"%'7d", 1234);
    printf("\n<%s> (length %d according to strlen)\n", buf, (int) strlen(buf));

    wchar_t wbuf[40];
    swprintf(wbuf, 40, L"%'7d", 1234); 
    int wide_width = wcswidth (wbuf, 40);
    printf("<%s> (length %d according to wcswidth)\n", buf, wide_width);
    puts("");
}

int main(){
    print_num("nb_NO.utf8");
    print_num("en_US.utf8");

    // just trying to understand
    wchar_t wc = (wchar_t) 0xe280af; // is this a correct way of specifying the char e2 80 af?
    int width = wcwidth (wc);
    printf("Width of character %x: %d\n", (int) wc, width);

    wchar_t s0[] = L"ABCD";
    wchar_t s1[] = {'A','B','C', 'D', '\0'};
    wchar_t s2[] = {'A',wc,'B', '\0'}; // something fishy
    int widthOfS0 = wcswidth (s0, 4);
    int widthOfS1 = wcswidth (s1, 4);
    int widthOfS2 = wcswidth (s2, 4);
    printf("\nWidth of s0  %d: (%ls)", widthOfS0, s0);
    printf("\nWidth of s1  %d: (%ls)", widthOfS1, s1);
    printf("\nWidth of s2 %d: (%ls)", widthOfS2, s2); // this does not terminate the string

    return 0;
}

Answer by SrPanda:

It may be too obvious, but use wprintf() to print a wchar_t string. A string literal is terminated automatically, but an array that you fill with individual characters is not, so you have to add the terminator yourself. The cast only changes the type and size of the value to make it "fit"; it does not perform any kind of conversion on the value itself.

#include <wchar.h>
#include <stdio.h>

#ifndef __STDC_ISO_10646__
    #pragma warning() // 16 bit wchar
#endif

int main(void){

    int ret;
    wchar_t W [] = {                  // 0x80AF
        U'\x42', (wchar_t)0x43, (wchar_t)0xE280AF 
    };

    /* Use wprintf() throughout: once stdout has an orientation, mixing
       byte and wide output functions on it is undefined behaviour. */
    wprintf(L"Num cast %X -> %X \n", 0xE280AFu, (unsigned)(wchar_t)0xE280AF);

    wchar_t S1[] = {'A', W[0], 'C',  0};
    wchar_t S2[] = {'A', 'B',  W[1], 0};
    wchar_t S3[] = {'A', W[2], 'C',  0};

    ret = wprintf(L"wstr S1 -> (%ls)", S1);
    wprintf(L" / %i xchars printed \n", ret);

    ret = wprintf(L"wstr S2 -> (%ls)", S2);
    wprintf(L" / %i xchars printed \n", ret);

    ret = wprintf(L"wstr S3 -> (%ls)", S3);
    wprintf(L" / %i xchars printed \n", ret);

    return 0;
}
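
To illustrate the point that a cast performs no conversion: e2 80 af is the UTF-8 byte sequence of the grouping character, whereas on glibc (where __STDC_ISO_10646__ is defined) a wchar_t holds the Unicode code point. The sketch below is not part of the original answer. It decodes the bytes by hand; utf8_decode is a hypothetical helper name, and it leaves out the overlong-encoding and surrogate checks that a production decoder would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence of at most n bytes at s.
   Returns the number of bytes consumed and stores the code point in *out,
   or returns 0 if the sequence is truncated or malformed.
   Assumption: overlong forms and surrogates are NOT rejected here. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (n == 0)
        return 0;

    uint32_t cp;
    size_t len;

    /* The lead byte tells us the sequence length and carries the top bits. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len > n)
        return 0;                   /* input ends in the middle of a sequence */

    /* Each continuation byte (10xxxxxx) contributes six more bits. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    *out = cp;
    return len;
}
```

Called with the bytes 0xE2, 0x80, 0xAF, the function consumes 3 bytes and yields 0x202F (NARROW NO-BREAK SPACE). On such a platform the value to try in the question's test would therefore be (wchar_t) 0x202F rather than 0xe280af. The wide-character functions would presumably also need LC_CTYPE set to a UTF-8 locale, whereas the test program only sets LC_NUMERIC.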