I found a bug in glibc that I would like to report. The issue is that printf() counts the wrong width for the grouping character in the nb_NO.utf8 locale and thus does not set aside enough padding to the left of the string. I originally spotted this in the shell utility printf, but it seems to originate from the printf function in libc, which I have verified using a little test program.
I haven't worked in C since university, so I am a bit rusty when it comes to creating a test case. My only issue so far is that when I use this grouping character as part of a string (a wchar_t array), the string is not terminated, and I am not sure what I am doing wrong.
This is the output of my little test driver:
$ gcc printf-test.c && ./a.out
Using locale nb_NO.utf8
<1 234> (length 7 according to strlen)
<1 234> (length -1 according to wcswidth)
Using locale en_US.utf8
< 1,234> (length 7 according to strlen)
< 1,234> (length 7 according to wcswidth)
Width of character e280af: -1
Width of s0 4: (ABCD)
Width of s1 4: (ABCD)
Width of s2 -1: (
As is obvious, something fishy is going on with the printing of the final string, and it is somehow related to how I try to print a string containing the multi-byte grouping character used in the nb_NO locale.
The full source:
#define _XOPEN_SOURCE /* See feature_test_macros(7) */
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
#include <string.h>
void print_num(char *locale){
    printf("Using locale %s", locale);
    setlocale(LC_NUMERIC, locale);

    char buf[40];
    sprintf(buf,"%'7d", 1234);
    printf("\n<%s> (length %d according to strlen)\n", buf, (int) strlen(buf));

    wchar_t wbuf[40];
    swprintf(wbuf, 40, L"%'7d", 1234);
    int wide_width = wcswidth (wbuf, 40);
    printf("<%s> (length %d according to wcswidth)\n", buf, wide_width);
    puts("");
}

int main(){
    print_num("nb_NO.utf8");
    print_num("en_US.utf8");

    // just trying to understand
    wchar_t wc = (wchar_t) 0xe280af; // is this a correct way of specifying the char e2 80 af?
    int width = wcwidth (wc);
    printf("Width of character %x: %d\n", (int) wc, width);

    wchar_t s0[] = L"ABCD";
    wchar_t s1[] = {'A','B','C', 'D', '\0'};
    wchar_t s2[] = {'A',wc,'B', '\0'}; // something fishy
    int widthOfS0 = wcswidth (s0, 4);
    int widthOfS1 = wcswidth (s1, 4);
    int widthOfS2 = wcswidth (s2, 4);
    printf("\nWidth of s0 %d: (%ls)", widthOfS0, s0);
    printf("\nWidth of s1 %d: (%ls)", widthOfS1, s1);
    printf("\nWidth of s2 %d: (%ls)", widthOfS2, s2); // this does not terminate the string
    return 0;
}
It may be too obvious to mention, but you need to use wprintf() to print a wchar_t. Any string literal you write gets terminated automatically, but an array you fill with individual characters does not. Also, the cast only changes the size and type of the value to make it "fit"; it does not perform any kind of conversion between number types.
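To make the point about the cast concrete, here is a minimal sketch. It assumes glibc, where a wchar_t holds the Unicode code point, so the grouping character whose UTF-8 bytes are e2 80 af would be stored as 0x202F (NARROW NO-BREAK SPACE) rather than 0xe280af. The names NNBSP_CODE_POINT, decode_first_char and print_grouping_demo are made up for illustration and are not part of any library.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* U+202F NARROW NO-BREAK SPACE. Its UTF-8 encoding is the byte sequence
   e2 80 af, but (assuming glibc) a wchar_t stores the code point itself. */
#define NNBSP_CODE_POINT 0x202F

/* Hypothetical helper: decode the first multibyte character of s using the
   current LC_CTYPE locale. Returns 1 and stores the character in *out on
   success, 0 on an empty string or a decoding error. */
int decode_first_char(const char *s, wchar_t *out)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);
    size_t n = mbrtowc(out, s, strlen(s), &state);
    return n != 0 && n != (size_t) -1 && n != (size_t) -2;
}

/* Hypothetical demo: print a wide string containing the grouping character.
   Returns 0 on success, -1 if the requested locale could not be set. */
int print_grouping_demo(const char *locale_name)
{
    /* LC_ALL includes LC_CTYPE, which governs multibyte <-> wide conversion;
       setting only LC_NUMERIC leaves LC_CTYPE in the "C" locale. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    /* Let the library do the conversion instead of casting the raw bytes. */
    wchar_t from_bytes;
    if (decode_first_char("\xe2\x80\xaf", &from_bytes))
        wprintf(L"decoded U+%04lX\n", (unsigned long) from_bytes);

    /* The explicit L'\0' terminates the array, as in s2 from the question. */
    wchar_t s2[] = { L'A', (wchar_t) NNBSP_CODE_POINT, L'B', L'\0' };
    wprintf(L"(%ls) has %zu wide characters\n", s2, wcslen(s2));
    return 0;
}
```

Calling print_grouping_demo("nb_NO.utf8") from main() should exercise both conversions, provided that locale is installed. The sketch uses wprintf() throughout because a stream's orientation is fixed by the first operation on it, so printf() and wprintf() should not be mixed on stdout.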