I am using a library which has a function that returns result strings encoded as UTF-16LE (I'm pretty sure) in a standard char *, as well as the number of bytes in the string. I would like to convert these strings to UTF-8. I tried the solution from this question: Convert UTF-16 to UTF-8 under Windows and Linux, in C which says to use iconv, however the result was that both input and output buffers wound up empty. What am I missing?
My input and output buffers are declared and initialized as follows:
char *resbuff=NULL;
char *outbuff=NULL;
int stringLen;
size_t outbytes=1024;
size_t inbytes;
size_t convResult;
...
//some loop and control code here
...
if (resbuff==NULL) {
resbuff=(char *)malloc(1024);
outbuff=(char *)malloc(1024);
}
I then call the library function to fill resbuff with data. Looking at the buffer in the debugger, I can see the data. For example, if the data is "test", I would see the following at the individual indexes of resbuff:
't','\0','e','\0','s','\0','t','\0'
Which I believe is UTF-16LE (other code using the same library would appear to confirm this), and stringLen now equals 8. I then try to convert that to UTF-8 using the following code:
iconv_t conv;
conv=iconv_open("UTF-8", "UTF-16LE");
inbytes=stringLen;
convResult=iconv(conv,&resbuff,&inbytes,&outbuff,&outbytes); //this does return 0
iconv_close(conv);
With the result that outbuff and resbuff both end up as null strings.
Note that I declare stringLen as an int rather than an unsigned long because that is what the library function is expecting.
EDIT: I tweaked my code slightly as per John Bollinger's answer below, but it didn't change the outcome.
EDIT 2: Ultimately the output from this code will be used in Python, so I'm thinking that while it might be uglier, I'll just perform the string conversion there. It just works.
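For the Python route mentioned here, the standard codecs make this trivial. A minimal sketch, assuming `raw` stands in for the UTF-16LE bytes you get back from the library (e.g. via ctypes):

```python
# raw: stand-in for the UTF-16LE bytes returned by the library
raw = b"t\x00e\x00s\x00t\x00"

text = raw.decode("utf-16-le")   # Python str
utf8 = text.encode("utf-8")      # UTF-8 bytes, if you need them explicitly
print(text)   # test
```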
You do not show the declaration or initialization of the variables stringLen and outbytes, and your problem might well lie there. However, this ...... is very troubling. The iconv() function expects its third and fifth arguments to be of type size_t *, and lying to the compiler via a cast isn't going to make the code actually work if they are in fact different types. You should have something along these lines:

Note, too, that you should check the return value to make sure the conversion was complete and successful. If iconv() fails it returns (size_t)-1, and the value of errno immediately after the call will tell you what kind of problem occurred.

Edited to add:

You originally said that zero bytes were converted, but you now say that "outbuff and resbuff both end up as null strings", which is not the same thing at all.
The iconv() function updates the pointers to the input and output buffers to facilitate converting a long input via multiple calls, the need for that being fairly common. That's why you must pass pointers to those pointers. If you don't want to lose the original values of those pointers then you should make and pass copies; I have updated my code above to demonstrate this.

Additionally, iconv() returns either an error indicator or a count of irreversibly converted characters, not a count of the total number of characters converted. For valid UTF-16{,LE,BE} to UTF-8, there should never be any irreversible conversions, so a return value of zero indicates that the specified number of input bytes were all successfully and reversibly converted to output bytes.

Note also that
resbuff, at least, never was a C string: the null chars embedded in the data make a string interpretation inappropriate. Depending on how your input and output buffers were initialized, however, it could be that after iconv() finishes, *resbuff == '\0' and *outbuff == '\0' (referring to your own current code). I'd call those "empty" strings, by the way, not "null" strings. If you really mean that iconv() leaves resbuff == 0 and outbuff == 0 (i.e. NULL pointers), then that would constitute a bug in iconv().