I am trying to convert a Shift-JIS file to UTF-8 with the code below, but when I open the output file it contains corrupted characters. It looks like I am missing something here. Any thoughts?
// From file
FILE* shiftJisFile = _tfopen(lpszShiftJs, _T("rb"));
int nLen = _filelength(fileno(shiftJisFile));
LPSTR lpszBuf = new char[nLen];
fread(lpszBuf, 1, nLen, shiftJisFile);
// convert multibyte to wide char
int utf16size = ::MultiByteToWideChar(CP_ACP, 0, lpszBuf, -1, 0, 0);
LPWSTR pUTF16 = new WCHAR[utf16size];
::MultiByteToWideChar(CP_ACP, 0, lpszBuf, -1, pUTF16, utf16size);
wstring str(pUTF16);
// convert wide char to multi byte utf-8 before writing to a file
fstream File("filepath", std::ios::out);
string result = string();
result.resize(WideCharToMultiByte(CP_UTF8, 0, str.c_str(), -1, NULL, 0, 0, 0));
char* ptr = &result[0];
WideCharToMultiByte(CP_UTF8, 0, str.c_str(), -1, ptr, result.size(), 0, 0);
File << result;
File.close();

There are multiple problems.
The first problem is that when you write the output file, you need to open it in binary mode, for the same reason you need to do so when reading the input.
The second problem is that when you read the input file, you read only the bytes of the input stream and then treat them like a string. However, those bytes do not have a terminating null character. If you call MultiByteToWideChar with a length of -1, it infers the input string length from the terminating null character, which is missing in your case. That means both utf16size and the contents of pUTF16 are already wrong. Add the terminator manually after reading the file, allocating one extra byte in lpszBuf to hold it.
The last problem is that you are using CP_ACP. That means "the current code page", which is Shift-JIS only if the system locale happens to be Japanese. In your question, you were specifically asking how to convert Shift-JIS. The code page Windows uses for its closest equivalent to what is commonly called "Shift-JIS" is 932 (you can look that up on Wikipedia, for example). So pass 932 instead of CP_ACP to both MultiByteToWideChar calls.
Additionally, there is no reason to create wstring str(pUTF16). Just use pUTF16 directly in the WideCharToMultiByte calls.
Also, I'm not sure how kosher char *ptr = &result[0] is. I personally would not create a string specifically as a buffer for this.
Here is the corrected code. I would personally not write it this way, but I don't want to impose my coding ideology on you, so I made only the changes necessary to fix it:
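The sketch below is one way those changes could look when applied to the question's code. It is an illustration under assumptions, not a definitive implementation: the wrapper function convertShiftJisFileToUtf8 and its parameters are hypothetical, and plain char paths with fopen stand in for the _tfopen call. Beyond the three fixes, it drops the intermediate wstring, frees its buffers, and skips the terminating null that a -1 length makes WideCharToMultiByte count, so that no stray zero byte ends up in the output file.

```cpp
#ifdef _WIN32  // MultiByteToWideChar and WideCharToMultiByte are Windows-only
#include <windows.h>
#include <io.h>
#include <cstdio>
#include <fstream>

// Hypothetical wrapper around the question's code with the fixes applied.
// Returns true if the output file was written.
bool convertShiftJisFileToUtf8(const char* inPath, const char* outPath)
{
    FILE* shiftJisFile = fopen(inPath, "rb");
    if (!shiftJisFile) return false;
    long nLen = _filelength(_fileno(shiftJisFile));
    if (nLen < 0) { fclose(shiftJisFile); return false; }

    // Fix 2: one extra byte so the buffer can be null-terminated.
    char* lpszBuf = new char[nLen + 1];
    size_t nRead = fread(lpszBuf, 1, static_cast<size_t>(nLen), shiftJisFile);
    fclose(shiftJisFile);
    lpszBuf[nRead] = '\0';

    bool ok = false;
    // Fix 3: code page 932 (Shift-JIS) instead of CP_ACP.
    int utf16size = ::MultiByteToWideChar(932, 0, lpszBuf, -1, NULL, 0);
    if (utf16size > 0) {
        wchar_t* pUTF16 = new wchar_t[utf16size];
        ::MultiByteToWideChar(932, 0, lpszBuf, -1, pUTF16, utf16size);

        // pUTF16 is used directly; no intermediate wstring is needed.
        int utf8size = ::WideCharToMultiByte(CP_UTF8, 0, pUTF16, -1, NULL, 0, NULL, NULL);
        if (utf8size > 0) {
            char* pUTF8 = new char[utf8size];
            ::WideCharToMultiByte(CP_UTF8, 0, pUTF16, -1, pUTF8, utf8size, NULL, NULL);

            // Fix 1: open the output in binary mode.
            std::ofstream out(outPath, std::ios::out | std::ios::binary);
            // utf8size counts the terminating null; do not write it to the file.
            out.write(pUTF8, utf8size - 1);
            ok = out.good();
            delete[] pUTF8;
        }
        delete[] pUTF16;
    }
    delete[] lpszBuf;
    return ok;
}
#endif
```

Because the input is treated as a null-terminated string, a file that contains an embedded zero byte would be converted only up to that byte.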
Also, you have a memory leak: lpszBuf and pUTF16 are not cleaned up.
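One way to avoid the leak without scattering delete[] calls is to let a container own the buffer. The following is a minimal sketch, assuming a hypothetical helper named readFileNullTerminated that is not part of the question's code; it swaps the FILE* calls for std::ifstream opened in binary mode and appends the terminating null that MultiByteToWideChar needs when it is called with a length of -1.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file in binary mode and appends a
// terminating null. The vector owns the memory, so nothing has to be
// released with delete[]. A file that cannot be opened yields just the null.
std::vector<char> readFileNullTerminated(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<char> buf;
    if (in) {
        buf.assign(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>());
    }
    buf.push_back('\0');  // terminator expected by a -1 length argument
    return buf;
}
```

The result can be passed as &buf[0] wherever lpszBuf was used, and a std::vector<WCHAR> sized with utf16size could take the place of pUTF16 in the same way.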