A file contains non-Latin content and is encoded in UTF-8. Currently the existing code uses "fopen" to open the file, parses it, and calls my validate function, passing the non-Latin content as a char*.
void validate(const char* str)
{
    // ...
}
I have to do some validation on the passed char array.
The application uses Sun C++ 5.11, which I think doesn't support Unicode. (I googled for Unicode support in Sun C++ 5.11 and didn't find any proper pointers, so I wrote a simple program to check whether Sun C++ supports Unicode, and the program didn't compile.)
How do I do the validation on the input char*? Is it possible using wchar_t?
This isn't a problem. You only need compiler support for Unicode to embed Unicode string literals in the code, or for fixed-width character types to represent UTF-16 or UTF-32. Your Unicode is UTF-8 and comes from user input, so no Unicode compiler support should be needed.
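In fact, UTF-8 code units are plain bytes, so they fit in an ordinary char array on any compiler. As a minimal illustration (the variable name is mine), a hex escape is all it takes to put such bytes into a narrow string literal:

// 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
// No Unicode literal support is required for this to compile.
const char e_acute[] = "\xC3\xA9";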
The C++ standard library has very few tools for processing Unicode. The provided tools primarily consist of conversions between different Unicode formats, and even those were not available prior to C++11.
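For illustration only, since these facilities do not exist on Sun C++ 5.11 (and were deprecated again in C++17): on a C++11 compiler you could convert UTF-8 input to a sequence of code points along these lines (the helper name to_code_points is my own):

#include <codecvt>
#include <locale>
#include <string>

std::u32string to_code_points(const std::string& utf8)
{
    // from_bytes throws std::range_error if the input is not valid UTF-8
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);
}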
Input and output are mostly just copying of bytes, so no significant processing is required for that. For other processing (which you presumably need for "validation") you will need to implement the tools yourself, or use third-party tools. You will need to refer to the ~1000 pages of the Unicode Standard if you choose to implement them yourself: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf
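As a sketch of what implementing it yourself can look like, here is a UTF-8 well-formedness check written in C++98 so that it should compile even on Sun C++ 5.11. The function name is_valid_utf8 is my own; the logic follows the RFC 3629 rules (reject overlong encodings, surrogates, and code points above U+10FFFF):

// Returns true if str is a NUL-terminated, well-formed UTF-8 string.
bool is_valid_utf8(const char* str)
{
    const unsigned char* s = reinterpret_cast<const unsigned char*>(str);
    while (*s) {
        if (s[0] < 0x80) {                    // 1 byte: U+0000..U+007F
            s += 1;
        } else if ((s[0] & 0xE0) == 0xC0) {   // 2 bytes: U+0080..U+07FF
            if (s[0] < 0xC2 ||                // 0xC0/0xC1 are overlong
                (s[1] & 0xC0) != 0x80)
                return false;
            s += 2;
        } else if ((s[0] & 0xF0) == 0xE0) {   // 3 bytes: U+0800..U+FFFF
            if ((s[1] & 0xC0) != 0x80 ||
                (s[2] & 0xC0) != 0x80 ||
                (s[0] == 0xE0 && s[1] < 0xA0) ||  // overlong
                (s[0] == 0xED && s[1] > 0x9F))    // surrogates U+D800..U+DFFF
                return false;
            s += 3;
        } else if ((s[0] & 0xF8) == 0xF0) {   // 4 bytes: U+10000..U+10FFFF
            if ((s[1] & 0xC0) != 0x80 ||
                (s[2] & 0xC0) != 0x80 ||
                (s[3] & 0xC0) != 0x80 ||
                (s[0] == 0xF0 && s[1] < 0x90) ||  // overlong
                s[0] > 0xF4 ||                    // above U+10FFFF
                (s[0] == 0xF4 && s[1] > 0x8F))    // above U+10FFFF
                return false;
            s += 4;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
    }
    return true;
}

Truncated sequences are caught automatically, because the terminating NUL byte fails the continuation-byte test. You could call a check like this at the top of your validate function and reject the input when it returns false.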
wchar_t is the native wide character type, used for the system's native wide character encoding. UTF-8 does not use wide code units.
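Concretely, wchar_t's width and encoding are implementation-defined: it is commonly a 32-bit type holding UTF-32 on Unix systems such as Solaris, and a 16-bit type holding UTF-16 on Windows, while UTF-8 code units are always single bytes. A trivial check (illustration only):

#include <iostream>

int main()
{
    // Implementation-defined: typically prints 4 on Solaris/Linux, 2 on Windows
    std::cout << "sizeof(wchar_t) = " << sizeof(wchar_t) << std::endl;
    return 0;
}

So wchar_t buys you nothing for UTF-8 input; validating the bytes directly, as sketched above, is the portable approach.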