Special Characters issue in server response


When I send a GET request to the server for content, the response I get is: K??

But the actual content is: KòÉ

To fix this, I am trying to use UTF-8 when saving the content to a file and reading it back, as below:

//Saving content

    OutputStreamWriter sout = new OutputStreamWriter(new FileOutputStream(new File(path)), Charset.forName("UTF-8"));
    BufferedWriter buff_out = new BufferedWriter(sout);

    int line = 0;
    while ((line = buff_in.read()) != -1)
        buff_out.write(line);

//Reading content

    InputStream inputStreamRead = new FileInputStream(path);
    StringBuilder stringBuilder = null;

    InputStreamReader inputStreamReader = new InputStreamReader(inputStreamRead, Charset.forName("UTF-8"));
    BufferedReader buffReader= new BufferedReader(inputStreamReader);

    String line;
    stringBuilder = new StringBuilder();
    try 
    {
        while (( line = buffReader.readLine()) != null) 
        {
            stringBuilder.append(line);
            stringBuilder.append('\n');
        }
        Log.d("Main", "Test:: " + stringBuilder.toString());
    }
    catch (IOException e)
    {
        Log.e("Main", "Failed to read file", e);
    }

With the above logic, I'm still not able to get the actual content, KòÉ.

I have tried reading bytes too. Can anyone help me out with this?

Thanks in advance.


There are 2 answers

1
Joop Eggen On

Your code works correctly, assuming close() is called properly. You could use try-with-resources:

try (BufferedWriter buff_out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(new File(path)), StandardCharsets.UTF_8))) { 
    ...
} // Automatic close

One also might use:

String path = ...
byte[] content = Files.readAllBytes(Paths.get(path));
String s = new String(content, StandardCharsets.UTF_8);

Using the StandardCharsets constants, which cover the charsets guaranteed to be available in Java SE, means you do not need to handle an UnsupportedEncodingException (UTF-8 is always available).
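To check that the file handling itself is not what loses the characters, a round trip through a temporary file can be sketched as follows. The class name Utf8FileRoundTrip and the temp-file prefix are made up for illustration, and the string is written with Unicode escapes so the result does not depend on the encoding of the source file:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileRoundTrip {
    // Writes the text to a temporary file as UTF-8 and reads it back as UTF-8.
    public static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".txt"); // hypothetical prefix
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "K\u00F2\u00C9"; // the characters K, o-grave, E-acute
        // Compare instead of printing, so the console encoding plays no role.
        System.out.println(roundTrip(original).equals(original));
    }
}
```

If the comparison holds for the real data too, the characters were most likely already lost before the file was written.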

The error stems from another source. The console (IDE or operating system command line) probably uses the platform encoding, and might not be able to convert those Unicode chars.
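The K?? symptom is what Java produces when a string is encoded into a charset that cannot represent some of its characters. A minimal sketch, with US-ASCII standing in for a limited console or platform encoding (an assumption, since the actual encoding is not known here) and a made-up class name EncodingLossDemo:

```java
import java.nio.charset.StandardCharsets;

public class EncodingLossDemo {
    // Encodes to US-ASCII and decodes again; characters that US-ASCII
    // cannot represent are replaced by '?' during encoding.
    public static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Same round trip through UTF-8, which can represent every character.
    public static String throughUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "K\u00F2\u00C9"; // the characters K, o-grave, E-acute
        System.out.println(throughAscii(original));                 // prints K??
        System.out.println(throughUtf8(original).equals(original)); // prints true
    }
}
```

Once a character has been turned into a literal ? this way, no later decoding step can bring it back.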

Open the file in a capable programmer's editor such as the free Notepad++ (Windows) or jEdit. They can handle encodings.

You can also do a byte dump to check whether the displayed ? really is a question mark in the string:

System.out.println(Arrays.toString(string.getBytes(StandardCharsets.UTF_8)));
System.out.println(string.contains("?"));

The server communication, which is not shown, seems to be the culprit. The server should set the response encoding to UTF-8, and the client should send the GET request with the header

Accept-Charset: UTF-8

and read the response as UTF-8. This can be tested by opening the URL manually in a browser. Check the HTML source to see whether the non-ASCII characters are sent as numeric entities (&#12345;).
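Reading the response as UTF-8 means passing the charset explicitly to the reader that wraps the response stream. The sketch below uses a ByteArrayInputStream as a stand-in for the real response body (in the actual client this would be something like the stream from HttpURLConnection.getInputStream()). The class and method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseDecodingDemo {
    // Reads the whole stream as text, decoding the bytes with the given charset.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the response body: the UTF-8 bytes of K, o-grave, E-acute.
        byte[] body = "K\u00F2\u00C9".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("K\u00F2\u00C9")); // prints true
        System.out.println(wrong.length());                // prints 5: each byte became one char
    }
}
```

Decoding the same bytes with the wrong charset gives five characters instead of three, which is the kind of mismatch to look for where the response is first turned into text.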

0
Jagdish Bhavsar On

You can try to fix it with:

Spanned spanned = Html.fromHtml(stringBuilder.toString(), this, null);

Then try printing the spanned text.