After hours of trial and error, and many more spent crawling the web for solutions, I am at a total loss.
I am successfully using OkHttp to retrieve the source of a webpage in the following way:
Request request = new Request.Builder()
        .url(APIURL + Integer.toString(StopIndex) + "/")
        .addHeader("Content-Type", "text/html; charset=ISO-8859-1")
        .build();
client.newCall(request).enqueue(new Callback() {
    @Override
    public void onFailure(Call call, IOException e) {
        Log.e("OkHttp request issue", e.toString());
    }

    @Override
    public void onResponse(Call call, Response response) throws IOException {
        PageSource = response.body().string();
        StopActivity.this.runOnUiThread(new Runnable() {
            @Override
            public void run() {
                tv1.setText(PageSource);
            }
        });
    }
});
For testing purposes I am displaying the downloaded String in a TextView, and I noticed "�" signs in places where German special letters ("ä", "ö", etc.) were used. I figured this was a UTF-8 vs. ISO-8859-1 encoding issue, since the source didn't use "&auml;" or similar entities but simply "ä", and indeed the target webpage specifies the following:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type" />
I then tried adding the "addHeader" call to the Request.Builder(), but it doesn't change the output. I also experimented with OkHttp interceptors and ByteBuffers, but nothing worked, as I was never able to get hold of the response before it was decoded and the �s were introduced.
How can I tell OkHttp to respect the ISO-8859-1 encoding and prevent it from replacing all special characters ("ä", "ö", "ü", etc.) with �?
Many thanks in advance and merry Christmas to all of you.
EDIT/ANSWER:
Using the Guava library from Google I was able to retrieve the correctly encoded page source as follows:
String pageSource = CharStreams.toString(new InputStreamReader(response.body().byteStream(), "ISO-8859-1"));
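If pulling in Guava only for this one call feels heavy, the same idea can be sketched with the JDK alone. The class and method names below (StreamDecoding, readAll) are hypothetical, and the ByteArrayInputStream is just a stand-in for response.body().byteStream():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of OkHttp or Guava.
final class StreamDecoding {
    // Reads the whole stream and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for response.body().byteStream(): "Köln" encoded as ISO-8859-1.
        byte[] raw = {'K', (byte) 0xF6, 'l', 'n'};
        String text = readAll(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);
        System.out.println(text);
    }
}
```

In the callback this would presumably be called as readAll(response.body().byteStream(), StandardCharsets.ISO_8859_1).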
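OkHttp doesn't parse your HTML to read the content type declared within it. string() decodes the body using the charset from the response's Content-Type header and falls back to UTF-8 when none is given; it takes no charset argument. A Content-Type header on the request describes the request body and has no effect on how the response is decoded. Instead, decode the raw bytes yourself, for example with new String(response.body().bytes(), "ISO-8859-1"). Even better, get your server to include the proper charset in the response's Content-Type header.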
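To illustrate why the � appears and how a manual decode might look, here is a minimal JDK-only sketch. BodyDecoder, charsetOf and decode are hypothetical names, and the simple header parsing is an assumption that covers only the common "type; charset=name" form. In an OkHttp callback the inputs would presumably come from response.body().bytes() and response.header("Content-Type").

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of OkHttp.
final class BodyDecoder {
    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes the raw body with the header's charset, else the fallback.
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, charsetOf(contentType, fallback));
    }

    public static void main(String[] args) {
        byte[] body = {(byte) 0xE4, 'x'}; // "äx" as ISO-8859-1 bytes

        // 0xE4 followed by 'x' is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD for it.
        System.out.println(new String(body, StandardCharsets.UTF_8));

        // No charset in the header: fall back to what the page declares.
        System.out.println(decode(body, "text/html", StandardCharsets.ISO_8859_1));
    }
}
```

Passing ISO-8859-1 as the fallback mirrors the situation in the question, where the server omits the charset and only the page's meta tag names it.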