Apache HttpClient HTML response returns gibberish

I'm trying to get an HTML response from a remote website, but I get something like this:

ס×?×? ×?×? ×?×? ×?×?

instead of Hebrew letters or symbols.

Here is my code:

CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCookieStore(cookieStore)
        .build();

HttpGet httpget = new HttpGet(URL);
CloseableHttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
String s = null;
if (entity != null) {
    s = EntityUtils.toString(entity);
}

Does anyone know what the problem is?

There is 1 answer

Evan Knowles On BEST ANSWER

As per the docs,

The content is converted using the character set from the entity (if any), failing that, "ISO-8859-1" is used.

Your call doesn't pass a charset, so when the response doesn't declare one either, the ISO-8859-1 fallback is used, and that can't represent Hebrew letters. The page is probably encoded as UTF-8, so pass that as the default instead. Try this:

s = EntityUtils.toString(entity, "UTF-8");
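
To see why the fallback produces gibberish, here is a minimal sketch that needs only the JDK, not HttpClient. It assumes the server sends UTF-8 bytes; the class name CharsetDemo, the decode helper and the sample Hebrew word are made up for illustration and are not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: turns raw response bytes into a String using
    // the given charset, which is the step where the wrong default hurts.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Sample Hebrew text (assumption: the remote page is UTF-8 encoded).
        String original = "\u05E9\u05DC\u05D5\u05DD";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Each Hebrew letter is two bytes in UTF-8; ISO-8859-1 maps every
        // byte to its own character, so the text comes back as mojibake.
        String wrong = decode(body, StandardCharsets.ISO_8859_1);
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length() + " vs " + right.length()); // 8 vs 4
    }
}
```

If the page turns out to use a different encoding (windows-1255 is another common one for Hebrew), the same reasoning applies with that charset name in place of "UTF-8".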