Getting response headers with Java, encoding issue


I am using Webharvest to download a file from a website and keep its original file name.

The Java code that I am working with is:

import java.io.BufferedReader;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();

BufferedReader br = null;
StringBuffer result = new StringBuffer();
String attachName;

// attachmentLink is defined elsewhere in the script and holds the download URL
GetMethod method = new GetMethod(attachmentLink.toString());

int returnCode = client.executeMethod(method);
// getResponseHeaders (plural) returns an array of all headers with this name
Header[] headers = method.getResponseHeaders("Content-Disposition");
attachName = headers[0].getValue();
// Round trip through the platform default charset; this does not repair the value
attachName = new String(attachName.getBytes());

The result in Webharvest is:

attachment; filename="Resoluci�n sobre Mesas de Contrataci�n.pdf"

I can't get it to read the letter ó correctly.

After reading the value of the Content-Disposition header into the variable attachName, I also tried to re-encode and decode it, but with no luck:

String attachNamef = URLEncoder.encode(attachName, "ISO-8859-1");
attachNamef = URLDecoder.decode(attachNamef, "UTF-8");

I was able to determine that the response charset is ISO-8859-1 by calling:

method.getResponseCharSet()

P.S. When I inspect the headers in Firebug (Firefox), the Content-Disposition value is correct:

attachment; filename="Resolución sobre Mesas de Contratación.pdf"
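Once the header value is decoded correctly, the file name still has to be pulled out of it. The following is a minimal sketch of that step using only the standard library. The class name `ContentDispositionFileName` and the method `extract` are hypothetical names chosen for illustration, and the pattern is a simplification: it handles the plain `filename="..."` and unquoted `filename=token` forms only, not the extended `filename*=` form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpClient or Webharvest.
class ContentDispositionFileName {

    // Matches filename="quoted value" or filename=unquoted-token (simplified).
    private static final Pattern FILENAME = Pattern.compile(
            "filename\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))",
            Pattern.CASE_INSENSITIVE);

    // Returns the file name, or null if the header carries no filename parameter.
    static String extract(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        Matcher m = FILENAME.matcher(headerValue);
        if (!m.find()) {
            return null;
        }
        // Group 1 is the quoted form, group 2 the unquoted form.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        // \u00F3 is the letter ó, escaped so the source file encoding does not matter.
        String header = "attachment; filename=\"Resoluci\u00F3n sobre Mesas de Contrataci\u00F3n.pdf\"";
        System.out.println(extract(header));
    }
}
```

In the script above, the call would presumably be `ContentDispositionFileName.extract(attachName)`, applied after the charset problem has been dealt with.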

1 answer

Answer by bsiamionau:

Apache HttpClient doesn't support non-ASCII characters in HTTP headers. From the documentation:

The headers of a HTTP request or response must be in US-ASCII format. It is not possible to use non US-ASCII characters in the header of a request or response. Generally this is not an issue however, because the HTTP headers are designed to facilitate the transfer of data rather than to actually transfer the data itself. One exception however are cookies. Since cookies are transferred as HTTP Headers they are confined to the US-ASCII character set. See the Cookie Guide for more information.
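This also explains why the re-encoding attempts in the question cannot work. The sketch below uses only the standard library and assumes the server sends the header bytes in ISO-8859-1 while the client decodes them as US-ASCII. The class `HeaderCharsetDemo`, its method names and the shortened file name are hypothetical, chosen for illustration. It shows that the byte for ó is replaced during decoding, so the original character is no longer present in the string and no later conversion can bring it back.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use HttpClient itself.
class HeaderCharsetDemo {

    // Header bytes as an ISO-8859-1 server would put them on the wire (assumed).
    static byte[] rawHeaderValue() {
        return "attachment; filename=\"Resoluci\u00F3n.pdf\""
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // US-ASCII decoding: each byte above 0x7F becomes the replacement character U+FFFD.
    static String decodeAsAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // ISO-8859-1 decoding maps every byte to the character with the same code point.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Re-encoding an already damaged string: U+FFFD has no ISO-8859-1 byte, so it becomes '?'.
    static String roundTrip(String decoded) {
        byte[] bytes = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = rawHeaderValue();
        String damaged = decodeAsAscii(raw);
        System.out.println(damaged.indexOf('\uFFFD') >= 0);   // the ó byte was replaced
        System.out.println(roundTrip(damaged));                // still no ó after re-encoding
        System.out.println(decodeAsLatin1(raw));               // correct when decoded as ISO-8859-1
    }
}
```

The practical consequence is that the charset has to be right at the moment the header bytes are decoded. HttpClient 3.x exposes the charset used for header elements through `HttpMethodParams.setHttpElementCharset(String)`, so calling `method.getParams().setHttpElementCharset("ISO-8859-1")` before `executeMethod` may be worth trying; whether it resolves this particular case has not been verified here.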