Invalid UTF-8 start byte 0x8b (at char #2, byte #-1)

Question

3.2k views · Asked by AudioBubble on 17 December 2016 at 21:10

I am trying to parse the Atom document at the URL 'http://self-learning-java-tutorial.blogspot.in/atom.xml'. While parsing it, I get the error 'Invalid UTF-8 start byte 0x8b (at char #2, byte #-1)'.

    Abdera abdera = new Abdera();
    Parser parser = abdera.getParser();

    URL url = new URL("http://self-learning-java-tutorial.blogspot.in/atom.xml");

    Document<Feed> doc = parser.parse(url.openStream(), url.toString());
    Feed feed = doc.getRoot();
    System.out.println(feed.getTitle());
    for (Entry entry : feed.getEntries()) {
        System.out.println("\t" + entry.getTitle());
    }
    System.out.println(feed.getAuthor());

Can anyone explain what this error means and how to resolve it?

There is 1 answer

**Fedor Losev** · Accepted Answer · 2016-12-17T22:04:19+00:00

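The response from this URL is GZIP-compressed, and the parser is being handed the compressed bytes. A GZIP stream begins with the magic bytes 0x1f 0x8b, which is why the parser rejects 0x8b as an invalid UTF-8 start byte at character 2. (Something in your setup must be unusual: standard Java 8 does not send an `Accept-Encoding: gzip` header by default, and your code works fine for me as is.)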
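As a quick check of where 0x8b comes from, the following sketch compresses a small string and prints the first two bytes of the result. GZIP streams always start with the magic bytes 0x1f 0x8b. The class name `GzipMagicDemo` and the sample string are illustrative assumptions, not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipMagicDemo {

    // Compresses the given text with GZIP and returns the raw compressed bytes.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<feed/>");
        // The second byte is the 0x8b that the XML parser complains about.
        System.out.printf("0x%02x 0x%02x%n", compressed[0] & 0xff, compressed[1] & 0xff);
        // prints "0x1f 0x8b"
    }
}
```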

To handle this, decompress the stream before parsing it. Note that for other URLs you may need to handle responses that arrive uncompressed. Also, remember to close the resources and streams that you open.

Here is a working sample for your URL:

    Abdera abdera = new Abdera();
    Parser parser = abdera.getParser();

    URL url = new URL(
            "http://self-learning-java-tutorial.blogspot.in/atom.xml");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // Tell the server explicitly that a gzip response is acceptable.
    conn.setRequestProperty("Accept-Encoding", "gzip");
    conn.connect();

    try {
        // Decompress only if the server actually replied with gzip.
        String contentEncoding = conn.getContentEncoding();
        boolean isGzip = contentEncoding != null
                && contentEncoding.contains("gzip");
        // try-with-resources closes the stream when parsing is done.
        try (InputStream in = !isGzip ? conn.getInputStream()
                : new GZIPInputStream(conn.getInputStream())) {
            Document<Feed> doc = parser.parse(in, url.toString());
            Feed feed = doc.getRoot();
            System.out.println(feed.getTitle());
            for (Entry entry : feed.getEntries()) {
                System.out.println("\t" + entry.getTitle());
            }
            System.out.println(feed.getAuthor());
        }
    } finally {
        conn.disconnect();
    }
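If the `Content-Encoding` header cannot be relied on, one alternative is to look at the first two bytes of the stream and decompress only when they match the GZIP magic bytes 0x1f 0x8b. The sketch below is a hedged illustration of that idea, not something from the original answer; the class `GzipSniffer` and the method `maybeGunzip` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzipSniffer {

    // Hypothetical helper: returns a decompressing stream if the input starts
    // with the GZIP magic bytes 0x1f 0x8b, otherwise the input unchanged.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        // BufferedInputStream supports mark/reset, so we can peek and rewind.
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int first = in.read();   // -1 if the stream is empty
        int second = in.read();
        in.reset();              // rewind so no bytes are lost
        boolean isGzip = first == 0x1f && second == 0x8b;
        return isGzip ? new GZIPInputStream(in) : in;
    }
}
```

With this helper, the parse call could become `parser.parse(GzipSniffer.maybeGunzip(conn.getInputStream()), url.toString())`, which would handle both compressed and uncompressed responses without inspecting headers.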
