Android - SaxParser error: ParseException: At line 1, column 0: not well-formed (invalid token)


I'm getting the following exception when trying to parse some XML:

org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: not well-formed (invalid token)

The main issue is that this has only happened on Android 2.2 and 2.3 devices. The weirdest thing is that the first time I parse the response it is ok, but all the following tries give me the parsing exception.

My code is as follows:

        import java.io.InputStream;
        import java.net.URL;
        import javax.xml.parsers.SAXParser;
        import javax.xml.parsers.SAXParserFactory;
        import org.xml.sax.InputSource;
        import org.xml.sax.XMLReader;

        URL url = new URL("http://m.ideasmusik.com/rss/?ct=mx");
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();

        // MusicRSSParser is my own DefaultHandler subclass
        MusicRSSParser parser = new MusicHandler.MusicRSSParser();
        XMLReader xr = sp.getXMLReader();
        xr.setContentHandler(parser);

        // Parse straight from the network stream, declaring it as UTF-8
        InputStream stream = url.openStream();
        try {
            InputSource in = new InputSource(stream);
            in.setEncoding("UTF-8"); // same value as HTTP.UTF_8
            xr.parse(in);
        } finally {
            stream.close();
        }

The XML is UTF-8 (I've read that an incorrect encoding is a common cause of this error).

Any idea what is going wrong? I thought it could be something in my handler, but it fails before any of my logic runs, right after the startDocument() method is called.

I have also tried passing the URL instead of an InputStream, with the same result.

EDIT

If I go to Application Management and clear the app's cache, it works again, but only for the first time. How can the cache affect the parsing?
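Since the parser gives up at line 1, column 0, one way to see what it is actually receiving is to dump the first bytes of the response instead of parsing it. A well-formed UTF-8 feed normally starts with `3c` (`<`), possibly preceded by the UTF-8 byte order mark `ef bb bf`; anything else at the very start would explain an error at column 0. This is a minimal diagnostic sketch; the class and method names (`ResponsePeek`, `hexPrefix`) are hypothetical, and in the app you would pass `url.openStream()` instead of the sample bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResponsePeek {

    // Reads up to `max` bytes from the stream and returns them as space-separated hex.
    static String hexPrefix(InputStream in, int max) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < max; i++) {
            int b = in.read();
            if (b == -1) {
                break; // stream is shorter than max
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "<?xml".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(new ByteArrayInputStream(sample), 4)); // prints 3c 3f 78 6d
    }
}
```

Comparing the output of the first (working) request with that of a later (failing) one should show whether the bytes themselves change between runs.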



htafoya (accepted answer)

Got it!

The problem is in the RSS feed itself.

Not every browser shows it (browsers that pretty-print the XML with syntax colouring hide the problem), but the raw source begins like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
        <channel>
            <title>Top Canciones</title>
            <link>m.ideasmusik.com/rss/?ct=mx&</link> ...

The problem is that XML does not allow a bare & character: it must be escaped as &amp; (or appear inside a CDATA section).
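A quick way to confirm this is to feed both variants to a SAX parser. The minimal sketch below uses the JDK's `SAXParserFactory` on a desktop JVM (not Android's Expat-based parser), with hypothetical class and method names; it parses a shortened `<link>` element once with the bare `&` and once with `&amp;`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandDemo {

    // Returns true if the parser accepts the document, false if it reports
    // a fatal well-formedness error (DefaultHandler.fatalError rethrows it).
    static boolean isWellFormed(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            System.out.println("Rejected at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String bare = "<rss><channel><link>m.ideasmusik.com/rss/?ct=mx&</link></channel></rss>";
        String escaped = "<rss><channel><link>m.ideasmusik.com/rss/?ct=mx&amp;</link></channel></rss>";
        System.out.println(isWellFormed(bare));    // false
        System.out.println(isWellFormed(escaped)); // true
    }
}
```

The only difference between the two strings is `&` versus `&amp;`, so the rejection can be attributed to the unescaped ampersand alone.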

Every other special character in the document was escaped, but I think they missed this one because it is inside the <link> element rather than in the main content.

Somehow, on the first run the SAX parser doesn't complain about it.

What I did (until the RSS is fixed) was to read the response into a string and manually remove that & before parsing the XML. I know it is a horrible solution, but it's the quickest and easiest one for the moment.
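A slightly safer variant of that workaround is to escape stray ampersands instead of deleting them, so the link text is preserved. This is a sketch of that swapped-in approach, not the code used above; the class and method names are hypothetical, and it assumes the response has already been read into a `String`. The regular expression leaves the five predefined XML entities and numeric character references alone and rewrites every other `&` as `&amp;`.

```java
import java.util.regex.Pattern;

public class FeedCleaner {

    // Matches an '&' that does NOT start &amp; &lt; &gt; &quot; &apos;
    // or a numeric character reference such as &#38; or &#x26;.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    // Replaces every bare '&' with "&amp;" before the string reaches the parser.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String raw = "<link>m.ideasmusik.com/rss/?ct=mx&</link><title>A &amp; B</title>";
        System.out.println(escapeBareAmpersands(raw));
        // prints <link>m.ideasmusik.com/rss/?ct=mx&amp;</link><title>A &amp; B</title>
    }
}
```

The cleaned string can then be parsed with `xr.parse(new InputSource(new StringReader(cleaned)))`. Entities declared in a DTD (for example `&nbsp;`) would also be escaped by this pattern, which is usually acceptable for a feed like this one.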

AppiDevo

"...but the weirdest thing is that the first time I parse the response it is ok, but all the following tries give me the parsing exception"

I had the same problem. It happens on some devices (e.g. Samsung Galaxy S2), and not only on Android 2.3 but on later versions too: on a Galaxy S2 running 4.4.2 it occurs, while on a 4.4.2 emulator it doesn't. The problem is probably the caching of the request: from the second request on, the XML string was written to the cache and read back with the wrong character(s) encoded.

After a lot of work, I solved my problem by simply calling setUseCaches(false) on the connection:

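    URLConnection conn = url.openConnection();
    conn.setUseCaches(false); // bypass the cache; must be set before connecting
    InputStream stream = conn.getInputStream(); // parse this stream as before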
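Put together with the parsing code from the question, the fix might look like the sketch below. It is a minimal illustration under assumptions: the class and method names are hypothetical, a plain `DefaultHandler` stands in for the app's `MusicRSSParser`, and it only uses standard `java.net` and SAX calls, so whether `setUseCaches(false)` is enough on a given Android device is as reported above, not guaranteed.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class UncachedFeedParser {

    // Opens the URL with caching disabled and feeds the body to the given handler.
    static void parseUncached(URL url, DefaultHandler handler) throws Exception {
        URLConnection conn = url.openConnection();
        conn.setUseCaches(false); // must be called before the connection is opened
        InputStream stream = conn.getInputStream();
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xr.setContentHandler(handler);
            xr.setErrorHandler(handler); // DefaultHandler rethrows fatal parse errors
            InputSource in = new InputSource(stream);
            in.setEncoding("UTF-8");
            xr.parse(in);
        } finally {
            stream.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: pass the feed URL, e.g. http://m.ideasmusik.com/rss/?ct=mx
        if (args.length > 0) {
            parseUncached(new URL(args[0]), new DefaultHandler());
        }
    }
}
```

Disabling the cache trades a little bandwidth for a fresh response on every request, which is the point here: every parse then sees the same bytes as the first, working one.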