XMLPullParser black diamond question marks with certain characters

Question

889 views Asked by Mark K. At 19 September 2024 at 04:14

I'm making an Android app that needs to fetch and parse XML. I wrote the class for that by following the tutorial at http://www.tutorialspoint.com/android/android_rss_reader.htm, and the fetcher method looks like this:

public void fetchXML() {
    Thread thread = new Thread(new Runnable() {
        @Override
        public void run() {

            try {
                URL url = new URL(urlString);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();


                conn.setReadTimeout(10000 /* milliseconds */);
                conn.setConnectTimeout(15000 /* milliseconds */);
                conn.setRequestMethod("GET");
                conn.setDoInput(true);


                // Starts the query
                conn.connect();
                InputStream stream = conn.getInputStream();

                xmlFactoryObject = XmlPullParserFactory.newInstance();
                xmlFactoryObject.setValidating(false);
                xmlFactoryObject.setFeature(Xml.FEATURE_RELAXED, true);
                xmlFactoryObject.setNamespaceAware(true);

                XmlPullParser myparser = xmlFactoryObject.newPullParser();
                //myparser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
                myparser.setInput(new InputStreamReader(stream, "UTF-8"));

                parseXMLAndStoreIt(myparser);
                stream.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    thread.start();
}

The parser looks like the one in the tutorial, with my own parsing logic in it.

As you can see from

 myparser.setInput(new InputStreamReader(stream, "UTF-8"));

I'm using the UTF-8 charset. But when I call getText() in my parser on, for example, the word 'Jõhvi', the logcat output is 'J�hvi'. The same happens with the other characters of my native language, Estonian, that aren't in the English alphabet. I need to use this string as a key and in the user interface, so this isn't acceptable. I suspect a charset problem, but the XML site I'm pulling from gives no information about its encoding, and calling

conn.getContentEncoding()

returns null, so I'm in the dark here.

There is 1 answer

**kris larson** · Accepted Answer · 2015-06-23 02:40:55

Content encoding and character encoding are not the same thing.

Content encoding refers to compression such as gzip. Since getContentEncoding() returns null, that tells you the response isn't compressed; it says nothing about the character set.

You should be looking at conn.getContentType(), because the character encoding can usually be found in the content-type response header.
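To see why the charset matters, here is a minimal sketch of the symptom. It assumes, purely for illustration, that the server sends ISO-8859-1 bytes (the question does not confirm this), and the class and method names are hypothetical. Decoding those bytes as UTF-8 turns 'õ' into the U+FFFD replacement character, which is the black diamond seen in logcat.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WrongCharsetDemo {

    // Hypothetical helper: encodes text with one charset and decodes the
    // resulting bytes with another, mimicking a reader that was configured
    // with the wrong encoding.
    static String roundTrip(String text, Charset sent, Charset assumed) {
        byte[] wire = text.getBytes(sent);
        return new String(wire, assumed);
    }

    public static void main(String[] args) {
        String word = "J\u00F5hvi"; // "Jõhvi"

        // Assumption: the feed is ISO-8859-1 but is read as UTF-8.
        String wrong = roundTrip(word, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Same bytes, read with the charset they were written in.
        String right = roundTrip(word, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.equals("J\uFFFDhvi")); // true: 'õ' became U+FFFD
        System.out.println(right.equals(word));         // true: text survives intact
    }
}
```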

conn.getContentType() might return something like:

text/xml; charset=ISO-8859-1

so you will have to do some parsing. Look for the character set name after "charset=", but be prepared for the case where the MIME type is specified but the charset is not.
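The parsing described above could be sketched as follows. `ContentTypeCharset` and `charsetFrom` are hypothetical names, not part of any library, and the fallback charset is an assumption the caller has to choose.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the supplied fallback when the header is missing, has no charset
    // parameter, or names a charset this runtime does not support.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the MIME type, separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // The value may be quoted, e.g. charset="utf-8".
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/xml; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetFrom("text/xml", StandardCharsets.UTF_8));
    }
}
```

In fetchXML(), the reader could then be built with `new InputStreamReader(stream, ContentTypeCharset.charsetFrom(conn.getContentType(), StandardCharsets.UTF_8))` instead of the hard-coded "UTF-8". When the header carries no charset, the fallback is only a guess and may still be wrong for a given feed.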
