I'm writing an Android app in Java. The app emulates flashcards, with questions on one side and answers on the other.
I am currently reading in an .xml document that I believe is well-formed (it is produced by a Qt-based program, which has no problem reading its own output back in), using the following fairly standard code:
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try
    {
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document dom = builder.parse(new File(diskLocation));
        Element pack = dom.getDocumentElement();
        NodeList flashCards = pack.getElementsByTagName("flashcard");
        for (int i = 0; i < flashCards.getLength(); i++)
        {
            FlashCard flashCard = new FlashCard();
            Node cardNode = flashCards.item(i);
            NodeList cardProperties = cardNode.getChildNodes();
            for (int j = 0; j < cardProperties.getLength(); j++)
            {
                Node cardProperty = cardProperties.item(j);
                String propertyName = cardProperty.getNodeName();
                if (propertyName.equalsIgnoreCase("Question"))
                {
                    flashCard.setQuestion(cardProperty.getFirstChild().getNodeValue());
                }
                else if (propertyName.equalsIgnoreCase("Answer"))
                {
                    flashCard.setAnswer(cardProperty.getFirstChild().getNodeValue());
                }
                else if
...etc.
Here is a flashcard for learning XML:
<flashcard>
<Question>What is the entity reference for ' " '?</Question>
<Answer>&quot;</Answer>
<Info></Info>
<Hint></Hint>
<KnownLevel>1</KnownLevel>
<LastCorrect>1</LastCorrect>
<CurrentStreak>4</CurrentStreak>
<LevelUp>4</LevelUp>
<AnswerTime>0</AnswerTime>
</flashcard>
As I understand the standard, '<' and '&' need to be escaped in element content ('>' probably should be), but quotes and apostrophes don't (unless they're in attribute values). Yet when the question and answer for this card are parsed, they come out as
What is the entity reference for '
and
&
respectively.
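Both results stop exactly where an entity reference would sit, so one thing worth checking is whether the parser is handing the element text back as several child nodes, in which case getFirstChild() would only ever return the first piece. Below is a minimal diagnostic sketch for that. It is only a guess at the cause, not a confirmed one. The class name TextNodeCheck and the helper allText are made up for illustration, and the inline XML assumes the file on disk actually contains &quot; in the question and &amp;quot; in the answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class TextNodeCheck {

    // Hypothetical helper: joins the values of ALL text-like children
    // instead of reading only the first child.
    static String allText(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(child.getNodeValue());
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                    // Some parsers keep entity references as their own nodes;
                    // their replacement text lives in their children.
                    sb.append(allText(child));
                    break;
                default:
                    break; // ignore comments, nested elements, etc.
            }
        }
        return sb.toString();
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed raw file content (see the note above the code).
        String xml = "<flashcard>"
                + "<Question>What is the entity reference for ' &quot; '?</Question>"
                + "<Answer>&amp;quot;</Answer>"
                + "</flashcard>";
        Document dom = parse(xml);
        for (String tag : new String[] {"Question", "Answer"}) {
            Node node = dom.getElementsByTagName(tag).item(0);
            // More than one child here would explain the truncated strings.
            System.out.println(tag + " child nodes: " + node.getChildNodes().getLength());
            System.out.println(tag + " first child: " + node.getFirstChild().getNodeValue());
            System.out.println(tag + " all text:    " + allText(node));
        }
    }
}
```

If the child count is greater than one on the device, then calling normalize() on the document before reading it, or using getTextContent() where it is available, might be simpler than a hand-written loop.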
The input seems to follow the standard. Is the Java XML DOM implementation on Android really not standards-compliant, or am I missing something?
I find it very difficult to believe I'm the only one to have (had) this problem, yet I've searched both Google and Stack Overflow and found surprisingly little of direct relevance.
Thank you for any help!
Rob
Edit: I've just realised the file has a <!DOCTYPE, but doesn't start with an <?xml ... ?> declaration. I wonder if this makes any difference.
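A quick way to test this off the device is to parse the same content with and without the declaration and compare. This is only a sketch: the class name DeclarationCheck and the helper firstQuestion are made up, and the root element name pack is an assumption borrowed from the variable name in my code. It also puts a literal ' and " in element content to see whether the parser objects to them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DeclarationCheck {

    // Hypothetical helper: parses the XML string and returns the full
    // text of the first <Question> element.
    static String firstQuestion(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));
        Node question = dom.getElementsByTagName("Question").item(0);
        return question.getTextContent(); // DOM Level 3
    }

    public static void main(String[] args) throws Exception {
        // "pack" as the root element name is an assumption.
        String body = "<!DOCTYPE pack>"
                + "<pack><flashcard>"
                + "<Question>' \" '</Question>" // literal quote characters, unescaped
                + "</flashcard></pack>";
        String withDeclaration = "<?xml version=\"1.0\"?>" + body;

        System.out.println("without declaration: " + firstQuestion(body));
        System.out.println("with declaration:    " + firstQuestion(withDeclaration));
    }
}
```

If both lines print the same text and no SAXException is thrown, then neither the missing declaration nor the unescaped quotes is what trips up the parser, at least on a desktop JVM. Android's parser may still behave differently.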
From the standard:
which, as I read it, means that neither ' nor " has to be escaped in the content of elements.