I am trying to figure out how to simply exclude the BOM while using the BOMInputStream example from the Apache Commons IO documentation.
I am reading a file from internal storage and first converting it into a String. Then I convert that into a byte array so that I can get an InputStream. Then I check for BOMs with BOMInputStream, because I was getting "Unexpected Tokens" errors. Now I don't know how to exclude the BOM if there is one.
CODE:
StringBuffer fileContent = new StringBuffer("");
int ch;
// try-with-resources closes the FileInputStream when reading is done
try (FileInputStream fis = ctx.openFileInput("dataxml")) {
    // read the file one byte at a time; each byte is cast to a char without decoding
    while ((ch = fis.read()) != -1) {
        fileContent.append((char) ch);
    }
} catch (IOException e) { // also covers FileNotFoundException
    e.printStackTrace();
}
String temp = fileContent.toString();
InputStream ins = new ByteArrayInputStream(temp.getBytes(StandardCharsets.UTF_8));
BOMInputStream bomIn = new BOMInputStream(ins);
if (bomIn.hasBOM()) {
    // has a UTF-8 BOM
}
// bomIn is only used for detection here; the parser still reads from ins
xpp.setInput(ins, "UTF-8");
parseXMLAndStoreIt(xpp);
ins.close();
The file is named "dataxml"; I write it from a different class with openFileOutput.
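One detail of the approach described above may be worth checking. Reading with fis.read() and casting each int to char does not decode UTF-8: every byte simply becomes a char in the range U+0000 to U+00FF. When that String is re-encoded with getBytes(StandardCharsets.UTF_8), each byte above 0x7F turns into two bytes, so a leading EF BB BF becomes C3 AF C2 BB C2 BF and no longer looks like a UTF-8 BOM to a detector. The standalone sketch below reproduces only that round trip on an in-memory byte array (no file, no Commons IO); the class and method names (BomRoundTrip, bytesAsChars, startsWithUtf8Bom) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class BomRoundTrip {
    // The UTF-8 encoding of the byte order mark U+FEFF
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Mimics the question's loop: each byte (0..255) becomes one char, no decoding
    static String bytesAsChars(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // True if the array starts with the three UTF-8 BOM bytes
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    public static void main(String[] args) {
        // Hypothetical file content: a UTF-8 BOM followed by "<a/>"
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        byte[] reEncoded = bytesAsChars(fileBytes).getBytes(StandardCharsets.UTF_8);
        System.out.println(fileBytes.length + " -> " + reEncoded.length); // prints "7 -> 10"
        System.out.println("BOM visible after round trip: " + startsWithUtf8Bom(reEncoded)); // false
    }
}
```

Reading the raw bytes straight into the InputStream (or decoding them with an InputStreamReader set to UTF-8) avoids this inflation, so a BOM check sees the bytes as they are on disk.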
I've never used BOMInputStream before, but to exclude a byte order mark from the stream you would just have to start reading at the first byte after the BOM (offset 3 for the three-byte UTF-8 BOM, EF BB BF). Does BOMInputStream have a property indicating where the BOM is? Also, you can have a look here: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
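The "start reading after the BOM" idea can be sketched without any library, as a point of comparison. This is a minimal, dependency-free version assuming only a UTF-8 BOM needs handling: it peeks at the first three bytes through a java.io.PushbackInputStream, drops them if they are EF BB BF, and pushes them back otherwise. The names SkipUtf8Bom, skipUtf8Bom and readAll are hypothetical helpers, not part of Commons IO.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class SkipUtf8Bom {
    // Returns a stream positioned just past a leading UTF-8 BOM, if there is one;
    // otherwise the peeked bytes are pushed back so nothing is lost.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than asked for, so loop until 3 bytes or EOF
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pin;
    }

    // Reads the whole stream and decodes it as UTF-8
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int r;
        while ((r = in.read(buf)) != -1) {
            out.write(buf, 0, r);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        byte[] noBom = "<a/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(skipUtf8Bom(new ByteArrayInputStream(withBom)))); // <a/>
        System.out.println(readAll(skipUtf8Bom(new ByteArrayInputStream(noBom))));   // <a/>
    }
}
```

With Commons IO, the one-argument BOMInputStream constructor already excludes a UTF-8 BOM by default, so in the question's code the parser would presumably need to read from bomIn rather than ins (for example xpp.setInput(bomIn, "UTF-8")). Either way, this only helps if the original file bytes reach the stream unchanged.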