Pretty much what the title says. I'm writing code that needs to work with both BOM'ed and non-BOM'ed files. Several parsing options will need to be implemented; for now I'm adding support for parsing CSV files.
The code below is a rough idea of what I'm working with. If need be, I can provide a minimal working example.
class LocalFileAccess {
    // ...

    // Opens an input stream to the file based on the path passed in constructor.
    // Part of a larger interface, can't change the signature.
    @Override
    public InputStream getInputStream() throws FileNotFoundException {
        File file = new File(this.path);
        if (!file.isAbsolute()) {
            file = getFile(this.base, this.path);
        }
        return new FileInputStream(file);
    }

    public void foo() {
        try (BOMInputStream inputStream = new BOMInputStream(this.getInputStream())) {
            Iterator<String[]> iterator = new CSVReaderBuilder(new InputStreamReader(inputStream, StandardCharsets.UTF_8)).build().iterator();
            String[] header = iterator.next(); // <- first value is prefixed with the BOM
        } catch (...) { ... }
    }
}
Later in the codebase, when parsing the values obtained from the Iterator, the first value in the header is prefixed with the BOM, which causes my tests to fail. The hacky fix is to check for this manually, but I'd rather keep my code clean.
Wrapping the return value of getInputStream() in new BOMInputStream() as well fixes it. However, if I then replace new BOMInputStream(this.getInputStream()) in the try-with-resources with just this.getInputStream(), it breaks again: the BOM gets through.
I've tried wrapping only the return value of getInputStream() in a BOMInputStream, and wrapping only the InputStream in the try-with-resources in a BOMInputStream, to no avail. The only thing that seems to work is wrapping the stream in both places, and I don't understand why.
Why do I need to wrap the input stream in a BOMInputStream twice?
Edit: to clarify: I'm using the Apache Commons IO BOMInputStream.
Not wishing my last comment to imply there's something wrong with Commons BOMInputStream (since I couldn't believe they'd be incompetent enough to fail to read the stream properly in the absence of a BOM), I decided to test it. As I expected, it's perfectly capable of reading the file with or without a BOM:
Source:
Data files contents:
Run and output:
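As an independent illustration (not the test program referred to above, and not Commons IO's implementation), here is a minimal standard-library sketch of what a BOM-skipping wrapper is expected to do for UTF-8 input. The class name BomSkipDemo and its helper methods are hypothetical names made up for this sketch. It relies on the fact that InputStreamReader with StandardCharsets.UTF_8 does not strip a BOM itself but decodes it as the character U+FEFF, which is how it ends up glued to the first header value.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: skips a leading UTF-8 BOM using only the JDK.
final class BomSkipDemo {

    // Returns a stream positioned just after a leading UTF-8 BOM (EF BB BF).
    // If there is no BOM, the bytes read while checking are pushed back.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream is shorter than three bytes
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    // First line decoded as UTF-8 after skipping a BOM, if any.
    public static String firstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    // First line decoded as UTF-8 with no BOM handling at all.
    public static String firstLineRaw(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    // Prepends the three UTF-8 BOM bytes to the given data.
    public static byte[] withUtf8Bom(byte[] data) {
        byte[] out = new byte[data.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(data, 0, out, 3, data.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "id,name\n1,a\n".getBytes(StandardCharsets.UTF_8);
        byte[] bom = withUtf8Bom(plain);
        System.out.println(firstLine(new ByteArrayInputStream(plain)));
        System.out.println(firstLine(new ByteArrayInputStream(bom)));
        // Without skipping, the decoded line starts with U+FEFF.
        System.out.println(firstLineRaw(new ByteArrayInputStream(bom)).charAt(0) == '\uFEFF');
    }
}
```

Comparing firstLine and firstLineRaw on the same BOM'ed bytes is a quick way to check whether a given stream in a pipeline has already had its BOM consumed before it reaches the reader.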