I am using CharsetDetector to detect the charset of a text file.
This is the code to detect the charset of the given file:
    private String getCharset(File file) {
        String charset = "";
        try {
            InputStream is = new FileInputStream(file);
            BufferedInputStream bis = new BufferedInputStream(is);
            CharsetDetector cd = new CharsetDetector();
            cd.setText(bis);
            CharsetMatch cm = cd.detect();
            if (cm != null) {
                Reader reader = cm.getReader();
                charset = cm.getName();
            }
            bis.close();
            is.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return charset;
    }
For an ASCII text file it returns UTF-8. ASCII is a subset of UTF-8, but I would like to detect ASCII if the file contains only ASCII characters, and UTF-8 if it contains at least one character outside the ASCII range.
How can I check that?
First, I reviewed your code and want to point out a few things:
My suggestion would be to improve your method like this:
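What follows is a minimal sketch rather than a definitive implementation. It assumes the file is known to be either plain ASCII or UTF-8, so any byte above 0x7F is taken as a sign of UTF-8 without validating the byte sequence. The class name `AsciiCheck` is only there to make the example compile on its own, and, unlike your version, the method lets the `IOException` propagate instead of printing it. It returns "US-ASCII" because that is the canonical Java name for the ASCII charset. It also drops the unused `Reader` and uses try-with-resources, so the stream is closed even if reading fails.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class AsciiCheck {

    // Returns "US-ASCII" if every byte is in the range 0x00..0x7F, otherwise "UTF-8".
    // Assumption: the file is either ASCII or UTF-8; the UTF-8 byte sequences
    // themselves are not validated here.
    static String getCharset(File file) throws IOException {
        // try-with-resources closes the stream even if read() throws
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                if (b > 0x7F) {
                    return "UTF-8"; // first non-ASCII byte found: stop reading
                }
            }
        }
        return "US-ASCII"; // reached end of file without a non-ASCII byte
    }
}
```

An empty file is reported as "US-ASCII" by this sketch, since it contains no non-ASCII byte.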
This method is more efficient because it only requires a single pass through the file and stops as soon as it finds a non-ASCII character. However, it doesn't use the CharsetDetector class. If you need to detect character sets other than ASCII and UTF-8, you might need a more complex solution.
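If you want to go one step further while staying within the JDK, one possible approach is to confirm that the non-ASCII bytes really form valid UTF-8 before reporting UTF-8. The sketch below is only an illustration under some assumptions: the names `StrictCheck` and `classify` are made up for the example, it reads the whole file into memory (so it is only suitable for files that fit comfortably in RAM), and it returns an empty string when the content is neither ASCII nor valid UTF-8. In that last case you could hand the file over to CharsetDetector.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StrictCheck {

    // Returns "US-ASCII", "UTF-8", or "" when the bytes are neither.
    static String classify(Path path) throws IOException {
        // Assumption: the file is small enough to be read in one go.
        byte[] bytes = Files.readAllBytes(path);

        // Java bytes are signed, so values 0x80..0xFF show up as negative numbers.
        boolean asciiOnly = true;
        for (byte b : bytes) {
            if (b < 0) {
                asciiOnly = false;
                break;
            }
        }
        if (asciiOnly) {
            return "US-ASCII";
        }

        try {
            // A decoder set to REPORT throws on malformed input instead of
            // silently substituting replacement characters.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            return ""; // not valid UTF-8: fall back to a real detector here
        }
    }
}
```

The trade-off is that this version always reads the entire file when it contains non-ASCII bytes, so it gives up the early exit in exchange for not mislabeling, for example, an ISO-8859-1 file as UTF-8.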
Good Luck!