Java decode windows-1251 rtf to utf-8


I have an .rtf file encoded in windows-1251.

I need to save its contents to another file in UTF-8 encoding, and the resulting file must be readable.

I have tried many variants, read the Javadocs and other sources, and spent two days searching for an answer, but I still can't convert it into a readable file.

Here is a file with that content, which you can download to run my tests.

This is a screenshot of the file's content:

[image: content of the file]

Here are my Java tests, which you can use to try converting the file.

These are shortened excerpts of my code:

@Test
public void windows1251toUtf8() throws IOException {
    // Prepare the destination directory and file
    File dir = new File("/tmp/TESTS/");
    if (!dir.exists() && !dir.mkdirs()) {
        throw new RuntimeException("Can't create destination dir");
    }
    File destination = new File(dir, "test.rtf");
    if (!destination.exists() && !destination.createNewFile()) {
        throw new RuntimeException("Can't create destination file");
    }

    //-----------------------------------------------------------------------------------------

    // Attempt 1 (does not work): read the lines as windows-1251, then pass them through decode()
    {
        InputStream inputStream = getClass().getClassLoader().getResourceAsStream("utils/encoding/windows1521File.rtf");
        Scanner sc = new Scanner(inputStream, "WINDOWS-1251");
        StringJoiner joiner = new StringJoiner("\n");
        while (sc.hasNextLine()) {
            joiner.add(sc.nextLine());
        }

        String text = decode(joiner.toString(), "WINDOWS-1251", "UTF-8");

        byte[] bytes = text.getBytes(Charset.forName("UTF-8"));

        Files.write(bytes, destination);
    }


    //-----------------------------------------------------------------------------------------

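    // Attempt 2 (does not work): read the file as windows-1251, then pass the bytes through convertEncoding()
    {
        URL resource = getClass().getClassLoader().getResource("utils/encoding/windows1521File.rtf");
        String string = FileUtils.readFileToString(new File(resource.getPath()), Charset.forName("WINDOWS-1251"));

        // string.getBytes() without arguments uses the platform default charset
        byte[] bytes = convertEncoding(string.getBytes(), "WINDOWS-1251", "UTF-8");

        FileUtils.writeByteArrayToFile(destination, bytes);
    }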

    //-----------------------------------------------------------------------------------------

    // Attempt 3 (does not work): decode with the platform default charset, then encode as windows-1251
    {
        InputStream inputStream = getClass().getClassLoader().getResourceAsStream("utils/encoding/windows1521File.rtf");

        byte[] bytes = IOUtils.toByteArray(inputStream);
        String s = new String(bytes);
        byte[] bytes2 = s.getBytes("WINDOWS-1251");

        FileUtils.writeByteArrayToFile(destination, bytes2);
    }
}

public static byte[] convertEncoding(byte[] bytes, String from, String to) throws UnsupportedEncodingException {
    return new String(bytes, from).getBytes(to);
}

public static String decode(String text, String textCharset, String resultCharset) {
    if (StringUtils.isEmpty(text)) {
        return text;
    }

    try {
        byte[] bytes = text.getBytes(textCharset);
        ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);
        byte[] tmp = new byte[bytes.length];
        int n = inputStream.read(tmp);
        byte[] res = new byte[n];
        System.arraycopy(tmp, 0, res, 0, n);
        return new String(res, resultCharset);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

In every case, the result looks something like this:

[image: garbled output]

or like this:

[image: garbled output]

Is there any way to do this conversion?
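
For reference, the conversion being asked about can be sketched as a single decode followed by a single encode: interpret the source bytes as windows-1251 once, then write the resulting String out as UTF-8. This is only a minimal sketch under the assumption that the file really contains raw windows-1251 bytes; the class name Cp1251ToUtf8 and the method name transcode are hypothetical and not part of the tests above. RTF writers commonly store non-ASCII characters as \'hh escapes under an \ansicpg1251 header, and a byte-level transcode like this one would leave such escapes unchanged, so an RTF parser may be needed instead.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from the tests above.
final class Cp1251ToUtf8 {

    private static final Charset CP1251 = Charset.forName("windows-1251");

    private Cp1251ToUtf8() {
    }

    // Decode the bytes exactly once with the source charset,
    // then encode the resulting String exactly once with the target charset.
    static byte[] transcode(byte[] cp1251Bytes) {
        String text = new String(cp1251Bytes, CP1251);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Same conversion for files: read raw bytes, transcode, write raw bytes.
    // Assumption: the source file holds raw windows-1251 bytes, not RTF escapes.
    static void transcode(Path source, Path target) throws IOException {
        Files.write(target, transcode(Files.readAllBytes(source)));
    }
}
```

The point of the sketch is that the charset is named explicitly at both the decode and the encode step, and that no intermediate step calls getBytes() or new String(bytes) with the platform default charset.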
