Convert Erlang UTF-8 encoded string to java.lang.String


The Java node receives an Erlang string encoded in UTF-8. Its class type is OtpErlangString. If I simply call .toString() or .stringValue(), the resulting java.lang.String has invalid code points (basically, every byte of the Erlang string is treated as a distinct character).

Now, I want to use new String(bytes, "UTF-8") when creating the Java String, but how do I get the bytes from the OtpErlangString?

Answer by Wacław Borowiec:

It's strange that you get an OtpErlangString on the Java side when you use UTF-8 characters. I get an object of this type only if I use ASCII characters. If I add at least one UTF-8 (non-ASCII) character, the resulting type is OtpErlangList (which is logical, as strings are just lists of integers in Erlang), and then I can use its stringValue() method. So, after sending a string from Erlang like this:

(waco@host)8> {proc, java1@host} ! "ąćśźżęółńa".
[261,263,347,378,380,281,243,322,324,97]

On the Java node, I receive and print it with:

OtpErlangList l = (OtpErlangList) mbox.receive();
System.out.println(l.stringValue());

The output is correct:

ąćśźżęółńa
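
To illustrate why the list form decodes cleanly: each integer in the list printed by the Erlang shell is a Unicode code point, not a UTF-8 byte, so a Java String can be built from the integers directly. The sketch below is plain Java without Jinterface; the class and method names are made up for illustration, and it only approximates what stringValue() presumably does internally.

```java
public class CodePointsDemo {
    // Builds a String from Unicode code points, one int per character.
    // Throws IllegalArgumentException if a value is not a valid code point.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // The same integers the Erlang shell printed for the message above.
        int[] received = {261, 263, 347, 378, 380, 281, 243, 322, 324, 97};
        System.out.println(fromCodePoints(received));
    }
}
```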

However, if that's not the case in your situation, you could try to work around it by forcing the OtpErlangList representation, e.g. by adding an empty tuple as the very first element of the string list:

(waco@wborowiec)11> {proc, java1@wborowiec} ! [{}] ++ "ąćśźżęółńa".
[{},261,263,347,378,380,281,243,322,324,97]

And on the Java side, something like:

// requires: import java.util.Arrays;
OtpErlangList l = (OtpErlangList) mbox.receive();
// get rid of the extra leading tuple; elements() is called once and reused
OtpErlangObject[] elems = l.elements();
OtpErlangObject[] strArr = Arrays.copyOfRange(elems, 1, elems.length);
OtpErlangList l2 = new OtpErlangList(strArr);
System.out.println(l2.stringValue());
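
If the term really does arrive as an OtpErlangString in which every character holds one UTF-8 byte (the situation described in the question), another possible approach is to turn those characters back into bytes and decode them as UTF-8. The sketch below is plain Java and takes the java.lang.String you would get from stringValue(); the class and method names are hypothetical, and it assumes every character is in the range 0..255.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Reinterpret {
    // Treats each char (assumed to be 0..255) as one raw byte,
    // then decodes the resulting byte sequence as UTF-8.
    // Chars above 255 cannot be encoded in ISO-8859-1 and become '?'.
    static String bytesAsCharsToUtf8(String oneCharPerByte) {
        byte[] raw = oneCharPerByte.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xC4 0x85 is the UTF-8 encoding of U+0105 (ą), followed by ASCII 'a'.
        String broken = "\u00C4\u0085a";
        String fixed = bytesAsCharsToUtf8(broken);
        System.out.println(fixed.equals("\u0105a"));
    }
}
```

With Jinterface this would be called as something like bytesAsCharsToUtf8(erlangString.stringValue()), where erlangString is the received OtpErlangString.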