Java UTF-8 filenames with IBM JVM (AIX)

I'm having trouble understanding the way the IBM JVM's implementation of java.io.File deals with UTF-8 on AIX on the JFS2 filesystem. I suspect there's a system property that I'm overlooking, but I have not yet been able to find it.

Let's assume I have a file named othér (where é is U+00E9, or the UTF-8 bytes 0xc3 0xa9). The filename is encoded in UTF-8, and was created by a C program:

#include <fcntl.h>  /* open(), O_RDWR, O_CREAT */
char filename[] = { 'o', 't', 'h', 0xc3, 0xa9, 'r', 0 };
open(filename, O_RDWR|O_CREAT, 0666);

If I build the equivalent Unicode string in Java, java.io.File fails to find the file. Further, if I use File.listFiles() in Java, it insists on treating the name as a Latin1 string. For example:

String expectedName = new String(new char[] { 'o', 't', 'h', 0xe9, 'r' });
File expected = new File(expectedName);
if (expected.exists())
    System.out.println(expectedName + " exists");
else
    System.out.println(expectedName + " DOES NOT exist");

for (File child : new File(".").listFiles())
{
    System.out.println(child.getName());
    System.out.print("Chars:");
    for (char c : child.getName().toCharArray())
        System.out.print(" 0x" + Integer.toHexString((int)c));
    System.out.println();
}

The results of this program are:

% java -Dfile.encoding=UTF8 FileTest
othér DOES NOT exist
othér
Chars: 0x6f 0x74 0x68 0xc3 0xa9 0x72

So it appears that my filenames are getting treated as Latin1. I've tried setting the file.encoding system property to UTF8 and the client.encoding.override system property to UTF-8 to no avail. My LANG and LC_ALL settings are en_US.UTF-8:

% echo $LANG
en_US.UTF-8
% echo $LC_ALL
en_US.UTF-8
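
The Latin1 interpretation described above can be reproduced without touching the filesystem. The following is a minimal sketch, not something taken from the JVM itself: the class name FilenameDecodingDemo and its helper methods are made up for illustration. It decodes the same six bytes the C program wrote, once as ISO-8859-1 and once as UTF-8, and also shows a possible workaround that re-decodes a name which was mistakenly read as Latin1. It uses String(byte[], Charset) and getBytes(Charset), which need Java 6 or later; on 1.5 the overloads taking a charset name would be needed instead.

```java
import java.nio.charset.Charset;

public class FilenameDecodingDemo {
    // Hypothetical helper class, for illustration only.
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset UTF8 = Charset.forName("UTF-8");

    // The raw bytes of the name as created by the C program.
    static final byte[] RAW = { 'o', 't', 'h', (byte) 0xc3, (byte) 0xa9, 'r' };

    // Decode the on-disk bytes the way the JVM appears to: one char per byte.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, LATIN1);
    }

    // Decode the on-disk bytes as UTF-8, which is what was intended.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, UTF8);
    }

    // Undo a mistaken Latin1 decoding: recover the bytes, then decode as UTF-8.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(LATIN1), UTF8);
    }

    // Format each char as hex, in the same style as the listing in the question.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append("0x").append(Integer.toHexString((int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Latin1:   " + hex(decodeAsLatin1(RAW)));
        System.out.println("UTF-8:    " + hex(decodeAsUtf8(RAW)));
        System.out.println("Repaired: " + hex(repair(decodeAsLatin1(RAW))));
    }
}
```

The Latin1 line matches the "Chars:" output from listFiles(), which suggests the JVM is decoding the raw bytes one char per byte. Applying repair() to child.getName() only fixes the display side; it does not help new File(...) find the file.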

My system's "Primary Language Environment", as configured by SMIT, is "ISO8859-1". I don't really know the full impact this setting has, but I cannot change it. I suspect that if I could change this to "UTF8 English" then that may fix the problem, but since JFS2 stores filenames in Unicode and Java operates in Unicode internally, I feel like there should be a more general solution to the problem.

Is there another system property for J9 that I can set to force it to use UTF-8 filenames regardless of my SMIT setting?

AIX version is 5.2, Java version is IBM J9 (1.5.0), filesystem is JFS2:

rs6000% uname -a
AIX rs6000 2 5 000A9B7C4C00
rs6000% java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pap32dev-20091106a (SR11 ))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc-32 j9vmap3223-20091104 (JIT enabled)
J9VM - 20091103_45935_bHdSMr
JIT  - 20091016_1845_r8
GC   - 20091026_AA)
JCL  - 20091106
rs6000% mount|grep /home
         /dev/hd1         /home            jfs2   Jun 27 16:02 rw,log=/dev/hd8 

Update: this still occurs on Java6:

% java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pap3260sr11-20120806_01(SR11))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260sr11-20120801_118201 (JIT enabled, AOT enabled)
J9VM - 20120801_118201
JIT  - r9_20120608_24176ifx1
GC   - 20120516_AA)
JCL  - 20120713_01
There are 2 answers

durron597 (best answer)

I found the answer. I really am trying to help here.

This is a blog post about your actual issue. I promise.

Try running your program with the -Dsun.jnu.encoding=UTF-8 flag set.
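
To check whether the flag was actually picked up, it may help to print what the JVM reports for the relevant properties. This is only a rough sketch: the class name EncodingProps and the describe() helper are made up for illustration, and sun.jnu.encoding is an internal, vendor-specific property, so whether setting it on the command line has any effect depends on the JVM and its version.

```java
import java.nio.charset.Charset;

public class EncodingProps {
    // Hypothetical helper: format one system property, or "(unset)" if absent.
    static String describe(String key) {
        return key + "=" + System.getProperty(key, "(unset)");
    }

    public static void main(String[] args) {
        // Encoding used for file contents and default String conversions.
        System.out.println(describe("file.encoding"));
        // Internal property; assumed here to govern filename encoding.
        System.out.println(describe("sun.jnu.encoding"));
        // The charset the JVM actually settled on as its default.
        System.out.println("defaultCharset=" + Charset.defaultCharset().name());
    }
}
```

Running it as `java -Dsun.jnu.encoding=UTF-8 EncodingProps` and again without the flag shows whether the property value changes. The listFiles() test from the question is still the real check of whether filenames are decoded differently.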

user18428

See http://www.ibm.com/developerworks/java/jdk/aix/118/README.html for a list of valid AIX locales. I think your exports should look like this:

  export LC_ALL=EN_US
  export LANG=EN_US