UTF-8 encoded j_security_check username incorrectly decoded as Latin-1 in Tomcat realm


I'm investigating an issue where a username containing a Latin-1 character, á, is entered in a login form. On the server side I have a custom realm:

    public class MyRealm extends RealmBase implements Realm {
        public Principal authenticate(String username, String password) {
            // ... actual authentication implemented here
        }
    }

If I print the bytes of username.getBytes(), the character á appears as C3 83 C2 A1. In UTF-8, á should be just C3 A1. If I treat C3 A1 as two characters and encode them as UTF-8 again, I get C3 83 C2 A1, which is exactly what my software prints.
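The byte pattern can be reproduced with a short standalone Java program (the class and helper names are illustrative). It assumes the cause is UTF-8 bytes being decoded as Latin-1 somewhere on the server, which is one way to arrive at exactly these bytes:

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Formats a byte array as space-separated uppercase hex, e.g. "C3 A1".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00E1"; // 'á'

        // Bytes the browser sends: the UTF-8 encoding of 'á'.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(wire)); // C3 A1

        // Assumed server-side mistake: decoding those bytes as Latin-1
        // yields two characters, U+00C3 'Ã' and U+00A1 '¡'.
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);

        // Encoding the mis-decoded string as UTF-8 reproduces the observed bytes.
        byte[] observed = misdecoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(observed)); // C3 83 C2 A1
    }
}
```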

A network capture confirms that the username is sent correctly as C3 A1. The relevant part of the login page's form source is:

        <form name="loginForm" action="j_security_check" method="post" enctype="application/x-www-form-urlencoded">
        <table>
            <tr>
                <td colspan="2" align="right">Secure connection:
                    <input type="checkbox" name="checkbox" class="style5" onclick="javascript:httpHttps();"></td>
            </tr>
            <tr>
                <td class="style5">Login:</td>
                <td><input type="text" name="j_username" autocomplete="off" style="width:150px" /></td>
            </tr>

So I don't think the double UTF-8 conversion happens on the client side. If I undo the extra conversion on the username inside authenticate(), authentication works fine, but I'm wary of applying that as the fix.
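For reference, the workaround described above amounts to something like this sketch (the helper name undoLatin1Misdecode is hypothetical). It reverses one UTF-8-as-Latin-1 mis-decoding by recovering the raw bytes with ISO-8859-1 and decoding them as UTF-8:

```java
import java.nio.charset.StandardCharsets;

public final class UsernameRepair {

    private UsernameRepair() {}

    // Hypothetical helper: undoes a UTF-8 -> Latin-1 mis-decoding.
    // A mis-decoded string only contains characters in U+0000..U+00FF;
    // anything outside that range is returned unchanged.
    public static String undoLatin1Misdecode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s;
            }
        }
        // Latin-1 maps each char back to exactly one byte, recovering
        // the original request bytes, which are then decoded as UTF-8.
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fromTomcat = "\u00C3\u00A1"; // "Ã¡", what the realm receives
        System.out.println(undoLatin1Misdecode(fromTomcat).equals("\u00E1")); // true
    }
}
```

The hesitation is justified: this only works while Tomcat keeps mis-decoding the request body. Once the encoding is fixed at the source, a correctly decoded 'á' (U+00E1) would be turned into the single byte E1, which is not valid UTF-8 on its own, so the username would be corrupted.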

Where does this extra encoding of the username happen before it reaches the realm's authenticate(String username, String password) method, and where should I look to fix it? The server runs on Linux (Red Hat) with httpd-2.2.15 and tomcat6-6.0.24.

Accepted answer, by tamasp:

Your form sends the UTF-8 bytes for 'á' to Tomcat percent-encoded, so on the wire the value is %C3%A1. Tomcat, however, decodes POST parameters as Latin-1 (ISO-8859-1), which is its default when the request does not specify a character encoding.
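To see the effect of the charset choice in isolation, the snippet below percent-decodes the same value with both charsets using java.net.URLDecoder. Tomcat uses its own parameter parser, so this is an analogy rather than Tomcat's actual code path; the class name is made up:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {

    // Percent-decodes a form field value with the given charset name,
    // analogous to how a container decodes an
    // application/x-www-form-urlencoded body.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String onTheWire = "%C3%A1"; // UTF-8 'á', as the browser sends it

        String asLatin1 = decode(onTheWire, "ISO-8859-1"); // two chars: "Ã¡"
        String asUtf8 = decode(onTheWire, "UTF-8");        // one char: "á"

        System.out.println(asLatin1.length());         // 2
        System.out.println(asUtf8.length());           // 1
        System.out.println(asUtf8.equals("\u00E1"));   // true
    }
}
```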

So Tomcat stores the bytes C3 A1 internally as the two-character string "Ã¡", since C3 is 'Ã' and A1 is '¡' in Latin-1.

When you call username.getBytes(), the string is encoded with the platform default charset (evidently UTF-8 on your server), so each of the two characters is encoded separately: 'Ã' becomes C3 83 and '¡' becomes C2 A1.

The Tomcat FAQ describes this in detail, along with the proposed solution: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q3

Configure the FormAuthenticator valve for your application's Context in server.xml with characterEncoding="UTF-8", so that the login parameters are decoded as UTF-8:

    <Context path="/YourSecureApp">
        <Valve
            className="org.apache.catalina.authenticator.FormAuthenticator"
            disableProxyCaching="false"
            characterEncoding="UTF-8" />
    </Context>