How can one get raw byte values into a varchar2 column in oracle?

7.3k views Asked by At

We have an old client talking to a legacy server app. In Turkey, it sends text as windows-1254. We store it and send it back.

A row in the database has a name which is "İ", which is a capital I with a dot on top. In windows-1254, this is a 0xdd character and in UTF-8, that is 0xc4b0.

If I look in the database, I see this:

SQL> select dump(name, 16) from thing where other thing;

DUMP(NAME,16)
--------------------------------------------------------------------------------
Typ=1 Len=2: c3,9d

Weird. As someone pointed out in another question, though...

The "İ" character is 0xdd in windows-1254. And it turns out that 0xdd in windows-1252 is the "Ý" character, which is 0xc39d in UTF-8. Hence what we see dumped.

What we think we want to do is this, but it obviously does not work:

SQL> update thing set name = UTL_RAW.CAST_TO_VARCHAR2(UTL_RAW.CONVERT(HEXTORAW('dd'), 'CP1254', 'UTF8')) where otherthing;
update thing set name = UTL_RAW.CAST_TO_VARCHAR2(UTL_RAW.CONVERT(HEXTORAW('dd'), 'CP1254', 'UTF8')) where otherthing
                                                        *
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "SYS.UTL_RAW", line 327

Huh?

Lots of doc discusses the UTL_RAW.CONVERT and nobody shows an example of it actually being used. Why is that? Is there a list of the allowed NLS Character Set Names? I cannot find one. Any suggestions? I can find a list of all the encodings that java knows about, but I cannot find this list for Oracle.

I have a java application that I am using to write data that must be read by the legacy software. That app receives UTF-8. I can successfully translate that to windows-1254 bytes. I could translate those to windows-1252. If I then translate those to UTF-8, I can write that to the database with:

SQL> update this set name = UTL_RAW.CAST_TO_VARCHAR2(hextoraw('c39d')) where otherthing;

1 row updated.

SQL> select dump(name, 16) from thing where otherthing;

DUMP(NAME,16)
--------------------------------------------------------------------------------
Typ=1 Len=2: c3,9d

And the client shows this row as "İ". But, you know, wow. That just seems absurd. But if that is what will work, that might be what has to happen....

1

There are 1 answers

0
Ben On

What's not explained in the docs is that, according to UTL_RAW, a characterset is made up of 3 things; the NLS_LANGUAGE, the NLS_TERRITORY and the characterset itself. To see a list of valid values you can query V$NLS_VALID_VALUES. The full list of languages and territories is also available in the documentation.

This throws up the first of your problems. According to Oracle Win-1254 is not CP1254 but TR8MSWIN1254. Equally, although the UTF8 characterset exists I suspect your database is set up using AL32UTF8. You can double check by querying NLS_DATABASE_PARAMETERS.

So, if this is in Turkish, and you're in Turkey let's assume your language and territory is just that TURKISH_TURKEY.

Prepending this to the characterset now returns what you want:

select utl_raw.convert( hextoraw('dd')
                      , 'TURKISH_TURKEY.AL32UTF8'
                      , 'TURKISH_TURKEY.TR8MSWIN1254'
                      ) as raw_char
  from dual;

RAW_CHAR
-----------------------------------------------------

C4B0

As you've already noted 0xc4b0 is the representation of İ in UTF-8, so you can use UTL_RAW.CAST_TO_VARCHAR2 as you intended 1:

select utl_raw.cast_to_varchar2(
           utl_raw.convert( hextoraw('dd')
                          , 'TURKISH_TURKEY.AL32UTF8'
                          , 'TURKISH_TURKEY.TR8MSWIN1254'
                            )) as new_char
  from dual;

1. I don't have a text based Oracle client that can represent this character; sorry!