23 November 2009

Revenge of the Multibyte Characters

In The Netherlands, where I'm from, we do have Multibyte characters. Names with diacritic characters occur frequently.
In a prior post I've written about using Oracle text to perform diacritic searches.

This post is about the table definition. Every now and then this problem rears its ugly head, that's why I decided to write this little note on it.

Let's look at the problem at hand. First we'll create a table:
SQL> create table test
  2  (name varchar2 (7))
  3  /

Table created.

No problem so far.. but what happens when we try to insert a name with diacritics into this table?

SQL> insert into test values ('Alèx')
  2  /

1 row created.
No problem. Or so it seems... what is the length of the name that I just inserted into the table?

SQL> select length (name)
  2    from test
  3  /


Even though the name contains a diacritic character, the length is four. But what if we insert a name with length seven with diacritic characters?

SQL> rollback
  2  /

Rollback complete.

SQL> insert into test values ('Thérèse')
  2  /
insert into test values ('Thérèse')
ERROR at line 1:
ORA-12899: value too large for column "ALEX"."TEST"."NAME" (actual: 9, maximum: 7)

The error message is not quite clear. It says the name is too long, but when you count them like

T h é r è s e 
1 2 3 4 5 6 7
it should match.... Of course it has to do with the diacritic characters in the name.

We can check this by using the LENGTHB function (notice the "B" at the end of Length)
SQL> with local_name
  2  as
  3  (select 'Thérèse' name
  4     from dual
  5  )
  6  select length (name)
  7       , lengthb (name)
  8    from local_name
  9  /

------------ -------------
           7             9

When we query the length in Bytes, it becomes obvious. The last name we inserted was too long in Bytes. How to fix this? Either increase the length of the column, or change it from BYTE to CHAR (and I don't mean the datatype CHAR)

SQL> drop table test
  2  /

Table dropped.

SQL> create table test
  2  (name varchar2 (7 char))
  3  /

Table created.
Now the column will allow up to seven characters in it regardless of the byte size of the data. Or at least that's how I interpret the documentation.
The name which we couldn't insert with a prior INSERT statement, now comfortably fits into the column:
SQL> insert into test values ('Thérèse')
  2  /

1 row created.

So, when you use Multibyte characters, check you Length Semantics!

You can check your setting with this statement:
SQL> show parameter nls_length_semantics

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
nls_length_semantics                 string      BYTE

And just to clean up the test table.
SQL> drop table test
  2  /

Table dropped.

Oracle Documentation


  1. thanks a ton. it helped me a lot :)

  2. Gr8 article Alex. was very helpful....
    Can u also add about how to identify what Character-set of Oracle is set at our end....

  3. Multi-byte characters had caused me a lot of pain.

    Do you have a suggestion for this problem:

    I have a CLOB field that *might* contains some multi-byte characters, and I need to select in SQL and convert this field into a string for downstream process, currently I am using:

    SELECT DBMS_LOB.SUBSTR( description, 4000, 1 ) FROM table

    But the 4000 in above command is in length of characters, rather than bytes. So I had change to 3000 to handle any multi-byte characters that might have crept into the data.
    The problem is for data that do not contain multibyte character, it might unnecessarily truncated more data than it need to.
    (The 4000 is the string limitation, we can/had to live with that.)

    Is there a way to do something in equivalent of:

    SELECT DBMS_LOB.SUBSTR( description, 4000bytes, 1 ) FROM table

    That way I can get as much data out as possible.

    Note: I am not allowed to create temp tables/views, not using PL/SQL, only SQL SELECT...

  4. Thanks for this article it helped me a lot...