03 July 2009

Splitting a comma delimited string the RegExp way

This is one of those recurring questions on the Oracle Forums of OTN.
How to split a comma delimited string? Of course there are several options how to tackle this problem. One of the most elegant ones, at least I think so, uses a regular expression.

Let's just look at an example

SQL> with test as
2 (select 'ABC,DEF,GHI,JKL,MNO' str from dual
3 )
4 select regexp_substr (str, '[^,]+', 1, rownum) split
5 from test
6 connect by level <= length (regexp_replace (str, '[^,]+')) + 1
7 /

SPLIT
---------------------------------------------------------------------
ABC
DEF
GHI
JKL
MNO

The first part creates some test data using the WITH clause (aka Subquery Factoring). The actual query with the regular expression starts on line 4.
The expression is

regexp_substr (str, '[^,]+', 1, rownum)

The meaning of "[^,]+" in normal English:
Give me one or more characters which are not in the list

The list consists of all characters between the square brackets. Here the "^" (circumflex) indicates "except" or "not in". The "+" means: one or more times.

The arguments of the REGEXP_SUBSTR determine which part of the string to subtract.
The third argument to the REGEXP_SUBSTR functions tells it where to start with the regular expression. The last argument means which occurence to match.

REGEXP_SUBSTR
Multilingual Regular Expression Syntax

11 comments:

  1. Hi Alex,

    Three remarks:
    - You need to increase the level counter by 1 to not miss the 'MNO' value
    - In 11g, you can be even more elegant by using the regexp_count instead of length(regexp_replace(...))
    - It may be the most elegant solution, it is certainly not the fastest one (http://rwijk.blogspot.com/2007/11/interval-based-row-generation.html).

    Regards,
    Rob.

    ReplyDelete
  2. Man, you're fast ;)

    Yes, you are absolutely right (times three). Guess I was too focused on the expression than on checking the results.

    ReplyDelete
  3. Excellent solution - very elegant

    I may not be bright enough to understand the use of rownum, but I am goint to use your solution. Later I will break the query and try to understand it.

    Thanks,
    Mo.

    ReplyDelete
  4. Hi,

    COl_1
    --------
    10
    35
    40
    4
    8

    Given input range parameter format is comma separated: '1,5,10,35,45'

    i am looking for output:
    COl_1 ----- Range_COL_2
    -------------------------------
    10 ----- 5-10
    35 ----- 10-35
    40 ----- 35-45
    4 ----- 1-5
    8 ----- 5-10

    Any suggestions?

    ReplyDelete
  5. I'd probably create a function for it. Something like this:

    SQL> create table t
    2 as
    3 select 10 col from dual union all
    4 select 35 col from dual union all
    5 select 40 col from dual union all
    6 select 4 col from dual union all
    7 select 8 col from dual
    8 ;

    Table created.

    SQL>
    SQL> create or replace
    2 function ranger (p_range in varchar2
    3 ,p_val in number
    4 )
    5 return varchar2
    6 is
    7 retval varchar2(100);
    8 begin
    9 select lo_num||' - '||hi_num
    10 into retval
    11 from (
    12 select num lo_num
    13 , lead (num, 1, num) over (order by num) hi_num
    14 from (
    15 select to_number (regexp_substr (p_range
    16 , '[^,]+'
    17 , 1
    18 , rownum
    19 )) num
    20 from dual
    21 connect by level <= length (regexp_replace (p_range, '[^,]+')) + 1
    22 ))
    23 where p_val > lo_num
    24 and p_val <= hi_num;
    25 return retval;
    26 end ranger;
    27 /

    Function created.

    SQL>
    SQL> select col
    2 , ranger ('1,5,10,35,45', col)
    3 r
    4 from t
    5 /

    COL R
    ---------- ---------------
    10 5 - 10
    35 10 - 35
    40 35 - 45
    4 1 - 5
    8 5 - 10

    ReplyDelete
  6. Cool......Thanks Alex

    ReplyDelete
  7. I have dataset like this -

    ID ASSIGNED_USERS
    --+---------------------+
    1 joey, anthony, marvin
    5 tony, glenn
    8 john

    I'm looking at converting it to this -

    ID ASSIGNED_USERS
    ---+------------------+
    1 joey
    1 anthony
    1 marvin
    5 tony
    5 glenn
    8 john

    Any idea ?

    Thanks.
    AnthonyJ

    ReplyDelete
  8. Instead of posting a possible solution here, I decided to write a new blog entry (long overdue) which shows the SQL statements a lot nicer as well:
    http://nuijten.blogspot.com/2011/08/splitting-comma-delimited-string-regexp.html

    ReplyDelete
  9. Yet one way without regexp
    + if you have few comma separated lines
    http://sprogram.com.ua/en/articles/oracle-select-from-string-varchar2

    ReplyDelete
  10. ID NAME IMAGES
    ----- -------- --------------------------
    1 sony xyz.jpg,abc.jpg,123.jpg
    2 acer aa.jpg,ppqq.jpg
    3 samsung smng.jpg
    4 nokia ff.jpg,qq.jpg,yy.jpg
    5 dell dd.jpg,11.jpg

    now i want output like this..
    ID NAME IMAGES
    ----- -------- --------------------------
    1 sony xyz.jpg
    1 sony abc.jpg
    1 sony 123.jpg
    2 acer aa.jpg
    2 acer ppqq.jpg
    . ...
    . ...
    . ...
    . ...

    please help me...

    ReplyDelete
    Replies
    1. The way to solve this is described here: http://nuijten.blogspot.nl/2011/08/splitting-comma-delimited-string-regexp.html

      Delete