03 July 2009

Splitting a comma delimited string the RegExp way

This is one of those recurring questions on the Oracle Forums of OTN.
How to split a comma delimited string? Of course there are several options how to tackle this problem. One of the most elegant ones, at least I think so, uses a regular expression.

Let's just look at an example

SQL> with test as
2 (select 'ABC,DEF,GHI,JKL,MNO' str from dual
3 )
4 select regexp_substr (str, '[^,]+', 1, rownum) split
5 from test
6 connect by level <= length (regexp_replace (str, '[^,]+')) + 1
7 /

SPLIT
---------------------------------------------------------------------
ABC
DEF
GHI
JKL
MNO

The first part creates some test data using the WITH clause (aka Subquery Factoring). The actual query with the regular expression starts on line 4.
The expression is

regexp_substr (str, '[^,]+', 1, rownum)

The meaning of "[^,]+" in normal English:
Give me one or more characters which are not in the list

The list consists of all characters between the square brackets. Here the "^" (circumflex) indicates "except" or "not in". The "+" means: one or more times.

The arguments of the REGEXP_SUBSTR determine which part of the string to subtract.
The third argument to the REGEXP_SUBSTR functions tells it where to start with the regular expression. The last argument means which occurence to match.

REGEXP_SUBSTR
Multilingual Regular Expression Syntax

28 comments:

  1. Hi Alex,

    Three remarks:
    - You need to increase the level counter by 1 to not miss the 'MNO' value
    - In 11g, you can be even more elegant by using the regexp_count instead of length(regexp_replace(...))
    - It may be the most elegant solution, it is certainly not the fastest one (http://rwijk.blogspot.com/2007/11/interval-based-row-generation.html).

    Regards,
    Rob.

    ReplyDelete
  2. Man, you're fast ;)

    Yes, you are absolutely right (times three). Guess I was too focused on the expression than on checking the results.

    ReplyDelete
  3. Excellent solution - very elegant

    I may not be bright enough to understand the use of rownum, but I am goint to use your solution. Later I will break the query and try to understand it.

    Thanks,
    Mo.

    ReplyDelete
  4. Hi,

    COl_1
    --------
    10
    35
    40
    4
    8

    Given input range parameter format is comma separated: '1,5,10,35,45'

    i am looking for output:
    COl_1 ----- Range_COL_2
    -------------------------------
    10 ----- 5-10
    35 ----- 10-35
    40 ----- 35-45
    4 ----- 1-5
    8 ----- 5-10

    Any suggestions?

    ReplyDelete
  5. I'd probably create a function for it. Something like this:

    SQL> create table t
    2 as
    3 select 10 col from dual union all
    4 select 35 col from dual union all
    5 select 40 col from dual union all
    6 select 4 col from dual union all
    7 select 8 col from dual
    8 ;

    Table created.

    SQL>
    SQL> create or replace
    2 function ranger (p_range in varchar2
    3 ,p_val in number
    4 )
    5 return varchar2
    6 is
    7 retval varchar2(100);
    8 begin
    9 select lo_num||' - '||hi_num
    10 into retval
    11 from (
    12 select num lo_num
    13 , lead (num, 1, num) over (order by num) hi_num
    14 from (
    15 select to_number (regexp_substr (p_range
    16 , '[^,]+'
    17 , 1
    18 , rownum
    19 )) num
    20 from dual
    21 connect by level <= length (regexp_replace (p_range, '[^,]+')) + 1
    22 ))
    23 where p_val > lo_num
    24 and p_val <= hi_num;
    25 return retval;
    26 end ranger;
    27 /

    Function created.

    SQL>
    SQL> select col
    2 , ranger ('1,5,10,35,45', col)
    3 r
    4 from t
    5 /

    COL R
    ---------- ---------------
    10 5 - 10
    35 10 - 35
    40 35 - 45
    4 1 - 5
    8 5 - 10

    ReplyDelete
  6. Cool......Thanks Alex

    ReplyDelete
  7. I have dataset like this -

    ID ASSIGNED_USERS
    --+---------------------+
    1 joey, anthony, marvin
    5 tony, glenn
    8 john

    I'm looking at converting it to this -

    ID ASSIGNED_USERS
    ---+------------------+
    1 joey
    1 anthony
    1 marvin
    5 tony
    5 glenn
    8 john

    Any idea ?

    Thanks.
    AnthonyJ

    ReplyDelete
  8. Instead of posting a possible solution here, I decided to write a new blog entry (long overdue) which shows the SQL statements a lot nicer as well:
    http://nuijten.blogspot.com/2011/08/splitting-comma-delimited-string-regexp.html

    ReplyDelete
  9. Yet one way without regexp
    + if you have few comma separated lines
    http://sprogram.com.ua/en/articles/oracle-select-from-string-varchar2

    ReplyDelete
  10. ID NAME IMAGES
    ----- -------- --------------------------
    1 sony xyz.jpg,abc.jpg,123.jpg
    2 acer aa.jpg,ppqq.jpg
    3 samsung smng.jpg
    4 nokia ff.jpg,qq.jpg,yy.jpg
    5 dell dd.jpg,11.jpg

    now i want output like this..
    ID NAME IMAGES
    ----- -------- --------------------------
    1 sony xyz.jpg
    1 sony abc.jpg
    1 sony 123.jpg
    2 acer aa.jpg
    2 acer ppqq.jpg
    . ...
    . ...
    . ...
    . ...

    please help me...

    ReplyDelete
    Replies
    1. The way to solve this is described here: http://nuijten.blogspot.nl/2011/08/splitting-comma-delimited-string-regexp.html

      Delete
  11. Thx a lot,your example helped me to solve my problem

    ReplyDelete
  12. Write a packaged function that takes a string separated by commas (say a,b,c,d,e) and returns the words between the commas populated in a PLSQL table. The contents of the PLSQL table should be listed via a SELECT statement as rows. So final output of running the select statement should be like:
    Position String
    -------- ------
    1 a
    2 b
    3 c
    4 d
    5 e

    ReplyDelete
  13. Write a stored procedure that accepts parameters of columns of employee table except employee id and inserts a row using dynamic SQL in EMP table. Assuming that Employee ID is generated by a sequence in a database trigger, print the value of the employee id in this stored procedure.

    ReplyDelete
  14. I need a help with a string of type"ABC/FDF,RET/YRT,UYT/ERT" to split in to individual blocks like ABC,FDF,RET etc. I am able to do the comma splitting and getting the string "ABC/FDF" but not sure how to split again. I am not sure about using arrays in pl/sql

    ReplyDelete
    Replies
    1. In order to do that you need to use the technique described here: http://nuijten.blogspot.co.uk/2011/08/splitting-comma-delimited-string-regexp.html

      To show a full example:
      SQL> var input varchar2(150)
      SQL>
      SQL> exec :input := 'ABC/FDF,RET/YRT,UYT/ERT'

      PL/SQL procedure successfully completed.

      SQL>
      SQL> with commas
      2 as
      3 (select regexp_substr (:input, '[^,]+',1, rownum) str
      4 from dual
      5 connect by level <= regexp_count (:input, '[^,]+'))
      6 select regexp_substr (str, '[^\/]+', 1, rn) split
      7 from commas
      8 cross
      9 join (select rownum rn
      10 from (select max (regexp_count(rtrim (str, '/')||'/', '\/')) mx
      11 from commas
      12 )
      13 connect by level <= mx
      14 )
      15 where regexp_substr (str, '[^\/]+', 1, rn) is not null
      16 ;

      SPLIT
      ----------------------------------
      ABC
      FDF
      RET
      YRT
      UYT
      ERT

      6 rows selected.

      The trick is in line 10 where the RTRIM is. This will make sure that there is an extra / at the end of the input string and thus making the count (which generates the number of rows correct)

      > I am not sure about using arrays in pl/sql
      ... yes, well... I am very sure about arrays in PL/SQL.. :) I don't understand what you want with this remark.

      Delete
  15. Please help,

    regexp_substr (str, '[^,]+', 1, rownum) - i Understand that this will separate at ','

    Say my delimiter is '/n', then how to do ?

    I tried but this is taking as character list, not a single string. It is also avoiding all the 'n' as well in the str instead of searching for '/n'

    ReplyDelete
    Replies
    1. As an example string : 'test value/nnice not/nhoney'

      o/p

      test value
      nice not
      honey

      Delete
    2. what you could do is replace the /n with another character (which is not used in the text) and use the normal trick.
      with test as
      (select 'test value/nnice not/nhoney' str from dual)
      select regexp_substr (regexp_replace (str, '(/n)', '!!'), '[^!!]+', 1, rownum)
      from test
      connect by level <= regexp_count(str, '(/n)') + 1

      Delete
  16. I have a string like a,b,c,d,e,f,g,h,... in a column
    i want to split this string into no of rows after breaking into n succesive position like

    col1
    a,b
    b,c
    c,d

    or

    col1
    a,b,c
    d,e,f
    g,h,i

    ..Can u please suggest me a query to handle this scenario


    ReplyDelete
    Replies
    1. what you want is inconsistent; in the first example letters are duplicated, which is not the case in the second.

      Delete
  17. I have a table Employee with columns employee_id number and employee_name varchar. I need a split procedure or function that enters the record
    (according the comma like first name before ','and after the comma name',' then the second ',') like
    string (adam,alex,hales,jordan,jackob) in employee_name column and store in table
    like
    id Name
    1 adam
    2 alex
    3 hales
    4 jordan
    5 jackob
    kindly provide any solution

    ReplyDelete
  18. hi,

    I have string account/a001, dept/d002, location/L003

    and the expected results is
    column1 colum2 column3 column4 column5 column6
    account a001 dept d002 location L003

    ReplyDelete
  19. Hi, I have a comma separated string

    CustomerID,CONCATINATE(Plan_Code,''XYZ''),BillingAddress,COMPOSITE(SellingID,AddressID)

    and I want to split it in the following way

    CustomerID
    CONCATINATE(Plan_Code,''XYZ'')
    BillingAddress
    COMPOSITE(SellingID,AddressID)

    Any ideas?

    ReplyDelete