08 August 2011

Splitting a comma delimited string the RegExp way, Part Two

Over two years ago I wrote about a way to split a comma delimited string using Regular Expresssions. Just a little while ago someone asked how to split it when you have more records involved than just one (as I used in my example).

For this example I use the dataset as AnthonyJ used in the comments. The explanation for the regular expression can be found in the original post.

SQL> with test as
2 (
3 select 1 id, 'joey,anthony,marvin' str from dual union all
4 select 5 id, 'tony,glenn' str from dual union all
5 select 8 id, 'john' str from dual
6 )
7 select id
8 , str
9 , regexp_substr (str, '[^,]+', 1, rn) split
10 from test
11 cross
12 join (select rownum rn
13 from (select max (length (regexp_replace (str, '[^,]+'))) + 1 mx
14 from test
15 )
16 connect by level <= mx
17 )
18 where regexp_substr (str, '[^,]+', 1, rn) is not null
19 order by id
20 ;

---------- ------------------- -------------------
1 joey,anthony,marvin joey
1 joey,anthony,marvin marvin
1 joey,anthony,marvin anthony
5 tony,glenn tony
5 tony,glenn glenn
8 john john

The trick here is in lines 11 through 17. The Cross Join is there to create multiple records, but no more than the longest intended individual words (three in this case).
Because each ID would result in three records, which is alright for ID 1, it will also create three records for ID 5 and ID 8. Line 18 removes these extra records.
If you use Oracle 11g, you can also use REGEXP_COUNT instead of the combination of REGEXP_REPLACE and LENGTH, which would look like this:

join (select rownum rn
from (select max (regexp_count (str, ',') + 1) mx
from test
connect by level <= mx



  1. I was looking for something exactly like this. Thanks a lot, it works like a charm.

  2. Hi there Alex. Thank you for sharing this. I found your post when looking for material for writing about delimited strings in PL/SQL. I am linking to your post in my tutorial (http://oracletuts.net/tutorials/how-to-tokenize-or-parse-a-string-in-plsql/)

    I have been on your blog some times earelier. Keep up the good work!

  3. Thank you very much for this, it was just what I needed.

  4. Thanks... works good!

  5. Thanks, very helpful for me.

    In my case I also needed it to have outer join behaviour so made a slight variation as below:

    with test as
    select 1 id, 'joey,anthony,marvin' str from dual union all
    select 5 id, 'tony,glenn' str from dual union all
    select 8 id, 'john' str from dual union all
    select 9 id, null str from dual
    select id
    , str
    , regexp_substr (str, '[^,]+', 1, rn) split
    from test
    left outer join (select rownum rn
    from (select max (length (regexp_replace (str, '[^,]+'))) + 1 mx
    from test
    connect by level <= mx) splits on splits.rn <= length (regexp_replace (str, '[^,]+'))
    order by id

    This could also be done using an or rn = 1 in the where clause instead

  6. Thank you Anonymous for your suggestion with the Left Outer Join. However, from the example you posted "John" is missing in the split-column. When you amend the second to last line to:
    on splits.rn <= nvl (length(regexp_replace(str,'[^,]+')), 1)
    you will see John show up.
    The NVL was added to that line.

  7. Hi,

    dumb I know but I have a dataset with some records that have a second name in the field EG: Rob (Bobby). So I need to leave Rob in the FNAME field and put Bobby in the AFNAME field.

    I have landed on
    TRIM(REGEXP_REPLACE(FNAME, '\((.*?)\)', '') )'
    To update the FNAME field removing " (Bobby)

    I was trying to use:
    REGEXP_SUBSTR(FNAME, ' \((.*?)\)') but that of course return "(Bobby)"
    which is where I landed on your blog...
    I need an elegant way to only return Bobby without the delimiters.

    Update table
    Set AFNAME= ???,
    SET FNAME= TRIM(REGEXP_REPLACE(FNAME, '\((.*?)\)', '') )'
    Where FNAME like ‘%(%’

    1. Update:

      Not necessarily elegant but it works:
      trim(translate(REGEXP_SUBSTR(FNAME, ' \((.*?)\)'), '()', ' ' ))

  8. This query saved my bacon. There are plenty of examples out there using CONNECT BY, but it only works for one row at a time. Thank you so much.

  9. Hi. One Q regardig the first split post in http://nuijten.blogspot.no/2009/07/splitting-comma-delimited-string-regexp.html

    Now the output is shown in one column and rows.
    Is it possible to send them into a variabel for each of them ?

    Cheers Mate

    1. I'm not sure what you mean. If I understand correctly: you want the results of the query and place each value that you get back into a separate variable? Then you would have to know ahead of time how many variables you would need... unless you select them into a collection (like an Associative Array, Nested Table or a Varray).

  10. HI ALEX I have a requirment like this, there is a table and has three columns


    TEST1;TEST2; COL1|COL2;COL3|COL4; 1|2;3|4;



    TEST1 COL1 1
    COL2 2
    TEST2 COL3 3
    COL4 4

    Thanks in advance

    1. For that requirement you can use the technique described in this blogpost, but do it twice. First you split the content of the first column based on the ";" separator. Next you will split the content of the second column, based on the "|" separator.

      But the real solution is: change your datamodel so you don't have to go through hoops to get the correct values extracted.

  11. Hi, alex thanks for suggestion,

    i tried this what u said but it's not working.

    plz describe in det, i have urgent requirments.


    1. > i tried this what u said but it's not working.
      Let's see what you tried in that case. Maybe I can offer you some suggestions. The method I described works - I just tried it.

      If this is really "urgent", as in "stop what you're doing - I will hire you to write this query for me, no matter what the cost", I will get you in contact with our sales department.
      Remember: this is just a blog where I keep my notes. If my notes help you, or if I can help you with some advice - I would be more than happy to do so. However I will not do your job for you (that's why it's called your job).

      It seems that your keyboard is broken, a number of letters are missing.

  12. Thanks for suggestion.

  13. Thank you so much... it works great!

  14. This comment has been removed by the author.

  15. Thanks Alex for the wonderful solution :). But i would like to understand how does it work. I mean the connect by clause and the flow.

    1. The Connect By-clause is a "trick" which generates a number of rows specified by the value (mx in my example). You can try this by issueing a query like:
      select rownum from dual connect by level <= 10
      which will generate 10 rows.
      The rownum generated by the inner query is used with the REGEXP expression
      regexp_substr (str, '[^,]+', 1, rn)

      Does this help?

    2. This comment has been removed by the author.

    3. If you start breaking down the query - begin with the inner most query - and work your way out, you can see the way the query is structured.

  16. I have table product and i am displaying same products have different rates..but it will print on the same line ex..

    IND0001 $10$2$20$5

    but i want..
    Ind0001 $10
    in dropdown menu..for fetching the row..

    1. So, you have a table with 2 columns (PRODUCT_ID and RATES) and you want to split the RATES column so the enduser can select a RATE? To me it seems that your model is "not ideal" (to put it mildly). How would you otherwise know which RATE the user picked? To me it seems that you would need a PRODUCTS and a RATES table, with a one-to-many relation between them

  17. Hi Alex,
    Can you throw some light on this.
    we are getting the mx value for each record in the test. which is a performance hit .
    Also, let us say one record has a str with 100 words and everything else is less than 3 or 4 words, then in that case for all the records we are doing the cartesian product for 100 times and avoiding the nulls though it had just 3 words.
    Can this be tweaked to get the mx value for the str we are dealing with instead of finding the mx value for all the str records in test. Appreciate your inputs. And Thanks is advance!

    1. Is your claim (the performance hit) based on evidence - or just the assumption that it could be a performance hit? If you tested it, I am curious to find out how you tested it...

      If this (getting the words out of a column) is something that needs to be done frequently, then there might be something wrong with your datamodel. Should the words be a seperate child table? That way you wouldn't need to split the values using this technique.

  18. Alex,

    Thanks for the write-up; this is exactly what I need as a band-aid for my less-than-optimal data model. I implemented the 11g version with regexp_count and it's working great.

    One snag I hit, however, was with filtering out the unneeded rows. For some reason

    where regexp_substr (str, '[^,]+', 1, rn) is not null

    didn't work for me. I checked and made sure there wasn't a space or non-printable character or anything but the rows were still included in the results. Changing the IS NOT NULL check to a LENGTH checked fixed the issue for me:

    WHERE LENGTH( regexp_substr (str, '[^,]+', 1, rn) ) <> 0

    Any clue why the IS NOT NULL version would be failing?

    1. Just a wild stab: is the datatype of your column perhaps a CLOB?

    2. If it is a CLOB, check this: http://nuijten.blogspot.nl/2009/11/empty-clob-is-not-null-its-not-null.html

    3. Bingo! That was exactly the case.

      Who would have guessed, after coming to terms with Oracle's singular interpretation of empty string = NULL, that they would decide empty clob <> NULL. Oh well, live and learn. Thanks for the knowledge!

    4. Reading your blog post I agree that syntactically that is a good use of nullif(). Quite possibly the first good use I've seen.

      I tried it out but it's not good for my use case. In my worst-case example from current production data the CROSS JOIN above is generating about 2700 rows of which 329 are returned. With the simpler LENGTH() <> 0 it finishes in about 16 seconds. With NULLIF(LENGTH(),0) IS NOT NULL it takes about 25. So I'm sticking with former for now.

      While the vast majority of my data will probably only generate a dozen extra rows the worst-case is still to long for me. Next stop is trying to do this with procedurally with a PIPELINED return and see if I can speed it up at all.

      Thanks again.

    5. There is also a suggestion in the comments to use:
      dbms_lob.compare(nvl(text, empty_clob()),empty_clob())= 0

  19. Hi Alex....I need to separate this string called aux_return:
    aux_return = OK91,9702|7,6|11340|59,94|0|641579,34|1790,25. The separator is '|' and I need that:
    var1 := 91,9702;
    var2 := 7,6;
    var3 := 11340;
    var4 := 59,94;
    var5 := 0;
    var6 := 641579,34;
    var7 := 1790,25:

    I need yor help and THANK YOU SO MUCH.

    1. Easy...

      cursor variables is
      SELECT trim(regexp_substr( substr(aux_retorno,3,length(aux_retorno)-2), '[^|]+', 1, LEVEL)) aux_variables
      FROM dual
      CONNECT BY LEVEL <= length(substr(aux_retorno,3,length(aux_retorno)-2)) - length(REPLACE(substr(aux_retorno,3,length(aux_retorno)-2), '|', ''))+1;
      e variables%rowtype;

      open variables;
      fetch variables into e;
      exit when variables%notfound;
      if i = 0 then
      aux_1 := e.aux_variables;
      end if;
      if i = 1 then
      aux_2 := e.aux_variables;
      end if;
      if i = 2 then
      aux_3 := e.aux_variables;
      end if;
      if i = 3 then
      aux_4:= e.aux_variables;
      end if;
      if i = 4 then
      aux_5 := e.aux_variables;
      end if;
      if i = 5 then
      aux_6 := e.aux_variables;
      end if;
      if i = 6 then
      aux_7 := e.aux_variables;
      end if;
      i := i + 1;
      end loop;
      close variables;

  20. Hi Alex,

    Just found out that REGEXP_SUBSTR('field1||field3|field4', '[^|]+' ,1 , 2 ) is returning value 'field3' instead of NULL.
    My quick fix is trim(REGEXP_SUBSTR(replace('field1||field3|field4','||','| |'), '[^|]+' , 1, 2))

    Do you know a better / simpler solution?


    1. No, can't think of a different solution than the one you provided...

  21. Hi Alex.
    i have some different query. i have two table emp, and location:
    emp_name emp_code
    ---------- ------------------
    A 10
    B 12
    C 15
    D 20
    E 21
    F 23
    G 31


    ofc_location office_code
    X 10
    Y 20
    Z 30

    as per data above, i have to find our the location of the emp_name.
    on the basis of emp_code, emp_name A,B,C, belong to location X.
    that means emp_name which are have office code 10(have office_code of X or greater) and less than 20(have office code of Y) belongs to location X.
    emp_code which are between 20-30, these belongs from location Y.

    i.e. my output should be as below
    emp_name emp_code ofc_location
    ---------- ------------- -----
    A 10 X
    B 12 X
    C 15 X
    D 20 Y
    E 21 Y
    F 23 Y
    G 31 Z

    what should be my query to find out this data.

    Would appriciate you help here, and thanks in advance.

    1. Well, this is really not part of the subject of this blogpost but it is a nice little challenge..

      I created the tables "E" and "L" with your data. The query below uses an Analytic Function (LEAD in this case) to solve the problem at hand.

      with office_range as
      (select ofc_location
      ,office_code range_start
      ,lead (office_code) over (order by office_code) range_end
      from l
      select *
      from e
      join office_range o
      on (e.emp_code >= o.range_start
      and (e.emp_code < o.range_end or o.range_end is null))

  22. This seems helpful, but I couldn't get it to work. I copied and pasted your example and I get the following results:
    --------------- ------------------- ---------------
    1 joey,anthony,marvin ,
    1 joey,anthony,marvin ,
    5 tony,glenn ,

    I'm on Any ideas why this isn't working for me?

    1. It's not clear from the formatting, but there is a single comma in SPLIT of each row.


    2. Solved the problem -- in SQL*Plus, the escape character was set to '^'. Sorry for the noise here! Maybe this be of use to someone else -- I hope so.

    3. Nice to see that you solved your own problem :)

  23. Hi,

    I have a tables with below data
    Field1 | Field 2
    CAT1 |23,23,43,76,598,0,33,94,34,50,99,06.76,3s,adf,547,sdf
    CAT2 |hdsfd,dsf,dsfd,dsf,dsf,dfdds,ds,dsds,dsfds,dsf,ds,dsfds,ds

    I need the output in below format
    CAT1 |23,23
    CAT1|33,94 and so on for CAT 1

    Can anyone give me function for above output

    1. sounds like a nice challenge, but currently I don't have the time to figure it out... a first hunch would be to split the string up as described in this blogpost and then use an analytic function like ROW_NUMBER() to determine which two values to concatenate back together.

  24. How do I convert a string like 'A1,A2,A3,...AN' to '||A1||,||A2||...||AN||'. Here the length of the string is varying. Any help would be greatly appreciated.
    I tried the following but the issue I am facing is more than 4000 character for the dynamic query I am trying to build (which is a another issue I can deal with)

    1. What is the objective of what you're trying to do (the bigger picture). When you are trying to create a string to be used as the "WHERE clause", there are better options like this one described by Tom Kyte

  25. I want to create a script to generate insert statements for some tables. The idea is to use the all_tab_columns to get the columns run time instead of manually inputting all the column names for a table. This way I don't have to worry about any change in the table structure. Anyway, I was looping through the column names for a table which gives me buffer issue in UNIX but not in toad. So I searched around and tried to use the following query which will give me the all columns in the format of 'A1,A2,A3' instead of looping through the entire table. Now I want to play with the string to obtain the format I requested (extra Pipes at the start and end for the strings delimited by the commas). I looked around to use regexp_replace and INSTR combination but didn't get what I wanted (due to being newbie to pl/sql programming).

    SELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(column_name, ',')), ',') col INTO v_str1 from ( SELECT rownum rn, column_name FROM (SELECT column_name FROM all_tab_columns WHERE table_name=myTABLE order by column_id) )
    START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1;

    1. the reason for all this exercise is that I keep getting the following error no matter what i try to do..

      ORA-01489: result of string concatenation is too long

    2. I see.. quite a daunting task you set yourself. Although it sounds like a nice plan, it can get sour quickly. There are over a 190 different datatypes you need to consider.
      Why not simply export and import the data using datapump (or EXP and IMP)?
      Or maybe even pipelined table functions?

    3. Depending on your database version a string (VARCHAR2) can not be longer than 4000 in SQL. In Oracle 12c this could be (if the DBA enabled it) 32767.

    4. thanks for the feedback. varchar2 limit is the issue here as the DB version is 11G. The script is to generate insert statement from one db to populate in all other database. imp/exp could be overkill I think for one or few records as I am not dealing with 100s of records. What bothers me the most is that the same scripts works without any issue in TOAD but not through unix. Also, if I run part of my code as standalone (declare/begin/end part) from overall code, the procedure runs fine without any issue. That is confusing me even more because the varchar2 limit is coming in the same procedure. Let me try to identify the string with more than 4000 character and see if I can break it based on the length.

    5. it works in TOAD and not inSQL*Plus? Scroll down to the last record in TOAD and see if you still don't get the error. TOAD only fetches the first few rows. If you didn't run into an error it might be because of a difference in NLS settings between TOAD and SQL*Plus.

    6. No error in TOAD what so ever and it gave me the output that I want. If I forget toad for a moment, even standalone procedure runs fine in sqlplus. Btw, the following query gave me the wanted output for a string (my original request for help). So now trying to play with it :)

      SELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(column_name, '||'',''||')), '||'',''||') into v_str1
      from ( SELECT rownum rn, column_name FROM (SELECT column_name FROM all_tab_columns WHERE table_name=myTable order by column_id) )
      START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1;

    7. so following seems to be the issue after many rounds of testing different way...

      SELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(column_name, '||'''''',''''''||')), '||'''''',''''''||') col into v_str1
      from ( SELECT rownum rn, column_name FROM (SELECT column_name FROM all_tab_columns WHERE table_name=t_type(a) order by column_id) )
      START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1;

      I am storing few tables names in t_type array of varchar and running the above query trough loop. v_str1 is varchar2 variable. The total script has two parts. First part create temp table to prepare for the scripts and second part is the procedure for main logic. So if I run the procedure as stand alone in SQL, it works perfectly fine. As soon as I merge with sql and pl/sql code, it gives me ORA-01489: result of string concatenation is too long. I created two separate sql with 1 as sql code and 2nd as procedure creation. Now tried to call both of the files from third file but same issue. This is confusing as the standalone same code for procedure runs fine and I can see my wanted output on the prompt. If it's the issue with v_str1 variable, then procedure should give me the error. In fact, PL/SQL limit for varchar2 is 4000 which is less than what it has in sql. isn't it? Any hint would be greatly appreciated.

    8. yes, the limit for strings in SQL is 4000 while in PL/SQL it is 32767. In Oracle 12c this can be both 32767 making the integration between SQL and PL/SQL a lot better.

  26. Hi Alex - thanks for the implementation - works very well. Vrolijk Kerstfeest!

  27. i have a string 'a,b,c,d,e' ..i want to print output as

    1 a
    2 b
    3 c
    4 d
    5 e

    Please hep me to solve this..
    Thanks in advance..

    1. Seriously? Did you look at http://nuijten.blogspot.nl/2009/07/splitting-comma-delimited-string-regexp.html
      It has exactly what you want... well, maybe not for the numbers in the first column - you can use ROWNUM for that.
      with test as
      (select 'a,b,c,d,e' str from dual)
      select rownum
      ,regexp_substr (str, '[^,]+', 1, rownum) split
      from test
      connect by level <= length (regexp_replace (str, '[^,]+')) + 1

  28. Thanks for the reply. but the actual query is ....

    Write a packaged function that takes a string separated by commas (say a,b,c,d,e) and returns the words between the commas populated in a PLSQL table. The contents of the PLSQL table should be listed via a SELECT statement as rows. So final output of running the select statement should be like:

    Position String
    -------- ------
    1 a
    2 b
    3 c
    4 d
    5 e

    1. sounds like a homework assignment...
      the correct answer would be :"It's a trick question. There is no way to call a PLSQL table (aka Index-By table or Associative Array) in a SELECT statement".

      But for the hell of it... because it's the last day of the year..
      create type array is object
      (id number
      ,str varchar2(50)
      create type array_list is table of array

      create or replace
      function split (p_str in varchar2)
      return array_list pipelined
      l_array array := array(null,null);
      l_elements number;
      l_str number := 1;
      l_elements := length (regexp_replace (p_str, '[^,]+')) + 1;
      while l_str <= l_elements
      l_array.id := l_str;
      l_array.str := regexp_substr (p_str, '[^,]+', 1, l_str);
      pipe row (l_array);
      l_str := l_str + 1;
      end loop;
      end split;

      select *
      from table (split ('a,b,c,d,e'))

  29. Works great ! Thanks for this.

  30. hi,

    I've a string like 'AAAA-xx/BBBB-yy/CCCC-zz' and i want to print output as


    Thanks in advance,

    1. with test as
      (select 'AAAA-xx/BBBB-yy/CCCC-zz' str
      from dual)
      select substr (str, 1, instr (str||'/', '/',1, rownum)) result
      from test
      connect by level <= regexp_count (str||'/', '/')

  31. Hi Alex.
    What a popular and long-living post :-)
    I'd like to suggest an alternative, so the TEST table will be accessed only once, and the parsing of each row will happen as soon as the row is fetched (with no need to full scan the table before starting the first parsing). This may be significant when processing large result sets.
    As a byproduct, it also handles empty elements (like in ',tony,,,glenn'), which were supported by your original solution for a single-row splitting.

    In Oracle 12c, using Lateral Inline View:

    with test as
    select 1 id, 'joey,anthony,marvin' str from dual union all
    select 5 id, 'tony,glenn' str from dual union all
    select 8 id, 'john' str from dual
    select id,str,split
    from test,
    LATERAL (select regexp_substr (str, '[^,]+', 1, rownum) split from dual connect by level <= regexp_count(str, ',')+1) ;

    [I once wrote about Lateral Inline Views here - http://www.db-oriented.com/2013/08/10/tip004/ - with a conceptually similar example to the one I later showed in my "write less with more" presentation in DOAG 2014 - I gratefully remember you attended :-) ]

    In 11g, a more cumbersome solution, using Collection Unnesting, will achieve the same:

    with test as
    select 1 id, 'joey,anthony,marvin' str from dual union all
    select 5 id, 'tony,glenn' str from dual union all
    select 8 id, 'john' str from dual
    select id,str,column_value split
    from test,
    table(cast(multiset(select regexp_substr (str, '[^,]+', 1, rownum) from dual connect by level <= regexp_count(str, ',')+1) as sys.odcivarchar2list)) ;

    Oren Nakdimon

    1. Thanks Oren, wonderful solutions! I remember the LATERAL solution you presented at the DOAG (and it's still on my "look closer into this" list :)

    2. Oren,

      The 11g solution saved my day!
      Thank you.

  32. The idea is to use the plsql regex commands and split a string with similar characters.
    Ex: AAYYMMXXXXCC should return


    expecting this with sql statement.

  33. Request for the following help on the below:
    I have a parameter table like below:-
    id add_id sub_id div_id per_id
    100 12,132,15 54,56 null null
    141 125,56 null 56,89 1
    145 978,69 null 897,665
    100= (12+132+15)-(54+56)
    141= (125+56)/(56+89)
    145= (978+69)/(897+665)*100
    Is it possible to exend your query for the above ?


    1. > Is it possible to exend your query for the above ?
      Yes, it is... all the components are in the blogpost and the comments to write the query.

  34. how can we convert when there are no values in between like below

    Test1, Test2, , Test4, , , Test6

    1. no different than the examples shown in the blog

  35. hi, I am new to regexp_like, need help.

    data getting-
    I want my output get those records only where abc,cde and ijk is there only(any combination of thses value only)

    I am using this
    where regexp_like(username,'^[(abc|cde|ijk) ,;]+$');

    but its not giving correct answer.

    please help!!!!

  36. Hi i have string like'1,abc|2,cd|3,xy.....'
    i want a function that split the reslut by pipleine and then by commas and the result would be like this id=1 and the value = abc..