Notes on Oracle: Splitting a comma delimited string the RegExp way, Part Two

08 August 2011

Splitting a comma delimited string the RegExp way, Part Two

Over two years ago I wrote about a way to split a comma delimited string using Regular Expressions. A little while ago someone asked how to split the strings when more than one record is involved, instead of the single record I used in my example.

For this example I use the same dataset AnthonyJ used in the comments. The explanation of the regular expression can be found in the original post.


SQL> with test as
  2  (
  3  select 1 id, 'joey,anthony,marvin' str from dual union all
  4  select 5 id, 'tony,glenn' str from dual union all
  5  select 8 id, 'john' str from dual
  6  )
  7  select id
  8       , str
  9       , regexp_substr (str, '[^,]+', 1, rn) split
 10    from test
 11    cross
 12    join (select rownum rn
 13            from (select max (length (regexp_replace (str, '[^,]+'))) + 1 mx
 14                    from test
 15                 )
 16         connect by level <= mx
 17         )
 18   where regexp_substr (str, '[^,]+', 1, rn) is not null
 19   order by id, rn
 20  ;

        ID STR                 SPLIT
---------- ------------------- -------------------
         1 joey,anthony,marvin joey
         1 joey,anthony,marvin anthony
         1 joey,anthony,marvin marvin
         5 tony,glenn          tony
         5 tony,glenn          glenn
         8 john                john

The trick is in lines 11 through 17. The cross join pairs every record with the row numbers 1 up to the number of elements in the longest list (three in this case), so each record is repeated that many times.
That is exactly right for ID 1, but it also creates three records for ID 5 and ID 8, and for those extra occurrences REGEXP_SUBSTR returns NULL. The WHERE clause on line 18 removes these extra records.
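To make the row multiplication concrete, here is a small model of the same steps, written in Python rather than SQL. It is a sketch under assumptions: Python's re module is taken to treat the pattern [^,]+ the way Oracle does for these simple strings, None stands in for NULL, and the TEST list and the helper names are made up for the illustration.

```python
import re

# Hypothetical stand-in for the TEST subquery: (id, str) pairs.
TEST = [(1, "joey,anthony,marvin"), (5, "tony,glenn"), (8, "john")]


def regexp_substr(s, pattern, occurrence):
    """Model of REGEXP_SUBSTR(s, pattern, 1, occurrence):
    the occurrence-th match, or None (NULL) when there is no such match."""
    matches = re.findall(pattern, s or "")
    return matches[occurrence - 1] if occurrence <= len(matches) else None


def split_all(rows):
    # Lines 13-15: the highest element count, as the number of commas plus one.
    # (For 'john' Oracle gets NULL here, which MAX ignores; Python gets 1.)
    mx = max(len(re.sub(r"[^,]+", "", s)) + 1 for _, s in rows)
    result = []
    # Lines 11-17: cross join every record with rn = 1 .. mx.
    for rid, s in rows:
        for rn in range(1, mx + 1):
            piece = regexp_substr(s, r"[^,]+", rn)
            # Line 18: keep only the occurrences that actually exist.
            if piece is not None:
                result.append((rid, s, piece))
    return result


for row in split_all(TEST):
    print(row)
```

With the sample data the cross join produces nine candidate rows (three records times three row numbers), and the filter drops the three whose occurrence does not exist, leaving six.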
If you use Oracle 11g, you can also use REGEXP_COUNT instead of the combination of REGEXP_REPLACE and LENGTH, which would look like this:


  cross 
  join (select rownum rn 
          from (select max (regexp_count (str, ',') + 1) mx
                  from test
               )
       connect by level <= mx
       )
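Both expressions arrive at the same number, the comma count plus one, as this Python comparison suggests (again a model with made-up helper names, not Oracle itself). One difference the model does not show: in Oracle, REGEXP_REPLACE on a string without commas returns an empty string, which is NULL, so LENGTH(...) + 1 is NULL for 'john', while REGEXP_COUNT returns 0. MAX ignores the NULL, so the query result is the same either way.

```python
import re


def count_via_replace(s):
    # Pre-11g: remove every run of non-commas, then measure what is left.
    return len(re.sub(r"[^,]+", "", s)) + 1


def count_via_count(s):
    # 11g: REGEXP_COUNT(str, ',') counts the commas directly.
    return len(re.findall(",", s)) + 1


for s in ["joey,anthony,marvin", "tony,glenn", "john"]:
    print(s, count_via_replace(s), count_via_count(s))
```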

Regexp_Count

73 comments:

Anonymous, December 5, 2011 at 11:21 PM
I was looking for something exactly like this. Thanks a lot, it works like a charm.

TJ Abrahamsen, March 15, 2012 at 7:07 PM
Hi there Alex. Thank you for sharing this. I found your post when looking for material for writing about delimited strings in PL/SQL. I am linking to your post in my tutorial (http://oracletuts.net/tutorials/how-to-tokenize-or-parse-a-string-in-plsql/)

I have been on your blog a few times before. Keep up the good work!

Anonymous, April 18, 2012 at 11:24 AM
Thank you very much for this, it was just what I needed.

Anonymous, May 8, 2012 at 8:00 PM
Thanks... works good!

Anonymous, July 10, 2012 at 4:36 AM
Thanks, very helpful for me.

In my case I also needed outer join behaviour, so I made a slight variation:

with test as
(
select 1 id, 'joey,anthony,marvin' str from dual union all
select 5 id, 'tony,glenn' str from dual union all
select 8 id, 'john' str from dual union all
select 9 id, null str from dual
)
select id
     , str
     , regexp_substr (str, '[^,]+', 1, rn) split
  from test
  left outer join (select rownum rn
                     from (select max (length (regexp_replace (str, '[^,]+'))) + 1 mx
                             from test
                          )
                  connect by level <= mx) splits on splits.rn <= length (regexp_replace (str, '[^,]+'))
 order by id
;

This could also be done with an OR rn = 1 in the WHERE clause instead.

Alex Nuijten, July 11, 2012 at 11:04 AM
Thank you, Anonymous, for your suggestion with the left outer join. However, in the example you posted "John" is missing from the split column. When you amend the second to last line to:
on splits.rn <= nvl (length(regexp_replace(str,'[^,]+')), 1)
you will see John show up.
The NVL was added to that line.
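The ON condition decides how many rn values each string is joined with. A quick check of that arithmetic, modelled in Python (None stands in for NULL; the helper names are made up), suggests that comparing rn with the comma count alone is one row short for every string, and that the NVL amendment brings back the single-word case (John) while still stopping one element short for the others, for example marvin and glenn. Comparing rn with NVL(LENGTH(...), 0) + 1 appears to give every string as many rows as it has elements, and still one row (with a NULL split) for the NULL string of ID 9.

```python
import re

# The rows from the comment; None stands in for the NULL string of id 9.
TEST = [(1, "joey,anthony,marvin"), (5, "tony,glenn"), (8, "john"), (9, None)]


def comma_length(s):
    """Model of LENGTH(REGEXP_REPLACE(str, '[^,]+')). Oracle treats the empty
    result as NULL, so a string without commas yields None, not 0."""
    if s is None:
        return None
    commas = re.sub(r"[^,]+", "", s)
    return len(commas) if commas else None


def nvl(value, default):
    return default if value is None else value


for rid, s in TEST:
    elements = len(re.findall(r"[^,]+", s or ""))
    posted = comma_length(s)                 # rn <= LENGTH(...)
    with_nvl = nvl(comma_length(s), 1)       # rn <= NVL(LENGTH(...), 1)
    plus_one = nvl(comma_length(s), 0) + 1   # rn <= NVL(LENGTH(...), 0) + 1
    print(rid, elements, posted, with_nvl, plus_one)
```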

Anonymous, August 3, 2012 at 3:35 AM
Hi,

Dumb question, I know, but I have a dataset where some records have a second name in the field, e.g. Rob (Bobby). I need to leave Rob in the FNAME field and put Bobby in the AFNAME field.

I have landed on
TRIM(REGEXP_REPLACE(FNAME, '$(.*?)$', ''))
to update the FNAME field, removing " (Bobby)".

I was trying to use:
REGEXP_SUBSTR(FNAME, ' $(.*?)$') but that of course returns "(Bobby)",
which is where I landed on your blog...
I need an elegant way to return only Bobby, without the delimiters.

UPDATE table
   SET AFNAME = ???,
       FNAME  = TRIM(REGEXP_REPLACE(FNAME, '$(.*?)$', ''))
 WHERE FNAME LIKE '%(%'
…
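Assuming the intent is to capture the text between literal parentheses, a capture group does that. In Oracle 11g and later REGEXP_SUBSTR accepts a sixth argument that selects a subexpression, so something along the lines of REGEXP_SUBSTR(FNAME, '\(([^)]*)\)', 1, 1, NULL, 1) should return just Bobby (not tested here). The Python model below uses the same pattern; split_alias is a hypothetical helper, not part of Oracle.

```python
import re

# A literal "(", a captured run of non-")" characters, and a literal ")".
ALIAS = re.compile(r"\(([^)]*)\)")


def split_alias(fname):
    """Hypothetical helper: (FNAME without the bracketed part, text inside it)."""
    m = ALIAS.search(fname)
    alias = m.group(1) if m else None      # like REGEXP_SUBSTR(..., 1, 1, NULL, 1)
    rest = ALIAS.sub("", fname).strip()    # like TRIM(REGEXP_REPLACE(...))
    return rest, alias


print(split_alias("Rob (Bobby)"))
```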

Unknown, September 18, 2012 at 7:47 PM
This query saved my bacon. There are plenty of examples out there using CONNECT BY, but it only works for one row at a time. Thank you so much.

Anonymous, October 24, 2012 at 12:38 PM
Hi. One question regarding the first split post: http://nuijten.blogspot.no/2009/07/splitting-comma-delimited-string-regexp.html

The output is now shown as rows in one column.
Is it possible to put each of them into its own variable?

Cheers Mate

Anonymous, August 30, 2013 at 3:51 PM
Hi Alex, I have a requirement like this: there is a table with three columns

API          FIELD_NAME           FIELD_VALUE
TEST1;TEST2; COL1|COL2;COL3|COL4; 1|2;3|4;

and I want output like this:

API   FIELD_NAME FIELD_VALUE
TEST1 COL1       1
      COL2       2
TEST2 COL3       3
      COL4       4

Thanks in advance

Anonymous, September 2, 2013 at 1:19 PM
Hi Alex, thanks for the suggestion.

I tried what you said, but it's not working.

Please describe it in detail; I have an urgent requirement.

Thanks.

Anonymous, September 3, 2013 at 7:48 AM
Thanks for the suggestion.

Anbu, September 20, 2013 at 8:46 AM
Thank you so much... it works great!

Deepti, February 10, 2014 at 11:14 AM
Thanks Alex for the wonderful solution :). But I would like to understand how it works, i.e. the CONNECT BY clause and the flow.

Arpit Jain, February 18, 2014 at 11:05 AM
I have a table product, and I am displaying products that have different rates, but they print on the same line, e.g.:

PRODUCT_ID RATES
IND0001    $10$2$20$5

but I want:
Ind0001 $10
        $20
        $5
in a dropdown menu, for fetching the row.

Anonymous, March 13, 2014 at 6:39 PM
Hi Alex,
Can you shed some light on this?
We are computing the mx value over every record in test, which is a performance hit.
Also, say one record has a str with 100 words and all the others have fewer than 3 or 4 words; in that case we build the Cartesian product 100 times for every record and then filter out the NULLs, even for a record with just 3 words.
Can this be tweaked to get the mx value for the str we are dealing with, instead of the mx value over all the str records in test? Appreciate your input, and thanks in advance!

Unknown, March 27, 2014 at 3:11 PM
Alex,

Thanks for the write-up; this is exactly what I need as a band-aid for my less-than-optimal data model. I implemented the 11g version with regexp_count and it's working great.

One snag I hit, however, was with filtering out the unneeded rows. For some reason

where regexp_substr (str, '[^,]+', 1, rn) is not null

didn't work for me. I checked and made sure there wasn't a space or a non-printable character or anything, but the rows were still included in the results. Changing the IS NOT NULL check to a LENGTH check fixed the issue for me:

WHERE LENGTH( regexp_substr (str, '[^,]+', 1, rn) ) <> 0

Any clue why the IS NOT NULL version would be failing?

Anonymous, May 6, 2014 at 8:22 PM
Hi Alex... I need to separate this string called aux_return:
aux_return = OK91,9702|7,6|11340|59,94|0|641579,34|1790,25
The separator is '|', and I need:
var1 := 91,9702;
var2 := 7,6;
var3 := 11340;
var4 := 59,94;
var5 := 0;
var6 := 641579,34;
var7 := 1790,25;

I need your help, and THANK YOU SO MUCH.

Anonymous, June 6, 2014 at 1:53 PM
Hi Alex,

Just found out that REGEXP_SUBSTR('field1||field3|field4', '[^|]+' ,1 , 2 ) is returning value 'field3' instead of NULL.
My quick fix is trim(REGEXP_SUBSTR(replace('field1||field3|field4','||','| |'), '[^|]+' , 1, 2))

Do you know a better / simpler solution?

Regards,
Eric
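The behaviour Eric describes follows from the pattern itself: [^|]+ means "one or more characters that are not a pipe", so an empty field produces no match at all and the later occurrences shift forward. A commonly used alternative matches lazily up to the next delimiter or the end of the string and returns only the first subexpression; in Oracle 11g and later that would be something like REGEXP_SUBSTR(str, '(.*?)(\||$)', 1, n, NULL, 1), where the empty second field comes back as an empty string, which Oracle treats as NULL. The Python model below compares both patterns on Eric's string; nth_field is a hypothetical helper, and Python's re is assumed to behave like Oracle's engine for this input.

```python
import re

s = "field1||field3|field4"

# [^|]+ only matches non-empty runs, so the empty second field is skipped.
print(re.findall(r"[^|]+", s))  # ['field1', 'field3', 'field4']


def nth_field(s, n):
    """Hypothetical helper: the n-th field, taken as group 1 of the n-th match
    of (.*?)(\\||$), i.e. a lazy run up to the next pipe or the end of the string."""
    fields = [m.group(1) for m in re.finditer(r"(.*?)(\||$)", s)]
    return fields[n - 1] if n <= len(fields) else None


for n in range(1, 5):
    print(n, repr(nth_field(s, n)))
```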

Anonymous, July 15, 2014 at 10:26 PM
Hi Alex.
I have a different query. I have two tables, emp and location:
emp
emp_name   emp_code
---------- --------
A          10
B          12
C          15
D          20
E          21
F          23
G          31

location

ofc_location office_code
------------ -----------
X            10
Y            20
Z            30

As per the data above, I have to find out the location of each emp_name.
On the basis of emp_code, emp_name A, B and C belong to location X.
That means an emp_name whose emp_code is 10 (the office_code of X) or greater and less than 20 (the office_code of Y) belongs to location X.
emp_codes between 20 and 30 belong to location Y.

i.e. my output should be as below:
emp_name   emp_code ofc_location
---------- -------- ------------
A          10       X
B          12       X
C          15       X
D          20       Y
E          21       Y
F          23       Y
G          31       Z

What should my query be to find this data?

Would appreciate your help here, and thanks in advance.

Anonymous, July 16, 2014 at 7:29 PM
This seems helpful, but I couldn't get it to work. I copied and pasted your example and I get the following results:
ID STR SPLIT
--------------- ------------------- ---------------
1 joey,anthony,marvin ,
1 joey,anthony,marvin ,
5 tony,glenn ,

I'm on 11.2.0.3. Any ideas why this isn't working for me?

Manish Kumar Gupta, August 1, 2014 at 7:19 PM
Hi,

I have a table with the data below:
Field1 | Field 2
CAT1   | 23,23,43,76,598,0,33,94,34,50,99,06.76,3s,adf,547,sdf
CAT2   | hdsfd,dsf,dsfd,dsf,dsf,dfdds,ds,dsds,dsfds,dsf,ds,dsfds,ds

I need the output in the format below:
CAT1 | 23,23
CAT1 | 43,76
CAT1 | 598,0
CAT1 | 33,94 and so on for CAT1
CAT2 | hdsfd,dsf

Can anyone give me a function for the above output?

Anonymous, September 19, 2014 at 9:31 PM
How do I convert a string like 'A1,A2,A3,...AN' to '||A1||,||A2||...||AN||'? The length of the string varies. Any help would be greatly appreciated.
I tried the following, but the issue I am facing is that the dynamic query I am trying to build is more than 4000 characters (which is another issue I can deal with).

Anonymous, September 23, 2014 at 4:52 PM
I want to create a script to generate insert statements for some tables. The idea is to use all_tab_columns to get the columns at run time instead of manually entering all the column names for a table. This way I don't have to worry about any change in the table structure. Anyway, I was looping through the column names for a table, which gives me a buffer issue in UNIX but not in Toad. So I searched around and tried the following query, which gives me all the columns in the format 'A1,A2,A3' instead of looping through the entire table. Now I want to manipulate the string to obtain the format I requested (extra pipes at the start and end of the strings delimited by the commas). I looked at a combination of regexp_replace and INSTR but didn't get what I wanted (being a newbie to PL/SQL programming).

SELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(column_name, ',')), ',') col
  INTO v_str1
  FROM (SELECT rownum rn, column_name
          FROM (SELECT column_name
                  FROM all_tab_columns
                 WHERE table_name = myTABLE
                 ORDER BY column_id))
 START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1;

Ger, December 24, 2014 at 12:08 PM
Hi Alex - thanks for the implementation - works very well. Vrolijk Kerstfeest!

Anonymous, December 30, 2014 at 10:00 AM
I have a string 'a,b,c,d,e'; I want to print the output as

1 a
2 b
3 c
4 d
5 e

Please help me solve this.
Thanks in advance.

Anonymous, December 31, 2014 at 8:33 AM
Thanks for the reply, but the actual question is:

Write a packaged function that takes a string separated by commas (say a,b,c,d,e) and returns the words between the commas populated in a PLSQL table. The contents of the PLSQL table should be listed via a SELECT statement as rows. So final output of running the select statement should be like:

Position String
-------- ------
1 a
2 b
3 c
4 d
5 e

shreedevi, March 24, 2015 at 6:33 PM
Works great ! Thanks for this.

MeuBlog, March 26, 2015 at 2:16 PM
Hi,

I have a string like 'AAAA-xx/BBBB-yy/CCCC-zz' and I want to print the output as

AAAA-xx
AAAA-xx/BBBB-yy
AAAA-xx/BBBB-yy/CCCC-zz

Thanks in advance,
Jorge

Oren Nakdimon, September 22, 2015 at 4:32 PM
Hi Alex.
What a popular and long-living post :-)
I'd like to suggest an alternative, so the TEST table will be accessed only once, and the parsing of each row will happen as soon as the row is fetched (with no need to full scan the table before starting the first parsing). This may be significant when processing large result sets.
As a byproduct, it also handles empty elements (like in ',tony,,,glenn'), which were supported by your original solution for a single-row splitting.

In Oracle 12c, using Lateral Inline View:

with test as
(
select 1 id, 'joey,anthony,marvin' str from dual union all
select 5 id, 'tony,glenn' str from dual union all
select 8 id, 'john' str from dual
)
select id,str,split
from test,
LATERAL (select regexp_substr (str, '[^,]+', 1, rownum) split from dual connect by level <= regexp_count(str, ',')+1) ;

[I once wrote about Lateral Inline Views here - http://www.db-oriented.com/2013/08/10/tip004/ - with a conceptually similar example to the one I later showed in my "write less with more" presentation in DOAG 2014 - I gratefully remember you attended :-) ]

In 11g, a more cumbersome solution, using Collection Unnesting, will achieve the same:

with test as
(
select 1 id, 'joey,anthony,marvin' str from dual union all
select 5 id, 'tony,glenn' str from dual union all
select 8 id, 'john' str from dual
)
select id,str,column_value split
from test,
table(cast(multiset(select regexp_substr (str, '[^,]+', 1, rownum) from dual connect by level <= regexp_count(str, ',')+1) as sys.odcivarchar2list)) ;

Thanks,
Oren Nakdimon
@DBoriented
http://db-oriented.com
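The LATERAL version also appears to address the concern raised in March 2014: each row generates only as many levels as its own REGEXP_COUNT allows, instead of the maximum over the whole table. The Python model below sketches that per-row flow, with a generator standing in for rows being fetched one by one. The names are hypothetical, and this shows the logic only, not how Oracle executes the query internally.

```python
import re

TEST = [(1, "joey,anthony,marvin"), (5, "tony,glenn"), (8, "john")]


def split_row(s):
    """Model of the lateral subquery for one row:
    CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1 uses this row's own count."""
    matches = re.findall(r"[^,]+", s)
    levels = len(re.findall(",", s)) + 1
    return [matches[lvl - 1] if lvl <= len(matches) else None
            for lvl in range(1, levels + 1)]


def split_lateral(rows):
    # Each row is split as soon as it is read; no MAX over the whole table first.
    for rid, s in rows:
        for piece in split_row(s):
            yield rid, s, piece


for row in split_lateral(TEST):
    print(row)
```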

RAGHAV, November 6, 2015 at 10:46 AM
The idea is to use the regex functions to split a string into runs of identical characters.
E.g. AAYYMMXXXXCC should return

AA
YY
MM
XXXX
CC

I expect to do this with a SQL statement.

suradma, March 29, 2016 at 7:20 PM
Requesting help with the following:
I have a parameter table like this:
id  add_id    sub_id div_id  per_id
100 12,132,15 54,56  null    null
141 125,56    null   56,89   1
145 978,69    null   897,665
100 = (12+132+15)-(54+56)
141 = (125+56)/(56+89)
145 = (978+69)/(897+665)*100
Is it possible to extend your query for the above?

Regards
Suresh

Unknown, May 2, 2016 at 7:56 PM
How can we convert it when there are no values in between, like below?

Test1, Test2, , Test4, , , Test6

Unknown, December 2, 2016 at 8:43 PM
Hi, I am new to regexp_like and need help.

Data I am getting:
abc,cde,efg,ghi,ijk
abc,cde,ijk
cde,ijk
abc,cde,ghi,ijk
I want my output to return only the records that contain abc, cde and ijk only (any combination of these values only).

I am using this:
where regexp_like(username,'^[(abc|cde|ijk) ,;]+$');

but it's not giving the correct answer.

please help!!!!
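The pattern in this comment probably does not mean what it looks like: inside square brackets, (abc|cde|ijk) is not a list of alternatives but a set of single characters, so the expression accepts any string built only from the letters a, b, c, d, e, i, j, k and the characters ( ) | , ; and space. To match whole words, the alternation has to stay outside a bracket expression, e.g. '^(abc|cde|ijk)(,(abc|cde|ijk))*$'. The Python comparison below assumes Python's re treats both patterns the way Oracle's REGEXP_LIKE does; 'cab,dec' is a made-up value that shows the difference.

```python
import re

# Bracket version from the comment: a set of single characters, not words.
CHARS = re.compile(r"^[(abc|cde|ijk) ,;]+$")
# Word version: one allowed word, then zero or more ",word" repetitions.
WORDS = re.compile(r"^(abc|cde|ijk)(,(abc|cde|ijk))*$")

for value in ["abc,cde,ijk", "cde,ijk", "abc,cde,ghi,ijk", "cab,dec"]:
    print(value, bool(CHARS.match(value)), bool(WORDS.match(value)))
```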

Ahmad, March 17, 2020 at 6:09 AM
Hi, I have a string like '1,abc|2,cd|3,xy.....'
I want a function that splits the result by the pipe and then by the commas, so that the result would be id = 1 and value = abc.