For this example I use the dataset as AnthonyJ used in the comments. The explanation for the regular expression can be found in the original post.
SQL> with test as
2 (
3 select 1 id, 'joey,anthony,marvin' str from dual union all
4 select 5 id, 'tony,glenn' str from dual union all
5 select 8 id, 'john' str from dual
6 )
7 select id
8 , str
9 , regexp_substr (str, '[^,]+', 1, rn) split
10 from test
11 cross
12 join (select rownum rn
13 from (select max (length (regexp_replace (str, '[^,]+'))) + 1 mx
14 from test
15 )
16 connect by level <= mx
17 )
18 where regexp_substr (str, '[^,]+', 1, rn) is not null
19 order by id
20 ;
ID STR SPLIT
---------- ------------------- -------------------
1 joey,anthony,marvin joey
1 joey,anthony,marvin marvin
1 joey,anthony,marvin anthony
5 tony,glenn tony
5 tony,glenn glenn
8 john john
The trick here is in lines 11 through 17. The Cross Join is there to create multiple records, but no more than the longest intended individual words (three in this case).
Because each ID would result in three records, which is alright for ID 1, it will also create three records for ID 5 and ID 8. Line 18 removes these extra records.
If you use Oracle 11g, you can also use REGEXP_COUNT instead of the combination of REGEXP_REPLACE and LENGTH, which would look like this:
cross
join (select rownum rn
from (select max (regexp_count (str, ',') + 1) mx
from test
)
connect by level <= mx
)
Regexp_Count
I was looking for something exactly like this. Thanks a lot, it works like a charm.
ReplyDeleteHi there Alex. Thank you for sharing this. I found your post when looking for material for writing about delimited strings in PL/SQL. I am linking to your post in my tutorial (http://oracletuts.net/tutorials/how-to-tokenize-or-parse-a-string-in-plsql/)
ReplyDeleteI have been on your blog some times earelier. Keep up the good work!
Thank you very much for this, it was just what I needed.
ReplyDeleteThanks... works good!
ReplyDeleteThanks, very helpful for me.
ReplyDeleteIn my case I also needed it to have outer join behaviour so made a slight variation as below:
with test as
(
select 1 id, 'joey,anthony,marvin' str from dual union all
select 5 id, 'tony,glenn' str from dual union all
select 8 id, 'john' str from dual union all
select 9 id, null str from dual
)
select id
, str
, regexp_substr (str, '[^,]+', 1, rn) split
from test
left outer join (select rownum rn
from (select max (length (regexp_replace (str, '[^,]+'))) + 1 mx
from test
)
connect by level <= mx) splits on splits.rn <= length (regexp_replace (str, '[^,]+'))
order by id
;
This could also be done using an or rn = 1 in the where clause instead
Thank you Anonymous for your suggestion with the Left Outer Join. However, from the example you posted "John" is missing in the split-column. When you amend the second to last line to:
ReplyDeleteon splits.rn <= nvl (length(regexp_replace(str,'[^,]+')), 1)
you will see John show up.
The NVL was added to that line.
Hi,
ReplyDeletedumb I know but I have a dataset with some records that have a second name in the field EG: Rob (Bobby). So I need to leave Rob in the FNAME field and put Bobby in the AFNAME field.
I have landed on
TRIM(REGEXP_REPLACE(FNAME, '\((.*?)\)', '') )'
To update the FNAME field removing " (Bobby)
I was trying to use:
REGEXP_SUBSTR(FNAME, ' \((.*?)\)') but that of course return "(Bobby)"
which is where I landed on your blog...
I need an elegant way to only return Bobby without the delimiters.
Update table
Set AFNAME= ???,
SET FNAME= TRIM(REGEXP_REPLACE(FNAME, '\((.*?)\)', '') )'
Where FNAME like ‘%(%’
…
Update:
DeleteNot necessarily elegant but it works:
trim(translate(REGEXP_SUBSTR(FNAME, ' \((.*?)\)'), '()', ' ' ))
This query saved my bacon. There are plenty of examples out there using CONNECT BY, but it only works for one row at a time. Thank you so much.
ReplyDeleteHi. One Q regardig the first split post in http://nuijten.blogspot.no/2009/07/splitting-comma-delimited-string-regexp.html
ReplyDeleteNow the output is shown in one column and rows.
Is it possible to send them into a variabel for each of them ?
Cheers Mate
I'm not sure what you mean. If I understand correctly: you want the results of the query and place each value that you get back into a separate variable? Then you would have to know ahead of time how many variables you would need... unless you select them into a collection (like an Associative Array, Nested Table or a Varray).
DeleteHI ALEX I have a requirment like this, there is a table and has three columns
ReplyDeleteAPI FIELD_NAME FIELD_VALUE
TEST1;TEST2; COL1|COL2;COL3|COL4; 1|2;3|4;
AND I WANT OUTPUT LIKE THIS
API FIELD_NAME FIELD_VALUE
TEST1 COL1 1
COL2 2
TEST2 COL3 3
COL4 4
Thanks in advance
For that requirement you can use the technique described in this blogpost, but do it twice. First you split the content of the first column based on the ";" separator. Next you will split the content of the second column, based on the "|" separator.
DeleteBut the real solution is: change your datamodel so you don't have to go through hoops to get the correct values extracted.
Hi, alex thanks for suggestion,
ReplyDeletei tried this what u said but it's not working.
plz describe in det, i have urgent requirments.
Thanks.
> i tried this what u said but it's not working.
DeleteLet's see what you tried in that case. Maybe I can offer you some suggestions. The method I described works - I just tried it.
If this is really "urgent", as in "stop what you're doing - I will hire you to write this query for me, no matter what the cost", I will get you in contact with our sales department.
Remember: this is just a blog where I keep my notes. If my notes help you, or if I can help you with some advice - I would be more than happy to do so. However I will not do your job for you (that's why it's called your job).
It seems that your keyboard is broken, a number of letters are missing.
Thanks for suggestion.
ReplyDeleteThank you so much... it works great!
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteThanks Alex for the wonderful solution :). But i would like to understand how does it work. I mean the connect by clause and the flow.
ReplyDeleteThe Connect By-clause is a "trick" which generates a number of rows specified by the value (mx in my example). You can try this by issueing a query like:
Deleteselect rownum from dual connect by level <= 10
which will generate 10 rows.
The rownum generated by the inner query is used with the REGEXP expression
regexp_substr (str, '[^,]+', 1, rn)
Does this help?
This comment has been removed by the author.
DeleteIf you start breaking down the query - begin with the inner most query - and work your way out, you can see the way the query is structured.
DeleteI have table product and i am displaying same products have different rates..but it will print on the same line ex..
ReplyDeletePRODUCT_ID RATES
IND0001 $10$2$20$5
but i want..
Ind0001 $10
$20
$5
in dropdown menu..for fetching the row..
So, you have a table with 2 columns (PRODUCT_ID and RATES) and you want to split the RATES column so the enduser can select a RATE? To me it seems that your model is "not ideal" (to put it mildly). How would you otherwise know which RATE the user picked? To me it seems that you would need a PRODUCTS and a RATES table, with a one-to-many relation between them
DeleteHi Alex,
ReplyDeleteCan you throw some light on this.
we are getting the mx value for each record in the test. which is a performance hit .
Also, let us say one record has a str with 100 words and everything else is less than 3 or 4 words, then in that case for all the records we are doing the cartesian product for 100 times and avoiding the nulls though it had just 3 words.
Can this be tweaked to get the mx value for the str we are dealing with instead of finding the mx value for all the str records in test. Appreciate your inputs. And Thanks is advance!
Is your claim (the performance hit) based on evidence - or just the assumption that it could be a performance hit? If you tested it, I am curious to find out how you tested it...
DeleteIf this (getting the words out of a column) is something that needs to be done frequently, then there might be something wrong with your datamodel. Should the words be a seperate child table? That way you wouldn't need to split the values using this technique.
Alex,
ReplyDeleteThanks for the write-up; this is exactly what I need as a band-aid for my less-than-optimal data model. I implemented the 11g version with regexp_count and it's working great.
One snag I hit, however, was with filtering out the unneeded rows. For some reason
where regexp_substr (str, '[^,]+', 1, rn) is not null
didn't work for me. I checked and made sure there wasn't a space or non-printable character or anything but the rows were still included in the results. Changing the IS NOT NULL check to a LENGTH checked fixed the issue for me:
WHERE LENGTH( regexp_substr (str, '[^,]+', 1, rn) ) <> 0
Any clue why the IS NOT NULL version would be failing?
Just a wild stab: is the datatype of your column perhaps a CLOB?
DeleteIf it is a CLOB, check this: http://nuijten.blogspot.nl/2009/11/empty-clob-is-not-null-its-not-null.html
DeleteBingo! That was exactly the case.
DeleteWho would have guessed, after coming to terms with Oracle's singular interpretation of empty string = NULL, that they would decide empty clob <> NULL. Oh well, live and learn. Thanks for the knowledge!
Reading your blog post I agree that syntactically that is a good use of nullif(). Quite possibly the first good use I've seen.
DeleteI tried it out but it's not good for my use case. In my worst-case example from current production data the CROSS JOIN above is generating about 2700 rows of which 329 are returned. With the simpler LENGTH() <> 0 it finishes in about 16 seconds. With NULLIF(LENGTH(),0) IS NOT NULL it takes about 25. So I'm sticking with former for now.
While the vast majority of my data will probably only generate a dozen extra rows the worst-case is still to long for me. Next stop is trying to do this with procedurally with a PIPELINED return and see if I can speed it up at all.
Thanks again.
There is also a suggestion in the comments to use:
Deletedbms_lob.compare(nvl(text, empty_clob()),empty_clob())= 0
Hi Alex....I need to separate this string called aux_return:
ReplyDeleteaux_return = OK91,9702|7,6|11340|59,94|0|641579,34|1790,25. The separator is '|' and I need that:
var1 := 91,9702;
var2 := 7,6;
var3 := 11340;
var4 := 59,94;
var5 := 0;
var6 := 641579,34;
var7 := 1790,25:
I need yor help and THANK YOU SO MUCH.
Easy...
Deletecursor variables is
SELECT trim(regexp_substr( substr(aux_retorno,3,length(aux_retorno)-2), '[^|]+', 1, LEVEL)) aux_variables
FROM dual
CONNECT BY LEVEL <= length(substr(aux_retorno,3,length(aux_retorno)-2)) - length(REPLACE(substr(aux_retorno,3,length(aux_retorno)-2), '|', ''))+1;
e variables%rowtype;
open variables;
loop
fetch variables into e;
exit when variables%notfound;
---------------------------
if i = 0 then
aux_1 := e.aux_variables;
end if;
if i = 1 then
aux_2 := e.aux_variables;
end if;
if i = 2 then
aux_3 := e.aux_variables;
end if;
if i = 3 then
aux_4:= e.aux_variables;
end if;
if i = 4 then
aux_5 := e.aux_variables;
end if;
if i = 5 then
aux_6 := e.aux_variables;
end if;
if i = 6 then
aux_7 := e.aux_variables;
end if;
i := i + 1;
end loop;
close variables;
Hi Alex,
ReplyDeleteJust found out that REGEXP_SUBSTR('field1||field3|field4', '[^|]+' ,1 , 2 ) is returning value 'field3' instead of NULL.
My quick fix is trim(REGEXP_SUBSTR(replace('field1||field3|field4','||','| |'), '[^|]+' , 1, 2))
Do you know a better / simpler solution?
Regards,
Eric
No, can't think of a different solution than the one you provided...
DeleteHi Alex.
ReplyDeletei have some different query. i have two table emp, and location:
emp
emp_name emp_code
---------- ------------------
A 10
B 12
C 15
D 20
E 21
F 23
G 31
location
ofc_location office_code
------------------------------
X 10
Y 20
Z 30
as per data above, i have to find our the location of the emp_name.
on the basis of emp_code, emp_name A,B,C, belong to location X.
that means emp_name which are have office code 10(have office_code of X or greater) and less than 20(have office code of Y) belongs to location X.
emp_code which are between 20-30, these belongs from location Y.
i.e. my output should be as below
emp_name emp_code ofc_location
---------- ------------- -----
A 10 X
B 12 X
C 15 X
D 20 Y
E 21 Y
F 23 Y
G 31 Z
what should be my query to find out this data.
Would appriciate you help here, and thanks in advance.
Well, this is really not part of the subject of this blogpost but it is a nice little challenge..
DeleteI created the tables "E" and "L" with your data. The query below uses an Analytic Function (LEAD in this case) to solve the problem at hand.
with office_range as
(select ofc_location
,office_code range_start
,lead (office_code) over (order by office_code) range_end
from l
)
select *
from e
join office_range o
on (e.emp_code >= o.range_start
and (e.emp_code < o.range_end or o.range_end is null))
This seems helpful, but I couldn't get it to work. I copied and pasted your example and I get the following results:
ReplyDeleteID STR SPLIT
--------------- ------------------- ---------------
1 joey,anthony,marvin ,
1 joey,anthony,marvin ,
5 tony,glenn ,
I'm on 11.2.0.3. Any ideas why this isn't working for me?
It's not clear from the formatting, but there is a single comma in SPLIT of each row.
DeleteThanks!
Solved the problem -- in SQL*Plus, the escape character was set to '^'. Sorry for the noise here! Maybe this be of use to someone else -- I hope so.
DeleteNice to see that you solved your own problem :)
DeleteHi,
ReplyDeleteI have a tables with below data
Field1 | Field 2
CAT1 |23,23,43,76,598,0,33,94,34,50,99,06.76,3s,adf,547,sdf
CAT2 |hdsfd,dsf,dsfd,dsf,dsf,dfdds,ds,dsds,dsfds,dsf,ds,dsfds,ds
I need the output in below format
CAT1 |23,23
CAT1|43,76
CAT1|598,0
CAT1|33,94 and so on for CAT 1
CAT2|hdsfd,dsf
Can anyone give me function for above output
sounds like a nice challenge, but currently I don't have the time to figure it out... a first hunch would be to split the string up as described in this blogpost and then use an analytic function like ROW_NUMBER() to determine which two values to concatenate back together.
DeleteHow do I convert a string like 'A1,A2,A3,...AN' to '||A1||,||A2||...||AN||'. Here the length of the string is varying. Any help would be greatly appreciated.
ReplyDeleteI tried the following but the issue I am facing is more than 4000 character for the dynamic query I am trying to build (which is a another issue I can deal with)
What is the objective of what you're trying to do (the bigger picture). When you are trying to create a string to be used as the "WHERE clause", there are better options like this one described by Tom Kyte
DeleteI want to create a script to generate insert statements for some tables. The idea is to use the all_tab_columns to get the columns run time instead of manually inputting all the column names for a table. This way I don't have to worry about any change in the table structure. Anyway, I was looping through the column names for a table which gives me buffer issue in UNIX but not in toad. So I searched around and tried to use the following query which will give me the all columns in the format of 'A1,A2,A3' instead of looping through the entire table. Now I want to play with the string to obtain the format I requested (extra Pipes at the start and end for the strings delimited by the commas). I looked around to use regexp_replace and INSTR combination but didn't get what I wanted (due to being newbie to pl/sql programming).
ReplyDeleteSELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(column_name, ',')), ',') col INTO v_str1 from ( SELECT rownum rn, column_name FROM (SELECT column_name FROM all_tab_columns WHERE table_name=myTABLE order by column_id) )
START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1;
the reason for all this exercise is that I keep getting the following error no matter what i try to do..
DeleteORA-01489: result of string concatenation is too long
I see.. quite a daunting task you set yourself. Although it sounds like a nice plan, it can get sour quickly. There are over a 190 different datatypes you need to consider.
DeleteWhy not simply export and import the data using datapump (or EXP and IMP)?
Or maybe even pipelined table functions?
Depending on your database version a string (VARCHAR2) can not be longer than 4000 in SQL. In Oracle 12c this could be (if the DBA enabled it) 32767.
Deletethanks for the feedback. varchar2 limit is the issue here as the DB version is 11G. The script is to generate insert statement from one db to populate in all other database. imp/exp could be overkill I think for one or few records as I am not dealing with 100s of records. What bothers me the most is that the same scripts works without any issue in TOAD but not through unix. Also, if I run part of my code as standalone (declare/begin/end part) from overall code, the procedure runs fine without any issue. That is confusing me even more because the varchar2 limit is coming in the same procedure. Let me try to identify the string with more than 4000 character and see if I can break it based on the length.
Deleteit works in TOAD and not inSQL*Plus? Scroll down to the last record in TOAD and see if you still don't get the error. TOAD only fetches the first few rows. If you didn't run into an error it might be because of a difference in NLS settings between TOAD and SQL*Plus.
DeleteNo error in TOAD what so ever and it gave me the output that I want. If I forget toad for a moment, even standalone procedure runs fine in sqlplus. Btw, the following query gave me the wanted output for a string (my original request for help). So now trying to play with it :)
DeleteSELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(column_name, '||'',''||')), '||'',''||') into v_str1
from ( SELECT rownum rn, column_name FROM (SELECT column_name FROM all_tab_columns WHERE table_name=myTable order by column_id) )
START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1;
so following seems to be the issue after many rounds of testing different way...
DeleteSELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(column_name, '||'''''',''''''||')), '||'''''',''''''||') col into v_str1
from ( SELECT rownum rn, column_name FROM (SELECT column_name FROM all_tab_columns WHERE table_name=t_type(a) order by column_id) )
START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1;
I am storing few tables names in t_type array of varchar and running the above query trough loop. v_str1 is varchar2 variable. The total script has two parts. First part create temp table to prepare for the scripts and second part is the procedure for main logic. So if I run the procedure as stand alone in SQL, it works perfectly fine. As soon as I merge with sql and pl/sql code, it gives me ORA-01489: result of string concatenation is too long. I created two separate sql with 1 as sql code and 2nd as procedure creation. Now tried to call both of the files from third file but same issue. This is confusing as the standalone same code for procedure runs fine and I can see my wanted output on the prompt. If it's the issue with v_str1 variable, then procedure should give me the error. In fact, PL/SQL limit for varchar2 is 4000 which is less than what it has in sql. isn't it? Any hint would be greatly appreciated.
yes, the limit for strings in SQL is 4000 while in PL/SQL it is 32767. In Oracle 12c this can be both 32767 making the integration between SQL and PL/SQL a lot better.
DeleteHi Alex - thanks for the implementation - works very well. Vrolijk Kerstfeest!
ReplyDeletei have a string 'a,b,c,d,e' ..i want to print output as
ReplyDelete1 a
2 b
3 c
4 d
5 e
Please hep me to solve this..
Thanks in advance..
Seriously? Did you look at http://nuijten.blogspot.nl/2009/07/splitting-comma-delimited-string-regexp.html
DeleteIt has exactly what you want... well, maybe not for the numbers in the first column - you can use ROWNUM for that.
with test as
(select 'a,b,c,d,e' str from dual)
select rownum
,regexp_substr (str, '[^,]+', 1, rownum) split
from test
connect by level <= length (regexp_replace (str, '[^,]+')) + 1
Thanks for the reply. but the actual query is ....
ReplyDeleteWrite a packaged function that takes a string separated by commas (say a,b,c,d,e) and returns the words between the commas populated in a PLSQL table. The contents of the PLSQL table should be listed via a SELECT statement as rows. So final output of running the select statement should be like:
Position String
-------- ------
1 a
2 b
3 c
4 d
5 e
sounds like a homework assignment...
Deletethe correct answer would be :"It's a trick question. There is no way to call a PLSQL table (aka Index-By table or Associative Array) in a SELECT statement".
But for the hell of it... because it's the last day of the year..
create type array is object
(id number
,str varchar2(50)
);
/
create type array_list is table of array
/
create or replace
function split (p_str in varchar2)
return array_list pipelined
is
l_array array := array(null,null);
l_elements number;
l_str number := 1;
begin
l_elements := length (regexp_replace (p_str, '[^,]+')) + 1;
while l_str <= l_elements
loop
l_array.id := l_str;
l_array.str := regexp_substr (p_str, '[^,]+', 1, l_str);
pipe row (l_array);
l_str := l_str + 1;
end loop;
return;
end split;
/
select *
from table (split ('a,b,c,d,e'))
/
Works great ! Thanks for this.
ReplyDeletehi,
ReplyDeleteI've a string like 'AAAA-xx/BBBB-yy/CCCC-zz' and i want to print output as
AAAA-xx
AAAA-xx/BBBB-yy
AAAA-xx/BBBB-yy/CCCC-zz
Thanks in advance,
Jorge
with test as
Delete(select 'AAAA-xx/BBBB-yy/CCCC-zz' str
from dual)
select substr (str, 1, instr (str||'/', '/',1, rownum)) result
from test
connect by level <= regexp_count (str||'/', '/')
Hi Alex.
ReplyDeleteWhat a popular and long-living post :-)
I'd like to suggest an alternative, so the TEST table will be accessed only once, and the parsing of each row will happen as soon as the row is fetched (with no need to full scan the table before starting the first parsing). This may be significant when processing large result sets.
As a byproduct, it also handles empty elements (like in ',tony,,,glenn'), which were supported by your original solution for a single-row splitting.
In Oracle 12c, using Lateral Inline View:
with test as
(
select 1 id, 'joey,anthony,marvin' str from dual union all
select 5 id, 'tony,glenn' str from dual union all
select 8 id, 'john' str from dual
)
select id,str,split
from test,
LATERAL (select regexp_substr (str, '[^,]+', 1, rownum) split from dual connect by level <= regexp_count(str, ',')+1) ;
[I once wrote about Lateral Inline Views here - http://www.db-oriented.com/2013/08/10/tip004/ - with a conceptually similar example to the one I later showed in my "write less with more" presentation in DOAG 2014 - I gratefully remember you attended :-) ]
In 11g, a more cumbersome solution, using Collection Unnesting, will achieve the same:
with test as
(
select 1 id, 'joey,anthony,marvin' str from dual union all
select 5 id, 'tony,glenn' str from dual union all
select 8 id, 'john' str from dual
)
select id,str,column_value split
from test,
table(cast(multiset(select regexp_substr (str, '[^,]+', 1, rownum) from dual connect by level <= regexp_count(str, ',')+1) as sys.odcivarchar2list)) ;
Thanks,
Oren Nakdimon
@DBoriented
http://db-oriented.com
Thanks Oren, wonderful solutions! I remember the LATERAL solution you presented at the DOAG (and it's still on my "look closer into this" list :)
DeleteOren,
DeleteThe 11g solution saved my day!
Thank you.
The idea is to use the plsql regex commands and split a string with similar characters.
ReplyDeleteEx: AAYYMMXXXXCC should return
AA
YY
MM
XXXX
CC
expecting this with sql statement.
Request for the following help on the below:
ReplyDeleteI have a parameter table like below:-
id add_id sub_id div_id per_id
100 12,132,15 54,56 null null
141 125,56 null 56,89 1
145 978,69 null 897,665
100= (12+132+15)-(54+56)
141= (125+56)/(56+89)
145= (978+69)/(897+665)*100
Is it possible to exend your query for the above ?
Regards
Suresh
> Is it possible to exend your query for the above ?
DeleteYes, it is... all the components are in the blogpost and the comments to write the query.
how can we convert when there are no values in between like below
ReplyDeleteTest1, Test2, , Test4, , , Test6
no different than the examples shown in the blog
Deletehi, I am new to regexp_like, need help.
ReplyDeletedata getting-
abc,cde,efg,ghi,ijk
abc,cde,ijk
cde,ijk
abc,cde,ghi,ijk
I want my output get those records only where abc,cde and ijk is there only(any combination of thses value only)
I am using this
where regexp_like(username,'^[(abc|cde|ijk) ,;]+$');
but its not giving correct answer.
please help!!!!
Hi i have string like'1,abc|2,cd|3,xy.....'
ReplyDeletei want a function that split the reslut by pipleine and then by commas and the result would be like this id=1 and the value = abc..