29 July 2009

DELETE in the MERGE statement

The Merge statement was introduced in Oracle 9i and improved upon in Oracle 10g.
In Oracle 9i only the INSERT and UPDATE parts were supported, in Oracle 10g DELETE was added. The "merge_update_clause" and "merge_insert_clause" became optional.
The basic syntax for the MERGE statement:



DELETE can only occur in the "merge_update_clause" of the above schema. This means that it must occur in the WHEN MATCHED THEN clause.

Until recent, I missed this part of the description of the "merge_update_clause" concerning the DELETE operation. First I will show you what I thought, then I'll show you where the behavior is documented.

First we'll create a table with two columns:

SQL> select *
2 from v$version
3 /

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE 10.2.0.3.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.3.0 - Productio
NLSRTL Version 10.2.0.3.0 - Production

SQL>
SQL> create table t
2 as
3 select rownum rn
4 , 'A' ind
5 from all_objects
6 where rownum <= 5
7 /

Table created.

SQL> update t
2 set ind = 'D'
3 where rn = 5
4 /

1 row updated.

SQL>
SQL> select *
2 from t
3 /

RN I
---------- -
1 A
2 A
3 A
4 A
5 D

Notice that the last record, with RN 5, has an Ind "D".

Next we will merge a record into this table.


SQL> merge into t
2 using (select 3 i
3 , 'D' ind
4 from dual
5 ) dat
6 on (t.rn = dat.i)
7 when matched then
8 update set t.ind = dat.ind
9 delete where t.ind = 'D' --<--- Here is the DELETE
10 /

1 row merged.


Only one row? Shouldn't that be two rows? One for the UPDATE and one for the DELETE?
Question for you: Which record(s) is (are) affected by this statement?

My wrong assumption was this:
Record with RN 3 has had the IND column changed to "D" and all records with IND "D" are removed. Effectively removing records with RN 3 and 5.

Now the quote from the documentation.

The only rows affected by this clause are those rows in the destination table that are updated by the merge operation.


This means that records in the destination table are not deleted when they are not updated by the MERGE first.



SQL> select *
2 from t
3 /

RN I
---------- -
1 A
2 A
4 A
5 D


As you can see the record with RN 5 is still in the table. Because it was not updated in the merge, it was not deleted.
In order to remove some records from the table using the MERGE statement, you need to update these records first. It is not possible to dismiss the UPDATE statement from the MERGE:


SQL> merge into t
2 using (select 3 i
3 , 'D' ind
4 from dual
5 ) dat
6 on (t.rn = dat.i)
7 when matched then
8 delete where t.ind = 'D' --<--- Here is the DELETE
9 /
delete where t.ind = 'D' --<--- Here is the DELETE
*
ERROR at line 8:
ORA-00905: missing keyword


... learn something every day.

documentation link

21 July 2009

NVL2 - useful? Two use cases

Sometimes you encounter a function that makes you wonder why Oracle made it. NVL2 is one of those functions, at least I think so. Even though it was introduced quite a long time ago, I've never used it in production code.
Until now that is.
NVL2 was introduced in Oracle 8i (8.1.7 according to Tahiti.Oracle.com)


Syntax
The NVL2 function takes three arguments. The first argument (could also be an expression) is evaluated for NULL or NOT NULL. When this argument evaluates to NOT NULL, the second expression is returned. When the first argument evaluates to NULL then last (third) expression is returned.
It works like this pseudocode:

if argument1 is not null
then
argument2;
else
argument3;
end if;

First Use Case: Concatenate strings with an optional delimiter
In The Netherlands, where I'm from, you can choose the way your last name should be used after marriage.
In general there are four options, say your lastname is "Jansen" and your partners name is "de Vries" - both very common names in The Netherlands - then these are your options:

  1. keep your own lastname: Jansen

  2. use the lastname of your partner: de Vries

  3. your own lastname hyphen partner lastname: Jansen - de Vries

  4. partner lastname hyphen own lastname: de Vries - Jansen



Say we have a table which store your name, your partners name and your preference of "name usage"


drop table names
/

create table names
(lastname varchar2(35)
,partner_lastname varchar2(35)
,name_usage varchar2(2)
)
/
insert into names values ('Jansen', 'de Vries', 'O') -- Own
/
insert into names values ('Jansen', 'de Vries', 'P') -- Partner
/
insert into names values ('Jansen', 'de Vries', 'OP') -- Own - Partner
/
insert into names values ('Jansen', 'de Vries', 'PO') -- Partner - Own
/


Using a simple Case Expression we can have the names the way we want them

SQL> select case name_usage
2 when 'O' then lastname
3 when 'P' then partner_lastname
4 when 'OP' then lastname||' - '||partner_lastname
5 when 'PO' then partner_lastname||' - '||lastname
6 end full_lastname
7 from names
8 /

FULL_LASTNAME
------------------------------------------------------------
Jansen
de Vries
Jansen - de Vries
de Vries - Jansen


Looks good sofar. But,... there is always a but... The data is not always up to par. Some records have "inappropriate" name_usage indicators:

insert into names values ('Jansen', null, 'O') -- Own
/
insert into names values ('Jansen', null, 'OP') -- Own - Partner
/
insert into names values ('Jansen', null, 'PO') -- Partner - Own
/

I left out the Partner usage when there is no Partner - this would just be a "regular" NVL.

When we run the query we used before, we will get this output

FULL_LASTNAME
------------------------
Jansen
de Vries
Jansen - de Vries
de Vries - Jansen
Jansen
Jansen -
- Jansen

7 rows selected.

The last two look strange, don't you agree? There are hyphens there when they shouldn't be.
We could of course write another Case Expression nested inside our already present Case:

SQL> select case name_usage
2 when 'O' then lastname
3 when 'P' then partner_lastname
4 when 'OP' then lastname||case
5 when partner_lastname is not null
6 then ' - '||partner_lastname
7 end
8 when 'PO' then partner_lastname||case
9 when partner_lastname is not null
10 then ' - '||lastname
11 end
12 end full_lastname
13 from names
14 /

FULL_LASTNAME
-------------------------------------------------------------------------
Jansen
de Vries
Jansen - de Vries
de Vries - Jansen
Jansen
Jansen


7 rows selected.


As you can see the SQL statement becomes quite bulky, but very easy to understand. It's also possible to use NVL2 to do the same

SQL> select case name_usage
2 when 'O' then lastname
3 when 'P' then partner_lastname
4 when 'OP' then lastname ||nvl2 (partner_lastname, ' - ', null)||partner_lastname
5 when 'PO' then partner_lastname ||nvl2 (partner_lastname, ' - ', null)||lastname
6 end full_lastname
7 from names
8 /

FULL_LASTNAME
-------------------------------------------------------------------------
Jansen
de Vries
Jansen - de Vries
de Vries - Jansen
Jansen
Jansen
Jansen

7 rows selected.

Nice.

Of course the same thing can be achieved using the good ol' DECODE:

SQL> select case name_usage
2 when 'O' then lastname
3 when 'P' then partner_lastname
4 when 'OP' then lastname ||decode (partner_lastname
5 , null, null, ' - ')||partner_lastname
6 when 'PO' then partner_lastname ||decode (partner_lastname
7 , null, null, ' - ')||lastname
8 end full_lastname
9 from names
10 /

FULL_LASTNAME
-------------------------------------------------------------------------
Jansen
de Vries
Jansen - de Vries
de Vries - Jansen
Jansen
Jansen
Jansen

7 rows selected.


Second Use Case: Implementing an Arc
To implement an Arc relation in the database, the NVL2 function could also be useful.
Let's start with the model

drop table t3
/
drop table t1
/
drop table t2
/

create table t1
(c1 int primary key)
/

create table t2
(c1 int primary key)
/

create table t3
(c1 int primary key
,t1_c1 int references t1
,t2_c1 int references t2
)
/


The Arc should be implemented on the T3 table. Either the relation to T1 should be filled or the relation to T2 should be filled. They should not both be filled for the same record.
This can be implemented with a Case Expression in a Check Constraint

SQL> alter table t3
2 add constraint arc_chk check (case
3 when t1_c1 is not null
4 and t2_c1 is null
5 then 1
6 when t1_c1 is null
7 and t2_c1 is not null
8 then 1
9 else 0
10 end = 1
11 )
12 /

Table altered.

The expression in the Check Constraint is
the value of the Case Expression (either "1" or "0") must be equal to "1"


And to test our Arc implementation:

SQL> insert into t1 values (1)
2 /

1 row created.

SQL>
SQL> insert into t2 values (2)
2 /

1 row created.

SQL>
SQL> insert into t3 values (10, 1, null)
2 /

1 row created.

SQL>
SQL> insert into t3 values (11, null, 2)
2 /

1 row created.

SQL>
SQL> insert into t3 values (12, 1, 2)
2 /
insert into t3 values (12, 1, 2)
*
ERROR at line 1:
ORA-02290: check constraint (ALEX.ARC_CHK) violated


SQL>
SQL> insert into t3 values (13, null, null)
2 /
insert into t3 values (13, null, null)
*
ERROR at line 1:
ORA-02290: check constraint (ALEX.ARC_CHK) violated

This seems to work, When either relation is filled, all is well. When both relations are filled, or none of the relations is, the Check Constraint is violated.

Now let's do the same thing with NVL2.

SQL> rollback
2 /

Rollback complete.

SQL>
SQL> alter table t3
2 drop constraint arc_chk
3 /

Table altered.

SQL>
SQL>
SQL> alter table t3
2 add constraint arc_chk check (nvl2 (t1_c1, 1, 0)
3 + nvl2 (t2_c1, 1, 0)
4 = 1)
5 /

Table altered.

The Check Constraint is a lot shorter, but does the same thing.

SQL>
SQL> insert into t1 values (1)
2 /

1 row created.

SQL>
SQL> insert into t2 values (2)
2 /

1 row created.

SQL>
SQL> insert into t3 values (10, 1, null)
2 /

1 row created.

SQL>
SQL> insert into t3 values (11, null, 2)
2 /

1 row created.

SQL>
SQL> insert into t3 values (12, 1, 2)
2 /
insert into t3 values (12, 1, 2)
*
ERROR at line 1:
ORA-02290: check constraint (ALEX.ARC_CHK) violated


SQL>
SQL> insert into t3 values (13, null, null)
2 /
insert into t3 values (13, null, null)
*
ERROR at line 1:
ORA-02290: check constraint (ALEX.ARC_CHK) violated


The NVL2 function can also take an expression as the first argument. Just for completeness I have taken this example from the documentation:

SELECT last_name, salary, NVL2(commission_pct,
salary + (salary * commission_pct), salary) income
FROM employees WHERE last_name like 'B%'
ORDER BY last_name;

LAST_NAME SALARY INCOME
------------------------- ---------- ----------
Baer 10000 10000
Baida 2900 2900
Banda 6200 6882
Bates 7300 8468
Bell 4000 4000
Bernstein 9500 11970
Bissot 3300 3300
Bloom 10000 12100
Bull 4100 4100


And there you have it, two use cases - and an example from the docs - for the NVL2 function.

What would be a reason not to use the NVL2 function? One reason would be that the function is quite hard to read (comparable to interpreting DECODE). Another would be that this function is not very well known.
The Use Cases described above are the cases where I use the NVL2 function. Simply because it's so concise. Matter of preference I think.
Do you use it? And how - for which use cases - do you use it?

07 July 2009

Making up Data with Partition Outer Join

Just the other day on the Oracle SQL and PL/SQL forum, someone asked on how to create non-existent rows. This post is not about generating a number of rows.
It's about handling Sparse Data, when you want to fill in some missing records in your result set.
First time I heard about this was in a blog written by Lucas Jellema.
Let's first start out with the table and some test data:

SQL> create table test_table1
2 (status varchar2(15)
3 ,manager number
4 ,sales number);

Table created.

SQL> insert into test_table1 values ('in process', 14, 100);

1 row created.

SQL> insert into test_table1 values ('in process', 15, 10);

1 row created.

SQL> insert into test_table1 values ('in process', 15, 40);

1 row created.

SQL> insert into test_table1 values ('done', 14, 200);

1 row created.

SQL> insert into test_table1 values ('done', 16, 50);

1 row created.
SQL> select *
2 from test_table1
3 /

STATUS MANAGER SALES
--------------- ---------- ----------
in process 14 100
in process 15 10
in process 15 40
done 14 200
done 16 50

As you can see in the sample data, Manager 14 has entries for the status "in process" and "done". Manager 15 only has entries for "in process". Manager 16 only has a single entry for "done".
The result that we are after is to show for each Manager a total sales value for both statuses "in process" and "done".
When we use a regular SUM and GROUP BY:

SQL> select manager
2 , status
3 , sum(sales)
4 from test_table1
5 group by manager
6 , status
7 order by manager
8 , status
9 /

MANAGER STATUS SUM(SALES)
---------- --------------- ----------
14 done 200
14 in process 100
15 in process 50
16 done 50

we only see values for records that are actually in the table... Go Figure!

Nice results, but not exactly what we are after. We want an extra record for Manager 15 (with status "done" and sales value of 0) and an extra record for Manager 16 (with status "in process" and also a value of 0).

One way to tackle this problem (or challenge if you prefer) is to use a Partition Outer Join. As far as i know this is not ANSI-SQL, but Oracle specific syntax. Tahiti.Oracle.com calls it an "extension to the ANSI syntax".
To make this query work, we need a "table" (or inline view) which has all possible statuses. Something like

SQL> select 'done' st from dual
2 union all
3 select 'in process' from dual
4 /

ST
----------
done
in process

This inline view will be outer joined to our table.
What makes a Partition Outer Join work differently from a regular Outer Join?
A regular Outer Join will show an extra single record even when a matching value is not present. In our case, this will not make a difference as the values "done" and "in process" are present in our base table.
What we want is to outer join all statuses from the inline view to our base table for each manager.
And this is exactly what the Partition Clause does. It breaks up the result set per manager. Per partition (one for Manager 14, one for Manager 15 and one for Manager 16) we want to outer join to the inline view.

Putting it all together, and here is the final result:

SQL> select manager
2 , st
3 , nvl (sum (sales) , 0)
4 from test_table1 t partition by (manager)
5 right outer
6 join (select 'done' st from dual
7 union all
8 select 'in process' from dual
9 ) sts
10 on (t.status = sts.st)
11 group by manager
12 , st
13 order by manager
14 , st
15 /

MANAGER ST NVL(SUM(SALES),0)
---------- ---------- -----------------
14 done 600
14 in process 300
15 done 0
15 in process 150
16 done 150
16 in process 0

6 rows selected.

Each Manager shows an entry for both statuses "done" and "in process", even when this value is not in the base table.

If -for whatever reason- you don't like RIGHT OUTER, just flip the tables around and call it a LEFT OUTER:

SQL> select manager
2 , st
3 , nvl (sum (sales) , 0)
4 from (select 'done' st from dual
5 union all
6 select 'in process' from dual
7 ) sts
8 left outer
9 join test_table1 t partition by (manager)
10 on (t.status = sts.st)
11 group by manager
12 , st
13 order by manager
14 , st
15 /

MANAGER ST NVL(SUM(SALES),0)
---------- ---------- -----------------
14 done 800
14 in process 400
15 done 0
15 in process 200
16 done 200
16 in process 0

6 rows selected.


Original question
Lucas Jellema on Partition Outer Join
Oracle 10g Documentation
Oracle 11g Documentation
Rob van Wijk on Interval Based Row Generation

03 July 2009

Splitting a comma delimited string the RegExp way

This is one of those recurring questions on the Oracle Forums of OTN.
How to split a comma delimited string? Of course there are several options how to tackle this problem. One of the most elegant ones, at least I think so, uses a regular expression.

Let's just look at an example

SQL> with test as
2 (select 'ABC,DEF,GHI,JKL,MNO' str from dual
3 )
4 select regexp_substr (str, '[^,]+', 1, rownum) split
5 from test
6 connect by level <= length (regexp_replace (str, '[^,]+')) + 1
7 /

SPLIT
---------------------------------------------------------------------
ABC
DEF
GHI
JKL
MNO

The first part creates some test data using the WITH clause (aka Subquery Factoring). The actual query with the regular expression starts on line 4.
The expression is

regexp_substr (str, '[^,]+', 1, rownum)

The meaning of "[^,]+" in normal English:
Give me one or more characters which are not in the list

The list consists of all characters between the square brackets. Here the "^" (circumflex) indicates "except" or "not in". The "+" means: one or more times.

The arguments of the REGEXP_SUBSTR determine which part of the string to subtract.
The third argument to the REGEXP_SUBSTR functions tells it where to start with the regular expression. The last argument means which occurence to match.

REGEXP_SUBSTR
Multilingual Regular Expression Syntax