28 September 2009

Planboard Symposium: Registration open

For the fourth time the Planboard Symposium will be held on November 17. This symposium is unique as it's "for DBA by DBA". This one day symposium will have 5 parallel sessions with lots of time for networking and open discussions.
I am lucky enough to be one of the presentors and my topic will be "Continuous Database Application Evolution in Oracle Database 11g Release 2 ", a mouth full.
This session will discuss a new feature of Oracle 11gR2: Edition Based Redefinition. Not just "a new feature", it's "The new feature".

You can register on the Planboard Symposium site

Oh, and by the way... don't tell anyone I'm not a DBA... ssshhh...
see you there.

24 September 2009

Carrying down values with Analytic Functions

The other day - yesterday actually - I got an email with a question regarding Analytic Functions.
This was the requirement for the query:


In a table there are a CODE, STARTDATE and VAL columns. The combination of CODE and STARTDATE are mandatory, the VAL is optional. I want to get an overview where the VAL column shows the latest (based on the STARTDATE) value. If there is no value filled out for a particular date, it should show the value of the latest entry for that particular CODE.

Let's take a look at an example to clarify the requirement. First of all the data in the table:

COD STARTDATE VAL
--- --------- ----------
A 24-SEP-08 QRS
A 24-OCT-08
A 24-NOV-08
A 24-DEC-08 a
A 24-JAN-09
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09
A 24-MAY-09
B 24-DEC-08 BLA
B 24-JAN-09
B 24-FEB-09
B 24-MAR-09 BLABLA
B 24-APR-09


As you can see in the above output, there are two CODES ("A" and "B"). The combination of CODE and STARTDATE is unique. The last column (VAL) has some values filled out, not all.
The desired output would be:


COD STARTDATE VAL
--- --------- ------
A 24-SEP-08 QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a
A 24-JAN-09 a
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA
B 24-APR-09 BLABLA


When the VAL column has no entry (IS NULL) the output should show the latest value of the VAL column - based on the Startdate within the CODE. Latest in this case means: the most recent value of the VAL column with regards to the STARTDATE column of the current record.

Is the requirement clear? Time to solve it. I will show you two ways of solving this, there are possibly many more ways of getting the required output. One way is quite cumbersome, but will work in Oracle 8.1.6 EE and up. The other way is really trivial and works in Oracle 10g and up.

First method, fully explained.
Per CODE, we need to identify which VAL should be carried down, until a different VAL is encountered. To identify these subgroups we use a Case statement:

SQL> select code
2 , startdate
3 , val
4 , case
5 when val is not null
6 then 1
7 end new_val
8 from tbl
9 order by code
10 , startdate
11 /

COD STARTDATE VAL NEW_VAL
--- --------- ---------- ----------
A 24-SEP-08 QRS 1
A 24-OCT-08
A 24-NOV-08
A 24-DEC-08 a 1
A 24-JAN-09
A 24-FEB-09 XY 1
A 24-MAR-09 ABC 1
A 24-APR-09
A 24-MAY-09
B 24-DEC-08 BLA 1
B 24-JAN-09
B 24-FEB-09
B 24-MAR-09 BLABLA 1
B 24-APR-09

Now the column NEW_VAL shows the marker for each subgroup. Using the "running total" technique we can clearly see which records belong together.

SQL> select code
2 , startdate
3 , val
4 , sum (case
5 when val is not null
6 then 1
7 end
8 ) over (partition by code
9 order by startdate
10 ) new_val
11 from tbl
12 order by code
13 , startdate
14 /

COD STARTDATE VAL NEW_VAL
--- --------- ---------- ----------
A 24-SEP-08 QRS 1
A 24-OCT-08 1
A 24-NOV-08 1
A 24-DEC-08 a 2
A 24-JAN-09 2
A 24-FEB-09 XY 3
A 24-MAR-09 ABC 4
A 24-APR-09 4
A 24-MAY-09 4
B 24-DEC-08 BLA 1
B 24-JAN-09 1
B 24-FEB-09 1
B 24-MAR-09 BLABLA 2
B 24-APR-09 2

With the distinction made - you can see the different subgroups, each having a unique number within the partition, we need the first value of the VAL column per subgroup. For that we will use the FIRST_VALUE function. Notice that the partitioning clause in the FIRST_VALUE is using the subgroups we defined in the previous section as well as the CODE.

SQL> select code
2 , startdate
3 , first_value (val)
4 over (partition by code
5 , new_val
6 order by startdate
7 ) new_val
8 from (
9 select code
10 , startdate
11 , val
12 , sum (case
13 when val is not null
14 then 1
15 end
16 ) over (partition by code
17 order by startdate
18 ) new_val
19 from tbl
20 )
21 order by code
22 , startdate
23 /

COD STARTDATE NEW_VAL
--- --------- ----------
A 24-SEP-08 QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a
A 24-JAN-09 a
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA
B 24-APR-09 BLABLA


And there you have it, the required result. But it takes a lot of typing. In Oracle 10g we can get the same results, but a lot simpler.
Newly added in Oracle 10g is the IGNORE NULLS clause. And it does exactly what it says, it ignores nulls.
In the result we want to get the last value, per partition within the window that gets larger with the current row. The default windowing clause is Rows Unbounded Preceding (the docs say Range Unbounded Preceding, but I believe this is wrong. Range only works for a numeric offset like with Dates and Numbers)
In the following query, I inserted the windowing clause in there. Just to be very explicit.

SQL> select code
2 , startdate
3 , val
4 , last_value (val ignore nulls)
5 over (partition by code
6 order by startdate
7 rows between unbounded preceding
8 and current row
9 ) new_val
10 from tbl
11 ;

COD STARTDATE VAL NEW_VAL
--- --------- ---------- ----------
A 24-SEP-08 QRS QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a a
A 24-JAN-09 a
A 24-FEB-09 XY XY
A 24-MAR-09 ABC ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA BLABLA
B 24-APR-09 BLABLA

As the Windowing Clause is the default, you can also omit it.


SQL> select code
2 , startdate
3 , last_value (val ignore nulls)
4 over (partition by code
5 order by startdate
6 ) new_val
7 from tbl
8 ;

COD STARTDATE NEW_VAL
--- --------- ----------
A 24-SEP-08 QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a
A 24-JAN-09 a
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA
B 24-APR-09 BLABLA