This was the requirement for the query:
In a table there are a CODE, STARTDATE and VAL columns. The combination of CODE and STARTDATE are mandatory, the VAL is optional. I want to get an overview where the VAL column shows the latest (based on the STARTDATE) value. If there is no value filled out for a particular date, it should show the value of the latest entry for that particular CODE.
Let's take a look at an example to clarify the requirement. First of all the data in the table:
COD STARTDATE VAL
--- --------- ----------
A 24-SEP-08 QRS
A 24-OCT-08
A 24-NOV-08
A 24-DEC-08 a
A 24-JAN-09
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09
A 24-MAY-09
B 24-DEC-08 BLA
B 24-JAN-09
B 24-FEB-09
B 24-MAR-09 BLABLA
B 24-APR-09
As you can see in the above output, there are two CODES ("A" and "B"). The combination of CODE and STARTDATE is unique. The last column (VAL) has some values filled out, not all.
The desired output would be:
COD STARTDATE VAL
--- --------- ------
A 24-SEP-08 QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a
A 24-JAN-09 a
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA
B 24-APR-09 BLABLA
When the VAL column has no entry (IS NULL) the output should show the latest value of the VAL column - based on the Startdate within the CODE. Latest in this case means: the most recent value of the VAL column with regards to the STARTDATE column of the current record.
Is the requirement clear? Time to solve it. I will show you two ways of solving this, there are possibly many more ways of getting the required output. One way is quite cumbersome, but will work in Oracle 8.1.6 EE and up. The other way is really trivial and works in Oracle 10g and up.
First method, fully explained.
Per CODE, we need to identify which VAL should be carried down, until a different VAL is encountered. To identify these subgroups we use a Case statement:
SQL> select code
2 , startdate
3 , val
4 , case
5 when val is not null
6 then 1
7 end new_val
8 from tbl
9 order by code
10 , startdate
11 /
COD STARTDATE VAL NEW_VAL
--- --------- ---------- ----------
A 24-SEP-08 QRS 1
A 24-OCT-08
A 24-NOV-08
A 24-DEC-08 a 1
A 24-JAN-09
A 24-FEB-09 XY 1
A 24-MAR-09 ABC 1
A 24-APR-09
A 24-MAY-09
B 24-DEC-08 BLA 1
B 24-JAN-09
B 24-FEB-09
B 24-MAR-09 BLABLA 1
B 24-APR-09
Now the column NEW_VAL shows the marker for each subgroup. Using the "running total" technique we can clearly see which records belong together.
SQL> select code
2 , startdate
3 , val
4 , sum (case
5 when val is not null
6 then 1
7 end
8 ) over (partition by code
9 order by startdate
10 ) new_val
11 from tbl
12 order by code
13 , startdate
14 /
COD STARTDATE VAL NEW_VAL
--- --------- ---------- ----------
A 24-SEP-08 QRS 1
A 24-OCT-08 1
A 24-NOV-08 1
A 24-DEC-08 a 2
A 24-JAN-09 2
A 24-FEB-09 XY 3
A 24-MAR-09 ABC 4
A 24-APR-09 4
A 24-MAY-09 4
B 24-DEC-08 BLA 1
B 24-JAN-09 1
B 24-FEB-09 1
B 24-MAR-09 BLABLA 2
B 24-APR-09 2
With the distinction made - you can see the different subgroups, each having a unique number within the partition, we need the first value of the VAL column per subgroup. For that we will use the FIRST_VALUE function. Notice that the partitioning clause in the FIRST_VALUE is using the subgroups we defined in the previous section as well as the CODE.
SQL> select code
2 , startdate
3 , first_value (val)
4 over (partition by code
5 , new_val
6 order by startdate
7 ) new_val
8 from (
9 select code
10 , startdate
11 , val
12 , sum (case
13 when val is not null
14 then 1
15 end
16 ) over (partition by code
17 order by startdate
18 ) new_val
19 from tbl
20 )
21 order by code
22 , startdate
23 /
COD STARTDATE NEW_VAL
--- --------- ----------
A 24-SEP-08 QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a
A 24-JAN-09 a
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA
B 24-APR-09 BLABLA
And there you have it, the required result. But it takes a lot of typing. In Oracle 10g we can get the same results, but a lot simpler.
Newly added in Oracle 10g is the IGNORE NULLS clause. And it does exactly what it says, it ignores nulls.
In the result we want to get the last value, per partition within the window that gets larger with the current row. The default windowing clause is Rows Unbounded Preceding (the docs say Range Unbounded Preceding, but I believe this is wrong. Range only works for a numeric offset like with Dates and Numbers)
In the following query, I inserted the windowing clause in there. Just to be very explicit.
SQL> select code
2 , startdate
3 , val
4 , last_value (val ignore nulls)
5 over (partition by code
6 order by startdate
7 rows between unbounded preceding
8 and current row
9 ) new_val
10 from tbl
11 ;
COD STARTDATE VAL NEW_VAL
--- --------- ---------- ----------
A 24-SEP-08 QRS QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a a
A 24-JAN-09 a
A 24-FEB-09 XY XY
A 24-MAR-09 ABC ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA BLABLA
B 24-APR-09 BLABLA
As the Windowing Clause is the default, you can also omit it.
SQL> select code
2 , startdate
3 , last_value (val ignore nulls)
4 over (partition by code
5 order by startdate
6 ) new_val
7 from tbl
8 ;
COD STARTDATE NEW_VAL
--- --------- ----------
A 24-SEP-08 QRS
A 24-OCT-08 QRS
A 24-NOV-08 QRS
A 24-DEC-08 a
A 24-JAN-09 a
A 24-FEB-09 XY
A 24-MAR-09 ABC
A 24-APR-09 ABC
A 24-MAY-09 ABC
B 24-DEC-08 BLA
B 24-JAN-09 BLA
B 24-FEB-09 BLA
B 24-MAR-09 BLABLA
B 24-APR-09 BLABLA
Great i helped me alot.
ReplyDelete