24 September 2009

Carrying down values with Analytic Functions

The other day - yesterday actually - I got an email with a question regarding Analytic Functions.
This was the requirement for the query:

In a table there are a CODE, STARTDATE and VAL columns. The combination of CODE and STARTDATE are mandatory, the VAL is optional. I want to get an overview where the VAL column shows the latest (based on the STARTDATE) value. If there is no value filled out for a particular date, it should show the value of the latest entry for that particular CODE.

Let's take a look at an example to clarify the requirement. First of all the data in the table:
`COD STARTDATE VAL--- --------- ----------A   24-SEP-08 QRSA   24-OCT-08A   24-NOV-08A   24-DEC-08 aA   24-JAN-09A   24-FEB-09 XYA   24-MAR-09 ABCA   24-APR-09A   24-MAY-09B   24-DEC-08 BLAB   24-JAN-09B   24-FEB-09B   24-MAR-09 BLABLAB   24-APR-09`

As you can see in the above output, there are two CODES ("A" and "B"). The combination of CODE and STARTDATE is unique. The last column (VAL) has some values filled out, not all.
The desired output would be:

`COD STARTDATE VAL--- --------- ------A   24-SEP-08 QRSA   24-OCT-08 QRSA   24-NOV-08 QRSA   24-DEC-08 aA   24-JAN-09 aA   24-FEB-09 XYA   24-MAR-09 ABCA   24-APR-09 ABCA   24-MAY-09 ABCB   24-DEC-08 BLAB   24-JAN-09 BLAB   24-FEB-09 BLAB   24-MAR-09 BLABLAB   24-APR-09 BLABLA`

When the VAL column has no entry (IS NULL) the output should show the latest value of the VAL column - based on the Startdate within the CODE. Latest in this case means: the most recent value of the VAL column with regards to the STARTDATE column of the current record.

Is the requirement clear? Time to solve it. I will show you two ways of solving this, there are possibly many more ways of getting the required output. One way is quite cumbersome, but will work in Oracle 8.1.6 EE and up. The other way is really trivial and works in Oracle 10g and up.

First method, fully explained.
Per CODE, we need to identify which VAL should be carried down, until a different VAL is encountered. To identify these subgroups we use a Case statement:
`SQL> select code  2       , startdate  3       , val  4       , case  5           when val is not null  6           then 1  7         end new_val  8    from tbl  9   order by code 10          , startdate 11  /COD STARTDATE VAL           NEW_VAL--- --------- ---------- ----------A   24-SEP-08 QRS                 1A   24-OCT-08A   24-NOV-08A   24-DEC-08 a                   1A   24-JAN-09A   24-FEB-09 XY                  1A   24-MAR-09 ABC                 1A   24-APR-09A   24-MAY-09B   24-DEC-08 BLA                 1B   24-JAN-09B   24-FEB-09B   24-MAR-09 BLABLA              1B   24-APR-09`

Now the column NEW_VAL shows the marker for each subgroup. Using the "running total" technique we can clearly see which records belong together.
`SQL> select code  2       , startdate  3       , val  4       , sum (case  5                when val is not null  6                then 1  7              end  8             ) over (partition by code  9                         order by startdate 10                    ) new_val 11    from tbl 12   order by code 13          , startdate 14  /COD STARTDATE VAL           NEW_VAL--- --------- ---------- ----------A   24-SEP-08 QRS                 1A   24-OCT-08                     1A   24-NOV-08                     1A   24-DEC-08 a                   2A   24-JAN-09                     2A   24-FEB-09 XY                  3A   24-MAR-09 ABC                 4A   24-APR-09                     4A   24-MAY-09                     4B   24-DEC-08 BLA                 1B   24-JAN-09                     1B   24-FEB-09                     1B   24-MAR-09 BLABLA              2B   24-APR-09                     2`

With the distinction made - you can see the different subgroups, each having a unique number within the partition, we need the first value of the VAL column per subgroup. For that we will use the FIRST_VALUE function. Notice that the partitioning clause in the FIRST_VALUE is using the subgroups we defined in the previous section as well as the CODE.
`SQL> select code  2       , startdate  3       , first_value (val)  4             over (partition by code  5                              , new_val  6                       order by startdate  7                  ) new_val  8    from (  9  select code 10       , startdate 11       , val 12       , sum (case 13                when val is not null 14                then 1 15              end 16             ) over (partition by code 17                         order by startdate 18                    ) new_val 19    from tbl 20  ) 21   order by code 22          , startdate 23  /COD STARTDATE NEW_VAL--- --------- ----------A   24-SEP-08 QRSA   24-OCT-08 QRSA   24-NOV-08 QRSA   24-DEC-08 aA   24-JAN-09 aA   24-FEB-09 XYA   24-MAR-09 ABCA   24-APR-09 ABCA   24-MAY-09 ABCB   24-DEC-08 BLAB   24-JAN-09 BLAB   24-FEB-09 BLAB   24-MAR-09 BLABLAB   24-APR-09 BLABLA`

And there you have it, the required result. But it takes a lot of typing. In Oracle 10g we can get the same results, but a lot simpler.
Newly added in Oracle 10g is the IGNORE NULLS clause. And it does exactly what it says, it ignores nulls.
In the result we want to get the last value, per partition within the window that gets larger with the current row. The default windowing clause is Rows Unbounded Preceding (the docs say Range Unbounded Preceding, but I believe this is wrong. Range only works for a numeric offset like with Dates and Numbers)
In the following query, I inserted the windowing clause in there. Just to be very explicit.
`SQL> select code  2       , startdate  3       , val  4       , last_value (val ignore nulls)  5              over (partition by code  6                        order by startdate  7                        rows between unbounded preceding  8                                 and current row  9                   ) new_val 10    from tbl 11  ;COD STARTDATE VAL        NEW_VAL--- --------- ---------- ----------A   24-SEP-08 QRS        QRSA   24-OCT-08            QRSA   24-NOV-08            QRSA   24-DEC-08 a          aA   24-JAN-09            aA   24-FEB-09 XY         XYA   24-MAR-09 ABC        ABCA   24-APR-09            ABCA   24-MAY-09            ABCB   24-DEC-08 BLA        BLAB   24-JAN-09            BLAB   24-FEB-09            BLAB   24-MAR-09 BLABLA     BLABLAB   24-APR-09            BLABLA`

As the Windowing Clause is the default, you can also omit it.

`SQL> select code  2       , startdate  3       , last_value (val ignore nulls)  4              over (partition by code  5                        order by startdate  6                   ) new_val  7    from tbl  8  ;COD STARTDATE NEW_VAL--- --------- ----------A   24-SEP-08 QRSA   24-OCT-08 QRSA   24-NOV-08 QRSA   24-DEC-08 aA   24-JAN-09 aA   24-FEB-09 XYA   24-MAR-09 ABCA   24-APR-09 ABCA   24-MAY-09 ABCB   24-DEC-08 BLAB   24-JAN-09 BLAB   24-FEB-09 BLAB   24-MAR-09 BLABLAB   24-APR-09 BLABLA`

1 comment:

1. Great i helped me alot.