Notes on Oracle: Carrying down values with Analytic Functions

The other day - yesterday actually - I got an email with a question regarding Analytic Functions.
This was the requirement for the query:

In a table there are a CODE, STARTDATE and VAL columns. The combination of CODE and STARTDATE are mandatory, the VAL is optional. I want to get an overview where the VAL column shows the latest (based on the STARTDATE) value. If there is no value filled out for a particular date, it should show the value of the latest entry for that particular CODE.

Let's take a look at an example to clarify the requirement. First of all the data in the table:


COD STARTDATE VAL
--- --------- ----------
A   24-SEP-08 QRS
A   24-OCT-08
A   24-NOV-08
A   24-DEC-08 a
A   24-JAN-09
A   24-FEB-09 XY
A   24-MAR-09 ABC
A   24-APR-09
A   24-MAY-09
B   24-DEC-08 BLA
B   24-JAN-09
B   24-FEB-09
B   24-MAR-09 BLABLA
B   24-APR-09

As you can see in the above output, there are two CODES ("A" and "B"). The combination of CODE and STARTDATE is unique. The last column (VAL) has some values filled out, not all.
The desired output would be:


COD STARTDATE VAL
--- --------- ------
A   24-SEP-08 QRS
A   24-OCT-08 QRS
A   24-NOV-08 QRS
A   24-DEC-08 a
A   24-JAN-09 a
A   24-FEB-09 XY
A   24-MAR-09 ABC
A   24-APR-09 ABC
A   24-MAY-09 ABC
B   24-DEC-08 BLA
B   24-JAN-09 BLA
B   24-FEB-09 BLA
B   24-MAR-09 BLABLA
B   24-APR-09 BLABLA

When the VAL column has no entry (IS NULL) the output should show the latest value of the VAL column - based on the Startdate within the CODE. Latest in this case means: the most recent value of the VAL column with regards to the STARTDATE column of the current record.

Is the requirement clear? Time to solve it. I will show you two ways of solving this, there are possibly many more ways of getting the required output. One way is quite cumbersome, but will work in Oracle 8.1.6 EE and up. The other way is really trivial and works in Oracle 10g and up.

First method, fully explained.
Per CODE, we need to identify which VAL should be carried down, until a different VAL is encountered. To identify these subgroups we use a Case statement:


SQL> select code
  2       , startdate
  3       , val
  4       , case
  5           when val is not null
  6           then 1
  7         end new_val
  8    from tbl
  9   order by code
 10          , startdate
 11  /

COD STARTDATE VAL           NEW_VAL
--- --------- ---------- ----------
A   24-SEP-08 QRS                 1
A   24-OCT-08
A   24-NOV-08
A   24-DEC-08 a                   1
A   24-JAN-09
A   24-FEB-09 XY                  1
A   24-MAR-09 ABC                 1
A   24-APR-09
A   24-MAY-09
B   24-DEC-08 BLA                 1
B   24-JAN-09
B   24-FEB-09
B   24-MAR-09 BLABLA              1
B   24-APR-09

Now the column NEW_VAL shows the marker for each subgroup. Using the "running total" technique we can clearly see which records belong together.


SQL> select code
  2       , startdate
  3       , val
  4       , sum (case
  5                when val is not null
  6                then 1
  7              end
  8             ) over (partition by code
  9                         order by startdate
 10                    ) new_val
 11    from tbl
 12   order by code
 13          , startdate
 14  /

COD STARTDATE VAL           NEW_VAL
--- --------- ---------- ----------
A   24-SEP-08 QRS                 1
A   24-OCT-08                     1
A   24-NOV-08                     1
A   24-DEC-08 a                   2
A   24-JAN-09                     2
A   24-FEB-09 XY                  3
A   24-MAR-09 ABC                 4
A   24-APR-09                     4
A   24-MAY-09                     4
B   24-DEC-08 BLA                 1
B   24-JAN-09                     1
B   24-FEB-09                     1
B   24-MAR-09 BLABLA              2
B   24-APR-09                     2

With the distinction made - you can see the different subgroups, each having a unique number within the partition, we need the first value of the VAL column per subgroup. For that we will use the FIRST_VALUE function. Notice that the partitioning clause in the FIRST_VALUE is using the subgroups we defined in the previous section as well as the CODE.


SQL> select code
  2       , startdate
  3       , first_value (val)
  4             over (partition by code
  5                              , new_val
  6                       order by startdate
  7                  ) new_val
  8    from (
  9  select code
 10       , startdate
 11       , val
 12       , sum (case
 13                when val is not null
 14                then 1
 15              end
 16             ) over (partition by code
 17                         order by startdate
 18                    ) new_val
 19    from tbl
 20  )
 21   order by code
 22          , startdate
 23  /

COD STARTDATE NEW_VAL
--- --------- ----------
A   24-SEP-08 QRS
A   24-OCT-08 QRS
A   24-NOV-08 QRS
A   24-DEC-08 a
A   24-JAN-09 a
A   24-FEB-09 XY
A   24-MAR-09 ABC
A   24-APR-09 ABC
A   24-MAY-09 ABC
B   24-DEC-08 BLA
B   24-JAN-09 BLA
B   24-FEB-09 BLA
B   24-MAR-09 BLABLA
B   24-APR-09 BLABLA

And there you have it, the required result. But it takes a lot of typing. In Oracle 10g we can get the same results, but a lot simpler.
Newly added in Oracle 10g is the IGNORE NULLS clause. And it does exactly what it says, it ignores nulls.
In the result we want to get the last value, per partition within the window that gets larger with the current row. The default windowing clause is Rows Unbounded Preceding (the docs say Range Unbounded Preceding, but I believe this is wrong. Range only works for a numeric offset like with Dates and Numbers)
In the following query, I inserted the windowing clause in there. Just to be very explicit.


SQL> select code
  2       , startdate
  3       , val
  4       , last_value (val ignore nulls)
  5              over (partition by code
  6                        order by startdate
  7                        rows between unbounded preceding
  8                                 and current row
  9                   ) new_val
 10    from tbl
 11  ;

COD STARTDATE VAL        NEW_VAL
--- --------- ---------- ----------
A   24-SEP-08 QRS        QRS
A   24-OCT-08            QRS
A   24-NOV-08            QRS
A   24-DEC-08 a          a
A   24-JAN-09            a
A   24-FEB-09 XY         XY
A   24-MAR-09 ABC        ABC
A   24-APR-09            ABC
A   24-MAY-09            ABC
B   24-DEC-08 BLA        BLA
B   24-JAN-09            BLA
B   24-FEB-09            BLA
B   24-MAR-09 BLABLA     BLABLA
B   24-APR-09            BLABLA

As the Windowing Clause is the default, you can also omit it.


SQL> select code
  2       , startdate
  3       , last_value (val ignore nulls)
  4              over (partition by code
  5                        order by startdate
  6                   ) new_val
  7    from tbl
  8  ;

COD STARTDATE NEW_VAL
--- --------- ----------
A   24-SEP-08 QRS
A   24-OCT-08 QRS
A   24-NOV-08 QRS
A   24-DEC-08 a
A   24-JAN-09 a
A   24-FEB-09 XY
A   24-MAR-09 ABC
A   24-APR-09 ABC
A   24-MAY-09 ABC
B   24-DEC-08 BLA
B   24-JAN-09 BLA
B   24-FEB-09 BLA
B   24-MAR-09 BLABLA
B   24-APR-09 BLABLA

Notes on Oracle

Pages

24 September 2009

Carrying down values with Analytic Functions

1 comment:

Blog Archive

Non Oracle Links