31 May 2016

Top N- queries: using the 12c syntax.

One of the new features with Oracle database 12c is the new syntax for Top N queries and pagination. Did we really need this? Should you choose for the new syntax over the way we used to do it, with an inline view? I think so, it simply adds syntactic clarity to the query, and in this blogpost I will show the difference between the "old" and the "new".

For the examples I will use my all time favourite demo data: the EMP table.

SQL> select ename
  2        ,sal
  3    from emp
  4   order by sal desc
  5  /

ENAME             SAL
---------- ----------
KING             5000
FORD             3000
SCOTT            3000
JONES            2975
BLAKE            2850
CLARK            2450
ALLEN            1600
TURNER           1500
MILLER           1300
WARD             1250
Widlake          1250
ADAMS            1100
JAMES             950
SMITH             800

14 rows selected.
As you can tell from the output above, KING has the highest salary and FORD and SCOTT have the same salary which is the second highest.

If you wanted to write a Top N query before Oracle database 12c, let's say the Top 2 of most earning EMP, you would probably have written something with an inline view.

SQL> select ename
  2        ,sal
  3    from (select ename
  4                ,sal
  5            from emp
  6           order by sal desc
  7         )
  8   where rownum <= 2
  9  /

ENAME             SAL
---------- ----------
KING             5000
SCOTT            3000
This query might do the job most of the time, but the results might not be what you were looking for. In this case the requirement is "the Top 2 of most earning EMP", SCOTT and FORD should have both been in the results as they have the same salary.
To resolve this, you would have to rewrite your query using an Analytic Ranking Function in the inline view:
SQL> select ename
  2        ,sal
  3    from (select ename
  4                ,sal
  5                ,rank() over (order by sal desc) rn
  6            from emp
  7         )
  8   where rn <= 2
  9  /

ENAME             SAL
---------- ----------
KING             5000
SCOTT            3000
FORD             3000

Using the new Top N syntax in Oracle database 12c, the query is a lot easier to understand.

SQL> select ename
  2        ,sal
  3    from emp
  4   order by sal desc
  5   fetch first 2 rows with ties
  6  /

ENAME             SAL
---------- ----------
KING             5000
SCOTT            3000
FORD             3000
On line 4 the results are sorted based on the salary (highest on top) and line 5 instructs to get the first two rows. Because of the addition "with ties" both SCOTT and FORD are shown in the results.
To see what happens under the covers, an explain plan is created for this query.
SQL> explain plan for
  2  select ename
  3        ,sal
  4    from emp
  5   order by sal desc
  6   fetch first 2 rows with ties
  7  /

Explained.

SQL> select *
  2    from table (dbms_xplan.display())
  3  /

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3291446077

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   154 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   154 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rank"<=2)
   2 - filter(RANK() OVER ( ORDER BY INTERNAL_FUNCTION("SAL") DESC )<=2)

16 rows selected.
The output above shows that the Analytic Function RANK is used.

18 May 2016

Rounding amounts, divide cents over multiple lines

In previous articles I wrote about dealing with a missing cent when you need to divide a certain amount over multiple lines. In these articles, links are at the bottom, I described a method to calculate the difference on the last row.
Then a question arose (as a comment):
What if for example i have 42 records and i wish to divide 100 by 42. I would get a rounded value of 2.38. If i multiply this by 42 it would amount to just 99.96. What if want to spread the .04 difference on 4 records, not just the last record. In effect i'll be having 4 records with 2.39. Right now i'm doing this via cursor. Im kinda hoping i can do this using sql or analytic functions
Let's create a table first, called T:
create table t
as
select rownum id
  from dual
 connect by level <= 42
In order to determine the number of rows, we need three pieces of information:
  1. The amount that we need to divide
  2. The total number of rows in the set
  3. The difference between the rounded amount and the amount that we need to divide
 select id
      ,100 amount
      ,count(*) over () entries
      ,round (100 / count(*) over (), 2) rounded
    from t
To calculate how many rows will have the extra cent added or subtracted, the following formula is used.
abs (amount - (entries * rounded)) * 100
When you want the rounding done on the "first" rows, a simple CASE expression can be used to mark the rows
case 
when rownum <= abs ((amount - (entries * rounded)) * 100)
then 'x'
end as indicator
The query now looks like the following, with the first ten rows:
with amounts
as
( select id
      ,100 amount
      ,count(*) over () entries
      ,round (100 / count(*) over (), 2) rounded
    from t
)
select id
      ,amount
      ,entries
      ,rounded
      ,case 
       when rownum <= abs ((amount - (entries * rounded)) * 100)
       then 'x'
       end as indicator
 from amounts
;
 ID     AMOUNT  ENTRIES    ROUNDED F
---------- ---------- ---------- ---------- -
  1   100       42       2.38 x
  2   100       42       2.38 x
  3   100       42       2.38 x
  4   100       42       2.38 x
  5   100       42       2.38
  6   100       42       2.38
  7   100       42       2.38
  8   100       42       2.38
  9   100       42       2.38
 10   100       42       2.38
The last piece of the puzzle is to determine if we need to add or subtract the cent. Using the SIGN function is an easy way to determine this.
sign (amount - (entries * rounded)) as pos_neg
Putting everything together will give you the following query (with the first 10 rows)
with amounts
as
( select id
      ,100 amount
      ,count(*) over () entries
      ,round (100 / count(*) over (), 2) rounded
    from t
)
,indicators as 
(
select id
      ,amount
      ,entries
      ,rounded
      ,case 
       when rownum <= abs ((amount - (entries * rounded)) * 100)
       then 'x'
       end as indicator
      ,sign (amount - (entries * rounded)) as pos_neg
 from amounts
)
select id
      ,rounded +
       (nvl2 (indicator, 0.01, 0) * pos_neg) final_amount
  from indicators
;
 ID FINAL_AMOUNT
---------- ------------
  1    2.39
  2    2.39
  3    2.39
  4    2.39
  5    2.38
  6    2.38
  7    2.38
  8    2.38
  9    2.38
 10    2.38

And there it is. The rounding is divided over the first four rows.


Don't want to use the first rows, but the last rows instead? Use the following expression to set the indicator
case 
when rownum >
 case sign ((amount - (entries * rounded)))
   when -1 then entries - abs ((amount - (entries * rounded)) * 100)
   else entries - (amount - (entries * rounded)) * 100
   end
then 'x'
end as indicator

Links