12 June 2009

Analytic Function: Finding Gaps

A little while ago Anton Nielsen posted a blog named "SQL for Spanned Data".

In this blog he describes a challenging query involving a table which stores start and end dates. The challenge as stated by Anton:

The challenge was to create a sql statement to only return contiguous spans by division.

To illustrate what is required:

Id, Division, Start_date, End_date
10 1 09-JUN-2009 10-JUN-2009
11 1 11-JUN-2009 12-JUN-2009
12 1 13-JUN-2009 14-JUN-2009 -- Note the following row is not contiguous
14 1 17-JUN-2009 18-JUN-2009
15 1 19-JUN-2009 20-JUN-2009

The final result will be

Division, Start_date, End_date
1 09-JUN-2009 14-JUN-2009
1 17-JUN-2009 20-JUN-2009


In this post, I will use Analytic Functions to find contiguous spans. As Anton already provided the DDL for this blog, I will not be repeating that here. So, if you want to follow along grab the DDL from his blog and join the fun.

The logic that we are using, is like this

  1. compare the current start date with the previous end date

  2. if the difference between these dates is one day, we have a contiguous span

  3. if the difference is anything else (greater than one day or NULL) then we start a new group of spans

  4. Now that we have a list of numbers (1 and 0) we do a running total, each span group will have it's own number

  5. Get the first and last day of each span group



To compare the current start date with the previous end date, we will use the LAG function. With the LAG function you can "look back" in your result set.
Let's have a look at the effect of this function

SQL> select id
2 , start_date
3 , end_date
4 , lag (end_date) over (partition by division
5 order by start_date
6 )
7 from spantest
8 where start_date between to_date ('01-06-2009', 'dd-mm-yyyy')
9 and to_date ('20-06-2009', 'dd-mm-yyyy')
10 and division = 1
11 order by start_date
12 /

ID START_DAT END_DATE LAG(END_D
---------- --------- --------- ---------
73383 02-JUN-09 03-JUN-09
73384 04-JUN-09 05-JUN-09 03-JUN-09
73385 06-JUN-09 07-JUN-09 05-JUN-09
73386 08-JUN-09 09-JUN-09 07-JUN-09
73387 10-JUN-09 11-JUN-09 09-JUN-09
73388 12-JUN-09 13-JUN-09 11-JUN-09
73389 14-JUN-09 15-JUN-09 13-JUN-09
73391 18-JUN-09 19-JUN-09 15-JUN-09
73392 20-JUN-09 21-JUN-09 19-JUN-09

9 rows selected.

The last column is the end date of the previous record. This makes it easy to implement steps 2 and 3. For this we will use a CASE statement


SQL> select id
2 , start_date
3 , end_date
4 , case
5 when start_date -
6 lag (end_date) over (partition by division
7 order by start_date
8 ) = 1
9 then 0
10 else 1
11 end span_group
12 from spantest
13 where start_date between to_date ('01-06-2009', 'dd-mm-yyyy')
14 and to_date ('20-06-2009', 'dd-mm-yyyy')
15 and division = 1
16 order by start_date
17 /

ID START_DAT END_DATE SPAN_GROUP
---------- --------- --------- ----------
73383 02-JUN-09 03-JUN-09 1
73384 04-JUN-09 05-JUN-09 0
73385 06-JUN-09 07-JUN-09 0
73386 08-JUN-09 09-JUN-09 0
73387 10-JUN-09 11-JUN-09 0
73388 12-JUN-09 13-JUN-09 0
73389 14-JUN-09 15-JUN-09 0
73391 18-JUN-09 19-JUN-09 1
73392 20-JUN-09 21-JUN-09 0

9 rows selected.

The last column (named Span_Group) now contains a list of 1 and 0. The "1" indicate the start of a new Span Group. This sample set therefore contains two Span Groups.
Using the Running Total technique, we can more clearly identify the Span Groups. Because it is not possible to nest analytic functions, we push the query we build so far into an inline view. Then we can use the SUM () OVER () on the Span Groups we created earlier.

SQL> select id
2 , start_date
3 , end_date
4 , sum (span_group) over (partition by division
5 order by start_date
6 ) span_grps
7 from (
8 select id
9 , division
10 , start_date
11 , end_date
12 , case
13 when start_date -
14 lag (end_date) over (partition by division
15 order by start_date
16 ) = 1
17 then 0
18 else 1
19 end span_group
20 from spantest
21 where start_date between to_date ('01-06-2009', 'dd-mm-yyyy')
22 and to_date ('20-06-2009', 'dd-mm-yyyy')
23 and division = 1
24 )
25 order by start_date
26 /

ID START_DAT END_DATE SPAN_GRPS
---------- --------- --------- ----------
73383 02-JUN-09 03-JUN-09 1
73384 04-JUN-09 05-JUN-09 1
73385 06-JUN-09 07-JUN-09 1
73386 08-JUN-09 09-JUN-09 1
73387 10-JUN-09 11-JUN-09 1
73388 12-JUN-09 13-JUN-09 1
73389 14-JUN-09 15-JUN-09 1
73391 18-JUN-09 19-JUN-09 2
73392 20-JUN-09 21-JUN-09 2

9 rows selected.

The last column in the result set (named SPAN_GRPS) now clearly identifies the two groups.

The final thing we need to do is retrieve the earliest start date and the latest end date, a simple aggregate will suffice

SQL> select division
2 , min (start_date) start_date
3 , max (end_date) end_date
4 from (
5 select id
6 , division
7 , start_date
8 , end_date
9 , sum (span_group) over (partition by division
10 order by start_date
11 ) span_grps
12 from (
13 select id
14 , division
15 , start_date
16 , end_date
17 , case
18 when start_date -
19 lag (end_date) over (partition by division
20 order by start_date
21 ) = 1
22 then 0
23 else 1
24 end span_group
25 from spantest
26 where start_date between to_date ('01-06-2009', 'dd-mm-yyyy')
27 and to_date ('20-06-2009', 'dd-mm-yyyy')
28 and division = 1
29 )
30 )
31 group by division, span_grps
32 /

DIVISION START_DAT END_DATE
---------- --------- ---------
1 02-JUN-09 15-JUN-09
1 18-JUN-09 21-JUN-09


Personally I think that Analytic Functions are a lot easier to understand than the CONNECT BY query. Just out of curiosity I ran both Queries with Autotrace on, and here are the Statistics on the queries:


Statistics
---------------------------------------------------
0 recursive calls
0 db block gets
248 consistent gets
0 physical reads
0 redo size
112543 bytes sent via SQL*Net to client
3164 bytes received via SQL*Net from client
255 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
3800 rows processed


The CONNECT BY Query showed these statistics:

Statistics
---------------------------------------------------
3 recursive calls
233 db block gets
3882 consistent gets
224 physical reads
692 redo size
112543 bytes sent via SQL*Net to client
3164 bytes received via SQL*Net from client
255 SQL*Net roundtrips to/from client
18 sorts (memory)
0 sorts (disk)
3800 rows processed

And yes, I did run the latter query a few times to reduce the recursive calls (the first time around 161 recursive calls)

Enough said,.. Another case to show the power of Analytic Functions.
Final Note:
The results which I got from my query are the same as the results from the first query in the original post.
The query, which Anton labeled "Don't run this"... well, I did run it and got these results for the Division 1 in June:

1 02-JUN-09 15-JUN-09
1 04-JUN-09 15-JUN-09
1 06-JUN-09 15-JUN-09
1 08-JUN-09 15-JUN-09
1 10-JUN-09 15-JUN-09
1 12-JUN-09 15-JUN-09
1 14-JUN-09 15-JUN-09

And I must admit, I don't understand these results... They are completely different than my query and Anton's first query. Maybe I don't understand the original requirements,...

11 June 2009

SQL Developer: Backspace not working properly

Nowadays I use SQL Developer on a daily basis, and I like it more and more. But every now and then something rears its ugly head, and that can be annoying.

Just this morning I had one of those encounters. The "Backspace"-key stopped working. Not only the backspace, but DELETE, ENTER, navigation with the arrow keys...

I tried re-installing SQL Developer, reboot the computer... no success.

"Google is your friend" at times like these. One of the first hits I came across was this blog post and that solved my problem.


The only thing you need to do is go from the menu "Tools -> Preferences" and in the Preferences window to "Accelerators", push the button "Load Preset" and pick the Default. And that's it.

Now it works like a charm again.