Monday, July 6, 2015

Finding Contiguous Date Ranges in your Data - using Recursive SQL

Hey team,

I found myself needing to write a query to detect the contiguous date ranges available in my warehouse's Point of Sale fact data table (in Development).

While we have data for every day in Production, in Development we use a technique that only copies a few weeks from the same month of two consecutive years - this lets us calculate "Year-over-Year" growth - while limiting the data segment size.

In such cases - finding the MIN/MAX is not acceptible, because it will look like I have data for about 13 months - while in actuality I only have data for two months (April 2014 and April 2015 for example).

I decided not to "google" my problem because I wanted a brain-teaser SQL problem (I love those) - but I think I came up with something kind of cool - a recursive SQL solution using Oracle's CONNECT BY syntax.  I haven't yet tackled it with ANSI recursive SQL - but I plan to convert this approach soon.

Anyway - here goes:
WITH dates AS (
 SELECT DISTINCT
 TRUNC (date_id, 'DD') AS date_id
 FROM (SELECT TO_DATE ('01-APR-2014', 'DD-MON-YYYY') + LEVEL - 1 AS date_id
         FROM dual
        CONNECT BY LEVEL <= 60
       UNION ALL
       SELECT TO_DATE ('01-APR-2015', 'DD-MON-YYYY') + LEVEL - 1 AS date_id
         FROM dual
        CONNECT BY LEVEL <= 90
      ) fact_table /* Here we are simulating a fact table having unbalanced date ranges available */
 )
 , recursive_dates AS (
 SELECT dates.date_id
 /* CONNECT_BY_ISLEAF tells me that I am at the bottom of the hierarchy in the recursive join for a branch */
 , CONNECT_BY_ISLEAF AS date_id_is_leaf_flag
 /* CONNECT_BY_ROOT always returns the top node in the recursive join - where "LEVEL" = 1 */
 , CONNECT_BY_ROOT date_id AS root_date_id
 /* The LEVEL expression tells me how many levels deep I am in the recursive join */
 , LEVEL AS date_id_level
 FROM dates
 /* Here we are finding any dates which do NOT have a preceding date (one day prior) in our "DATES" data set */
 START WITH date_id NOT IN (SELECT date_id + 1
 FROM dates
 )
 /* Here we are recursively joining the data set to itself to find the next date that is one day in the future from the current row's date */
 CONNECT BY PRIOR date_id = /* NEXT */ (date_id - 1)
 )
 SELECT root_date_id AS min_date_id
 , date_id AS max_date_id
 FROM recursive_dates rd
 WHERE date_id_is_leaf_flag = 1
 ORDER BY min_date_id 

/

I hope this is helpful - can you think of other ways to tackle it?

Phil 

No comments:

Post a Comment