Wednesday, September 4, 2019

ORCL - MATCH_RECOGNIZE

From time to time, collages ask me about doing max consecutive count, let's say we want to know a donor most continuous donation year total, we used to do the hard way* until we tried using the match_recognize ().

Courtesy to https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1, my favorite Oracle knowledge website.

"Introduced in Oracle 8i, Analytic Functions, also known as windowing functions, allow developers to perform tasks in SQL that were previously confined to procedural languages. Oracle 12c has added the MATCH_RECOGNIZE clause into the analytic function syntax to make pattern matching from SQL simpler. This article gives a flavour of what can be done using the MATCH_RECOGNIZE clause, but you will need to refer to the documentation to understand the true level of complexity possible."

So here is the new baby:))

select max(runs)
from
(select *  
from   (select distinct g.gift_donor_id, g.gift_year_of_giving from gift g where g.gift_donor_id = '00000xxxxx')
match_recognize ( 
  order by gift_year_of_giving 
  measures  
    first(gift_year_of_giving) as first_run, 
    last(gift_year_of_giving) as last_run, 
    count(*) as runs, 
    match_number() as grp 
  pattern ( strt consecutive* ) 
  define  
    consecutive as gift_year_of_giving = ( prev (gift_year_of_giving ) + 1 ) 
))

From Oracle:
What does this query do? The following explains each line in the MATCH_RECOGNIZE clause:

  • PARTITION BY divides the data from the Ticker table into logical groups where each group contains one stock symbol.
  • ORDER BY orders the data within each logical group by tstamp.
  • MEASURES defines three measures: the timestamp at the beginning of a V-shape (start_tstamp), the timestamp at the bottom of a V-shape (bottom_tstamp), and the timestamp at the end of the a V-shape (end_tstamp). The bottom_tstamp and end_tstamp measures use the LAST() function to ensure that the values retrieved are the final value of the timestamp within each pattern match.
  • ONE ROW PER MATCH means that for every pattern match found, there will be one row of output.
  • AFTER MATCH SKIP TO LAST UP means that whenever you find a match you restart your search at the row that is the last row of the UP pattern variable. A pattern variable is a variable used in a MATCH_RECOGNIZE statement, and is defined in the DEFINE clause.
  • PATTERN (STRT DOWN+ UP+) says that the pattern you are searching for has three pattern variables: STRT, DOWN, and UP. The plus sign (+) after DOWN and UP means that at least one row must be mapped to each of them. The pattern defines a regular expression, which is a highly expressive way to search for patterns.
  • DEFINE gives us the conditions that must be met for a row to map to your row pattern variables STRT, DOWN, and UP. Because there is no condition for STRT, any row can be mapped to STRT. Why have a pattern variable with no condition? You use it as a starting point for testing for matches. Both DOWN and UP take advantage of the PREV() function, which lets them compare the price in the current row to the price in the prior row. DOWN is matched when a row has a lower price than the row that preceded it, so it defines the downward (left) leg of our V-shape. A row can be mapped to UP if the row has a higher price than the row that preceded it.
*Hard way:):

with years as
(
select distinct g.id, g.giving_year
from gift, allocation a
where g.gift_associated_allocation = a.allocation_code and a.athletics_ind = 'Y'--upper(a.short_name) like '%ATHLETICS%'
order by g.id, g.gift_year
)
select id,   ayear from_yr, byear to_year, yearcon
from
(
  select
    a.id,
    a.year ayear,
    b.year byear,
    (b.year - a.year)+1 yearcon,
    dense_rank() over (partition by a.id order by (b.year - a.year) desc) rank
  from
    years a
    join years b on a.id = b.id and b.year > a.year
  where
    b.year - a.year =
      (select count(*)-1
         from years a1
        where a.id = a1.id
             and a1.year between a.year and b.year)
)
where rank = 1 and yearcon>=10
order by 4 desc

No comments:

Post a Comment