Monday, July 15, 2013

Analytic Functions: Rank() Dense_Rank() and Row_Number()


Analytic Ranking & Finding the Top-N.

When doing a top-n analysis, there are a few analytic functions that we might use. Here are the functions and an example of how they rank/order values:


Function Usage What it Does
Rank() Probably best used for other types of analysis. A mix between ranking rows and values.
Dense_Rank() Can be used to find the "Top-N" values. Ranks the Values.
Row_Number() Can be used to find the "Top-N" records or rows. Ranks the Rows.

The results of these functions when 'ordering-by' the score:


   
                      Dense  Row
   Name  Score  Rank  Rank   Number
   ----  -----  ----  -----  ------
   Deb     95     1      1      1
   Jan     94     2      2      2
   Lea     82     3      3      3
   Ray     82     3      3      4
   Ron     82     3      3      5
   Sam     82     3      3      6
   Don     71     7      4      7
   Amy     68     8      5      8
   Ben     68     8      5      9
   Bob     68     8      5     10
   Liz     53    11      6     11
   Jim     52    12      7     12
   --------------------------------
   


Note: when using the Row_Number() function, sorting by the "Score" may not be sufficient -- it does not result in a unique ordering of the records. You may want to provide a secondary sort column to insure you get the desired rows.



A Top-N Analysis


Below is an example that shows how to find the top-N scores or rows in each group.

The sub-query assigns a rank-row-number to the list of rows sorted by their score. Because we sort "desc" we get the highest score as row-number = 1. The outer query selects only the rows with a number of 1-3. Since there is a "Partition By" group, we get the top 3 records/scores for each group.


   
   with
     w_data  as
       ( select  NULL  as grp
              ,  NULL  as nm
              ,  NULL  as scr
           from  dual  where 1=0
         -----
         union select   'x', 'Deb', 95   from dual
         union select   'x', 'Jan', 94   from dual
         union select   'x', 'Lea', 82   from dual
         union select   'x', 'Ray', 82   from dual
         union select   'x', 'Ron', 82   from dual
         union select   'x', 'Sam', 82   from dual
         union select   'x', 'Don', 71   from dual
         union select   'x', 'Amy', 68   from dual
         union select   'x', 'Ben', 68   from dual
         union select   'x', 'Bob', 68   from dual
         union select   'x', 'Liz', 53   from dual
         union select   'x', 'Jim', 52   from dual
         union select   'y', 'Abe',  9   from dual
         union select   'y', 'Buz',  8   from dual
         union select   'y', 'Dan',  6   from dual
         union select   'y', 'Eve',  5   from dual
         union select   'y', 'Sal',  3   from dual
         -----
       )
   select  grp,  nm,  scr, rnk
     from  ( select  grp,  nm,  scr
               -- ,  row_number() over  /* Top-N Rows */
                  ,  dense_rank() over  /* Top-N Scores */
                       ( partition by  grp
                         order     by  scr desc
                       ) as rnk
               from  w_data
           )
    where  rnk <= 3 ;
   



And the results:

   
   GRP  NM   SCR  RNK
   ---  ---  ---  ---
   x    Deb   95    1
   x    Jan   94    2
   x    Lea   82    3
   x    Sam   82    3
   x    Ron   82    3
   x    Ray   82    3
   y    Abe    9    1
   y    Buz    8    2
   y    Dan    6    3



Notice that by using the "dense_rank()" function, we get the top-3 values from each group -- not the top-3 rows. Now if we switch the commented function to use the "row_number()" function we can see what happens when we want the top-3 rows:

   
   GRP  NM   SCR  RNK
   ---  ---  ---  ---
   x    Deb   95    1
   x    Jan   94    2
   x    Lea   82    3
   y    Abe    9    1
   y    Buz    8    2
   y    Dan    6    3


Note that even though 4 people had the score of 82, only one was selected. We can never be sure which one will be selected unless we define a sort (the "order by" piece of the analytic function) that would uniquely sort those rows.

Also note: by removing the "desc" from the "order by" section, you can make this a Bottom-N analysis. You can also use something like "where rnk = 5" to do an Nth-Value or Nth-Record query. The analytic functions make this very flexible.

Note: Prior to analytic functions it was common to use "ROWNUM" to write top-n queries. This 'rownum' method is not recommended. Also, it can only be run on one group at a time -- The above query would have to be split into two separate queries.

No comments:

Post a Comment