Analytic Ranking & Finding the Top-N.
When doing a top-n analysis, there are a few analytic functions that we might
use. Here are the functions and an example of how they rank/order values:
Function | Usage | What it Does |
---|---|---|
Rank() | Probably best used for other types of analysis. | A mix between ranking rows and values. |
Dense_Rank() | Can be used to find the "Top-N" values. | Ranks the Values. |
Row_Number() | Can be used to find the "Top-N" records or rows. | Ranks the Rows. |
The results of these functions when 'ordering-by' the score:
Dense Row Name Score Rank Rank Number ---- ----- ---- ----- ------ Deb 95 1 1 1 Jan 94 2 2 2 Lea 82 3 3 3 Ray 82 3 3 4 Ron 82 3 3 5 Sam 82 3 3 6 Don 71 7 4 7 Amy 68 8 5 8 Ben 68 8 5 9 Bob 68 8 5 10 Liz 53 11 6 11 Jim 52 12 7 12 --------------------------------
Note: when using the Row_Number() function, sorting by the "Score" may not be
sufficient -- it does not result in a unique ordering of the records. You may
want to provide a secondary sort column to insure you get the desired rows.
A Top-N Analysis
Below is an example that shows how to find the top-N scores or rows in each group.
The sub-query assigns a rank-row-number to the list of rows sorted by their score.
Because we sort "desc" we get the highest score as row-number = 1. The outer query
selects only the rows with a number of 1-3. Since there is a "Partition By" group,
we get the top 3 records/scores for each group.
with w_data as ( select NULL as grp , NULL as nm , NULL as scr from dual where 1=0 ----- union select 'x', 'Deb', 95 from dual union select 'x', 'Jan', 94 from dual union select 'x', 'Lea', 82 from dual union select 'x', 'Ray', 82 from dual union select 'x', 'Ron', 82 from dual union select 'x', 'Sam', 82 from dual union select 'x', 'Don', 71 from dual union select 'x', 'Amy', 68 from dual union select 'x', 'Ben', 68 from dual union select 'x', 'Bob', 68 from dual union select 'x', 'Liz', 53 from dual union select 'x', 'Jim', 52 from dual union select 'y', 'Abe', 9 from dual union select 'y', 'Buz', 8 from dual union select 'y', 'Dan', 6 from dual union select 'y', 'Eve', 5 from dual union select 'y', 'Sal', 3 from dual ----- ) select grp, nm, scr, rnk from ( select grp, nm, scr -- , row_number() over /* Top-N Rows */ , dense_rank() over /* Top-N Scores */ ( partition by grp order by scr desc ) as rnk from w_data ) where rnk <= 3 ;
And the results:
GRP NM SCR RNK --- --- --- --- x Deb 95 1 x Jan 94 2 x Lea 82 3 x Sam 82 3 x Ron 82 3 x Ray 82 3 y Abe 9 1 y Buz 8 2 y Dan 6 3
Notice that by using the "dense_rank()" function, we get the top-3 values from each group -- not the top-3 rows. Now if we switch the commented function to use the "row_number()" function we can see what happens when we want the top-3 rows:
Note that even though 4 people had the score of 82, only one was selected. We can never be sure which one will be selected unless we define a sort (the "order by" piece of the analytic function) that would uniquely sort those rows.
GRP NM SCR RNK --- --- --- --- x Deb 95 1 x Jan 94 2 x Lea 82 3 y Abe 9 1 y Buz 8 2 y Dan 6 3
Note that even though 4 people had the score of 82, only one was selected. We can never be sure which one will be selected unless we define a sort (the "order by" piece of the analytic function) that would uniquely sort those rows.
Also note: by removing the "desc" from the "order by" section, you can make this a
Bottom-N analysis. You can also use something like "where rnk = 5" to do an
Nth-Value or Nth-Record query. The analytic functions make this very flexible.
Note: Prior to analytic functions it was common to use "ROWNUM" to write top-n queries.
This 'rownum' method is not recommended. Also, it can only be run on one group
at a time -- The above query would have to be split into two separate queries.
No comments:
Post a Comment