Tuesday, November 23, 2021

Select Top(n) or nth-Ranked Rows (Top-N Analysis)

Sometimes you wish to select the top or highest 'n' rows of a sorted data-set. (or the bottom 'n' rows -- Reversing the sort will switch from top to bottom.) Simialarly, you may wish to only return the 'nth' row. For example, from a data set sorted by date, you may wish to find the oldest 10 records, or the one 3rd youngest record. In my experience there are 3 main ways to do this:

  
  1) Use the psudo-column ROWNUM.
  
  2) Use the Analytic functions Rank(), Dense_Rank(), or Row_Number().
  
  3) Use the Fetch and Offset keywords.
  
  
  
For the following examples, I'll use the a 'WITH' data set. Pre-pend it to all the other examples that follow to run them.

   with 
     w_data ( id, nm )  as
       ( select 30, 'Ted' from dual union all
         select 68, 'Dan' from dual union all
         select 21, 'Joe' from dual union all
         select 49, 'Ben' from dual union all
         select 12, 'Sam' from dual union all
         select 57, 'Art' from dual )
   select  id, nm
     from  w_data
    order  by nm ;
    
---

   {with...}
   select  id, nm
     from  w_data
    order  by id ;

The ROWNUM Psudo-Colum

The 'rownum' psudo column can be confusing; it returns a sequential number for each row as it is pulled from the source. It doesn't conform to any sort unless you build that into your query. Notice the example below; the second query returns nothing. It's trying to limit the pull to the 4th row -- but each row that's tentatively pulled and tested against the 'WHERE' clause, is assigned to be the first row, so it fails to qualify.


   {with...}
   select  id, nm      /* Returns 3 rows */
     from  w_data
    where  rownum < 4 ;  
    
   {with...}
   select  id, nm      /* Returns no rows */
     from  w_data
    where  rownum = 4 ;
    

However, if you use 'ROWNUM' properly, it can find your requested rows. Use a double sub-query as shown below, you can assign the 'ROWNUM' as a ranking after sorting the data (in this example sorting by "nm"). you can then use the 'WHERE' clause to select out the 'rankings' for which you are looking: top-3 (where rnk < 4), 3rd row (where rnk = 3).


And, notice that if you want to select the Bottom(n) rows, you just need to change the initial sort to "nm desc"

   {with...}
   select  id, nm, rnk
     from  ( select  id, nm
                  ,  rownum as rnk
               from  ( select id, nm from w_data order by nm ) 
           )
    where  rnk < 4
   ;
    
   /* Try other 'WHERE' clauses: */
    
     a)  where  rnk between 2 and 4
     b)  where  rnk = 2
     c)  where  rnk >= 3

Analytic Ranking

Using the analytic functions Rank(), Dense_Rank(), or Row_Number -- you can 'rank' the rows and select out the nth row or the top(n) rows as shown in this example.


Here I use the Row_Number() function, it does differ from the two 'rank' functions, but I'll leave that to another post focusing on those differences. Make sure you specify the correct sort order in the "(order by ---)" part of the analytic function.

   {with...}
   select  id, nm, rnk
     from  ( select  id, nm
                  ,  row_number() over (order by nm) as rnk
               from  w_data 
           )
    where  1=1
      and  rnk <= 4          /* top 4 records */
   -- and  rnk in (2,4,6)    /* rank is 2nd, 4th, or 6th */
   -- and  rnk = 1           /* first record */
   ;


Offset and Fetch

The "OFFSET" & "FETCH" offer a very similar functionality -- for more details, see my other post on these keywords. Put these after the sort (order by __ ). The "OFFSET" specifies how many rows to skip, and the "FETCH" specifies how many rows to return.


   Examples:
   
   a) offset 0 rows fetch next 10 rows only   -- get the top 10 rows
   
   b) offset 2 rows fetch next 1 row only     -- get the 3rd row 

   {with...}
   select  id, nm
     from  w_data 
    order  by nm
   OFFSET 2 rows
    FETCH next 3 rows only
   ;

No comments:

Post a Comment