Thursday, August 4, 2011

A basic Hierarchical (Connect-By) Query

Here is a sample data-set that shows a list of people with their associated "ID" numbers. The "prnt" column defines a hierarchical relationship by giving the "ID" number of the person directly above them in the hierarchy (i.e. their boss or parent). Notice that the first two records don't have a 'parent' -- they are top records in the hierarchy.

/* --- code --- */


     WITH  
       w_data  as 
         ( select   1 id, 'Albert'    nm, null prnt  from dual union all
           select   2   , 'Rhonda'      , null       from dual union all
           select   3   , 'Ralph'       ,    1       from dual union all
           select   4   , 'Erma'        ,    1       from dual union all
           select   5   , 'Leopold'     ,    2       from dual union all
           select   6   , 'Zanthia'     ,    3       from dual union all
           select   7   , 'Reginald'    ,    3       from dual union all
           select   8   , 'Hillary'     ,    4       from dual union all
           select   9   , 'Orpheus'     ,    4       from dual union all
           select  10   , 'Linda'       ,    4       from dual union all
           select  11   , 'Ronald'      ,    5       from dual union all
           select  12   , 'Deborah'     ,    5       from dual union all
           select  13   , 'Andrew'      ,    6       from dual union all
           select  14   , 'Maggie'      ,    6       from dual union all
           select  15   , 'Fletcher'    ,    6       from dual union all
           select  16   , 'Wanda'       ,    7       from dual union all
           select  17   , 'Dexter'      ,    8       from dual union all
           select  18   , 'Thelma'      ,    9       from dual union all
           select  19   , 'Wilson'      ,    9       from dual union all
           select  20   , 'Amanda'      ,   10       from dual union all
           select  21   , 'Thor'        ,   10       from dual union all
           select  22   , 'Glenda'      ,   11       from dual union all
           select  23   , 'Norman'      ,   11       from dual union all
           select  24   , 'Edith'       ,   11       from dual union all
           select  25   , 'Arvid'       ,   12       from dual union all
           select  26   , 'Wilma'       ,   12       from dual union all
           select  27   , 'Frederic'    ,   12       from dual union all
           select  28   , 'Unice'       ,   12       from dual 
         ) 
     select  *  from  w_data ;




Now, here's a simple query that uses the prior data and constructs a hierarchical list that clearly shows the parent-child connections. An explanation and the sample output follows. To run this query, combine the sample data from the WITH clause above with the query below.

/* --- code --- */
        
  
    /* Include the WITH clause sample data from above here. */      
    select  lpad( level, level*2 ) as "Level"
         ,  level 
         ,  id
         ,  nm
         ,  prnt
         ,  sys_connect_by_path( nm , ', ' ) as "Path"
      from  w_data
    START WITH ( id in (1,2) )
    CONNECT BY ( prnt = PRIOR id ) 
    ;



The components of the query are described below:

"START WITH"  The query needs to know how to locate the top record or records in the hierarchy. In this case it specifically states that the ID number must be in the list: (1,2). Any properly formed condition would work:
  • START WITH (prnt is NULL)                               -- those who don't have a parent listed
  • START WITH (nm='Albert' or nm='Rhonda')         -- a list of specific names
  • START WITH  (id in ( select id from w_data where prnt is NULL ))      -- even a query can work
You don't even have to 'start with' the top records in the source data. You can specify any record and the results will show that record as the top record and then trace the hierarchy down from their. Try the following and observe that you only get back the specified records and the ones below them.
  • START WITH (id in (3,9))
  • START WITH (nm in ('Linda'.'Reginald')
"CONNECT BY"  There needs to be a clear definition of how a parent record connects to a child record. The key word "PRIOR" is used to refer to the previous or 'higher' record in the hierarchy. In this case the "prnt" column of a record points to the PRIOR or higher record's "id" column. It doesn't matter which of the following you use, they both say the same thing.
  • CONNECT BY (prnt = PRIOR id)
  • CONNECT BY (PRIOR id = prnt)
However if you put the "PRIOR" in front of the "prnt" then you are trying to 'walk' the hierarchy backwards. (we'll get to an example of that).

"LEVEL" is a key word that indicates how deep that particular row will occur in the hierarchy. Note that the second column simply displays the "LEVEL" value where the first column uses the "LEVEL" value to print itself with additional spacing as it goes deeper & deeper into the hierarchy. This gives a nice visual component to the hierarchy that allows us to easily see its structure.

"SYS_CONNECT_BY_PATH"  is a hierarchical function that returns the complete path from the top record to the current record. In this case we are looking at a list of comma separated names. If we would like to see the path as "ID" numbers, we could use the following:
  • sys_connect_by_path(id,'/')




Here are a few other things you can do with simple Connect-By queries:

/* --- code --- */
    
      
     WITH  
       w_data  as 
         ( select   1 id, 'Albert'    nm, null prnt  from dual union all
           select   2   , 'Rhonda'      , null       from dual union all
           select   3   , 'Ralph'       ,    1       from dual union all
           select   4   , 'Erma'        ,    1       from dual union all
           select   5   , 'Leopold'     ,    2       from dual union all
           select   6   , 'Zanthia'     ,    3       from dual union all
           select   7   , 'Reginald'    ,    3       from dual union all
           select   8   , 'Hillary'     ,    4       from dual union all
           select   9   , 'Ronald'      ,    5       from dual union all
           select  10   , 'Deborah'     ,    5       from dual 
         ) 
     SELECT  lpad( level, level*3 ) as "Level"
          ,  id
          ,  nm
          ,  prnt
          ,  substr(sys_connect_by_path(nm,', '),3)  as "Path"
          ,  connect_by_root nm  as top_boss
          ,  connect_by_isleaf   as lowest
       from  w_data
     START WITH ( id in (1,2) )
     CONNECT BY ( prnt = PRIOR id  ) 
      order  siblings by id
     ;
   


First of all, we've fixed the leading comma in the "Path" column by using a substring function.

"CONNECT_BY_ROOT"  This opperator simply gives us the name, id number, or whatever it is we specify for the top-level record of the current hierarchy.

"CONNECT_BY_ISLEAF"  is a 0/1 flag that simply tells us if the current row is the lowest one on this branch of the hierarchy.

"ORDER SIBLINGS BY"  is just a modified version of the normal "Order By" clause. This allows us to order the sibling records of any parent by one of the attributes. In this example we may want to order the siblings by either id number or name. Notice below that Ralph and Erma are siblings and we have specified ordering them by id (3, 4). If we changed to ordering them by name then Erma's entire branch would come before Ralph's.




And finally, in the following example, we've selected all the "ID" numbers that are never listed as parents and used this as our "START WITH" values. We're going to walk-the-tree backwards, or turn the hierarchy upside-down. The only other thing we need to change is the "PRIOR" key word -- we'll now place it in front of the "prnt" column. This effectively reverses our hierarchy.

/* --- code --- */
          

     WITH  
       w_data  as 
         ( select   1 id, 'Albert'    nm, null prnt  from dual union all
           select   2   , 'Rhonda'      , null       from dual union all
           select   3   , 'Ralph'       ,    1       from dual union all
           select   4   , 'Erma'        ,    1       from dual union all
           select   5   , 'Leopold'     ,    2       from dual union all
           select   6   , 'Zanthia'     ,    3       from dual union all
           select   7   , 'Reginald'    ,    3       from dual union all
           select   8   , 'Hillary'     ,    4       from dual union all
           select   9   , 'Ronald'      ,    5       from dual union all
           select  10   , 'Deborah'     ,    5       from dual 
         ) 
     SELECT  lpad( level, level*3 ) as "Level"
          ,  id
          ,  nm
          ,  prnt
          ,  substr(sys_connect_by_path(nm,', '),3)  as "Path"
          ,  connect_by_root nm  as top_boss
          ,  connect_by_isleaf   as lowest
       from  w_data
     START WITH (id in (select id from w_data minus select prnt from w_data))
     CONNECT BY ( PRIOR prnt = id ) 
      order  siblings by id
     ;

   

Note the results. Lowest members now are listed at the #1 level and the path walks its way down to the original parents at the bottom.

1 comment:

  1. Gr8 Sharing dude... Thanks for giving such a detailed explanation on this connect by related stuffs.

    ReplyDelete