Friday, January 4, 2013

WITH Clause: Sub-queries and Sub-query Factoring (Great for sample or test data!)

The Sub-Query


One of the features of SQL that really makes it sing is its ability to query a query -- or in other words: to make use of sub-queries. I've seen lots of queries with nested sub-queries that could actually be written as a single query with no sub-queries -- here's a quick example:

-----> Example SQL Code Listing-#1:


   select  patient_id, order_id
     from  ( select  patient_id, order_id
               from  ( select  patient_id, admit_dt, order_id
                         from  ( select  patient_id, admit_dt, facility, order_id
                                   from  inpatient_admit
                                  where  diagnosis_cd in ( 257, 386, 1074, 2793 )
                               ) 
                        where  facility = 1389       
                     )
              where  admit_dt >= date '2008-01-01'
                and  admit_dt <  date '2009-01-01'
           )
    where  order_id is not NULL
      and  patient_id between 580000 
                          and 589999
    order  by  patient_id, order_id ;
   
------------------------
-- Written better as:
------------------------
   
   select  patient_id, order_id
     from  inpatient_admit
    where  diagnosis_cd in ( 257, 386, 1074, 2793 )
      and  facility = 1389
      and  admit_dt >= date '2008-01-01'
      and  admit_dt <  date '2009-01-01'
      and  order_id is not NULL
      and  patient_id between 580000 
                          and 589999
    order  by  patient_id, order_id ;



In cases like this, there's no need for a sub-query (or nested sub-queries 3 levels deep). But, in many queries they become very useful. They can be used just to make the query more understandable, or they can be completely necessary to the functionality of a query.

The WITH Clause


One of the nicest uses of sub-queries is in the WITH clause. This is also referred to as "Sub-Query Factoring". The WITH clause allows you to take a sub-query and move it, out of the way, to the top of your query. There you give it a name and refer to it, in the main query, just as you would refer to a table. This can greatly improve the readability of a query but where a sub-query is used multiple times in a main query, this can improve performance -- the WITH clause query runs once and is stored in a temporary table. It is then used and reused in the main query as needed.

Here's an example of a query that uses the same sub-query twice. It is first used to pull a cohort of records from the "visit" table. It is then used a second time to insure that the "recommended_by" person of each selected record is not some member of the cohort. In the second version of the query, the sub-query is "factored out" into the WITH clause and given the name of "cohort". That WITH-Clause sub-query is then used twice in the main query. The result is a simpler, neater, main query that would only need to run the sub-query one time.

-----> Example SQL Code Listing-#2:


   select  v.start_dt
        ,  v.person_id
        ,  v.person_name
        ,  v.description
        ,  v.recommended_by
     from  visit  v
     join  ( select  person_id
               from  population
              where  category_flg = 3
                and  birth_dt between date '1970-01-01'
                                  and date '1970-12-31'
                and  gender = 'M'
           )  c  on ( v.person_id = c.person_id )
    where  v.recommended_by not in
             ( select  person_id
                 from  population
                where  category_flg = 3
                  and  birth_dt between date '1970-01-01'
                                    and date '1970-12-31'
                  and  gender = 'M'
             )
      and  v.active = 'Y' 
    order  by  v.person_id ;
   
   
------------------------
-- Using the WITH Clause
------------------------
   
   WITH  cohort  as
     ( select  person_id
         from  population
        where  category_flg = 3
          and  birth_dt between date '1970-01-01'
                            and date '1970-12-31'
          and  gender = 'M'
     )
   select  v.start_dt
        ,  v.person_id
        ,  v.person_name
        ,  v.description
        ,  v.recommended_by
     from  visit   v
     join  cohort  c  on ( v.person_id = c.person_id )
    where  v.recommended_by not in
             ( select person_id from cohort )
      and  v.active = 'Y' 
    order  by  v.person_id ;
   



Sample or Test data.


One nice use of sub-queries is that you can quickly create a sample data set that will allow you to try out or learn how to use a particular function or SQL technique. In this case, I've created a 5 row "table" so that I can figure out how to use a simple "group by" aggregate query and the "sum( )" function.

Notice that in the FROM clause I select each row of sample data from dual and use the "Union" to tack them all together. I need to have "column names" assigned to the invented columns, but I only need to include them on the  1st row.  These 5 rows of data enclosed by parentheses act just like a table in the query. I can quickly change any value or add new rows to see how it affects my query logic.

-----> Example SQL Code Listing-#3:


   select  usr_type
        ,  sum(credit_amt)  as total_group_credit
     from  (       select  12 as user_id, date '2010-04-28' as start_dt, 'A' as usr_type, 234.76 as credit_amt  from dual
             union select  13           , date '2011-11-11'            , 'B'            ,  78.82                from dual
             union select  15           , date '2004-07-24'            , 'A'            , 138.12                from dual
             union select  16           , date '2012-02-09'            , 'C'            ,  24.47                from dual
             union select  17           , date '2009-12-15'            , 'C'            , 195.08                from dual
           ) 
    group  by  usr_type
    order  by  usr_type ;



In reality, I rarely put sample or test data in the FROM clause. I put it at the beginning in the WITH clause -- and the query then looks like a normal query pulling data from a named table of data:

-----> Example SQL Code Listing-#4:


   with sample_data as
     (       select  12 as user_id, date '2010-04-28' as start_dt, 'A' as usr_type, 234.76 as credit_amt  from dual
       union select  13           , date '2011-11-11'            , 'B'            ,  78.82                from dual
       union select  15           , date '2004-07-24'            , 'A'            , 138.12                from dual
       union select  16           , date '2012-02-09'            , 'C'            ,  24.47                from dual
       union select  17           , date '2009-12-15'            , 'C'            , 195.08                from dual
     ) 
   select  usr_type
        ,  sum(credit_amt)  as total_group_credit
     from  sample_data
    group  by  usr_type
    order  by  usr_type ;




As a final touch, when I'm creating examples or playing with SQL logic, I often put all the column names at the top in the first row of data having NULL values. The "where 1=0" insures that the NULL values row doesn't show up in my sample data (but the columns now all have names), and the actual rows of data are all 'UNIONed' in below.

-----> Example SQL Code Listing-#5:


   with  
     sample_data  as
       ( select  NULL  as user_id
              ,  NULL  as start_dt
              ,  NULL  as usr_type
              ,  NULL  as credit_amt    from dual  where 1=0
         ------------
         union select  12, date '2010-04-28', 'A', 234.76  from dual
         union select  13, date '2011-11-11', 'B',  78.82  from dual
         union select  15, date '2004-07-24', 'A', 138.12  from dual
         union select  16, date '2012-02-09', 'C',  24.47  from dual
         union select  17, date '2009-12-15', 'C', 195.08  from dual
         ------------
       ) 
   select  *
     from  sample_data ;



Running the above data-sample and query results in output that looks like it came from a table:










Sub-queries are great. If you can put them in the WITH clause they usually look much nicer. And the WITH clause is a great way to create simple test data or share SQL examples with others.

No comments:

Post a Comment