Friday, July 19, 2013

Aggregated Lists: Creating Aggregated List Strings - ListAgg() Function.


Oracle's ListAgg() Function -- Or Other Methods.

Given the following sample data shown below, we want to aggregate the elements into a single, comma separated list for each group. The position is given to define the order in which the elements will appear in the list. Also, the NULL elements should not be ignored -- they should show up as nothing between the commas that define their potential placement in the list.


   
  Convert groups of elements into
  comma separated, aggregated lists...

    GRP  POS  ELMNT
    ---  ---  -----
    A      1  bob
    A      2
    A      3
    A      4  joe
    A      5
    ---  ---  -----
    B      1  342
    B      2  0
    B      3  47
    ---  ---  -----
    C      1  # #
    ---  ---  -----
    D      1  big
    D      2  foot
    ---  ---  -----
    E      1  a
    E      2  b
    E      3
    E      4  c
    E      5  d
    E      6  e
    ---  ---  -----


  For each group above, combine all the
  elements into a single, comma separated
  list of elements in the order given by
  the position.
  
    GRP  LIST
    ---  -----------
    A    bob,,,joe,
    B    342,0,47
    C    # #
    D    big,foot
    E    a,b,,c,d,e
    ---  -----------
    



Here's The Data:

NOTE: The sample data set is given in a WITH clause. To run some of the queries below, you will need to paste the entire WITH clause onto the beginning of those queries.


   WITH 
     sample_data   as
       ( select  NULL  as grp
              ,  NULL  as pos
              ,  NULL  as elmnt
           from  dual  where 1=0
         --------------------------------------
         union select  'A', 1, 'bob'  from dual
         union select  'A', 2,  null  from dual
         union select  'A', 3,  null  from dual
         union select  'A', 4, 'joe'  from dual
         union select  'A', 5,  null  from dual
         union select  'B', 1, '342'  from dual
         union select  'B', 2, '0'    from dual
         union select  'B', 3, '47'   from dual
         union select  'C', 1, '# #'  from dual
         union select  'D', 1, 'big'  from dual
         union select  'D', 2, 'foot' from dual
         union select  'E', 1, 'a'    from dual
         union select  'E', 2, 'b'    from dual
         union select  'E', 3,  null  from dual
         union select  'E', 4, 'c'    from dual
         union select  'E', 5, 'd'    from dual
         union select  'E', 6, 'e'    from dual
         --------------------------------------
       )
   select  grp
        ,  pos
        ,  elmnt
     from  sample_data
    order  by  grp, pos ;
    



    
  The Data:
      
    GRP  POS  ELMNT
    ---  ---  -----
    A      1  bob
    A      2
    A      3
    A      4  joe
    A      5
    B      1  342
    B      2  0
    B      3  47
    C      1  # #
    D      1  big
    D      2  foot
    E      1  a
    E      2  b
    E      3
    E      4  c
    E      5  d
    E      6  e
    ---  ---  -----




Using ListAgg() as an Aggregate (Group-By) Function

Notice how the simple use of ListAgg() ignores the NULL values. To recognize the NULLs we first substitute some dummy value for the null "~"; then outside of the ListAgg() function we replace the dummy value with NULL. In this way we can overcome the limitation of Oracl's ListAgg() function.


   
   (-- Prepend the complete WITH clause data here to run the query. --)
   select  grp
        ,  ListAgg( elmnt, ',' ) within group (order by pos)  as list
     from  sample_data 
    group  by  grp ;




     
  First Results:
       
    GRP  LIST
    ---  ---------
    A    bob,joe
    B    342,0,47
    C    # #
    D    big,foot
    E    a,b,c,d,e
 



   (-- Prepend the complete WITH clause data here to run the query. --)
   select  grp
        ,  replace
             ( ListAgg(nvl(elmnt,'~'), ',') within group (order by pos)
             , '~', null 
             )  as list
     from  sample_data 
    group  by  grp ;




     
  Correct Results:
  
    GRP  LIST
    ---  ----------
    A    bob,,,joe,
    B    342,0,47
    C    # #
    D    big,foot
    E    a,b,,c,d,e
 



As a Side Note...

The ListAgg() function can be used as an "Analytic" function. The syntax is a little different from other analytic functions, but the results look like you would expect.


   
   (-- Prepend the complete WITH clause data here to run the query. --)
   select  grp
        ,  pos
        ,  elmnt
        ,  replace
             ( listagg(nvl(elmnt,'~'), ',') 
                 within group (order by pos) 
                 over ( partition by grp )
             , '~'
             , NULL 
             )  as list
     from  sample_data 
    order  by  grp, pos ;




    
  Results:

    GRP  POS  ELMNT  LIST
    ---  ---  -----  -----------
    A      1  bob    bob,,,joe,
    A      2         bob,,,joe,
    A      3         bob,,,joe,
    A      4  joe    bob,,,joe,
    A      5         bob,,,joe,
    ---  ---  -----  -----------
    B      1  342    342,0,47
    B      2  0      342,0,47
    B      3  47     342,0,47
    ---  ---  -----  -----------
    C      1  # #    # #
    ---  ---  -----  -----------
    D      1  big    big,foot
    D      2  foot   big,foot
    ---  ---  -----  -----------
    E      1  a      a,b,,c,d,e
    E      2  b      a,b,,c,d,e
    E      3         a,b,,c,d,e
    E      4  c      a,b,,c,d,e
    E      5  d      a,b,,c,d,e
    E      6  e      a,b,,c,d,e
    ---  ---  -----  -----------




Recursive WITH Instead of ListAgg()

Oracle's Recursive WITH (subquery factoring) can be used instead of the ListAgg() function. The recursive part of the WITH clause is the "w_agg" portion; the list of parameters that immediately follow the "w_agg" tells you that it's different from the other parts of a normal WITH clause. The pre-Union clause of the Recursive WITH clause returns the first element of each group allong with the maximum number of elements in each group. The post-Union clause joins the running data with the "w_agg" data to add on each additional element.


   
   WITH 
     sample_data   as
       ( select  NULL  as grp
              ,  NULL  as pos
              ,  NULL  as elmnt
           from  dual  where 1=0
         --------------------------------------
         union select  'A', 1, 'bob'  from dual
         union select  'A', 2,  null  from dual
         union select  'A', 3,  null  from dual
         union select  'A', 4, 'joe'  from dual
         union select  'A', 5,  null  from dual
         union select  'B', 1, '342'  from dual
         union select  'B', 2, '0'    from dual
         union select  'B', 3, '47'   from dual
         union select  'C', 1, '# #'  from dual
         union select  'D', 1, 'big'  from dual
         union select  'D', 2, 'foot' from dual
         union select  'E', 1, 'a'    from dual
         union select  'E', 2, 'b'    from dual
         union select  'E', 3,  null  from dual
         union select  'E', 4, 'c'    from dual
         union select  'E', 5, 'd'    from dual
         union select  'E', 6, 'e'    from dual
         --------------------------------------
       )
   , w_agg ( grp, pos, list, mpos ) as
       ( select  grp, pos
              ,  cast(elmnt as varchar2(2000))
              ,  mpos
           from  ( select  grp, pos, elmnt
                        ,  max(pos) over (partition by grp)  as mpos  
                     from  sample_data
                 )
          where  pos = 1
       UNION ALL
         select  a.grp
              ,  b.pos
              ,  a.list || ',' || b.elmnt
              ,  a.mpos
           from  w_agg        a
           join  sample_data  b  
             on  ( a.grp = b.grp  and  a.pos = (b.pos-1) ) 
       )
   select  grp, list
     from  w_agg
    where  pos = mpos
    order  by  grp ;



    
  Results:
      
    GRP  LIST
    ---  ----------
    A    bob,,,joe,
    B    342,0,47
    C    # #
    D    big,foot
    E    a,b,,c,d,e




A "Connect-By" Solution

The "Connect-By" feature of Oracle's SQL can also be used to process the solution.


   
   (-- Prepend the complete WITH clause data here to run the query. --)
   select  grp
        ,  list
     from  ( select  grp
                  ,  pos
                  ,  max(pos) over (partition by grp) as mpos
                  ,  substr(sys_connect_by_path(elmnt, ','),2) as list
               from  sample_data
            connect  by ( prior grp = grp  and  prior pos = pos-1 )
                     start with  pos = 1
           )
    where  pos = mpos
    order  by  grp ;




Another "Connect-By" Solution:

This solution assumes that you can just take the longest list in each group as the final answer. The previous solution explicitly found the final list that was constructed for each group ( pos = mpos ).


   
   (-- Prepend the complete WITH clause data here to run the query. --)
   select  grp
        ,  substr( max(sys_connect_by_path(elmnt, ',')), 2 ) as list
     from  sample_data
    group  by  grp
  connect  by ( prior grp = grp  and  prior pos = pos-1 )
           start with  pos = 1
    order  by  grp ;





That's it... See my other post on decomposing Aggregated Lists into individual elements.

Thursday, July 18, 2013

Aggregated Lists: Decomposing an Aggregated List of Values (De-ListAgg)

De-ListAgg, DeListAgg, or Reversing Oracle's ListAgg() Function.

Given the following sample data set shown below, we want to decompose these comma delimited lists into separate values. Each row will become a group of rows containing the elements of its list. Notice that a list may have missing elemets (nothing between the commas). We will also assign a position sequence number to each element so that if needed, the aggregated list could be reconstructed in the same order from the resulting data.


   
  Convert comma separated, aggregated lists...

    GRP  LIST
    ---  ---------------------
    A    sam,joe,bob,ray
    B    (who),(where),(why)
    C    n,,,nulls,
    D    *?%,,=+@#,^$~!*?
    E    e,d,c,b,a
    F    We are done now
 
 
 
  Into a list of single elements.
 
    GRP  Position  Element
    ---  --------  ------------------
    A    1         sam
    A    2         joe
    A    3         bob
    A    4         ray
    ---  --------  ------------------
    B    1         (who)
    B    2         (where)
    B    3         (why)
    ---  --------  ------------------
    C    1         n
    C    2
    C    3
    C    4         nulls
    C    5
    ---  --------  ------------------
    D    1         *?%
    D    2
    D    3         =+@#
    D    4         ^$~!*?
    ---  --------  ------------------
    E    1         e
    E    2         d
    E    3         c
    E    4         b
    E    5         a
    ---  --------  ------------------
    F    1         We are done now
    ---  --------  ------------------

  We will even keep track of "Position" as shown.
  In that way, we will preserve the order of the 
  elements in the original aggregated list.
   





Here's The Data:

NOTE: The sample data set is given in a WITH clause. To run some of the queries below, you will need to paste the entire WITH clause onto the beginning of those queries.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   select  grp
        ,  list
     from  sample_data  
    order  by  grp ;






Now let's see if we can solve this.

Using the "Translate()" function we can null-out the commas -- then by comparing the before and after lengths, we can determine how many elements there are in each list.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  grp
        ,  list
        ,  (length(list)-length(translate(list,'x,','x'))+1) as val_cnt
     from  sample_data  
    order  by  grp ;




    
  Results:
      
    GRP  LIST                 VAL_CNT
    ---  -------------------  -------
    A    sam,joe,bob,ray            4 
    B    (who),(where),(why)        3 
    C    n,,,nulls,                 5 
    D    *?%,,=+@#,^$~!*?           4 
    E    e,d,c,b,a                  5 
    F    We are done now            1 





Expanding the Groups

We now want to create one row for each element in a group. The "ODCINumberList()" function generates a table with a row for each listed number -- in this example it creates 14 rows, more than we need for the given situation; the largest group contains only 5 elements. It does the job, but it would be nice to make this query work no matter what the size of the list (we'll get to that later). If we range-join to this table of numbers and only include the numbers that are equal-to or less-than the value count that we just found, then we get the proper number of rows for each group as shown. We'll call this the "Position" and we'll use it in the next step to cleanly extract the desired element from the list.


   
   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  b.column_value as position
     from  sample_data  a
     join  table(sys.ODCInumberList(1,2,3,4,5,6,7,8,9,10,11,12,13,14))  b  
       on  ( b.column_value <= (length(a.list)-length(translate(a.list,'~,','~'))+1) )
    order  by  a.grp, b.column_value ;




    
  Results:
      
    GRP  LIST                 POSITION
    ---  -------------------  --------
    A    sam,joe,bob,ray             1
    A    sam,joe,bob,ray             2
    A    sam,joe,bob,ray             3
    A    sam,joe,bob,ray             4
    B    (who),(where),(why)         1
    B    (who),(where),(why)         2
    B    (who),(where),(why)         3
    C    n,,,nulls,                  1
    C    n,,,nulls,                  2
    C    n,,,nulls,                  3
    C    n,,,nulls,                  4
    C    n,,,nulls,                  5
    D    *?%,,=+@#,^$~!*?            1
    D    *?%,,=+@#,^$~!*?            2
    D    *?%,,=+@#,^$~!*?            3
    D    *?%,,=+@#,^$~!*?            4
    E    e,d,c,b,a                   1
    E    e,d,c,b,a                   2
    E    e,d,c,b,a                   3
    E    e,d,c,b,a                   4
    E    e,d,c,b,a                   5
    F    We are done now             1





Extracting the Elements:

Now by using this newly found position, we can use the "Instr()" and "Substr()" functions to extract the given element from each list. We now have the desired solution -- this is one way to get the results.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  b.column_value as position
        ,  substr
             ( a.list
             , instr( ','||a.list||',', ',', 1, b.column_value) 
             , instr( ','||a.list||',', ',', 1, b.column_value+1) 
             - instr( ','||a.list||',', ',', 1, b.column_value)-1
             ) as elmnt
     from  sample_data  a
     join  table(sys.ODCInumberList(1,2,3,4,5,6,7,8,9,10,11,12,13,14))  b  
       on  ( b.column_value <= (length(a.list)-length(translate(a.list,'~,','~'))+1) )
    order  by  a.grp, b.column_value ;




    
  Results:
  
    GRP  LIST                 POSITION  ELMNT
    ---  -------------------  --------  -----------------
    A    sam,joe,bob,ray             1  sam
    A    sam,joe,bob,ray             2  joe
    A    sam,joe,bob,ray             3  bob
    A    sam,joe,bob,ray             4  ray
    ---  -------------------  --------  -----------------
    B    (who),(where),(why)         1  (who)
    B    (who),(where),(why)         2  (where)
    B    (who),(where),(why)         3  (why)
    ---  -------------------  --------  -----------------
    C    n,,,nulls,                  1  n
    C    n,,,nulls,                  2
    C    n,,,nulls,                  3
    C    n,,,nulls,                  4  nulls
    C    n,,,nulls,                  5
    ---  -------------------  --------  -----------------
    D    *?%,,=+@#,^$~!*?            1  *?%
    D    *?%,,=+@#,^$~!*?            2
    D    *?%,,=+@#,^$~!*?            3  =+@#
    D    *?%,,=+@#,^$~!*?            4  ^$~!*?
    ---  -------------------  --------  -----------------
    E    e,d,c,b,a                   1  e
    E    e,d,c,b,a                   2  d
    E    e,d,c,b,a                   3  c
    E    e,d,c,b,a                   4  b
    E    e,d,c,b,a                   5  a
    ---  -------------------  --------  -----------------
    F    We are done now             1  We are done now
    ---  -------------------  --------  -----------------





Extracting the Elements using Regular Expressions:

I much prefer to use the "RegExp_Substr()" function to extract the elements. The search patern ".?([^,]*)" searches for an optional comma followed by non-comma characters. It then returns the non-comma charaters as the element.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  b.column_value as position
        ,  regexp_substr( a.list, ',?([^,]*)', 1, b.column_value, 'i', 1 )  as elmnt
     from  sample_data  a
     join  table(sys.ODCInumberList(1,2,3,4,5,6,7,8,9,10,11,12,13,14))  b  
       on  ( b.column_value <= (length(a.list)-length(translate(a.list,'~,','~'))+1) )
    order  by  a.grp, b.column_value ;





Using Connect-By to Generate the Positions

Instead of the "ODCINumberList()" function we can use the Connect-By functionality. This still has the same problem of having to hard-code an upper limit to the number of elements.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  seq as position
        ,  regexp_substr( a.list, ',?([^,]*)', 1, seq, 'i', 1 )  as elmnt
     from  sample_data  a
     join  ( select level seq from dual connect by level <= 500 )  b  
       on  ( regexp_count(a.list,',')+1 >= seq )
    order  by  a.grp, seq ;





Improved "Connect-By" solution

The Connect-By portion of the query can be modified to fix the upper-limit issue. With this solution you don't need to consider the maximum number of elements in the lists or hard-code them into the query.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   select  grp
        ,  level as pos
        ,  regexp_substr( ','||list, ',([^,]*)', 1, level, 'i', 1 )  as elmnt
     from  sample_data  
           CONNECT BY  
             ( regexp_count(list,',')+1 >= level    and
               prior grp = grp                      and
               prior dbms_random.value is not null       
             )
    order  by  grp, level ;




    
  Results:
      
    GRP  POS  ELMNT
    ---  ---  ----------------
    A      1  sam
    A      2  joe
    A      3  bob
    A      4  ray
    B      1  (who)
    B      2  (where)
    B      3  (why)
    C      1  n
    C      2
    C      3
    C      4  nulls
    C      5
    D      1  *?%
    D      2
    D      3  =+@#
    D      4  ^$~!*?
    E      1  e
    E      2  d
    E      3  c
    E      4  b
    E      5  a
    F      1  We are done now
    ---  ---  ----------------





Alternate Solution #1 -- Recursive WITH Clause

The list of numbers generated by the connect-by clause can be replaced with a recursive WITH clause (subquery factoring). This clause uses the RegExp_Count() function to determine the number of elements; it then starts with that number and counts down, producing a list of numbers for each group that will then be joined to the origial data.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   , w_rowgen ( grp, seq )  as
       ( select   grp, regexp_count(list,',')+1   from  sample_data       
         UNION ALL
         select   grp, seq-1   from  w_rowgen  where seq > 1
       )
   select  a.grp
        ,  b.seq
        ,  regexp_substr( a.list, ',?([^,]*)', 1, b.seq, 'i', 1 )  as elmnt
     from  sample_data  a
     join  w_rowgen     b  on ( a.grp = b.grp )
    order  by  a.grp, b.seq ;





Alternate Solution #2 -- Fully use the Recursive WITH Clause

And, if we're going to use the Recursive-WITH, we might as well use it to extract the elements from each group's list.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   , w_rowgen ( grp, list, seq, elmnt )  as
       ( select  grp
              ,  list
              ,  regexp_count(list,',')+1  
              ,  regexp_substr( list, '^.*?,?([^,]*)$' , 1, 1, 'i', 1 )
           from  sample_data       
         UNION ALL
         select  grp
              ,  list
              ,  seq-1    
              ,  regexp_substr( list, ',?([^,]*)', 1, seq-1, 'i', 1 )
           from  w_rowgen  where seq > 1
       )
   select  grp, seq, elmnt
     from  w_rowgen    
    order  by  grp, seq ;
---------------------------------------------------------------------
-- Update: (2016-04-07)
--
-- The following, in my oppinion, is a better solution. There are
-- slight changes from the previous version. It will now handle
-- leading and trailing NULL values in the lists properly.
---------------------------------------------------------------------
   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         union all select  'm',  ''                     from dual 
         union all select  'n',  ','                    from dual 
         union all select  'o',  ',,'                   from dual 
         union all select  'p',  ',x,'                  from dual 
         union all select  'q',  ',aa,,bb'              from dual 
         union all select  'r',  'mmm,oo,'              from dual 
         --------------------------------------------------------
       )
   , w_rowgen (grp, list, seq, val)  as
       ( select  grp,  list
              ,  regexp_count(list, ',')+1  
              ,  regexp_substr(list, ',?([^,]*)$', 1, 1, 'i', 1)
           from  sample_data       
         UNION ALL
         select  grp,  list
              ,  seq-1    
              ,  regexp_substr(list, '([^,]*),', 1, seq-1, 'i', 1)
           from  w_rowgen  
          where  seq > 1
       )
   select  grp, seq, val
     from  w_rowgen    
    order  by  grp, seq ;














Alternate Solution #3 -- Using Oracle's XML Functionality

Finally, we could use Oracle's XML functionality to extract the elements:


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  dense_rank() over ( partition by a.list  order by rownum )  as seq
        ,  trim( x.column_value.extract( 'e/text()' ) )  as elmnt
     from  sample_data  a
     join  table 
             ( xmlsequence
                 ( xmltype
                     (   '<e><e>' 
                     ||  replace( a.list, ',', '</e><e>' )
                     ||  '</e></e>' ).extract ( 'e/e' ) 
                 )
             )  x  
       on  ( 1=1 )
    order  by grp, seq ;






That's it... See my other post on creating Aggregated Lists from expanded elements.