Thursday, July 18, 2013

Aggregated Lists: Decomposing an Aggregated List of Values (De-ListAgg)

De-ListAgg, DeListAgg, or Reversing Oracle's ListAgg() Function.

Given the following sample data set shown below, we want to decompose these comma delimited lists into separate values. Each row will become a group of rows containing the elements of its list. Notice that a list may have missing elemets (nothing between the commas). We will also assign a position sequence number to each element so that if needed, the aggregated list could be reconstructed in the same order from the resulting data.


   
  Convert comma separated, aggregated lists...

    GRP  LIST
    ---  ---------------------
    A    sam,joe,bob,ray
    B    (who),(where),(why)
    C    n,,,nulls,
    D    *?%,,=+@#,^$~!*?
    E    e,d,c,b,a
    F    We are done now
 
 
 
  Into a list of single elements.
 
    GRP  Position  Element
    ---  --------  ------------------
    A    1         sam
    A    2         joe
    A    3         bob
    A    4         ray
    ---  --------  ------------------
    B    1         (who)
    B    2         (where)
    B    3         (why)
    ---  --------  ------------------
    C    1         n
    C    2
    C    3
    C    4         nulls
    C    5
    ---  --------  ------------------
    D    1         *?%
    D    2
    D    3         =+@#
    D    4         ^$~!*?
    ---  --------  ------------------
    E    1         e
    E    2         d
    E    3         c
    E    4         b
    E    5         a
    ---  --------  ------------------
    F    1         We are done now
    ---  --------  ------------------

  We will even keep track of "Position" as shown.
  In that way, we will preserve the order of the 
  elements in the original aggregated list.
   





Here's The Data:

NOTE: The sample data set is given in a WITH clause. To run some of the queries below, you will need to paste the entire WITH clause onto the beginning of those queries.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   select  grp
        ,  list
     from  sample_data  
    order  by  grp ;






Now let's see if we can solve this.

Using the "Translate()" function we can null-out the commas -- then by comparing the before and after lengths, we can determine how many elements there are in each list.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  grp
        ,  list
        ,  (length(list)-length(translate(list,'x,','x'))+1) as val_cnt
     from  sample_data  
    order  by  grp ;




    
  Results:
      
    GRP  LIST                 VAL_CNT
    ---  -------------------  -------
    A    sam,joe,bob,ray            4 
    B    (who),(where),(why)        3 
    C    n,,,nulls,                 5 
    D    *?%,,=+@#,^$~!*?           4 
    E    e,d,c,b,a                  5 
    F    We are done now            1 





Expanding the Groups

We now want to create one row for each element in a group. The "ODCINumberList()" function generates a table with a row for each listed number -- in this example it creates 14 rows, more than we need for the given situation; the largest group contains only 5 elements. It does the job, but it would be nice to make this query work no matter what the size of the list (we'll get to that later). If we range-join to this table of numbers and only include the numbers that are equal-to or less-than the value count that we just found, then we get the proper number of rows for each group as shown. We'll call this the "Position" and we'll use it in the next step to cleanly extract the desired element from the list.


   
   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  b.column_value as position
     from  sample_data  a
     join  table(sys.ODCInumberList(1,2,3,4,5,6,7,8,9,10,11,12,13,14))  b  
       on  ( b.column_value <= (length(a.list)-length(translate(a.list,'~,','~'))+1) )
    order  by  a.grp, b.column_value ;




    
  Results:
      
    GRP  LIST                 POSITION
    ---  -------------------  --------
    A    sam,joe,bob,ray             1
    A    sam,joe,bob,ray             2
    A    sam,joe,bob,ray             3
    A    sam,joe,bob,ray             4
    B    (who),(where),(why)         1
    B    (who),(where),(why)         2
    B    (who),(where),(why)         3
    C    n,,,nulls,                  1
    C    n,,,nulls,                  2
    C    n,,,nulls,                  3
    C    n,,,nulls,                  4
    C    n,,,nulls,                  5
    D    *?%,,=+@#,^$~!*?            1
    D    *?%,,=+@#,^$~!*?            2
    D    *?%,,=+@#,^$~!*?            3
    D    *?%,,=+@#,^$~!*?            4
    E    e,d,c,b,a                   1
    E    e,d,c,b,a                   2
    E    e,d,c,b,a                   3
    E    e,d,c,b,a                   4
    E    e,d,c,b,a                   5
    F    We are done now             1





Extracting the Elements:

Now by using this newly found position, we can use the "Instr()" and "Substr()" functions to extract the given element from each list. We now have the desired solution -- this is one way to get the results.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  b.column_value as position
        ,  substr
             ( a.list
             , instr( ','||a.list||',', ',', 1, b.column_value) 
             , instr( ','||a.list||',', ',', 1, b.column_value+1) 
             - instr( ','||a.list||',', ',', 1, b.column_value)-1
             ) as elmnt
     from  sample_data  a
     join  table(sys.ODCInumberList(1,2,3,4,5,6,7,8,9,10,11,12,13,14))  b  
       on  ( b.column_value <= (length(a.list)-length(translate(a.list,'~,','~'))+1) )
    order  by  a.grp, b.column_value ;




    
  Results:
  
    GRP  LIST                 POSITION  ELMNT
    ---  -------------------  --------  -----------------
    A    sam,joe,bob,ray             1  sam
    A    sam,joe,bob,ray             2  joe
    A    sam,joe,bob,ray             3  bob
    A    sam,joe,bob,ray             4  ray
    ---  -------------------  --------  -----------------
    B    (who),(where),(why)         1  (who)
    B    (who),(where),(why)         2  (where)
    B    (who),(where),(why)         3  (why)
    ---  -------------------  --------  -----------------
    C    n,,,nulls,                  1  n
    C    n,,,nulls,                  2
    C    n,,,nulls,                  3
    C    n,,,nulls,                  4  nulls
    C    n,,,nulls,                  5
    ---  -------------------  --------  -----------------
    D    *?%,,=+@#,^$~!*?            1  *?%
    D    *?%,,=+@#,^$~!*?            2
    D    *?%,,=+@#,^$~!*?            3  =+@#
    D    *?%,,=+@#,^$~!*?            4  ^$~!*?
    ---  -------------------  --------  -----------------
    E    e,d,c,b,a                   1  e
    E    e,d,c,b,a                   2  d
    E    e,d,c,b,a                   3  c
    E    e,d,c,b,a                   4  b
    E    e,d,c,b,a                   5  a
    ---  -------------------  --------  -----------------
    F    We are done now             1  We are done now
    ---  -------------------  --------  -----------------





Extracting the Elements using Regular Expressions:

I much prefer to use the "RegExp_Substr()" function to extract the elements. The search patern ".?([^,]*)" searches for an optional comma followed by non-comma characters. It then returns the non-comma charaters as the element.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  b.column_value as position
        ,  regexp_substr( a.list, ',?([^,]*)', 1, b.column_value, 'i', 1 )  as elmnt
     from  sample_data  a
     join  table(sys.ODCInumberList(1,2,3,4,5,6,7,8,9,10,11,12,13,14))  b  
       on  ( b.column_value <= (length(a.list)-length(translate(a.list,'~,','~'))+1) )
    order  by  a.grp, b.column_value ;





Using Connect-By to Generate the Positions

Instead of the "ODCINumberList()" function we can use the Connect-By functionality. This still has the same problem of having to hard-code an upper limit to the number of elements.


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  a.list
        ,  seq as position
        ,  regexp_substr( a.list, ',?([^,]*)', 1, seq, 'i', 1 )  as elmnt
     from  sample_data  a
     join  ( select level seq from dual connect by level <= 500 )  b  
       on  ( regexp_count(a.list,',')+1 >= seq )
    order  by  a.grp, seq ;





Improved "Connect-By" solution

The Connect-By portion of the query can be modified to fix the upper-limit issue. With this solution you don't need to consider the maximum number of elements in the lists or hard-code them into the query.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   select  grp
        ,  level as pos
        ,  regexp_substr( ','||list, ',([^,]*)', 1, level, 'i', 1 )  as elmnt
     from  sample_data  
           CONNECT BY  
             ( regexp_count(list,',')+1 >= level    and
               prior grp = grp                      and
               prior dbms_random.value is not null       
             )
    order  by  grp, level ;




    
  Results:
      
    GRP  POS  ELMNT
    ---  ---  ----------------
    A      1  sam
    A      2  joe
    A      3  bob
    A      4  ray
    B      1  (who)
    B      2  (where)
    B      3  (why)
    C      1  n
    C      2
    C      3
    C      4  nulls
    C      5
    D      1  *?%
    D      2
    D      3  =+@#
    D      4  ^$~!*?
    E      1  e
    E      2  d
    E      3  c
    E      4  b
    E      5  a
    F      1  We are done now
    ---  ---  ----------------





Alternate Solution #1 -- Recursive WITH Clause

The list of numbers generated by the connect-by clause can be replaced with a recursive WITH clause (subquery factoring). This clause uses the RegExp_Count() function to determine the number of elements; it then starts with that number and counts down, producing a list of numbers for each group that will then be joined to the origial data.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   , w_rowgen ( grp, seq )  as
       ( select   grp, regexp_count(list,',')+1   from  sample_data       
         UNION ALL
         select   grp, seq-1   from  w_rowgen  where seq > 1
       )
   select  a.grp
        ,  b.seq
        ,  regexp_substr( a.list, ',?([^,]*)', 1, b.seq, 'i', 1 )  as elmnt
     from  sample_data  a
     join  w_rowgen     b  on ( a.grp = b.grp )
    order  by  a.grp, b.seq ;





Alternate Solution #2 -- Fully use the Recursive WITH Clause

And, if we're going to use the Recursive-WITH, we might as well use it to extract the elements from each group's list.


   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         --------------------------------------------------------
       )
   , w_rowgen ( grp, list, seq, elmnt )  as
       ( select  grp
              ,  list
              ,  regexp_count(list,',')+1  
              ,  regexp_substr( list, '^.*?,?([^,]*)$' , 1, 1, 'i', 1 )
           from  sample_data       
         UNION ALL
         select  grp
              ,  list
              ,  seq-1    
              ,  regexp_substr( list, ',?([^,]*)', 1, seq-1, 'i', 1 )
           from  w_rowgen  where seq > 1
       )
   select  grp, seq, elmnt
     from  w_rowgen    
    order  by  grp, seq ;
---------------------------------------------------------------------
-- Update: (2016-04-07)
--
-- The following, in my oppinion, is a better solution. There are
-- slight changes from the previous version. It will now handle
-- leading and trailing NULL values in the lists properly.
---------------------------------------------------------------------
   WITH
     sample_data  as
       ( select  null  as grp
              ,  null  as list    
           from  dual  where 1=0
         --------------------------------------------------------
         union all select  'A',  'sam,joe,bob,ray'      from dual 
         union all select  'B',  '(who),(where),(why)'  from dual 
         union all select  'C',  'n,,,nulls,'           from dual 
         union all select  'D',  '*?%,,=+@#,^$~!*?'     from dual 
         union all select  'E',  'e,d,c,b,a'            from dual 
         union all select  'F',  'We are done now'      from dual 
         union all select  'm',  ''                     from dual 
         union all select  'n',  ','                    from dual 
         union all select  'o',  ',,'                   from dual 
         union all select  'p',  ',x,'                  from dual 
         union all select  'q',  ',aa,,bb'              from dual 
         union all select  'r',  'mmm,oo,'              from dual 
         --------------------------------------------------------
       )
   , w_rowgen (grp, list, seq, val)  as
       ( select  grp,  list
              ,  regexp_count(list, ',')+1  
              ,  regexp_substr(list, ',?([^,]*)$', 1, 1, 'i', 1)
           from  sample_data       
         UNION ALL
         select  grp,  list
              ,  seq-1    
              ,  regexp_substr(list, '([^,]*),', 1, seq-1, 'i', 1)
           from  w_rowgen  
          where  seq > 1
       )
   select  grp, seq, val
     from  w_rowgen    
    order  by  grp, seq ;














Alternate Solution #3 -- Using Oracle's XML Functionality

Finally, we could use Oracle's XML functionality to extract the elements:


   (-- Prepend the complete WITH clause data here to run the query. --)
   select  a.grp
        ,  dense_rank() over ( partition by a.list  order by rownum )  as seq
        ,  trim( x.column_value.extract( 'e/text()' ) )  as elmnt
     from  sample_data  a
     join  table 
             ( xmlsequence
                 ( xmltype
                     (   '<e><e>' 
                     ||  replace( a.list, ',', '</e><e>' )
                     ||  '</e></e>' ).extract ( 'e/e' ) 
                 )
             )  x  
       on  ( 1=1 )
    order  by grp, seq ;






That's it... See my other post on creating Aggregated Lists from expanded elements.

8 comments:

  1. I've been surfing online more than 2 hours today, yet I never found
    any interesting article like yours. It's pretty worth enough for me.

    In my view, if all web owners and bloggers made good content as
    you did, the net will be much more useful than ever before.


    Look into my page ... holztreppen

    ReplyDelete
  2. This is an excellent resource!

    ReplyDelete
  3. This site was... how do I say it? Relevant!! Finally I've found something
    that helped me. Kudos!

    Look at my page :: klimatyzacja

    ReplyDelete
  4. Excellent work, Steve. You really helped me out.

    ReplyDelete
  5. Really really useful, this has helped me immensely. Thank you

    ReplyDelete
  6. Fantastic article - many thanks Steve!

    ReplyDelete