For some of the simple ones, the functions ROUND( ) and TRUNC( ) are very effective. For example, if you only want to look at your date-time values to the nearest hour you could group-by the ROUND(dts, 'HH' ). Or if you want to eliminate the time component of the dates and just use the date part you could group-by TRUNC(dts). The following are some of the common examples:
Assuming "dts" is your date column, the following functions return date values as described: TRUNC( dts ) Time component set to 00:00:00. ROUND( dts ) Rounded to the nearest day. TRUNC( dts, 'MI' ) All seconds set to 00. ROUND( dts, 'MI' ) Rounded to the nearest minute. TRUNC( dts, 'HH' ) All minutes and seconds set to 00:00. ROUND( dts, 'HH' ) Rounded to the nearest hour. TRUNC( dts, 'MM' ) Dates adjusted to 1st day of its month. TRUNC( dts, 'Q' ) Dates adjusted to 1st day of its quarter. TRUNC( dts, 'YYYY' ) Dates adjusted to 1st day of its year. Note: this is not a complete list of all the ways you can use these functions.
As an example, you could use code similar to the following to count up the rows in your table grouping them by the hour in which the "dts" value fell:
/* --- code --- */ select trunc( dts, 'HH') as hour_buckets , count(*) from sometable group by trunc( dts, 'HH' ) ;
The Problem & The Solution:
The previous functions work quite well until you need to group up your date-time values by something like 10 minute buckets, or 6 hour buckets. For these requirements it's not quite so easy; but by using the formula given below, you can bucket your data by any number of minutes: 30 minutes, 2880 minutes (2days), or even 604,800 minutes (1 week).
Here's the basic formula. (date '1900-01-01' + floor(round(( dts - date '1900-01-01' )*1440/ m, 12))* m /1440) * Replace the "dts" with your date column or value. * Replace the two "m" values with the number of minutes by which you wish to bucket the data. * The date (date '1900-01-01') is any arbitrary date, it just needs to be the same in both places in the formula. To keep things neat, it should also have a time component of 00:00:00. "Trunc(sysdate)" would work. * Note: 1440 is the number of minutes in a day, and the 12 is just used to round the result to 12 decimal places. The above formula will then give you the "dts" date values bucketed to "m" minute intervals.
Example:
Here's a WITH clause that will generate 10,000 random date-time values for the current week:
/* --- code --- */ WITH w_random_dates as ( select trunc(sysdate,'IW') + (mod(abs(dbms_random.random),7)) + (mod(abs(dbms_random.random),86400)/60/60/24) as dts from dual connect by level <= 10000 ) select * from w_random_dates ;
Now we can use the formula to modify the "dts" values into bucketed values. Run the following and observe how the dates are recalculated into 45 minute increments.
/* --- code --- */ /* Include the WITH clause from the previous example here */ select dts , grpd_dts from ( select dts , (date '1900-01-01' + floor(round(( dts - date '1900-01-01' )*1440/ 45, 12))* 45 /1440) as grpd_dts from w_random_dates ) order by 1 ;
Now observe how you can group by the bucketed date-time values and count up the number that fall into each bucket.
/* --- code --- */ /* Include the WITH clause from the previous example here */ select grpd_dts , count(*) from ( select dts , (date '1900-01-01' + floor(round(( dts - date '1900-01-01' )*1440/ 45, 12))* 45 /1440) as grpd_dts from w_random_dates ) group by grpd_dts order by 1 ;
Explanation:
Here's an example that attemptes to explain the logic of the formula. The WITH clause generates a date-time value for every minute of the current date; and it uses last night at midnight (time = 00:00:00) for the arbitrary date "arbdt". This example buckets up the date values into 8 minute intervals. The columns show each step in the formula's calculation with a brief description in the comment. Run this & examine the output.
/* --- code --- */ WITH w_each_minute_of_day as ( select trunc(sysdate)+((rownum-1)/1440) as dts , trunc(sysdate) as arbdt from dual connect by level <= 1440 ) select to_char(dts, 'YYYY-Mon-DD HH24:MI') as "Date" , (dts-arbdt) as a /* Decimal part-of-day since midnight (arbitrary day). */ , (dts-arbdt)*1440 as b /* # of minutes since midnight. */ , (dts-arbdt)*1440/8 as c /* # of 8-minute intervals since midnight. */ , round((dts-arbdt)*1440/8,12) as d /* fix the precision issue, round to 12 decimal places. */ , floor(round((dts-arbdt)*1440/8,12)) as e /* Integral # of 8-minute intervals since midnight -- decimal part ignored. */ , floor(round((dts-arbdt)*1440/8,12))*8 as f /* convert # of 8-minute intervals back to total minutes. */ , floor(round((dts-arbdt)*1440/8,12))*8/1440 as g /* convert back to decimal part-of-day since midnight. */ , (arbdt+floor(round((dts-arbdt)*1440/8,12))*8/1440) as h /* add to midnight (arbitrary day) to get bucketed current dts. */ from w_each_minute_of_day ;
Here's what I think is simpler "bucketizer"
ReplyDeleteTRUNC( myDate, 'HH' ) + ( ( Round( ( TO_NUMBER( to_char( MyDate, 'MI' ) ) / mb ), 0 ) * mb ) / 1440 )
mtDate is your DATE value or column. mb is the number of minutes that you want in each bucket.
Moreover, this method would implement "Rounding" meaning that values will go into a lower or higher bucket depending on where they pass the rounding threshold. For example, if I want 15 minute buckets, 1:07 would go into the 1:00 bucket but, 1:08 would go into the 1:15 bucket.
Interesting! I used the "floor" function -- by changing it to "round" in my solution, you can get the same rounding effect as in your solution. I guess it depends on how you want to do the bucketing. Thanks for pointing that out.
DeleteThe main problem with your solution (as I see it) is that if you choose to bucket by something like 8 minutes, you get mostly 8-minute-wide buckets with a 4-minute-wide bucket at the top of each hour. By only bucketing by values that divide evenly into an hour (2, 4, 6, 10, 12, 15, 20, 30) you can avoid that issue.
Try the following -- it compares my original solution and yours. You can see that by bucketing by 8 minute intervals, your solution produces the hourly 4-minute-wide intervals. I've also changed my solution to "round" for comparison with yours.
WITH
w_param as ( select 8 as m from dual )
, w_random_dates as
( select date '2000-01-01' + ((level-1)/1440 /* 1440 86400 */ ) as dts
from dual connect by level <= 1000000
)
, w_compare as
( select dts
, date'2000-01-01' + round(round((dts - date'2000-01-01')*1440/m, 12))/1440*m as mine
, trunc(dts, 'HH') + round(to_number(to_char(dts, 'MI'))/m)*m/1440 as yours
from w_random_dates join w_param on 1=1
)
select dts
, mine
, yours
, decode( mine, yours, 'ok', null ) as flg
from w_compare
where 1=1
--and mine != yours
;
My solution buckets to a consistent 8-minute-wide buckets across all hours and days. This is also true even with buckets like 7 or 11 minutes.
Thanks for the input.