Calculating overtime with pure SQL

Table of contents for tricky_overtime

  1. Calculating overtime with pure SQL
  2. Anatomy of a bug: Tricky Overtime, Part 1

So let’s say that you have a time-entry system where an employee clocks-in, clocks-out, and enters a job code. After 40 hours per week, you have to pay that employee overtime. This is further complicated because each job code may have a different pay rate, so if the employee is doing an expensive job when they go into overtime, you can’t pay them the cheap overtime. How do you do this completely in SQL?

Can’t be done, right? You’d need some sort of iterative process to find the time before 40 hours, mark it as regular time, then find the time after 40 hours and mark it as overtime, and account for splitting any hours that straddle the 40-hour mark. Right?

Wrong. I’ll bet you I can do that in one SQL statement.

So let’s start with our table structure. First is our time entry log table, aptly named TimeEntry_Log:

ColumnTypeComments
idintIdentity/Autonumber
EmployeeintThe Foreign Key into you Employees table
Job_CodevarcharThe job code, maybe a FK to a Job_Codes table?
In_DatedatetimeThe clock-in time (nullable)
Out_DatedatetimeThe clock-out time (nullable)
In_DaysmalldatetimeDay-granularity-level for In_Date
In_WeeksmalldatetimeWeek-granularity-level for In_Date
HoursmoneyThe total hours worked for that shift and job code

You could keep In_Period instead of In_Week if your pay is based on a biweekly/80-hour schedule. Again, these dates are just denormalized from the In_Date column, preferably with a trigger. Similarly, the Hours column should be computed with a trigger instead of maintained manually. (Why use the money type? I’m quirky, and it’s a nice 4-decimal-place type, which is plenty for what we’re doing here. And really, isn’t time just money?) My trigger looks like this:

CREATE TRIGGER [TimeEntry_Log_fix] ON dbo.TimeEntry_Log AFTER INSERT, UPDATE AS SET NOCOUNT ON IF UPDATE(In_Date) OR UPDATE(Out_Date) BEGIN UPDATE TimeEntry_Log SET Hours = CASE WHEN (In_Date IS NOT NULL) AND (Out_Date IS NOT NULL) THEN DATEDIFF(MINUTE,In_Date,Out_Date) / 60.0 END, In_Day = CASE WHEN In_Date IS NOT NULL THEN DATEADD(DAY,DAY(In_Date)-1,DATEADD(MONTH,MONTH(In_Date)-1,DATEADD(YEAR,YEAR(In_Date)-2000,{d '2000-01-01'}))) END, In_Week = CASE WHEN In_Date IS NOT NULL THEN DATEADD(DAY,1-DATEPART(WEEKDAY,In_Date),DATEADD(DAY,DAY(In_Date)-1,DATEADD(MONTH,MONTH(In_Date)-1,DATEADD(YEAR,YEAR(In_Date)-2000,{d '2000-01-01'})))) END WHERE (id IN (SELECT id FROM inserted)) END SET NOCOUNT OFF

We’ll also need a pivot table, which I’ve called Pivot100:

ColumnTypeComments
iint 

For those of you not used to using pivot tables, just create a table like the above and add two rows, one with i=0 and another with i=1. Trust me on this, it’ll make sense later. (Mine happens to have values 0 to 100, hence its name.)

Now that we have our table structure, we need to put some bogus data in. Ensure that your bogus data has at least two people that go over 40 hours: one that hits 40 hours exactly and one that has a job that straddles the 40-hour mark.

insert into timeentry_log (employee, job_code, in_date, out_date) values (1, 'AAAA', {ts '2006-01-02 08:00:00'}, {ts '2006-01-02 16:00:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (1, 'BBBB', {ts '2006-01-03 08:00:00'}, {ts '2006-01-03 16:00:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (1, 'AAAA', {ts '2006-01-04 08:00:00'}, {ts '2006-01-04 16:00:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (1, 'AAAA', {ts '2006-01-05 08:00:00'}, {ts '2006-01-05 16:00:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (1, 'AAAA', {ts '2006-01-06 08:00:00'}, {ts '2006-01-06 16:00:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (1, 'BBBB', {ts '2006-01-07 08:00:00'}, {ts '2006-01-07 16:00:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (2, 'AAAA', {ts '2006-01-02 08:00:00'}, {ts '2006-01-02 16:30:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (2, 'BBBB', {ts '2006-01-03 08:00:00'}, {ts '2006-01-03 16:30:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (2, 'AAAA', {ts '2006-01-04 08:00:00'}, {ts '2006-01-04 16:30:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (2, 'AAAA', {ts '2006-01-05 08:00:00'}, {ts '2006-01-05 16:30:00'}) insert into timeentry_log (employee, job_code, in_date, out_date) values (2, 'AAAA', {ts '2006-01-06 08:00:00'}, {ts '2006-01-06 16:30:00'})

So if we total up the hours for each person for the week, we get:

SELECT In_Week, Employee, SUM(Hours) AS TotalHours
FROM TimeEntry_Log
GROUP BY In_Week, Employee
ORDER BY In_Week, Employee

Yields:

In_WeekEmployeeTotalHours
1/1/2006148.0
1/1/2006242.5

In order to be able to keep track of when an employee goes overtime, we need a running total of hours as each clock-in happens. Running totals are fairly easy in SQL, as they are just a self-join. We’ll also throw in a few extra columns: HoursBefore and HoursAfter, which logically represent how many hours the employee had before they clocked in and after they clocked out.

SELECT a.In_Week, a.Employee, a.In_Day, a.Job_Code,
  COALESCE(SUM(b.Hours),0) AS HoursBefore,
  a.Hours,
  COALESCE(SUM(b.Hours),0) + a.Hours AS HoursAfter
FROM TimeEntry_Log AS a LEFT OUTER JOIN TimeEntry_Log AS b
  ON (a.Employee = b.Employee) AND (a.In_Week = b.In_Week)
  AND (b.In_Date < a.In_Date)
GROUP BY a.In_Week, a.Employee, a.In_Day, a.Job_Code, a.Hours
ORDER BY a.In_Week, a.Employee, a.In_Day

Let me explain what this is doing for those of you who haven’t seen the self-join-rolling-sum trick before. They key is the ON part of the join. The first part looks normal enough: join on the pay period (week) and the employee. The second part grabs all records on the right-hand side (B) that came before the corresponding record on the left-hand side (A). When you sum these up, you get a running total. Pretty cool, eh? You just need to be careful about your operator. Do you want less-than or less-than-or-equal? In our case, we want just less-than so that we can get our HoursBefore. Also, try it without the COALESCE statements to see why I had to add them in there.

Running this query yields:

In_WeekEmployeeIn_DayJob_CodeHoursBeforeHoursHoursAfter
1/1/200611/2/2006AAAA0.08.08.0
1/1/200611/3/2006BBBB8.08.016.0
1/1/200611/4/2006AAAA16.08.024.0
1/1/200611/5/2006AAAA24.08.032.0
1/1/200611/6/2006AAAA32.08.040.0
1/1/200611/7/2006BBBB40.08.048.0
1/1/200621/2/2006AAAA0.08.58.5
1/1/200621/3/2006BBBB8.58.517.0
1/1/200621/4/2006AAAA17.08.525.5
1/1/200621/5/2006AAAA25.58.534.0
1/1/200621/6/2006AAAA34.08.542.5

This next part is the nasty part. Let me try to explain it, show the SQL, then explain it again.

We need to assign a value to each of our time segments to show whether we are paying our employee regular time or overtime. I choose to do this as a Rate multiplier column, but you could do it as a flag or whatever you like. But, to do this, we need to split any segments that cross the 40-hour boundary. We included the HoursBefore and HoursAfter to make this easy on us, because we can just check for the case when ((HoursBefore<40.0) AND (HoursAfter>40.0)). This lets us identify any rows that need to be split.

SQL is very good at rolling up multiple records into one, but it’s not nearly as good at splitting one record into two. This is where our pivot table comes in. My pivot table has 100 rows in it, but you’ll see in my SQL that I limit it to only one or two rows at a time. If we then join on those two rows we have duplicated our time entries and can then work a little CASE magic to fiddle with each of the rows to get the right hour totals and rate code for each. If we do even more CASE magic, we can correctly join on two rows only when we need to, and not worry about accidentally duplicating rows that don’t need it.

This is what we end up with, and you can see our previous query sandwiched in the middle and aliased as C:

SELECT In_Week, Employee, In_Day, Job_Code, CASE WHEN (HoursBefore < 40.0) AND (HoursAfter > 40.0) THEN /* straddler */ CASE WHEN p.i = 0 THEN Hours + (40.0 - HoursAfter) ELSE HoursAfter - 40.0 END /* hours adjustment */ ELSE Hours /* no adjustment */ END AS Hours, CASE WHEN (HoursBefore < 40.0) AND (HoursAfter > 40.0) THEN CASE WHEN p.i = 0 THEN 1.0 ELSE 1.5 END /* straddlers */ WHEN HoursBefore >= 40.0 THEN 1.5 /* non-straddlers after 40 hours */ ELSE 1.0 /* non-straddlers before 40 hours */ END AS Rate FROM ( -- c SELECT a.In_Week, a.Employee, a.In_Day, a.Job_Code, COALESCE(SUM(b.Hours),0) AS HoursBefore, a.Hours, COALESCE(SUM(b.Hours),0) + a.Hours AS HoursAfter FROM TimeEntry_Log AS a LEFT OUTER JOIN TimeEntry_Log AS b ON (a.Employee = b.Employee) AND (a.In_Week = b.In_Week) AND (b.In_Date < a.In_Date) GROUP BY a.In_Week, a.Employee, a.In_Day, a.Job_Code, a.Hours ) AS c INNER JOIN Pivot100 AS p ON (i < CASE WHEN (HoursBefore < 40.0) AND (HoursAfter > 40.0) THEN 2 ELSE 1 END) ORDER BY In_Week, Employee, In_Day, Job_Code

Ugh. Nasty, right? Told ya. Let’s go back through it.

The first CASE does the hours adjustment. We first check to see if the entry straddles the 40-hour mark. If so, we do another check, this time against our pivot table. If our pivot table’s value is 0, then we need to shave off any hours over 40. If the pivot table’s value is 1 then we need to shave off the hours under 40. And, of course, if it wasn’t a straddler then we leave it alone.

The second CASE does pretty much the same thing, but this time sets our rate code for us. There’s just one extra WHEN to account for non-straddling segments that are still in overtime.

The join is the next interesting part, as the C query should be verbatim from above. We know that we want one row from our pivot table for non-straddling segments so that we don’t end up with duplicates. We also know that we want two rows from our pivot table for each straddling segment. So, we get a bit creative with a CASE statement and get exactly that.

Running this query on our bogus data yields:

In_WeekEmployeeIn_DayJob_CodeHoursRate
1/1/200611/2/2006AAAA8.01.0
1/1/200611/3/2006BBBB8.01.0
1/1/200611/4/2006AAAA8.01.0
1/1/200611/5/2006AAAA8.01.0
1/1/200611/6/2006AAAA8.01.0
1/1/200611/7/2006BBBB8.01.5
1/1/200621/2/2006AAAA8.51.0
1/1/200621/3/2006BBBB8.51.0
1/1/200621/4/2006AAAA8.51.0
1/1/200621/5/2006AAAA8.51.0
1/1/200621/6/2006AAAA6.01.0
1/1/200621/5/2006AAAA2.51.5

As you can see, nothing particularly interesting happens for employee 1. Their last day of work counts as overtime because they had hit exactly 40 hours on the previous shift. Employee 2, however, now has an extra line item, as the last entry has been split into regular time before the 40-hour mark, and overtime after the 40-hour mark.

Taadaaa!

Presumably, you will then want to wrap the whole query in another query that sums up the hours for each job code for each day, so that you have unique records for Week-Employee-Day-Code-Rate combinations. You will probably also then want to tie into your Employees table and maybe your Job_Rates table, or whatever.

Published by

Rick Osborne

I am a web geek who has been doing this sort of thing entirely too long. I rant, I muse, I whine. That is, I am not at all atypical for my breed.