Hacking SQL Server DTS package dependencies, or: Auto-Kaboom

As I’ve mentioned before, the megalithic data warehouse reporting system that I wrote for my day job runs a huge DTS package every night to completely rebuild the system, and then a smaller DTS package every 10 minutes during the business day to incrementally keep the system up-to-date with the mainframe. We’re constantly getting requests for new dimensions and measures to be added to the data warehouse (we’re up to ~80 dimensions and ~40 measures as of this writing), which I let stack up for about a month until I can do them all at once. These almost always involve minor changes to the DTS packages to add the new columns to be brought over from the mainframe.

As much as it pains me to admit it, I am actually quite human, and I more-than-occasionally do something really dumb that breaks the huge nightly DTS package. It’s grown in complexity over the past year that even I, the guy that wrote it and maintains it, don’t always remember every last dependency and relationship. Tables cached from views based on other tables cached from other views, etc. You get the picture. And, in case you don’t, here’s the picture:

Data Warehouse Nightly Build DTS Package

On the occasion that the nightly breaks, I really don’t want the incremental to run. I get paged automagically, I can’t always make it into the office to fix it before the incremental starts. Similarly, there are times when we need new data off the mainframe now, and need to run the nightly during the day. If I forget to turn off the incremental before I restart the nightly, things can go all kinds of nuts.

Thus, I needed a way for the incremental to do the following:

  1. Look for the latest run of the nightly.
  2. If the nightly bombed out, don’t run the incremental.
  3. If the nightly is still running, don’t run the incremental.

The easiest way to do this was to add a SQL task to the beginning of the incremental package:

FROM msdb.dbo.sysdtspackagelog
WHERE (name = 'SMART Nightly')
  AND ((errorcode <> 0) OR (endtime IS NULL))
  AND (starttime IN (
    SELECT MAX(starttime)
    FROM msdb.dbo.sysdtspackagelog
    WHERE (name = 'SMART Nightly')
      AND (smart.dbo.DateTimeToDate(starttime) = smart.dbo.DateTimeToDate(GETDATE()))

From the bottom: the subquery finds the most recent time that the nightly has run, as long as that time is today. (Technically, the nightly runs in the wee hours of the morning.) If we didn’t run a nightly today, we’re probably safe to run the incremental, even if the nightly bombed yesterday. We’ve done dumb things like accidentally starting the package twice at once, hence the IN.

The outer query then finds the appropriate log entries and counts the number with non-zero error codes (bombs) or NULL finish times (still running), thus giving us the number of unfinished or problematic jobs. The CASE statement then inverts that number to give us a 1 if everything looks good to go, or a 0 if not. We then put that number as the denominator in a division, thus intentionally producing a divide-by-zero error if there were any unfinished or problematic nightly jobs.

Thus, no matter how often you run the incremental, it’ll keep (intentionally) bombing out until you rectify the problem with the nightly, or until the nightly finishes running.

By Rick Osborne

I am a web geek who has been doing this sort of thing entirely too long. I rant, I muse, I whine. That is, I am not at all atypical for my breed.