The job

I occasionally get asked about the type of system I’m currently working on. I used to do Content Management, then I moved over to eCommerce, and at my current job I do a little of both of those. Mostly, however, I am currently working on several Data Warehousing projects.

The big one is called SMART, and is our Sales, Marketing, Analysis, and Reporting Tool. As you can probably guess from the title, it’s all about sales trends and reporting. It started out as a simple nightly copy-and-cleanup of the sales history table from our AS400 mainframe, intended to speed up access by throwing just that data onto a dedicated SQL Server. It has since grown quite a bit:

Our nightly SMART rebuilding DTS package

In the image above you can see the nightly rebuild of our system:

  1. The top row truncates any tables that will be reimported.
  2. The second row of blue connectors are coming from our mainframe …
  3. … to the third row of grey connectors going to our SQL Server.
  4. We then do a bit of data scrubbing to compensate for years of database cruft …
  5. … before we get to the tail end that actually does the rebuilding.

Originally, the SQL Server fed an MS OLAP cube that rebuilt itself nightly. We then passed MDX queries through the SQL Server from ColdFusion, which then were flattened and passed back. However, after passing about a dozen dimensions, managing the cube was horrendously difficult, and it grew beyond the ability to rebuild in single night. Also, we were never really all that happy with the query speed nor the flexibility. For some reason, the SQL server would occasionally just stop talking to the cube and hang for 2 minutes before picking back up as if nothing had happened.

Long story short, we knew that we needed to just ditch the cube and write our own multi-dimensional query tool. That means that all of those tables coming from the mainframe feed into what is essentially one huge table during the build at the end. We still have normalized lookup tables, but the bulk of the query work happens on one big table.

The current incarnation of the tool is completely table-agnostic, in the sense that there’s nothing in the ColdFusion code that ties it to any specific table layout, column names, etc. Instead, the app can be pointed at different tables via XML files that contain all of the dimensions, measures, and summary calculation instructions. This means that we can use the same tool for querying our time entry system, our sales history, our customer service call history, and our web logs.

Thanks to the XML configuration, adding dimensions and measures to the system takes minutes instead of hours:

  1. The DBA adds the column to the master table, as well as a lookup table if necessary.
  2. Someone adds a line to the XML file specifying the column name, data type, and any summarization rules.

The XML looks like this:

		<setting id="TableName" value="sales_history" />
	<daterange id="DateRange" title="Date Range" idColumnMonth="Invoice_Date_Mon" idColumnYear="Invoice_Date_Year" idColumnDay="Invoice_Date_Day" cfSqlType="CF_SQL_DATE" />
		<dimension id="SoldToCustomer" title="SoldTo Customer" idColumn="Address_Number" titleColumn="Alpha_Name" titleTable="Addresses" titlePK="Address_Book_Number" cfSqlType="CF_SQL_INTEGER" />
		<dimension id="ShipToCustomer" title="ShipTo Customer" idColumn="Ship_To_Address_No" titleColumn="Alpha_Name" titleTable="Addresses" titlePK="Address_Book_Number" cfSqlType="CF_SQL_INTEGER" />
		<dimension id="SKU" title="Sold As SKU" idColumn="Second_Item_number" titleColumn="Description" titleTable="Items_All_Mapped" titlePKlist="SKU,Division" titleFKlist="Second_Item_Number,Division" cfSqlType="CF_SQL_VARCHAR" orderBy="ID" />
		<measure id="SalesDollars" title="Sales Dollars" column="Extended_Price" /> 
		<measure id="CostDollars" title="Cost Dollars" column="Extended_Cost" /> 
	<measuretotal title="(Totals)" />
		<calcmeasure id="MarginDollars" title="Margin Dollars" sourceExpr="SalesDollars - CostDollars" reqMeasures="SalesDollars,CostDollars" />

That’s it. No one ever writes a line of ColdFusion. And yes, we probably could have used database introspection to cut out the second step … but why? What would we really gain that we wouldn’t be trading for performance?

As you can see from that XML fragment, we do some neat things like optimizing queries for different types of date ranges. A user could put in a date range of "2002,2002-2003,10/2002,10/1/2002" and get exactly what they would expect to get, and the query would still be nigh-instantaneous thanks to the date denormalization I’ve mentioned here before. We also do all of the calculation on the SQL Server and absolutely none of it in the ColdFusion, not even the totaling. The SQL Server is built for that kind of thing, while ColdFusion is built for making web pages, so why not just let each to their own? The calculated measures are a bit tricky, but when you start getting into things like average price and such, they are immensely better to do in SQL than to try and write ColdFusion for every possible scenario.

The current system is up to 40 dimensions, 11 static measures, and 4 calculated measures. The ColdFusion required to allow someone to query any combination of those fields, group them in any way, provide the capability for ad hoc data filtering, and report over multiple date ranges of multiple types simultaneously is … interesting … to say the least, but it works extremely well and is blazingly fast. In fact, the upper limit on the system isn’t either of the SQL or ColdFusion servers, but the craptastic 10mbit network speed many of our users are on, and the old 400MHz machines with 128MB of memory. Most of the queries, even those returning hundreds of thousands of rows, come back in under 5 seconds, but when the users want a resultset that has 5,000 rows and they want to display it in a web browser … oy.

In fact, to speed things up, much of the interface for the system is AJAX-ified so that we only have to return the HTML elements that have changed. Steps have also been taken to make the interface both keyboard- and mouse-friendly, as we have quite the mix of old green-screeners and 21-year-old tech-support users.

But, anyway, there you go. That’s what I do these days.