Writing a Parent-Child Query Sorter in CF, Part 10

The first suggestion for an enhancement came from Tony Monast of CKM9. He was looking for a way to do bread crumb-style navigation based on a tree-sorted query. His way of doing it was actually pretty elegant:

  1. Given: A tree-sorted query and the ID of the current child,
  2. Iterate forwards through the query until you find the current child. Keep track of its depth. Then, reverse direction and …
  3. Iterate backwards through the ones you’ve already been over, kindof like a Bungee jumper.
  4. When you hit an item that has less depth than the depth you’re keeping track of, add it to the list and make its depth your current depth.
  5. Repeat the preceding two steps until you are back at the top of the query or your depth is less than 1.

He is essentially doing edge-detection on his query. By keeping track of only the first item where the depth changes you get the item’s parents, but none of its aunts or uncles. Like I said: elegant. For small queries, this would be pretty fast, and still O(n), which is good. Even for really large queries a single lookup would be fast.

The only problem is that it’s not all that repeatable or scalable. That is, once you have a given item’s lineage, why bother going through all that trouble a second time? Why not just calculate the lineage once and store it? You could store it in an Application-level struct and only calculate the lineage when you have a cache miss. But that would involve locking and making sure that you don’t end up with race conditions, and would generally be messy.

Why not build the lineage when we sort the query, and store it as another column? We already know the parents and children as we needed them to sort the query in the first place. We’re presumably caching the query anyway, as tree-sorting is fast but not something you want to do for every request. By storing the lineage in the query we not only keep it close to the data itself, but we don’t have to worry about its cache getting out of sync with the query itself. Given a decent architecture, we could make lineage lookups O(1), a zippy constant-time operation.

Enough theory, let’s add to our UDF code. First we need to add an optional argument for our new columns:

    <cfargument name="LineageName" type="string" required="No" default="TreeLineage">

We already have an index for keeping track of where each child was in the original query, but we’ll need to start keeping track of where they end up in the new query. I’ll explain why a little later. Add the following local variable declarations to the top of the function:

    <cfset var NewRowFromID=StructNew()>
    <cfset var Lineages=StructNew()>
    <cfset var ThisLineage="">
    <cfset var ThisParentRowID="">

I’ve also changed how the function works a little bit here. Since we’re adding several new columns over the next few articles, you might want an option to not bolt on columns that you don’t need. So, we’ll adjust the code to only populate columns that the programmer actually requests. Change the following line:

    <cfset var Ret=QueryNew(ListAppend(Stuff.ColumnList, Arguments.DepthName))>

To this:

    <cfset var RetColList=Stuff.ColumnList>
    <cfset var Ret="">
    <cfif Arguments.DepthName NEQ ""><cfset RetColList=ListAppend(RetColList, Arguments.DepthName)></cfif>
    <cfif Arguments.LineageName NEQ ""><cfset RetColList=ListAppend(RetColList, Arguments.LineageName)></cfif>
    <cfset Ret=QueryNew(RetColList)>

There’s nothing arcane here. We’re just dynamically building a list of the columns we want our result query to have. We start off with the same columns that the provided query has, and then unless the developer explicitly says that they don’t want Depth and Lineage columns, we tack those on before we instantiate our result query.

The rest of the magic happens down in the part of the function that adds the new rows to the result query. Replace this line:

            <cfset QuerySetCell(Ret, Arguments.DepthName, ThisDepth)>

With the following lines:

            <cfset NewRowFromID[ThisID]=Ret.RecordCount>
            <!--- Try to find the parent's lineage --->
            <cfif StructKeyExists(Lineages, Stuff[Arguments.ParentID][RowID])>
                <cfset ThisLineage=Lineages[Stuff[Arguments.ParentID][RowID]]>
                <cfset ThisLineage="">
            <!--- Add the parent if there is one --->
            <cfif StructKeyExists(NewRowFromID, Stuff[Arguments.ParentID][RowID])>
                <cfset ThisLineage=ListAppend(ThisLineage, NewRowFromID[Stuff[Arguments.ParentID][RowID]])>
            <cfset Lineages[ThisID]=ThisLineage>
            <cfif Arguments.DepthName NEQ ""><cfset QuerySetCell(Ret, Arguments.DepthName, ThisDepth)></cfif>
            <cfif Arguments.LineageName NEQ ""><cfset QuerySetCell(Ret, Arguments.LineageName, ThisLineage)></cfif>

This code totally cheats. But, cheating is good when it comes to programming opaque/black-box functions like this. Anything we can do to speed up or simplify the code is fair game.

First we add the current item’s new row number to our new index. The next part looks to see if we can find a lineage for the parent for the current item in our lineage cache. Since we will, by definition, be encountering items in tree-sorted order, everything but the top-level items should have cached parent lineages by the time we get to them. Basically, we’re trying to get a list of our grandparents.

Next we try to find out parent’s row number (not ID) in our cache. If it exists then we can append it to the current item’s lineage. That would be the grandparent list we just got, now plus our parent. We then store this item’s lineage in our cache so that its children can use it.

The last two lines just set the proper cells in the result query, checking to make sure that the developer has not explicitly told us to not return them.

That’s it. That’s all there is to it. Our resultant query now looks like this:

# ID Name Parent Depth Lineage
1 12 Coding 0 0  
2 15     Applications 12 1 1
3 8         C++ 15 2 1,2
4 9     Hybrid 12 1 1
5 19         Perl 9 2 1,4
6 26     Web 12 1 1
7 16         ColdFusion 26 2 1,6
8 7 Food 0 0  
9 1     Cheese 7 1 8

Now for the million-dollar question: Why did we use row numbers instead of IDs?

Answer: Because it makes things really, really easy. That is, with row numbers we can easily find out the ParentID, or the Name, or whatever else. To print out the bread crumb trail for any specific item, all you have to do is loop over the lineage:

<cfset ChildRowID=7>
<cfset Crumbs=ArrayNew(1)>
<cfloop list="#ListAppend(TreeStuff.TreeLineage[ChildRowID],ChildRowID)#" index="i">
    <!--- Do any formatting you want here, maybe even cache this if you can --->
    <cfset ArrayAppend(Crumbs,TreeStuff.Name[i])>
<cfoutput>#ArrayToList(Crumbs, " &##187; ")#</cfoutput>


Coding » Web » ColdFusion

By Rick Osborne

I am a web geek who has been doing this sort of thing entirely too long. I rant, I muse, I whine. That is, I am not at all atypical for my breed.