I’ve been playing around with MongoDB, comparing and contrasting the group() and mapreduce() functions. For test data, I’ve been using the last two years worth of Nike+ data from my iPod. For my future reference, here’s how to aggregate all of that data using the group() function:
db.workouts.group({ key: { }, cond: { distancekm: { $gt: 0.5 } }, initial: { km: 0, // total kilometers n: 0, // workout count g: 0, // count of workouts with goals d: 0, // duration in ms w: 0, // "wins" = better than goal l: 0, // "losses" = not better than goal mn: -1, // minimum km mx: -1, // maximum km wg: 0 // goal win rate }, reduce: function(obj,prev) { prev.km += obj.distancekm; prev.n++; prev.d += obj.duration; if(obj.distancekm > prev.mx) prev.mx = obj.distancekm; if((prev.mn < 0) || (obj.distancekm < prev.mn)) prev.mn = obj.distancekm; if(goal.goalType) { prev.wg++; prev.g += obj.goalValue; if(obj.goalType == "Time") prev[(obj.duration >= obj.goalValue * 1000) ? 'w' : 'l']++; else if(obj.goalType == "Distance") { dg = obj.goalValue * (obj.goalUnit == "mi" ? 1.609344 : 1.0); prev[obj.distancekm >= dg ? 'w' : 'l']++; } // else Distance } // if goalType }, finalize: function(out) { out.avgkm = out.km / out.n; out.mi = out.km / 1.609344; out.d /= 1000.0; // convert ms to seconds out.dm = out.d / 60.0; // convert sec to min out.dh = out.dm / 60.0; // convert min to hr out.wp = (out.wg > 0) ? out.w / out.wg : 0; } });
This produces a result set like so:
[ { "km" : 775.1911999999996, "n" : 80, "g" : 8307.25, "d" : 322547.941, "w" : 24, "l" : 5, "mn" : 0.7493, "mx" : 26.2581, "wg" : 29, "avgkm" : 9.689889999999995, "mi" : 481.68148015588935, "dm" : 5375.799016666667, "dh" : 89.59665027777778, "wp" : 0.8275862068965517 } ]
I think what trips me up about this is just remembering that the prev object is the one being assigned to, and thus should always be an l-value and not an r-value, while the obj object is the incoming data, and thus should always be an r-value and never an l-value. Going forward, I should probably name them left and right or something to make it clearer.
I do like the cleaner syntax of the group() function over the mapreduce() function. But, I also get that it is just semantic sugar—you could easily translate from the one to the other. Hmm … maybe I should update my SQL→MongoDB cheat sheet to go through a group() example first?
Unfortunately, using group means you can’t use the server-side post-filtering that you can with mapreduce—since your results aren’t converted to a new collection, functions like find and sort throw an error if you try to chain them. Maybe this will be included/fixed in a future version of MongoDB?