MongoDB + NikePlus
Feb 16
I’ve been playing around with MongoDB, comparing and contrasting the group() and mapreduce() functions. For test data, I’ve been using the last two years worth of Nike+ data from my iPod. For my future reference, here’s how to aggregate all of that data using the group() function:
db.workouts.group({
key: { },
cond: { distancekm: { $gt: 0.5 } },
initial: {
km: 0, // total kilometers
n: 0, // workout count
g: 0, // count of workouts with goals
d: 0, // duration in ms
w: 0, // "wins" = better than goal
l: 0, // "losses" = not better than goal
mn: -1, // minimum km
mx: -1, // maximum km
wg: 0 // goal win rate
},
reduce: function(obj,prev) {
prev.km += obj.distancekm;
prev.n++;
prev.d += obj.duration;
if(obj.distancekm > prev.mx)
prev.mx = obj.distancekm;
if((prev.mn < 0) || (obj.distancekm < prev.mn))
prev.mn = obj.distancekm;
if(goal.goalType) {
prev.wg++;
prev.g += obj.goalValue;
if(obj.goalType == "Time")
prev[(obj.duration >= obj.goalValue * 1000) ? 'w' : 'l']++;
else if(obj.goalType == "Distance") {
dg = obj.goalValue * (obj.goalUnit == "mi" ? 1.609344 : 1.0);
prev[obj.distancekm >= dg ? 'w' : 'l']++;
} // else Distance
} // if goalType
},
finalize: function(out) {
out.avgkm = out.km / out.n;
out.mi = out.km / 1.609344;
out.d /= 1000.0; // convert ms to seconds
out.dm = out.d / 60.0; // convert sec to min
out.dh = out.dm / 60.0; // convert min to hr
out.wp = (out.wg > 0) ? out.w / out.wg : 0;
}
});
This produces a result set like so:
[
{
"km" : 775.1911999999996,
"n" : 80,
"g" : 8307.25,
"d" : 322547.941,
"w" : 24,
"l" : 5,
"mn" : 0.7493,
"mx" : 26.2581,
"wg" : 29,
"avgkm" : 9.689889999999995,
"mi" : 481.68148015588935,
"dm" : 5375.799016666667,
"dh" : 89.59665027777778,
"wp" : 0.8275862068965517
}
]
I think what trips me up about this is just remembering that the prev object is the one being assigned to, and thus should always be an l-value and not an r-value, while the obj object is the incoming data, and thus should always be an r-value and never an l-value. Going forward, I should probably name them left and right or something to make it clearer.
I do like the cleaner syntax of the group() function over the mapreduce() function. But, I also get that it is just semantic sugar—you could easily translate from the one to the other. Hmm … maybe I should update my SQL→MongoDB cheat sheet to go through a group() example first?
Unfortunately, using group means you can’t use the server-side post-filtering that you can with mapreduce—since your results aren’t converted to a new collection, functions like find and sort throw an error if you try to chain them. Maybe this will be included/fixed in a future version of MongoDB?