SQL or NoSQL?

I’ve had some (relatively) free time over the last few weeks to dig down and get into some non-relational database alternatives, including Apache CouchDB and MongoDB. These databases fall into what is being referred to as the NoSQL movement: an attempt to get away from the strictly-typed table-column-row organization of relational databases, and on to databases where documents are stored in their entirety instead of broken into normalized chunks.

I documented my first experiences with MongoDB last week. I’ve been playing around with CouchDB over the last few days, which has turned out to be very different than MongoDB, despite their similarities on paper. When compared to a relational database such as mySQL, each system has specific use cases and pros and cons.

mySQL and other Relational Databases

In the web world, relational databases are the de facto go-to solution. It’s nigh impossible to find a web developer that can’t string together at least a SELECT statement. The fact that there’s a bit of an impedance mismatch between a relational system and the software objects increasingly used to manage the data in that system doesn’t do much to slow anyone down.

ColdFusion is a great example here: the ORM, Object Relational Management, built into it shows just how ingrained relational databases are in the web world. The ColdFusion community isn’t the only one to move in this direction—the .NET community is increasingly moving to LINQ, a SQL-like extension to put a sort of relational spin on variable scopes. (Think Query-of-Queries on crack.)

Relational database systems are a known quantity: there are proven solutions for nearly every major problem a web developer would run into. But, they aren’t infallible, and they aren’t a perfect solution to every problem.

Document-Oriented Databases and noSQL

A document-oriented database is just what it sounds like: a database of entire documents. A document-oriented database for a blog, for example, wouldn’t split up the posts and comments and tags into separate tables. Instead, absolutely everything needed to render that blog post would be in a single document, like so:

{
    _id:      "sql-or-nosql",
    created:  "2010/02/14 17:00:00 +0000",
    author:   "Rick O",
    tags:     [ "sql", "nosql", "mongodb", "couchdb" ],
    body:     "...",
    comments: {
        "2010/02/14 17:31:45 +0000": {
            name:    "Ben Camden",
            email:   "bc@cfgurus.info",
            comment: "..."
        }
    }
}

The infographic I made relating SQL to the MapReduce functionality in MongoDB goes into more depth, as does my previous post on MongoDB, but the gist is that you’ll use MapReduce functions to get the data out. You’ll have one map function for your Recent Posts page, one for your Posts By Tag page, etc. The syntax for MongoDB is a little different than that for CouchDB, so here’s a CouchDB example:

/* for recent posts ... */
function (doc) {
    /* use the post date as the key */
    emit(doc.created, doc);
}
/* for posts by tag ... */
function (doc) {
    /* return a copy of each post for each tag */
    doc.tags.forEach(function (tag) {
        /* use the tag then date for the key */
        emit([ tag, doc.created ], doc);
    });
}

This MapReduce functionality is powerful stuff. You can do some interesting things with it, and it makes parallelizing your query dirt simple. Combined with the denormalized approach used in a document-oriented database, where you can fetch an entire page’s data with a single query, it’s not hard to see why more and more large sites (like Facebook and Amazon) are going with similar systems.

But, like relational databases, documented-oriented databases and MapReduce aren’t appropriate for every situation. And, again like relational databases, different software packages offer different solutions and techniques. CouchDB and MongoDB use JSON and JavaScript in similar ways, but to different effect.

Reduce functions, for example, use the same function signature and are almost perfectly interchangeable between the two. Map functions, however, look similar but have very different implications between the two. The pre-filtering provided by the query attribute in MongoDB doesn’t have a direct counterpart in CouchDB—that kind of logic would need to be done procedurally in the CouchDB map function.

CouchDB’s approach also doesn’t allow for post-filtering of aggregated values equivalent to SQL’s HAVING clause, or MongoDB’s finalize function—that functionality would need to happen in the client. CouchDB also builds an index for each MapReduce query, like a materialized view or a temporary table with the cached results of the table.

This indexing stage leads to a sizable performance implication: the first time a query is run it may take an extremely long time to build this index, but every query after that is lightning fast. For example, on a 300MB database with 300,000 documents and a relatively simple query, the first run took over 20 minutes to index but only milliseconds to run thereafter—even when the underlying data values were changed.

This compromise means that there’s effectively no such thing as an ad hoc query against CouchDB. MongoDB doesn’t make this trade-off: it doesn’t materialize queries. On a similarly-sized database, the first query took 1200ms, with additional runs of the same query only shaving this down to 1100ms.

While CouchDB would be wicked fast for databases that are always queried the same way, it would be almost unusable in a data mining environment where the slicing and dicing could be different every time. MongoDB may never be able to touch CouchDB’s performance for repeated queries, but it has the flexibility to replace a relational database in a wider range of scenarios, including data mining.

Next Steps

I’m certainly not the first to poke around with ColdFusion and NoSQL databases. Bill Shelton has an excellent series of posts on using MongoDB with ColdFusion. Russ Spivey has some CouchDB wrapper code for ColdFusion up on RIAForge (and related blog posts). Matt Woodward has a massive brain dump about CouchDB and NoSQL integration with ColdFusion in general.

I think it would be beneficial to throw together a few example applications using ColdFusion and CouchDB and MongoDB. Maybe a LitePost implementation against each? I’ll see if I can get one done next week.

Published by

Rick Osborne

I am a web geek who has been doing this sort of thing entirely too long. I rant, I muse, I whine. That is, I am not at all atypical for my breed.