<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>No, I am better than that! &#187; SQL</title>
	<atom:link href="http://rickosborne.org/blog/tag/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://rickosborne.org/blog</link>
	<description>Striving to subdue the mediocrity.</description>
	<lastBuildDate>Sun, 21 Aug 2011 23:27:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Yes, Virginia, that&#8217;s automated SQL to MongoDB MapReduce</title>
		<link>http://rickosborne.org/blog/2010/02/yes-virginia-thats-automated-sql-to-mongodb-mapreduce/</link>
		<comments>http://rickosborne.org/blog/2010/02/yes-virginia-thats-automated-sql-to-mongodb-mapreduce/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 23:42:10 +0000</pubDate>
		<dc:creator>Rick Osborne</dc:creator>
				<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[derby]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[mongodb]]></category>
		<guid isPermaLink="false">http://rickosborne.org/blog/?p=1174</guid>
		<description><![CDATA[I have to admit, I&#8217;m pretty darn proud of this one. This is a righteous hack. I can now write SQL against MongoDB with ColdFusion. My project today was to take this SQL statement: SELECT "goalType", SUM("distancekm") AS "totalkm", COUNT(*) AS "workouts", COUNT("powerSongAlbum") AS "songcount", AVG("distancekm") AS "avgkm", MAX("distancekm") AS "maxkm", MIN("distancekm") AS "minkm" FROM [...]]]></description>
			<content:encoded><![CDATA[<p>I have to admit, I&#8217;m pretty darn proud of this one.  This is a righteous hack.  I can now write SQL against MongoDB with ColdFusion.</p>
<p>My project today was to take this SQL statement:</p>
<pre class="sql">SELECT "goalType",
  SUM("distancekm") AS "totalkm",
  COUNT(*) AS "workouts",
  COUNT("powerSongAlbum") AS "songcount",
  AVG("distancekm") AS "avgkm",
  MAX("distancekm") AS "maxkm",
  MIN("distancekm") AS "minkm"
FROM "workouts"
GROUP BY "goalType"</pre>
<p>&#8230; and create a set of ColdFusion components to transform that query into a MapReduce function that would run on MongoDB.  <a href="/blog/index.php/2010/02/09/infographic-migrating-from-sql-to-mapreduce-with-mongodb/">Mapping basic SQL to MongoDB MapReduce</a> isn&#8217;t too hard.  Getting the <a href="/blog/index.php/2010/02/19/derby-svn-coldfusion-sql-parser/">SQL parser from Derby working within ColdFusion</a> was significantly harder.</p>
<p>But I did it.  This is the <strong>completely automated</strong> result:</p>
<pre class="javascript">db.runCommand({
mapreduce: "workouts",
map: function () { emit(
  this.goalType,
  {
    '_cfcount': 1,
    'distancekm_cfsum': isNaN(this.distancekm) ? null : this.distancekm,
    'distancekm_cfnum': isNaN(this.distancekm) ? 0 : 1,
    'powerSongAlbum_cfcount': (this.powerSongAlbum == null) ? 0 : 1,
    'distancekm_cfmax': isNaN(this.distancekm) ? null : this.distancekm,
    'distancekm_cfmin': isNaN(this.distancekm) ? null : this.distancekm
  }
); },
reduce: function (key,vals) {
  var ret = {
    'distancekm_cfmax': null,
    'distancekm_cfsum': null,
    'distancekm_cfmin': null,
    'distancekm_cfnum': 0,
    'powerSongAlbum_cfcount': 0,
    '_cfcount': 0
  };
  for(var i = 0; i < vals.length; i++) {
    var v = vals[i];
    ret['distancekm_cfnum'] += v['distancekm_cfnum'];
    if(!isNaN(v['distancekm_cfmax'])) ret['distancekm_cfmax'] = (ret['distancekm_cfmax'] == null) ? v['distancekm_cfmax'] : (ret['distancekm_cfmax'] > v['distancekm_cfmax']) ? ret['distancekm_cfmax'] : v['distancekm_cfmax'];
    ret['_cfcount'] += v['_cfcount'];
    if(!isNaN(v['distancekm_cfmin'])) ret['distancekm_cfmin'] = (ret['distancekm_cfmin'] == null) ? v['distancekm_cfmin'] : (v['distancekm_cfmin'] > ret['distancekm_cfmin']) ? ret['distancekm_cfmin'] : v['distancekm_cfmin'];
    ret['powerSongAlbum_cfcount'] += v['powerSongAlbum_cfcount'];
    if(!isNaN(v['distancekm_cfsum'])) ret['distancekm_cfsum'] = v['distancekm_cfsum'] + (ret['distancekm_cfsum'] == null ? 0 : ret['distancekm_cfsum']);
  }
  return ret;
},
finalize: function (key,val) {
  return {
    'totalkm'   : val['distancekm_cfsum'],
    'workouts'  : val['_cfcount'],
    'songcount' : val['powerSongAlbum_cfcount'],
    'avgkm'     : (isNaN(val['distancekm_cfnum']) || isNaN(val['distancekm_cfsum'])) ? null : val['distancekm_cfsum'] / val['distancekm_cfnum'],
    'maxkm'     : val['distancekm_cfmax'],
    'minkm'     : val['distancekm_cfmin']
  };
},
out: "s2mr",
verbose: true
});</pre>
<p>And here&#8217;s the output when I run that against my MongoDB collection:</p>
<pre class="coldfusion">{ "_id" : null, "value" : {
    "totalkm" : 451.6752000000001,
    "workouts" : 54,
    "songcount" : 53,
    "avgkm" : 8.364355555555557,
    "maxkm" : 19.7502,
    "minkm" : 0.0194
} }
{ "_id" : "Distance", "value" : {
    "totalkm" : 304.76879999999994,
    "workouts" : 27,
    "songcount" : 27,
    "avgkm" : 11.287733333333332,
    "maxkm" : 26.2581,
    "minkm" : 4.0486
} }
{ "_id" : "Time", "value" : {
    "totalkm" : 19.221,
    "workouts" : 2,
    "songcount" : 2,
    "avgkm" : 9.6105,
    "maxkm" : 9.9224,
    "minkm" : 9.2986
} }</pre>
<p>w00t!</p>
<p>I&#8217;ve left the code modular enough that I can make a CouchDB version <em>almost</em> as easily.</p>
<p><em>Update:</em> Better yet, I can now get all the way to an actual query data type:</p>
<table style="border-color: #884488 ; border-collapse: collapse; border-color: #884488;" border="2">
<tr>
<th style="background-color: #aa66aa;" colspan="8">query</th>
</tr>
<tr bgcolor="eeaaaa" >
<td style="background-color: #ffddff ;">&nbsp;</td>
<td style="background-color: #ffddff ;">AVGKM</td>
<td style="background-color: #ffddff ;">MAXKM</td>
<td style="background-color: #ffddff ;">MINKM</td>
<td style="background-color: #ffddff ;">SONGCOUNT</td>
<td style="background-color: #ffddff ;">TOTALKM</td>
<td style="background-color: #ffddff ;">WORKOUTS</td>
<td style="background-color: #ffddff ;">_ID</td>
</tr>
<tr >
<td style="background-color: #ffddff ;">1</td>
<td valign="top"> 8.3644 </td>
<td valign="top">19.7502 </td>
<td valign="top">0.0194 </td>
<td valign="top">53 </td>
<td valign="top">451.6752 </td>
<td valign="top">54 </td>
<td valign="top">[empty string] </td>
</tr>
<tr >
<td style="background-color: #ffddff ;">2</td>
<td valign="top">11.2877 </td>
<td valign="top">26.2581 </td>
<td valign="top">4.0486 </td>
<td valign="top">27 </td>
<td valign="top">304.7688 </td>
<td valign="top">27 </td>
<td valign="top">Distance </td>
</tr>
<tr >
<td style="background-color: #ffddff ;">3</td>
<td valign="top">9.6105 </td>
<td valign="top">9.9224 </td>
<td valign="top">9.2986 </td>
<td valign="top">2 </td>
<td valign="top">19.221 </td>
<td valign="top">2 </td>
<td valign="top">Time </td>
</tr>
</table>
<p>(Note for potential cynics: no, I&#8217;m <em>not</em> missing the point.  This is <em>not</em> meant as a production tool; it&#8217;s a learning tool.  If my students can start with SQL and see the end result in MapReduce, then they have that much better a chance of grokking all of it.)</p>
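<p>For anyone following along without a MongoDB instance, the map / reduce / finalize flow can be traced in plain JavaScript.  This is only a minimal sketch, not the generated code: the function names (<kbd>mapWorkout</kbd>, <kbd>reducePartials</kbd>, <kbd>finalizeGroup</kbd>, <kbd>groupAggregate</kbd>) and the three sample workouts are made up for illustration, and it covers just the COUNT(*), SUM, AVG, MAX and MIN parts of the query.</p>

```javascript
// Hypothetical sample data; not the author's workouts collection.
const sampleWorkouts = [
  { goalType: "Distance", distancekm: 5 },
  { goalType: "Distance", distancekm: 10 },
  { goalType: "Time", distancekm: 3 }
];

// map: one partial aggregate per document, keyed by the GROUP BY column.
function mapWorkout(doc) {
  const d = typeof doc.distancekm === "number" ? doc.distancekm : null;
  return {
    key: doc.goalType === undefined ? null : doc.goalType,
    value: { count: 1, sum: d, num: d === null ? 0 : 1, max: d, min: d }
  };
}

// reduce: fold the partial aggregates for one key, skipping null distances.
function reducePartials(vals) {
  const ret = { count: 0, sum: null, num: 0, max: null, min: null };
  for (const v of vals) {
    ret.count += v.count;
    ret.num += v.num;
    if (v.sum !== null) ret.sum = (ret.sum === null ? 0 : ret.sum) + v.sum;
    if (v.max !== null && (ret.max === null || v.max > ret.max)) ret.max = v.max;
    if (v.min !== null && (ret.min === null || v.min < ret.min)) ret.min = v.min;
  }
  return ret;
}

// finalize: derive the SELECT list (AVG = SUM / count of non-null values).
function finalizeGroup(val) {
  return {
    totalkm: val.sum,
    workouts: val.count,
    avgkm: val.num === 0 ? null : val.sum / val.num,
    maxkm: val.max,
    minkm: val.min
  };
}

// Driver standing in for the database: group by key, reduce, finalize.
function groupAggregate(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const { key, value } = mapWorkout(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const out = {};
  for (const [key, vals] of groups) {
    out[String(key)] = finalizeGroup(reducePartials(vals));
  }
  return out;
}

const summary = groupAggregate(sampleWorkouts);
console.log(summary);
```

<p>The reduce step returns the same shape it consumes, so it can be re-applied to already-reduced values; MongoDB expects that of reduce functions, which is why the average is only computed in the finalize step.</p>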
]]></content:encoded>
			<wfw:commentRss>http://rickosborne.org/blog/2010/02/yes-virginia-thats-automated-sql-to-mongodb-mapreduce/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SQL or NoSQL?</title>
		<link>http://rickosborne.org/blog/2010/02/sql-or-nosql/</link>
		<comments>http://rickosborne.org/blog/2010/02/sql-or-nosql/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 00:27:06 +0000</pubDate>
		<dc:creator>Rick Osborne</dc:creator>
				<category><![CDATA[SQL]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[nosql]]></category>
		<guid isPermaLink="false">http://rickosborne.org/blog/?p=1127</guid>
		<description><![CDATA[I&#8217;ve had some (relatively) free time over the last few weeks to dig down and get into some non-relational database alternatives, including Apache CouchDB and MongoDB. These databases fall into what is being referred to as the NoSQL movement: an attempt to get away from the strictly-typed table-column-row organization of relational databases, and on to [...]]]></description>
			<content:encoded><![CDATA[<p align="center"><img src="http://rickosborne.org/blog/wp-content/uploads/2010/02/which-db-mysql-couch-mongo.png" title="SQL or noSQL?" width="480" height="230" /></p>
<p>I&#8217;ve had some (relatively) free time over the last few weeks to dig down and get into some non-relational database alternatives, including <a href="http://couchdb.apache.org/">Apache CouchDB</a> and <a href="http://www.mongodb.org/">MongoDB</a>.  These databases fall into what is being referred to as the <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL</a> movement: an attempt to get away from the strictly-typed table-column-row organization of relational databases, and on to databases where documents are stored in their entirety instead of broken into normalized chunks.</p>
<p>I documented <a href="/blog/index.php/2010/02/08/playing-around-with-mongodb-and-mapreduce-functions/">my first experiences with MongoDB</a> last week.  I&#8217;ve been playing around with CouchDB over the last few days, which has turned out to be <em>very</em> different from MongoDB, despite their similarities on paper.  Compared to a relational database such as MySQL, each system has its own use cases, pros, and cons.</p>
<h3>MySQL and Other Relational Databases</h3>
<p>In the web world, relational databases are the <em>de facto</em> go-to solution.  It&#8217;s nigh impossible to find a web developer who can&#8217;t string together at least a <kbd>SELECT</kbd> statement.  The fact that there&#8217;s a bit of an impedance mismatch between a relational system and the software objects increasingly used to manage the data in that system doesn&#8217;t do much to slow anyone down.</p>
<p>ColdFusion is a great example here: the ORM (object-relational mapping) support built into it shows just how ingrained relational databases are in the web world.  The ColdFusion community isn&#8217;t the only one to move in this direction: the .NET community is increasingly moving to <a href="http://en.wikipedia.org/wiki/Language_Integrated_Query">LINQ</a>, a SQL-like language extension that puts a sort of relational spin on in-memory objects and collections.  (Think Query-of-Queries on crack.)</p>
<p>Relational database systems are a known quantity: there are proven solutions for nearly every major problem a web developer would run into.  But, they aren&#8217;t infallible, and they aren&#8217;t a perfect solution to every problem.</p>
<h3>Document-Oriented Databases and NoSQL</h3>
<p>A document-oriented database is just what it sounds like: a database of entire documents.  A document-oriented database for a blog, for example, wouldn&#8217;t split up the posts and comments and tags into separate tables.  Instead, absolutely everything needed to render that blog post would be in a single document, like so:</p>
<pre class="javascript">{
    _id:      "sql-or-nosql",
    created:  "2010/02/14 17:00:00 +0000",
    author:   "Rick O",
    tags:     [ "sql", "nosql", "mongodb", "couchdb" ],
    body:     "...",
    comments: {
        "2010/02/14 17:31:45 +0000": {
            name:    "Ben Camden",
            email:   "bc@cfgurus.info",
            comment: "..."
        }
    }
}</pre>
<p>The infographic I made <a href="/blog/index.php/2010/02/09/infographic-migrating-from-sql-to-mapreduce-with-mongodb/">relating SQL to the MapReduce functionality in MongoDB</a> goes into more depth, as does my previous post on MongoDB, but the gist is that you&#8217;ll use MapReduce functions to get the data out.  You&#8217;ll have one map function for your Recent Posts page, one for your Posts By Tag page, etc.  The syntax for MongoDB is a little different from that for CouchDB, so here&#8217;s a CouchDB example:</p>
<pre class="javascript">/* for recent posts ... */
function (doc) {
    /* use the post date as the key */
    emit(doc.created, doc);
}
/* for posts by tag ... */
function (doc) {
    /* return a copy of each post for each tag */
    doc.tags.forEach(function (tag) {
        /* use the tag then date for the key */
        emit([ tag, doc.created ], doc);
    });
}</pre>
<p>This MapReduce functionality is powerful stuff.  You can do some interesting things with it, and it makes parallelizing your query dirt simple.  Combined with the denormalized approach used in a document-oriented database, where you can fetch an entire page&#8217;s data with a single query, it&#8217;s not hard to see why more and more large sites (like Facebook and Amazon) are going with similar systems.</p>
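<p>To make the posts-by-tag map function a little more concrete, here is a rough sketch of what a view built from it ends up holding.  Everything outside the map logic is an assumption made for illustration: the two sample posts, the <kbd>buildView</kbd> helper, passing <kbd>emit</kbd> in as a parameter (CouchDB supplies it as a global), and the simple string comparison used for sorting, which is much cruder than CouchDB&#8217;s real collation rules.</p>

```javascript
// Hypothetical posts; only the fields the map function reads.
const posts = [
  { _id: "a", created: "2010/02/14 17:00:00 +0000", tags: ["sql", "nosql"] },
  { _id: "b", created: "2010/02/08 09:00:00 +0000", tags: ["nosql"] }
];

// Same logic as the posts-by-tag map function, with emit passed in
// explicitly so the sketch runs outside CouchDB.
function mapByTag(doc, emit) {
  doc.tags.forEach(function (tag) {
    emit([tag, doc.created], doc);
  });
}

// Toy stand-in for a view index: collect the emitted rows, sort by key.
// Assumption: element-by-element string comparison of the key arrays.
function buildView(docs, mapFn) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  rows.sort(function (x, y) {
    for (let i = 0; i < x.key.length; i++) {
      if (x.key[i] < y.key[i]) return -1;
      if (x.key[i] > y.key[i]) return 1;
    }
    return 0;
  });
  return rows;
}

const byTag = buildView(posts, mapByTag);

// Reading "all posts tagged nosql, oldest first" is then a range scan
// over adjacent rows rather than a join.
const nosqlIds = byTag
  .filter(function (row) { return row.key[0] === "nosql"; })
  .map(function (row) { return row.id; });
console.log(nosqlIds);
```

<p>Because each post is emitted once per tag, the post tagged with both <kbd>sql</kbd> and <kbd>nosql</kbd> shows up in the view twice, once under each key.</p>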
<p>But, like relational databases, document-oriented databases and MapReduce aren&#8217;t appropriate for every situation.  And, again like relational databases, different software packages offer different solutions and techniques.  CouchDB and MongoDB use JSON and JavaScript in similar ways, but to different effect.</p>
<p>Reduce functions, for example, have nearly the same signature in both systems (CouchDB passes an extra <kbd>rereduce</kbd> flag) and are almost interchangeable between the two.  Map functions, however, look similar but have very different implications.  The pre-filtering provided by the <kbd>query</kbd> attribute in MongoDB doesn&#8217;t have a direct counterpart in CouchDB; that kind of logic would need to be done procedurally in the CouchDB map function.</p>
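<p>As a small sketch of what &#8220;procedurally&#8221; means here: the filter becomes an early return inside the map function.  The author filter, the sample documents and the recording <kbd>emit</kbd> harness are all invented for illustration; CouchDB provides the real <kbd>emit</kbd> itself.</p>

```javascript
// Toy harness: CouchDB provides emit() itself; here it just records rows.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// CouchDB-style map function with the filter written procedurally.
// In MongoDB the same condition could instead be handed to the
// query attribute, e.g. query: { author: "Rick O" }.
function recentPostsByAuthor(doc) {
  if (doc.author !== "Rick O") {
    return; // filtered out: nothing is emitted for this document
  }
  emit(doc.created, doc._id);
}

// Hypothetical documents for illustration.
[
  { _id: "sql-or-nosql", author: "Rick O", created: "2010/02/14 17:00:00 +0000" },
  { _id: "guest-post", author: "Someone Else", created: "2010/02/15 08:00:00 +0000" }
].forEach(recentPostsByAuthor);

console.log(emitted);
```

<p>The catch is that the condition is baked into the view: filtering on a different author means writing (and indexing) a different map function, or emitting the author as part of the key.</p>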
<p>CouchDB&#8217;s approach also doesn&#8217;t allow for post-filtering of aggregated values equivalent to SQL&#8217;s <kbd>HAVING</kbd> clause or MongoDB&#8217;s <kbd>finalize</kbd> function; that functionality would need to happen in the client.  CouchDB also builds an index for each MapReduce query, like a materialized view or a temporary table holding the cached results of the query.</p>
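<p>Doing the <kbd>HAVING</kbd>-style step in the client is not much code.  A minimal sketch, assuming the reduced rows have already been fetched as an array of key/value pairs; the <kbd>having</kbd> helper and the tag counts below are hypothetical:</p>

```javascript
// Hypothetical reduced rows, shaped like a grouped view result:
// one row per tag with the number of posts carrying it.
const tagCounts = [
  { key: "sql", value: 12 },
  { key: "nosql", value: 3 },
  { key: "couchdb", value: 1 }
];

// Client-side stand-in for: ... GROUP BY tag HAVING COUNT(*) >= 3
function having(rows, predicate) {
  return rows.filter(function (row) {
    return predicate(row.value, row.key);
  });
}

const popularTags = having(tagCounts, function (count) {
  return count >= 3;
});
console.log(popularTags.map(function (row) { return row.key; }));
```

<p>The trade-off is that every aggregated row still crosses the wire before being thrown away, which only matters once the number of groups gets large.</p>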
<p>This indexing stage leads to a sizable performance implication: the first time a query is run it may take an extremely long time to build this index, but every query after that is lightning fast.  For example, on a 300MB database with 300,000 documents and a relatively simple query, the first run took over 20 minutes to index but only milliseconds to run thereafter&mdash;even when the underlying data values were changed.</p>
<p>This compromise means that there&#8217;s effectively no such thing as an <em>ad hoc</em> query against CouchDB.  MongoDB doesn&#8217;t make this trade-off: it doesn&#8217;t materialize queries.  On a similarly-sized database, the first query took 1200ms, with additional runs of the same query only shaving this down to 1100ms.</p>
<p>While CouchDB would be wicked fast for databases that are always queried the same way, it would be almost unusable in a data mining environment where the slicing and dicing could be different every time.  MongoDB may never be able to touch CouchDB&#8217;s performance for repeated queries, but it has the flexibility to replace a relational database in a wider range of scenarios, including data mining.</p>
<h3>Next Steps</h3>
<p>I&#8217;m certainly not the first to poke around with ColdFusion and NoSQL databases.  Bill Shelton has an excellent series of <a href="http://blog.mxunit.org/2009/10/look-ma-no-sql-mongodb-and-coldfusion.html">posts on using MongoDB with ColdFusion</a>.  Russ Spivey has some <a href="http://couchdb.riaforge.org/">CouchDB wrapper code for ColdFusion</a> up on RIAForge (and <a href="http://cfruss.blogspot.com/2009/07/about-couchdb-for-coldfusion.html">related blog posts</a>).  Matt Woodward has a <a href="http://blog.mattwoodward.com/massive-couchdb-brain-dump">massive brain dump</a> about CouchDB and NoSQL integration with ColdFusion in general.</p>
<p>I think it would be beneficial to throw together a few example applications using ColdFusion and CouchDB and MongoDB.  Maybe a <a href="http://code.google.com/p/litepost/">LitePost</a> implementation against each?  I&#8217;ll see if I can get one done next week.</p>
]]></content:encoded>
			<wfw:commentRss>http://rickosborne.org/blog/2010/02/sql-or-nosql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Year in review: 2008</title>
		<link>http://rickosborne.org/blog/2008/12/year-in-review-2008/</link>
		<comments>http://rickosborne.org/blog/2008/12/year-in-review-2008/#comments</comments>
		<pubDate>Wed, 24 Dec 2008 15:32:05 +0000</pubDate>
		<dc:creator>Rick Osborne</dc:creator>
				<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[Fitness]]></category>
		<category><![CDATA[Life]]></category>
		<category><![CDATA[Random]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[2008]]></category>
		<category><![CDATA[jogging]]></category>
		<category><![CDATA[School]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[trust network]]></category>
		<guid isPermaLink="false">http://rickosborne.org/blog/?p=385</guid>
		<description><![CDATA[I&#8217;m going to be almost entirely off the &#8217;net until the new year, so this is a little early. Accomplishments in 2008: I got 3 more semesters of school under my belt. I have 1 left until I get my BS in Info Sys Tech, making it 14 years to get a 4-year degree. I [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m going to be almost entirely off the &#8217;net until the new year, so this is a little early.</p>
<p>Accomplishments in 2008:</p>
<ul>
<li>
<p>I got 3 more semesters of school under my belt.  I have 1 left until I get my BS in Info Sys Tech, making it 14 years to get a 4-year degree.</p>
</li>
<li>
<p>I did generally interesting things in code.  I find that I have the most fun when I get to implement algorithms in CF and SQL, such as my work on <a href="http://rickosborne.org/blog/index.php/2008/11/01/quine-mccluskey-in-mostly-sql-with-cf-qoq/">QMC in SQL</a> or with <a href="http://rickosborne.org/blog/index.php/2008/01/31/generating-scalable-stretchy-and-smart-graphics-with-coldfusion-part-4/">SVG images in CF</a>.</p>
</li>
<li>
<p>I jogged more than I ever thought I could.  Officially, <a href="http://rickosborne.org/blog/index.php/2008/03/08/jacksonville-gate-river-run-15k/">I completed a 15K</a>.  Unofficially, <a href="http://rickosborne.org/blog/index.php/2008/02/23/half-marathon-run-for-saturday-2008-02-23/">I&#8217;ve done a half-marathon</a>.</p>
</li>
<li>
<p>I finished my book.  (Double yay!)</p>
</li>
<li>
<p>I&#8217;ve been much more aware of my eating and fitness habits. It&#8217;s been much harder to be good this year thanks to my hectic schedule, but I think I&#8217;ve done pretty well.</p>
</li>
<li>
<p>I survived a tumultuous year at work.</p>
</li>
</ul>
<p>Goals for 2009:</p>
<ul>
<li>
<p>Finish my BS-IST.</p>
</li>
<li>
<p>Start on my MS in Technology, and keep on track to have it completed by the end of Summer 2010.</p>
</li>
<li>
<p>Jog an official half-marathon.</p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://rickosborne.org/blog/2008/12/year-in-review-2008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

