More on Non-relational (aka, NoSQL) Databases

July 29, 2011

Non-relational databases seem to be getting all the attention these days (apparently we’ve moved on from HTML5). If you’re not familiar with the concept of a non-relational database, they are sometimes called NoSQL databases, but that’s a misnomer, as these databases do use SQL. What a non-relational database does not do is organize its data in related tables (i.e., data is stored in a non-normalized way). For example, in a conventional database, a blog might have one table that stores posts and another table that stores comments. A JOIN is then required to pull out all the comments along with a particular post. With a non-relational database, one “collection” (the non-relational version of a table) would store all of the posts. Each comment associated with a post would be stored as part of that post’s record within the collection. This means that one record (or “document”, in non-relational terms) might contain just the post and no comments, another record might contain a much longer post and hundreds of comments. The benefit is that when you go to retrieve an individual post, you’re automatically retrieving all the associated information (e.g., the comments for that post). Non-relational databases really are the opposite of a normalized database, taking more of an object-oriented approach to data storage and retrieval. The hardest thing about using a non-relational database may be going against years of relational database training.

There are a number of non-relational databases out there; the key ones I’ve come across are MongoDB (I’ve written more about MongoDB separately), Neo4j, Cassandra, and CouchDB (Cassandra and CouchDB are both Apache projects, which is a always a good thing). The benefits of using non-relational databases are most important for sites with a lot of activity and data. And by “a lot”, I mean on a Facebook or near-Facebook level. I believe non-relational databases should be on the Web developer’s radar, but I’m not suggesting we should all jump ship on relational databases just yet; most sites don’t have the performance issues that non-relational database solve and do have the need for data integrity that relational databases offer.

The creators of CouchDB and SQLite have just announced a new query language called UnSQL, for “Unstructured Data Query Language”, pronounced “Uncle”. The hope is that UnQL can do for non-relational databases what SQL did for relational ones: provide a common way of interacting with the underlying database, without too much regard for the database application itself. The project is also being supported by Microsoft, for whatever that’s worth.

For a long discussion of non-relational databases, check out this article at InfoWorld. I also benefited from the posts at Sean Coates’s blog, discussing the initial adoption of CouchDB, and the switch to MongoDB on a project.