Looking Ahead: MongoDB

April 12, 2011

Someone, I forget who (sorry), referred me to MongoDB some time back. I haven’t written about it until now because I was having trouble “getting” MongoDB. This isn’t going to be an exhaustive introduction to MongoDB, but I want to explain what I finally “got”. And although I’m still not sure MongoDB is right for me, today at least, it definitely seems to be something worth keeping an eye on going into the future.

According to their Web site, MongoDB excels at high-demand applications, with good caching, scalability, and replication. On the same page, the documentation says that MongoDB is “Less Well Suited” for “Problems requiring SQL”. This, of course, is where I get confused! In order to “get” MongoDB, you have to drop what you know, and believe, about how relational databases are organized and used, because MongoDB is not a relational database.

MongoDB is an open-source document-oriented database. Instead of storing rows of data in tables, MongoDB stores data as Documents in Collections. Documents are represented using the BSON format, a binary encoding that is closely modeled on JSON (in fact, there’s a strong relationship between MongoDB and JavaScript).

Now this may still sound a lot like the DBMS you’re used to (probably MySQL), but there are big differences here. For starters, and this is a big one, a document-oriented database is not normalized. If you’re storing a Collection of books, every book is a Document in itself. And each Document includes every attribute of that particular book: author, page count, ISBN, publisher information, and so forth. If one book has multiple authors, all of that information goes within that Document. If you want to include the publisher’s information with a book, that goes in the Document, too. This is not a normalized database, where related information is spread out over multiple tables. For many people, including me, this is a pretty foreign concept.
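To make that concrete, here is a rough sketch of what a single book Document might hold. I'm using a plain Python dictionary as a stand-in for BSON, and every field name and value is invented for illustration; none of it comes from the MongoDB documentation.

```python
# A single "book" Document, sketched as a plain Python dict.
# All field names and values here are hypothetical examples.
book = {
    "title": "An Example Book",
    "isbn": "000-0-00-000000-0",
    "pages": 320,
    # Multiple authors live inside the Document itself,
    # not in a separate authors table.
    "authors": [
        {"first_name": "Jane", "last_name": "Doe"},
        {"first_name": "John", "last_name": "Roe"},
    ],
    # Publisher details are embedded too, rather than referenced by a key.
    "publisher": {
        "name": "Example Press",
        "city": "Springfield",
    },
}

# Everything about the book is reachable from the one Document.
author_names = [a["first_name"] + " " + a["last_name"] for a in book["authors"]]
publisher_city = book["publisher"]["city"]
```

In a normalized relational design, the authors and the publisher would normally sit in their own tables and be tied to the book through keys. Here they simply travel with the book.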

Second, each document in a collection can have different attributes. If one book record has a link to a corresponding Web site, that’s fine. And it’s fine if other documents don’t have that attribute. Crazy, eh?
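As a small sketch of that idea, here is a Collection modeled as a Python list of dictionaries, where only one Document has a Web site attribute. The field names and values are hypothetical, and a list of dicts is only an approximation of a real Collection.

```python
# A "Collection" sketched as a list of dicts (hypothetical data).
books = [
    # This Document has a "website" attribute...
    {"title": "Book A", "pages": 300, "website": "http://example.com/book-a"},
    # ...and this one simply doesn't. No schema change, no NULL column.
    {"title": "Book B", "pages": 150},
]

# Code reading the Collection has to allow for the attribute being absent.
titles_with_site = [b["title"] for b in books if "website" in b]
titles_without_site = [b["title"] for b in books if "website" not in b]
```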

When it comes time to retrieve stored data, a single “find” command can retrieve a single document, including all its properties, or every document, using any attribute of your choosing to limit the selection. Because the database is not relational, no JOINs are required.
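Here is a minimal sketch of that kind of retrieval, assuming the simplest case where a query is just a set of attribute values that must match exactly. The `find` function below is my own stand-in written in plain Python, not the MongoDB driver, and the sample data is made up; the real find command supports far more than this.

```python
def find(collection, query=None):
    """Return the documents whose attributes equal every key/value in query.

    A hypothetical, simplified imitation of a document "find":
    an empty query matches every document.
    """
    query = query or {}
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]

# A "Collection" of book Documents (hypothetical data).
books = [
    {"title": "Book A", "publisher": "Example Press", "pages": 300},
    {"title": "Book B", "publisher": "Example Press", "pages": 150},
    {"title": "Book C", "publisher": "Other House", "pages": 300},
]

every_book = find(books)                                  # no limits at all
example_press = find(books, {"publisher": "Example Press"})  # limit by one attribute
book_c = find(books, {"title": "Book C", "pages": 300})      # limit by two attributes
```

Each result is a complete Document with all of its properties, so there is nothing left to stitch together from other tables afterward.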

If you’re still thinking that this is snake oil, an argument for using MongoDB can be found on their list of organizations that recently started using it: Shutterfly, Foursquare, bit.ly, Intuit, LexisNexis, SourceForge, GitHub, The New York Times, and more. Those are big names. Now, aside from SourceForge, most aren’t using MongoDB for their primary applications, but they are adopting it in one way or another. If you have the time, on that page are links to several articles and presentations by those companies as to why and how they began using MongoDB.

Now, the MongoDB documentation clearly states there are situations where it’s not an appropriate choice, most specifically applications, such as banking and perhaps e-commerce, that require transactions (as MongoDB does not support transactions). Personally, I always take it as a sign of credibility when companies (or people) admit to their own shortcomings. You would also want to think twice, or three times, before converting an existing RDBMS-based application to MongoDB. The fact that MongoDB does not support SQL means you’d have to change a lot of your existing code to make such a conversion.

Speaking of which, there are drivers for MongoDB in all the popular languages, including my personal favorite, PHP. The MongoDB driver is in PECL and is documented in the PHP manual.

If you want to learn more about MongoDB, I’d recommend the links found on the list of organizations that recently started using MongoDB, plus the following: