NoSQL, schemaless, open-source, “Not Only SQL” databases have been around for quite some time. The term NoSQL was originally used by Carlo Strozzi to describe a lightweight open-source relational database that didn’t adhere to a standard SQL programming interface. This technology is nothing new. NoSQL data stores existed before relational database theory was even published. I look at it as history repeating itself.
Strozzi’s flavor of NoSQL data store is different from what is described as a NoSQL database today. In the age of Twitter, Eric Evans reintroduced the term to help organize and promote an event about open-source distributed databases. Those happen to be non-relational data stores, which are very different from relational data stores.
It Isn’t Exactly a Database
It is better described as a structured data store. The storage is organized more like a file directory than an Excel workbook. An RDBMS inherently benefits from normalization, which spreads an entity’s data across multiple tables and further complicates horizontal scaling. NoSQL data stores are meant to be simple. This simplicity is evident in both the features they have and the features they lack.
With NoSQL, atomic writes are inherent when writing a value, object, or document. Since each object is scoped within itself, data writes occur at the object level. Consistency is another matter: NoSQL data stores typically give up this guarantee in favor of simplicity. Data requested from a NoSQL data store is not guaranteed to be up to date. There are versioning techniques that can help reject bad object writes, but since objects exist on their own, a read may still return stale data.
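One common form of the versioning technique mentioned above is an optimistic version check: every object carries a version number, and a write is rejected if it was based on an older version than the one currently stored. The sketch below is a toy in-memory illustration, not the API of any particular NoSQL product; `VersionedStore`, `VersionConflict`, and the `user:42` key are hypothetical names made up for this example.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the object."""


class VersionedStore:
    """Toy in-memory key-value store with per-object version numbers."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as (0, None)."""
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Replace the whole object only if the caller saw the latest version."""
        current_version, _ = self.get(key)
        if expected_version != current_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, store has v{current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version


store = VersionedStore()
store.put("user:42", {"name": "Ada"}, expected_version=0)

# Two clients read the same version of the object...
version_a, doc_a = store.get("user:42")
version_b, doc_b = store.get("user:42")

# ...the first write succeeds, the second is rejected as stale.
store.put("user:42", {**doc_a, "city": "London"}, expected_version=version_a)
try:
    store.put("user:42", {**doc_b, "city": "Paris"}, expected_version=version_b)
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True
```

Note that the check only protects writes. A client that read the object before the first update is still holding old data, which is the sense in which reads are not guaranteed to be up to date.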
The Highlander Rule
There can be only one. This is true of relational databases in the respect that data needs to exist, or act like it exists, in one place. The implementation expects scaling to occur vertically (more power on one system). Scaling an RDBMS horizontally (sharding or partitioning data across multiple systems) can be very difficult. With difficulty comes cost and complexity (which adds more cost).
This is the problem Google and Amazon, to name a few companies, experienced when scaling an RDBMS. Even for companies with the resources to scale a database horizontally, doing so was neither adequate nor financially feasible. These relational data store implementations are just not designed to live on multiple systems and handle such large amounts of data. It was more feasible to design a new data store than to attempt to scale a relational database, and that is what they did.
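To make horizontal scaling concrete, here is a minimal sketch of hash-based sharding, assuming each record is self-contained and is always looked up by a single key. `shard_for` and `ShardedStore` are hypothetical names for illustration only, and each "system" is just a Python dict.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy store that spreads keys across several independent dicts."""

    def __init__(self, shard_count: int):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


cluster = ShardedStore(shard_count=4)
cluster.put("user:1", {"name": "Ada"})
cluster.put("user:2", {"name": "Grace"})
ada = cluster.get("user:1")  # routed to exactly one shard by its key
```

Routing by key is easy because each object stands on its own. A normalized relational model is harder to split this way, since one query may join rows that hash to different shards. This naive modulo scheme also remaps most keys when the shard count changes, which is one reason real systems use more elaborate partitioning.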
When Scalability, Performance and High Availability Are Paramount
Companies like Google (BigTable), LinkedIn (SenseiDB), Amazon (SimpleDB) and Facebook (Cassandra) have unique requirements when it comes to their data stores, and they have all built NoSQL databases to support high-volume processes. Today, most companies can do just fine with a relational data store for their line-of-business applications. Tomorrow will undoubtedly be a different story.
Schemaless Is the New Trend
These NoSQL databases fall into several categories:
- Key–value Stores
- A very simple, very fast non-relational data store. These implementations can be found where blob storage is used. Google’s BigTable is often cited as an example, although it is more precisely classified as a wide-column store.
- Document Store
- This is a little bit more than just PDF or Word document storage. These repositories store objects or data representations without adhering to a schema. MongoDB is an example of a document data store. It stores its documents as binary JSON (BSON) and allows relationships to be defined across documents without constraints or integrity rules. I like to think these repositories are great for tightly scoped applications, such as a blogging engine or eCommerce application, where UI objects can be persisted without being organized into a relational store.
- Graph Database
- These data stores are oriented toward handling relationship definitions between data. Graph databases are useful for storing social relationships and travel maps/stops, to name a few.
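The three categories above differ mainly in how much the store knows about the shape of the data. The sketch below mimics each model with plain Python structures; it is only an illustration, and the keys, field names, and the `within_hops` helper are invented for this example rather than taken from any product.

```python
from collections import deque

# Key-value: an opaque value looked up by key; the store never inspects it.
kv_store = {"session:abc123": b"\x00\x01opaque-blob"}

# Document: self-describing objects; two documents need not share fields.
posts = {
    "post:1": {"title": "Hello", "tags": ["intro"], "author_id": "user:7"},
    "post:2": {"title": "Schemaless", "comments": [{"by": "user:9", "text": "Nice"}]},
}

# Graph: the relationships themselves are the data (who follows whom).
follows = {"ann": {"bob"}, "bob": {"cat"}, "cat": {"dan"}, "dan": set()}


def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone reachable from start in <= max_hops steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen - {start}


friends_of_friends = within_hops(follows, "ann", 2)
```

Here `friends_of_friends` contains `bob` and `cat` but not `dan`, who is three hops away. A graph database is built to answer this kind of traversal directly, whereas a relational store would typically need one self-join per hop.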
Simplicity is the Sign of a Good Design
Concurrency control, transactions, availability and other ACID considerations are valid concerns when dealing with data validity. With NoSQL these concerns are typically an afterthought. Since the data is stored in aggregates, changes are usually made to a single record and do not require a transaction across multiple entities.
These stores are great if you don’t want to define a schema. A lot of back-end API development for mobile apps has leveraged a NoSQL database to persist user profiles, game accounts and other simple entities. This leaves developers with one less task: designing and scripting out the schema of a database. This is, in my opinion, not the best reason to use a NoSQL data store, but if it is solving a problem then to each his own.
There are a lot of downsides. Let me take a moment to run through a few of them:
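As a rough illustration of what storing data in aggregates means, the sketch below contrasts a normalized layout with a single self-contained record. The order data, table names, and the `add_item` helper are hypothetical, and a plain dict stands in for the data store.

```python
import copy

# Relational shape: one logical order is spread over two tables, so adding
# an item and updating the total touches both and needs a transaction.
orders_table = [{"order_id": 1, "customer": "Ada", "total": 30}]
order_items_table = [{"order_id": 1, "sku": "book", "price": 30}]

# Aggregate shape: the same order is one self-contained record.
order_doc = {
    "order_id": 1,
    "customer": "Ada",
    "items": [{"sku": "book", "price": 30}],
    "total": 30,
}


def add_item(order, sku, price):
    """Return a new version of the aggregate with the item and total updated."""
    updated = copy.deepcopy(order)
    updated["items"].append({"sku": sku, "price": price})
    updated["total"] = sum(item["price"] for item in updated["items"])
    return updated


store = {"order:1": order_doc}
store["order:1"] = add_item(store["order:1"], "pen", 5)  # one write, one record
```

Because the item list and the total live in the same record, replacing that record is the whole change. The trade-off shows up when a change has to span several aggregates, which is where the missing multi-entity transaction starts to matter.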
- Querying is not a first-class citizen. Again, I will reiterate, NoSQL is not a relational data store, so your data has no fixed structure to easily define queries against.
- Indexing is limited. Some solutions have indexing, but it is more along the lines of value promotion or metadata tags assigned to the objects when writing them to the data store.
- Durability is often sacrificed for scalability.
- NoSQL is an immature technology. You can probably get that feeling just by looking over the number of NoSQL solutions listed on the Wiki site. There are no standard APIs or interfaces, and most flavors of NoSQL lack administration tools.
- Vendor lock-in. Moving data from one type of store to another could be very difficult. Also, since there is no standard API, you will have to rewrite an entire implementation to support another NoSQL solution.
What’s Next for NoSQL?
I would say it isn’t going anywhere. Companies like Amadeus are making big investments in the Couchbase NoSQL data store. It is very early in the game for most companies to commit to adopting the technology as an enterprise solution. However, I do see these data stores becoming a good complement to an organization’s big data strategy.