SQL or NoSQL, You Choose
An interesting debate is brewing about SQL databases and NoSQL database. So what are these things?
A SQL database is a relational database. SQL, the set based query processing language, pretty much defines what a relational database is: a collection of data held as 2 dimensional tables. It’s not that relational databases don’t allow other structures for data – the object relational databases, for example, are far more versatile in the structures you can store. But with such databases you have to extend SQL and when you do, it ceases to be a simple set-based language for getting at data.
And that’s where the NoSQL database enters in. The NoSQL term was coined by a Rackspace employee, Eric Evans. The name and concept generated a bit of a buzz.
NoSQL term is supposed to stand for “Not Only SQL.” However, in practice, it means not SQL-oriented, because SQL doesn’t define what the database is, so whether it has a SQL interface is irrelevant. Examples of such databases include: Apache Jackrabbit, Amazon’s Dynamo, Cassandra, CouchDB, Chordless, Google BigTable, HBase, MongoDB, Neo4j, Redis, Riak, Scalaris, Tokyo Cabinet, hamsterdb and VertexDB.
Why NoSQL is a Crock
The term NoSQL is really useful because it sets off an important debate on the limitations of Relational Database (see 6 Reasons Why Relational Database Will Be Superseded) but it aint never gonna be a reasonable classification of database because all it says is “none of the above”. Some of the “NoSQL databases” are for accessing large data heaps, some are for cheap indexing (that does force you to do an oh-so-expensive-in-processing-power SQL join), some are for storing documents and so on.
Try defining a ball point pen by what it isn’t. Well it’s not a fish and you can’t watch video on it and it has no nutritious value and its useless for accessing the Internet and it’s not made of uranium and it doesn’t make the sun shine. So what is it? It’s an airplane propeller, isn’t it?
The list of databases I’ve provided have been classified on Wikipedia as NoSQL. But they could be extended to include all the Codacyl databases and the in-memory databases and the XML databases and the Object-Relational databases and so on.
But there’s an important point here.
Accessing Data Is Nine Points of the Database
What makes a database any different from a file system?
Not much. A database is a coherent heap of persistent data. A file system is the same but lacks coherence, so you have multiple interfaces for accessing the files in the file system. The persistence is important, of course, because that means that the data can be reused – so databases also tend to have capabilities like logging, replication, etc. built in, to guarantee data persistence in the event of failure. But the key point is data access.
SQL is a comprehensive access language only for data held in tables. The “relational theory”, that SQL emerged from, is really set theory applied to sets of data, with the odd nuance thrown in. It’s viable for data held in tables, but semantically, it’s flat out inadequate.
Let me illustrate this with a simple very common example. The relational model would say that there is a one to many relationship between an ORDER-HEADER and an ORDER-LINE, there being one or more ORDER-LINEs for each ORDER-HEADER. It would also say that there is a one-to-many relationship between PRODUCT and ORDER-LINE, there being multiple ORDER-LINES for every PRODUCT. But nothing in relational theory semantically states that an ORDER-LINE is part of an order and is not part of a product. There is no semantics of composition in relational theory. None at all.
You can usually get around that inconvenience, but in some situations, such as the fairly common situation of a parts explosion, the way round generates work for everyone, the programmer, the database designer and the user.
The relational theory of data is either incomplete or horribly wrong. The NoSQL debate may help by drawing attention to that, but I suspect it won’t. In the NoSQL debate there is lots of criticism of relational database for not scaling – which in reality is criticism of relational products. There are relational database implementations that do scale well because of how they physically manage the data and access to that data.
There is very little criticism of the semantics of SQL, but that’s where the real problem is. If the semantics of SQL were any good then every new product that came to market would hold its data in a relational database. But that just doesn’t happen. Just about every new application that we see builds it own file system – and that’s because the semantics of data access in SQL are hopelessly inadequate for most data.
The fault is in the semantics of composition. It simply isn’t there in SQL.



















