6 Reasons Why Relational Database Will Be Superseded
In a conversation with Mike Hoskins, the CTO of Pervasive, we jointly agreed that relational databases, as they currently function will eventually be replaced. There are many reasons why. Here’s a list of some that came up in the conversation and some that I’ve added on for the purposes of this posting.
1. CRUD is crud. The fundamental activities of Create, Read, Update and Delete are fundamentally wrong-headed. Updates to data should never be allowed, because they destroy information and Deletes should never be allowed for the same reason. When data changes the database should just add another record and note (by any mechanism that works) the previous value for that record. If the database gets too large then data should be archived from it. The right set of activities is thus CRA, Create, Read, Archive. All databases I’m aware of implement a CRUD model. The closest to a CRA model is/was Interbase (now called FireBird, and available under open source) in that its underlying engine was designed to work in a CRA manner. Because of CRUD, relational databases have to do truly inconvenient things and, into the bargain, the audit trail of data is lost – or has to be tacked on as an afterthought in a peripheral and highly inefficient way.
2. The Relational Model was always a partial model. Relational database with its table-based normalization approach to data has always been a limited model. Certain data structures such as recursive structures (as in a part-explosion) or an ordered list or even a circular route simply could not be represented in a relational database without jerry-rigging the data, or the programs, or both, in some way. It wasn’t a show-stopper, but it wasn’t ideal. The problem of dynamic (i.e. changing) meta data was also a problem that only ever gave rise to jerry-rigged solutions. It would be nice to move to a more comprehensive model (such as, for example, is offered by object database products like Intersystems’ Caché, or Progress’ ObjectStore.)
3. The Logical/Physical separation of concerns was ignored. The Relational Model of Data was always a logical model. For those “data pools” to which it could be accurately applied, it was an excellent model that has proved to be highly successful and which has promoted a very easily understood query language; SQL. However, most relational database products implemented the relational model at the physical level as a set of btree files with multiple indexes. This works well enough for data held on spinning disks, but it imposes an artificial overhead onto managing the database. You need to have DBA’s who create and drop indexes for no reason other than getting the database to perform. There was no reason why more versatile physical engines could not have been built.
4. It’s over for spinning disks. All “old style” databases and relational databases assume that data comes from a spinning disk. That’s the wave of the past and you can wave goodbye to it. Those databases hark back to the time when memory was a very scarce resource. Nowadays memory costs about $15 per gigabyte, so there’s no real reason not to have a memory-based architecture for databases up to tens or even hundreds of gigabytes in size. You could hold the database in memory and replicate to a second site for back-up and disaster recovery – and the database would be a hundred times faster. Companies will start building products like this soon, if they are not doing so already. (Spinning disks are necessary for much larger databases and for archiving, of course).
5. Relational database never really did perform. Despite the vast amount of money invested in database benchmarks and the attendant publicity over the years, the simple truth is that relational database never really did perform well. All that happened was that Moore’s Law made hardware run faster and memory grew in size and ran faster and that made the same old engines appear to improve significantly. In some areas there were gains (caching strategies improved, and bit-mapped indexes – an idea that preceded relational database by many years – were added) but the underlying engines never changed much. They were always a little ponderous and they still are, it’s just that they ponder swiftly thanks to the silicon.
6. The big databases are too big. Once you get into the multi-terabyte and petabyte size databases – usually query-based data warehouse applications – you run into another problem with the old relational database engines. They are too slow for long queries that dive into big pools of data. They were never designed for big pools and they don’t perform well. You need purpose-designed parallel processing engines to hack those big databases.
Now, if you’re thinking to yourself “but those relational databases run most of the applications in the world and this means that, in time, they are going to be replaced” then you’re thinking what I’m thinking.















I couldn’t agree more. The truth is that if most people actually sat down and thought about what they use relational databases for, they would realize that they’ve been paying for a very very expense file storage system.
The truth is that the only thing you should use a relational database for is things that you need to be able to query. The problem is that very few solutions have been presented that allow you to easily separate out “storage” from “index+search”.
In our recent efforts, we’ve been looking a lot at the S3 + SimpleDB approach offered by Amazon Web Services. SimpleDB seems like the kind of no nonsense “index+search” system that most people actually need, whereas S3 can be used for any substantial record storage.
Please give us a rough timeframe for when you expect relational databases will eventully be “replaced”, so we can measure the accuracy of this prediction. People who predict these massive sea changes almost never do.
This is no easy question to answer because old databases tend to live for a long time and what you are really asking (I think) is “when will new products become mainstream/dominant”. Unfortunately that depends on the market rather than the technology equation and I don’t know of any way to predict market behavior accurately However if you want to know the time frame for parallel and in-memory (or a combination of the two) to get traction, then that will happen in the next 6 years. It’s happening now in niche areas.
I can pretty much guarantee that it will happen because of the technical direction of chips (to multicore) and the decline in memory costs. The real trick would be to be able to say how it will happen – in terms of which products will get traction – but it’s too early to say for that.
I think relational databases are like COBOL. they will never go away.
Very nice artice! Thanks for sharing!
Oops, it looks like I forgot to end my blockquote tag.
I’ll try again:
So now you’re backing off. Instead of saying the RDBMS will be replaced in the foreseeable future, you’re only saying that an alternative will “get traction” within 6 years. So what? A lot of technologies will “get traction” within the next 6 years.
Ok, so what will replace RDBMSes? What other technology has the ability or the potential to process transactions more quickly and answer queries with as much speed and flexibility (Ok – OLAP systems win here, but they are almost always used in conjunction with relational data warehouses).
It’s tiresome to hear complaints against RDBMSes without hearing any better ideas.