Is The Relational Database Doomed?
To be honest, that’s completely the wrong question to ask, but life being what it is, it is a question that is being asked. The reason it is being asked is because different kinds of data store are now popping up all over the place – but, truth to tell, they always did. Today I read an article on Read Write Web with the title Is Relational Database Doomed, which span a series of half-truths and misleading ideas about relational database. It then described, very usefully imho, a number of different not-exactly relational databases that have emerged in recent times.
On the second page of the article it proclaims that:
The relational data model is based on a “natural” representation of the data it contains, not on an application’s functionality.
This is so badly wrong and such a misleading statement that I have no option but to challenge it. If you really want to understand data and contribute to building better database products then here is a list of points to consider:
- When Implemented, the Relational Model of Data Delivers Poor Performance. The original relational database theory was based on the idea that it was possible for data to be represented independently of processing. This is wrong. There is no data anywhere that does not have context. As soon as you start applying this idea, you run into trouble. When I was doing database consultancy in the 1990s, I was once invited into a room where the design team had put a diagram on the wall of the data model intended for use in building a big system. It occupied twenty or so big sheets of paper. I studied it for a minute or two while they explained that it had been normalized to 5th Normal form and there were over 250 tables. They asked me what I thought and I told them instantly that it wasn’t going to work. I didn’t need to do much looking. It was pretty obvious that some calls for data would inevitably provoke large multi-way joins which kill performance.
- These Normalized Database Models are Completely Unnatural Representations of Data. I have not yet come across a software development project where a normalized a data model was implemented and it worked – except in cases of extreme triviality (databases with 3 or 4 tables) for simple applications. The upshot is that relational database is always designed for a context and hence always completely dependent on the applications it serves. This becomes completely obvious – in fact a joke – as soon as you look at data warehouses, where star schemas and snowflake schemas abound. With a star schema you compress the much-lauded-fully-normalized-application-independent-data-model into a handful of tables with one big “fact table” in the middle. Why? Because you are trying to provide a large number of users with a single query service and you have to fit the data to the constraints of that application. The so-called application independence doesn’t exist. It’s all unicorns and hens teeth.
- Logical and Physical. Let’s step back a little. When you see articles on relational database theory you quickly realize that in most cases the author gets the logical and physical layers totally confused. The first rule of good practice, that Computer Science students should be taught, is to separate logical from physical. It’s like this: logically we want to process information in some way. Physically we need to store the data on various devices. We therefore need a logical data model that can be used to process the data logically and a data model for use in physically storing the data. We also need a mapping between the two models. Most systems that use an object model of data (very common in java development) end up storing the data in a relational database. To make this work they normally employ the Hibernate product which delivers “a high performance object/relational persistence and query service.” Which means that it does the mapping from logical to physical.
- The Virtue of Relational Database. If you have a data domain that can be accurately represented or modeled as collection of unordered sets, then SQL with an RDBMS provides a wonderful set-based data access capability for that data. The popularity of relational database is based almost entirely on this – and that’s fine because, for many applications and systems, such a logical representation of data meets nearly all needs. Relational database was designed with this in mind and as such it represents an entirely valid and effective way to model such data. There is a very good reason to provide a logical layer of this kind for applications to use but you don’t need a relational database in order to do that.
- Relational Theory is NOT “Mathematically Correct” At the inception of the RDBMS revolution, its proponents used to proclaim, with no justification at all, that Relational Database was “mathematically correct.” Few of the people making this claim can have had any mathematical background. The actual truth is that relational theory derives primarily from the mathematics of set theory and predicate algebra. Normalization is a mathematical technique that you also find in Ring Theory, that can be used to create a representation that eliminates dependencies. From a practical perspective, the application of the second and third normalization will help a designer to avoid some data design errors. It is therefore true that Relational Theory employs sound mathematical techniques, just as it’s true that when mathematicians proved that bumblebees could not fly they employed sound mathematical techniques. The bumblebee problem was eventually solved by changing the mathematics of aerodynamics to a model which supported bumblebee flight. Ultimately, the term “mathematically correct” cannot apply to a theory, but can apply to the proof of a theorem. These are quite different things.
- Relational Databases Deliver Poor Performance. There are two parts to this problem. When RDBMS products first hit the scene their performance was really poor. This was primarily because of the confusing of logical and physical. RDBMS products tended to use only btrees with unsophisticated indexing to implement tables and they were not the best structures for every occasion. They tried to prove their worth in benchmarks, which were synthetic tests against other RDBMS, rather than comparison with other approaches to database. They became successful primarily because Unix had a poor file system and you needed a much better file system to use Unix effectively. RDBMS stood in for that file system and sold in large numbers. Over time RDBMS got better by adding more and more tuning features, but what made most difference was that cpus got faster, memory got faster, buses got faster and disk got faster. Even so RDBMS perform poorly. But they are good enough for most circumstances. Once you get to very large databases they blow out and products that are purpose-built for performance (Teradata, Netezza, GreenPlum, Vertica, etc.) take over. Most of these provide SQL interfaces but are not really relational databases. Object databases or hybrids like Caché outperform RDBMS in most real-world applications, sometimes by a long way.
- The Flaws in The Relational Data Model Part 1. In general the relational model cannot represent some data structures that are actually quite common. They are “outside the model.” These include; ordered lists, hierarchies, trees, text and a whole variety of complex objects (like a web page, for example.) As a very specific example of this, the relational model cannot store a parts explosion of unknown depth and access it via SQL. When you get awkward data structures that don’t fit the model you can usually fix the problem by storing the whole structure at the item level and accessing it with a stored procedure. This is what object/relational databases did. It’s a useful capability that completely violates relational theory. Yet again the data is stored in a completely application dependent way. You can define completely artificial tables that don’t correspond to a real data entity to get round some of these problems. And you are yet again application-dependent.
- The Flaws in The Relational Data Model Part 2. The Relational Model assumes that the data model is static or doesn’t cater for its change. Thus when you have arrived at a data structure that you think is right for a table there is nothing in the model to cater for the fact that it might change. This is a problem that you run into, for example, in clinical trials systems in the pharmaceuticals field. Medical measurements that need to be taken vary over the course of a trial so the “medical statistic” entity changes dynamically. You have to do something “irregular” to cater for that.
- The Flaws in The Relational Data Model Part 3. Ultimately the primary fault with relational database is that it offers no semantics beyond the simple semantics represented by the database schema. It can only provide semantics for items that can be represented as rows in a table. Beyond that it offers very little. Since we naturally use semantics to access data (get me the deeds of the house, find me that email from John, where’s my recording of that Beatles song, etc.) the lack of access semantics is where it runs out of steam. It doesn’t stretch beyond a simple schema.
- The Relational Model of Data Never Dominated Anyway. Estimates vary, but it is generally agreed that somewhere between 70% and 95% of the world’s data is stored only in poorly structured or unstructured formats such as: word processing documents, spreadsheets, HTML files and e-mail. The truth is that Relational database never did really dominate. It was rejected out of hand, year after year, as an effective store for many types of data.
Having said all that, we can now answer the question:
Is Relational database doomed?
Not really.
It is not any more doomed that the old network databases and hierarchical databases of the mainframe era that still store huge amounts of data. The only thing that is doomed is the idea that the relational model offers an effective general solution to the problem of storing data in a convenient and accessible form.
It never did and it never will do.















Um, Teradata, Greenplum, and Netezza are all very much relational databases.
Teradata is a full RDBMS with an optimizer. It offers full ANSI compliant transaction capabilities – always has. It is not a query-only product like a Red Brick.
The relational model is not a physical model (i.e it isn’t about b-trees and optimizers), it is a logical model:
“A data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data. The operators allow us to model its behavior.”
The relational model is explicitly independent of the physical layer.
As for Greenplum, the PostgreSQL ‘Object’ part of ‘Object-Relational DBMS’ is thin (inheritable tables) and hardly justification for claiming it as a non-relational database. And there is nothing magical about map reduce which renders databases that can do it necessarily non-relational, considering it is easily duplicated with materialzed views and triggers.
They are relational at the logical level, because they offer SQL interfaces, but many object databases do that too. They are not built with traditional RDBMS b-tree architectures that depend upon an optimizer, so they are not relational at the physical level. Greenplum is based on the Postgres object relational engine and supports MapReduce access as well as SQL. Also they are all query-based products built for query workloads. Relational databases offer full transactional capabilities.