Is an Enterprise Data Warehouse Still Required for Business Intelligence?
By Colin White, BI Research
The number of business intelligence (BI) solutions appearing in the marketplace is steadily increasing. Most of these solutions still employ the services of a traditional enterprise data warehouse, but an increasing number do not. In some operational BI applications, for example, event data volumes and/or the need for fast action times may prevent the data from being persisted in a data warehouse before it is analyzed. These latter applications do not replace the enterprise data warehouse, but often work in conjunction with it – the results from the analytical processing may be stored in the enterprise data warehouse, for example.
As the BI industry evolves, so too does the role of the traditional enterprise data warehouse. The title of this article is deliberately provocative. My objective is to encourage BI professionals to consider the role of the data warehouse in new BI projects. In the past, the enterprise data warehouse has been the cornerstone for such projects, but I believe in many situations this is no longer true.
A Historical Perspective
Essentially there are three main types of IT processing involved in running the business: business transaction processing, business intelligence processing, and business collaboration processing. Business transaction applications run day-to-day business operations, while BI applications analyze those operations with the objective of optimizing and improving them. Collaboration systems enable business users to share information and expertise about business operations.
Prior to the introduction of the concept of business intelligence, most companies analyzed their business operations using decision support applications that queried and reported directly on data stored in business transaction databases. There were several problems with this approach. The five key ones are: 1) the data was not usually in a suitable form for reporting, 2) the data often had quality issues, 3) decision support processing degraded business transaction performance, 4) data was often dispersed across many different systems, and 5) there was a general lack of historical information.
Data warehousing was introduced to help solve these data and performance issues. While there is no question that data warehousing helped improve business decision making, it is important to realize, nevertheless, that it was introduced primarily to solve design issues in business transaction systems and also for performance reasons.
The emergence of BI and business performance management (BPM) applications and tools further enhanced business decision making by giving business users simpler interfaces, improved data analysis features, and the ability to compare actual and planned performance. Although BI and BPM applications typically process data in a data warehouse, this is only because of the five issues outlined earlier concerning direct access to business transaction data. If these issues could be resolved, there would be no need for a data warehouse.
Traditional Business Intelligence
Business intelligence has traditionally been used for strategic and tactical decision making. This type of processing involves intensive analysis of historical and summarized data managed in an enterprise data warehouse. Performance issues caused by centralizing data in an enterprise data warehouse have led to the creation of data marts, which solve performance problems by spreading the BI processing across multiple data stores.
The problem with data marts is that organizations often build them directly from business transaction databases, rather than the enterprise data warehouse. This is because it is often quicker and easier to build a data mart than to incorporate additional data into the enterprise data warehouse and then build the data mart from the data warehouse. Another problem is that many organizations have more than one “enterprise” data warehouse. Multiple disconnected data warehouses and data marts lead to data consistency issues, which data warehousing was supposed to solve in the first place.
There are many articles written about the problems of building dozens of independent data marts directly from operational data, and I will not discuss these problems here. It is worth pointing out, however, that a data mart solves most of the same issues addressed by an enterprise data warehouse with the exception that decisions sourced from multiple data marts may be inconsistent in the same way that decisions based on multiple business transaction databases can be inconsistent. This multi-source issue could be mitigated by first integrating the business transaction data, for example, into an operational data store or master data store. The issue of historical data would have to be addressed, but this is solvable.
Historical Data and Current Data
The distinction between current data and historical data should be easy to define, but it is not. Data in a business transaction data store is usually current, while data in a data warehouse is usually considered to be historical. The issue here is, “What is meant by current and historical?” Let’s look at an example.
If I have multiple telephone accounts with a telephone company, then (in simplistic terms) the company will have a record showing my customer data and a record for each of my accounts listing the telephone number, account balance, billing data, and so forth. When I make a telephone call, send a text message, or access the Internet, I create a call detail record (CDR) that can be collected with other CDRs to analyze customer calling patterns, detect fraud, etc.
At any given moment in time the customer and account records will show the latest information about my current status. This can be considered to be current data. As this data is updated with new address and account data, the old data may be captured into a data warehouse for analysis purposes. The capturing process may be done at particular moments in time (a snapshot) or continuously, depending on how the data will be analyzed. Regardless, the data in the data warehouse is historical.
The CDR data is a different situation. In general, once I make a phone call, the CDR for that call never changes. When a telephone company captures the CDR into a data store for analysis, is the data current or historical data? The answer is it is current data, because, even though the data ages over time, it is always the current version of the data. What name do we give the CDR data store? I suspect most people would call it a data mart. Is this the correct term? Does it really matter what we call it, other than the fact some people have the need to give things labels? Regardless, we most probably don’t want to keep the CDR data in an enterprise data warehouse. It is useful, however, to keep a historical record of the CDR analysis results because this shows trends and patterns over time.
If the CDR information is used to detect fraud, then the quicker the analysis can be done, the faster fraud can be detected. The BI application can process the data in flight as it flows through the system or can analyze it in a persistent data store that is updated continuously with the CDR data. Data mining of past CDRs can help define the business rules for this detection. This fraud detection application is a good example of an operational BI application.
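As a rough illustration of the in-flight option, the sketch below checks each CDR against one business rule as it arrives, without storing the CDR stream anywhere. Everything in it is an assumption made for illustration: the `CDR` fields, the `InFlightFraudDetector` class, and the rule itself (too many calls from one account inside a sliding time window). In practice the thresholds would come from data mining of past CDRs, as described above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CDR:
    """Hypothetical, simplified call detail record."""
    account: str
    timestamp: float  # seconds; assumed to arrive in time order per account
    destination: str


class InFlightFraudDetector:
    """Flags an account that makes more than max_calls calls within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        # Only the timestamps still inside the window are kept, per account.
        self._recent = defaultdict(deque)

    def observe(self, cdr: CDR) -> bool:
        """Process one CDR as it flows past; return True if the rule fires."""
        recent = self._recent[cdr.account]
        recent.append(cdr.timestamp)
        # Drop timestamps that have aged out of the sliding window.
        while recent and cdr.timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_calls
```

Only a small amount of per-account state is held in memory. The alert, not the raw CDR, is what would be persisted for later trend analysis.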
Operational Business Intelligence
There is a growing number of operational BI applications similar to those described above for CDR analysis. The business benefit of these applications is that they can help companies become more agile by analyzing data during intra-day operations. The ultimate example of this type of processing is algorithmic trading, which employs event analytics to optimize trading operations. The response time of these analyses is a fraction of a second. In this type of processing, it is not feasible, or even necessary, to store the huge volumes of data involved in a data warehouse. The results of BI processing, however, may be kept for future use.
The model for many operational BI applications is capture data, analyze data, persist results, i.e., analyze data before persisting it. This is different from the traditional BI model of capture data, persist data, analyze data, persist results. BI applications that analyze web traffic and business activity on commercial web sites to track buying trends, optimize prices, etc. are another example of operational BI. Although these applications analyze data in flight, they may also use historical data in a data warehouse to assist in the analysis.
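The difference between the two models can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real architecture: the function names, the event shape (a dict with a `value` key), and the use of plain lists as stand-ins for the data warehouse and the results store are all hypothetical.

```python
def analyze(events):
    """Stand-in analysis: a single pass that counts events and averages their values."""
    count = 0
    total = 0.0
    for event in events:
        count += 1
        total += event["value"]
    return {"count": count, "mean": total / count if count else None}


def traditional_bi(events, warehouse, results_store):
    """Capture data, persist data, analyze data, persist results."""
    warehouse.extend(events)                   # raw data is persisted first
    results_store.append(analyze(warehouse))   # analysis runs on the stored data


def operational_bi(events, results_store):
    """Capture data, analyze data, persist results."""
    # The events are consumed as they flow past; only the result is stored.
    results_store.append(analyze(events))
```

In the operational variant `events` can be a one-shot stream (an iterator), because nothing downstream needs to read the raw data again. In the traditional variant the warehouse retains it for later reuse.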
Why is this discussion important? The main reason is that at present business intelligence is treated as synonymous with data warehousing. This thinking is wrong and needs to be changed. Data warehousing is a component of BI, but BI may employ data in other data stores. In some cases a BI application may not even use data managed in a data warehouse. The tight connection between BI and data warehousing is causing terms such as virtual data warehousing to be used to describe other types of BI processing. These terms are unnecessary and just confuse everybody.
Another issue is that people have forgotten that data warehousing was created to overcome deficiencies in business transaction systems. Many of these issues are now solvable. My concern is that data warehousing has become a system in its own right and companies are now extending the data warehouse into other application areas such as master data management and content management. This is completely the wrong direction and must be argued against.
The bottom line is that data warehousing is still an important component of business intelligence, but it is no longer the foundation on which all BI projects have to be built.

You have hit on what I would contend is the heart of intelligence, analysis. Without analysis, we simply have data drawn together and possibly massaged a bit via an ETL process.
Whether one needs a DW could depend on performance requirements. If you persist the data before the analysis, then you have developed a “cache” for some processed data. If you can use the persistent data again to save analysis time then you have improved the timeliness of the system.
Great Article Colin. As IT professionals (or other) we always need to maintain our questioning attitude and be willing to change with the time. A blog of mine from a few months ago titled “You Already have Business Intelligence” challenges traditional thoughts as well http://www.theoutspokendataguy.com/2010/09/you-already-have-business-intelligence.html
Excellent. EDW has been misused by marketing departments of technology vendors as a synonym for a “huge RDBMS used for BI”. Data volume then becomes the main parameter. As you point out, performance is one aspect (of EDW) for which there are alternatives today. If performance has been your principal motivation to set up an EDW (implying that other stuff like data quality, history etc. are not relevant) then you might want to look into those alternatives. Actually, this seems to be in line with what SAP is doing – http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/21575.
It should not have been a surprise, as I was sucked in by the sound content and flow of this piece, to then realize who had written it: the same man on whose work we’d based our early architectures for data warehousing.
That same implementation, now a decade and a half removed, even then challenged the concept of either historical or strategic — and it was telecom. The biggest distinction was that it was informational vs. operational — but our implementation was clearly tactical vs. strategic and needed to be as near real-time as we could push it — we used it to evaluate the data (based on algorithms honed from data mining) and more appropriately segment call center leads for highest-value use — given that regulation prevented calling a household more than once in a 3 month period, the goal was to call them for the ‘right’ (highest value) campaign.
What we call it all is less important than what the terms help us accomplish in reaching a common understanding — so I am in total agreement with your observations and grateful for them (given that I no longer spin in these circles). Even as we were in the throes of ‘making up’ what data warehousing was, we were challenging its precepts. Thanks for continuing that tradition.
hello,
thanks for the article.
First, in my opinion, there is no “extra intelligence” in BI; it’s just about information reporting and analysis… We start talking about intelligence only when we do some advanced data mining. So in your classification of processes, I think it’s more suitable to talk about a Business Information process instead of a Business Intelligence process.
Now back to the need for data warehousing. I’m a bit sceptical that we do not need the EDW any more. Please give us alternatives to data warehousing concerning the following five key points:
1) the data was not usually in a suitable form for reporting,
2) the data often had quality issues,
3) decision support processing degraded business transaction performance,
4) data was often dispersed across many different systems, and
5) there was a general lack of historical information.
thanks a lot.
You definitely have a point in your conclusion that in many cases there is no need for an expensive Data Warehouse. Maybe a simple Data Collection System is more appropriate.
My own 10-year experience with our DW is that we don’t use it for cross-examinations (as was the plan in the beginning); we simply analyze each branch by itself, and we don’t correct quality issues in the DW but just report them back to the Operational Systems. The conclusion is that our DW gives us sufficient data quality for report generation and queries, although the cost of running the whole machinery is much too high.
Today we would surely have chosen a less expensive solution (SAS, SPSS, Stata, etc.) to do analysis on data sets from the Operational Systems.
Thanks for challenging the status quo yet again. My own recent experience has led me to question not so much the value of the EDW, but rather the fundamental design of the EDW schema. Let me elaborate. We can choose either an ‘Inmon’ 3NF DW, or a ‘Kimball’ dimensional DW. Kimball has shown that the dimensional DW is the set union of the datamarts that might otherwise be spun-off as separate caches. These days we can create the impression of datamarts without creating and populating separate MDDBs. I recently did this using Oracle 11g and OBIEE – Oracle 11g cube-organised materialised views and query-rewrite in particular. OBIEE furnished the multi-dimensional perception, and 11g query-rewrite provided the performance. Our new data warehouse superseded an Inmon 3NF DW that had been running for 6 years. The Inmon 3NF database required substantial expertise to query successfully – the joys of timestamps in primary keys! This had led to the need to employ literally hundreds of ‘data analysts’ – every manager had hired their own ‘data guy/girl’ in order to meet their management information needs. By structuring the DW using dimensional structures, managers were able to access the DW data directly. Massive business benefits followed.
The crucial difference hinges on who gets to use the DW directly. Consequently I think the Inmon 3NF is now obsolete. The Kimball dimensional DW contains the same depth and breadth of data, and can be used directly by the business community to create their own ad-hoc queries. The Inmon 3NF schema is expensive to create and own, harder to build, contains no more data than the dimensional DW and is a specialist-only database that cannot be used directly by business users for ad-hoc queries. The worst problem with Inmon 3NF is the creation of a ‘data priesthood’ that tends the database and ministers to supplicants, something we definitely need to get away from!
Regarding whether or not data is analysed then persisted or persisted then analysed, we need our tools to get to a point where it is a configuration question, not an architectural question. Given the ability of map-reduce techniques to process massive volumes of data in real-time, I feel we are almost there.