A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.

IMPORTANT: Please read 101 Guide to Data Warehousing for a more elaborate discussion on this subject

This definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.

Data warehousing arises in an organisation's need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation.

The practical reality of most organisations is that their data infrastructure is made up by a collection of heterogeneous systems. For example, an organisation might have one system that handles customer-relationship, a system that handles employees, systems that handles sales data or production data, yet another system for finance and budgeting data etc. In practice, these systems are often poorly or not at all integrated and simple questions like: "How much time did sales person A spend on customer C, how much did we sell to Customer C, was customer C happy with the provided service, Did Customer C pay his bills" can be very hard to answer, even though the information is available "somewhere" in the different data systems.

Another problem is that ERP systems are designed to support relevant operations. For example, a finance system might keep track of every single stamp bought; When it was ordered, when it was delivered, when it was paid and the system might offer accounting principles (like double bookkeeping) that further complicates the data model. Such information is great for the person in charge of buying "stamps" or the accountant trying to sort out an irregularity, but the CEO is definitely not interested in such detailed information, the CEO wants to know stuff like "What's the cost?", "What's the revenue?", "did our latest initiative reduce costs?".

Data Warehousing

Yet another problem might be that the organisation is, internally, in disagreement about which data is correct. For example, the sales department might have one view of its costs, while the finance department has another view of that cost. In such cases the organisation can spend unlimited time discussing who's got the correct view of the data.

It is partly the purpose of Datawarehousing to bridge such problems. It is important to note that in Datawarehousing the source data systems are considered as given: It is not the task of the datawarehousing consultant to figure out, that since the problem is that the CRM system identifies a person by initials, while the Employee-Time-Management system identifies a person by full name while the ERP system identifies a person by social security number; and since a person can change his name: things do not work and the organization should invest in and implement one or two new systems to handle CRM, ERP etc. in a more consistent manner.

Rather, the datawarehousing consultant is charged with making the data appear consistent, integrated and consolidated despite the problems in the underlying source systems. The datawarehousing consultant achieves this by employing different datawarehousing techniques, creating one or more new data repositories (i.e. the datawarehouse) whose data model(s) support the needed reporting and analysis.


Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • Business Intelligence

    In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence. He defined intelligence as: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."

  • A road-map on Testing in Data Warehouse

    Testing in data warehouse projects are till date a less explored area. However, if not done properly, this can be a major reason for data warehousing project failures - especially in user acceptance phase. Given here a mind-map that will help a...

  • What is Data Warehousing?

    A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.

  • Enterprise Data Warehouse Data Reconciliation Methodology

    An enterprise data warehouse often fetches records from several disparate systems and store them centrally in an enterprise-wide warehouse. But what is the guarantee that the...

  • OLTP and OLAP

    OLTP stands for On-line Transaction Processing and OLAP stands for On-line Analytical Processing. We may have heard these definition many times. But do we really understand the difference between them. Let's explore further on these two kinds of...

  • Why people Hate Project Managers – A must read for would-be managers

    "Project Managers" are inevitable. Love them or hate them, but if you are in a project, you have to accept them. They are Omnipresent in any project. They intervene too much on technical things without much knowledge. They create unrealistic...

  • What is a data warehouse - A 101 guide to modern data warehousing

    This article discusses data warehousing from a holistic standpoint and quickly touches upon all the relevant concepts that one needs to know. Start here if you do not know where to start from.

  • Data Retention and Purging in a Data Warehouse

    By the typical definition of data warehouse, we expect the data warehouse to be non-volatile in nature for its entire design life time. As long as it remain operation, all data loaded in the data warehouse should remain there for the purpose of...

  • Top 5 Challenges of Data Warehousing

    Data warehousing projects are one of its kinds. All data warehousing projects do not pose same challenges and not all of them are complex but they are always different. This article illustrates the top 5 challenges that often plague modern data...

  • Top 10 things you must know before designing a data warehouse

    This paper outlines some of the most important (and equally neglected) things that one must consider before and during the design phase of a data warehouse. In our experience, we have seen data warehouse designers often miss out on these items...