A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.

IMPORTANT: Please read 101 Guide to Data Warehousing for a more elaborate discussion on this subject

This definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.

Data warehousing arises in an organisation's need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation.

The practical reality of most organisations is that their data infrastructure is made up by a collection of heterogeneous systems. For example, an organisation might have one system that handles customer-relationship, a system that handles employees, systems that handles sales data or production data, yet another system for finance and budgeting data etc. In practice, these systems are often poorly or not at all integrated and simple questions like: "How much time did sales person A spend on customer C, how much did we sell to Customer C, was customer C happy with the provided service, Did Customer C pay his bills" can be very hard to answer, even though the information is available "somewhere" in the different data systems.

Another problem is that ERP systems are designed to support relevant operations. For example, a finance system might keep track of every single stamp bought; When it was ordered, when it was delivered, when it was paid and the system might offer accounting principles (like double bookkeeping) that further complicates the data model. Such information is great for the person in charge of buying "stamps" or the accountant trying to sort out an irregularity, but the CEO is definitely not interested in such detailed information, the CEO wants to know stuff like "What's the cost?", "What's the revenue?", "did our latest initiative reduce costs?".

Data Warehousing

Yet another problem might be that the organisation is, internally, in disagreement about which data is correct. For example, the sales department might have one view of its costs, while the finance department has another view of that cost. In such cases the organisation can spend unlimited time discussing who's got the correct view of the data.

It is partly the purpose of Datawarehousing to bridge such problems. It is important to note that in Datawarehousing the source data systems are considered as given: It is not the task of the datawarehousing consultant to figure out, that since the problem is that the CRM system identifies a person by initials, while the Employee-Time-Management system identifies a person by full name while the ERP system identifies a person by social security number; and since a person can change his name: things do not work and the organization should invest in and implement one or two new systems to handle CRM, ERP etc. in a more consistent manner.

Rather, the datawarehousing consultant is charged with making the data appear consistent, integrated and consolidated despite the problems in the underlying source systems. The datawarehousing consultant achieves this by employing different datawarehousing techniques, creating one or more new data repositories (i.e. the datawarehouse) whose data model(s) support the needed reporting and analysis.

Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.