Now that we know the basic approach to do dimensional modeling from our earlier article, let us spend some time to understand various possible schema in dimensional modeling.

Requirement of different design schema

In Dimensional modeling, we can create different schema to suit our requirements. We need various schema to accomplish several things like accommodating hierarchies of a dimension or maintaining change histories of information etc. In this article we will discuss about 3 different schema, namely - Star, Snowflake and Conformed and we will also discuss how hierarchical information are modeled in these schemata. We will reserve the discussion on maintaining change histories for our next article.

Storing hierarchical information in dimension tables

From our previous article, we already know what is a dimension. Simply put, a dimension is something that qualifies a measure (number). For example, if I say, "McDonalds sell 5000" - that won't make any sense. But if I say, "McDonalds sell 5000 burgers per month" - then that would make perfect sense. Here, "burger" and "month" are the members of dimensions and they are qualifying the number 5000 in this sentence.

It is important to notice that "burger" and "month" are not dimension themselves - they are just the members of the dimensions "food" and "time" respectively. "Burger" is just one of many different "food" that McDonalds sell and "month" is just one of different units by which time is measured. Typically a dimension will have several members and those members will be stored in separate rows in the dimension table. So the "food" dimension table of McDonalds will have one row for burger, one row for fries, one row for "drinks" etc. Similarly, "time" dimension may contain 12 different months as the members of that dimension.

Often we may find that there are hierarchical relations among the members of a dimension. That is certain members of the dimension can be grouped under one group whereas other members can be grouped into a separate group. Consider this - french fries and twister fries both are "fries" and hence can be grouped under the same group "fries". Similarly chicken burger and fish burger both can be grouped as "burger".

French Fries Twister Fries
French Fries Twister Fries

This type of hierarchical relations can be stored in the model by following two different approaches. We can either store them in the same "food" dimension table (star schema approach) or we can create a separate dimension table in addition to "food" dimension - just to store the type of the foods (snowflake schema approach).

STAR SCHEMA DESIGN

Star schema is the most simple kind of schema where one fact table is present in the center of the schema surrounded by multiple dimension tables.

In a star schema all the dimension tables are connected only with the fact table and no dimension table is connected with any other dimension table.

star schema design

Benefit of Star Schema Design

Star schema is probably most popular schema in dimensional modeling because of its simplicity and flexibility. In a Star schema design, any information can be obtained just by traversing a single join, which means this type of schema will be ideal for information retrieval (faster query processing). Here, note that all the hierarchies (or levels) of the members of a dimension are stored in the single dimension table - that means, lets say if you wish to group (veggie burger and chicken burger) in "burger" category and (french fries and twister fries) in "fries" category, you have to store that category information in the same dimension table.

Star schema provides a de-normalized design

Storing Hierarchy in star schema

As depicted above, we will store hierarchical information in a flattened pattern in the single dimension table in star schema. So our food dimension table will look like this: Food

KEYNAMETYPE
1Chicken BurgerBurger
2Veggie BurgerBurger
3French FriesFries
4Twister FriesFries

SNOW-FLAKE SCHEMA DESIGN

Snow flake schema is just like star schema but the difference is, here one or more dimension tables are connected with other dimension table as well as with the central fact table. See the example of snowflake schema below.

Here we are storing the information in two dimension tables instead of one. We are storing the food type in one dimension ("type" table as shown below) and food in other dimension. This is a snowflake design. Type

KEYTYPE_NAME
1Burger
2Fries
Food
KEYTYPE_KEYNAME
11Chicken Burger
21Veggie Burger
32French Fries
42Twister Fries

If you are familiar with the concept of data normalization, you can understand that snow flaking actually increase the level of normalization in the data. This has obvious disadvantage in terms of information retrieval since we need to read more tables (and traverse more SQL joins) in order to get the same information. Example, if you wish to find out all the food, food type sold from store 1, the SQL queries from star and snowflake schemata will be like below:

Snow Flake Schema

SQL Query For Star Schema

SELECT DISTINCT f.name, f.type
FROM food f, sales_fact t
WHERE f.key = t.food_key
AND t.store_key = 1

SQL Query For SnowFlake Schema

SELECT DISTINCT f.name, tp.type_name
FROM food f, type tp, sales_fact t
WHERE f.key = t.food_key
AND f.type_key = tp.key
AND t.store_key = 1

As you can see in this example, compared to star schema, snowflake schema requires one more join (to connect one more table) to retrieve the same information. This is why snowflake schema is not good performance wise.

Then why do we use snowflake schema? Let me give a quick and short answer to that. I won't explain it in detail right now but I will leave it to you for your comprehension. The reason we do it is, suppose we have another fact table with granularity store, food type and day. This fact will use the key of "type" dimension table instead of "food" dimension table. Unless you have this dimension table in your schema, you won't get the "type" key. This is the reason we need to snowflake the "food" dimension to "type" dimension.

In our next article we will talk about preserving history in dimension tables (slowly or rapidly changing dimensions etc.).


Have a question on this subject?

Ask questions to our expert community members and clear your doubts. Asking question or engaging in technical discussion is both easy and rewarding.

Are you on Twitter?

Start following us. This way we will always keep you updated with what's happening in Data Analytics community. We won't spam you. Promise.

  • What is Data Warehousing?

    A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.

  • Top 10 things to avoid in DWBI project management

    Watch this space...

  • What is a data warehouse - A 101 guide to modern data warehousing

    This article discusses data warehousing from a holistic standpoint and quickly touches upon all the relevant concepts that one needs to know. Start here if you do not know where to start from.

  • Enterprise Data Warehouse Data Reconciliation Methodology

    An enterprise data warehouse often fetches records from several disparate systems and store them centrally in an enterprise-wide warehouse. But what is the guarantee that the...

  • Decision Support System (DSS)

    Decision Support System (DSS) is a class of information systems (including but not limited to computerized systems) that support business and organizational decision-making activities. A properly designed DSS is an interactive software-based system...

  • Why people Hate Project Managers – A must read for would-be managers

    "Project Managers" are inevitable. Love them or hate them, but if you are in a project, you have to accept them. They are Omnipresent in any project. They intervene too much on technical things without much knowledge. They create unrealistic...

  • Data Retention and Purging in a Data Warehouse

    By the typical definition of data warehouse, we expect the data warehouse to be non-volatile in nature for its entire design life time. As long as it remain operation, all data loaded in the data warehouse should remain there for the purpose of...

  • Business Intelligence

    In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence. He defined intelligence as: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."

  • Top 10 things you must know before designing a data warehouse

    This paper outlines some of the most important (and equally neglected) things that one must consider before and during the design phase of a data warehouse. In our experience, we have seen data warehouse designers often miss out on these items...

  • A road-map on Testing in Data Warehouse

    Testing in data warehouse projects are till date a less explored area. However, if not done properly, this can be a major reason for data warehousing project failures - especially in user acceptance phase. Given here a mind-map that will help a...