Dimensional Modeling Schema
Now that we know the basic approach to do dimensional modeling from our earlier article, let us spend some time to understand various possible schema in dimensional modeling.
Requirement of different design schema
In Dimensional modeling, we can create different schema to suit our requirements. We need various schema to accomplish several things like accommodating hierarchies of a dimension or maintaining change histories of information etc. In this article we will discuss about 3 different schema, namely - Star, Snowflake and Conformed and we will also discuss how hierarchical information are modeled in these schemata. We will reserve the discussion on maintaining change histories for our next article.
Storing hierarchical information in dimension tables
From our previous article, we already know what is a dimension. Simply put, a dimension is something that qualifies a measure (number). For example, if I say, "McDonalds sell 5000" - that won't make any sense. But if I say, "McDonalds sell 5000 burgers per month" - then that would make perfect sense. Here, "burger" and "month" are the members of dimensions and they are qualifying the number 5000 in this sentence.
It is important to notice that "burger" and "month" are not dimension themselves - they are just the members of the dimensions "food" and "time" respectively. "Burger" is just one of many different "food" that McDonalds sell and "month" is just one of different units by which time is measured. Typically a dimension will have several members and those members will be stored in separate rows in the dimension table. So the "food" dimension table of McDonalds will have one row for burger, one row for fries, one row for "drinks" etc. Similarly, "time" dimension may contain 12 different months as the members of that dimension.
Often we may find that there are hierarchical relations among the members of a dimension. That is certain members of the dimension can be grouped under one group whereas other members can be grouped into a separate group. Consider this - french fries and twister fries both are "fries" and hence can be grouped under the same group "fries". Similarly chicken burger and fish burger both can be grouped as "burger".
This type of hierarchical relations can be stored in the model by following two different approaches. We can either store them in the same "food" dimension table (star schema approach) or we can create a separate dimension table in addition to "food" dimension - just to store the type of the foods (snowflake schema approach).
STAR SCHEMA DESIGN
Star schema is the most simple kind of schema where one fact table is present in the center of the schema surrounded by multiple dimension tables.
In a star schema all the dimension tables are connected only with the fact table and no dimension table is connected with any other dimension table.
Benefit of Star Schema Design
Star schema is probably most popular schema in dimensional modeling because of its simplicity and flexibility. In a Star schema design, any information can be obtained just by traversing a single join, which means this type of schema will be ideal for information retrieval (faster query processing). Here, note that all the hierarchies (or levels) of the members of a dimension are stored in the single dimension table - that means, lets say if you wish to group (veggie burger and chicken burger) in "burger" category and (french fries and twister fries) in "fries" category, you have to store that category information in the same dimension table.
Star schema provides a de-normalized design
Storing Hierarchy in star schema
As depicted above, we will store hierarchical information in a flattened pattern in the single dimension table in star schema. So our food dimension table will look like this:
Food
KEY | NAME | TYPE |
1 | Chicken Burger | Burger |
2 | Veggie Burger | Burger |
3 | French Fries | Fries |
4 | Twister Fries | Fries |
SNOW-FLAKE SCHEMA DESIGN
Snow flake schema is just like star schema but the difference is, here one or more dimension tables are connected with other dimension table as well as with the central fact table. See the example of snowflake schema below.
Here we are storing the information in two dimension tables instead of one. We are storing the food type in one dimension ("type" table as shown below) and food in other dimension. This is a snowflake design.
Type
KEY | TYPE_NAME |
1 | Burger |
2 | Fries |
Food
KEY | TYPE_KEY | NAME |
1 | 1 | Chicken Burger |
2 | 1 | Veggie Burger |
3 | 2 | French Fries |
4 | 2 | Twister Fries |
If you are familiar with the concept of data normalization, you can understand that snow flaking actually increase the level of normalization in the data. This has obvious disadvantage in terms of information retrieval since we need to read more tables (and traverse more SQL joins) in order to get the same information. Example, if you wish to find out all the food, food type sold from store 1, the SQL queries from star and snowflake schemata will be like below:
SQL Query For Star Schema
SELECT DISTINCT
f.name, f.type
FROM food f, sales_fact t
WHERE f.key = t.food_key
AND t.store_key = 1
SQL Query For SnowFlake Schema
SELECT DISTINCT
f.name, tp.type_name
FROM food f, type tp, sales_fact t
WHERE f.key = t.food_key
AND f.type_key = tp.key
AND t.store_key = 1
As you can see in this example, compared to star schema, snowflake schema requires one more join (to connect one more table) to retrieve the same information. This is why snowflake schema is not good performance wise.
Then why do we use snowflake schema? Let me give a quick and short answer to that. I won't explain it in detail right now but I will leave it to you for your comprehension.
The reason we do it is, suppose we have another fact table with granularity store, food type and day. This fact will use the key of "type" dimension table instead of "food" dimension table. Unless you have this dimension table in your schema, you won't get the "type" key. This is the reason we need to snowflake the "food" dimension to "type" dimension.
In our next article we will talk about preserving history in dimension tables (slowly or rapidly changing dimensions etc).