History Preserving in Dimensional Modeling
In our earlier article we have seen how to design a simple dimensional data model for a point-of-sale system (as an example we took the case of McDonald's fast-food shop). In this article we will begin with the same model and we will see how we may enhance the model to store historical changes in the attributes of dimension table.
Nothing Lasts Forever
One of the important objectives while doing data modeling is, to develop a model which can capture the states of the system with respect to time. You know, nothing lasts forever! Product prices change over time, people change their addresses, marital status, employers and even their names. If you are doing data modeling for a data warehouse — where we are particularly interested about historical analysis - it is crucial that we develop some method of capturing these changes in our data model. As an example, let's say we store the price of products in the "Food" dimension table that we created earlier and we want to be able to capture the historical changes in "Food" price. In this article we will see what change we need to do in our data model to be able to do this.
Note: The simple "Food" dimension we created earlier did not have any "Price" information. But to illustrate the point of this article, we will add a "price" column to our "Food" dimension table. So henceforth our "Food" dimension table will look like this:
Food
KEY | NAME | TYPE_KEY | PRICE |
1 | Chicken Burger | 1 | 3.70 |
2 | Veggie Burger | 1 | 3.20 |
3 | French Fries | 2 | 2.00 |
4 | Twister Fries | 2 | 2.20 |
In case if you have not read my previous article and wondering what "TYPE_KEY" means, this is a foreign key coming from one other table that contains the type of the food e.g., Burger, Fries etc. Also notice, above table only tells us the price of the food as of current point in time. It does not tell us what the price was, let's say 6 months ago. If the price of Veggie Burger changes from $3.20 to $3.25 tomorrow, the new price will be updated in the table and then we will have no way to know what was the earlier price. So our objective is to change the above table structure in such a way so that we can store all the historical and future prices of the foods.
Types of Changing Dimensions
There are a few different ways to store the historical changes of values in data model. And any particular way you want to adopt will depend on the <b>type</b> of changing dimension. For example, some dimensions can change quite rapidly, some dimensions do not change at all but most dimensions change very slowly. That is why we can differentiate dimensions in these 3 types depicted below.
Unchanging Dimension
There are some dimensions that do not change at all. For example, let's say you have created a dimension table called "Gender". Below are the structure and data of this dimension table:
Gender
ID | VALUE |
1 | MALE |
2 | FEMALE |
The "Value" column in the above dimension is the attribute of this table that won't normally change. This is an unchanging dimension - "male" will be always called "male" and "female" will be always called "female". Off course, for some crazy reason, one may wish to change the texts "Male"/"Female" to something else e.g. "man"/"woman". But that's really not a change that we should be concerned about as such changes do not alter the "meaning" of the attribute (the words man/male still mean the same thing). So if some changes need to be done, we can simply update the "Value" column in dimension table. For all practical intent and purpose, this dimension remains as an "Unchanging dimension".
Slowly Changing Dimension
Here comes the most popular dimension - "slowly changing dimension". These are the dimensions where one or more attributes can change slowly with respect to time. Look at the "food" dimension from our earlier example. "Price" is one such attribute which is variable in this dimension. But "price" of french fries or burgers do not change very often, may be they change once in a season. This is an example of slowly changing dimension.
Let me give you one more example. Let's say you have created a dimension table on employees. And in the "employee" dimension you have a column called "Marital_Status". This can definitely change (from unmarried to married for example) with respect to time. But again, like the previous example, this is a slowly changing attribute. Doesn't change so often.
Later in the next article, we will see how to make necessary changes in our dimension table design to store history for such slowly changing dimensions.
Rapidly Changing Dimensions
If you design a dimension table that has a rapidly changing attribute, then your dimension table will become rapidly changing dimension.
As for example, let's say you have a "Subscriber" dimension where you store the details of all the subscribers to a particular pre-paid mobile service plan. You have a "status" column in the "Subscriber" dimension table which can have several different values based on the current account balance of the subscriber. For example, if your balance is less than $0.1, the status becomes "No Outgoing call". If your balance is less than $5, the status becomes "Restricted Call Service". If your balance is less than $10, the status becomes "No Long Distance Call" and if the balance is greater than $10 then status becomes "Full Service", etc. Every month, the status of any subscriber keeps on changing multiple times based on his or her account balance thereby making the "Subscribers" dimension one rapidly changing dimension.
One must remember the way we design a rapidly changing dimension is often quite different from the way we design a slowly changing dimension. In the next article however, we will only look into designing of slowly changing dimension.