Dimensional Model

Top 50 DWBI Interview Questions with Answers - Part 3

Saurav Mitra

Continuation to our collection of Data Warehouse Conceptual Questions.

What is a factless fact?

A fact table that does not contain any measure is called a factless fact. This table will only contain keys from different dimension tables. This is often used to resolve a many-to-many cardinality issue.

Explanatory Note:

Consider a school, where a single student may be taught by many teachers and a single teacher may have many students. To model this situation in dimensional model, one might introduce a fact-less-fact table joining teacher and student keys. Such a fact table will then be able to answer queries like,

  1. Who are the students taught by a specific teacher.
  2. Which teacher teaches maximum students.
  3. Which student has highest number of teachers.etc. etc.

What is a coverage fact?

A factless fact table can only answer 'optimistic' queries (positive query) but can not answer a negative query. Again consider the illustration in the above example. A fact-less fact containing the keys of tutors and students can not answer a query like below,

  1. Which teacher did not teach any student?
  2. Which student was not taught by any teacher?

Why not? Because fact-less fact table only stores the positive scenarios (like student being taught by a tutor) but if there is a student who is not being taught by a teacher, then that student's key does not appear in this table, thereby reducing the coverage of the table.

Coverage fact table attempts to answer this - often by adding an extra flag column. Flag = 0 indicates a negative condition and flag = 1 indicates a positive condition. To understand this better, let's consider a class where there are 100 students and 5 teachers. So coverage fact table will ideally store 100 X 5 = 500 records (all combinations) and if a certain teacher is not teaching a certain student, the corresponding flag for that record will be 0.

What are incident and snapshot facts

A fact table stores some kind of measurements. Usually these measurements are stored (or captured) against a specific time and these measurements vary with respect to time. Now it might so happen that the business might not able to capture all of its measures always for every point in time. Then those unavailable measurements can be kept empty (Null) or can be filled up with the last available measurements. The first case is the example of incident fact and the second one is the example of snapshot fact.

What is aggregation and what is the benefit of aggregation?

A data warehouse usually captures data with same degree of details as available in source. The "degree of detail" is termed as granularity. But all reporting requirements from that data warehouse do not need the same degree of details.

To understand this, let's consider an example from retail business. A certain retail chain has 500 shops across Europe. All the shops record detail level transactions regarding the products they sale and those data are captured in a data warehouse.

Each shop manager can access the data warehouse and they can see which products are sold by whom and in what quantity on any given date. Thus the data warehouse helps the shop managers with the detail level data that can be used for inventory management, trend prediction etc.

Now think about the CEO of that retail chain. He does not really care about which certain sales girl in London sold the highest number of chopsticks or which shop is the best seller of 'brown breads'. All he is interested is, perhaps to check the percentage increase of his revenue margin across Europe. Or may be year to year sales growth on eastern Europe. Such data is aggregated in nature. Because Sales of goods in East Europe is derived by summing up the individual sales data from each shop in East Europe.

Therefore, to support different levels of data warehouse users, data aggregation is needed.

What is slicing-dicing?

Slicing means showing the slice of a data, given a certain set of dimension (e.g. Product) and value (e.g. Brown Bread) and measures (e.g. sales).

Dicing means viewing the slice with respect to different dimensions and in different level of aggregations.

Slicing and dicing operations are part of pivoting.

What is drill-through?

Drill through is the process of going to the detail level data from summary data.

Consider the above example on retail shops. If the CEO finds out that sales in East Europe has declined this year compared to last year, he then might want to know the root cause of the decrease. For this, he may start drilling through his report to more detail level and eventually find out that even though individual shop sales has actually increased, the overall sales figure has decreased because a certain shop in Turkey has stopped operating the business. The detail level of data, which CEO was not much interested on earlier, has this time helped him to pinpoint the root cause of decline sales. And the method he has followed to obtain the details from the aggregated data is called drill through.