SQL is a language for accessing and manipulating databases, standardized by ANSI. To be successful with database-centric applications (which include most applications in the Data Warehousing domain), one must be strong in SQL. In this article, we will learn more about SQL by breaking the subject into several question-and-answer sessions built around questions commonly asked in interviews.
In this multi-part tutorial, we will learn the basics of dimensional modeling and see how to use this modeling technique in real-life scenarios. By the end of this tutorial, you will be a confident dimensional data modeler.
In our earlier article, we discussed the need for storing historical information in dimension tables. We also learnt about various types of changing dimensions. In this article, we will pick only the "slowly changing dimension" and learn in detail about the various types of slowly changing dimensions and how to design them.
Handling rapidly changing dimensions is tricky due to various performance implications. This article attempts to provide some methodologies for handling rapidly changing dimensions in a data warehouse.
A model is an abstraction of some aspect of a problem. A data model is a model that describes how data is represented and accessed, usually for a database. The construction of a data model is one of the most difficult tasks of software engineering and is often pivotal to the success or failure of a project.
Performance of a data warehouse is as important as the correctness of its data, because unacceptable performance may render the data warehouse useless. There is increasing awareness that it is much more effective to build performance in from the beginning than to tune it at the end. In this article, we present a few points that you may consider for optimally building the data model of a data warehouse. We will only cover performance considerations for dimensional modeling.
This article is a continuation of the article "Top 50 DWBI Interview Questions with Answers".
A continuation of our collection of Data Warehouse Conceptual Questions.
An enterprise data warehouse often fetches records from several disparate systems and stores them centrally in an enterprise-wide warehouse. But what is the guarantee that the quality of the data will not degrade in the process of centralization?
Sure enough, you have heard the term "Big Data" many times before. There is no dearth of information about it on the Internet and in print media. But guess what, this term remains vaguely defined and poorly understood. This essay is our effort to describe big data in simple technical language, stripping off all the marketing lingo and sales jargon. Shall we begin?
In my previous article, “Fools guide to Big Data”, we discussed the origin of Big Data and the need for big data analytics. We also noted that Big Data is data that is too large, complex and dynamic for any conventional data tools (such as an RDBMS) to compute, store, manage and analyze within a practical timeframe. In the next few articles, we will familiarize ourselves with the tools and techniques for processing Big Data.
"We have a simple data warehouse that takes data from a few RDBMS source systems and loads the data into the dimension and fact tables of the warehouse. I wonder why we have a staging layer in between. Why can’t we process everything on the fly and push it into the data warehouse?"
In this "DWBI Concepts' Original article", we make Oracle database and Informatica PowerCenter lock horns to find out which of them handles data sort operations faster. This article gives application developers crucial insight for making informed decisions about performance tuning.
This article is a comprehensive guide to the techniques and methodologies available for tuning the performance of the Informatica PowerCenter ETL tool. It's a one-stop performance tuning manual for Informatica.
To me, the look-up is the single most important (and difficult) transformation to consider while tuning the performance of Informatica jobs. Choosing and using the correct type of look-up can dramatically change session performance in Informatica. So let’s delve deeper into this.
The Joiner transformation allows you to join two heterogeneous sources in an Informatica mapping. You can use this transformation to perform INNER and OUTER joins between two input streams. For performance reasons, I recommend you use the Joiner transformation ONLY if any of the following conditions is true:
Similar to what we discussed regarding the performance tuning of the Joiner transformation, the basic rule for tuning the Aggregator is to avoid the Aggregator transformation altogether unless...